Terraform Module: AWS Auto Scaling Group — launch-template-driven fleets that self-heal and scale

Quick take — Reusable Terraform module for AWS Auto Scaling Groups on hashicorp/aws ~> 5.0: launch template, mixed instances policy, instance refresh, target tracking scaling, and ELB health checks. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

provider "aws" {
  region = "us-east-1"
}

module "auto_scaling_group" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-auto-scaling-group?ref=v1.0.0"

  name               = "..."           # Base name for the ASG, launch template, and propagated …
  ami_id             = "..."           # AMI ID for launched instances.
  subnet_ids         = ["...", "..."]  # Subnets the ASG launches into; span >=2 AZs.
  security_group_ids = ["...", "..."]  # Security groups attached to instances.
}

Then run terraform init && terraform apply. Every other input has a sensible default; see Inputs below to override behaviour.

What this module is

An AWS Auto Scaling Group (ASG) keeps a fleet of EC2 instances at a desired size across one or more Availability Zones. It launches replacements when an instance fails a health check, distributes capacity across subnets, registers instances with load balancer target groups, and grows or shrinks the fleet in response to scaling policies. The ASG is the unit AWS uses to deliver elasticity and self-healing for stateful and stateless EC2 workloads alike.

The raw aws_autoscaling_group resource is deceptively simple to declare but tedious to get right in production. You also need a versioned launch template (never the deprecated launch configuration), a mixed instances policy if you want to blend On-Demand and Spot across instance types, an instance refresh block so AMI rollouts don’t require manual recycling, target tracking scaling policies so the group reacts to real load, and propagated tags so every launched instance is discoverable and billable. Hand-rolling all of that in every stack leads to drift: one team forgets instance_refresh, another health-checks on EC2 instead of ELB, a third forgets capacity_rebalance and gets surprised by Spot interruptions.

Wrapping the ASG in a reusable module fixes the contract once. The module exposes a small, var-driven surface — subnets, sizing, instance types, scaling target — and encodes the safe defaults (rolling instance refresh, ELB health checks with a grace period, Spot rebalancing, tag propagation) so every consuming team gets a production-grade fleet without copy-pasting 150 lines of HCL.

When to use it

Reach for a different tool when the workload is a better fit for ECS/EKS (containers), Fargate (serverless containers), or Lambda (event-driven). This module is for when you genuinely need EC2 instances under your control.

Module structure

terraform-module-aws-auto-scaling-group/
├── versions.tf
├── main.tf
├── variables.tf
└── outputs.tf

versions.tf

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

main.tf

locals {
  name = var.name

  # Tags that must be stamped onto every launched instance, not just the ASG.
  propagated_tags = merge(
    var.tags,
    {
      Name      = local.name
      ManagedBy = "terraform"
    },
  )
}

# Versioned launch template — the modern replacement for launch configurations.
resource "aws_launch_template" "this" {
  name_prefix   = "${local.name}-"
  image_id      = var.ami_id
  instance_type = var.instance_type

  vpc_security_group_ids = var.security_group_ids

  dynamic "iam_instance_profile" {
    for_each = var.iam_instance_profile_arn == null ? [] : [1]
    content {
      arn = var.iam_instance_profile_arn
    }
  }

  user_data = var.user_data_base64

  metadata_options {
    http_endpoint               = "enabled"
    http_tokens                 = "required" # enforce IMDSv2
    http_put_response_hop_limit = 1
  }

  monitoring {
    enabled = var.detailed_monitoring
  }

  dynamic "block_device_mappings" {
    for_each = var.root_block_device == null ? [] : [var.root_block_device]
    content {
      device_name = block_device_mappings.value.device_name
      ebs {
        volume_size           = block_device_mappings.value.volume_size
        volume_type           = block_device_mappings.value.volume_type
        encrypted             = true
        delete_on_termination = true
      }
    }
  }

  tag_specifications {
    resource_type = "instance"
    tags          = local.propagated_tags
  }

  tag_specifications {
    resource_type = "volume"
    tags          = local.propagated_tags
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "this" {
  name_prefix         = "${local.name}-"
  vpc_zone_identifier = var.subnet_ids

  min_size         = var.min_size
  max_size         = var.max_size
  desired_capacity = var.desired_capacity

  health_check_type         = var.health_check_type
  health_check_grace_period = var.health_check_grace_period
  default_cooldown          = var.default_cooldown
  capacity_rebalance        = var.capacity_rebalance

  target_group_arns = var.target_group_arns

  # When no Spot blend is requested, use the launch template directly.
  dynamic "launch_template" {
    for_each = length(var.spot_instance_types) == 0 ? [1] : []
    content {
      id      = aws_launch_template.this.id
      version = aws_launch_template.this.latest_version
    }
  }

  # Mixed instances policy: guaranteed On-Demand base + Spot for the rest.
  dynamic "mixed_instances_policy" {
    for_each = length(var.spot_instance_types) == 0 ? [] : [1]
    content {
      instances_distribution {
        on_demand_base_capacity                  = var.on_demand_base_capacity
        on_demand_percentage_above_base_capacity = var.on_demand_percentage_above_base
        spot_allocation_strategy                 = "price-capacity-optimized"
      }

      launch_template {
        launch_template_specification {
          launch_template_id = aws_launch_template.this.id
          version            = aws_launch_template.this.latest_version
        }

        dynamic "override" {
          for_each = var.spot_instance_types
          content {
            instance_type = override.value
          }
        }
      }
    }
  }

  # Zero-downtime AMI / launch-template rollouts.
  dynamic "instance_refresh" {
    for_each = var.enable_instance_refresh ? [1] : []
    content {
      strategy = "Rolling"
      preferences {
        min_healthy_percentage = var.refresh_min_healthy_percentage
        instance_warmup        = var.health_check_grace_period
      }
      # No explicit triggers needed: launch template and mixed instances
      # policy changes always start a refresh.
    }
  }

  dynamic "tag" {
    for_each = local.propagated_tags
    content {
      key                 = tag.key
      value               = tag.value
      propagate_at_launch = true
    }
  }

  lifecycle {
    create_before_destroy = true
    # desired_capacity drifts at runtime once scaling policies act on it.
    ignore_changes = [desired_capacity]
  }
}

# Target tracking scaling policy — the ASG keeps the metric at the target value.
resource "aws_autoscaling_policy" "target_tracking" {
  count = var.enable_target_tracking ? 1 : 0

  name                   = "${local.name}-target-tracking"
  autoscaling_group_name = aws_autoscaling_group.this.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = var.target_tracking_metric
    }
    target_value = var.target_tracking_value
  }
}

variables.tf

variable "name" {
  description = "Base name used for the ASG, launch template, and propagated Name tag."
  type        = string

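  validation {
    # AWS caps launch template names at 128 characters, and name_prefix needs
    # room for Terraform's generated suffix. 64 is a conservative cap (assumption).
    condition     = can(regex("^[a-z0-9-]{2,64}$", var.name))
    error_message = "name must be 2-64 characters of lowercase letters, digits, or hyphens."
  }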
}

variable "ami_id" {
  description = "AMI ID for launched instances (e.g. an Amazon Linux 2023 AMI)."
  type        = string

  validation {
    condition     = can(regex("^ami-[0-9a-f]{8,17}$", var.ami_id))
    error_message = "ami_id must be a valid ami-xxxx identifier."
  }
}

variable "instance_type" {
  description = "Instance type set on the launch template. Ignored when spot_instance_types is set, because the overrides list takes over."
  type        = string
  default     = "t3.medium"
}

variable "subnet_ids" {
  description = "Subnet IDs the ASG launches into; span >=2 AZs for resilience."
  type        = list(string)

  validation {
    condition     = length(var.subnet_ids) >= 1
    error_message = "Provide at least one subnet; two or more AZs are strongly recommended."
  }
}

variable "security_group_ids" {
  description = "Security group IDs attached to launched instances."
  type        = list(string)
}

variable "min_size" {
  description = "Minimum number of instances in the group."
  type        = number
  default     = 2
}

variable "max_size" {
  description = "Maximum number of instances in the group."
  type        = number
  default     = 6
}

variable "desired_capacity" {
  description = "Initial desired capacity. Drift is ignored after creation (scaling owns it)."
  type        = number
  default     = 2
}

variable "health_check_type" {
  description = "Health check source: EC2 or ELB. Use ELB when behind a target group."
  type        = string
  default     = "ELB"

  validation {
    condition     = contains(["EC2", "ELB"], var.health_check_type)
    error_message = "health_check_type must be EC2 or ELB."
  }
}

variable "health_check_grace_period" {
  description = "Seconds to wait after launch before health checks count (and refresh warmup)."
  type        = number
  default     = 300
}

variable "default_cooldown" {
  description = "Seconds between scaling activities; applies to simple scaling policies only."
  type        = number
  default     = 300
}

variable "capacity_rebalance" {
  description = "Proactively replace Spot instances flagged for interruption."
  type        = bool
  default     = true
}

variable "target_group_arns" {
  description = "ALB/NLB target group ARNs to register instances with."
  type        = list(string)
  default     = []
}

variable "iam_instance_profile_arn" {
  description = "ARN of an IAM instance profile to attach to instances. Null to omit."
  type        = string
  default     = null
}

variable "user_data_base64" {
  description = "Base64-encoded user data / cloud-init for instance bootstrap. Null to omit."
  type        = string
  default     = null
}

variable "detailed_monitoring" {
  description = "Enable 1-minute CloudWatch detailed monitoring on instances."
  type        = bool
  default     = true
}

variable "root_block_device" {
  description = "Optional root EBS volume config; always encrypted when set."
  type = object({
    device_name = string
    volume_size = number
    volume_type = string
  })
  default = null
}

variable "spot_instance_types" {
  description = "Instance types for the mixed instances policy. Empty list = no Spot blend."
  type        = list(string)
  default     = []
}

variable "on_demand_base_capacity" {
  description = "Guaranteed On-Demand instances before Spot is used (mixed policy only)."
  type        = number
  default     = 1
}

variable "on_demand_percentage_above_base" {
  description = "Percent of capacity above the base that is On-Demand (0-100)."
  type        = number
  default     = 0

  validation {
    condition     = var.on_demand_percentage_above_base >= 0 && var.on_demand_percentage_above_base <= 100
    error_message = "on_demand_percentage_above_base must be between 0 and 100."
  }
}

variable "enable_instance_refresh" {
  description = "Enable rolling instance refresh triggered by launch template changes."
  type        = bool
  default     = true
}

variable "refresh_min_healthy_percentage" {
  description = "Minimum healthy percentage to keep in service during instance refresh."
  type        = number
  default     = 90
}

variable "enable_target_tracking" {
  description = "Create a target tracking scaling policy."
  type        = bool
  default     = true
}

variable "target_tracking_metric" {
  description = "Predefined metric for target tracking (e.g. ASGAverageCPUUtilization)."
  type        = string
  default     = "ASGAverageCPUUtilization"

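  validation {
    # ALBRequestCountPerTarget is left out: it needs a resource_label, which
    # this module does not pass to predefined_metric_specification.
    condition = contains([
      "ASGAverageCPUUtilization",
      "ASGAverageNetworkIn",
      "ASGAverageNetworkOut",
    ], var.target_tracking_metric)
    error_message = "target_tracking_metric must be ASGAverageCPUUtilization, ASGAverageNetworkIn, or ASGAverageNetworkOut."
  }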
}

variable "target_tracking_value" {
  description = "Target value the scaling policy holds the metric at (e.g. 50.0 for 50% CPU)."
  type        = number
  default     = 50
}

variable "tags" {
  description = "Tags applied to the ASG and propagated to all launched instances/volumes."
  type        = map(string)
  default     = {}
}

outputs.tf

output "autoscaling_group_id" {
  description = "ID of the Auto Scaling Group."
  value       = aws_autoscaling_group.this.id
}

output "autoscaling_group_name" {
  description = "Generated name of the Auto Scaling Group."
  value       = aws_autoscaling_group.this.name
}

output "autoscaling_group_arn" {
  description = "ARN of the Auto Scaling Group."
  value       = aws_autoscaling_group.this.arn
}

output "launch_template_id" {
  description = "ID of the launch template backing the group."
  value       = aws_launch_template.this.id
}

output "launch_template_latest_version" {
  description = "Latest version number of the launch template."
  value       = aws_launch_template.this.latest_version
}

output "target_tracking_policy_arn" {
  description = "ARN of the target tracking scaling policy, if enabled."
  value       = try(aws_autoscaling_policy.target_tracking[0].arn, null)
}

How to use it

module "auto_scaling_group" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-auto-scaling-group?ref=v1.0.0"

  name               = "web-api-prod"
  ami_id             = data.aws_ami.al2023.id
  instance_type      = "m6i.large"
  subnet_ids         = module.vpc.private_subnet_ids
  security_group_ids = [aws_security_group.web.id]

  min_size         = 3
  max_size         = 20
  desired_capacity = 3

  # Register with the ALB and health-check through it.
  health_check_type = "ELB"
  target_group_arns = [aws_lb_target_group.web.arn]

  # Blend Spot across types with a guaranteed On-Demand base of 2.
  spot_instance_types             = ["m6i.large", "m5.large", "m6a.large"]
  on_demand_base_capacity         = 2
  on_demand_percentage_above_base = 0

  iam_instance_profile_arn = aws_iam_instance_profile.web.arn
  user_data_base64         = base64encode(templatefile("${path.module}/bootstrap.sh.tftpl", {}))

  # Hold average CPU at 55% and roll new AMIs without downtime.
  enable_target_tracking = true
  target_tracking_metric = "ASGAverageCPUUtilization"
  target_tracking_value  = 55

  tags = {
    Environment = "production"
    Team        = "platform"
    CostCenter  = "cc-1042"
  }
}

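# Downstream reference: alarm on the group name produced by the module.
# GroupInServiceInstances is only published when group metrics collection
# (enabled_metrics) is turned on for the ASG, which main.tf above does not set;
# add it to the module before relying on this alarm.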
resource "aws_cloudwatch_metric_alarm" "asg_low_healthy_hosts" {
  alarm_name          = "${module.auto_scaling_group.autoscaling_group_name}-low-healthy"
  namespace           = "AWS/AutoScaling"
  metric_name         = "GroupInServiceInstances"
  statistic           = "Minimum"
  comparison_operator = "LessThanThreshold"
  threshold           = 2
  period              = 60
  evaluation_periods  = 3

  dimensions = {
    AutoScalingGroupName = module.auto_scaling_group.autoscaling_group_name
  }

  alarm_actions = [aws_sns_topic.ops_alerts.arn]
}
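
The example above references data.aws_ami.al2023 without defining it. A minimal sketch of one way to resolve it, assuming you want the newest Amazon-owned Amazon Linux 2023 x86_64 image. The name pattern is an assumption, so check it against the AMIs published in your region:

```hcl
# Hypothetical lookup backing data.aws_ami.al2023 in the example above.
data "aws_ami" "al2023" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-2023.*-x86_64"] # assumed naming pattern
  }

  filter {
    name   = "architecture"
    values = ["x86_64"]
  }
}
```

Because most_recent is resolved at plan time, a newly published AMI changes the launch template and, with enable_instance_refresh left on, starts a rolling refresh on the next apply. Pin a specific AMI ID instead if you want rollouts to be explicit.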

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config, live/terragrunt.hcl (inherited by every module):

remote_state {
  backend = "s3"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...s3 state bucket/container + key per path...
  }
}

2. Module config, live/prod/auto_scaling_group/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-auto-scaling-group?ref=v1.0.0"
}

inputs = {
  name               = "..."
  ami_id             = "..."
  subnet_ids         = ["...", "..."]
  security_group_ids = ["...", "..."]
}

3. Deploy this module on its own, or roll out every module in the environment together (run either command from the repository root):

cd live/prod/auto_scaling_group && terragrunt apply   # this module only
cd live/prod && terragrunt run-all apply              # every module under live/prod

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
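
As a sketch of the per-environment override described above, a dev counterpart to the prod config might shrink the fleet and drop the load balancer dependency. The live/dev path and every value below are illustrative assumptions, not part of the module:

```hcl
# live/dev/auto_scaling_group/terragrunt.hcl (hypothetical path)
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-auto-scaling-group?ref=v1.0.0"
}

inputs = {
  name               = "web-api-dev"
  ami_id             = "ami-0123456789abcdef0"                # placeholder AMI ID
  subnet_ids         = ["subnet-aaaa1111", "subnet-bbbb2222"] # placeholder subnets
  security_group_ids = ["sg-cccc3333"]                        # placeholder security group

  # Smaller fleet than prod; no target group, so rely on EC2 status checks.
  min_size          = 1
  max_size          = 2
  desired_capacity  = 1
  health_check_type = "EC2"
}
```

Only the inputs block differs from the prod file, which is the point: the module source and the inherited root config stay identical across environments.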

Inputs

| Name | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| name | string | - | yes | Base name for the ASG, launch template, and propagated Name tag. |
| ami_id | string | - | yes | AMI ID for launched instances. |
| instance_type | string | "t3.medium" | no | Launch template instance type; ignored when spot_instance_types is set. |
| subnet_ids | list(string) | - | yes | Subnets the ASG launches into; span >=2 AZs. |
| security_group_ids | list(string) | - | yes | Security groups attached to instances. |
| min_size | number | 2 | no | Minimum instances in the group. |
| max_size | number | 6 | no | Maximum instances in the group. |
| desired_capacity | number | 2 | no | Initial desired capacity (drift ignored after creation). |
| health_check_type | string | "ELB" | no | Health check source: EC2 or ELB. |
| health_check_grace_period | number | 300 | no | Grace seconds after launch (also refresh warmup). |
| default_cooldown | number | 300 | no | Seconds between scaling activities. |
| capacity_rebalance | bool | true | no | Proactively replace Spot instances flagged for interruption. |
| target_group_arns | list(string) | [] | no | ALB/NLB target group ARNs to register with. |
| iam_instance_profile_arn | string | null | no | IAM instance profile ARN; null to omit. |
| user_data_base64 | string | null | no | Base64-encoded bootstrap user data. |
| detailed_monitoring | bool | true | no | Enable 1-minute CloudWatch detailed monitoring. |
| root_block_device | object | null | no | Optional encrypted root EBS volume config. |
| spot_instance_types | list(string) | [] | no | Instance types for the mixed instances policy; empty = no Spot. |
| on_demand_base_capacity | number | 1 | no | Guaranteed On-Demand instances before Spot (mixed policy). |
| on_demand_percentage_above_base | number | 0 | no | Percent On-Demand above the base (0-100). |
| enable_instance_refresh | bool | true | no | Enable rolling instance refresh on launch template change. |
| refresh_min_healthy_percentage | number | 90 | no | Min healthy percentage kept in service during refresh. |
| enable_target_tracking | bool | true | no | Create a target tracking scaling policy. |
| target_tracking_metric | string | "ASGAverageCPUUtilization" | no | Predefined metric for target tracking. |
| target_tracking_value | number | 50 | no | Target metric value the policy holds. |
| tags | map(string) | {} | no | Tags on the ASG, propagated to instances and volumes. |
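
The object shape of root_block_device is not visible in the table. A hedged sketch of setting it alongside the required inputs follows. /dev/xvda is the usual root device name on Amazon Linux AMIs but is an assumption here, and the name and IDs are placeholders:

```hcl
module "auto_scaling_group" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-auto-scaling-group?ref=v1.0.0"

  name               = "batch-workers"                        # hypothetical name
  ami_id             = "ami-0123456789abcdef0"                # placeholder AMI ID
  subnet_ids         = ["subnet-aaaa1111", "subnet-bbbb2222"] # placeholder subnets
  security_group_ids = ["sg-cccc3333"]                        # placeholder security group

  # device_name must match the AMI's root device name; otherwise the mapping
  # attaches a second volume instead of resizing the root one.
  root_block_device = {
    device_name = "/dev/xvda" # assumed root device for the chosen AMI
    volume_size = 50          # GiB
    volume_type = "gp3"
  }
}
```

All three attributes are required once the object is set, because the variable's type declares them without optional() defaults.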

Outputs

| Name | Description |
| --- | --- |
| autoscaling_group_id | ID of the Auto Scaling Group. |
| autoscaling_group_name | Generated name of the Auto Scaling Group. |
| autoscaling_group_arn | ARN of the Auto Scaling Group. |
| launch_template_id | ID of the launch template backing the group. |
| launch_template_latest_version | Latest version number of the launch template. |
| target_tracking_policy_arn | ARN of the target tracking scaling policy, if enabled. |
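
One way these outputs get consumed is by attaching extra Auto Scaling resources from the calling stack. The sketch below assumes the module "auto_scaling_group" block from How to use it is in scope; the schedule itself and its values are hypothetical:

```hcl
# Hypothetical scheduled action: shrink a non-production group overnight.
resource "aws_autoscaling_schedule" "scale_in_overnight" {
  scheduled_action_name  = "scale-in-overnight"
  autoscaling_group_name = module.auto_scaling_group.autoscaling_group_name

  min_size         = 1
  max_size         = 6
  desired_capacity = 1

  recurrence = "0 20 * * 1-5" # 20:00, Monday to Friday
  time_zone  = "Etc/UTC"
}
```

Referencing the output rather than a hard-coded name matters because name_prefix gives the group a generated name, which changes if the group is ever replaced.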

Enterprise scenario

A retail platform runs its product-catalog API as a stateless EC2 tier behind an ALB across three Availability Zones. Using this module, the platform team sets on_demand_base_capacity = 4 to guarantee a baseline that survives a full Spot drain, then blends three m6i/m5/m6a Spot types for the elastic headroom — cutting steady-state compute cost by roughly 60% during normal traffic. Target tracking holds CPU at 55%, so the fleet scales from 4 to 30 instances automatically during flash sales, and capacity_rebalance plus rolling instance refresh let the team ship a patched AMI mid-quarter with zero customer-facing downtime.
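
Translated into module inputs, the scenario above might look like the sketch below. Only the On-Demand base of 4, the 4 to 30 range, the three instance families, and the 55% CPU target come from the scenario; the instance sizes, names, and resource references are assumptions for illustration:

```hcl
module "catalog_api" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-auto-scaling-group?ref=v1.0.0"

  name               = "catalog-api-prod"                # hypothetical name
  ami_id             = data.aws_ami.al2023.id            # assumed lookup defined elsewhere
  subnet_ids         = module.vpc.private_subnet_ids     # assumed: one subnet in each of three AZs
  security_group_ids = [aws_security_group.catalog.id]   # hypothetical
  target_group_arns  = [aws_lb_target_group.catalog.arn] # hypothetical

  min_size         = 4
  max_size         = 30
  desired_capacity = 4

  # 4 On-Demand instances always; everything above that is Spot.
  on_demand_base_capacity         = 4
  on_demand_percentage_above_base = 0
  spot_instance_types             = ["m6i.large", "m5.large", "m6a.large"] # sizes assumed

  target_tracking_value = 55 # percent average CPU
}
```

At the 30-instance peak this split works out to 4 On-Demand and 26 Spot instances, since on_demand_percentage_above_base = 0 sends all capacity above the base to Spot.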

Best practices
