Terraform Module: AWS Auto Scaling Group — launch-template-driven fleets that self-heal and scale

Quick take — Reusable Terraform module for AWS Auto Scaling Groups on hashicorp/aws ~> 5.0: launch template, mixed instances policy, instance refresh, target tracking scaling, and ELB health checks. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

provider "aws" {
  region = "us-east-1"
}

module "auto_scaling_group" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-auto-scaling-group?ref=v1.0.0"

  name               = "..."           # Base name for the ASG, launch template, and propagated …
  ami_id             = "..."           # AMI ID for launched instances.
  subnet_ids         = ["...", "..."]  # Subnets the ASG launches into; span >=2 AZs.
  security_group_ids = ["...", "..."]  # Security groups attached to instances.
}

Then run terraform init && terraform apply. Every other input has a sensible default; see Inputs below to override behaviour.

What this module is

An AWS Auto Scaling Group (ASG) keeps a fleet of EC2 instances at a desired size across one or more Availability Zones. It launches replacements when an instance fails a health check, distributes capacity across subnets, registers instances with load balancer target groups, and grows or shrinks the fleet in response to scaling policies. The ASG is the unit AWS uses to deliver elasticity and self-healing for stateful and stateless EC2 workloads alike.

The raw aws_autoscaling_group resource is deceptively simple to declare but tedious to get right in production. You also need a versioned launch template (never the deprecated launch configuration), a mixed instances policy if you want to blend On-Demand and Spot across instance types, an instance refresh block so AMI rollouts don’t require manual recycling, target tracking scaling policies so the group reacts to real load, and propagated tags so every launched instance is discoverable and billable. Hand-rolling all of that in every stack leads to drift: one team forgets instance_refresh, another health-checks on EC2 instead of ELB, a third forgets capacity_rebalance and gets surprised by Spot interruptions.

Wrapping the ASG in a reusable module fixes the contract once. The module exposes a small, var-driven surface — subnets, sizing, instance types, scaling target — and encodes the safe defaults (rolling instance refresh, ELB health checks with a grace period, Spot rebalancing, tag propagation) so every consuming team gets a production-grade fleet without copy-pasting 150 lines of HCL.

When to use it

You run stateless EC2 fleets (web tiers, API workers, batch consumers) behind an ALB/NLB and want them to self-heal and scale on CPU or request count.
You want to blend Spot and On-Demand capacity across multiple instance types to cut cost while keeping a guaranteed On-Demand baseline.
You need zero-downtime AMI rollouts via instance refresh instead of manually terminating instances.
You are standardizing many similar fleets across environments and teams and want one audited module rather than per-stack ASG blocks.

Reach for a different tool when the workload is a better fit for ECS/EKS (containers), Fargate (serverless containers), or Lambda (event-driven). This module is for when you genuinely need EC2 instances under your control.

Module structure

terraform-module-aws-auto-scaling-group/
├── versions.tf
├── main.tf
├── variables.tf
└── outputs.tf

versions.tf

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

main.tf

locals {
  name = var.name

  # Tags that must be stamped onto every launched instance, not just the ASG.
  propagated_tags = merge(
    var.tags,
    {
      Name      = local.name
      ManagedBy = "terraform"
    },
  )
}

# Versioned launch template — the modern replacement for launch configurations.
resource "aws_launch_template" "this" {
  name_prefix   = "${local.name}-"
  image_id      = var.ami_id
  instance_type = var.instance_type

  vpc_security_group_ids = var.security_group_ids

  dynamic "iam_instance_profile" {
    for_each = var.iam_instance_profile_arn == null ? [] : [1]
    content {
      arn = var.iam_instance_profile_arn
    }
  }

  user_data = var.user_data_base64

  metadata_options {
    http_endpoint               = "enabled"
    http_tokens                 = "required" # enforce IMDSv2
    http_put_response_hop_limit = 1
  }

  monitoring {
    enabled = var.detailed_monitoring
  }

  dynamic "block_device_mappings" {
    for_each = var.root_block_device == null ? [] : [var.root_block_device]
    content {
      device_name = block_device_mappings.value.device_name
      ebs {
        volume_size           = block_device_mappings.value.volume_size
        volume_type           = block_device_mappings.value.volume_type
        encrypted             = true
        delete_on_termination = true
      }
    }
  }

  tag_specifications {
    resource_type = "instance"
    tags          = local.propagated_tags
  }

  tag_specifications {
    resource_type = "volume"
    tags          = local.propagated_tags
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "this" {
  name_prefix         = "${local.name}-"
  vpc_zone_identifier = var.subnet_ids

  min_size         = var.min_size
  max_size         = var.max_size
  desired_capacity = var.desired_capacity

  health_check_type         = var.health_check_type
  health_check_grace_period = var.health_check_grace_period
  default_cooldown          = var.default_cooldown
  capacity_rebalance        = var.capacity_rebalance

  target_group_arns = var.target_group_arns

  # When no Spot blend is requested, use the launch template directly.
  dynamic "launch_template" {
    for_each = length(var.spot_instance_types) == 0 ? [1] : []
    content {
      id      = aws_launch_template.this.id
      version = aws_launch_template.this.latest_version
    }
  }

  # Mixed instances policy: guaranteed On-Demand base + Spot for the rest.
  dynamic "mixed_instances_policy" {
    for_each = length(var.spot_instance_types) == 0 ? [] : [1]
    content {
      instances_distribution {
        on_demand_base_capacity                  = var.on_demand_base_capacity
        on_demand_percentage_above_base_capacity = var.on_demand_percentage_above_base
        spot_allocation_strategy                 = "price-capacity-optimized"
      }

      launch_template {
        launch_template_specification {
          launch_template_id = aws_launch_template.this.id
          version            = aws_launch_template.this.latest_version
        }

        dynamic "override" {
          for_each = var.spot_instance_types
          content {
            instance_type = override.value
          }
        }
      }
    }
  }

  # Zero-downtime AMI / launch-template rollouts.
  dynamic "instance_refresh" {
    for_each = var.enable_instance_refresh ? [1] : []
    content {
      strategy = "Rolling"
      preferences {
        min_healthy_percentage = var.refresh_min_healthy_percentage
        instance_warmup        = var.health_check_grace_period
      }
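      # Launch template changes always start a refresh, so this entry is
      # redundant (the provider may warn about it); it is kept to state the intent.
      triggers = ["launch_template"]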
    }
  }

  dynamic "tag" {
    for_each = local.propagated_tags
    content {
      key                 = tag.key
      value               = tag.value
      propagate_at_launch = true
    }
  }

  lifecycle {
    create_before_destroy = true
    # desired_capacity drifts at runtime once scaling policies act on it.
    ignore_changes = [desired_capacity]
  }
}

# Target tracking scaling policy — the ASG keeps the metric at the target value.
resource "aws_autoscaling_policy" "target_tracking" {
  count = var.enable_target_tracking ? 1 : 0

  name                   = "${local.name}-target-tracking"
  autoscaling_group_name = aws_autoscaling_group.this.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = var.target_tracking_metric
    }
    target_value = var.target_tracking_value
  }
}

variables.tf

variable "name" {
  description = "Base name used for the ASG, launch template, and propagated Name tag."
  type        = string

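  validation {
    # The launch template name_prefix ("<name>-") is the tightest limit here:
    # the AWS provider accepts prefixes of 3 to 99 characters.
    condition     = can(regex("^[a-z0-9-]{2,98}$", var.name))
    error_message = "name must be lowercase alphanumeric/hyphens, 2 to 98 chars."
  }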
}

variable "ami_id" {
  description = "AMI ID for launched instances (e.g. an Amazon Linux 2023 AMI)."
  type        = string

  validation {
    condition     = can(regex("^ami-[0-9a-f]{8,17}$", var.ami_id))
    error_message = "ami_id must be a valid ami-xxxx identifier."
  }
}

variable "instance_type" {
  description = "Instance type set in the launch template; mixed instances policy overrides take precedence when spot_instance_types is non-empty."
  type        = string
  default     = "t3.medium"
}

variable "subnet_ids" {
  description = "Subnet IDs the ASG launches into; span >=2 AZs for resilience."
  type        = list(string)

  validation {
    condition     = length(var.subnet_ids) >= 1
    error_message = "Provide at least one subnet; two or more AZs are strongly recommended."
  }
}

variable "security_group_ids" {
  description = "Security group IDs attached to launched instances."
  type        = list(string)
}

variable "min_size" {
  description = "Minimum number of instances in the group."
  type        = number
  default     = 2
}

variable "max_size" {
  description = "Maximum number of instances in the group."
  type        = number
  default     = 6
}

variable "desired_capacity" {
  description = "Initial desired capacity. Drift is ignored after creation (scaling owns it)."
  type        = number
  default     = 2
}

variable "health_check_type" {
  description = "Health check source: EC2 or ELB. Use ELB when behind a target group."
  type        = string
  default     = "ELB"

  validation {
    condition     = contains(["EC2", "ELB"], var.health_check_type)
    error_message = "health_check_type must be EC2 or ELB."
  }
}

variable "health_check_grace_period" {
  description = "Seconds to wait after launch before health checks count (and refresh warmup)."
  type        = number
  default     = 300
}

variable "default_cooldown" {
  description = "Seconds between scaling activities; applies to simple scaling policies only."
  type        = number
  default     = 300
}

variable "capacity_rebalance" {
  description = "Proactively replace Spot instances flagged for interruption."
  type        = bool
  default     = true
}

variable "target_group_arns" {
  description = "ALB/NLB target group ARNs to register instances with."
  type        = list(string)
  default     = []
}

variable "iam_instance_profile_arn" {
  description = "ARN of an IAM instance profile to attach to instances. Null to omit."
  type        = string
  default     = null
}

variable "user_data_base64" {
  description = "Base64-encoded user data / cloud-init for instance bootstrap. Null to omit."
  type        = string
  default     = null
}

variable "detailed_monitoring" {
  description = "Enable 1-minute CloudWatch detailed monitoring on instances."
  type        = bool
  default     = true
}

variable "root_block_device" {
  description = "Optional root EBS volume config; always encrypted when set."
  type = object({
    device_name = string
    volume_size = number
    volume_type = string
  })
  default = null
}

variable "spot_instance_types" {
  description = "Instance types for the mixed instances policy. Empty list = no Spot blend."
  type        = list(string)
  default     = []
}

variable "on_demand_base_capacity" {
  description = "Guaranteed On-Demand instances before Spot is used (mixed policy only)."
  type        = number
  default     = 1
}

variable "on_demand_percentage_above_base" {
  description = "Percent of capacity above the base that is On-Demand (0-100)."
  type        = number
  default     = 0

  validation {
    condition     = var.on_demand_percentage_above_base >= 0 && var.on_demand_percentage_above_base <= 100
    error_message = "on_demand_percentage_above_base must be between 0 and 100."
  }
}

variable "enable_instance_refresh" {
  description = "Enable rolling instance refresh triggered by launch template changes."
  type        = bool
  default     = true
}

variable "refresh_min_healthy_percentage" {
  description = "Minimum healthy percentage to keep in service during instance refresh."
  type        = number
  default     = 90
}

variable "enable_target_tracking" {
  description = "Create a target tracking scaling policy."
  type        = bool
  default     = true
}

variable "target_tracking_metric" {
  description = "Predefined metric for target tracking (e.g. ASGAverageCPUUtilization)."
  type        = string
  default     = "ASGAverageCPUUtilization"

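  validation {
    # ALBRequestCountPerTarget is excluded: it also needs a resource_label,
    # which the policy in main.tf does not set.
    condition = contains([
      "ASGAverageCPUUtilization",
      "ASGAverageNetworkIn",
      "ASGAverageNetworkOut",
    ], var.target_tracking_metric)
    error_message = "target_tracking_metric must be ASGAverageCPUUtilization, ASGAverageNetworkIn, or ASGAverageNetworkOut."
  }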
}

variable "target_tracking_value" {
  description = "Target value the scaling policy holds the metric at (e.g. 50.0 for 50% CPU)."
  type        = number
  default     = 50
}

variable "tags" {
  description = "Tags applied to the ASG and propagated to all launched instances/volumes."
  type        = map(string)
  default     = {}
}
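
The sizing variables above are validated individually but not against each other, so min_size = 5 with max_size = 3 would only fail at apply time. A minimal sketch of a cross-check, assuming it is added to main.tf; terraform_data.sizing_guard is a hypothetical resource and not part of the module as published:

```hcl
# Hypothetical guard resource: creates nothing in AWS, it only hosts the precondition.
resource "terraform_data" "sizing_guard" {
  lifecycle {
    precondition {
      # Fail the plan early when the three sizing inputs are inconsistent.
      condition     = var.min_size <= var.desired_capacity && var.desired_capacity <= var.max_size
      error_message = "Sizing must satisfy min_size <= desired_capacity <= max_size."
    }
  }
}
```

terraform_data needs Terraform 1.4 or newer, which the module's required_version (>= 1.5.0) already covers.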

outputs.tf

output "autoscaling_group_id" {
  description = "ID of the Auto Scaling Group."
  value       = aws_autoscaling_group.this.id
}

output "autoscaling_group_name" {
  description = "Generated name of the Auto Scaling Group."
  value       = aws_autoscaling_group.this.name
}

output "autoscaling_group_arn" {
  description = "ARN of the Auto Scaling Group."
  value       = aws_autoscaling_group.this.arn
}

output "launch_template_id" {
  description = "ID of the launch template backing the group."
  value       = aws_launch_template.this.id
}

output "launch_template_latest_version" {
  description = "Latest version number of the launch template."
  value       = aws_launch_template.this.latest_version
}

output "target_tracking_policy_arn" {
  description = "ARN of the target tracking scaling policy, if enabled."
  value       = try(aws_autoscaling_policy.target_tracking[0].arn, null)
}

How to use it

module "auto_scaling_group" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-auto-scaling-group?ref=v1.0.0"

  name               = "web-api-prod"
  ami_id             = data.aws_ami.al2023.id
  instance_type      = "m6i.large"
  subnet_ids         = module.vpc.private_subnet_ids
  security_group_ids = [aws_security_group.web.id]

  min_size         = 3
  max_size         = 20
  desired_capacity = 3

  # Register with the ALB and health-check through it.
  health_check_type = "ELB"
  target_group_arns = [aws_lb_target_group.web.arn]

  # Blend Spot across types with a guaranteed On-Demand base of 2.
  spot_instance_types             = ["m6i.large", "m5.large", "m6a.large"]
  on_demand_base_capacity         = 2
  on_demand_percentage_above_base = 0

  iam_instance_profile_arn = aws_iam_instance_profile.web.arn
  user_data_base64         = base64encode(templatefile("${path.module}/bootstrap.sh.tftpl", {}))

  # Hold average CPU at 55% and roll new AMIs without downtime.
  enable_target_tracking = true
  target_tracking_metric = "ASGAverageCPUUtilization"
  target_tracking_value  = 55

  tags = {
    Environment = "production"
    Team        = "platform"
    CostCenter  = "cc-1042"
  }
}
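
The ami_id above reads from data.aws_ami.al2023, which the example does not define. One possible definition is sketched below, assuming the newest Amazon-owned Amazon Linux 2023 x86_64 image is acceptable; the name filter is an assumption about the AMI naming pattern and should be checked against your region:

```hcl
# Assumed lookup: most recent Amazon-owned AL2023 image for x86_64 instances.
data "aws_ami" "al2023" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-2023.*-x86_64"] # assumed naming pattern
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}
```

Because most_recent = true resolves to a new ID whenever a newer image is published, the next apply changes the launch template and starts an instance refresh. Pin a literal AMI ID if rollouts should only happen deliberately.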

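# Downstream reference: alarm on the group name produced by the module.
# Note: GroupInServiceInstances is only published when group metrics collection
# is enabled on the ASG (enabled_metrics), which the module above does not set.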
resource "aws_cloudwatch_metric_alarm" "asg_low_healthy_hosts" {
  alarm_name          = "${module.auto_scaling_group.autoscaling_group_name}-low-healthy"
  namespace           = "AWS/AutoScaling"
  metric_name         = "GroupInServiceInstances"
  statistic           = "Minimum"
  comparison_operator = "LessThanThreshold"
  threshold           = 2
  period              = 60
  evaluation_periods  = 3

  dimensions = {
    AutoScalingGroupName = module.auto_scaling_group.autoscaling_group_name
  }

  alarm_actions = [aws_sns_topic.ops_alerts.arn]
}
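
Scaling on request count needs one more piece: the ALBRequestCountPerTarget metric requires a resource_label, and the module's own policy block does not set one. A caller-side sketch that attaches a separate policy through the autoscaling_group_name output follows; aws_lb.web, the policy name, and the target value are assumptions for illustration:

```hcl
# Caller-side policy; aws_lb.web is a hypothetical load balancer in the calling stack.
resource "aws_autoscaling_policy" "alb_requests" {
  name                   = "web-api-prod-alb-requests" # hypothetical name
  autoscaling_group_name = module.auto_scaling_group.autoscaling_group_name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ALBRequestCountPerTarget"
      # Label format: <load balancer arn_suffix>/<target group arn_suffix>
      resource_label = "${aws_lb.web.arn_suffix}/${aws_lb_target_group.web.arn_suffix}"
    }
    target_value = 1000 # illustrative requests per target, not a recommendation
  }
}
```

If the module's CPU policy stays enabled alongside this one, the group scales out when either policy asks for more capacity. Set enable_target_tracking = false to scale on request count alone.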

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config — live/terragrunt.hcl (inherited by every module):

remote_state {
  backend = "s3"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...S3 state bucket + key per path...
  }
}

2. Module config — live/prod/auto_scaling_group/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-auto-scaling-group?ref=v1.0.0"
}

inputs = {
  name               = "..."
  ami_id             = "..."
  subnet_ids         = ["...", "..."]
  security_group_ids = ["...", "..."]
}

3. Deploy this module on its own, or roll out every module in the environment together:

cd live/prod/auto_scaling_group && terragrunt apply   # this module only
cd live/prod && terragrunt run-all apply              # every module under live/prod

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
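
The root config shown in step 1 only generates the backend. A sketch of the matching provider block, assuming it is added to live/terragrunt.hcl and that every environment uses the Quickstart's region:

```hcl
# Addition to live/terragrunt.hcl; the region value is an assumption.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region = "us-east-1"
}
EOF
}
```

With this in place, each child terragrunt.hcl inherits both backend.tf and provider.tf through its include block.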

Inputs

Name	Type	Default	Required	Description
name	string	—	yes	Base name for the ASG, launch template, and propagated Name tag.
ami_id	string	—	yes	AMI ID for launched instances.
instance_type	string	`"t3.medium"`	no	Launch template instance type; superseded by spot_instance_types overrides when set.
subnet_ids	list(string)	—	yes	Subnets the ASG launches into; span >=2 AZs.
security_group_ids	list(string)	—	yes	Security groups attached to instances.
min_size	number	`2`	no	Minimum instances in the group.
max_size	number	`6`	no	Maximum instances in the group.
desired_capacity	number	`2`	no	Initial desired capacity (drift ignored after creation).
health_check_type	string	`"ELB"`	no	Health check source: `EC2` or `ELB`.
health_check_grace_period	number	`300`	no	Grace seconds after launch (also refresh warmup).
default_cooldown	number	`300`	no	Seconds between scaling activities (simple scaling policies only).
capacity_rebalance	bool	`true`	no	Proactively replace Spot instances flagged for interruption.
target_group_arns	list(string)	`[]`	no	ALB/NLB target group ARNs to register with.
iam_instance_profile_arn	string	`null`	no	IAM instance profile ARN; null to omit.
user_data_base64	string	`null`	no	Base64-encoded bootstrap user data.
detailed_monitoring	bool	`true`	no	Enable 1-minute CloudWatch detailed monitoring.
root_block_device	object	`null`	no	Optional encrypted root EBS volume config.
spot_instance_types	list(string)	`[]`	no	Instance types for the mixed instances policy; empty = no Spot.
on_demand_base_capacity	number	`1`	no	Guaranteed On-Demand instances before Spot (mixed policy).
on_demand_percentage_above_base	number	`0`	no	Percent On-Demand above the base (0-100).
enable_instance_refresh	bool	`true`	no	Enable rolling instance refresh on launch template change.
refresh_min_healthy_percentage	number	`90`	no	Min healthy percentage kept in service during refresh.
enable_target_tracking	bool	`true`	no	Create a target tracking scaling policy.
target_tracking_metric	string	`"ASGAverageCPUUtilization"`	no	Predefined metric for target tracking.
target_tracking_value	number	`50`	no	Target metric value the policy holds.
tags	map(string)	`{}`	no	Tags on the ASG, propagated to instances and volumes.

Outputs

Name	Description
autoscaling_group_id	ID of the Auto Scaling Group.
autoscaling_group_name	Generated name of the Auto Scaling Group.
autoscaling_group_arn	ARN of the Auto Scaling Group.
launch_template_id	ID of the launch template backing the group.
launch_template_latest_version	Latest version number of the launch template.
target_tracking_policy_arn	ARN of the target tracking scaling policy, if enabled.

Enterprise scenario

A retail platform runs its product-catalog API as a stateless EC2 tier behind an ALB across three Availability Zones. Using this module, the platform team sets on_demand_base_capacity = 4 to guarantee a baseline that survives a full Spot drain, then blends three Spot instance types (m6i, m5, m6a) for the elastic headroom, cutting steady-state compute cost by roughly 60% during normal traffic. Target tracking holds CPU at 55%, so the fleet scales from 4 to 30 instances automatically during flash sales. capacity_rebalance replaces Spot instances at elevated interruption risk before they are reclaimed, and rolling instance refresh lets the team ship a patched AMI mid-quarter with zero customer-facing downtime.
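
The scenario maps onto the module inputs roughly as follows. This is a sketch: the module label, resource references, variable names, and instance sizes are hypothetical, and only the numeric values come from the description above.

```hcl
module "catalog_api" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-auto-scaling-group?ref=v1.0.0"

  name               = "catalog-api-prod"                 # hypothetical
  ami_id             = var.catalog_ami_id                 # hypothetical variable
  subnet_ids         = var.private_subnet_ids             # hypothetical; one subnet per AZ
  security_group_ids = [aws_security_group.catalog.id]    # hypothetical
  target_group_arns  = [aws_lb_target_group.catalog.arn]  # hypothetical

  # Fleet floor and flash-sale ceiling from the scenario.
  min_size         = 4
  max_size         = 30
  desired_capacity = 4

  # Four On-Demand instances as the base, Spot for everything above it.
  spot_instance_types             = ["m6i.large", "m5.large", "m6a.large"] # sizes assumed
  on_demand_base_capacity         = 4
  on_demand_percentage_above_base = 0

  target_tracking_metric = "ASGAverageCPUUtilization"
  target_tracking_value  = 55
}
```

capacity_rebalance and enable_instance_refresh are left at their defaults (both true), so they need no explicit setting here.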

Best practices

Always health-check through the load balancer (ELB) for web tiers and set health_check_grace_period longer than your boot + warm-up time, or the ASG will terminate instances that are still bootstrapping and thrash.
Keep a non-zero On-Demand base when using Spot. on_demand_base_capacity guarantees a floor that survives a Spot capacity event; pair it with capacity_rebalance = true so the ASG replaces interruption-flagged Spot instances before they’re reclaimed.
Let scaling own desired_capacity. The module’s ignore_changes = [desired_capacity] prevents every terraform apply from snapping the fleet back to its initial size and undoing autoscaling — never remove it.
Roll AMIs via instance refresh, not manual termination. triggers = ["launch_template"] with min_healthy_percentage = 90 gives gradual, safe replacement; lower the percentage only for large fleets that can absorb more churn.
Enforce IMDSv2 and encrypt root volumes so an SSRF flaw can’t be used to harvest instance credentials and data at rest stays compliant. The launch template here always requires IMDSv2; the root volume is encrypted whenever root_block_device is set.
Propagate consistent tags at launch (Environment, Team, CostCenter) so every instance is attributable for cost allocation and discoverable by ops tooling — untagged ASG instances are a recurring chargeback and incident-response gap.
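
Several of these practices come down to a handful of inputs. A sketch of the relevant settings, with placeholders for the required inputs as in the Quickstart; the grace period, device name, and volume values are assumptions to adapt to your AMI and boot time:

```hcl
module "auto_scaling_group" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-auto-scaling-group?ref=v1.0.0"

  name               = "..."
  ami_id             = "..."
  subnet_ids         = ["...", "..."]
  security_group_ids = ["...", "..."]

  # Grace period longer than boot plus warm-up (assumed here: about 5 minutes plus margin).
  health_check_type         = "ELB"
  health_check_grace_period = 420

  # Setting this block is what turns on root volume encryption in the launch template.
  root_block_device = {
    device_name = "/dev/xvda" # assumed root device name; check your AMI
    volume_size = 30
    volume_type = "gp3"
  }

  # Cost-allocation tags propagated to every instance and volume.
  tags = {
    Environment = "..."
    Team        = "..."
    CostCenter  = "..."
  }
}
```

Leaving root_block_device at its null default means the launch template adds no block device mapping, so the volume follows the AMI's own settings.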