Terraform Module: AWS Cloud Map — service discovery your apps can resolve by name

Quick take — A reusable Terraform module for AWS Cloud Map: provisions a private DNS namespace plus an aws_service_discovery_service with SRV/A records, configurable routing policy, and Route 53 health-check wiring for ECS and EC2 workloads. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

provider "aws" {
  region = "us-east-1"
}

module "cloud_map" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-cloud-map?ref=v1.0.0"

  namespace_name = "..."  # Private DNS namespace name; suffix of every discoverabl…
  vpc_id         = "..."  # VPC the private hosted zone is associated with; names r…
}

Then run terraform init && terraform apply. Every other input has a sensible default; see Inputs below to override behaviour.

What this module is

AWS Cloud Map is a service-discovery registry: instead of hardcoding IPs or load-balancer DNS names, your applications register the location of each backend (a task IP, an EC2 instance, an arbitrary endpoint) under a friendly name like payments.internal, and consumers resolve that name to a healthy instance at call time. A namespace is the discovery domain — a private DNS namespace (aws_service_discovery_private_dns_namespace) creates a Route 53 private hosted zone scoped to a VPC, so the names only resolve inside that network. A service (aws_service_discovery_service) is the named entry inside the namespace; it defines what DNS records get published for each registered instance (A for plain IPs, SRV for IP plus port), the record TTL, the multivalue-vs-weighted routing policy, and how instance health is judged.

On their own these resources are a handful of nested blocks, but the details are exactly where teams trip: getting dns_records right for the SRV case, deciding between health_check_config (Route 53 public health checks) and health_check_custom_config (the only option ECS service discovery supports), and remembering that the routing policy is fixed at create time. This module wraps the namespace and one or more discovery services into a single opinionated unit. You pass a namespace name, a VPC ID, and a typed map of services; the module returns the namespace ID and a map of service ARNs you wire straight into an ECS service_registries block or pass to your own register-instance logic — so every team gets consistent, custom-health-checked, correctly-typed service discovery instead of copy-pasting nested blocks they only half understand.

When to use it

If you only need one service behind a stable load-balancer DNS name and nothing else discovers it, an ALB alias record is simpler. Cloud Map earns its place the moment you have many services discovering each other, or you need SRV/port-aware records that DNS aliases cannot express.
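
For comparison, the "simpler" alternative mentioned above might look like the sketch below: a single alias record pointing at an internal ALB. The hosted zone aws_route53_zone.internal, the load balancer aws_lb.payments, and the record name are hypothetical and not part of this module.

```hcl
# Assumed to exist elsewhere in the stack (hypothetical names):
#   aws_route53_zone.internal  - a private hosted zone
#   aws_lb.payments            - an internal Application Load Balancer
resource "aws_route53_record" "payments" {
  zone_id = aws_route53_zone.internal.zone_id
  name    = "payments.internal.example" # illustrative name only
  type    = "A"

  # Alias records resolve to the ALB's own DNS name; no port information
  # can be carried, which is where SRV records in Cloud Map differ.
  alias {
    name                   = aws_lb.payments.dns_name
    zone_id                = aws_lb.payments.zone_id
    evaluate_target_health = true
  }
}
```

One record per service is manageable for a few services. Once many services need to find each other, the per-service load balancers are what Cloud Map lets you drop.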

Module structure

terraform-module-aws-cloud-map/
├── versions.tf      # provider + Terraform version pins
├── main.tf          # private DNS namespace + aws_service_discovery_service (for_each)
├── variables.tf     # namespace + services map inputs, with validation
└── outputs.tf       # namespace id/arn, hosted zone id, service arns/ids

versions.tf

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

main.tf

locals {
  tags = merge(
    {
      "Namespace" = var.namespace_name
      "ManagedBy" = "terraform"
      "Module"    = "terraform-module-aws-cloud-map"
    },
    var.tags
  )
}

# ---------------------------------------------------------------------------
# Private DNS namespace — backed by a Route 53 private hosted zone in the VPC.
# Names registered under it resolve only inside this VPC.
# ---------------------------------------------------------------------------
resource "aws_service_discovery_private_dns_namespace" "this" {
  name        = var.namespace_name
  description = var.namespace_description
  vpc         = var.vpc_id

  tags = local.tags
}

# ---------------------------------------------------------------------------
# Discovery services — one named entry per backend (orders, inventory, ...).
# Each publishes DNS records for every instance registered against it.
# ---------------------------------------------------------------------------
resource "aws_service_discovery_service" "this" {
  for_each = var.services

  name        = each.key
  description = try(each.value.description, null)

  dns_config {
    namespace_id = aws_service_discovery_private_dns_namespace.this.id

    # optional() attributes are null (not absent) when omitted, so try() never
    # reaches its fallback; coalesce() is what applies the module defaults.
    routing_policy = coalesce(each.value.routing_policy, var.default_routing_policy)

    dns_records {
      type = each.value.record_type
      ttl  = coalesce(each.value.ttl, var.default_ttl)
    }

    # SRV records additionally need a companion A/AAAA so the target resolves.
    dynamic "dns_records" {
      for_each = each.value.record_type == "SRV" ? [coalesce(each.value.srv_companion_type, "A")] : []
      content {
        type = dns_records.value
        ttl  = coalesce(each.value.ttl, var.default_ttl)
      }
    }
  }

  # ECS service discovery requires custom health checks: ECS reports task
  # health to Cloud Map, so leave use_route53_health_check = false for ECS.
  dynamic "health_check_custom_config" {
    for_each = coalesce(each.value.use_route53_health_check, false) ? [] : [1]
    content {
      failure_threshold = coalesce(each.value.failure_threshold, var.default_failure_threshold)
    }
  }

  # Route 53 health checks apply to instances registered with a public IP /
  # routable address (e.g. EC2), not to ECS-managed task registration.
  # Caveat: Cloud Map accepts health_check_config only for services in public
  # DNS namespaces, so against the private namespace created above expect
  # use_route53_health_check = true to be rejected at apply time.
  dynamic "health_check_config" {
    for_each = coalesce(each.value.use_route53_health_check, false) ? [1] : []
    content {
      type              = coalesce(each.value.health_check_protocol, "TCP")
      resource_path     = each.value.health_check_path
      failure_threshold = coalesce(each.value.failure_threshold, var.default_failure_threshold)
    }
  }

  force_destroy = var.force_destroy

  tags = local.tags
}

variables.tf

variable "namespace_name" {
  description = "Private DNS namespace (e.g. internal.kloudvin). Becomes the suffix of every discoverable name, like orders.internal.kloudvin."
  type        = string

  validation {
    condition     = can(regex("^([a-z0-9]([a-z0-9-]*[a-z0-9])?\\.)*[a-z0-9]([a-z0-9-]*[a-z0-9])?$", var.namespace_name))
    error_message = "namespace_name must be a valid lowercase DNS name (labels of letters/digits/hyphens, dot-separated)."
  }
}

variable "namespace_description" {
  description = "Description stored on the namespace."
  type        = string
  default     = "Service discovery namespace managed by Terraform"
}

variable "vpc_id" {
  description = "VPC the private DNS namespace (Route 53 private hosted zone) is associated with. Names resolve only inside this VPC."
  type        = string

  validation {
    condition     = can(regex("^vpc-[0-9a-f]{8,}$", var.vpc_id))
    error_message = "vpc_id must be a valid VPC ID, e.g. vpc-0a1b2c3d4e5f6a7b8."
  }
}

variable "default_routing_policy" {
  description = "Routing policy used for services that do not override it. MULTIVALUE returns up to 8 healthy instances per query; WEIGHTED returns a single instance, picked at random because Cloud Map currently gives every record the same weight."
  type        = string
  default     = "MULTIVALUE"

  validation {
    condition     = contains(["MULTIVALUE", "WEIGHTED"], var.default_routing_policy)
    error_message = "default_routing_policy must be MULTIVALUE or WEIGHTED."
  }
}

variable "default_ttl" {
  description = "DNS record TTL (seconds) for services that do not set their own. Keep low so unhealthy instances drop out quickly."
  type        = number
  default     = 15

  validation {
    condition     = var.default_ttl >= 0 && var.default_ttl <= 86400
    error_message = "default_ttl must be between 0 and 86400 seconds."
  }
}

variable "default_failure_threshold" {
  description = "Consecutive health-check results required to change an instance's status (1-10)."
  type        = number
  default     = 1

  validation {
    condition     = var.default_failure_threshold >= 1 && var.default_failure_threshold <= 10
    error_message = "default_failure_threshold must be between 1 and 10."
  }
}

variable "services" {
  description = <<-EOT
    Map of discovery services keyed by the discoverable name (the DNS label
    under the namespace). For each service:
      record_type             - "A" or "AAAA" (IP only) or "SRV" (IP + port; ECS bridge/multi-port).
      ttl                     - optional per-service TTL override.
      routing_policy          - optional "MULTIVALUE" or "WEIGHTED" override.
      srv_companion_type      - optional companion record for SRV ("A" or "AAAA"), default "A".
      use_route53_health_check- optional bool; false (default) = custom health (required for ECS).
      health_check_protocol   - "HTTP" | "HTTPS" | "TCP" when use_route53_health_check = true.
      health_check_path       - resource path for HTTP/HTTPS Route 53 health checks.
      failure_threshold       - optional per-service failure threshold (1-10).
      description             - optional service description.
    Example:
    {
      orders    = { record_type = "A" }
      inventory = { record_type = "SRV", ttl = 10 }
      legacy    = { record_type = "A", use_route53_health_check = true, health_check_protocol = "HTTP", health_check_path = "/healthz" }
    }
  EOT

  type = map(object({
    record_type              = string
    ttl                      = optional(number)
    routing_policy           = optional(string)
    srv_companion_type       = optional(string)
    use_route53_health_check = optional(bool)
    health_check_protocol    = optional(string)
    health_check_path        = optional(string)
    failure_threshold        = optional(number)
    description              = optional(string)
  }))

  default = {}

  validation {
    condition = alltrue([
      for s in values(var.services) : contains(["A", "AAAA", "SRV"], s.record_type)
    ])
    error_message = "Each service record_type must be one of A, AAAA, SRV."
  }

  validation {
    condition = alltrue([
      for s in values(var.services) :
      coalesce(s.routing_policy, "MULTIVALUE") != "WEIGHTED" || s.record_type != "SRV"
    ])
    error_message = "WEIGHTED routing is not supported with SRV records; use MULTIVALUE for SRV services."
  }

  # Omitted optional() attributes are null, so coalesce() supplies the fallback
  # (try() would pass the null through and break the boolean expression).
  validation {
    condition = alltrue([
      for s in values(var.services) :
      !coalesce(s.use_route53_health_check, false) || contains(["HTTP", "HTTPS", "TCP"], coalesce(s.health_check_protocol, "TCP"))
    ])
    error_message = "When use_route53_health_check = true, health_check_protocol must be HTTP, HTTPS, or TCP."
  }
}

variable "force_destroy" {
  description = "Allow destroying a service that still has registered instances. Keep false in production so a destroy cannot orphan live traffic silently."
  type        = bool
  default     = false
}

variable "tags" {
  description = "Additional tags merged onto the namespace and all services."
  type        = map(string)
  default     = {}
}

outputs.tf

output "namespace_id" {
  description = "Cloud Map private DNS namespace ID. Pass to register-instance APIs and App Mesh virtual nodes."
  value       = aws_service_discovery_private_dns_namespace.this.id
}

output "namespace_arn" {
  description = "ARN of the private DNS namespace, for IAM policies scoping servicediscovery actions."
  value       = aws_service_discovery_private_dns_namespace.this.arn
}

output "namespace_name" {
  description = "The namespace DNS name (suffix of every discoverable service name)."
  value       = aws_service_discovery_private_dns_namespace.this.name
}

output "namespace_hosted_zone_id" {
  description = "Route 53 private hosted zone ID backing the namespace, for adding extra records or zone associations."
  value       = aws_service_discovery_private_dns_namespace.this.hosted_zone
}

output "service_arns" {
  description = "Map of service name to its Cloud Map service ARN — wire into an ECS service_registries block."
  value       = { for k, s in aws_service_discovery_service.this : k => s.arn }
}

output "service_ids" {
  description = "Map of service name to its Cloud Map service ID, used with the register-instance API."
  value       = { for k, s in aws_service_discovery_service.this : k => s.id }
}

output "service_fqdns" {
  description = "Map of service name to the fully-qualified discoverable name (e.g. orders.internal.kloudvin)."
  value = {
    for k, s in aws_service_discovery_service.this :
    k => "${s.name}.${aws_service_discovery_private_dns_namespace.this.name}"
  }
}
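
The service_ids output is described as feeding the register-instance API. A minimal sketch of doing that from Terraform itself uses the provider's aws_service_discovery_instance resource. The service key "batch", the instance ID, and the IP address are hypothetical; the sketch assumes "batch" exists in the services map with record_type = "A" and is not also managed by ECS.

```hcl
# Registers a fixed endpoint (for example an EC2 host) into a Cloud Map service.
# Assumption: module.cloud_map.services contains a "batch" entry of type "A".
resource "aws_service_discovery_instance" "batch_host" {
  instance_id = "batch-host-1" # hypothetical, must be unique within the service
  service_id  = module.cloud_map.service_ids["batch"]

  attributes = {
    AWS_INSTANCE_IPV4 = "10.0.12.34" # assumed private IP of the host
    # For an SRV service, AWS_INSTANCE_PORT would be needed as well.
  }
}
```

With custom health checks, instances registered this way are not health-reported by ECS, so whatever registers them also has to report their status.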

How to use it

# A private discovery namespace for internal microservices, with an A-record
# service for the orders API and an SRV service for a bridge-mode worker.
module "cloud_map" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-cloud-map?ref=v1.0.0"

  namespace_name        = "internal.kloudvin"
  namespace_description = "Internal service discovery for the prod VPC"
  vpc_id                = module.network.vpc_id

  services = {
    orders = {
      record_type = "A"
      ttl         = 10
    }

    inventory = {
      record_type = "SRV"
      ttl         = 10
    }
  }

  tags = {
    Environment = "prod"
    CostCenter  = "platform"
  }
}

# Downstream: register an ECS service into the "orders" discovery service so
# tasks are published as orders.internal.kloudvin automatically.
resource "aws_ecs_service" "orders" {
  name            = "orders"
  cluster         = module.ecs_cluster_service.cluster_id
  task_definition = aws_ecs_task_definition.orders.arn
  desired_count   = 3
  launch_type     = "FARGATE"

  network_configuration {
    subnets         = module.network.private_subnet_ids
    security_groups = [aws_security_group.orders_tasks.id]
  }

  service_registries {
    registry_arn = module.cloud_map.service_arns["orders"]
  }
}

# Surface the resolvable name to other stacks / app config.
output "orders_discovery_fqdn" {
  value = module.cloud_map.service_fqdns["orders"]
}
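
The example above wires only the A-record service. For the SRV-backed inventory service, ECS also needs to know which container and port to publish. A hedged sketch for a bridge-mode service follows. The task definition, container name, and port are assumptions, not values from the module.

```hcl
# Bridge-mode tasks share the host ENI, so the SRV record carries the port.
resource "aws_ecs_service" "inventory" {
  name            = "inventory"
  cluster         = module.ecs_cluster_service.cluster_id
  task_definition = aws_ecs_task_definition.inventory.arn # hypothetical
  desired_count   = 2
  launch_type     = "EC2" # bridge networking is not available on Fargate

  service_registries {
    registry_arn   = module.cloud_map.service_arns["inventory"]
    container_name = "inventory" # assumed container name in the task definition
    container_port = 8080        # assumed container port
  }
}
```

No network_configuration block appears here because it applies only to awsvpc tasks. With awsvpc and an SRV record, ECS accepts either the container_name/container_port pair or a plain port value.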

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config (live/terragrunt.hcl, inherited by every module):

remote_state {
  backend = "s3"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...s3 state bucket/container + key per path...
  }
}

2. Module config (live/prod/cloud_map/terragrunt.hcl):

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-cloud-map?ref=v1.0.0"
}

inputs = {
  namespace_name = "..."
  vpc_id = "..."
}

3. Deploy one environment, or roll out all modules together:

cd live/prod/cloud_map && terragrunt apply   # this module only
cd live/prod && terragrunt run-all apply     # every module under live/prod

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
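
To make the dependency ordering concrete, here is a sketch of how the module config could take vpc_id from a sibling network unit instead of a hardcoded value. The ../network path and its vpc_id output are assumptions about the repository layout.

```hcl
# live/prod/cloud_map/terragrunt.hcl (variant with a dependency)
include "root" {
  path = find_in_parent_folders()
}

# Hypothetical sibling unit that creates the VPC and exposes a vpc_id output.
dependency "network" {
  config_path = "../network"

  # Placeholder used only before the network unit has been applied;
  # it satisfies the module's vpc_id validation pattern.
  mock_outputs = {
    vpc_id = "vpc-00000000000000000"
  }
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-cloud-map?ref=v1.0.0"
}

inputs = {
  namespace_name = "internal.kloudvin"
  vpc_id         = dependency.network.outputs.vpc_id
}
```

With the dependency declared, run-all applies the network unit before this one, which is the ordering the paragraph above refers to.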

Inputs

| Name | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| namespace_name | string | - | Yes | Private DNS namespace name; suffix of every discoverable name (validated DNS name). |
| namespace_description | string | "Service discovery namespace managed by Terraform" | No | Description stored on the namespace. |
| vpc_id | string | - | Yes | VPC the private hosted zone is associated with; names resolve only inside it (validated). |
| default_routing_policy | string | "MULTIVALUE" | No | Routing policy for services that do not override it: MULTIVALUE or WEIGHTED. |
| default_ttl | number | 15 | No | Default DNS record TTL in seconds (0–86400). |
| default_failure_threshold | number | 1 | No | Default consecutive results to flip instance status (1–10). |
| services | map(object) | {} | No | Discovery services keyed by discoverable name; sets record_type plus optional TTL, routing, SRV companion, and health-check fields. See the variable doc for the object shape. |
| force_destroy | bool | false | No | Allow destroying a service that still has registered instances. Keep false in production. |
| tags | map(string) | {} | No | Tags merged onto the namespace and all services. |

Outputs

| Name | Description |
| --- | --- |
| namespace_id | Cloud Map namespace ID, for register-instance APIs and App Mesh virtual nodes. |
| namespace_arn | ARN of the namespace, for IAM policies scoping servicediscovery actions. |
| namespace_name | The namespace DNS name (suffix of every discoverable service name). |
| namespace_hosted_zone_id | Route 53 private hosted zone ID backing the namespace. |
| service_arns | Map of service name to Cloud Map service ARN, for wiring into ECS service_registries. |
| service_ids | Map of service name to Cloud Map service ID, for the register-instance API. |
| service_fqdns | Map of service name to its fully-qualified discoverable name. |

Enterprise scenario

A logistics platform runs roughly 30 Fargate microservices in a shared prod VPC and replaced a sprawl of internal ALBs (one per service, each ~$16/month plus LCU charges) with a single Cloud Map namespace from this module. Each service team adds one entry to the services map and a service_registries block, so orders reaches inventory.internal.kloudvin directly over the task ENI with a 10-second TTL. East-west traffic skips the load balancers entirely, cutting both the per-ALB cost and an extra network hop on every internal call. Because ECS reports task health into Cloud Map via the module’s health_check_custom_config, a failing task is pulled from DNS within seconds, and App Mesh sidecars consume the same namespace_id to build their service graph without any duplicated discovery config.
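
In a setup like this, the service teams' ECS stacks often live in separate state from the shared namespace. One possible way for such a stack to find its registry ARN, sketched here with the AWS provider's Cloud Map data sources, is to look it up by name. The names internal.kloudvin and inventory are taken from the examples above; the split into separate stacks is an assumption.

```hcl
# Look up the shared namespace created by the platform stack.
data "aws_service_discovery_dns_namespace" "internal" {
  name = "internal.kloudvin"
  type = "DNS_PRIVATE"
}

# Look up this team's discovery service inside that namespace.
data "aws_service_discovery_service" "inventory" {
  name         = "inventory"
  namespace_id = data.aws_service_discovery_dns_namespace.internal.id
}

locals {
  # Value to pass as registry_arn in the team's own aws_ecs_service.
  inventory_registry_arn = data.aws_service_discovery_service.inventory.arn
}
```

Reading the module's service_arns output through remote state would work as well. The data-source route avoids giving every team read access to the platform state.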

Best practices
