Terraform Module: AWS Cloud Map — service discovery your apps can resolve by name

Quick take — A reusable Terraform module for AWS Cloud Map: provisions a private DNS namespace plus an aws_service_discovery_service with SRV/A records, configurable routing policy, and Route 53 health-check wiring for ECS and EC2 workloads. New here? Jump to the Quickstart below to deploy it in minutes; read on for how it works and when to reach for it.

Quickstart (copy-paste)

Minimal, runnable configuration — drop this in a .tf file and fill in the "..." placeholders (each required input is commented):

provider "aws" {
  region = "us-east-1"
}

module "cloud_map" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-cloud-map?ref=v1.0.0"

  namespace_name = "..."  # Private DNS namespace name; suffix of every discoverable name
  vpc_id         = "..."  # VPC the private hosted zone is associated with; names resolve only inside it
}

Then terraform init && terraform apply. Every other input has a sensible default — see Inputs below to override behaviour.

What this module is

AWS Cloud Map is a service-discovery registry: instead of hardcoding IPs or load-balancer DNS names, your applications register the location of each backend (a task IP, an EC2 instance, an arbitrary endpoint) under a friendly name like payments.internal, and consumers resolve that name to a healthy instance at call time. A namespace is the discovery domain — a private DNS namespace (aws_service_discovery_private_dns_namespace) creates a Route 53 private hosted zone scoped to a VPC, so the names only resolve inside that network. A service (aws_service_discovery_service) is the named entry inside the namespace; it defines what DNS records get published for each registered instance (A for plain IPs, SRV for IP plus port), the record TTL, the multivalue-vs-weighted routing policy, and how instance health is judged.

On their own these resources are a handful of nested blocks, but the details are exactly where teams trip: getting dns_records right for the SRV case, deciding between health_check_config (Route 53 health checks, which AWS documents as supported only for public DNS namespaces) and health_check_custom_config (the only option ECS service discovery supports), and remembering that the routing policy is fixed at create time. This module wraps the namespace and its discovery services into a single opinionated unit. You pass a namespace name, a VPC ID, and a typed map of services; the module returns the namespace ID and a map of service ARNs that you wire straight into an ECS service_registries block or pass to your own register-instance logic. Every team gets consistent, custom-health-checked, correctly typed service discovery instead of copy-pasting nested blocks they only half understand.

When to use it

You run ECS services that need to talk to each other by name (orders calling inventory) and want Cloud Map service discovery instead of standing up an internal ALB per service.
You need SRV records so callers learn both the IP and the dynamic host port of a bridge-mode container or a multi-port task.
You want a private DNS namespace whose names resolve only inside the VPC, never on the public internet — internal service discovery without leaking topology.
You are registering non-ECS endpoints (EC2 instances, on-prem hosts, or external services) into a discovery registry that App Mesh, Lambda, or other services can query through the Cloud Map API.
You want guardrails: a forced choice between Route 53 and custom health checks, validated TTLs and record types, and one routing-policy decision made once per service.

If you only need one service behind a stable load-balancer DNS name and nothing else discovers it, an ALB alias record is simpler. Cloud Map earns its place the moment you have many services discovering each other, or you need SRV/port-aware records that DNS aliases cannot express.

Module structure

terraform-module-aws-cloud-map/
├── versions.tf      # provider + Terraform version pins
├── main.tf          # private DNS namespace + aws_service_discovery_service (for_each)
├── variables.tf     # namespace + services map inputs, with validation
└── outputs.tf       # namespace id/arn, hosted zone id, service arns/ids

versions.tf

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

main.tf

locals {
  tags = merge(
    {
      "Namespace" = var.namespace_name
      "ManagedBy" = "terraform"
      "Module"    = "terraform-module-aws-cloud-map"
    },
    var.tags
  )
}

# ---------------------------------------------------------------------------
# Private DNS namespace — backed by a Route 53 private hosted zone in the VPC.
# Names registered under it resolve only inside this VPC.
# ---------------------------------------------------------------------------
resource "aws_service_discovery_private_dns_namespace" "this" {
  name        = var.namespace_name
  description = var.namespace_description
  vpc         = var.vpc_id

  tags = local.tags
}

# ---------------------------------------------------------------------------
# Discovery services — one named entry per backend (orders, inventory, ...).
# Each publishes DNS records for every instance registered against it.
# ---------------------------------------------------------------------------
resource "aws_service_discovery_service" "this" {
  for_each = var.services

  name        = each.key
  description = try(each.value.description, null)

  dns_config {
    namespace_id   = aws_service_discovery_private_dns_namespace.this.id
    # Omitted optional() attributes arrive as null rather than as missing, so
    # coalesce() (not try()) is what actually applies the module defaults.
    routing_policy = coalesce(each.value.routing_policy, var.default_routing_policy)

    dns_records {
      type = each.value.record_type
      ttl  = coalesce(each.value.ttl, var.default_ttl)
    }

    # SRV records additionally need a companion A/AAAA so the target resolves.
    dynamic "dns_records" {
      for_each = each.value.record_type == "SRV" ? [coalesce(each.value.srv_companion_type, "A")] : []
      content {
        type = dns_records.value
        ttl  = coalesce(each.value.ttl, var.default_ttl)
      }
    }
  }

  # ECS service discovery requires custom health checks: ECS reports task
  # health to Cloud Map, so leave use_route53_health_check = false for ECS.
  # coalesce() turns an omitted (null) flag into false; a null condition errors.
  dynamic "health_check_custom_config" {
    for_each = coalesce(each.value.use_route53_health_check, false) ? [] : [1]
    content {
      failure_threshold = coalesce(each.value.failure_threshold, var.default_failure_threshold)
    }
  }

  # Route 53 health checks probe a publicly routable address (e.g. EC2), not
  # ECS-managed task registration. AWS documents health_check_config as
  # supported only for public DNS namespaces, so expect the API to reject
  # use_route53_health_check = true in the private namespace created here.
  dynamic "health_check_config" {
    for_each = coalesce(each.value.use_route53_health_check, false) ? [1] : []
    content {
      type              = coalesce(each.value.health_check_protocol, "TCP")
      resource_path     = each.value.health_check_path
      failure_threshold = coalesce(each.value.failure_threshold, var.default_failure_threshold)
    }
  }

  force_destroy = var.force_destroy

  tags = local.tags
}

variables.tf

variable "namespace_name" {
  description = "Private DNS namespace (e.g. internal.kloudvin). Becomes the suffix of every discoverable name, like orders.internal.kloudvin."
  type        = string

  validation {
    condition     = can(regex("^([a-z0-9]([a-z0-9-]*[a-z0-9])?\\.)*[a-z0-9]([a-z0-9-]*[a-z0-9])?$", var.namespace_name))
    error_message = "namespace_name must be a valid lowercase DNS name (labels of letters/digits/hyphens, dot-separated)."
  }
}

variable "namespace_description" {
  description = "Description stored on the namespace."
  type        = string
  default     = "Service discovery namespace managed by Terraform"
}

variable "vpc_id" {
  description = "VPC the private DNS namespace (Route 53 private hosted zone) is associated with. Names resolve only inside this VPC."
  type        = string

  validation {
    condition     = can(regex("^vpc-[0-9a-f]{8,}$", var.vpc_id))
    error_message = "vpc_id must be a valid VPC ID, e.g. vpc-0a1b2c3d4e5f6a7b8."
  }
}

variable "default_routing_policy" {
  description = "Routing policy used for services that do not override it. MULTIVALUE returns up to 8 healthy instances; WEIGHTED returns a single instance (Cloud Map currently gives every instance the same weight, so the pick is effectively random)."
  type        = string
  default     = "MULTIVALUE"

  validation {
    condition     = contains(["MULTIVALUE", "WEIGHTED"], var.default_routing_policy)
    error_message = "default_routing_policy must be MULTIVALUE or WEIGHTED."
  }
}

variable "default_ttl" {
  description = "DNS record TTL (seconds) for services that do not set their own. Keep low so unhealthy instances drop out quickly."
  type        = number
  default     = 15

  validation {
    condition     = var.default_ttl >= 0 && var.default_ttl <= 86400
    error_message = "default_ttl must be between 0 and 86400 seconds."
  }
}

variable "default_failure_threshold" {
  description = "Consecutive health-check results required to change an instance's status (1-10)."
  type        = number
  default     = 1

  validation {
    condition     = var.default_failure_threshold >= 1 && var.default_failure_threshold <= 10
    error_message = "default_failure_threshold must be between 1 and 10."
  }
}

variable "services" {
  description = <<-EOT
    Map of discovery services keyed by the discoverable name (the DNS label
    under the namespace). For each service:
      record_type             - "A" or "AAAA" (IP only) or "SRV" (IP + port; ECS bridge/multi-port).
      ttl                     - optional per-service TTL override.
      routing_policy          - optional "MULTIVALUE" or "WEIGHTED" override.
      srv_companion_type      - optional companion record for SRV ("A" or "AAAA"), default "A".
      use_route53_health_check- optional bool; false (default) = custom health (required for ECS).
      health_check_protocol   - "HTTP" | "HTTPS" | "TCP" when use_route53_health_check = true.
      health_check_path       - resource path for HTTP/HTTPS Route 53 health checks.
      failure_threshold       - optional per-service failure threshold (1-10).
      description             - optional service description.
    Example:
    {
      orders    = { record_type = "A" }
      inventory = { record_type = "SRV", ttl = 10 }
      legacy    = { record_type = "A", use_route53_health_check = true, health_check_protocol = "HTTP", health_check_path = "/healthz" }
    }
  EOT

  type = map(object({
    record_type              = string
    ttl                      = optional(number)
    routing_policy           = optional(string)
    srv_companion_type       = optional(string)
    use_route53_health_check = optional(bool)
    health_check_protocol    = optional(string)
    health_check_path        = optional(string)
    failure_threshold        = optional(number)
    description              = optional(string)
  }))

  default = {}

  validation {
    condition = alltrue([
      for s in values(var.services) : contains(["A", "AAAA", "SRV"], s.record_type)
    ])
    error_message = "Each service record_type must be one of A, AAAA, SRV."
  }

  validation {
    condition = alltrue([
      for s in values(var.services) :
      try(s.routing_policy, "MULTIVALUE") != "WEIGHTED" || s.record_type != "SRV"
    ])
    error_message = "WEIGHTED routing is not supported with SRV records; use MULTIVALUE for SRV services."
  }

  validation {
    condition = alltrue([
      for s in values(var.services) :
      !coalesce(s.use_route53_health_check, false) || contains(["HTTP", "HTTPS", "TCP"], coalesce(s.health_check_protocol, "TCP"))
    ])
    error_message = "When use_route53_health_check = true, health_check_protocol must be HTTP, HTTPS, or TCP."
  }
}

variable "force_destroy" {
  description = "Allow destroying a service that still has registered instances. Keep false in production so a destroy cannot orphan live traffic silently."
  type        = bool
  default     = false
}

variable "tags" {
  description = "Additional tags merged onto the namespace and all services."
  type        = map(string)
  default     = {}
}

outputs.tf

output "namespace_id" {
  description = "Cloud Map private DNS namespace ID, for creating further services in the namespace or calling namespace-scoped Cloud Map APIs. (RegisterInstance takes a service ID; see service_ids.)"
  value       = aws_service_discovery_private_dns_namespace.this.id
}

output "namespace_arn" {
  description = "ARN of the private DNS namespace, for IAM policies scoping servicediscovery actions."
  value       = aws_service_discovery_private_dns_namespace.this.arn
}

output "namespace_name" {
  description = "The namespace DNS name (suffix of every discoverable service name)."
  value       = aws_service_discovery_private_dns_namespace.this.name
}

output "namespace_hosted_zone_id" {
  description = "Route 53 private hosted zone ID backing the namespace, for adding extra records or zone associations."
  value       = aws_service_discovery_private_dns_namespace.this.hosted_zone
}

output "service_arns" {
  description = "Map of service name to its Cloud Map service ARN — wire into an ECS service_registries block."
  value       = { for k, s in aws_service_discovery_service.this : k => s.arn }
}

output "service_ids" {
  description = "Map of service name to its Cloud Map service ID, used with the register-instance API."
  value       = { for k, s in aws_service_discovery_service.this : k => s.id }
}

output "service_fqdns" {
  description = "Map of service name to the fully-qualified discoverable name (e.g. orders.internal.kloudvin)."
  value = {
    for k, s in aws_service_discovery_service.this :
    k => "${s.name}.${aws_service_discovery_private_dns_namespace.this.name}"
  }
}

How to use it

# A private discovery namespace for internal microservices, with an A-record
# service for the orders API and an SRV service for a bridge-mode worker.
module "cloud_map" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-cloud-map?ref=v1.0.0"

  namespace_name        = "internal.kloudvin"
  namespace_description = "Internal service discovery for the prod VPC"
  vpc_id                = module.network.vpc_id

  services = {
    orders = {
      record_type = "A"
      ttl         = 10
    }

    inventory = {
      record_type = "SRV"
      ttl         = 10
    }
  }

  tags = {
    Environment = "prod"
    CostCenter  = "platform"
  }
}

# Downstream: register an ECS service into the "orders" discovery service so
# tasks are published as orders.internal.kloudvin automatically.
resource "aws_ecs_service" "orders" {
  name            = "orders"
  cluster         = module.ecs_cluster_service.cluster_id
  task_definition = aws_ecs_task_definition.orders.arn
  desired_count   = 3
  launch_type     = "FARGATE"

  network_configuration {
    subnets         = module.network.private_subnet_ids
    security_groups = [aws_security_group.orders_tasks.id]
  }

  service_registries {
    registry_arn = module.cloud_map.service_arns["orders"]
  }
}

# Surface the resolvable name to other stacks / app config.
output "orders_discovery_fqdn" {
  value = module.cloud_map.service_fqdns["orders"]
}
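
The inventory service above publishes SRV records, and an ECS service that registers into an SRV-typed Cloud Map service also has to name the container and port to advertise. A minimal sketch, assuming a bridge-mode EC2 task definition called aws_ecs_task_definition.inventory with a container named "inventory" listening on port 8080 (all three are hypothetical and not defined by this module):

```hcl
# Register the bridge-mode "inventory" tasks into the SRV discovery service.
resource "aws_ecs_service" "inventory" {
  name            = "inventory"
  cluster         = module.ecs_cluster_service.cluster_id
  task_definition = aws_ecs_task_definition.inventory.arn # hypothetical; bridge network mode
  desired_count   = 2
  launch_type     = "EC2"

  service_registries {
    registry_arn   = module.cloud_map.service_arns["inventory"]
    container_name = "inventory" # hypothetical container name from the task definition
    container_port = 8080        # hypothetical container port to advertise
  }
}
```

There is no network_configuration block here because bridge-mode tasks share the instance's network interface; the container_name/container_port pair is what lets ECS publish the port alongside the address.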

With Terragrunt

Terragrunt keeps this module DRY across environments — define the backend and provider once in a root config, then a thin terragrunt.hcl per environment supplies only the inputs that differ.

1. Root config — live/terragrunt.hcl (inherited by every module):

remote_state {
  backend = "s3"
  generate = { path = "backend.tf", if_exists = "overwrite" }
  config = {
    # ...s3 state bucket/container + key per path...
  }
}
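
The root config above generates only the backend. To keep the provider in the same place, a generate block can sit next to remote_state. This is a sketch rather than the author's actual root config, and the region is an assumption carried over from the Quickstart:

```hcl
# live/terragrunt.hcl (continued): write provider.tf into every module's working dir.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region = "us-east-1" # assumed; match the region of the target VPC
}
EOF
}
```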

2. Module config — live/prod/cloud_map/terragrunt.hcl:

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-cloud-map?ref=v1.0.0"
}

inputs = {
  namespace_name = "..."
  vpc_id = "..."
}

3. Deploy one environment, or roll out all modules together:

cd live/prod/cloud_map && terragrunt apply    # this module only
cd live/prod && terragrunt run-all apply      # every module under live/prod (run from the repo root)

Why Terragrunt here: the backend and provider live in one place instead of being copy-pasted into every module; inputs is overridden per environment (dev / stage / prod) without forking the module; and run-all orchestrates dependencies across modules. Reach for it once you have more than one environment or more than a handful of modules — for a single stack, the plain Quickstart above is enough.
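
To make the dependency ordering concrete, the module config can read vpc_id from a network module instead of a hard-coded string. The sketch below assumes a sibling live/prod/network configuration that exposes a vpc_id output; that path and output name are hypothetical:

```hcl
# live/prod/cloud_map/terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-cloud-map?ref=v1.0.0"
}

dependency "network" {
  config_path = "../network" # hypothetical sibling module

  # Placeholder used only while the network module has no state yet.
  # It is shaped to pass this module's vpc_id validation.
  mock_outputs = {
    vpc_id = "vpc-00000000"
  }
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}

inputs = {
  namespace_name = "..."
  vpc_id         = dependency.network.outputs.vpc_id
}
```

With the dependency declared, run-all applies the network module before this one.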

Inputs

Name	Type	Default	Required	Description
`namespace_name`	`string`	—	Yes	Private DNS namespace name; suffix of every discoverable name (validated DNS name).
`namespace_description`	`string`	`"Service discovery namespace managed by Terraform"`	No	Description stored on the namespace.
`vpc_id`	`string`	—	Yes	VPC the private hosted zone is associated with; names resolve only inside it (validated).
`default_routing_policy`	`string`	`"MULTIVALUE"`	No	Routing policy for services that do not override it: `MULTIVALUE` or `WEIGHTED`.
`default_ttl`	`number`	`15`	No	Default DNS record TTL in seconds (0–86400).
`default_failure_threshold`	`number`	`1`	No	Default consecutive results to flip instance status (1–10).
`services`	`map(object)`	`{}`	No	Discovery services keyed by discoverable name; sets `record_type` plus optional TTL, routing, SRV companion, and health-check fields. See the variable doc for the object shape.
`force_destroy`	`bool`	`false`	No	Allow destroying a service that still has registered instances. Keep `false` in production.
`tags`	`map(string)`	`{}`	No	Tags merged onto the namespace and all services.
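
As an illustration of how the module-wide defaults and per-service overrides are meant to combine, the sketch below tightens the default TTL and lets one service opt out. The service name reports is hypothetical, and the "..." placeholders are left for you to fill in:

```hcl
module "cloud_map" {
  source = "git::https://dev.azure.com/teknohut/kloudvin/_git/terraform-modules//terraform-module-aws-cloud-map?ref=v1.0.0"

  namespace_name = "..."
  vpc_id         = "..."

  # Module-wide defaults, applied to any service that does not set its own value.
  default_ttl               = 5
  default_failure_threshold = 2

  services = {
    orders  = { record_type = "A" }           # inherits ttl = 5
    reports = { record_type = "A", ttl = 30 } # hypothetical service with its own TTL
  }
}
```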

Outputs

Name	Description
`namespace_id`	Cloud Map namespace ID, for creating further services in the namespace or calling namespace-scoped Cloud Map APIs.
`namespace_arn`	ARN of the namespace, for IAM policies scoping `servicediscovery` actions.
`namespace_name`	The namespace DNS name (suffix of every discoverable service name).
`namespace_hosted_zone_id`	Route 53 private hosted zone ID backing the namespace.
`service_arns`	Map of service name to Cloud Map service ARN — wire into ECS `service_registries`.
`service_ids`	Map of service name to Cloud Map service ID, for the register-instance API.
`service_fqdns`	Map of service name to its fully-qualified discoverable name.
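
One way to consume service_fqdns is to hand the resolvable name to a caller as an environment variable, so application config never hard-codes the namespace. A minimal sketch; the variable name INVENTORY_HOST and the container name are hypothetical, and the image is left elided:

```hcl
locals {
  # Container definitions for the orders task, rendered as the JSON string
  # that aws_ecs_task_definition.container_definitions expects.
  orders_container_definitions = jsonencode([
    {
      name      = "orders" # hypothetical container name
      image     = "..."    # image reference left elided
      essential = true
      environment = [
        {
          name  = "INVENTORY_HOST" # hypothetical variable read by the application
          value = module.cloud_map.service_fqdns["inventory"]
        }
      ]
    }
  ])
}
```

Pass local.orders_container_definitions to the container_definitions argument of the task definition; the same code then resolves to a different FQDN in each environment's namespace.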

Enterprise scenario

A logistics platform runs roughly 30 Fargate microservices in a shared prod VPC and replaced a sprawl of internal ALBs (one per service, each ~$16/month plus LCU charges) with a single Cloud Map namespace from this module. Each service team adds one entry to the services map and a service_registries block, so orders reaches inventory.internal.kloudvin directly over the task ENI with a 10-second TTL. East-west traffic skips the load balancers entirely, cutting both the per-ALB cost and an extra network hop on every internal call. Because ECS reports task health into Cloud Map via the module’s health_check_custom_config, a failing task is pulled from DNS within seconds, and App Mesh virtual nodes point at the same namespace and service names to build their service graph without any duplicated discovery config.

Best practices

Keep TTLs short (5–15s) for service discovery. Cloud Map updates records as instances register and deregister, but a long TTL means callers cache a dead IP after a deployment or task failure; a low TTL trades a little extra query volume for fast convergence.
Use health_check_custom_config for ECS and for anything else registered in a private namespace. ECS service discovery only supports custom health checks (ECS reports task health to Cloud Map), and AWS documents Route 53 health checks (health_check_config) as available only for public DNS namespaces, where they probe publicly reachable EC2 or on-prem endpoints. Picking the wrong one is the most common Cloud Map registration failure.
Pick the routing policy deliberately and once. MULTIVALUE returns up to eight healthy instances and lets the client load-balance; WEIGHTED returns a single instance, and because Cloud Map currently gives every instance the same weight, that pick is effectively random rather than tunable. This module also rejects WEIGHTED for SRV services. The policy is immutable after create; changing it forces resource replacement.
Use private DNS namespaces for internal services so names never resolve publicly. A private namespace lives in a VPC-scoped Route 53 hosted zone; it keeps inventory.internal.kloudvin off the public internet and out of reach of anyone outside the VPC.
Reach internal-only AWS APIs over endpoints, and scope IAM tightly. Grant servicediscovery:RegisterInstance/DeregisterInstance only on the specific service ARNs (use the service_arns output; namespace_arn covers namespace-level actions), not *, so a compromised task cannot register rogue endpoints under your discovery names.
Name services as bare labels and let the namespace supply the suffix. Register orders, not orders.internal.kloudvin; the module composes the FQDN via service_fqdns, so the same service map is portable across dev/staging/prod namespaces with no per-environment string edits.
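
The IAM scoping advice above can be sketched as a policy that allows registration against a single discovery service only. It is meant for workloads that call the register-instance API themselves, since ECS-managed registration does not need it. The policy name and the choice of the orders service are illustrative, and callers registering into a DNS namespace may need additional Route 53 permissions that are not shown here:

```hcl
# Allow registering and deregistering instances on the "orders" service only.
data "aws_iam_policy_document" "register_orders" {
  statement {
    sid    = "RegisterOrdersInstancesOnly"
    effect = "Allow"

    actions = [
      "servicediscovery:RegisterInstance",
      "servicediscovery:DeregisterInstance",
    ]

    # One service ARN from the module output, never "*".
    resources = [module.cloud_map.service_arns["orders"]]
  }
}

resource "aws_iam_policy" "register_orders" {
  name   = "orders-cloud-map-register" # hypothetical policy name
  policy = data.aws_iam_policy_document.register_orders.json
}
```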