Terraform Data Sources: Reading Existing Infrastructure

Terraform does not always start from a blank slate. Sometimes you need to reference infrastructure that already exists — created manually, by another team, or by a separate Terraform project. Data sources let you read that existing infrastructure and use its values in your own configuration without managing or owning it.

Resource vs Data Source

Understanding the difference between a resource and a data source is fundamental.

Concept                | Resource Block                     | Data Source Block
-----------------------|------------------------------------|-----------------------------------
Purpose                | Creates and manages infrastructure | Reads existing infrastructure
Keyword                | resource                           | data
Terraform controls it? | Yes: creates, updates, destroys    | No: read-only, never modifies
Example                | Creating a new VPC                 | Looking up an existing VPC by name

Analogy: The Library vs the Workshop

A resource block is your workshop — you build new things there. A data source block is the library — you go there to look up information that already exists. You do not change anything in the library; you just read what you need and bring it back to your workshop.

Data Source Syntax

data "PROVIDER_DATA_TYPE" "LOCAL_NAME" {
  filter_argument = value
}

You reference a data source's attributes using:

data.PROVIDER_DATA_TYPE.LOCAL_NAME.ATTRIBUTE
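
As a concrete instance of this pattern, here is a minimal sketch using the AWS provider's aws_availability_zones data source. The local name available and the output name zone_names are arbitrary choices made for illustration.

```hcl
# Look up the Availability Zones usable in the provider's configured region.
data "aws_availability_zones" "available" {
  state = "available" # skip zones that are currently impaired or unavailable
}

# Reference pattern: data.<TYPE>.<LOCAL_NAME>.<ATTRIBUTE>
output "zone_names" {
  value = data.aws_availability_zones.available.names
}
```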

Example 1: Look Up an Existing AWS AMI

Instead of hard-coding an AMI ID (which changes per region and goes out of date), use a data source to dynamically find the latest Amazon Linux AMI.

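data "aws_ami" "latest_amazon_linux" {
  most_recent = true       # if several AMIs match, pick the newest one
  owners      = ["amazon"] # only consider AMIs published by Amazon

  filter {
    name = "name"
    # "2023.*" limits the match to standard AL2023 images; a bare "*" here
    # would also match variants such as the minimal images.
    values = ["al2023-ami-2023.*-x86_64"]
  }
}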

resource "aws_instance" "web" {
  ami           = data.aws_ami.latest_amazon_linux.id
  instance_type = "t3.micro"
}

Diagram: Data Source Flow

data "aws_ami" "latest_amazon_linux" { ... }
         |
         v
  Terraform asks AWS API:
  "Find the newest AMI matching these filters"
         |
         v
  AWS returns: ami-0abcdef1234567890
         |
         v
  Terraform stores this in:
  data.aws_ami.latest_amazon_linux.id
         |
         v
  Your resource uses it:
  ami = data.aws_ami.latest_amazon_linux.id
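
To see what the lookup actually resolved to, you can expose a few of the AMI's attributes as an output. This is a small sketch that assumes the data block from Example 1 is defined in the same module; the output name resolved_ami is illustrative.

```hcl
# Assumes data.aws_ami.latest_amazon_linux from Example 1 exists in this module.
output "resolved_ami" {
  value = {
    id            = data.aws_ami.latest_amazon_linux.id
    name          = data.aws_ami.latest_amazon_linux.name
    creation_date = data.aws_ami.latest_amazon_linux.creation_date
  }
}
```

Because most_recent = true tracks whatever image is newest, a newly published AMI shows up in a later plan as a change to the instance's ami argument, which typically forces the instance to be replaced.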

Example 2: Look Up an Existing VPC

Another team already created a shared VPC. You need to deploy resources into it. Use a data source to find it by tag.

data "aws_vpc" "shared_network" {
  filter {
    name   = "tag:Name"
    values = ["shared-production-vpc"]
  }
}

resource "aws_subnet" "app_tier" {
  vpc_id     = data.aws_vpc.shared_network.id
  cidr_block = "10.0.10.0/24" # must fall inside the shared VPC's CIDR range
}

Terraform reads the VPC from AWS, extracts its ID, and uses it when creating the subnet — without touching or managing the VPC itself.
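
A hard-coded subnet CIDR only works if it falls inside the shared VPC's address range. One way to avoid guessing is to derive the range from the VPC's own cidr_block attribute with Terraform's cidrsubnet function. The sketch below assumes the VPC uses a /16 range and that network number 10 is not already taken; the resource name app_tier_derived is illustrative.

```hcl
# Assumes data.aws_vpc.shared_network from Example 2 exists in this module.
resource "aws_subnet" "app_tier_derived" {
  vpc_id = data.aws_vpc.shared_network.id

  # Extend the VPC prefix by 8 bits and take network number 10.
  # For a VPC of 10.0.0.0/16 this evaluates to 10.0.10.0/24.
  cidr_block = cidrsubnet(data.aws_vpc.shared_network.cidr_block, 8, 10)
}
```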

Example 3: Look Up AWS Account ID

The aws_caller_identity data source returns information about the AWS credentials currently in use — including the account ID, user ID, and ARN. This is useful when building IAM policies or resource names that must include the account ID.

data "aws_caller_identity" "current" {}

output "account_id" {
  value = data.aws_caller_identity.current.account_id
}

Notice this data source has no filter arguments — it simply reads the active credentials.
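
As a sketch of the resource-naming use case, the account ID can be interpolated into a name that must be unique. The bucket name prefix and the names log_bucket_name and logs are hypothetical, and the data block from Example 3 is assumed to be in the same module.

```hcl
# Assumes data.aws_caller_identity.current from Example 3 exists in this module.
locals {
  # S3 bucket names are globally unique, so the account ID helps avoid clashes.
  log_bucket_name = "app-logs-${data.aws_caller_identity.current.account_id}"
}

resource "aws_s3_bucket" "logs" {
  bucket = local.log_bucket_name
}
```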

Example 4: Read a File from Disk

The local_file data source, which comes from the hashicorp/local provider rather than being built into Terraform itself, reads a file from your local filesystem. This is handy for injecting scripts or certificates into resources.

data "local_file" "startup_script" {
  filename = "${path.module}/scripts/startup.sh"
}

resource "aws_instance" "app" {
  ami           = data.aws_ami.latest_amazon_linux.id
  instance_type = "t3.micro"
  user_data     = data.local_file.startup_script.content
}
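
For a static text file, Terraform's file function, which is part of the core language, is a common alternative. A minimal sketch reusing the same script path; the resource name app_from_file is illustrative, and the AMI lookup is assumed to be the one from Example 1.

```hcl
# Assumes data.aws_ami.latest_amazon_linux from Example 1 exists in this module.
resource "aws_instance" "app_from_file" {
  ami           = data.aws_ami.latest_amazon_linux.id
  instance_type = "t3.micro"

  # file() returns the script's contents as a string when the
  # configuration is evaluated; no data block is needed.
  user_data = file("${path.module}/scripts/startup.sh")
}
```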

When to Use Data Sources

  • You need a value from infrastructure managed by another team or another Terraform project
  • You want the latest version of something (AMI, certificate, secret) rather than a hard-coded value
  • You need account-level or region-level metadata (account ID, availability zones, regions)
  • You want to reference a manually created resource without importing it into state

Key Points

  • Data sources read existing infrastructure without creating or modifying anything.
  • Use the data block keyword; reference results with data.TYPE.NAME.ATTRIBUTE.
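  • Data sources are normally read during the plan phase, and their values are available to any resource in the same module. If a data source's arguments depend on values that are not known until apply, Terraform defers the read to the apply phase.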
  • Use data sources to avoid hard-coding values like AMI IDs, VPC IDs, and account numbers.
  • Many providers offer data sources alongside their resources, but not every resource type has a matching data source. Check the provider's documentation in the Terraform Registry to see what is available.
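
One timing detail is worth illustrating: when a data source's arguments depend on a resource that does not exist yet, Terraform cannot read it during plan and waits until apply. A hedged sketch, with an illustrative CIDR and illustrative names:

```hcl
resource "aws_vpc" "new" {
  cidr_block = "10.1.0.0/16"
}

# aws_vpc.new.id is unknown until the VPC is created, so on the first run
# this lookup is deferred to apply and its attributes appear in the plan
# as "(known after apply)".
data "aws_security_group" "default" {
  vpc_id = aws_vpc.new.id
  name   = "default"
}

output "default_sg_id" {
  value = data.aws_security_group.default.id
}
```

On later runs, once the VPC ID is stored in state, the same data source is read during plan as usual.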
