Terraform Best Practices for Production Environments
Learning Terraform syntax is one thing. Running it reliably for production infrastructure used by real customers — with a team, at scale, over years — is another. This final topic brings together the principles, patterns, and habits that experienced Terraform practitioners apply to keep production infrastructure stable, secure, and maintainable.
1. Structure Your Repository Clearly
File organisation has a big impact on how easily a team can navigate and safely change infrastructure. Two widely used patterns are the flat layout and the module layout.
Recommended Repository Structure
infrastructure/
  modules/                 # Reusable building blocks
    networking/
      main.tf
      variables.tf
      outputs.tf
    compute/
    database/
  environments/            # One directory per environment
    dev/
      main.tf              # Calls shared modules
      variables.tf
      terraform.tfvars
      backend.tf
    staging/
      main.tf
      ...
    prod/
      main.tf
      ...
Separate directories per environment make it much harder to accidentally apply dev changes to prod, because each environment has its own backend, its own state file, and its own variable values. A command run from environments/dev operates against the dev backend and state only.
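As a minimal sketch of this layout, the prod environment might call the shared networking module as shown here. The module inputs (environment, vpc_cidr) and their values are illustrative assumptions, not something the layout itself prescribes.

```hcl
# environments/prod/variables.tf
variable "vpc_cidr" {
  type        = string
  description = "CIDR block for the production VPC"
}

# environments/prod/terraform.tfvars would then contain (shown as a comment
# so this sketch stays valid as a single file):
#   vpc_cidr = "10.0.0.0/16"

# environments/prod/main.tf
module "networking" {
  # Relative path from environments/prod/ to the shared module
  source = "../../modules/networking"

  environment = "prod"
  vpc_cidr    = var.vpc_cidr
}
```

The dev and staging directories would call the same module with their own values, so the environments differ in inputs and state rather than in copied resource code.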
2. Always Pin Versions
Pin three things: Terraform version, provider versions, and module versions. Unpinned versions invite surprise breaking changes on terraform init.
terraform {
  required_version = ">= 1.9.0, < 2.0.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.31" # allows 5.31 and later 5.x releases, never 6.0
    }
  }
}

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.2.0" # exact pin
}
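Modules fetched from a Git repository rather than a registry do not accept a version argument, so the pin goes into the source address instead. A sketch, with a hypothetical repository URL and tag:

```hcl
module "networking" {
  # Hypothetical internal repository. "//networking" selects a subdirectory
  # of the repository, and "?ref=" pins the module to a tag (a commit SHA
  # also works and is stricter).
  source = "git::https://example.com/platform/terraform-modules.git//networking?ref=v1.4.0"
}
```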
3. Use Remote State with Encryption and Locking
Every production project must use remote state. Never leave state on a developer's machine. Encrypt the bucket, enable versioning, and enable state locking.
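terraform {
  backend "s3" {
    bucket  = "company-terraform-state-prod"
    key     = "infrastructure/prod/terraform.tfstate"
    region  = "us-east-1"
    encrypt = true

    # State locking through a DynamoDB table. Terraform 1.10 and later also
    # support S3-native locking with use_lockfile = true.
    dynamodb_table = "terraform-state-lock"
  }
}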
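The backend block only points at the bucket and lock table, which have to exist before terraform init can use them. They are usually created from a separate bootstrap configuration. The following is a sketch of such a bootstrap, assuming AWS provider v5 and reusing the names from the backend block above; the resource labels and the choice of KMS encryption are assumptions.

```hcl
resource "aws_s3_bucket" "state" {
  bucket = "company-terraform-state-prod"

  lifecycle {
    prevent_destroy = true # the state bucket should never be deleted by a plan
  }
}

# Versioning keeps earlier state files recoverable after a bad apply.
resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Encrypt every object at rest by default.
resource "aws_s3_bucket_server_side_encryption_configuration" "state" {
  bucket = aws_s3_bucket.state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

# State can contain secrets, so block all public access.
resource "aws_s3_bucket_public_access_block" "state" {
  bucket = aws_s3_bucket.state.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# The S3 backend expects a string hash key named exactly "LockID".
resource "aws_dynamodb_table" "lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```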
4. Never Commit Secrets
Add these lines to every Terraform project's .gitignore:
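.terraform/
terraform.tfstate
terraform.tfstate.backup
*.tfvars        # Only if a tfvars file might contain secrets
Do not add .terraform.lock.hcl to this list. HashiCorp recommends committing the dependency lock file so that every run installs the same provider versions.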
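Store secrets in a secrets manager (AWS Secrets Manager, HashiCorp Vault) and read them at apply time with data sources. Values read this way are still written to the state file, which is one more reason to encrypt state and restrict access to it. Use environment variables or CI/CD secret storage for credentials.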
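A minimal sketch of reading a secret through a data source, assuming the AWS provider and a secret that already exists in Secrets Manager. The secret name and all database settings are hypothetical.

```hcl
# Assumes a secret with this (hypothetical) name already exists.
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "prod/database/master-password"
}

resource "aws_db_instance" "main" {
  identifier        = "prod-app-db"
  engine            = "postgres"
  instance_class    = "db.t3.medium"
  allocated_storage = 50
  username          = "app_admin"

  # The value is fetched from Secrets Manager during plan/apply,
  # so it never appears in a tfvars file or in the repository.
  password = data.aws_secretsmanager_secret_version.db_password.secret_string
}
```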
5. Review Every Plan Before Applying
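Never run terraform apply without first reading the plan output carefully. Save the plan with terraform plan -out=tfplan and apply that file with terraform apply tfplan, so that what runs is exactly what was reviewed. In production, make apply approval a human gate in your CI/CD pipeline: the plan runs automatically, but a team member must review and approve it before apply executes.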
Look specifically for:
- Unexpected destroy operations (a - in the plan output)
- Resource replacements (-/+) on critical resources like databases
- More changes than expected: someone else may have committed code you did not notice
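Plan review can be backed by a guard in the configuration itself. As a sketch, the prevent_destroy lifecycle setting makes Terraform reject any plan that would destroy or replace the resource. The database settings below are illustrative placeholders.

```hcl
resource "aws_db_instance" "orders" {
  identifier        = "prod-orders-db"
  engine            = "postgres"
  instance_class    = "db.t3.medium"
  allocated_storage = 100
  username          = "orders_admin"

  # Let RDS generate the master password and keep it in Secrets Manager.
  manage_master_user_password = true

  lifecycle {
    # A plan showing "-" or "-/+" for this resource fails with an error
    # instead of being offered for approval.
    prevent_destroy = true
  }
}
```

The guard only applies while the resource block is present. Deleting the block from the configuration removes the guard with it, so the human review step is still required.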
6. Tag Every Resource
Tags on cloud resources are essential for cost allocation, security audits, and incident response. Define a standard tagging strategy and enforce it through a central locals block or a Sentinel policy.
locals {
  required_tags = {
    Environment = var.environment
    Project     = var.project_name
    Owner       = var.team_name
    ManagedBy   = "Terraform"
    CostCenter  = var.cost_center
  }
}
Apply local.required_tags to every resource that supports tags. When an alert fires at 3 AM, tags tell you which team owns the resource, which environment it runs in, and which project it belongs to.
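With the AWS provider, one way to apply the tags consistently is the provider's default_tags block. This sketch assumes the locals block above is in the same configuration; the bucket name and the extra Name tag are hypothetical.

```hcl
provider "aws" {
  region = "us-east-1"

  # Applied to every taggable resource created through this provider.
  default_tags {
    tags = local.required_tags
  }
}

resource "aws_s3_bucket" "logs" {
  bucket = "company-app-logs-prod"

  # Resource-level tags are added on top of the provider defaults.
  tags = {
    Name = "app-logs"
  }
}
```

For providers without an equivalent of default_tags, the same effect can be had per resource with tags = merge(local.required_tags, { Name = "app-logs" }).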
7. Use Modules for Every Repeatable Pattern
If you write the same group of resources more than once, make a module. A good module:
- Has a clear, single responsibility (networking, compute, database)
- Accepts all variable inputs it needs — no hard-coded values inside
- Exposes all useful outputs
- Has a README.md explaining its inputs, outputs, and usage
- Has tests that verify its core behaviour
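The checklist above can be sketched for a small networking module as follows. The variable names, the validation rule, and the single output are assumptions chosen for illustration.

```hcl
# modules/networking/variables.tf
variable "vpc_cidr" {
  type        = string
  description = "CIDR block for the VPC, for example 10.0.0.0/16"

  validation {
    # cidrhost() fails on malformed input, and can() turns that into false.
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "vpc_cidr must be a valid CIDR block."
  }
}

variable "environment" {
  type        = string
  description = "Environment name used in resource names and tags"
}

# modules/networking/main.tf
resource "aws_vpc" "this" {
  cidr_block           = var.vpc_cidr # no hard-coded values inside the module
  enable_dns_hostnames = true

  tags = {
    Name = "${var.environment}-vpc"
  }
}

# modules/networking/outputs.tf
output "vpc_id" {
  description = "ID of the VPC created by this module"
  value       = aws_vpc.this.id
}
```

The description fields do double duty: they document the module for readers and are what documentation generators pick up.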
8. Keep Modules Small and Focused
A module that creates everything — VPC, subnets, EC2, RDS, S3, IAM — is hard to test, hard to reuse, and dangerous to change. Break it into focused modules and compose them at the environment level.
Diagram: Composing Small Modules
prod/main.tf
  |
  |---> module "network"    (creates VPC + subnets)
  |---> module "database"   (creates RDS, uses network outputs)
  |---> module "compute"    (creates EC2/ECS, uses network + db outputs)
  |---> module "monitoring" (creates CloudWatch dashboards + alarms)
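In code, the composition in the diagram might look like the sketch below. The module paths follow the repository layout from section 1; the input and output names (vpc_id, private_subnet_ids, endpoint, db_endpoint) are hypothetical and depend on what each module actually exposes. The monitoring module is omitted for brevity.

```hcl
# environments/prod/main.tf
module "network" {
  source = "../../modules/networking"

  environment = "prod"
  vpc_cidr    = "10.0.0.0/16"
}

module "database" {
  source = "../../modules/database"

  # Outputs of one module become inputs of the next. Terraform infers the
  # creation order from these references, so no depends_on is needed.
  vpc_id     = module.network.vpc_id
  subnet_ids = module.network.private_subnet_ids
}

module "compute" {
  source = "../../modules/compute"

  subnet_ids  = module.network.private_subnet_ids
  db_endpoint = module.database.endpoint
}
```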
9. Implement Drift Detection
Schedule a regular terraform plan run (daily is common) to detect drift — changes made to real infrastructure outside of Terraform. Alert when the plan shows unexpected differences.
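# GitHub Actions scheduled drift detection
on:
  schedule:
    - cron: '0 6 * * *'   # Run daily at 06:00 UTC
jobs:
  drift-check:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: environments/prod   # assumed path; adjust to your layout
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      # Cloud credentials are assumed to come from repository secrets or an earlier step.
      - name: Terraform Init
        run: terraform init -input=false
      - name: Terraform Plan (drift check)
        # Exit codes: 0 = no changes, 1 = error, 2 = changes detected.
        # Any non-zero code fails this step, which can be used to trigger an alert.
        run: terraform plan -detailed-exitcode -input=false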
10. Document Your Infrastructure
Write a README.md for every module and every environment directory. Include what the code creates, what variables are required, what the outputs mean, and any operational notes. Future you — or your teammate at 2 AM — will be grateful.
Use terraform-docs, an open-source tool that automatically generates Markdown documentation from your Terraform variable and output declarations:
terraform-docs markdown table . > README.md
Key Points Summary
- Separate environment directories prevent accidental cross-environment changes and reduce the blast radius of any mistake.
- Pin Terraform, provider, and module versions to prevent surprise breakage from upstream changes.
- Remote state with encryption, versioning, and locking is non-negotiable for any production project.
- Every resource must be tagged with environment, project, owner, and managed-by information.
- Plan review is a human gate — never auto-apply to production without a person reviewing the plan output.
- Schedule daily drift detection to catch out-of-band changes before they cause incidents.
- Document every module and environment with clear READMEs; use terraform-docs to automate input/output documentation.
