In This Article
- What Infrastructure as Code Is and Why It Matters
- Terraform vs Pulumi vs CloudFormation vs CDK
- HCL Syntax Basics
- Providers: AWS, Azure, and GCP
- State Management: Local vs Remote Backend
- Modules and Reusable Infrastructure
- Terraform Cloud for Teams
- Terraform for AI Infrastructure
- CI/CD with Terraform
- The HashiCorp Licensing Change and OpenTofu Fork
- Frequently Asked Questions
Key Takeaways
- Is Terraform still worth learning in 2026? Yes — Terraform remains the most widely adopted Infrastructure as Code tool in 2026, used by the majority of cloud engineering teams across AWS, Azure, and GCP.
- What is the difference between Terraform and OpenTofu? OpenTofu is a community-maintained, open-source fork of Terraform created in response to HashiCorp's 2023 decision to change Terraform's license from the Mozilla Public License to the Business Source License.
- Should I use Terraform or Pulumi in 2026? For most cloud engineers, Terraform is the better starting point in 2026 due to its dominant market share, more comprehensive job market demand, and HCL's readability in code review.
- How do I manage Terraform state safely in a team environment? Managing Terraform state safely in a team requires a remote backend with state locking.
Infrastructure as Code is no longer an advanced specialty reserved for platform engineering teams at large companies. In 2026, it is a baseline expectation for any cloud engineer, DevOps practitioner, or backend developer who deploys to the cloud with any regularity. And in that world, Terraform is the most important tool to understand.
This guide covers everything you need to go from zero to productive with Terraform: what it does and why it exists, how it compares to every major competitor, the HCL syntax you will write every day, state management done right, modules for reusable infrastructure, Terraform Cloud for team workflows, provisioning AI infrastructure including GPU instances, CI/CD integration, and the licensing fork that split the community in 2023 and what it means for you today.
What Infrastructure as Code Is and Why It Matters
Infrastructure as Code (IaC) means your cloud resources — servers, databases, networks, load balancers — are defined in version-controlled text files, so every environment can be reproduced exactly, every change is reviewed and auditable, and standing up a new copy of your entire infrastructure takes minutes instead of days of console clicking. Instead of clicking through a cloud console to provision resources, you write a declaration of what you want, and the IaC tool figures out what API calls to make to create it.
The benefits are not theoretical. They are operational:
- Reproducibility: Your staging environment and production environment are defined by the same code, with only variable values differing. No more "it works in staging but not prod" mysteries caused by configuration drift.
- Version control: Every infrastructure change is a commit. You can see who changed what, when, and why. Rolling back a bad change means reverting a pull request.
- Collaboration: Infrastructure changes go through the same code review process as application changes. Your team reviews a Terraform plan before anything touches production.
- Speed: Spinning up a new environment — a complete copy of your infrastructure for a new customer, a new region, or a load test — takes minutes instead of days.
- Documentation: Your infrastructure config is the documentation. The code describes exactly what exists, which no wiki page ever does accurately.
Terraform specifically uses a declarative approach: you describe the desired end state of your infrastructure, not the steps to get there. If a resource already exists and matches your config, Terraform does nothing. If it needs to change, Terraform plans the minimal set of changes. This is fundamentally different from writing imperative shell scripts that create resources, which fail unpredictably when run a second time.
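The reconciliation loop at the heart of any declarative tool can be sketched in a few lines of Python. This is a simplified illustration of the concept, not Terraform's actual implementation; the resource addresses and plan format are invented:

```python
def plan(desired: dict, actual: dict) -> list[tuple[str, str]]:
    """Compute the minimal set of changes to move `actual` to `desired`.

    Keys are resource addresses; values are their attribute dicts.
    """
    changes = []
    for addr, config in desired.items():
        if addr not in actual:
            changes.append(("create", addr))   # missing: create it
        elif actual[addr] != config:
            changes.append(("update", addr))   # drifted: change it in place
    for addr in actual:
        if addr not in desired:
            changes.append(("destroy", addr))  # no longer declared: destroy it
    return changes

desired = {"aws_instance.web": {"type": "t3.micro"}}
actual  = {"aws_instance.web": {"type": "t3.micro"}}
print(plan(desired, actual))  # → [] — state matches config, nothing to do
```

Run the same function twice and the second plan is empty: that idempotence is exactly what an imperative shell script lacks.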
Terraform vs Pulumi vs CloudFormation vs CDK
Choose Terraform (HCL, multi-cloud, 3,000+ providers) for most teams; Pulumi (TypeScript/Python/Go) when your team wants general-purpose programming languages instead of a DSL; CloudFormation or CDK when you are AWS-only and want deep native integration with no state file management; OpenTofu as the MIT-licensed Terraform fork if the BSL license is a concern. Here is a clear breakdown of the four tools most engineering teams evaluate:
| Tool | Language | Multi-Cloud | State Mgmt | Learning Curve | Best For |
|---|---|---|---|---|---|
| Terraform | HCL (declarative DSL) | Yes | Self-managed or Terraform Cloud | Low–Medium | Multi-cloud teams, most orgs |
| Pulumi | TypeScript, Python, Go, C# | Yes | Pulumi Cloud or self-hosted | Medium (depends on language) | Teams wanting full programming languages |
| CloudFormation | JSON or YAML | AWS only | Fully managed (AWS) | Medium–High | AWS-only shops, compliance-heavy orgs |
| AWS CDK | TypeScript, Python, Java, Go | AWS only | Via CloudFormation | Medium | AWS-native teams with dev backgrounds |
The practical answer for most teams is Terraform. It has the largest community, the most comprehensive provider ecosystem, and the most demand in the job market. The declarative HCL syntax is easier to read and review than YAML-heavy CloudFormation, and it works across AWS, Azure, GCP, and dozens of SaaS providers simultaneously.
Pulumi is worth evaluating if your team has deep TypeScript or Python expertise and needs complex conditional logic, loops, or dynamic resource generation. Pulumi's programming language integration is genuinely more powerful for certain advanced use cases. But the majority of infrastructure does not need that complexity, and for most teams, Terraform's explicit declarative syntax is clearer in code review.
CloudFormation in 2026
CloudFormation is not going away — AWS uses it under the hood for CDK and many managed services. But most teams have moved away from writing raw CloudFormation YAML when Terraform or CDK is available: the verbosity is substantial, and the debugging experience is worse than either alternative. If you are AWS-only and prefer a programming language, CDK is the better choice over raw CloudFormation.
HCL Syntax Basics
HCL is Terraform's declarative DSL — you write resource blocks that declare what you want (aws_instance, aws_s3_bucket, aws_rds_cluster), variable blocks that parameterize values, output blocks that expose results, and data blocks that reference existing resources. Run terraform plan to see what will change, terraform apply to execute, and terraform destroy to tear down. HCL is readable, concise, and designed specifically for infrastructure descriptions.
Resources — The Core Building Block
A resource block declares a single infrastructure object. The block type is resource, followed by the provider resource type and a local name you give it.
```hcl
# Declare an EC2 instance
resource "aws_instance" "web_server" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = var.instance_type

  tags = {
    Name        = "web-server-${var.environment}"
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}

# Reference the instance in an output
output "web_server_ip" {
  value       = aws_instance.web_server.public_ip
  description = "Public IP of the web server"
}
```
Variables — Parameterizing Your Config
variable "environment" {
description = "Deployment environment (dev, staging, prod)"
type = string
default = "dev"
validation {
condition = contains(["dev", "staging", "prod"], var.environment)
error_message = "Environment must be dev, staging, or prod."
}
}
variable "instance_type" {
description = "EC2 instance type"
type = string
default = "t3.micro"
}
Data Sources — Reading Existing Resources
```hcl
# Read an existing VPC by tag
data "aws_vpc" "main" {
  tags = {
    Name = "main-vpc"
  }
}

# Look up the latest Ubuntu 22.04 AMI
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-*"]
  }
}

# Find the subnets inside that VPC
data "aws_subnets" "main" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.main.id]
  }
}

# Reference the lookups in a resource. Note that aws_instance takes a
# subnet_id, not a vpc_id — the VPC is implied by the subnet.
resource "aws_instance" "app" {
  ami       = data.aws_ami.ubuntu.id
  subnet_id = data.aws_subnets.main.ids[0]
}
```
The three core workflow commands you run constantly: terraform init (download providers and set up backends), terraform plan (show what will change without making any changes), and terraform apply (execute the plan after reviewing it). Always review the plan before applying. Always.
Providers: AWS, Azure, and GCP
Providers are plugins that allow Terraform to interact with specific cloud platforms and services. Each provider exposes resource types and data sources specific to that platform. Configuring a provider is the first thing in any Terraform config.
```hcl
terraform {
  required_version = ">= 1.7.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0"
    }
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = var.aws_region
}

provider "azurerm" {
  features {}
}

provider "google" {
  project = var.gcp_project_id
  region  = var.gcp_region
}
```
Version constraints on providers matter. Pinning to a major version with ~> 5.0 allows patch and minor updates but prevents breaking major version upgrades from running unexpectedly. Always pin provider versions in production configs.
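The behavior of the pessimistic constraint can be sketched in a few lines of Python. This is a simplified illustration of what `~>` accepts, not HashiCorp's actual version-resolution code:

```python
def matches_pessimistic(constraint: str, version: str) -> bool:
    """Rough semantics of Terraform's ~> operator: the last component
    given in the constraint may increase, everything before it is locked."""
    base = [int(p) for p in constraint.split(".")]
    ver = [int(p) for p in version.split(".")]
    # Components before the last constrained one must match exactly...
    if ver[: len(base) - 1] != base[:-1]:
        return False
    # ...and the last constrained component may only grow.
    return ver[len(base) - 1] >= base[-1]

print(matches_pessimistic("5.0", "5.31"))  # True  — minor/patch upgrades allowed
print(matches_pessimistic("5.0", "6.0"))   # False — major upgrade blocked
```

Note the corollary: `~> 5.0.1` locks the minor version too, allowing only patch releases within 5.0.x.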
State Management: Local vs Remote Backend
Never use local Terraform state in team environments — use an S3 remote backend with DynamoDB state locking. The S3 backend stores terraform.tfstate encrypted in an S3 bucket; the DynamoDB table prevents two engineers from running terraform apply simultaneously, which would corrupt state. Local state is only acceptable for solo experimentation. Terraform state is the mechanism by which Terraform tracks what resources it manages, comparing config against current state to determine what needs to change on every apply.
By default, Terraform writes state to a local file. This works fine for learning and solo projects. It is a serious problem in team environments for two reasons: multiple engineers running terraform apply simultaneously will corrupt state, and anyone who does not have the state file cannot manage the infrastructure.
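The locking mechanism itself is a conditional write: an apply creates a lock item only if no item with that LockID already exists. A minimal in-memory sketch of the idea follows; the `LockTable` class is an invented stand-in for the DynamoDB table, not real boto3 or Terraform code:

```python
class LockTable:
    """In-memory stand-in for a DynamoDB lock table."""

    def __init__(self):
        self.items: dict[str, str] = {}

    def acquire(self, lock_id: str, owner: str) -> bool:
        # Mirrors PutItem with a condition of "attribute_not_exists(LockID)":
        # the write succeeds only if nobody currently holds the lock.
        if lock_id in self.items:
            return False
        self.items[lock_id] = owner
        return True

    def release(self, lock_id: str, owner: str) -> None:
        # Only the holder may release.
        if self.items.get(lock_id) == owner:
            del self.items[lock_id]

table = LockTable()
print(table.acquire("prod/main/terraform.tfstate", "alice"))  # True:  lock acquired
print(table.acquire("prod/main/terraform.tfstate", "bob"))    # False: apply blocked
```

The second engineer's apply fails fast instead of writing over a half-updated state file, which is the whole point of locking.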
Remote Backend with S3 and DynamoDB (AWS)
```hcl
terraform {
  backend "s3" {
    bucket  = "my-company-terraform-state"
    key     = "prod/main/terraform.tfstate"
    region  = "us-east-1"
    encrypt = true

    # DynamoDB table for state locking
    dynamodb_table = "terraform-state-locks"
  }
}

# The lock table itself (created separately, before the backend is configured)
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-state-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```
Never Commit terraform.tfstate to Version Control
State files contain sensitive information: passwords, connection strings, private keys embedded in resource attributes. They also contain resource IDs that, if corrupted, can cause Terraform to lose track of existing resources and attempt to recreate them. Add *.tfstate and *.tfstate.backup to your .gitignore immediately. Use a remote backend for any shared environment.
State Commands You Need to Know
- terraform state list — List all resources in the current state
- terraform state show aws_instance.web — Show the full state for a specific resource
- terraform state mv — Move a resource to a new address (used when refactoring)
- terraform import — Import existing infrastructure into Terraform management
- terraform state rm — Remove a resource from state without destroying it (used carefully)
Modules and Reusable Infrastructure
Modules are the primary mechanism for reuse in Terraform. A module is simply a directory of .tf files with defined inputs and outputs. You call a module from another configuration, pass it variables, and it provisions a consistent set of resources.
The practical value is significant: instead of writing an S3 bucket configuration with all the right encryption, access control, and lifecycle policies from scratch in every project, you write it once as a module, publish it, and call it everywhere.
```hcl
# Call the official AWS VPC module
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "my-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  enable_nat_gateway = true
  enable_vpn_gateway = false

  tags = local.common_tags
}

# Reference module outputs in other resources
resource "aws_eks_cluster" "main" {
  name     = "my-cluster"
  role_arn = aws_iam_role.eks.arn

  vpc_config {
    subnet_ids = module.vpc.private_subnets
  }
}
```
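Subnet layouts like the one in the module call are easy to get wrong by hand. Python's standard ipaddress module gives a quick sanity check; the CIDRs below are copied from the module call, and nothing else here is Terraform-specific:

```python
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")
subnets = [
    "10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24",        # private
    "10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24",  # public
]

nets = [ipaddress.ip_network(s) for s in subnets]

# Every subnet must fall inside the VPC CIDR...
assert all(n.subnet_of(vpc) for n in nets)

# ...and no two subnets may overlap.
assert not any(
    a.overlaps(b) for i, a in enumerate(nets) for b in nets[i + 1:]
)

print(f"{len(nets)} subnets, {sum(n.num_addresses for n in nets)} addresses carved from {vpc}")
# → 6 subnets, 1536 addresses carved from 10.0.0.0/16
```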
The Terraform Registry contains thousands of community and official modules for common patterns: EKS clusters, RDS instances, VPCs, Lambda functions, and more. Using battle-tested registry modules for standard infrastructure patterns is almost always faster and more reliable than writing everything from scratch.
Terraform Cloud for Teams
Terraform Cloud (now rebranded as HCP Terraform under HashiCorp's product consolidation) is a managed platform that solves the operational problems of running Terraform in a team: remote state storage with locking, a collaborative run interface, policy as code via Sentinel, private module registry, and audit logging.
For small teams, Terraform Cloud's free tier is the easiest path to a production-safe workflow. You get remote state with locking, a shared run history, and VCS integration without managing any backend infrastructure. Larger teams with compliance requirements will want the Sentinel policy framework, which lets you write rules like "all S3 buckets must have encryption enabled" and enforce them before any plan can apply.
Self-Hosted Alternative: Atlantis
Atlantis is an open-source Terraform pull request automation tool that you host yourself. It runs terraform plan on every PR and posts the output as a comment, then runs terraform apply when you merge. Many teams use Atlantis instead of Terraform Cloud when they need to keep everything on-premise or want to avoid vendor lock-in. It integrates with GitHub, GitLab, and Bitbucket and is widely used at organizations that cannot use SaaS tooling for compliance reasons.
Terraform for AI Infrastructure
One of the most practically important use cases for Terraform in 2026 is provisioning AI and ML infrastructure: GPU-backed instances for training, managed model endpoints, vector databases, and the networking required to connect them. This infrastructure is expensive, configuration-sensitive, and exactly the kind of thing you want version-controlled and repeatable.
Provisioning GPU Instances on AWS
```hcl
# p3.2xlarge:   1x NVIDIA V100, 16GB GPU memory
# p4d.24xlarge: 8x NVIDIA A100, 320GB GPU memory (training at scale)
resource "aws_instance" "gpu_trainer" {
  ami           = data.aws_ami.deep_learning.id
  instance_type = "g5.xlarge" # NVIDIA A10G, cost-effective for fine-tuning

  subnet_id              = module.vpc.private_subnets[0]
  vpc_security_group_ids = [aws_security_group.gpu.id]
  iam_instance_profile   = aws_iam_instance_profile.gpu.name

  root_block_device {
    volume_size = 200
    volume_type = "gp3"
    encrypted   = true
  }

  tags = {
    Name     = "gpu-trainer-${var.run_id}"
    Purpose  = "model-training"
    AutoStop = "true" # Tag for automated shutdown Lambda
  }
}

# SageMaker endpoint for model inference
resource "aws_sagemaker_endpoint" "inference" {
  name                 = "llm-inference-${var.environment}"
  endpoint_config_name = aws_sagemaker_endpoint_configuration.llm.name
}
```
Terraform for AI Platform Infrastructure Patterns
- Model training clusters: Provisioning spot instance fleets for cost-effective training, with automatic checkpointing to S3 on interruption
- Inference endpoints: SageMaker endpoints, Azure ML endpoints, or Vertex AI endpoints with auto-scaling policies defined in code
- Vector databases: Provisioning Pinecone via Terraform provider, or managing self-hosted Qdrant or Weaviate on Kubernetes
- Feature stores: S3 + Glue catalogs for structured feature data, or Feast configurations managed as code
- MLflow tracking servers: RDS backend and S3 artifact store for experiment tracking at team scale
- Cost controls: Lambda functions and EventBridge rules to automatically stop idle GPU instances — critical for keeping AI infrastructure bills in check
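The auto-stop control in the last bullet boils down to one piece of selection logic: find running instances tagged AutoStop=true. Here is a sketch of that selection, written against the shape of EC2's DescribeInstances response; the Lambda wiring and the actual stop_instances call are omitted, and the sample payload is invented:

```python
def instances_to_stop(describe_response: dict) -> list[str]:
    """Return IDs of running instances tagged AutoStop=true."""
    ids = []
    for reservation in describe_response.get("Reservations", []):
        for inst in reservation.get("Instances", []):
            tags = {t["Key"]: t["Value"] for t in inst.get("Tags", [])}
            if inst["State"]["Name"] == "running" and tags.get("AutoStop") == "true":
                ids.append(inst["InstanceId"])
    return ids

# Invented sample payload mirroring the DescribeInstances response shape:
sample = {"Reservations": [{"Instances": [
    {"InstanceId": "i-aaa", "State": {"Name": "running"},
     "Tags": [{"Key": "AutoStop", "Value": "true"}]},
    {"InstanceId": "i-bbb", "State": {"Name": "running"},
     "Tags": [{"Key": "AutoStop", "Value": "false"}]},
]}]}
print(instances_to_stop(sample))  # → ['i-aaa']
```

In the real Lambda, the returned IDs would be passed to EC2's StopInstances API on a schedule (e.g., an EventBridge rule firing every 30 minutes).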
The GPU Cost Problem
A single p4d.24xlarge instance costs roughly $32/hour on demand. An 8xA100 training box forgotten over a weekend costs nearly $2,000; forgotten for a full week, over $5,000. Terraform helps because infrastructure-as-code makes it easy to include auto-shutdown mechanisms, enforce maximum instance lifetimes via IAM conditions, and audit exactly what is running and for how long. Teams that manage GPU infrastructure manually (clicking in the console) consistently pay more than those using IaC with enforced cost controls.
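The arithmetic is worth doing explicitly; the $32/hour figure is an approximate on-demand rate and varies by region:

```python
HOURLY_RATE = 32.0  # approximate p4d.24xlarge on-demand $/hour; region-dependent

weekend_hours = 60   # Friday 6pm through Monday 6am
week_hours = 24 * 7  # 168 hours

print(f"Forgotten over a weekend: ${HOURLY_RATE * weekend_hours:,.0f}")  # → $1,920
print(f"Forgotten for a week:     ${HOURLY_RATE * week_hours:,.0f}")     # → $5,376
```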
CI/CD with Terraform
The standard Terraform CI/CD workflow in 2026 is: run terraform plan on every pull request, post the plan output as a PR comment for review, then run terraform apply automatically on merge to the main branch. This ensures every infrastructure change is reviewed before it reaches production and creates a complete audit trail of what changed and when.
```yaml
name: Terraform

on:
  pull_request:
    branches: [main]
  push:
    branches: [main]

jobs:
  terraform:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.7.0"

      - name: Terraform Init
        run: terraform init
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}

      - name: Terraform Plan (on PR)
        if: github.event_name == 'pull_request'
        run: terraform plan -no-color -out=tfplan

      - name: Terraform Apply (on merge)
        if: github.ref == 'refs/heads/main' && github.event_name == 'push'
        run: terraform apply -auto-approve
```
Key practices for production Terraform CI/CD:
- Never use long-lived access keys in CI: Use OIDC authentication to assume an IAM role from GitHub Actions rather than storing static AWS credentials as secrets
- Separate plan and apply jobs: The plan job runs on every PR; the apply job runs only on merge to main, and should require the plan to have been reviewed and approved
- State locking in CI: Your remote backend must support locking so that simultaneous CI runs cannot corrupt state
- Post plan to PR: Use the github-script action to post the Terraform plan output as a PR comment so reviewers see exactly what will change
- Tag resources with commit SHA: Pass the Git commit SHA as a variable and tag resources with it for traceability
The HashiCorp Licensing Change and the OpenTofu Fork
In August 2023, HashiCorp announced that Terraform — which had been licensed under the Mozilla Public License (MPL 2.0), a true open-source license — would be relicensed under the Business Source License (BSL). The BSL is not an open-source license. It prohibits using Terraform to build products that compete with HashiCorp's commercial offerings.
The response from the community was immediate. Within weeks, a group of companies and individual contributors announced OpenTofu, a fork of Terraform under the MPL 2.0 license. OpenTofu is now governed by the Linux Foundation and actively maintained by a consortium of companies including Spacelift, env0, Scalr, and others.
Where OpenTofu Stands in 2026
OpenTofu 1.8 and beyond have diverged meaningfully from HashiCorp's Terraform in some areas, adding features like early variable evaluation and provider-defined functions that Terraform has not shipped. The two tools remain highly compatible for the vast majority of configurations — standard HCL, providers, modules, and state files work with both. The choice between them is primarily driven by organizational policy (BSL compliance requirements) and tooling ecosystem preference, not technical capability differences in most cases.
For most engineers learning IaC today, the practical answer is straightforward: the HCL you write works on both Terraform and OpenTofu. Learn the concepts and syntax — they transfer completely. If your employer or a government contract requires open-source licensing, OpenTofu is the choice. If you are in a commercial environment with no BSL restrictions and your organization is already on Terraform, there is no urgent reason to migrate.
"The fork did not fracture the community — it validated it. The fact that thousands of engineers could fork the project, find a foundation to host it, and ship production-quality releases within months is a testament to how mature the IaC ecosystem has become."
The longer-term competitive pressure from the OpenTofu fork has also influenced HashiCorp's behavior — the company has been more careful about licensing terms and more transparent about its roadmap since the fork. Competition, even within open-source governance, is healthy.
The bottom line: Terraform is the highest-ROI IaC skill in 2026 — learn HCL, use S3+DynamoDB remote state with locking from day one, organize infrastructure into reusable modules, and run terraform plan in CI before every merge. The OpenTofu fork is fully compatible with existing configs if BSL licensing is a concern. At minimum, every cloud engineer should be able to write a Terraform module that provisions a VPC, an RDS instance, and an ECS service — that combination covers 80% of what most teams deploy.
Frequently Asked Questions
Is Terraform still worth learning in 2026?
Yes — Terraform remains the most widely adopted Infrastructure as Code tool in 2026, used by the majority of cloud engineering teams across AWS, Azure, and GCP. Despite competition from Pulumi and CDK, Terraform's declarative HCL syntax, massive provider ecosystem (over 3,000 providers), and deep community knowledge base make it the safest and most valuable IaC skill to invest in. The OpenTofu fork has also resolved most concerns about the licensing change, giving teams a fully open-source alternative that is compatible with existing Terraform configurations.
What is the difference between Terraform and OpenTofu?
OpenTofu is a community-maintained, open-source fork of Terraform created in response to HashiCorp's 2023 decision to change Terraform's license from the Mozilla Public License (MPL) to the Business Source License (BSL). OpenTofu is governed by the Linux Foundation and is fully compatible with existing Terraform HCL configurations, state files, and providers for the vast majority of use cases. The choice between them is primarily a matter of organizational policy: teams that cannot use BSL-licensed software — including many government agencies and open-source projects — use OpenTofu. Teams without that constraint often continue with HashiCorp's Terraform.
Should I use Terraform or Pulumi in 2026?
For most cloud engineers, Terraform is the better starting point in 2026 due to its dominant market share, more comprehensive job demand, and the fact that HCL is easier to read and review than infrastructure written in a general-purpose language. Pulumi is the stronger choice when your team has deep TypeScript or Python expertise and needs complex conditional logic, loops, or dynamic resource generation that is cumbersome in HCL. Pulumi's programming language integration is genuinely more powerful for advanced use cases — but the majority of infrastructure does not need that complexity, and Terraform handles it with less overhead and more readability in code review.
How do I manage Terraform state safely in a team environment?
Managing Terraform state safely in a team requires a remote backend with state locking. The most common approach on AWS is an S3 bucket for state storage with DynamoDB for distributed locking — this prevents two engineers from running terraform apply simultaneously and corrupting the state file. On Azure, Azure Blob Storage with native locking works equivalently. Terraform Cloud and HCP Terraform provide managed state storage with locking, access controls, and run history out of the box. The critical rule is: never store terraform.tfstate locally in a team environment, and never commit it to version control. State files contain sensitive credentials and resource mappings that, if corrupted, can cause Terraform to recreate existing resources.