
Working with External Data in Terraform


Introduction

In real-world infrastructure deployments, it's common to rely on data that exists outside of Terraform's state. Whether you need to reference existing cloud resources, read configuration values from files, or fetch dynamic input from scripts, Terraform provides powerful features to integrate with external sources of data. This capability is essential for making your infrastructure-as-code setup flexible, reusable, and adaptable to different environments.

In this article, we explore several ways to work with external data in Terraform, including:

  • Using the data block to reference existing resources in cloud environments (e.g., pre-created networks, machine images, or service accounts)
  • Reading structured configuration from external files using functions like file(), jsondecode(), and yamldecode()
  • Calling external scripts (e.g., Python or shell) to fetch input data at runtime via the external data source

These features help you decouple infrastructure definitions from hardcoded values, reuse components across multiple environments, and integrate with systems that already manage parts of your infrastructure.

In the sections that follow, we'll walk through some practical examples to demonstrate how you can retrieve and use external data in your Terraform configurations.


Using data Blocks to Access Existing Resources

What Is a data Block?

Terraform's data blocks allow you to reference existing infrastructure that was created outside of your current Terraform project, whether manually, by other teams, or by other automation tools. These data sources are read-only and do not create or modify any infrastructure. Instead, they retrieve information about external resources (such as names, IDs, URLs, or configuration metadata), which you can then use within your Terraform configuration.

This is particularly useful when working in environments where resources are managed by multiple teams, or where certain infrastructure components (like networks, firewall rules, or service accounts) are provisioned outside Terraform or by other modules.


data Block Syntax and Usage

Terraform's data blocks have the following structure:

data "<resource_type>" "<name>"{ # resource lookup or reference: name = "existing_resource_name" ... }

Where:

  • data "<resource_type>" "<name>" defines an existing resource (data source) to query.
  • Inside the block, you provide lookup parameters such as a resource name, ID, or other identifying properties. The actual parameters used inside the data block vary by resource type.
  • Once defined, you can reference the retrieved data anywhere in your configuration as data.<resource_type>.<name>.<attribute>.


Example: Referencing an Existing GCS Bucket

This example illustrates how to use a data block to retrieve information about an existing Google Cloud Storage (GCS) bucket:

# main.tf

data "google_storage_bucket" "logs" {
  name = "my-centralized-logs"
}

resource "google_storage_bucket_object" "log_file" {
  name    = "app-log.txt"
  content = "Example log content"
  bucket  = data.google_storage_bucket.logs.name
}

How It Works:

The data "google_storage_bucket" "logs" block utilizes the google_storage_bucket data source to query a pre-existing GCS bucket named my-centralized-logs.

The google_storage_bucket_object resource then creates and uploads a file (app-log.txt) to the storage bucket with the name returned by the data block.

The data.google_storage_bucket.logs.name expression references the name attribute of the existing bucket.

While this example is useful as a demonstration of how a data block works, it has limited practical value. You can achieve the same result by simply providing the bucket name directly in the google_storage_bucket_object resource block, without using a data block.

Below we provide a more practical example that checks whether the bucket exists before attempting to upload an object.


Example: Checking if a Bucket Exists

In this example we use the google_storage_buckets data source (which retrieves all GCS buckets in a project) plus a bit of Terraform logic to check for the bucket's existence before trying to upload an object.

# main.tf

# Variables
variable "project" {
  type        = string
  description = "GCP project to look for buckets in"
}

variable "bucket_name" {
  type        = string
  description = "Name of the bucket to upload into"
}

# Get all buckets in the project
data "google_storage_buckets" "all" {
  project = var.project
  # prefix = "my"  # Update to filter buckets by name
}

# Determine if our target bucket exists
locals {
  bucket_exists = contains(data.google_storage_buckets.all.buckets[*].name, var.bucket_name)
}

# Conditionally upload only if bucket_exists is true
resource "google_storage_bucket_object" "log_file" {
  count   = local.bucket_exists ? 1 : 0
  bucket  = var.bucket_name
  name    = "app-log.txt"
  content = "Example log content"
}

How It Works:

Get bucket list: The "google_storage_buckets" "all" data source retrieves a list of storage buckets in the given project. Optionally, you can use the prefix attribute to filter buckets by name.

Get bucket names: data.google_storage_buckets.all.buckets[*].name returns a list of bucket names.

Check the bucket's existence: We use the Terraform built-in function contains() to set local.bucket_exists to true if our target bucket is in that list.

Conditional upload: The google_storage_bucket_object.log_file resource has a count = 1 only when bucket_exists is true, so Terraform will only attempt the upload if the bucket is really there. Terraform completely ignores the google_storage_bucket_object resource block if count = 0.

⚠️ If you are trying this example in your GCP project, do not forget to delete the test bucket afterwards to avoid unexpected charges.


Common Use Cases for data Blocks

  • Referencing shared or manually created infrastructure - e.g., using a centrally managed or shared VPC, service accounts, or Cloud Storage buckets.
  • Getting default or current values from a cloud provider - e.g., retrieving the latest base image for a VM or the current region settings (see the sketch after this list).
  • Accessing metadata or IDs required for dependent resources - e.g., getting the subnet ID to attach a VM.
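
To illustrate the "latest base image" use case, here is a minimal sketch using the google_compute_image data source; the image family and instance settings are assumptions for the example:

# A sketch: look up the newest image in the debian-12 family
data "google_compute_image" "debian" {
  family  = "debian-12"
  project = "debian-cloud"
}

resource "google_compute_instance" "vm" {
  name         = "vm-from-latest-image"
  machine_type = "e2-micro"
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      # Always resolves to the most recent image in the family
      image = data.google_compute_image.debian.self_link
    }
  }

  network_interface {
    network = "default"
  }
}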

By using data blocks, you can build Terraform configurations that are modular, environment-agnostic, and compatible with centrally managed infrastructure. This promotes better collaboration and reuse across teams and projects.


Comparing data Blocks and Terraform import

When working with infrastructure that exists outside of your Terraform project, you typically have two main options to integrate it: using a data block or using Terraform import. While both allow you to reference external resources, they serve very different purposes.

data Block:

  • A data block allows you to read information about an existing resource without taking ownership of its lifecycle.
  • It is declarative - you define the type of resource you want to look up, provide search criteria (such as a name or ID), and Terraform queries the resource for its attributes.
  • Terraform does track the data retrieved from data blocks in the state file, but it does not manage the lifecycle of the underlying resources.
  • It is ideal for referencing shared or external infrastructure like existing networks, images, buckets, or service accounts.

Terraform import:

  • The Terraform import command brings an existing resource into your Terraform state, allowing Terraform to manage it moving forward.
  • You must already have a matching resource block defined in your configuration with the correct settings.
  • Once imported, Terraform assumes full responsibility for the resource's lifecycle, meaning future changes in configuration may modify or delete the resource.
  • It is best suited when you want to adopt and manage existing infrastructure under Terraform control (see the sketch after this list).
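
For example, here is a minimal sketch of the import workflow for the bucket from the earlier example (names are illustrative). First, declare a matching resource block:

resource "google_storage_bucket" "logs" {
  name     = "my-centralized-logs"
  location = "US"
}

Then run the import command so Terraform starts tracking the bucket in its state:

terraform import google_storage_bucket.logs my-centralized-logs

After the import, terraform plan shows any differences between the real bucket and the resource block, and future applies will modify the bucket.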

Summary Comparison Table:

Feature                       | data Block                             | Terraform import
------------------------------|----------------------------------------|----------------------------------------------------
Purpose                       | Read metadata from existing resources  | Manage existing resources with Terraform
Modifies infrastructure       | No                                     | Yes (after import)
Resource required in config   | Yes (data block)                       | Yes (resource block)
Adds to Terraform state       | Yes (read-only)                        | Yes (full resource management)
Manages resource lifecycle    | No                                     | Yes
Use case                      | Referencing shared/external resources  | Taking ownership of manually created infrastructure
Suitable for shared resources | Ideal                                  | Risky if multiple owners

Which Should You Use?

Use data blocks when:

  • You want to reference an external resource without modifying it.
  • The resource is owned or managed by another project, team, or module.
  • You're building reusable modules that depend on shared infrastructure.

Use Terraform import when:

  • You want to bring an existing resource under Terraform management.
  • You plan to modify or track the resource over time within your configuration.


Reading Data and Values from External Files

Terraform provides several ways to load external data into your configuration, enabling more flexible, reusable, and environment-specific infrastructure setups. Common approaches include reading raw file contents, decoding structured data formats like JSON or YAML, and using .tfvars.json files to supply variables.

This section covers:

  • Reading file contents using file()
  • Parsing structured data with jsondecode()
  • Parsing YAML files with yamldecode()
  • Supplying variables with .tfvars.json
  • Best practices for managing external data inputs


Reading File Contents with file()

The file() function reads the contents of a file and returns it as a string. This function is useful for reading text-based data such as templates, scripts, or configuration files.

Example:

# main.tf

locals {
  startup_script = file("${path.module}/startup.sh")
}

resource "google_compute_instance" "vm" {
  name         = "vm-with-script"
  machine_type = "e2-medium"
  zone         = "us-central1-a"

  metadata = {
    startup-script = local.startup_script
  }

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-11"
    }
  }

  network_interface {
    network = "default"
  }
}

In this example:

file("${path.module}/startup.sh") loads a startup script (startup.sh) form the module's directory (path.module) into a local variable (local.startup_script).

startup-script = local.startup_script injects the script into the VM's metadata.


Parsing JSON with jsondecode()

In this example we are going to use the jsondecode() function to parse a JSON string into a native Terraform structured type. jsondecode() is commonly paired with file() to read structured data from disk.

Input file (config.json):

{ "region": "us-central1", "vm_settings": { "machine_type": "e2-micro", "zone": "us-central1-a" } }

Terraform code:

# main.tf

locals {
  config = jsondecode(file("${path.module}/config.json"))
}

output "local_config" {
  value = local.config
}

resource "google_compute_instance" "vm" {
  name         = "my-vm"
  machine_type = local.config.vm_settings.machine_type
  zone         = local.config.vm_settings.zone

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-11"
    }
  }

  network_interface {
    network = "default"
  }
}

How It Works:

file("${path.module}/config.json") reads the contents of the file named config.json form the current module's directory.

config = jsondecode(...) parses the file's contents as JSON and assigns the resulting data structure to a local variable named config.

After Terraform processes the locals block, we could then reference this data elsewhere in the code like this: local.config.vm_settings.machine_type.


Importing YAML with yamldecode()

This example demonstrates how to load a YAML file with the help of the yamldecode() and file() functions.

Input file (config.yaml):

region: us-central1
vm_settings:
  machine_type: e2-micro
  zone: us-central1-a

Terraform code:

# main.tf

locals {
  config = yamldecode(file("${path.module}/config.yaml"))
}

output "local_config" {
  value = local.config
}

The code performs these actions:

  • It locates a file named config.yaml in the current module's directory.
  • file("${path.module}/config.yaml") reads the entire contents of that file into a string.
  • yamldecode(...) parses that string as YAML.
  • It assigns the resulting data structure to a local variable named config.

After Terraform processes the locals block, you could then reference this data elsewhere in your code like this: local.config.region.
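
A common reason to decode a file like this is to drive for_each from the decoded data. Here is a sketch with a hypothetical vms.yaml layout:

# vms.yaml (hypothetical):
#
#   vms:
#     web:
#       machine_type: e2-micro
#       zone: us-central1-a
#     db:
#       machine_type: e2-small
#       zone: us-central1-b

locals {
  vms = yamldecode(file("${path.module}/vms.yaml")).vms
}

resource "google_compute_instance" "vm" {
  for_each     = local.vms
  name         = each.key
  machine_type = each.value.machine_type
  zone         = each.value.zone

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-11"
    }
  }

  network_interface {
    network = "default"
  }
}

Adding a VM then becomes a small edit to the YAML file rather than a new resource block.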


Supplying Variable Values via .tfvars.json

Another way to pass input values to Terraform is to use variable definitions files with the .tfvars and .tfvars.json extensions. Terraform automatically loads all files with the following names found in the working directory: terraform.tfvars, terraform.tfvars.json, *.auto.tfvars, and *.auto.tfvars.json. In addition, you can specify variable definitions files using the -var-file CLI parameter.

Compared to decoding configuration files with jsondecode() or yamldecode(), .tfvars files provide a simpler approach when the data structure is fixed and maps directly to input variables.

Input file (myvars.tfvars.json):

{ "region": "us-central1", "bucket_name": "shared-logs-bucket", "vm_settings": { "machine_type": "e2-micro", "zone": "us-central1-a" } }

Terraform code:

# main.tf

variable "region" {}
variable "bucket_name" {}
variable "vm_settings" {}

output "input_vars" {
  value = {
    region = var.region
    bucket = var.bucket_name
    params = var.vm_settings
  }
}

Usage in CLI:
terraform apply -var-file="myvars.tfvars.json"

This approach eliminates the need to load and transform data with file() and jsondecode(), but it requires variable declarations in the configuration code.


Best Practices for External File Usage

  • Use file() and jsondecode() / yamldecode() when you need to dynamically load structured data that's not easily mapped to variables.
  • Use .tfvars / .tfvars.json files when values can be passed cleanly into declared variables.
  • Keep configuration files small and well-structured; document their format clearly.
  • Avoid placing secrets or credentials in plain-text files; use secure variable injection instead (e.g., environment variables, secrets manager).
  • Validate the structure and contents of your external files to avoid runtime errors (see the sketch after this list).
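
As one way to implement the validation point above, here is a minimal sketch that uses the can() function inside an output precondition to fail the plan with a clear message when a required key is missing; the file name and key path follow the earlier jsondecode() example:

locals {
  config = jsondecode(file("${path.module}/config.json"))
}

output "machine_type" {
  # try() prevents a hard evaluation error if the key is absent
  value = try(local.config.vm_settings.machine_type, null)

  precondition {
    # can() returns false instead of raising an error when the
    # key path does not exist in the decoded data
    condition     = can(local.config.vm_settings.machine_type)
    error_message = "config.json must define vm_settings.machine_type."
  }
}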

Reading from external sources with file(), jsondecode(), or using .tfvars files adds powerful flexibility to your Terraform configurations. These tools allow you to separate logic from data, reuse templates across environments, and reduce duplication. Choose the approach that best fits your use case and keep your infrastructure code both maintainable and scalable.


Using the external Data Source

The external data source is a variation of the Terraform data block that retrieves data by running a local program rather than querying a cloud provider API. It supports external programs or scripts, such as shell scripts, Python programs, or any executables that produce JSON-formatted output. In terms of structure and usage, external behaves like any other Terraform data source. It provides a powerful mechanism for integrating with systems that Terraform does not natively support, or for computing input values dynamically at runtime.

Here is how the external data source works:

  • Terraform executes a local script or binary identified by the program attribute of the data "external" block
  • Terraform passes the contents of the query argument to the program as a JSON object on standard input (stdin)
  • The script/program reads that input and writes a valid JSON object (with string values only) to standard output (stdout)
  • Terraform parses the program's output and makes the received values available via the reference data.external.<name>.result
  • Terraform re-runs the script/program every time it performs a plan or apply operation
  • Any executable binary or script (including Bash, Python, Go, etc.) can be used, as long as it can be executed from the command line and meets Terraform's requirements for input and output

Use the external data source when:

  • You need to integrate with an external system (e.g., third-party APIs, CMDB, configuration tools, credential stores, CI/CD pipelines, etc.)
  • You want to generate data dynamically via scripts
  • You need runtime access to data not available via Terraform providers or built-in functions
  • You need to use a script or program to implement custom logic or complex operations


Example: Calling a Python Script

This example illustrates how to use an external data block to call a Python script from Terraform code.

Python script (get_config.py):

#!/usr/bin/env python3

import json
import sys

# Read input from stdin
input_data = json.load(sys.stdin)

# Generate default value if needed
env = input_data.get("env", "dev")

# Output JSON
output = {
    "bucket_name": f"{env}-app-logs",
    "region": "us-central1"
}

print(json.dumps(output))

Make sure this script is executable:
chmod +x get_config.py

Terraform code:

# main.tf data "external" "app_config" { program = ["${path.module}/get_config.py"] query = { env = "prod" } } output "bucket_name" { value = data.external.app_config.result["bucket_name"] } output "region" { value = data.external.app_config.result["region"] }

What this does:

  • Terraform calls the Python script with env = "prod" passed in as input.
  • The script returns a JSON object with a region and bucket name.
  • The returned values are available in data.external.app_config.result.

Here is a more detailed analysis of the Python script.

The script receives a JSON object via standard input, uses a value from that input to build a new JSON object, and then prints the new object to standard output.

  • #!/usr/bin/env python3: This is a "shebang" line. It tells the operating system to execute this script using the python3 interpreter found in the system's PATH
  • import json and import sys: These lines import necessary standard libraries. json is for encoding and decoding JSON data, and sys is for accessing system-specific parameters, specifically sys.stdin (standard input)
  • input_data = json.load(sys.stdin): This is the input mechanism
    • sys.stdin is a file-like object representing the data piped to the script
    • json.load() reads from this stream and parses the text as JSON, converting it into a Python dictionary
  • env = input_data.get("env", "dev"): This line extracts data from the input
    • It looks for the key "env" in the input_data dictionary
    • .get() is a way to access a key. If "env" is not found, it returns the default value "dev" instead of raising an error
  • output = { ... }: A new Python dictionary is created. It uses an f-string f"{env}-app-logs" to dynamically construct a bucket name based on the env variable
  • print(json.dumps(output)): This is the output mechanism
    • json.dumps() takes the Python output dictionary and serializes it into a JSON-formatted string
    • print() writes this JSON string to standard output, where the calling program can read it


Example: Invoking a Shell Script

The following example functions similarly to the previous one, but uses a shell script.

Shell script (get_config.sh):

#!/bin/bash

# Exit if any of the script steps fail
set -e

# Extract the "env" argument from the input into
# the ENV shell variable.
eval "$(jq -r '@sh "ENV=\(.env)"')"

# Data-generating logic
REGION="us-central1"
if [ $ENV == "prod" ]
then
  REGION="us-east1"
fi

# Produce a JSON object containing the result
jq -n --arg region "$REGION" '{"region":$region}'

Terraform code:

# main.tf variable "env" { type = string default = "dev" } data "external" "config_data" { program = ["${path.module}/get_config.sh"] query = { env = var.env } } output "environment" { value = var.env } output "region" { value = data.external.config_data.result["region"] }

What this does:

  • Terraform calls the shell script with env = var.env (which defaults to "dev") passed in as input.
  • The shell script receives a JSON object via standard input, extracts the env value from that input, and sets a REGION variable based on whether the env is "prod". Finally, it outputs a JSON object containing the determined region.
  • The returned value is available in data.external.config_data.result.

Let's analyze the shell script step-by-step:

  • #!/bin/bash: Specifies that the script should be executed using bash
  • set -e: Ensures that the script will exit immediately if any command fails
  • eval "$(jq -r '@sh "ENV=\(.env)"')": This line extracts the value of the env field from the input JSON and assigns it to the ENV shell variable. It uses jq to transform the JSON into a shell assignment
    • jq -r '@sh "ENV=\(.env)"': This part uses jq to extract the value associated with the key env from the JSON input. The @sh operator formats the output as a shell assignment (ENV="value"), and -r ensures that raw string output is produced (without extra quotes or escaping)
    • eval "...": The eval command executes the string generated by jq as a shell command. This effectively sets the ENV variable in the script's environment
  • REGION="us-central1": Sets a default value for the REGION variable
  • if [ $ENV == "prod" ] then REGION="us-east1" fi: Checks if the ENV variable is equal to "prod". If it is, the REGION variable is updated to "us-east1"
  • jq -n --arg region "$REGION" '{"region":$region}': This line uses jq to create a JSON object with a single field named "region". The value of this field is taken from the $REGION shell variable
    • -n: Tells jq not to read any input
    • --arg region "$REGION": Passes the value of the $REGION shell variable to jq as a variable named region
    • '{"region":$region}': This is the jq filter that constructs the JSON object. It creates an object with a "region" key, and its value is the value of the region variable passed in via --arg


external Best Practices & Security Considerations

  • Validate external data sources carefully - Terraform does not verify the script's logic.
  • Avoid non-deterministic behavior (e.g., random values, changing responses) unless intended.
  • Don't use external scripts for secrets or credentials unless handled securely (e.g., via environment variables or secret stores).
  • Be aware of performance impacts - Terraform must wait for the script to execute during each plan or apply.
  • Document expected input/output structure.
  • Use path.module to ensure script paths are relative to your module.
  • Handle input validation and error reporting within your script (see the sketch after this list).
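
To illustrate the last point, here is a minimal sketch of in-script validation for an external program, following the pattern of the earlier get_config.py example; the key names and allowed values are assumptions. When the program exits with a non-zero status, Terraform treats the data source as failed and surfaces the stderr output in its error message:

#!/usr/bin/env python3

import json
import sys

# Reject malformed input early with a readable message on stderr
try:
    input_data = json.load(sys.stdin)
except json.JSONDecodeError as exc:
    print(f"invalid JSON on stdin: {exc}", file=sys.stderr)
    sys.exit(1)

# Validate the expected query parameters (hypothetical allowed values)
env = input_data.get("env")
if env not in ("dev", "staging", "prod"):
    print(f"unsupported env value: {env!r}", file=sys.stderr)
    sys.exit(1)

# All values in the result object must be strings
print(json.dumps({"bucket_name": f"{env}-app-logs"}))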


Conclusion

Managing infrastructure effectively often requires accessing data that exists outside your Terraform configuration - whether it's cloud resources, structured files, or dynamic runtime inputs. Terraform offers several powerful tools to handle these use cases in a modular, repeatable, and maintainable way:

  • data blocks allow you to reference and retrieve information from existing cloud resources. These resources are managed by external processes or code, but their attributes can be used as inputs for your infrastructure configuration.
  • External files can be loaded using functions like file(), jsondecode(), and yamldecode() to bring in data from JSON or YAML files, enabling dynamic and environment-specific configuration. Alternatively, .tfvars.json files offer a simpler, built-in way to supply variable values.
  • The external data source enables integration with custom scripts and external systems, returning structured data to Terraform by executing programs that supply JSON-formatted data through standard input/output.
  • Terraform import allows you to bring existing infrastructure under Terraform management, adding it to the state file so that future changes are tracked and applied. Unlike data blocks, imported resources are fully managed by Terraform after import.

By choosing the right method for the right task, you can build infrastructure code that is clean, flexible, and adaptable to real-world environments, whether you're referencing shared cloud components, reusing structured config files, or pulling in dynamic data from other systems.
