In real-world infrastructure deployments, it's common to rely on data that exists outside of Terraform's state. Whether you need to reference existing cloud resources, read configuration values from files, or fetch dynamic input from scripts, Terraform provides powerful features to integrate with external sources of data. This capability is essential for making your infrastructure-as-code setup flexible, reusable, and adaptable to different environments.
In this article, we explore several ways to work with external data in Terraform, including:
- Using the data block to reference existing resources in cloud environments (e.g., pre-created networks, machine images, or service accounts)
- Loading data from files with file(), jsondecode(), and yamldecode()
- Running local programs with the external data source

These features help you decouple infrastructure definitions from hardcoded values, reuse components across multiple environments, and integrate with systems that already manage parts of your infrastructure.
In the sections that follow, we'll walk through some practical examples to demonstrate how you can retrieve and use external data in your Terraform configurations.
Using data Blocks to Access Existing Resources

What Is a data Block?
Terraform's data
blocks allow you to reference existing infrastructure that was created outside of your
current Terraform project, whether manually, by other teams, or by other automation tools. These data sources
are read-only and do not create or modify any infrastructure. Instead, they retrieve information about
external resources (such as names, IDs, URLs, or configuration metadata), which you can then use within your
Terraform configuration.
This is particularly useful when working in environments where resources are managed by multiple teams, or where certain infrastructure components (like networks, firewall rules, or service accounts) are provisioned outside Terraform or by other modules.
data Block Syntax and Usage

Terraform's data blocks have the following structure:
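```hcl
data "<resource_type>" "<name>" {
  # Query arguments vary by data source
  # (for example: name, project, or filters)
}
```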
Where:
- data "<resource_type>" "<name>" defines an existing resource (data source) to query.
- The arguments inside the data block vary by resource type.
- Retrieved attributes can be referenced as data.<resource_type>.<name>.<attribute> anywhere in your configuration.
This example illustrates how to use a data block to retrieve information about an existing Google Cloud Storage (GCS) bucket:
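A minimal sketch along these lines (the local file path for the uploaded object is illustrative):

```hcl
# Look up an existing GCS bucket created outside this configuration
data "google_storage_bucket" "logs" {
  name = "my-centralized-logs"
}

# Upload a log file into the bucket returned by the data block
resource "google_storage_bucket_object" "log_file" {
  name   = "app-log.txt"
  bucket = data.google_storage_bucket.logs.name
  source = "${path.module}/app-log.txt"
}
```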
How It Works:
- The data "google_storage_bucket" "logs" block utilizes the google_storage_bucket data source to query a pre-existing GCS bucket named my-centralized-logs.
- The google_storage_bucket_object resource then creates and uploads a file (app-log.txt) to the storage bucket with the name returned by the data block.
- The data.google_storage_bucket.logs.name expression references the name attribute of the existing bucket.
While this example is useful as a demonstration of how a data
block works, it has limited practical value.
You can achieve the same result by simply providing the bucket name directly in the google_storage_bucket_object
resource block, without using a data
block.
Below we provide a more practical example that checks whether the bucket exists before attempting to upload an object.
In this example we use the google_storage_buckets data source (which retrieves all GCS buckets in a project) plus a bit of Terraform logic to check for the bucket's existence before trying to upload an object.
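Here is one way this might look (the var.project_id reference and bucket name are illustrative):

```hcl
data "google_storage_buckets" "all" {
  project = var.project_id # illustrative project reference
}

locals {
  target_bucket = "my-centralized-logs"

  # true if the target bucket appears in the project's bucket list
  bucket_exists = contains(
    data.google_storage_buckets.all.buckets[*].name,
    local.target_bucket,
  )
}

resource "google_storage_bucket_object" "log_file" {
  # Create the object only when the bucket actually exists
  count  = local.bucket_exists ? 1 : 0
  name   = "app-log.txt"
  bucket = local.target_bucket
  source = "${path.module}/app-log.txt"
}
```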
How It Works:
- Get bucket list: The "google_storage_buckets" "all" data source retrieves a list of storage buckets in the given project. Optionally, you can use the prefix attribute to filter buckets by name.
- Get bucket names: data.google_storage_buckets.all.buckets[*].name returns a list of bucket names.
- Check the bucket's existence: We use the Terraform built-in function contains() to set local.bucket_exists to true if our target bucket is in that list.
- Conditional upload: The google_storage_bucket_object.log_file resource has count = 1 only when bucket_exists is true, so Terraform will only attempt the upload if the bucket is really there. Terraform completely ignores the google_storage_bucket_object resource block if count = 0.
⚠️ If you are trying this example in your GCP project, do not forget to delete the test bucket afterwards to avoid unexpected charges.
Benefits of Using data Blocks

By using data blocks, you can build Terraform configurations that are modular, environment-agnostic, and compatible with centrally managed infrastructure. This promotes better collaboration and reuse across teams and projects.
data Blocks and Terraform import

When working with infrastructure that exists outside of your Terraform project, you typically have two main options to integrate it: using a data block or using Terraform import. While both allow you to reference external resources, they serve very different purposes.

data Block:
- A data block allows you to read information about an existing resource without taking ownership of its lifecycle.
- Terraform records the data returned by data blocks in the state file, but it does not manage the lifecycle of the underlying resources.

Terraform import:
- Terraform import brings an existing resource into the state file so that Terraform manages it from that point on.
- After import, the resource must be described by a matching resource block, and Terraform controls its full lifecycle, including updates and destruction.
Summary Comparison Table:
Feature | data Block | Terraform import |
---|---|---|
Purpose | Read metadata from existing resources | Manage existing resources with Terraform |
Modifies infrastructure | No | Yes (after import) |
Resource required in config | Yes (data block) | Yes (resource block) |
Adds to Terraform state | Yes (read-only) | Yes (full resource management) |
Manages resource lifecycle | No | Yes |
Use case | Referencing shared/external resources | Taking ownership of manually created infrastructure |
Suitable for shared resources | Ideal | Risky if multiple owners |
Which Should You Use?

Use data blocks when:
- You only need to read attributes of resources that are owned and managed elsewhere.
- The resources are shared across teams and must not be modified or destroyed by your configuration.

Use Terraform import when:
- You want Terraform to take full ownership of manually created infrastructure.
- You are migrating existing resources into your Terraform-managed codebase.
Loading Data from Files

Terraform provides several ways to load external data into your configuration, enabling more flexible, reusable,
and environment-specific infrastructure setups. Common approaches include reading raw file contents, decoding
structured data formats like JSON or YAML, and using .tfvars.json
files to supply variables.
This section covers:
- Reading raw file contents with file()
- Decoding JSON files with jsondecode()
- Decoding YAML files with yamldecode()
- Supplying variable values with .tfvars.json files
Reading Files with file()
The file()
function reads the contents of a file and returns it as a string. This function is
useful for reading text-based data such as templates, scripts, or configuration files.
Example:
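For instance, a sketch of injecting a startup script into a GCP VM (the instance attributes are illustrative):

```hcl
locals {
  # Read the startup script from the module's directory
  startup_script = file("${path.module}/startup.sh")
}

resource "google_compute_instance" "app_vm" {
  name         = "app-vm"        # illustrative
  machine_type = "e2-medium"     # illustrative
  zone         = "us-central1-a" # illustrative

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-12"
    }
  }

  network_interface {
    network = "default"
  }

  metadata = {
    # Inject the script into the VM's metadata
    startup-script = local.startup_script
  }
}
```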
In this example:
- file("${path.module}/startup.sh") loads a startup script (startup.sh) from the module's directory (path.module) into a local variable (local.startup_script).
- startup-script = local.startup_script injects the script into the VM's metadata.
Parsing JSON with jsondecode()

In this example we are going to use the jsondecode() function to parse a JSON string into a native Terraform structured type. jsondecode() is commonly paired with file() to read structured data from disk.
Input file (config.json):
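A sample file consistent with the references below (the zone field is illustrative):

```json
{
  "vm_settings": {
    "machine_type": "e2-medium",
    "zone": "us-central1-a"
  }
}
```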
Terraform code:
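A minimal sketch:

```hcl
locals {
  # Read config.json and parse it into a Terraform object
  config = jsondecode(file("${path.module}/config.json"))
}
```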
How It Works:
- file("${path.module}/config.json") reads the contents of the file named config.json from the current module's directory.
- config = jsondecode(...) parses the file's contents as JSON and assigns the resulting data structure to a local variable named config.
- After Terraform processes the locals block, we could then reference this data elsewhere in the code like this: local.config.vm_settings.machine_type.
Parsing YAML with yamldecode()

This example demonstrates how to load a YAML file with the help of the yamldecode() and file() functions.
Input file (config.yaml):
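A sample file consistent with the reference below (the environment field is illustrative):

```yaml
region: us-central1
environment: dev
```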
Terraform code:
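A minimal sketch:

```hcl
locals {
  # Read config.yaml and parse it into a Terraform object
  config = yamldecode(file("${path.module}/config.yaml"))
}
```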
The code performs these actions:
- Terraform looks for the file config.yaml in the current module's directory.
- file("${path.module}/config.yaml") reads the entire contents of that file into a string.
- yamldecode(...) parses that string as YAML.
- The result is assigned to a local variable named config.
After Terraform processes the locals block, you could then reference this data elsewhere in your code
like this: local.config.region
.
Supplying Variables with .tfvars.json Files
Another way to pass input values to Terraform is to use variable definitions files with the .tfvars and .tfvars.json extensions. Terraform automatically loads all files with the following names found in the working directory: terraform.tfvars, terraform.tfvars.json, *.auto.tfvars, and *.auto.tfvars.json.

In addition, you can specify variable definitions files using the -var-file CLI parameter.
Compared to decoding configuration files with jsondecode() or yamldecode(), .tfvars files provide a simpler approach when the data structure is fixed and maps directly to input variables.
Input file (myvars.tfvars.json):
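An illustrative variables file (the variable names here are assumptions):

```json
{
  "region": "us-central1",
  "environment": "dev"
}
```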
Terraform code:
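The matching declarations might look like this (names mirror the sample file above):

```hcl
variable "region" {
  type = string
}

variable "environment" {
  type = string
}
```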
Usage in CLI:
terraform apply -var-file="myvars.tfvars.json"
This approach eliminates the need for data loading and transformation with file() and jsondecode(), but requires variable declarations in the configuration code.
In general:
- Use file() with jsondecode() / yamldecode() when you need to dynamically load structured data that's not easily mapped to variables.
- Use .tfvars / .tfvars.json files when values can be passed cleanly into declared variables.
Reading from external sources with file() and jsondecode(), or using .tfvars files, adds powerful flexibility to your Terraform configurations. These tools allow you to separate logic from data, reuse templates across environments, and reduce duplication. Choose the approach that best fits your use case and keep your infrastructure code both maintainable and scalable.
The external Data Source
The external data source is a variation of the Terraform data block structure that retrieves data by running a local program rather than querying a cloud provider API. It supports external programs or scripts, such as shell scripts, Python programs, or any executables that produce JSON-formatted output. In terms of structure and usage, external behaves like any other Terraform data source. It provides a powerful mechanism for integration with systems that Terraform does not natively support, or for computing input values dynamically at runtime.
Here is how the external data source works:
- Terraform executes the local program specified in the program attribute of the data "external" block.
- Terraform passes a JSON object to the program via standard input.
- The program must print a single JSON object to standard output, which Terraform parses into the result object.
- You can reference the returned values anywhere in your configuration via data.external.<name>.result.
Use the external data source when:
- You need to integrate with systems or tools that Terraform does not natively support.
- You need to compute input values dynamically at runtime.
This example illustrates how to use the external data source to call a Python script from Terraform code.
Python script (get_config.py):
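A sketch consistent with the step-by-step analysis below (the bucket_name key is an assumption):

```python
#!/usr/bin/env python3
import json
import sys

# Read the JSON object Terraform passes on standard input
input_data = json.load(sys.stdin)

# Extract "env" from the input, defaulting to "dev" if absent
env = input_data.get("env", "dev")

# Build the result; the external data source expects string values
output = {
    "bucket_name": f"{env}-app-logs",  # key name is illustrative
}

# Print the JSON result to standard output for Terraform to read
print(json.dumps(output))
```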
Make sure this script is executable:
chmod +x get_config.py
Terraform code:
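The configuration might look like this (the data source name app_config matches the references below; the output block is illustrative):

```hcl
data "external" "app_config" {
  # Run the script with the Python interpreter
  program = ["python3", "${path.module}/get_config.py"]

  # Key/value pairs passed to the script as JSON on stdin
  query = {
    env = "prod"
  }
}

output "bucket_name" {
  value = data.external.app_config.result.bucket_name # illustrative key
}
```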
What this does:
- Terraform runs the Python script with env = "prod" passed in as input.
- The script returns a JSON object, which becomes available under data.external.app_config.result.

Here is a more detailed Python script analysis.
The script receives a JSON object via standard input, uses a value from that input to build a new JSON object, and then prints the new object to standard output.
- #!/usr/bin/env python3: This is a "shebang" line. It tells the operating system to execute this script using the python3 interpreter found in the system's PATH.
- import json and import sys: These lines import necessary standard libraries. json is for encoding and decoding JSON data, and sys is for accessing system-specific parameters, specifically sys.stdin (standard input).
- input_data = json.load(sys.stdin): This is the input mechanism. sys.stdin is a file-like object representing the data piped to the script. json.load() reads from this stream and parses the text as JSON, converting it into a Python dictionary.
- env = input_data.get("env", "dev"): This line extracts data from the input. .get() is a way to access the key "env" in the input_data dictionary. If "env" is not found, it returns the default value "dev" instead of raising an error.
- output = { ... }: A new Python dictionary is created. It uses an f-string f"{env}-app-logs" to dynamically construct a bucket name based on the env variable.
- print(json.dumps(output)): This is the output mechanism. json.dumps() takes the Python output dictionary and serializes it into a JSON-formatted string, and print() writes this JSON string to standard output, where the calling program can read it.

The following example functions similarly to the previous one, but uses a shell script.
Shell script (get_config.sh):
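A sketch consistent with the step-by-step analysis below:

```bash
#!/bin/bash
set -e

# Extract the "env" field from the JSON on stdin into the ENV variable
eval "$(jq -r '@sh "ENV=\(.env)"')"

# Default region, overridden for production environments
REGION="us-central1"
if [ "$ENV" == "prod" ]; then
  REGION="us-east1"
fi

# Emit a JSON object for Terraform to consume
jq -n --arg region "$REGION" '{"region":$region}'
```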
Terraform code:
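And the corresponding configuration (the data source name config_data matches the references below; the output block is illustrative):

```hcl
data "external" "config_data" {
  program = ["bash", "${path.module}/get_config.sh"]

  query = {
    env = "prod"
  }
}

output "region" {
  value = data.external.config_data.result.region
}
```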
What this does:
- Terraform runs the shell script with env = "prod" passed in as input.
- The script reads the env value from that input and sets a REGION variable based on whether the env is "prod". Finally, it outputs a JSON object containing the determined region.
- The returned values become available under data.external.config_data.result.

Let's analyze the shell script step by step:
- #!/bin/bash: Specifies that the script should be executed using bash.
- set -e: Ensures that the script will exit immediately if any command fails.
- eval "$(jq -r '@sh "ENV=\(.env)"')": This line extracts the value of the env field from the input JSON and assigns it to the ENV shell variable. It uses jq to transform the JSON into a shell assignment:
  - jq -r '@sh "ENV=\(.env)"': This part uses jq to extract the value associated with the key env from the JSON input. The @sh operator formats the output as a shell assignment (ENV="value"), and -r ensures that raw string output is produced (without extra quotes or escaping).
  - eval "...": The eval command executes the string generated by jq as a shell command. This effectively sets the ENV variable in the script's environment.
- REGION="us-central1": Sets a default value for the REGION variable.
- if [ "$ENV" == "prod" ]; then REGION="us-east1"; fi: Checks if the ENV variable is equal to "prod". If it is, the REGION variable is updated to "us-east1".
- jq -n --arg region "$REGION" '{"region":$region}': This line uses jq to create a JSON object with a single field named "region". The value of this field is taken from the $REGION shell variable:
  - -n: Tells jq not to read any input.
  - --arg region "$REGION": Passes the value of the $REGION shell variable to jq as a variable named region.
  - '{"region":$region}': This is the jq filter that constructs the JSON object. It creates an object with a "region" key, and its value is the value of the region variable passed in via --arg.
external Best Practices & Security Considerations
- Use path.module to ensure script paths are relative to your module.

Conclusion

Managing infrastructure effectively often requires accessing data that exists outside your Terraform configuration - whether it's cloud resources, structured files, or dynamic runtime inputs. Terraform offers several powerful tools to handle these use cases in a modular, repeatable, and maintainable way:
- data blocks allow you to reference and retrieve information from existing cloud resources. These resources are managed by external processes or code, but their attributes can be used as inputs for your infrastructure configuration.
- You can combine file() with jsondecode() and yamldecode() to bring in data from JSON or YAML files, enabling dynamic and environment-specific configuration. Alternatively, .tfvars.json files offer a simpler, built-in way to supply variable values.
- The external data source enables integration with custom scripts and external systems, returning structured data to Terraform by executing programs that supply JSON-formatted data through standard input/output.

By choosing the right method for the right task, you can build infrastructure code that is clean, flexible, and adaptable to real-world environments, whether you're referencing shared cloud components, reusing structured config files, or pulling in dynamic data from other systems.
Do you want to learn more about Terraform? Please check our other tutorials:
- Handling Terraform State in Multi-Environment Deployments
- Understanding Terraform Variable Precedence
- Terraform Value Types Tutorial
- Terraform count Explained with Practical Examples
- Terraform for_each Tutorial with Practical Examples
- Exploring Terraform dynamic Blocks with GCP Examples
- Handling Sensitive and Ephemeral Data in Terraform
- Terraform Modules FAQ