WintelGuy.com

Terraform Provisioners: Usage, Limitations, and Practical Alternatives

Introduction to Provisioners

Terraform is designed as a declarative infrastructure-as-code tool, where users describe the desired end state of their infrastructure and Terraform determines how to reach it. In this model, Terraform excels at creating, updating, and destroying infrastructure resources in a predictable and repeatable way. For scenarios that require actions outside Terraform's normal resource management workflow, Terraform offers a mechanism known as a provisioner.

Provisioners allow Terraform to execute arbitrary commands or transfer files during the resource management operations. These actions can run locally on the machine executing Terraform or remotely on the provisioned resource itself. Typical examples include running shell commands, uploading configuration files, or performing one-time bootstrapping steps after a resource is created.

While provisioners can be powerful, they also introduce imperative behavior into an otherwise declarative system. Provisioner execution is tightly coupled to resource lifecycle events, yet the actions they perform are not fully represented in Terraform state. As a result, Terraform cannot reliably determine the outcome of these actions or safely reapply them in a predictable manner.

This article explores Terraform provisioners in detail - how they work, how to use them, and their limitations that often lead to operational complexity. It also highlights practical alternatives that align more closely with Terraform's declarative model, helping you make informed decisions about when provisioners are appropriate and when other approaches are a better fit.


Defining Provisioners

Provisioners are defined within a resource block and are integrated into that resource's lifecycle. Unlike most Terraform configuration elements, provisioners do not declare infrastructure resources themselves; instead, they describe actions that Terraform should perform when a resource is created or destroyed.

provisioner blocks are nested inside a resource definition:

resource "<TYPE>" "<LABEL>" {
  # ...

  connection {
    type = <"ssh" or "winrm">
    host = <EXPRESSION>
    <DEFAULT_CONNECTION_SETTINGS>
  }

  provisioner "<TYPE>" {              # One of: 'file', 'local-exec', 'remote-exec'
    <PROVISIONER-SPECIFIC_ARGUMENTS>  # Determined by the provisioner's type

    on_failure = <continue || fail>   # Action to take when provisioner fails

    connection {
      type = <"ssh" or "winrm">
      host = <EXPRESSION>
      <PROVISIONER-SPECIFIC_CONNECTION_SETTINGS>
    }
  }
}

Multiple provisioner blocks may be included within a single resource block. Provisioners are executed in the order they appear in the parent resource block.

Terraform provides three built-in provisioner types:

  • local-exec - Executes commands on the machine running Terraform.
  • remote-exec - Executes commands on the provisioned resource over a remote connection (SSH or WinRM).
  • file - Transfers files or content from the local machine to the remote resource.

Each provisioner type supports its own set of arguments, with connection and on_failure available to all three. The following table lists the supported arguments for each type:

  Argument      file   local-exec   remote-exec
  -----------   ----   ----------   -----------
  source         X
  content        X
  destination    X
  command                X
  working_dir            X
  interpreter            X
  environment            X
  when                   X
  quiet                  X
  inline                               X
  script                               X
  scripts                              X

In the sections that follow, we review the configuration and behavior of each provisioner type in more detail.


local-exec Provisioner

The local-exec provisioner executes commands on the machine running Terraform, rather than on the provisioned resource itself. It is the simplest provisioner type and does not require any remote connectivity, authentication, or target-side configuration. As a result, local-exec is the easiest to use and often the first provisioner users adopt.

resource "<TYPE>" "<LABEL>" {
  # ...

  provisioner "local-exec" {
    command     = "<COMMAND>"           # Command to run (required)
    working_dir = "<PATH>"              # Working directory
    interpreter = [                     # Command interpreter and a list of args
      "<PATH_TO_INTERPRETER>",
      "<COMMAND_ARGUMENT>"
    ]
    environment = {                     # Environment variables
      "<KEY>" = "<VALUE>"
    }
    when       = <create || destroy>    # When to run, defaults to 'create'
    quiet      = <true || false>        # Disable printing the command and arguments
    on_failure = <continue || fail>     # Action to take when provisioner fails
  }
}

The local-exec provisioner supports the following arguments:

  • command - The command to execute. This can be a single shell command or a script invocation.
  • working_dir - Sets the working directory from which the command is executed; defaults to the current working directory.
  • interpreter - Specifies the command interpreter to use (for example, ["bash", "-c"] or ["PowerShell", "-Command"]).
  • environment - A map of environment variables passed to the command at runtime.
  • when - Controls whether the command runs during resource creation (create, default) or destruction (destroy).
  • quiet - Suppresses printing of the command and its arguments in log output, which can be useful for masking sensitive values. Terraform still prints the output of the command.
  • on_failure - Specifies the action Terraform takes when the provisioner fails:
    • continue - Terraform ignores the error and continues the creation or destruction operation.
    • fail - Raises an error and stops applying the configuration. Terraform taints the resource when this value is set for a creation provisioner. This is the default.
  • connection - The local-exec provisioner ignores an included connection block.

Behavior and Execution Model

By default, a local-exec provisioner runs after Terraform determines that the parent resource has been created successfully, but before the apply operation is considered complete. Setting when = destroy configures local-exec to run before the parent resource is destroyed.

It is also important to note that local-exec is not automatically re-run during subsequent terraform apply operations, unless Terraform detects a change that requires the parent resource to be recreated. To force a local-exec re-run, initiate the replacement of the parent resource using the -replace CLI option.
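Beyond the one-off -replace option, a common pattern for re-running local-exec automatically is to watch a value with the triggers_replace argument of terraform_data. The sketch below assumes a hypothetical app_version variable; any watched expression works:

```hcl
variable "app_version" {      # illustrative input; any watched value works
  type    = string
  default = "1.0.0"
}

resource "terraform_data" "bootstrap" {
  # Any change to 'triggers_replace' forces this resource to be replaced,
  # which re-runs its creation-time provisioners.
  triggers_replace = [ var.app_version ]

  provisioner "local-exec" {
    command = "echo \"Bootstrapping version ${var.app_version}\""
  }
}
```

Changing app_version and re-applying replaces the terraform_data resource, so the provisioner runs again without any manual CLI flags.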

If a provisioner that runs during resource creation fails, Terraform, by default, marks the resource as tainted, indicating that it should be destroyed and recreated on the next apply. This behavior can be overridden by setting the on_failure attribute to continue, in which case Terraform ignores the provisioner error and continues with the remaining operations.

If a destroy-time provisioner (when = destroy) fails, Terraform returns an error and reruns the provisioner on the next terraform apply.

Note: Enabling the create_before_destroy lifecycle argument on the resource prevents destroy-time provisioners from running.
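A minimal destroy-time provisioner can be sketched as follows; the echo command stands in for real cleanup logic:

```hcl
resource "terraform_data" "cleanup_demo" {
  provisioner "local-exec" {
    when    = destroy   # runs before this resource is destroyed
    command = "echo 'Running cleanup before this resource is destroyed'"
  }
}
```

Running terraform destroy on this configuration executes the command before the resource is removed from state.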

Common Use Cases

Typical use cases for local-exec include:

  • Generating local artifacts such as configuration files or templates
  • Invoking external APIs or command-line tools not available through Terraform providers
  • Triggering notifications or integration hooks

Pros and Cons

Pros:
  • Simple and easy to understand
  • No remote connectivity or credentials required
  • Works well for quick, one-off tasks

Cons:
  • Strongly tied to the local execution environment
  • Not portable across machines or CI systems without additional setup
  • Not state-aware
  • Can obscure failures if output is suppressed or poorly handled

Example

The following example utilizes local-exec to run a local command after a resource is created:

variable "input_data" {
  type    = string
  default = "test"
}

resource "terraform_data" "res_local-exec" {
  input = var.input_data

  provisioner "local-exec" {
    interpreter = ["bash", "-c"]
    command     = "echo \"Hello $${PROVISIONER}!\""
    # command   = "echo \"Hello $${PROVISIONER}!\" && exit 1"
    environment = {
      "PROVISIONER" = "local-exec"
    }
    when       = create   # one of: 'create' or 'destroy'
    quiet      = false    # one of: 'true' or 'false'
    on_failure = fail     # one of: 'continue' or 'fail'
  }
}
$ terraform apply -auto-approve

Terraform used the selected providers to generate the following execution plan.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # terraform_data.res_local-exec will be created
  + resource "terraform_data" "res_local-exec" {
      + id     = (known after apply)
      + input  = "test"
      + output = (known after apply)
    }

Plan: 1 to add, 0 to change, 0 to destroy.
terraform_data.res_local-exec: Creating...
terraform_data.res_local-exec: Provisioning with 'local-exec'...
terraform_data.res_local-exec (local-exec): Executing: ["bash" "-c" "echo \"Hello $PROVISIONER\"!"]
terraform_data.res_local-exec (local-exec): Hello local-exec!
terraform_data.res_local-exec: Creation complete after 0s [id=210049dd-0a05-3056-0784-09038399d44e]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

To further examine local-exec behavior, modify its attributes or introduce a deliberate failure with exit 1 and re-run terraform apply.

The local-exec provisioner is well-suited for small tasks where no Terraform-native alternative exists. However, when the solution grows more complex, becomes stateful, or transforms into a critical part of your infrastructure, it is a strong indication that a different approach should be adopted.


local-exec Alternatives

While the local-exec provisioner is a flexible tool suitable for many configurations, there are numerous scenarios where tighter integration with external systems requires more structured and focused mechanisms.

This section explores three alternatives to local-exec that provide comparable functionality while integrating more cleanly with Terraform's dependency graph and execution model, allowing for more predictable, maintainable, and declarative workflows.


local_command (Action)

The local_command action provides a way to execute local commands using Terraform's action construct. Actions may be invoked either directly from the Terraform CLI or via the action_trigger mechanism.

A local_command action is defined using an action block:

action "local_command" "<LABEL>" {
  config {
    command           = "<COMMAND>"               # Command to run (required)
    arguments         = [ "<COMMAND_ARGUMENT>" ]  # List of arguments
    stdin             = "<INPUT_DATA>"            # Standard input
    working_directory = "<PATH>"                  # Working directory
  }
}

To associate a local_command action with the lifecycle events of a specific resource and coordinate command execution with its creation or updates, add action trigger rules to the resource's lifecycle block:

resource "<TYPE>" "<LABEL>" {
  # ...

  lifecycle {
    # ...

    action_trigger {
      events    = [ <EVENT> ]                # One or more lifecycle events
      condition = <EXPRESSION>               # Conditional expression (optional)
      actions   = [ action.<TYPE>.<LABEL> ]  # List of action references
    }
  }
}

The action_trigger rule is a block that supports the following arguments:

  • events - A list of lifecycle events that invoke the action; may include one or more of the following events:
    • before_create
    • after_create
    • before_update
    • after_update
  • condition - An optional expression that must evaluate to true to invoke the action(s)
  • actions - An ordered list of actions to run when the requirements specified in the events and condition arguments are met

More than one action_trigger rule can be included in a lifecycle block.
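For example, separate trigger rules can invoke different actions on creation and on update. This is a sketch: the variable names and action labels are illustrative and assume matching action blocks are defined elsewhere in the configuration:

```hcl
resource "terraform_data" "multi_trigger_demo" {
  input = var.input_data   # illustrative variable

  lifecycle {
    # First rule: run a bootstrap action once, after creation.
    action_trigger {
      events  = [ after_create ]
      actions = [ action.local_command.bootstrap ]   # illustrative action
    }

    # Second rule: run a notification action after updates,
    # but only when the condition evaluates to true.
    action_trigger {
      events    = [ after_update ]
      condition = var.notify_on_update               # illustrative boolean
      actions   = [ action.local_command.notify ]    # illustrative action
    }
  }
}
```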

Behavior and Execution Model

A local_command action can be invoked directly from the Terraform CLI or executed as part of the terraform apply workflow if the corresponding action trigger rule is satisfied.

When invoked, local_command executes the specified command on the local machine running Terraform. The command inherits the execution environment of the Terraform process, including all environment variables. After successful execution, the command's standard output (stdout) is captured and displayed by Terraform.

If the command exits with a non-zero status code, Terraform treats this as an error and aborts the current operation, making failures explicit and immediately visible.

Simple Example

variable "input_act" {
  type    = string
  default = "test"
}

resource "terraform_data" "trigger_local_command" {
  input = var.input_act

  lifecycle {
    action_trigger {
      events  = [ after_create, after_update ]
      actions = [ action.local_command.triggered ]
    }
  }
}

action "local_command" "triggered" {
  config {
    command   = "bash"
    arguments = ["-c", "cat - "]   # Display text from 'stdin'
    # arguments = ["-c", "cat - && exit 1"]
    stdin     = "local_command: Test Message"
  }
}
$ terraform apply

Terraform used the selected providers to generate the following execution plan.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # terraform_data.trigger_local_command will be created
  + resource "terraform_data" "trigger_local_command" {
      + id     = (known after apply)
      + input  = "test"
      + output = (known after apply)
    }

  # Actions to be invoked after this change in order:
  action "local_command" "triggered" {
      config {
          arguments = [
              "-c",
              "cat - ",
          ]
          command   = "bash"
          stdin     = "local_command: Test Message"
      }
  }

Plan: 1 to add, 0 to change, 0 to destroy.

Actions: 1 to invoke.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

terraform_data.trigger_local_command: Creating...
terraform_data.trigger_local_command: Creation complete after 0s [id=c9d20838-bf88-ab6c-2e55-f5a81eafb4b9]
Action started: action.local_command.triggered (triggered by terraform_data.trigger_local_command)
Action action.local_command.triggered (triggered by terraform_data.trigger_local_command): local_command: Test Message
Action complete: action.local_command.triggered (triggered by terraform_data.trigger_local_command)

Apply complete! Resources: 1 added, 0 changed, 0 destroyed. Actions: 1 invoked.

Comparison with local-exec

Unlike the local-exec provisioner, local_command actions:

  • Can be invoked directly from the Terraform CLI
  • Can be triggered before an update, after an update, and before creation
  • Do not support destroy-time execution
  • Can be associated with (i.e., triggered by) more than one resource
  • Do not taint the associated resource on failure

local_command actions are particularly useful for orchestrating simple tasks that must run at a specific point in a workflow but cannot be tied to the destruction of a resource.


local_command (Data Source)

The local_command data source executes a local command and captures its output as structured data that can be consumed elsewhere in the configuration.

A local_command data source is defined using a data block:

data "local_command" "<LABEL>" {
  command                  = "<COMMAND>"               # Command to run (required)
  arguments                = [ "<COMMAND_ARGUMENT>" ]  # List of arguments
  stdin                    = "<INPUT_DATA>"            # Standard input
  working_directory        = "<PATH>"                  # Working directory
  allow_non_zero_exit_code = <true || false>           # Ignore errors

  depends_on = [ <RESOURCE.ADDRESS> ]
}

In addition to its own arguments, the local_command data source supports all standard data block meta-arguments, including count, for_each, depends_on, and lifecycle.

The local_command data source exposes the following read-only attributes, populated by the most recent command execution:

  • exit_code - The exit code returned by the command. By default, a non-zero exit code is treated as a failure, unless allow_non_zero_exit_code is set to true.
  • stderr - Data returned from the command's standard error stream.
  • stdout - Data returned from the command's standard output stream.

These attributes can be referenced by other resources, data sources, outputs, or expressions in the configuration.
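For example, a command's stdout can feed an output value. This sketch assumes git is available on the machine running Terraform; the data source label is illustrative:

```hcl
data "local_command" "git_rev" {   # illustrative label
  command   = "git"
  arguments = ["rev-parse", "--short", "HEAD"]
}

output "build_revision" {
  # trimspace() removes the trailing newline from the captured stdout
  value = trimspace(data.local_command.git_rev.stdout)
}
```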

Behavior and Dependency Management

As with any other data source, Terraform evaluates the local_command data source during refresh, plan, and apply operations. When evaluated, it executes the specified command on the local machine running Terraform and updates its attributes with the execution results.

Terraform persists the local_command's data in state and refreshes those values during every apply (or an explicit terraform refresh), even if no other resources have changed.

When Terraform detects dependencies between a local_command data source and other resources, either implicit or explicit, it ensures that all upstream resources are processed first. In some cases, this may defer the evaluation of the local_command data source until a subsequent apply cycle, after all planned changes to the upstream resources are applied.

As a result, the local_command data source is typically evaluated during every apply or refresh operation, as well as during any plan that does not introduce changes to the upstream resources.

An explicit dependency can be declared using the depends_on meta-argument to ensure that the local_command data source is evaluated only after the specified resources have been fully processed.

Simple Example

resource "terraform_data" "res_data" { }

data "local_command" "data" {
  command   = "bash"
  arguments = ["-c", "echo \"Current Date: \" && date +%F_%T.%N"]
  # arguments = ["-c", "echo \"Command Failed\" 1>&2 && exit 1"]
  # allow_non_zero_exit_code = true   # 'true' or 'false'

  depends_on = [ terraform_data.res_data ]
}
$ terraform apply

Terraform used the selected providers to generate the following execution plan.
Resource actions are indicated with the following symbols:
  + create
 <= read (data resources)

Terraform will perform the following actions:

  # data.local_command.data will be read during apply
  # (depends on a resource or a module with changes pending)
 <= data "local_command" "data" {
      + arguments = [
          + "-c",
          + "echo \"Current Date: \" && date +%F_%T.%N",
        ]
      + command   = "bash"
      + exit_code = (known after apply)
      + stderr    = (known after apply)
      + stdout    = (known after apply)
    }

  # terraform_data.res_data will be created
  + resource "terraform_data" "res_data" {
      + id     = (known after apply)
      + input  = "Test"
      + output = (known after apply)
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

terraform_data.res_data: Creating...
terraform_data.res_data: Creation complete after 0s [id=975f525b-5e86-755f-9d83-3de652ea6010]
data.local_command.data: Reading...
data.local_command.data: Read complete after 0s

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

Comparison with local-exec

Unlike the local-exec provisioner, local_command data sources:

  • Are invoked during every apply and refresh operation
  • Can be invoked during plan operations if there are no changes to the upstream resources
  • Can be associated with multiple resources through explicit dependencies
  • Do not taint resources on failure
  • Persist command results in Terraform state

The local_command data source is best suited for scenarios where Terraform needs to trigger an external action and consume the returned information.


external (Data Source)

The external data source allows Terraform to call an external program and exchange data using a strict JSON-based interface.

An external data source is defined using a data block:

data "external" "<LABEL>" {
  program = [
    "<PROGRAM>",          # Program to run
    "<ARGUMENT>"          # List of arguments (optional)
  ]
  working_dir = "<PATH>"  # Working directory
  query = {               # Arbitrary map of strings, passed
                          # to the external program on 'stdin'
    <KEY> = "<VALUE>"
  }

  depends_on = [ <RESOURCE.ADDRESS> ]
}

The external data source supports the following arguments:

  • program - A list of strings, whose first element is the program to run and whose subsequent elements are optional command line arguments for the program.
  • query - An optional map of string values to pass to the external program on stdin as the query arguments. The program must read the data passed to it on stdin and parse it as a JSON object. The JSON object contains the contents of the query argument.
  • working_dir - Working directory of the program; defaults to the current directory.

In addition to its own arguments, the external data source supports all standard data block meta-arguments, including count, for_each, depends_on, and lifecycle.

The external data source exposes the following read-only attributes, populated after successful program execution:

  • id - The id of the data source. This will always be set to -.
  • result - A map of string values returned from the external program. The program must produce a valid JSON object on stdout, which will be used to populate the result attribute. This JSON object must have all of its values as strings.

The result attribute can be referenced elsewhere in the Terraform configuration by other resources, data sources, outputs, or expressions.
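For example, assuming the external program returns a date key in its JSON object, a single entry of the result map can be exposed as an output. The data source label here is illustrative:

```hcl
output "script_date" {
  # 'my_script' is an illustrative label; 'date' is assumed to be one of
  # the string keys in the JSON object the program writes to stdout.
  value = data.external.my_script.result["date"]
}
```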

Behavior and Dependency Management

Similarly to the local_command data source, Terraform evaluates the external data source and re-runs the specified program each time the state is refreshed during the plan and apply phases.

On execution, the program inherits all environment variables visible to the Terraform process. The program receives input via stdin and returns JSON output via stdout. Terraform parses and validates the program's output and populates the result attribute.

If the program encounters an error and is unable to produce a result, it must print a human-readable error message to stderr and exit with a non-zero status. Terraform ignores any data on stdout if the program returns a non-zero status.

Explicit dependencies can be declared via the depends_on meta-argument.

Simple Example

resource "terraform_data" "res_data_external" { }

data "external" "data" {
  program = ["bash", "-c", "jq --arg date \"$(date)\" '{\"date\":$date, \"env\":.env}'"]
  query = {
    env = "test"
  }

  depends_on = [ terraform_data.res_data_external ]
}
$ terraform apply

Terraform used the selected providers to generate the following execution plan.
Resource actions are indicated with the following symbols:
  + create
 <= read (data resources)

Terraform will perform the following actions:

  # data.external.data will be read during apply
  # (depends on a resource or a module with changes pending)
 <= data "external" "data" {
      + id      = (known after apply)
      + program = [
          + "bash",
          + "-c",
          + "jq --arg date \"$(date)\" '{\"date\":$date, \"env\":.env}'",
        ]
      + query   = {
          + "env" = "test"
        }
      + result  = (known after apply)
    }

  # terraform_data.res_data_external will be created
  + resource "terraform_data" "res_data_external" {
      + id = (known after apply)
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

terraform_data.res_data_external: Creating...
terraform_data.res_data_external: Creation complete after 0s [id=2490fdf0-1690-5da1-e535-4d59fa43d8fe]
data.external.data: Reading...
data.external.data: Read complete after 0s [id=-]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

Comparison with local-exec

Compared to the local-exec provisioner, external data sources:

  • Are invoked during every apply and refresh operation
  • Can be invoked during plan operations if there are no changes to the upstream resources
  • Can be associated with multiple resources through explicit dependencies
  • Do not taint resources on failure
  • Persist command results in Terraform state
  • Enforce structured data exchange

The external data source is best suited for scenarios where interaction with external systems must follow a strict JSON-based protocol.


file Provisioner

The file provisioner is used to transfer files or content from the local machine to a remote resource. Unlike local-exec, which runs commands locally, the file provisioner operates over a remote connection and requires the target resource to be reachable via SSH or WinRM.

resource "<TYPE>" "<LABEL>" {
  # ...

  provisioner "file" {
    source      = "<PATH>"           # Source file or directory
    destination = "<PATH>"           # Target file or directory
    content     = "<FILE_CONTENT>"   # Inline content (mutually exclusive with 'source')
    on_failure  = <continue || fail> # Action to take when provisioner fails
  }
}

The file provisioner supports the following arguments:

  • source - Path to a local file or directory to be uploaded to the remote resource.
  • content - Inline content to write directly to a file on the remote resource. This argument is mutually exclusive with source.
  • destination - Path on the remote system where the file or directory will be written.

In addition, the file provisioner requires a connection block that defines how Terraform connects to the remote resource (for example, the SSH user name, host, and authentication details).
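A typical definition combines these arguments with a connection block. This is a sketch: the resource type, user name, and file paths are illustrative:

```hcl
resource "aws_instance" "web" {
  # ... (instance arguments omitted)

  provisioner "file" {
    source      = "conf/app.conf"   # illustrative local path
    destination = "/tmp/app.conf"

    connection {
      type        = "ssh"
      user        = "ubuntu"                 # illustrative user name
      private_key = file("~/.ssh/ec2_key")   # illustrative key path
      host        = self.public_ip
    }
  }
}
```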

Behavior and Execution Model

The file provisioner runs after Terraform determines that the parent resource has been created successfully, but before the apply operation completes. Execution of the file provisioner depends on a successful remote connection.

On subsequent plan or apply operations, Terraform does not verify whether the transferred files are subsequently used or modified, and the contents of the remote filesystem are not tracked in Terraform state.

If the file transfer fails, Terraform treats the provisioner as failed and applies the same error-handling rules as other provisioners, including optional resource tainting during creation.

Common Use Cases

Typical use cases for the file provisioner include:

  • Uploading initialization scripts or configuration files
  • Transferring application artifacts prior to execution
  • Staging files for use by a remote-exec provisioner

Pros and Cons

Pros:
  • Simple mechanism for transferring files to remote resources
  • No additional tooling required beyond Terraform
  • Works with both SSH and WinRM connections

Cons:
  • Requires reliable network connectivity and access to the target resource
  • Not tracked in state
  • Changes to file contents are not tracked by Terraform
  • Failures can block operations

The file provisioner can be acceptable for small, one-time file transfers in tightly controlled environments, particularly during development or prototyping.

For production systems, provider-native alternatives, such as cloud-init, startup scripts, or pre-baked images, usually provide more reliable and easier-to-manage workflows.

As with other remote provisioners, the file provisioner should be treated as a last resort rather than a default solution.


remote-exec Provisioner

The remote-exec provisioner executes commands directly on a remote resource after it has been created. Unlike local-exec, which runs on the machine executing Terraform, remote-exec requires an active remote connection, typically over SSH for Linux-based systems or WinRM for Windows hosts.

resource "<TYPE>" "<LABEL>" {
  # ...

  provisioner "remote-exec" {
    # Only one can be used:
    inline = [ "<COMMAND>" ]
    # or
    script = "<PATH_TO_SCRIPT>"
    # or
    scripts = [ "<PATH_TO_SCRIPT>" ]

    on_failure = <continue || fail>  # Action to take when provisioner fails
  }
}

The remote-exec provisioner supports the following arguments:

  • inline - A list of commands executed sequentially on the remote system.
  • script - A local script that Terraform uploads and executes remotely.
  • scripts - A list of local scripts uploaded and executed in order.

Any given remote-exec provisioner may contain only one of the three attributes.

Behavior and Execution Model

By default, a remote-exec provisioner runs after Terraform determines that the parent resource has been successfully created but before the apply operation is considered complete.

Terraform assumes the resource is "created" as soon as the provider reports success. However, this does not guarantee that the resource is reachable or fully initialized. As a result, remote-exec frequently encounters timing issues where the resource exists but is not yet accessible over the network.
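One common mitigation is to raise the connection timeout, which controls how long Terraform keeps retrying the initial connection before failing. This is a sketch; the ten-minute value is arbitrary, and the user name and key path are illustrative:

```hcl
provisioner "remote-exec" {
  inline = [ "echo 'instance reachable'" ]

  connection {
    type        = "ssh"
    user        = "ubuntu"                 # illustrative user name
    private_key = file("~/.ssh/ec2_key")   # illustrative key path
    host        = self.public_ip
    timeout     = "10m"   # keep retrying the connection for up to 10 minutes
  }
}
```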

Terraform does not track the effects of remote command execution in state. It has no visibility into whether the commands have already been executed, whether they are idempotent, or whether they are safe to reapply.

Common Use Cases

Typical use cases for the remote-exec provisioner include:

  • Installing system packages
  • Applying configuration changes
  • Running one-time bootstrap scripts
  • Enabling or starting services

These use cases are common during prototyping, early Terraform adoption, and transitional architectures, particularly when migrating from manually configured systems.

Pros and Cons

Pros:
  • Immediate access to the remote system
  • No additional tooling required beyond Terraform
  • Familiar imperative workflow

Cons:
  • Strong dependency on network availability and timing
  • Complex authentication and credential handling
  • Not tracked in state
  • Difficult to debug and reproduce failures

Example

The following minimal example demonstrates how remote-exec can be used to install the Apache web server on an AWS EC2 instance after it is created.

provider "aws" {
  region = "us-east-1"
}

resource "aws_key_pair" "ec2_key_pub" {
  key_name   = "ec2_key_pub"
  public_key = file("~/.ssh/ec2_key.pub")
}

resource "aws_instance" "test_remote-exec" {
  ami           = "ami-0b6c6ebed2801a5cb"
  instance_type = "t2.micro"
  key_name      = aws_key_pair.ec2_key_pub.key_name

  tags = {
    Name = "test_remote-exec"
  }

  provisioner "remote-exec" {
    inline = [
      "#!/bin/bash -x",
      "sudo apt-get update",
      "sudo apt-get install -y apache2",
      "sudo systemctl enable apache2",
      "sudo systemctl start apache2",
      "curl -I http://127.0.0.1"
    ]

    connection {
      type        = "ssh"
      user        = "ubuntu"
      private_key = file("~/.ssh/ec2_key")
      host        = self.public_ip
    }
  }
}

output "instance_ip" {
  value = aws_instance.test_remote-exec.public_ip
}

To establish a connection, the remote-exec provisioner relies on SSH-based authentication. Terraform itself does not manage SSH keys; instead, it references an existing key pair created outside of Terraform, for example, by using ssh-keygen:

ssh-keygen -t rsa -f ~/.ssh/ec2_key -C "ubuntu@example.com"

Then, the public key is registered in AWS via aws_key_pair and injected into the new EC2 instance. The private key is used only in the remote-exec connection block. No SSH material is hardcoded in the configuration.

The example also assumes that the default VPC is present in us-east-1 with the default security group permitting SSH connectivity between the machine running Terraform and the new instance.

Applying the configuration:

$ terraform apply

Terraform used the selected providers to generate the following execution plan.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # aws_instance.test_remote-exec will be created
  + resource "aws_instance" "test_remote-exec" {
      + ami = "ami-0b6c6ebed2801a5cb"
      + arn = (known after apply)
      ...
      + root_block_device (known after apply)
    }

  # aws_key_pair.ec2_key_pub will be created
  + resource "aws_key_pair" "ec2_key_pub" {
      + arn        = (known after apply)
      + id         = (known after apply)
      + key_name   = "ec2_key_pub"
      ...
      + public_key = "ssh-rsa AAAAB3Nza...AEDU= ubuntu@example.com"
      + region     = "us-east-1"
      + tags_all   = (known after apply)
    }

Plan: 2 to add, 0 to change, 0 to destroy.

Changes to Outputs:
  + instance_ip = (known after apply)

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

aws_key_pair.ec2_key_pub: Creating...
aws_key_pair.ec2_key_pub: Creation complete after 1s [id=ec2_key_pub]
aws_instance.test_remote-exec: Creating...
aws_instance.test_remote-exec: Still creating... [00m10s elapsed]
aws_instance.test_remote-exec: Provisioning with 'remote-exec'...
aws_instance.test_remote-exec (remote-exec): Connecting to remote host via SSH...
aws_instance.test_remote-exec (remote-exec):   Host: 98.81.132.65
aws_instance.test_remote-exec (remote-exec):   User: ubuntu
aws_instance.test_remote-exec (remote-exec):   Password: false
aws_instance.test_remote-exec (remote-exec):   Private key: true
aws_instance.test_remote-exec (remote-exec):   Certificate: false
aws_instance.test_remote-exec (remote-exec):   SSH Agent: false
aws_instance.test_remote-exec (remote-exec):   Checking Host Key: false
aws_instance.test_remote-exec (remote-exec):   Target Platform: unix
aws_instance.test_remote-exec (remote-exec): Connected!
aws_instance.test_remote-exec (remote-exec): + sudo apt-get update
aws_instance.test_remote-exec (remote-exec): 0% [Working]
...
aws_instance.test_remote-exec (remote-exec): + sudo systemctl enable apache2
aws_instance.test_remote-exec (remote-exec): Synchronizing state of apache2.service with SysV service script with /usr/lib/systemd/systemd-sysv-install.
aws_instance.test_remote-exec (remote-exec): Executing: /usr/lib/systemd/systemd-sysv-install enable apache2
aws_instance.test_remote-exec (remote-exec): + sudo systemctl start apache2
aws_instance.test_remote-exec (remote-exec): + curl -I http://127.0.0.1
aws_instance.test_remote-exec (remote-exec): HTTP/1.1 200 OK
aws_instance.test_remote-exec (remote-exec): Date: Thu, 29 Jan 2026 21:34:34 GMT
aws_instance.test_remote-exec (remote-exec): Server: Apache/2.4.58 (Ubuntu)
aws_instance.test_remote-exec (remote-exec): Last-Modified: Thu, 29 Jan 2026 21:34:25 GMT
aws_instance.test_remote-exec (remote-exec): ETag: "29af-6498d9ee7d5e4"
aws_instance.test_remote-exec (remote-exec): Accept-Ranges: bytes
aws_instance.test_remote-exec (remote-exec): Content-Length: 10671
aws_instance.test_remote-exec (remote-exec): Vary: Accept-Encoding
aws_instance.test_remote-exec (remote-exec): Content-Type: text/html
aws_instance.test_remote-exec: Creation complete after 1m5s [id=i-0aee3deb6e8d0bc47]

Apply complete! Resources: 2 added, 0 changed, 0 destroyed.

Outputs:

instance_ip = "98.81.132.65"

To avoid unexpected charges, make sure to delete the instance and the keys when no longer needed by running terraform destroy. Confirm when prompted, and Terraform will remove all the created resources.

The remote-exec provisioner works well for temporary or transitional scenarios, such as initial experimentation or migration projects where no alternative is immediately available.

However, in most production environments, remote-exec introduces fragility and operational risk. Cloud-native initialization mechanisms (such as cloud-init or startup scripts), image-based approaches (such as Packer), or configuration management tools provide more reliable and maintainable solutions.

Back to Top

Alternatives to remote-exec & file Provisioners

Terraform's remote-exec and file provisioners are often the first tools users reach for when they need to configure a virtual machine after it has been created. At the same time, these provisioners introduce additional complexity that makes them a poor fit for most real-world infrastructure workflows.

At a fundamental level, connectivity-dependent provisioners tightly couple infrastructure deployment to network reachability and credential management. Terraform must be able to reach the target instance over the network and authenticate successfully. Any issue, such as delayed IP assignment, firewall misconfiguration, key rotation, or transient network failures, can cause the entire terraform apply operation to fail.

In addition, provisioner errors may leave resources in a partially configured state and may require user intervention to re-provision or re-create the failed resource.

As infrastructure grows in size and complexity, the following two deployment models become preferable:

  • Instance-based initialization, where configuration is performed by the instance itself during the first boot using mechanisms such as cloud-init or platform-specific startup scripts.
  • Image-based customization, where virtual machine images are pre-built using tools such as Packer, and Terraform is responsible only for deploying those custom images.

This section focuses primarily on Linux virtual machine provisioning, where cloud-init is the de facto standard across AWS, Azure, and Google Cloud. Windows provisioning follows different patterns and tooling and is therefore mentioned only briefly where relevant.

Back to Top

What Is cloud-init?

cloud-init is a widely adopted Linux instance initialization framework used to perform system configuration during the early stages of a virtual machine's boot process. It is responsible for tasks such as creating users, installing packages, writing configuration files, and enabling or starting services, all before the system becomes fully operational.

It is designed primarily for first-boot initialization. It consumes a YAML-based configuration (commonly referred to as cloud-config) that describes the desired system state and may also prescribe a series of imperative steps or commands to execute.

Today, cloud-init is enabled by default on most Linux images provided by major cloud platforms, including AWS, Azure, and Google Cloud. With cloud-init, Terraform's role is limited to supplying configuration data; the interpretation and execution of that configuration are managed entirely by the instance.

Back to Top

Provisioning an EC2 Instance with user_data

On AWS, configuration data can be provided to an EC2 instance through the user data mechanism.

resource "aws_instance" "test" {
  ami           = "ami-0b6c6ebed2801a5cb"
  instance_type = "t2.micro"
  key_name      = aws_key_pair.ec2_key_pub.key_name

  user_data = <<-EOF
    #cloud-config
    package_update: true
    packages:
      - apache2
    runcmd:
      - systemctl enable apache2
      - systemctl start apache2
  EOF

  # user_data_base64            = <BINARY_DATA>     # Use instead of 'user_data'
  # user_data_replace_on_change = <true || false>   # Defaults to 'false' if not set
}

Supported user data attributes:

  • user_data - Supports several formats, including cloud config YAML directives (#cloud-config) as shown above, a plain-text shell script (beginning with #!, e.g., #!/bin/bash), or a MIME multi-part payload. Updates to this field will trigger a stop/start of the EC2 instance by default, unless user_data_replace_on_change is set to true.
  • user_data_base64 - Can be used instead of user_data to pass base64-encoded binary data directly, for example, whenever the value is not a valid UTF-8 string. Updates to this field will trigger a stop/start of the EC2 instance by default, unless user_data_replace_on_change is set to true.
  • user_data_replace_on_change - When set to true in combination with user_data or user_data_base64, triggers a destroy and recreate of the EC2 instance. Defaults to false if not set, in which case changes to user_data or user_data_base64 trigger a stop/start.

When a Linux EC2 instance boots, cloud-init retrieves the user data payload from the EC2 instance metadata service and processes it as part of the boot sequence.

From Terraform's perspective, cloud-init configuration is passed to EC2 using the user_data attribute of the aws_instance resource. Terraform does not execute or interpret this configuration; it simply delivers the content to the instance at creation time.

By default, user data scripts and cloud config directives run only during the first boot cycle when an EC2 instance is launched. Changes to user_data typically require instance replacement to take effect. However, you can include your user data commands and cloud config directives into a MIME multi-part file specifically configured to run at every boot. In this case, scripts and commands should be idempotent to allow safe re-execution during reboot scenarios.
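As a sketch of the every-boot technique, the MIME multi-part payload below follows the pattern described in the AWS EC2 User Guide: a cloud-config part instructs cloud-init to run user scripts at every boot, and a shell-script part carries the actual commands (the log file path is an illustrative assumption):

```
Content-Type: multipart/mixed; boundary="//"
MIME-Version: 1.0

--//
Content-Type: text/cloud-config; charset="us-ascii"

#cloud-config
cloud_final_modules:
  - [scripts-user, always]

--//
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
# Runs on every boot; keep commands idempotent
echo "boot at $(date)" >> /var/log/everyboot.log
--//--
```

This document can be passed to Terraform's user_data as-is, or assembled with the cloudinit_config data source discussed later in this article.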

Using user_data with EC2 Linux instances provides a clean, reliable alternative to SSH-based provisioners and aligns well with Terraform's declarative infrastructure model.

For more details, see the AWS EC2 User Guide.

Back to Top

Configuring a GCE Instance Using Instance Metadata

On Google Cloud Compute Engine (GCE), cloud-init integrates with the platform through instance metadata. When a Linux instance boots, cloud-init retrieves configuration data from the metadata server and applies it during the operating system initialization process.

resource "google_compute_instance" "test-vm" {
  name         = "test-vm"
  machine_type = "f1-micro"
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      image = "ubuntu-os-cloud/ubuntu-2404-lts-amd64"
    }
  }

  network_interface {
    network = "default"   # Using 'default' VPC
    access_config {}      # Provision ephemeral public IP
  }

  metadata = {            # Metadata key/value pairs
    user-data = <<-EOF
      #cloud-config
      package_update: true
      packages:
        - apache2
      runcmd:
        - systemctl enable apache2
        - systemctl start apache2
    EOF

    # startup-script      = "<DATA>"
    # startup-script-url  = "<DATA>"
    # shutdown-script     = "<DATA>"
    # shutdown-script-url = "<DATA>"
  }

  # metadata_startup_script = "<DATA>"
}

Google Cloud Platform (GCP) supports cloud-init through several OS-specific metadata keys. On Linux instances the following metadata keys have special meanings:

  • user-data - Supports multiple formats, including cloud config YAML (#cloud-config) shown above, a plain-text shell script (beginning with #!, e.g., #!/bin/bash), or a MIME multi-part payload. For more details, see Cloud-init User-data formats.
  • startup-script - Specifies a script that will be executed by the instance on every boot.
  • startup-script-url - Same as startup-script except that the script contents are pulled from a Cloud Storage location. Scripts added via startup-script-url execute after scripts specified in startup-script.
  • shutdown-script - Specifies a script that will be executed right before an instance is stopped or restarted.
  • shutdown-script-url - Same as shutdown-script except that the script contents are pulled from a Cloud Storage location.
  • metadata_startup_script - A Terraform resource attribute that serves as an alternative to the startup-script metadata key, except that changing it forces the instance to be recreated. Only one of the two, either metadata_startup_script or the startup-script metadata key, can be included in an instance resource block.

The scripts and commands supplied via the user-data key are executed only once, on the first boot. On the other hand, the startup scripts defined in startup-script or startup-script-url run on every boot. This may present a challenge if the startup scripts contain actions that are intended to run only once, during the initialization. In such cases, the scripts must include appropriate validation logic to ensure that those actions are executed only during the initial boot.
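One common approach to such validation logic is a marker file: the startup script performs the one-time actions only if the marker is absent, then creates it. The sketch below shows the idea inside a google_compute_instance metadata block (the marker file path and package choices are illustrative assumptions):

```hcl
resource "google_compute_instance" "test-vm" {
  # ... (other arguments as in the example above)

  metadata = {
    startup-script = <<-EOF
      #!/bin/bash
      # One-time initialization, guarded by a marker file (path is an assumption)
      if [ ! -f /var/lib/first-boot-done ]; then
        apt-get update
        apt-get install -y apache2
        touch /var/lib/first-boot-done
      fi
      # Actions below are safe to repeat on every boot
      systemctl start apache2
    EOF
  }
}
```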

Windows instances support Command shell (.cmd), PowerShell (.ps1), or batch file (.bat) scripts using the following metadata keys:

  • sysprep-specialize-script-ps1
  • sysprep-specialize-script-cmd
  • sysprep-specialize-script-bat
  • sysprep-specialize-script-url
  • windows-startup-script-ps1
  • windows-startup-script-cmd
  • windows-startup-script-bat
  • windows-startup-script-url
  • windows-shutdown-script-ps1
  • windows-shutdown-script-cmd
  • windows-shutdown-script-bat
  • windows-shutdown-script-url

The sysprep scripts are executed during the initial boot, the startup scripts run on every boot after the initial boot, and the shutdown scripts run before an instance is stopped or restarted.

GCE Instance Metadata offers various options for initialization- and boot-time customization and provides a clean, platform-native alternative to remote-exec and file provisioners for GCE workloads.

For more information, see Compute Engine - About VM metadata and Compute Engine - About startup scripts.

Back to Top

Azure VM Modification with custom_data

Azure delivers cloud-init configuration to the guest OS through the VM's custom data field, which is processed along with the other provisioning details during the instance initialization process.

In Terraform configuration, cloud-init data is supplied using the custom_data attribute of the azurerm_linux_virtual_machine resource. Azure requires this value to be base64-encoded.

resource "azurerm_linux_virtual_machine" "test-vm" {
  name                = "test-vm"
  size                = "Standard_B1s"
  location            = azurerm_resource_group.test-rg.location
  resource_group_name = azurerm_resource_group.test-rg.name

  network_interface_ids = [
    azurerm_network_interface.test-nic.id
  ]

  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Standard_LRS"
  }

  source_image_reference {
    publisher = "Canonical"
    offer     = "ubuntu-24_04-lts"
    sku       = "server"
    version   = "latest"
  }

  admin_username = "adminuser"

  admin_ssh_key {
    username   = "adminuser"
    public_key = file("~/.ssh/az_key.pub")
  }

  custom_data = base64encode(local.config)

  # user_data = "<DATA>"
}

locals {
  config = <<-EOF
    #cloud-config
    packages:
      - apache2
    runcmd:
      - systemctl enable apache2
      - systemctl start apache2
  EOF
}

The custom_data attribute accepts a base64-encoded cloud-init configuration file or a shell script (as long as it starts with #!, cloud-init will execute it). Changing this attribute forces the creation of a new compute instance resource.

The user_data and custom_data attributes provide similar functionality, but the former can be updated without causing VM re-creation.

For more information, see Custom data and cloud-init on Azure Virtual Machines and User Data for Azure Virtual Machine.

Azure also supports the Custom Script Extension (CSE), which allows you to download and execute scripts on Azure virtual machines during the initialization process or manually, when the VM is already running. CSE requires the Azure Linux VM Agent to be installed and operational.

In Terraform, Custom Script Extension is typically configured via the azurerm_virtual_machine_extension resource which can manage scripts from Azure Storage, GitHub, or defined with inline commands.
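As a minimal sketch of that resource (the extension name, inline command, and the referenced VM are illustrative assumptions), an inline-command CSE might look like:

```hcl
resource "azurerm_virtual_machine_extension" "cse" {
  name                 = "custom-script"    # Extension name (assumption)
  virtual_machine_id   = azurerm_linux_virtual_machine.test-vm.id
  publisher            = "Microsoft.Azure.Extensions"
  type                 = "CustomScript"
  type_handler_version = "2.0"

  # Inline command; 'fileUris' could be used instead to pull scripts
  # from Azure Storage or GitHub
  settings = jsonencode({
    commandToExecute = "echo 'post-deployment task' > /tmp/cse-done"
  })
}
```

For credentials or other sensitive values, the protected_settings argument should be used instead of settings, as its contents are encrypted in transit and at rest.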

While cloud-init and Azure Custom Script Extension offer comparable capabilities, cloud-init usually handles the initial OS configuration and the CSE performs application-specific deployment tasks afterward. Both can be used together on the same VM.

In general, the Custom Script Extension is more suitable for Azure-specific, post-deployment tasks, as it integrates seamlessly with Azure tooling and provides better feedback on script success/failure status.

On the other hand, cloud-init is typically the preferred choice for multi-cloud or hybrid environments due to its open-source, vendor-agnostic nature and support across different platforms.

Back to Top

Managing cloud-init Configuration Files with Terraform

While simple cloud-init scripts can be embedded directly into user data or instance metadata fields, real-world setups often require multiple configuration fragments, conditional logic, or a combination of different tools. This is where the Terraform cloudinit provider becomes especially useful.

The cloudinit_config data source included in the cloudinit provider allows Terraform to assemble multipart MIME cloud-init configurations from multiple sources, such as cloud-config YAML files, shell scripts, etc. Cloud-init natively understands this multipart format and processes each part in order during the instance's boot. By using the provider, Terraform remains responsible only for rendering the configuration, while execution still happens entirely inside the VM.

data "cloudinit_config" "<LABEL>" {
  gzip          = <true || false>   # Whether or not to gzip the output
  base64_encode = <true || false>   # Whether or not to base64 encode the output
  boundary      = "<SEPARATOR>"     # Specifies the boundary separator

  part {                            # Adds a file to the cloud-init configuration
    filename     = "<FILE_NAME>"    # Filename to list in the header for the part
    content_type = "<TYPE>"         # MIME-style content type
    content      = "<DATA>"         # Body content for the part
    merge_type   = "<MERGE_TYPE>"   # X-Merge-Type header of the part
  }

  part {
    filename     = "<FILE_NAME>"
    content_type = "<TYPE>"
    content      = "<DATA>"
    merge_type   = "<MERGE_TYPE>"
  }
}

The cloudinit_config data source combines several configuration fragments defined by the nested part {...} blocks. The parts will be included in the final MIME document in the order they are declared in the configuration.

part {...} blocks support the following attributes:

  • content - Body content for the part.
  • content_type - An optional MIME-style content type to report in the header for the part. Defaults to text/plain.
  • filename - An optional filename to report in the header for the part.
  • merge_type - An optional value for the X-Merge-Type header of the part, to control cloud-init merging behavior.

Once rendered, the resulting MIME document is exposed through the rendered attribute and can be passed directly to a compute resource via:

  • user_data for AWS EC2
  • custom_data for Azure Linux VMs (Base64-encoded)
  • Instance metadata (user-data) on GCP

The cloudinit_config data source offers several advantages over inline heredocs:

  • Separates cloud-init configuration into reusable and composable fragments
  • Allows for more robust and consistent configuration
  • Simplifies testing and maintenance of complex cloud-init setups
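Putting the pieces together, the sketch below (resource labels, filenames, and the AMI ID reused from the earlier examples are illustrative assumptions) combines a cloud-config fragment with a shell script and wires the rendered document into an EC2 instance:

```hcl
data "cloudinit_config" "web" {
  gzip          = false
  base64_encode = false

  # Declarative package installation via cloud-config
  part {
    filename     = "base.yaml"
    content_type = "text/cloud-config"
    content      = <<-EOF
      #cloud-config
      package_update: true
      packages:
        - apache2
    EOF
  }

  # Imperative post-install step as a shell script
  part {
    filename     = "post-install.sh"
    content_type = "text/x-shellscript"
    content      = <<-EOF
      #!/bin/bash
      systemctl enable --now apache2
    EOF
  }
}

resource "aws_instance" "web" {
  ami           = "ami-0b6c6ebed2801a5cb"
  instance_type = "t2.micro"
  user_data     = data.cloudinit_config.web.rendered
}
```

The same rendered value could be passed to Azure's custom_data (wrapped in base64encode, or with base64_encode = true) or to the user-data metadata key on GCP.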

Back to Top

Building VM Images with Packer

While cloud-init is well suited for provision-time initialization, environments that require a significant volume of configuration steps or fast deployments may benefit from performing the majority of system configuration before a virtual machine is ever launched. Image-based provisioning addresses such needs by producing pre-configured machine images that can be deployed rapidly and predictably. HashiCorp Packer is the most widely used tool for this approach.

Packer uses the same HashiCorp Configuration Language (HCL) as Terraform, which significantly accelerates the learning process for Terraform users. Concepts such as variables, expressions, conditionals, and templating behave in a similar way, streamlining the introduction of a new tool into the workflow.

Packer Structure and Workflow

Packer configurations are organized around a small set of well-defined, modular blocks:

  • source blocks define where images are built (for example, an AWS AMI, an Azure managed image, or a GCP image) and image parameters (source machine image, image name, temporary instance size, etc.).
  • build blocks tie one or more sources together with provisioning steps and post-processing logic.
  • provisioners define configuration scripts customizing the image during the build process.
  • post-processors collect artifacts and perform cleanup tasks.
  • variables and locals enable environment-specific customization.

Packer follows a straightforward build-oriented workflow driven by its HCL configuration. It launches a temporary build instance from a base image defined in the source block and applies one or more provisioners from the build block to install and configure software. When completed, it captures the finalized system as a reusable image. Once the image is created and registered, the temporary resources are automatically cleaned up, leaving behind a versioned image artifact that can be referenced directly from Terraform.
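A minimal Packer template illustrating this workflow might look like the sketch below (the base AMI ID, region, and image naming are illustrative assumptions):

```hcl
packer {
  required_plugins {
    amazon = {
      source  = "github.com/hashicorp/amazon"
      version = ">= 1.2"
    }
  }
}

locals {
  # Build timestamp used to version the resulting image name
  timestamp = regex_replace(timestamp(), "[- TZ:]", "")
}

source "amazon-ebs" "ubuntu" {
  region        = "us-east-1"
  source_ami    = "ami-0b6c6ebed2801a5cb"   # Base Ubuntu image (assumption)
  instance_type = "t2.micro"
  ssh_username  = "ubuntu"
  ami_name      = "apache-base-${local.timestamp}"
}

build {
  sources = ["source.amazon-ebs.ubuntu"]

  # Bake the web server into the image instead of configuring it at deploy time
  provisioner "shell" {
    inline = [
      "sudo apt-get update",
      "sudo apt-get install -y apache2",
    ]
  }
}
```

Running packer build against this template launches a temporary EC2 instance, installs Apache, captures the result as a new versioned AMI, and cleans up the build instance; Terraform can then reference the AMI directly (for example, via an aws_ami data source filter).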

For a deeper walkthrough of a production-ready workflow, see Automating AMI Image Builds & Rollouts with Packer and CodeBuild.

Cross-Cloud Image Builds

One of Packer's key strengths is its cross-cloud support. The same logical build workflow can be adapted to produce images for multiple platforms, including:

  • AWS AMIs
  • Azure managed images & Azure Compute Gallery
  • Google Compute Engine images

This makes Packer particularly valuable for organizations seeking consistency across providers in multi-cloud environments.

Back to Top

Summary and Key Takeaways

Terraform provisioners - local-exec, remote-exec, and file - are flexible tools intended for relatively simple, task-oriented operations. However, they should not be treated as a full-fledged configuration management solution.

  • local-exec runs commands on the machine executing Terraform. It is useful for glue logic, notifications, or triggering external systems.
  • remote-exec and file operate over SSH or WinRM and require network reachability, credentials, and precise timing. They tightly couple Terraform to the runtime state of instances, increase failure risks, and make retries and idempotency difficult to manage.

While these provisioners remain effective for certain use cases, their imperative execution model, network dependency (for remote-exec and file), and limited visibility into execution results often introduce unnecessary complexity when used for instance configuration.

Modern infrastructure provisioning favors two alternative techniques that better align with Terraform's declarative model:

  • Instance-based initialization with cloud-init shifts configuration into the VM itself. Initialization runs during early boot, is designed for first-use setup, and works consistently across AWS, Azure, and GCP. Terraform's role is reduced to supplying metadata or user data, while the instance handles configuration locally.
  • Image-based customization with tools such as Packer moves provisioning even earlier in the lifecycle. Images are built, validated, and versioned before Terraform ever creates infrastructure. This results in faster deployments, fewer runtime dependencies, and a clear separation of responsibilities between image creation and infrastructure orchestration.

In practice, these approaches are complementary:

  • Use cloud-init for lightweight, environment-specific initialization.
  • Use Packer for heavier, repeatable system configuration that should not run at every boot.

The key takeaway:
Terraform configuration should focus on declaring infrastructure, not orchestrating configuration steps over the network. By adopting cloud-init and image-based provisioning, you get more reliable deployments, clear separation of concerns, and infrastructure code that scales well with both team size and system complexity.

Back to Top