Description
Terraform Version
1.5.4
Use Cases
When one resource that is managed by Terraform has dependent resources that are not managed by Terraform, it is sometimes necessary to use a data resource to retrieve the calculated attributes of the dependent resources.
Currently, any planned change to the resource that is managed by Terraform causes the data resource to be deferred to apply time, even if the properties that are changing are not referenced by the data resource.
Any downstream resources that depend on the outputs of the data resource are then forced to plan pessimistically (usually a destroy and recreate).
This could be avoided if the logic that decides whether a data resource must be deferred was updated to look for any planned changes in each specific input, rather than considering resources as a whole.
Example scenario
A concrete scenario where we've encountered this was with the AWS provider. A planned change to a Transit Gateway prefix list was causing a planned drop and recreate of other TGW resources.
The same scenario exists with the VPC endpoint resource, which requires a separate data resource to retrieve the private ID for an interface endpoint.
In general, this is not a problem with any specific provider. It occurs any time the following setup is needed (where creating a "thing_a" causes a "thing_b" to be created automatically):
resource "foo_thing_a" "a" {
  some_property = "foo"
}
data "foo_thing_b" "b" {
  id = foo_thing_a.a.id
}
resource "foo_thing_c" "c" {
  some_property = data.foo_thing_b.b.some_property
}
In the above scenario, a change to "some_property" on "a" guarantees the deferral of the data resource, which may cause "c" to be destroyed and recreated. This happens even if Terraform could work out that the ID of "a" is not planned to change.
Attempted Solutions
There are 3 workarounds for this that I know of.
1 - Live with it
Sometimes it's harmless to have downstream resources be recreated. While it results in a messy plan, it may not cause any problems.
2 - Use custom diff logic
Custom diff logic in the provider resources can be used to override the default behavior. AWS tried this and found it was not a good option (they rolled back the change).
3 - Add extra outputs to the resources to avoid the data lookup
Sometimes using a data resource can be avoided by adding the outputs to the main resource. This makes sense in certain circumstances (and was ultimately what AWS did) but it breaks encapsulation and complicates the provider.
Proposal
I propose that the core Terraform logic is updated so that data lookup deferral only happens if the specific properties upon which the data resource depends are unknown until apply time.
resource "foo_thing_a" "a" {
  property1 = "p1" # Changing this causes a possible change to "output1"
  property2 = "p2" # Changing this doesn't cause a change to "output1"
}
data "foo_thing_b" "b" {
  id = foo_thing_a.a.output1
}
In the above setup, if the configuration has a change to property1, the schema would indicate that output1 may change. When the data resource is considered, it has to be deferred.
However, if the configuration only has a change to property2, then the schema would indicate that output1 will not change. The data resource can then be safely executed at plan time.
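To make the difference concrete, here is a minimal sketch of the two decision rules in Python. It is not Terraform's actual implementation: `PlannedResource`, `output_depends_on`, and both `defer_*` functions are hypothetical names invented for illustration, and the sketch assumes the provider schema could somehow declare which inputs may affect each output.

```python
from dataclasses import dataclass, field


@dataclass
class PlannedResource:
    """Hypothetical summary of one managed resource in a plan."""
    address: str
    # Inputs whose configured values differ from state in this plan.
    changed_inputs: set[str] = field(default_factory=set)
    # Assumed schema metadata: output attribute -> inputs that may change it.
    output_depends_on: dict[str, set[str]] = field(default_factory=dict)

    def has_pending_changes(self) -> bool:
        return bool(self.changed_inputs)

    def output_is_unknown(self, attribute: str) -> bool:
        deps = self.output_depends_on.get(attribute)
        if deps is None:
            # No metadata for this attribute: stay conservative and treat
            # any pending change as making the attribute unknown.
            return self.has_pending_changes()
        return bool(deps & self.changed_inputs)


def defer_resource_level(resource: PlannedResource, referenced: list[str]) -> bool:
    """Behavior described in this issue: any pending change defers the read."""
    return resource.has_pending_changes()


def defer_attribute_level(resource: PlannedResource, referenced: list[str]) -> bool:
    """Proposed behavior: defer only if a referenced attribute may change."""
    return any(resource.output_is_unknown(attr) for attr in referenced)


# Only property2 changes, and output1 depends only on property1.
a = PlannedResource(
    address="foo_thing_a.a",
    changed_inputs={"property2"},
    output_depends_on={"output1": {"property1"}},
)
print(defer_resource_level(a, ["output1"]))   # True
print(defer_attribute_level(a, ["output1"]))  # False
```

The fallback in `output_is_unknown` is a deliberate choice in this sketch: a provider that declares nothing would keep today's pessimistic behavior, so the change would be opt-in per attribute.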
References
Here are some examples of AWS grappling with this issue and trying the various workarounds.
hashicorp/terraform-provider-aws#30085 (demonstrates the side effect)
hashicorp/terraform-provider-aws#41292 (workaround 2)
hashicorp/terraform-provider-aws#43405 (workaround 2)
hashicorp/terraform-provider-aws#43436 (workaround 3)
hashicorp/terraform-provider-aws#43706 (my suggestion to AWS to roll back workaround 2)