Skip to content

Conversation

@nyergler
Copy link
Member

This commit adds a task definition for ingestion. The task definition ARN is injected into the environments of tasks that require it. This is in preparation for supporting isolated ingestion for on premises deployments.

This commit adds a task definition for ingestion. The task definition
ARN is injected into the environments of tasks that require it. This
is in preparation for supporting isolated ingestion for on premises
deployments.
@nyergler nyergler requested review from jc and jpalawaga October 17, 2025 06:10
@nyergler
Copy link
Member Author

See polytomic/app#12381.

@jc
Copy link

jc commented Oct 17, 2025

@codex review

@jc
Copy link

jc commented Oct 17, 2025

@greptile review

Copy link
Member

@jpalawaga jpalawaga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm no expert with this stuff, though it seems like mostly a cmd-c cmd-v of the existing service definitions, which makes sense to me.

scheduler_log_group = module.ecs_log_groups["scheduler"].cloudwatch_log_group_name
env = merge(local.environment.env, {
INGEST_EXECUTOR_TASK_DEFINITION = aws_ecs_task_definition.ingest.arn,
INGEST_EXECUTOR_CONTAINER_NAME = "ingest"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, but where are these used?

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR adds infrastructure support for isolated data ingestion in Polytomic's on-premises ECS deployments. The changes introduce a new "ingest" task definition that follows the existing service patterns, complete with logging, resource configuration, and environment variable management. The ingest task definition ARN is injected into worker, sync, and scheduler services via INGEST_EXECUTOR_TASK_DEFINITION and INGEST_EXECUTOR_CONTAINER_NAME environment variables, enabling these services to dynamically spawn dedicated ingestion containers using ECS Run Task API. This architectural pattern separates compute-intensive ingestion workloads from core services, improving resource isolation and scaling flexibility. The implementation maintains consistency with existing task definitions while introducing new CPU/memory configuration variables with sensible defaults (2 vCPU, 8GB memory) that reflect the expected resource requirements for data ingestion operations.

Changed Files
Filename Score Overview
terraform/modules/ecs/logs.tf 5/5 Added "ingest" to the list of services requiring CloudWatch log groups
terraform/modules/ecs/main.tf 5/5 Added ingest memory configuration to the locals environment block
terraform/modules/ecs/README.md 5/5 Auto-generated documentation update for new ingest task definition and variables
terraform/modules/ecs/vars.tf 5/5 Added CPU and memory configuration variables for the ingest container
terraform/modules/ecs/task-definitions/ingest.json.tftpl 4/5 New ECS task definition template following established patterns with proper configuration
terraform/modules/ecs/ecs-tasks.tf 2/5 Added ingest task definition but missing corresponding ECS service to deploy it

Confidence score: 2/5

  • This PR introduces a potentially incomplete implementation that may not function as intended
  • Score lowered due to missing ECS service for the new ingest task definition, making the task unusable in practice, and potential environment variable merging issues in task definitions
  • Pay close attention to terraform/modules/ecs/ecs-tasks.tf which defines the task but provides no deployment mechanism

Sequence Diagram

sequenceDiagram
    participant User
    participant Terraform
    participant AWS_ECS
    participant Task_Definitions as Task Definitions
    participant ECS_Services as ECS Services
    participant Environment as Environment Variables

    User->>Terraform: "Apply terraform configuration"
    Terraform->>Task_Definitions: "Create ingest task definition"
    Task_Definitions->>AWS_ECS: "Register aws_ecs_task_definition.ingest"
    
    Note over Task_Definitions: Task definition includes:<br/>- CPU: polytomic_resource_ingest_cpu<br/>- Memory: polytomic_resource_ingest_memory<br/>- Container: ingest.json.tftpl
    
    Terraform->>Environment: "Set INGEST_EXECUTOR_TASK_DEFINITION"
    Environment->>Task_Definitions: "Inject ingest task ARN into worker"
    Environment->>Task_Definitions: "Inject ingest task ARN into sync"
    Environment->>Task_Definitions: "Inject ingest task ARN into scheduler"
    
    Note over Environment: Environment variables added:<br/>INGEST_EXECUTOR_TASK_DEFINITION<br/>INGEST_EXECUTOR_CONTAINER_NAME
    
    Terraform->>ECS_Services: "Update worker service with ingest task ARN"
    Terraform->>ECS_Services: "Update sync service with ingest task ARN"
    Terraform->>ECS_Services: "Update scheduler service with ingest task ARN"
    
    AWS_ECS->>User: "Ingest task definition ready for isolated ingestion"
Loading

Additional Comments (1)

  1. terraform/modules/ecs/ecs-tasks.tf, line 233-274 (link)

    logic: Missing ECS service for ingest task definition. The task definition is created but no aws_ecs_service resource exists to run it, making the task unreachable.

6 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

Comment on lines +72 to +75
env = merge(local.environment.env, {
INGEST_EXECUTOR_TASK_DEFINITION = aws_ecs_task_definition.ingest.arn,
INGEST_EXECUTOR_CONTAINER_NAME = "ingest"
})
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Environment variable structure inconsistency. This creates env.env nesting which differs from other tasks that use flat environment variables.

Prompt To Fix With AI
This is a comment left during a code review.
Path: terraform/modules/ecs/ecs-tasks.tf
Line: 72:75

Comment:
**logic:** Environment variable structure inconsistency. This creates `env.env` nesting which differs from other tasks that use flat environment variables.

How can I resolve this? If you propose a fix, please make it concise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants