gerhard commented Jan 3, 2026

This started as a Fly.io community thread,
https://community.fly.io/t/response-body-timing-out-in-iad-ewr-all-good-in-13-other-regions/26727/3?u=gerhard, and implements mayailrus' suggestion to fix the incorrect combination of http_service + services in fly.toml.

Before this change (all instances were showing the same config):

flyctl machine status d899761c20de38 -a cdn-2025-02-25 --display-config | jq .services
[
  {
    "protocol": "tcp",
    "ports": [
      {
        "port": 80,
        "handlers": [
          "http"
        ],
        "http_options": {
          "idle_timeout": 60
        }
      },
      {
        "port": 443,
        "handlers": [
          "http",
          "tls"
        ],
        "http_options": {
          "idle_timeout": 60
        }
      }
    ],
    "force_instance_key": null
  },
  {
    "protocol": "tcp",
    "internal_port": 9000,
    "ports": [
      {
        "port": 443,
        "handlers": [
          "tls",
          "http"
        ]
      },
      {
        "port": 80,
        "handlers": [
          "http"
        ]
      }
    ],
    "checks": [
      {
        "type": "http",
        "interval": "5s",
        "timeout": "3s",
        "grace_period": "5s",
        "method": "get",
        "path": "/health",
        "protocol": "http"
      }
    ],
    "concurrency": {
      "type": "requests",
      "hard_limit": 2700,
      "soft_limit": 2000
    },
    "force_instance_key": null
  }
]

After this change:

[
  {
    "protocol": "tcp",
    "internal_port": 9000,
    "ports": [
      {
        "port": 80,
        "handlers": [
          "http"
        ],
        "http_options": {
          "idle_timeout": 60
        }
      },
      {
        "port": 443,
        "handlers": [
          "http",
          "tls"
        ],
        "http_options": {
          "idle_timeout": 60
        }
      }
    ],
    "checks": [
      {
        "type": "http",
        "interval": "5s",
        "timeout": "3s",
        "grace_period": "5s",
        "method": "get",
        "path": "/health"
      }
    ],
    "concurrency": {
      "type": "requests",
      "hard_limit": 2700,
      "soft_limit": 2000
    },
    "force_instance_key": null
  }
]

FWIW, even though this is a good & necessary change, the issue that we were observing fixed itself BEFORE this change was applied 🤷 https://github.com/thechangelog/pipely/actions/runs/20672953558/job/59356750823

As we rolled out this manifest change, we also scaled down instance memory, since Varnish sips memory after the introduction of cache file storage in #44. The busiest HOT instance briefly peaked at 3.2GB, but otherwise HOT instances stay within the 1GB limit.

Also, make the just scale command operate on both HOT & COLD instances so that sizing is deterministic. While this results in unnecessary restarts for instances, it ensures that ALL instances are in the expected state following a right-sizing operation. As this is meant to be run ad-hoc, typically once or twice during the lifespan of a deployment (~6-9 months), the extra instance churn is OK.

Lastly, cdn-2025-02-25 is scaled down to the 4 primary regions so that it serves the last clients that did not pick up the DNS change. The instance will be deleted in the next few days; this change is the first step towards that state.

Summary by CodeRabbit

  • New Features

    • Improved regional scaling logic to better allocate resources for hot vs. cold regions.
  • Infrastructure & Deployment

    • Promote and adopt the new cdn-2025-12-06 fleet: new machine listings, updated deployment structure, and adjusted concurrency limits.
    • Right-sized memory defaults for regions (reduced default/hot memory) and removed legacy deployment data for the older CDN.
  • Documentation

    • Roadmap updated with the cdn-2025-12-06 identifier, a follow-up right-sizing/cleanup task, and clarified URL-caching note formatting.

coderabbitai bot commented Jan 3, 2026

Walkthrough

Removes Fly.io configs and metadata for cdn-2025-02-25, adds/updates cdn-2025-12-06 deployment files and machine/region data, adjusts default memory values, and changes the Justfile scaling logic and invocation style. README entries and formatting updated.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| README updates: README.md | Reformatted production promotion to use a fenced CDN identifier, added "Right-size cdn-2025-12-06 + cleanup" (PR #49), and adjusted URL-caching note formatting and ordering. |
| Removed old instance configs: fly.io/cdn-2025-02-25/ (fly.io/cdn-2025-02-25/.envrc, .../fly.toml, .../justfile, .../machines.txt, .../regions.txt, .../sizes.txt) | Deleted the cdn-2025-02-25 environment file, fly.toml, Justfile, machines listing, regions metadata, and size definitions; all old instance config/data removed. |
| New/updated instance env: fly.io/cdn-2025-12-06/.envrc | Reduced memory defaults: FLY_APP_REGION_DEFAULT_MEM 4096 → 2048 and FLY_APP_REGION_HOT_MEM 8192 → 4096. |
| Restructured Fly.io config: fly.io/cdn-2025-12-06/fly.toml | Replaced top-level [[services]] with http_service (internal_port, force_https), moved concurrency and checks under http_service, removed explicit global port mappings, and adjusted formatting. |
| Machines & regions data: fly.io/cdn-2025-12-06/machines.txt, fly.io/cdn-2025-12-06/regions.txt | Added a new machines listing (~15 entries) and simplified/updated region metadata (NAME, CODE, CAPACITY), removing gateway/GPUs/launch-plan columns and updating capacities. |
| Scaling recipe changes: just/fly.just | Inverted HOT-region condition (HOT path now triggers when HOT is present), HOT path applies VM sizing and sets VARNISH_SIZE to 70% of HOT mem (and computes VARNISH_FILE_SIZE), COLD remains fallback; removed @ prefix from just machines invocation. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped through lines of config and text,
Deleted old nests and set the new ones next,
Memory trimmed and regions counted fair,
HOT now first, COLD waits with tidy care,
Promote the CDN — nibble, prune, prepare! 🥕✨

Pre-merge checks

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'Right-size cdn-2025-12-06 + cleanup' accurately summarizes the main changes: memory scaling adjustments for the cdn-2025-12-06 instance and cleanup of the deprecated cdn-2025-02-25 configuration. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3577c27 and 9081a6f.

📒 Files selected for processing (12)
  • README.md
  • fly.io/cdn-2025-02-25/.envrc
  • fly.io/cdn-2025-02-25/fly.toml
  • fly.io/cdn-2025-02-25/justfile
  • fly.io/cdn-2025-02-25/machines.txt
  • fly.io/cdn-2025-02-25/regions.txt
  • fly.io/cdn-2025-02-25/sizes.txt
  • fly.io/cdn-2025-12-06/.envrc
  • fly.io/cdn-2025-12-06/fly.toml
  • fly.io/cdn-2025-12-06/machines.txt
  • fly.io/cdn-2025-12-06/regions.txt
  • just/fly.just
💤 Files with no reviewable changes (6)
  • fly.io/cdn-2025-02-25/fly.toml
  • fly.io/cdn-2025-02-25/.envrc
  • fly.io/cdn-2025-02-25/sizes.txt
  • fly.io/cdn-2025-02-25/regions.txt
  • fly.io/cdn-2025-02-25/justfile
  • fly.io/cdn-2025-02-25/machines.txt
🚧 Files skipped from review as they are similar to previous changes (2)
  • fly.io/cdn-2025-12-06/.envrc
  • fly.io/cdn-2025-12-06/fly.toml
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2026-01-03T13:22:23.700Z
Learnt from: gerhard
Repo: thechangelog/pipely PR: 49
File: just/fly.just:20-20
Timestamp: 2026-01-03T13:22:23.700Z
Learning: Fly.io region codes are always exactly 3 characters long and never overlap, so simple regex matching with `[[ {{ FLY_APP_REGIONS_HOT }} =~ $region ]]` is safe and does not require delimiter-based matching to prevent partial matches.

Applied to files:

  • just/fly.just
📚 Learning: 2025-12-30T13:45:37.201Z
Learnt from: gerhard
Repo: thechangelog/pipely PR: 44
File: just/hurl.just:13-17
Timestamp: 2025-12-30T13:45:37.201Z
Learning: In Pipely's Justfiles, avoid defining color variables (GREEN, NORMAL, MAGENTA, BOLD, WHITE, CYAN, YELLOW, BLUE). Rely on Just's default color handling so output remains uncolored when colors aren’t defined. Do not hardcode empty strings or override defaults; let the tool manage color defaults for consistent behavior across Justfiles.

Applied to files:

  • just/fly.just
📚 Learning: 2025-12-30T17:03:18.040Z
Learnt from: gerhard
Repo: thechangelog/pipely PR: 44
File: just/fly.just:94-98
Timestamp: 2025-12-30T17:03:18.040Z
Learning: In Fly.io deployments, create volumes only in the primary region and do not duplicate volume creation across multiple regions. Do not rely on regional replication for volumes. If you need global configuration, set environment variables (e.g., VARNISH_FILE_SIZE) once in a global scope so they apply to all machines, rather than per-region or per-instance.

Applied to files:

  • just/fly.just
📚 Learning: 2025-12-30T13:54:21.688Z
Learnt from: gerhard
Repo: thechangelog/pipely PR: 44
File: fly.io/cdn-2025-02-05/regions.txt:33-33
Timestamp: 2025-12-30T13:54:21.688Z
Learning: Preserve the exact verbatim content of CLI tool outputs (e.g., regions.txt) as produced by the tool. Do not normalize spelling, spacing, or formatting. Treat these files as canonical records of command output; document the source command (and timestamp, if relevant) in a brief header or comment if possible. This guideline applies to any file named regions.txt across the repository.

Applied to files:

  • fly.io/cdn-2025-12-06/regions.txt
🪛 LanguageTool
fly.io/cdn-2025-12-06/regions.txt

[uncategorized] ~28-~28: Did you mean “São Paulo” (= city in Brazil)?
Context: ... South America Sao Paulo, Brazil gru 1707

(SAO_PAOLO)

🔇 Additional comments (6)
README.md (1)

22-22: Documentation updates look good.

The changes enhance consistency and clarity:

  • Line 22: Backticks around cdn-2025-12-06 make the identifier more visually distinct and align with code-style formatting conventions.
  • Line 25: New entry for PR #49 (Right-size cdn-2025-12-06 + cleanup) is properly placed in the v1.2 roadmap section, references the correct PR, and accurately reflects the PR objective.
  • Line 27: Word reordering improves sentence readability without changing the intent.

Also applies to: 25-25, 27-27

fly.io/cdn-2025-12-06/machines.txt (1)

1-21: LGTM — healthy machine deployment snapshot.

All 15 machines are in a healthy state (started, 1/1 checks passing) across the expected regions. The image version (v1.1.0) is consistent, and the mix of performance-1x and performance-2x instances aligns with the PR's memory-scaling adjustments. This looks like an accurate point-in-time snapshot of the deployment.

fly.io/cdn-2025-12-06/regions.txt (1)

1-30: Verify regional capacity data reflects current Fly environment.

The header simplification (removing GATEWAY, GPUS, LAUNCH PLAN+) looks correct, and capacity values have been updated across all regions. Negative values (e.g., Ashburn: −1310, Tokyo: −252) are normal in Fly and indicate regions at or over capacity with pending allocations.

Note: Line 29 shows "Sao Paulo, Brazil"—static analysis suggests this should be "São Paulo". Per learnings, CLI tool outputs should be preserved verbatim. Please confirm whether the Fly CLI output uses the unaccented spelling, and correct if needed to match the canonical tool output format.

just/fly.just (3)

17-17: LGTM: Improved visibility for ad-hoc operations.

Removing the @ prefix allows the just machines command to be echoed during execution, providing better visibility into scale and restart operations. This aligns well with the PR's intent for these commands to be used ad-hoc.

Also applies to: 79-79


20-28: LGTM: HOT region configuration is correct.

The HOT region path correctly identifies machines in hot regions and applies the appropriate resource specifications. Both VARNISH_SIZE (70% of memory) and VARNISH_FILE_SIZE (90% of disk) are properly configured for optimal cache performance.


36-36: LGTM: Consistency improvement for COLD regions.

Adding VARNISH_FILE_SIZE to the COLD region path ensures all machines have uniform file-based cache configuration, matching the HOT region setup. This aligns with the PR's goal of enabling Varnish cache file storage across all instances.


coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
just/fly.just (1)

20-27: Consider more precise region matching logic.

The regex match [[ {{ FLY_APP_REGIONS_HOT }} =~ $region ]] works for the current 3-letter region codes but could theoretically produce false positives (e.g., if a region "wr" existed, it would match "ewr" in the HOT regions string).

🔎 More robust region matching

Consider using a more precise match that checks for word boundaries:

-        if [[ {{ FLY_APP_REGIONS_HOT }} =~ $region ]]; then \
+        if [[ ",{{ FLY_APP_REGIONS_HOT }}," =~ ,$region, ]]; then \

This ensures the region is matched as a complete comma-delimited token rather than a substring.
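
The difference between the two matches can be reproduced in plain bash. This is only an illustrative sketch, not the recipe from just/fly.just: the `hot_regions` value, the comma separator, the function names, and the made-up `wr` region are all assumptions for the example.

```bash
#!/usr/bin/env bash
# Assumed value: a comma-separated list of HOT region codes.
hot_regions="iad,ewr,sjc"

# Substring-style match, as in the current recipe: the unquoted right-hand
# side is a regex, so it matches anywhere inside the string.
matches_substring() {
  local region="$1"
  [[ $hot_regions =~ $region ]]
}

# Token-style match, as in the suggestion: wrapping both sides in commas
# only matches a complete list entry.
matches_token() {
  local region="$1"
  [[ ",$hot_regions," =~ ,$region, ]]
}

for region in ewr wr ams; do
  matches_substring "$region" && s=yes || s=no
  matches_token "$region" && t=yes || t=no
  echo "$region substring=$s token=$t"
done
```

With these sample values only the hypothetical `wr` differs between the two forms: it matches as a substring of `ewr` but not as a token. For real three-letter codes such as `ewr` and `ams`, both forms agree.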

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a76a0e1 and b0f75c8.

📒 Files selected for processing (12)
  • README.md
  • fly.io/cdn-2025-02-25/.envrc
  • fly.io/cdn-2025-02-25/fly.toml
  • fly.io/cdn-2025-02-25/justfile
  • fly.io/cdn-2025-02-25/machines.txt
  • fly.io/cdn-2025-02-25/regions.txt
  • fly.io/cdn-2025-02-25/sizes.txt
  • fly.io/cdn-2025-12-06/.envrc
  • fly.io/cdn-2025-12-06/fly.toml
  • fly.io/cdn-2025-12-06/machines.txt
  • fly.io/cdn-2025-12-06/regions.txt
  • just/fly.just
💤 Files with no reviewable changes (6)
  • fly.io/cdn-2025-02-25/justfile
  • fly.io/cdn-2025-02-25/.envrc
  • fly.io/cdn-2025-02-25/regions.txt
  • fly.io/cdn-2025-02-25/sizes.txt
  • fly.io/cdn-2025-02-25/fly.toml
  • fly.io/cdn-2025-02-25/machines.txt

🔇 Additional comments (7)
fly.io/cdn-2025-12-06/fly.toml (2)

26-31: LGTM: Health check migration is correct.

The relocation of health checks under http_service.checks and removal of the protocol field aligns with the fix described in the PR objectives. The check parameters remain intact.


13-24: The http_service configuration is correct.

The http_service block with internal_port = 9000 is the proper way to configure HTTP/HTTPS on Fly.io. This simplified syntax automatically exposes ports 80 and 443 externally and routes traffic to the specified internal_port—no explicit port definitions needed. The migration from [[services]] to http_service is appropriate for this use case.

fly.io/cdn-2025-12-06/.envrc (1)

11-14: LGTM: Memory reduction is well-justified.

The 50% memory reduction for both DEFAULT (4096→2048 MB) and HOT (8192→4096 MB) instances is appropriate given the observed usage patterns mentioned in the PR (peak 3.2 GB for HOT, typically within 1 GB limit). The new allocations provide adequate headroom above observed peaks.

README.md (1)

22-27: LGTM: Documentation updates are clear and consistent.

The updates accurately reflect the current PR (adding the "Right-size cdn-2025-12-06 + cleanup" task) and improve consistency by using backticks for the instance name reference.

fly.io/cdn-2025-12-06/machines.txt (1)

1-21: LGTM: Machines snapshot reflects correct HOT/COLD sizing.

The machines listing accurately documents the deployment state with HOT regions (sjc, dfw, ord, iad, ewr, fra, sin, nrt) running performance-2x:4096MB instances and COLD regions running performance-1x:2048MB instances, consistent with the environment configuration in .envrc.
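
The HOT/COLD split described in this comment amounts to a lookup from region code to machine size. A minimal bash sketch of that lookup follows, using the region list and sizes quoted above. The function name `size_for_region` is illustrative and not taken from just/fly.just, and `ams` merely stands in for any region outside the HOT list.

```bash
#!/usr/bin/env bash
# HOT regions as listed in the machines snapshot comment.
hot_regions="sjc dfw ord iad ewr fra sin nrt"

# Print the expected machine size for a region code.
# Padding both sides with spaces makes the pattern match whole entries only.
size_for_region() {
  case " $hot_regions " in
    *" $1 "*) echo "performance-2x:4096MB" ;;
    *)        echo "performance-1x:2048MB" ;;
  esac
}

for region in iad nrt ams; do
  echo "$region -> $(size_for_region "$region")"
done
```

Because every region resolves to exactly one size, running the same lookup over all machines yields the same result regardless of their previous state.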

just/fly.just (1)

17-17: LGTM: Improved visibility for scale and restart operations.

Removing the @ silent prefix from just machines calls makes the machine listings visible during execution, improving observability for these ad-hoc operations.

Also applies to: 77-77

fly.io/cdn-2025-12-06/regions.txt (1)

29-29: Preserve CLI output format (do not normalize "Sao Paulo").

Static analysis flagged "Sao Paulo" vs "São Paulo", but based on learnings from PR #44, regions.txt files should preserve the exact verbatim output from the Fly CLI tool, including spelling and formatting. This ensures the file remains a canonical record of the command output.

However, consider adding a brief header comment documenting the source command and timestamp (e.g., # Generated by: fly regions or similar), as suggested in the learnings. This improves maintainability and clarifies the file's provenance without altering the actual data.

gerhard force-pushed the right-size-cdn-2025-12-06+cleanup branch from b0f75c8 to 3577c27 on January 3, 2026 11:51

coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
just/fly.just (1)

20-36: Add VARNISH_FILE_SIZE to the scale command's machine updates.

The scale command updates machines with VARNISH_SIZE but doesn't set VARNISH_FILE_SIZE. The deploy command correctly sets it (line 11). Add --env VARNISH_FILE_SIZE=$(({{ FLY_APP_DISK_SIZE }} * 90 / 100))G to both machine update calls (lines 22-27 for HOT regions and lines 30-35 for COLD regions) to ensure scaled machines have the correct file cache configuration.

Current scale command missing VARNISH_FILE_SIZE
          just flyctl machine update --yes \
            --vm-cpu-kind {{ FLY_APP_REGION_HOT_CPU_KIND }} \
            --vm-cpus {{ FLY_APP_REGION_HOT_CPU_NUM }} \
            --vm-memory {{ FLY_APP_REGION_HOT_MEM }} \
            --env VARNISH_SIZE=$(({{ FLY_APP_REGION_HOT_MEM }} * 70 / 100))M \
            "$machine"; \

Should include: --env VARNISH_FILE_SIZE=$(({{ FLY_APP_DISK_SIZE }} * 90 / 100))G
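
The sizing expressions in these flags are plain shell integer arithmetic. The sketch below reproduces them outside of just. The 4096 MB figure is the HOT memory value from this PR, while `disk_size_gb=50` is a made-up placeholder, since the real FLY_APP_DISK_SIZE is not shown here; the helper function names are also illustrative.

```bash
#!/usr/bin/env bash
# HOT memory in MB, as set in this PR's .envrc change.
hot_mem_mb=4096
# Hypothetical disk size in GB; the actual FLY_APP_DISK_SIZE is not shown in this PR.
disk_size_gb=50

# Same expressions as in the recipe: integer arithmetic, so results round down.
varnish_size() { echo "$(( $1 * 70 / 100 ))M"; }
varnish_file_size() { echo "$(( $1 * 90 / 100 ))G"; }

echo "VARNISH_SIZE=$(varnish_size "$hot_mem_mb")"
echo "VARNISH_FILE_SIZE=$(varnish_file_size "$disk_size_gb")"
```

Because `$(( ))` truncates, 70% of 4096 comes out as 2867M rather than 2867.2M.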

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b0f75c8 and 3577c27.

📒 Files selected for processing (12)
  • README.md
  • fly.io/cdn-2025-02-25/.envrc
  • fly.io/cdn-2025-02-25/fly.toml
  • fly.io/cdn-2025-02-25/justfile
  • fly.io/cdn-2025-02-25/machines.txt
  • fly.io/cdn-2025-02-25/regions.txt
  • fly.io/cdn-2025-02-25/sizes.txt
  • fly.io/cdn-2025-12-06/.envrc
  • fly.io/cdn-2025-12-06/fly.toml
  • fly.io/cdn-2025-12-06/machines.txt
  • fly.io/cdn-2025-12-06/regions.txt
  • just/fly.just
💤 Files with no reviewable changes (6)
  • fly.io/cdn-2025-02-25/.envrc
  • fly.io/cdn-2025-02-25/sizes.txt
  • fly.io/cdn-2025-02-25/fly.toml
  • fly.io/cdn-2025-02-25/regions.txt
  • fly.io/cdn-2025-02-25/justfile
  • fly.io/cdn-2025-02-25/machines.txt
🚧 Files skipped from review as they are similar to previous changes (3)
  • README.md
  • fly.io/cdn-2025-12-06/machines.txt
  • fly.io/cdn-2025-12-06/.envrc

🔇 Additional comments (7)
fly.io/cdn-2025-12-06/regions.txt (1)

1-29: Clarify the negative capacity values in the regions table.

The file shows several regions with negative CAPACITY values (Tokyo: -252, Ashburn: -1310, Secaucus: -56). This is unusual and warrants clarification: do these represent reserved or over-committed capacity, data from a constrained time window, or a configuration issue?

Please confirm that these negative values are intentional and correctly reflect the current deployment state, especially given the PR's objective to scale down to the four primary regions.

fly.io/cdn-2025-12-06/fly.toml (4)

13-15: Verify force_https = false is intentional for a CDN service.

For a CDN, forcing HTTPS is typically a security best practice. Setting force_https = false allows unencrypted HTTP traffic. Please confirm this is intentional and not a configuration oversight.

If this setting is meant to allow HTTP traffic for specific compatibility reasons, consider documenting the rationale in a comment. Otherwise, consider changing to:

🔎 Suggested change to enforce HTTPS
 [http_service]
   internal_port = 9000
-  force_https = false
+  force_https = true

20-24: LGTM! Concurrency limits are well-documented.

The concurrency configuration appropriately aligns with Varnish's thread pool settings, and the comment clearly explains the rationale for the chosen values.


26-31: LGTM! Health check configuration is appropriate.

The health check parameters are well-balanced for quick failure detection without being overly aggressive. The migration to [[http_service.checks]] correctly removes the explicit protocol field since it's implicit in the http_service context.


33-35: LGTM! Mount configuration is correct.

The Varnish cache mount destination is properly configured for persistent file storage, which aligns with the PR's goal of enabling Varnish cache file storage to reduce memory usage.

just/fly.just (2)

17-17: Improved visibility for machine operations.

Removing the @ prefix makes the just machines command visible during execution, improving observability for these ad-hoc scaling operations.


77-77: Consistent visibility improvement.

This change mirrors the one at line 17, ensuring consistent command visibility across both scale and restart recipes.

This started as a Fly.io community thread,
https://community.fly.io/t/response-body-timing-out-in-iad-ewr-all-good-in-13-other-regions/26727/3?u=gerhard,
and implements mayailrus' suggestion to fix the incorrect combination
of `http_service` + `services` in fly.toml.

Before this change (all instances were showing the same config):

    flyctl machine status d899761c20de38 -a cdn-2025-02-25 --display-config | jq .services
    [
      {
        "protocol": "tcp",
        "ports": [
          {
            "port": 80,
            "handlers": [
              "http"
            ],
            "http_options": {
              "idle_timeout": 60
            }
          },
          {
            "port": 443,
            "handlers": [
              "http",
              "tls"
            ],
            "http_options": {
              "idle_timeout": 60
            }
          }
        ],
        "force_instance_key": null
      },
      {
        "protocol": "tcp",
        "internal_port": 9000,
        "ports": [
          {
            "port": 443,
            "handlers": [
              "tls",
              "http"
            ]
          },
          {
            "port": 80,
            "handlers": [
              "http"
            ]
          }
        ],
        "checks": [
          {
            "type": "http",
            "interval": "5s",
            "timeout": "3s",
            "grace_period": "5s",
            "method": "get",
            "path": "/health",
            "protocol": "http"
          }
        ],
        "concurrency": {
          "type": "requests",
          "hard_limit": 2700,
          "soft_limit": 2000
        },
        "force_instance_key": null
      }
    ]

After this change:

    [
      {
        "protocol": "tcp",
        "internal_port": 9000,
        "ports": [
          {
            "port": 80,
            "handlers": [
              "http"
            ],
            "http_options": {
              "idle_timeout": 60
            }
          },
          {
            "port": 443,
            "handlers": [
              "http",
              "tls"
            ],
            "http_options": {
              "idle_timeout": 60
            }
          }
        ],
        "checks": [
          {
            "type": "http",
            "interval": "5s",
            "timeout": "3s",
            "grace_period": "5s",
            "method": "get",
            "path": "/health"
          }
        ],
        "concurrency": {
          "type": "requests",
          "hard_limit": 2700,
          "soft_limit": 2000
        },
        "force_instance_key": null
      }
    ]

FWIW, even though this is a good & necessary change, the issue that we
were observing fixed itself BEFORE this change was applied 🤷
https://github.com/thechangelog/pipely/actions/runs/20672953558/job/59356750823

As we rolled out this manifest change, we also scaled down instance
memory, since Varnish sips memory after the introduction of cache file
storage (#44). The busiest HOT instance briefly peaked at 3.2GB, but
otherwise HOT instances stay within the 1GB limit.

Also, make the `just scale` command operate on **both** HOT & COLD
instances so that sizing is deterministic. While this results in
unnecessary restarts for instances, it ensures that ALL instances are
in the expected state following a right-sizing operation. As this is
meant to be run ad-hoc, typically once or twice during the lifespan of a
deployment (~6-9 months), the extra instance churn is OK.

Lastly, `cdn-2025-02-25` is scaled down to the 4 primary regions so that
it serves the last clients that did not pick up the DNS change. The
instance will be deleted in the next few days; this change is the first
step towards that state.

Signed-off-by: Gerhard Lazu <[email protected]>
gerhard force-pushed the right-size-cdn-2025-12-06+cleanup branch from 3577c27 to 9081a6f on January 3, 2026 13:37