-
Notifications
You must be signed in to change notification settings - Fork 2
Right-size cdn-2025-12-06 + cleanup #49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughRemoves Fly.io configs and metadata for cdn-2025-02-25, adds/updates cdn-2025-12-06 deployment files and machine/region data, adjusts default memory values, and changes the Justfile scaling logic and invocation style. README entries and formatting updated. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
Pre-merge checks✅ Passed checks (3 passed)
📜 Recent review detailsConfiguration used: defaults Review profile: CHILL Plan: Pro 📒 Files selected for processing (12)
💤 Files with no reviewable changes (6)
🚧 Files skipped from review as they are similar to previous changes (2)
🧰 Additional context used🧠 Learnings (4)📚 Learning: 2026-01-03T13:22:23.700ZApplied to files:
📚 Learning: 2025-12-30T13:45:37.201ZApplied to files:
📚 Learning: 2025-12-30T17:03:18.040ZApplied to files:
📚 Learning: 2025-12-30T13:54:21.688ZApplied to files:
🪛 LanguageToolfly.io/cdn-2025-12-06/regions.txt[uncategorized] ~28-~28: Did you mean “São Paulo” (= city in Brazil)? (SAO_PAOLO) 🔇 Additional comments (6)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
just/fly.just (1)
20-27: Consider more precise region matching logic.The regex match
[[ {{ FLY_APP_REGIONS_HOT }} =~ $region ]]works for the current 3-letter region codes but could theoretically produce false positives (e.g., if a region "wr" existed, it would match "ewr" in the HOT regions string).🔎 More robust region matching
Consider using a more precise match that checks for word boundaries:
- if [[ {{ FLY_APP_REGIONS_HOT }} =~ $region ]]; then \ + if [[ ",{{ FLY_APP_REGIONS_HOT }}," =~ ,$region, ]]; then \This ensures the region is matched as a complete comma-delimited token rather than a substring.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
README.mdfly.io/cdn-2025-02-25/.envrcfly.io/cdn-2025-02-25/fly.tomlfly.io/cdn-2025-02-25/justfilefly.io/cdn-2025-02-25/machines.txtfly.io/cdn-2025-02-25/regions.txtfly.io/cdn-2025-02-25/sizes.txtfly.io/cdn-2025-12-06/.envrcfly.io/cdn-2025-12-06/fly.tomlfly.io/cdn-2025-12-06/machines.txtfly.io/cdn-2025-12-06/regions.txtjust/fly.just
💤 Files with no reviewable changes (6)
- fly.io/cdn-2025-02-25/justfile
- fly.io/cdn-2025-02-25/.envrc
- fly.io/cdn-2025-02-25/regions.txt
- fly.io/cdn-2025-02-25/sizes.txt
- fly.io/cdn-2025-02-25/fly.toml
- fly.io/cdn-2025-02-25/machines.txt
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-12-30T13:54:21.688Z
Learnt from: gerhard
Repo: thechangelog/pipely PR: 44
File: fly.io/cdn-2025-02-05/regions.txt:33-33
Timestamp: 2025-12-30T13:54:21.688Z
Learning: Preserve the exact verbatim content of CLI tool outputs (e.g., regions.txt) as produced by the tool. Do not normalize spelling, spacing, or formatting. Treat these files as canonical records of command output; document the source command (and timestamp, if relevant) in a brief header or comment if possible. This guideline applies to any file named regions.txt across the repository.
Applied to files:
fly.io/cdn-2025-12-06/regions.txt
📚 Learning: 2025-12-30T17:03:18.040Z
Learnt from: gerhard
Repo: thechangelog/pipely PR: 44
File: just/fly.just:94-98
Timestamp: 2025-12-30T17:03:18.040Z
Learning: In Fly.io deployments, create volumes only in the primary region and do not duplicate volume creation across multiple regions. Do not rely on regional replication for volumes. If you need global configuration, set environment variables (e.g., VARNISH_FILE_SIZE) once in a global scope so they apply to all machines, rather than per-region or per-instance.
Applied to files:
just/fly.just
📚 Learning: 2025-12-30T13:45:37.201Z
Learnt from: gerhard
Repo: thechangelog/pipely PR: 44
File: just/hurl.just:13-17
Timestamp: 2025-12-30T13:45:37.201Z
Learning: In Pipely's Justfiles, avoid defining color variables (GREEN, NORMAL, MAGENTA, BOLD, WHITE, CYAN, YELLOW, BLUE). Rely on Just's default color handling so output remains uncolored when colors aren’t defined. Do not hardcode empty strings or override defaults; let the tool manage color defaults for consistent behavior across Justfiles.
Applied to files:
just/fly.just
🪛 LanguageTool
fly.io/cdn-2025-12-06/regions.txt
[uncategorized] ~28-~28: Did you mean “São Paulo” (= city in Brazil)?
Context: ... South America Sao Paulo, Brazil gru 1707
(SAO_PAOLO)
🔇 Additional comments (7)
fly.io/cdn-2025-12-06/fly.toml (2)
26-31: LGTM: Health check migration is correct.The relocation of health checks under
http_service.checksand removal of theprotocolfield aligns with the fix described in the PR objectives. The check parameters remain intact.
13-24: The http_service configuration is correct.The
http_serviceblock withinternal_port = 9000is the proper way to configure HTTP/HTTPS on Fly.io. This simplified syntax automatically exposes ports 80 and 443 externally and routes traffic to the specified internal_port—no explicit port definitions needed. The migration from[[services]]tohttp_serviceis appropriate for this use case.fly.io/cdn-2025-12-06/.envrc (1)
11-14: LGTM: Memory reduction is well-justified.The 50% memory reduction for both DEFAULT (4096→2048 MB) and HOT (8192→4096 MB) instances is appropriate given the observed usage patterns mentioned in the PR (peak 3.2 GB for HOT, typically within 1 GB limit). The new allocations provide adequate headroom above observed peaks.
README.md (1)
22-27: LGTM: Documentation updates are clear and consistent.The updates accurately reflect the current PR (adding the "Right-size
cdn-2025-12-06+ cleanup" task) and improve consistency by using backticks for the instance name reference.fly.io/cdn-2025-12-06/machines.txt (1)
1-21: LGTM: Machines snapshot reflects correct HOT/COLD sizing.The machines listing accurately documents the deployment state with HOT regions (sjc, dfw, ord, iad, ewr, fra, sin, nrt) running performance-2x:4096MB instances and COLD regions running performance-1x:2048MB instances, consistent with the environment configuration in
.envrc.just/fly.just (1)
17-17: LGTM: Improved visibility for scale and restart operations.Removing the
@silent prefix fromjust machinescalls makes the machine listings visible during execution, improving observability for these ad-hoc operations.Also applies to: 77-77
fly.io/cdn-2025-12-06/regions.txt (1)
29-29: Preserve CLI output format (do not normalize "Sao Paulo").Static analysis flagged "Sao Paulo" vs "São Paulo", but based on learnings from PR #44, regions.txt files should preserve the exact verbatim output from the Fly CLI tool, including spelling and formatting. This ensures the file remains a canonical record of the command output.
However, consider adding a brief header comment documenting the source command and timestamp (e.g.,
# Generated by: fly regionsor similar), as suggested in the learnings. This improves maintainability and clarifies the file's provenance without altering the actual data.
b0f75c8 to
3577c27
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
just/fly.just (1)
20-36: Add VARNISH_FILE_SIZE to the scale command's machine updates.The
scalecommand updates machines withVARNISH_SIZEbut doesn't setVARNISH_FILE_SIZE. Thedeploycommand correctly sets it (line 11). Add--env VARNISH_FILE_SIZE=$(({{ FLY_APP_DISK_SIZE }} * 90 / 100))Gto both machine update calls (lines 22-27 for HOT regions and lines 30-35 for COLD regions) to ensure scaled machines have the correct file cache configuration.Current scale command missing VARNISH_FILE_SIZE
just flyctl machine update --yes \ --vm-cpu-kind {{ FLY_APP_REGION_HOT_CPU_KIND }} \ --vm-cpus {{ FLY_APP_REGION_HOT_CPU_NUM }} \ --vm-memory {{ FLY_APP_REGION_HOT_MEM }} \ --env VARNISH_SIZE=$(({{ FLY_APP_REGION_HOT_MEM }} * 70 / 100))M \ "$machine"; \Should include:
--env VARNISH_FILE_SIZE=$(({{ FLY_APP_DISK_SIZE }} * 90 / 100))G
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
README.mdfly.io/cdn-2025-02-25/.envrcfly.io/cdn-2025-02-25/fly.tomlfly.io/cdn-2025-02-25/justfilefly.io/cdn-2025-02-25/machines.txtfly.io/cdn-2025-02-25/regions.txtfly.io/cdn-2025-02-25/sizes.txtfly.io/cdn-2025-12-06/.envrcfly.io/cdn-2025-12-06/fly.tomlfly.io/cdn-2025-12-06/machines.txtfly.io/cdn-2025-12-06/regions.txtjust/fly.just
💤 Files with no reviewable changes (6)
- fly.io/cdn-2025-02-25/.envrc
- fly.io/cdn-2025-02-25/sizes.txt
- fly.io/cdn-2025-02-25/fly.toml
- fly.io/cdn-2025-02-25/regions.txt
- fly.io/cdn-2025-02-25/justfile
- fly.io/cdn-2025-02-25/machines.txt
🚧 Files skipped from review as they are similar to previous changes (3)
- README.md
- fly.io/cdn-2025-12-06/machines.txt
- fly.io/cdn-2025-12-06/.envrc
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-12-30T17:03:18.040Z
Learnt from: gerhard
Repo: thechangelog/pipely PR: 44
File: just/fly.just:94-98
Timestamp: 2025-12-30T17:03:18.040Z
Learning: In Fly.io deployments, create volumes only in the primary region and do not duplicate volume creation across multiple regions. Do not rely on regional replication for volumes. If you need global configuration, set environment variables (e.g., VARNISH_FILE_SIZE) once in a global scope so they apply to all machines, rather than per-region or per-instance.
Applied to files:
just/fly.just
📚 Learning: 2025-12-30T13:45:37.201Z
Learnt from: gerhard
Repo: thechangelog/pipely PR: 44
File: just/hurl.just:13-17
Timestamp: 2025-12-30T13:45:37.201Z
Learning: In Pipely's Justfiles, avoid defining color variables (GREEN, NORMAL, MAGENTA, BOLD, WHITE, CYAN, YELLOW, BLUE). Rely on Just's default color handling so output remains uncolored when colors aren’t defined. Do not hardcode empty strings or override defaults; let the tool manage color defaults for consistent behavior across Justfiles.
Applied to files:
just/fly.just
📚 Learning: 2025-12-30T13:54:21.688Z
Learnt from: gerhard
Repo: thechangelog/pipely PR: 44
File: fly.io/cdn-2025-02-05/regions.txt:33-33
Timestamp: 2025-12-30T13:54:21.688Z
Learning: Preserve the exact verbatim content of CLI tool outputs (e.g., regions.txt) as produced by the tool. Do not normalize spelling, spacing, or formatting. Treat these files as canonical records of command output; document the source command (and timestamp, if relevant) in a brief header or comment if possible. This guideline applies to any file named regions.txt across the repository.
Applied to files:
fly.io/cdn-2025-12-06/regions.txt
🪛 LanguageTool
fly.io/cdn-2025-12-06/regions.txt
[uncategorized] ~28-~28: Did you mean “São Paulo” (= city in Brazil)?
Context: ... South America Sao Paulo, Brazil gru 1707
(SAO_PAOLO)
🔇 Additional comments (7)
fly.io/cdn-2025-12-06/regions.txt (1)
1-29: Clarify the negative capacity values in the regions table.The file shows several regions with negative CAPACITY values (Tokyo: -252, Ashburn: -1310, Secaucus: -56). This is unusual and warrants clarification: do these represent reserved or over-committed capacity, data from a constrained time window, or a configuration issue?
Please confirm that these negative values are intentional and correctly reflect the current deployment state, especially given the PR's objective to scale down to the four primary regions.
fly.io/cdn-2025-12-06/fly.toml (4)
13-15: Verifyforce_https = falseis intentional for a CDN service.For a CDN, forcing HTTPS is typically a security best practice. Setting
force_https = falseallows unencrypted HTTP traffic. Please confirm this is intentional and not a configuration oversight.If this setting is meant to allow HTTP traffic for specific compatibility reasons, consider documenting the rationale in a comment. Otherwise, consider changing to:
🔎 Suggested change to enforce HTTPS
[http_service] internal_port = 9000 - force_https = false + force_https = true
20-24: LGTM! Concurrency limits are well-documented.The concurrency configuration appropriately aligns with Varnish's thread pool settings, and the comment clearly explains the rationale for the chosen values.
26-31: LGTM! Health check configuration is appropriate.The health check parameters are well-balanced for quick failure detection without being overly aggressive. The migration to
[[http_service.checks]]correctly removes the explicit protocol field since it's implicit in the http_service context.
33-35: LGTM! Mount configuration is correct.The Varnish cache mount destination is properly configured for persistent file storage, which aligns with the PR's goal of enabling Varnish cache file storage to reduce memory usage.
just/fly.just (2)
17-17: Improved visibility for machine operations.Removing the
@prefix makes thejust machinescommand visible during execution, improving observability for these ad-hoc scaling operations.
77-77: Consistent visibility improvement.This change mirrors the one at line 17, ensuring consistent command visibility across both
scaleandrestartrecipes.
This started as https://community.fly.io/t/response-body-timing-out-in-iad-ewr-all-good-in-13-other-regions/26727/3?u=gerhard and implements mayailrus' suggestion re `http_service` + `services` incorrect combo. Before this change (all instances were showing the same config): flyctl machine status d899761c20de38 -a cdn-2025-02-25 --display-config | jq .services [ { "protocol": "tcp", "ports": [ { "port": 80, "handlers": [ "http" ], "http_options": { "idle_timeout": 60 } }, { "port": 443, "handlers": [ "http", "tls" ], "http_options": { "idle_timeout": 60 } } ], "force_instance_key": null }, { "protocol": "tcp", "internal_port": 9000, "ports": [ { "port": 443, "handlers": [ "tls", "http" ] }, { "port": 80, "handlers": [ "http" ] } ], "checks": [ { "type": "http", "interval": "5s", "timeout": "3s", "grace_period": "5s", "method": "get", "path": "/health", "protocol": "http" } ], "concurrency": { "type": "requests", "hard_limit": 2700, "soft_limit": 2000 }, "force_instance_key": null } ] After this change: [ { "protocol": "tcp", "internal_port": 9000, "ports": [ { "port": 80, "handlers": [ "http" ], "http_options": { "idle_timeout": 60 } }, { "port": 443, "handlers": [ "http", "tls" ], "http_options": { "idle_timeout": 60 } } ], "checks": [ { "type": "http", "interval": "5s", "timeout": "3s", "grace_period": "5s", "method": "get", "path": "/health" } ], "concurrency": { "type": "requests", "hard_limit": 2700, "soft_limit": 2000 }, "force_instance_key": null } ] FWIW, even though this is a good & necessary change, the issue that we were observing fixed itself BEFORE this change was applied 🤷 https://github.com/thechangelog/pipely/actions/runs/20672953558/job/59356750823 As we rolled out this manifest change, we also scaled down instance memory as Varnish sips memory after introducing cache file storage Busiest HOT instance had a short peak to 3.2GB, but otherwise HOT instances stay within the 1GB limit. Also, make the `just scale` command operate on **both** HOT & COLD instance so that sizing will be deterministic. While this will result in unnecessary restarts for instances, it ensures that ALL instances will be in the expected state following a right-sizing operation. As this is meant to be run ad-hoc, typically once or twice during the lifespan of a deployment (~6-9 months), the extra instance churn is OK. Lastly, `cdn-2025-02-25` is scaled down to the 4 primary regions so that it serves the last clients that did not pick up the DNS change. The instance will be deleted in the next few days, this change is the first step towards that state. Signed-off-by: Gerhard Lazu <[email protected]>
3577c27 to
9081a6f
Compare
This started as
https://community.fly.io/t/response-body-timing-out-in-iad-ewr-all-good-in-13-other-regions/26727/3?u=gerhard and implements mayailrus' suggestion re
http_service+servicesincorrect combo.Before this change (all instances were showing the same config):
After this change:
FWIW, even though this is a good & necessary change, the issue that we were observing fixed itself BEFORE this change was applied 🤷 https://github.com/thechangelog/pipely/actions/runs/20672953558/job/59356750823
As we rolled out this manifest change, we also scaled down instance memory as Varnish sips memory after introducing cache file storage #44. Busiest HOT instance had a short peak to 3.2GB, but otherwise HOT instances stay within the 1GB limit.
Also, make the
just scalecommand operate on both HOT & COLD instance so that sizing will be deterministic. While this will result in unnecessary restarts for instances, it ensures that ALL instances will be in the expected state following a right-sizing operation. As this is meant to be run ad-hoc, typically once or twice during the lifespan of a deployment (~6-9 months), the extra instance churn is OK.Lastly,
cdn-2025-02-25is scaled down to the 4 primary regions so that it serves the last clients that did not pick up the DNS change. The instance will be deleted in the next few days, this change is the first step towards that state.Summary by CodeRabbit
New Features
Infrastructure & Deployment
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.