-
Notifications
You must be signed in to change notification settings - Fork 12.7k
chore(CI): enable multi arch docker image build #37353
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Looks like this PR is ready to merge! 🎉 |
|
WalkthroughAdds multi-architecture Docker build and publish support (amd64, arm64) to CI: build actions now accept an Changes
Sequence Diagram(s)sequenceDiagram
participant GH as GitHub Actions
participant BD as build-docker (per-arch)
participant Buildx as buildx bake
participant Store as Actions Artifact Store
participant Pub as build-gh-docker-publish
participant Imagetools as docker buildx imagetools / skopeo
participant Registry as GHCR/DockerHub
GH->>BD: start job with matrix.arch (amd64, arm64)
BD->>Buildx: run `docker buildx bake` (arch=<arch>) -> generate metadata.json + digest
BD->>Store: upload digest artifact (per-arch)
par After all arch jobs finish
GH->>Pub: trigger publish job
Pub->>Store: download all per-arch digests
Pub->>Imagetools: assemble multi-arch manifest from digests
Imagetools->>Registry: push combined manifest & tags
Registry-->>Pub: confirm publish
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (4 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
935c975 to
b4c1a20
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #37353 +/- ##
========================================
Coverage 67.07% 67.08%
========================================
Files 3419 3419
Lines 117938 117938
Branches 21578 21574 -4
========================================
+ Hits 79111 79113 +2
- Misses 36134 36140 +6
+ Partials 2693 2685 -8
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
1602240 to
454e920
Compare
454e920 to
140d6d8
Compare
140d6d8 to
bf8721e
Compare
bf8721e to
e85a139
Compare
70b10fe to
34beeda
Compare
e85a139 to
6057fac
Compare
f4b8428 to
77f98f5
Compare
3464c6e to
0cadc31
Compare
95510ab to
0d2bc99
Compare
834beb8 to
e429ae8
Compare
0d2bc99 to
a3ad0b9
Compare
a3ad0b9 to
8096358
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
.github/workflows/ci.yml (1)
360-364: Fragile string manipulation for rocketchat-cov image name.Lines 360–364 use a string-based approach to determine the image name:
if [ "$service" == "rocketchat-cov" ]; then IMAGE=$(docker compose ... | jq ... "rocketchat") ... -cov else IMAGE=$(docker compose ... "service") fiThis assumes the directory name
rocketchat-covexactly maps to servicerocketchatwith a-covsuffix applied to the image. If the service name ever changes or if other services need similar coverage variants, this logic becomes brittle.Consider passing or deriving the service name and coverage flag more explicitly (e.g., via environment variables or structured metadata) rather than relying on directory name parsing.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Jira integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (5)
.github/actions/build-docker/action.yml(4 hunks).github/actions/meteor-build/action.yml(1 hunks).github/actions/setup-node/action.yml(1 hunks).github/workflows/ci.yml(10 hunks)docker-compose-ci.yml(8 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: CodeQL-Build
- GitHub Check: CodeQL-Build
🔇 Additional comments (12)
.github/actions/meteor-build/action.yml (1)
132-132: Approve artifact action upgrade to v4.The download-artifact action was upgraded from an earlier version to v6, consistent with the broader artifact handling improvements across the PR for multi-arch support.
.github/actions/setup-node/action.yml (1)
49-49: LGTM: Cache key now includes OS and architecture.Adding
runner.osandrunner.archto the cache key is essential for multi-arch builds to prevent cross-arch cache hits. This ensures ARM64 runners won't use AMD64-cached artifacts and vice versa.docker-compose-ci.yml (1)
5-5: Potential platform specification conflict between runtime constraint and bake config.Line 5 declares
platform: linux/amd64(runtime execution constraint), while lines 9–12 declare bothlinux/amd64andlinux/arm64in thex-bakeblock. The runtimeplatformattribute may override the bake configuration during docker compose operations. During buildx bake builds, the--set *.platform=linux/${{ inputs.arch }}override in build-docker action should take precedence, but this could cause confusion during local testing or non-bake builds.Please verify that the runtime
platformconstraints don't interfere with buildx bake builds by confirming:
- buildx bake respects
--set *.platformoverrides over docker-composeplatformattributes- Local docker compose up operations (if used outside CI) still respect the runtime platform constraint
Also applies to: 9-12
.github/actions/build-docker/action.yml (3)
16-19: LGTM: New arch input for per-architecture builds.The arch input with default 'amd64' is properly scoped for building specific architectures. This integrates well with the per-arch matrix in ci.yml.
78-91: LGTM: Docker daemon and buildx setup for multi-arch support.Docker setup with containerd-snapshotter enabled and buildx with garbage collection and storage limits is appropriate. The configuration enables efficient multi-arch builds with resource bounds.
106-106: Verify image name extraction and digest parsing robustness.Lines 106 and 126–128 rely on:
docker compose configoutput to extract the correct image name/tmp/meta.jsoncontaining a nested key structure[service][containerimage.digest]These operations are brittle if the docker-compose-ci.yml format changes or if buildx outputs differ from expectations. Additionally,
IMAGE_NO_TAGsed substitution (line 127) removes everything after the first colon, which assumes image names followrepo/name:tagformat without spaces or special characters.Can you confirm:
- What buildx version is used, and whether
containerimage.digestis the correct key for extracting the digest from/tmp/meta.json?- Whether the
IMAGEextraction from docker-compose-ci.yml always returns a valid image reference in the formatrepo/name:tag?Alternatively, add error handling or validation after extraction.
Also applies to: 126-128
.github/workflows/ci.yml (6)
270-270: Verify ARM64 runner availability and matrix coverage.Line 270 assumes
ubuntu-24.04-armis available as a runner for ARM64 builds. GitHub Actions doesn't provide native ARM64 runners with this naming convention; this appears to be a custom/self-hosted runner configuration. Additionally, the matrix at line 279 includes both amd64 and arm64, but only line 270 conditionally selects the arm64 runner.Confirm:
- Custom self-hosted runners named
ubuntu-24.04-armare registered in the repository settings- These runners are appropriately labeled and have sufficient resources for Docker builds
- Consider documenting this configuration requirement in the repository
Also applies to: 279-279, 285-287
273-273: DOCKER_TAG inconsistency between build and publish jobs.Line 273 appends
matrix.archtoDOCKER_TAG:
DOCKER_TAG: ${{ needs.release-versions.outputs.gh-docker-tag }}-${{ matrix.arch }}However, line 313 in the publish job resets it without arch:
DOCKER_TAG: ${{ needs.release-versions.outputs.gh-docker-tag }}This means:
- Per-arch images are tagged as
<base-tag>-amd64and<base-tag>-arm64- The manifest is tagged as
<base-tag>(without arch)- Test jobs receive the arch-less tag but will run against per-arch images via the manifest
Clarify whether this is intentional (multi-arch manifests with per-arch suffixes) or if per-arch tags should be removed after manifest creation to avoid confusion.
Also applies to: 313-313
335-339: Verify artifact merge behavior for digest collection.Line 335 downloads artifacts matching pattern
digests-*withmerge-multiple: true. This will collect digests from all per-arch, per-service, per-type builds. However, the path structure from build-docker (line 125) is/tmp/digests/${{ inputs.service }}${{ inputs.type == 'coverage' && '-cov' || '' }}/${{ inputs.arch }}/digest.txt.Confirm that the artifact names from line 133 (
digests-${{ inputs.service }}-${{ inputs.arch }}-${{ inputs.type }}) properly reconstruct the nested directory structure when merged, and that the manifest creation loop (lines 347–374) correctly traverses the merged directory layout.
714-718: LGTM: skopeo logic handles tag propagation correctly.The skopeo copy command with
--allflag and conditional--additional-tagarguments correctly handles multi-tag publishing. The logic ensures the first tag is primary and others are applied as additional tags. Array initialization and loop structure avoid edge cases.
410-410: LGTM: Test job dependencies updated to publish job.All test jobs (test-api, test-ui, test-api-ee, test-ui-ee, test-ui-ee-watcher) now depend on
build-gh-docker-publishinstead ofbuild-gh-docker, ensuring per-arch builds and manifest aggregation complete before testing begins. This is the correct dependency model for multi-arch workflows.Also applies to: 426-426, 451-451, 470-470, 497-497
565-565: LGTM: Deploy job updated to depend on publish job.The deploy job now correctly depends on
build-gh-docker-publish, ensuring Docker manifests are published before asset deployment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
.github/workflows/ci.yml (3)
343-375: Add error handling and validation to the manifest creation script.The shell script that creates multi-arch manifests lacks robustness:
- No validation of digest format: Lines 352–355 read digest files without validating their content or handling malformed files.
- Single point of failure: If
docker buildx imagetools createfails for one service (line 368), the entire script exits, preventing other services' manifests from being created.- Silent success on empty digests: If the download found no artifacts or the directory structure is unexpected, the loop iterates zero times without warning.
Consider adding:
- Validation that
/tmp/digestscontains expected service directories- Error handling per service with logging (continue on individual failures, aggregate results)
- Digest file format validation before attempting manifest creation
- Summary output showing which services succeeded/failed
Example improvement:
set -o xtrace shopt -s nullglob FAILED_SERVICES=() for service_dir in /tmp/digests/*; do [[ -d "$service_dir" ]] || continue service="$(basename "$service_dir")" echo "Creating manifest for $service" mapfile -t refs < <( find "$service_dir" -type f -name 'digest.txt' -print0 \ | xargs -0 -I{} sh -c "tr -d '\r' < '{}' | sed '/^[[:space:]]*$/d'" ) if [[ ${#refs[@]} -eq 0 ]]; then echo "WARNING: No digests found for $service" FAILED_SERVICES+=("$service") continue fi # ... rest of logic ... if ! docker buildx imagetools create ...; then echo "ERROR: Failed to create manifest for $service" FAILED_SERVICES+=("$service") fi done if [[ ${#FAILED_SERVICES[@]} -gt 0 ]]; then echo "Failed services: ${FAILED_SERVICES[*]}" exit 1 fi
359-365: Consider centralizing service-to-image name mapping for maintainability.The special case for
rocketchat-covat line 360–365 handles the image name discrepancy (service name ≠ image name), but this logic is:
- Fragile: If new services with similar naming conventions are added, the conditional logic must be updated.
- Fragile: Depends on
docker-compose-ci.ymlformat; no error handling if the jq query fails or service is not found.- Hard to extend: Future maintainers may not realize they need to add similar special cases.
Consider extracting the service-to-image mapping into a dedicated data structure or function, or documenting the mapping rule clearly as a comment. For example:
# Service to image name mapping (e.g., 'rocketchat' → 'rocket.chat') # Special handling: 'rocketchat-cov' → 'rocket.chat-cov' (coverage image suffix) if [ "$service" == "rocketchat-cov" ]; then base_service="rocketchat" image_suffix="-cov" else base_service="$service" image_suffix="" fi IMAGE=$(docker compose -f docker-compose-ci.yml config --format json 2>/dev/null | \ jq -r --arg s "$base_service" '.services[$s].image') || { echo "ERROR: Failed to resolve image for service $service" exit 1 } IMAGE="${IMAGE}${image_suffix}"
662-722: Add error handling and validation to the DockerHub publish step.The
docker-image-publishjob's refactor to useskopeo copy --allis sound for multi-arch images, but lacks resilience:
- No pre-flight validation: The job assumes the source image in GHCR exists (from
build-gh-docker-publish), but doesn't verify it before attempting the copy.- No retry logic: If
skopeo copyfails transiently, the entire publish fails without retry.- No error recovery: If the copy partially fails (e.g., one tag succeeds, another fails), there's no rollback or partial success tracking.
Consider adding:
# Validate source image exists if ! docker pull "${SRC}" 2>/dev/null; then echo "ERROR: Source image not found in GHCR: ${SRC}" exit 1 fi # Retry logic with backoff retry_count=0 max_retries=3 until skopeo copy --all \ "docker://${SRC}" \ "docker://${DEST_REPO}:${PRIMARY}" \ "${EXTRA_ARGS[@]}"; do retry_count=$((retry_count + 1)) if [[ $retry_count -ge $max_retries ]]; then echo "ERROR: Failed to copy image after $max_retries attempts" exit 1 fi echo "Retry $retry_count/$max_retries..." sleep $((2 ** retry_count)) done
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Jira integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
.github/actions/setup-node/action.yml(1 hunks).github/workflows/ci.yml(10 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/actions/setup-node/action.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
- GitHub Check: 🚢 Build Docker (amd64, rocketchat, coverage)
- GitHub Check: 🚢 Build Docker (arm64, rocketchat, production)
- GitHub Check: 🚢 Build Docker (arm64, presence-service, production)
- GitHub Check: 🚢 Build Docker (arm64, account-service, production)
- GitHub Check: 🚢 Build Docker (arm64, ddp-streamer-service, production)
- GitHub Check: 🚢 Build Docker (arm64, queue-worker-service, production)
- GitHub Check: 🚢 Build Docker (amd64, stream-hub-service, production)
- GitHub Check: 🚢 Build Docker (amd64, authorization-service, production)
- GitHub Check: 🚢 Build Docker (amd64, rocketchat, production)
- GitHub Check: 🔨 Test Storybook / Test Storybook
- GitHub Check: 🔨 Test Unit / Unit Tests
- GitHub Check: 🔎 Code Check / Code Lint
- GitHub Check: 🔎 Code Check / TypeScript
- GitHub Check: CodeQL-Build
- GitHub Check: CodeQL-Build
🔇 Additional comments (3)
.github/workflows/ci.yml (3)
270-270: Verify the arm64 runner is configured in this GitHub organization.The conditional runner selection assumes a
ubuntu-24.04-armrunner label exists, but there's no documentation or validation. If this runner is missing, the job will fall back to default runners, potentially breaking the arm64 build silently.Please confirm that GitHub Actions is configured with a runner explicitly labeled
ubuntu-24.04-arm(or verify the exact label used), or add inline documentation referencing the runner configuration.
333-339: Verify that thebuild-dockeraction uploads digest artifacts matching the pattern.The download step expects artifacts with pattern
digests-*, but there's no visible artifact upload step in thebuild-gh-dockerjob's invocation of./.github/actions/build-docker. If the action doesn't upload digests, this download will fail or succeed silently with no digests, causing the manifest creation to fail.Please verify:
- The
./.github/actions/build-dockeraction uploads digests with the naming patterndigests-{service}ordigests-*.- Confirm the artifact upload uses an appropriate retention policy (currently the build job uploads artifacts with 5-day retention).
410-410: Dependency chain correctly serialized for multi-arch manifest aggregation.All test jobs now depend on
build-gh-docker-publishinstead ofbuild-gh-docker, ensuring tests run against the finalized multi-arch manifests rather than individual per-arch images. This is the correct approach; the tests will use thegh-docker-tagwithout the arch suffix (e.g.,pr-12345instead ofpr-12345-amd64), pulling whichever platform is appropriate for the test runner.Also applies to: 426-426, 451-451, 470-470, 497-497
https://rocketchat.atlassian.net/browse/ARCH-1853
Depends on #37352
Replaces #37304
Proposed changes (including videos or screenshots)
Issue(s)
Steps to test or reproduce
Further comments
Summary by CodeRabbit