Skip to content

Conversation

@rodrigok
Copy link
Member

@rodrigok rodrigok commented Oct 31, 2025

https://rocketchat.atlassian.net/browse/ARCH-1853

Depends on #37352

Replaces #37304

Proposed changes (including videos or screenshots)

Issue(s)

Steps to test or reproduce

Further comments

Summary by CodeRabbit

  • Chores
    • Added explicit architecture input and broad multi-architecture build support (amd64, arm64).
    • Reworked Docker build/publish flow to use modern multi-arch build tooling, produce per-arch image digests, and upload digest artifacts.
    • Upgraded artifact restore/download steps and build tooling to newer versions.
    • Improved caching and cache keys to account for runner OS/architecture and enhanced build metadata/logging.

@dionisio-bot
Copy link
Contributor

dionisio-bot bot commented Oct 31, 2025

Looks like this PR is ready to merge! 🎉
If you have any trouble, please check the PR guidelines

@changeset-bot
Copy link

changeset-bot bot commented Oct 31, 2025

⚠️ No Changeset found

Latest commit: bfcbd92

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 31, 2025

Walkthrough

Adds multi-architecture Docker build and publish support (amd64, arm64) to CI: build actions now accept an arch input and use buildx bake with metadata/digest outputs; artifact downloads upgraded; a separate publish job aggregates per-arch digests to create multi-arch manifests and push images.

Changes

Cohort / File(s) Summary
Build action updates
.github/actions/build-docker/action.yml, .github/actions/meteor-build/action.yml, .github/actions/setup-node/action.yml
Added public arch input (default amd64) to build-docker; switched artifact downloads to actions/download-artifact@v6; expanded cache key to include runner.os and runner.arch; build-docker now uses docker/setup-docker-action@v4, docker/setup-buildx-action@v3, and docker buildx bake producing metadata JSON and per-arch digest artifacts.
CI workflow and publish orchestration
.github/workflows/ci.yml
Introduced matrix.arch (amd64, arm64) and per-arch runs; passes arch into build action; added build-gh-docker-publish job that collects per-arch digests and uses docker buildx imagetools/skopeo flow to assemble and push multi-arch manifests; reorganized job dependencies to route image publishing through the new job.
Compose bake config
docker-compose-ci.yml
Added x-bake definitions with platforms: linux/amd64, linux/arm64 to service build sections to enable cross-platform bake builds.

Sequence Diagram(s)

sequenceDiagram
    participant GH as GitHub Actions
    participant BD as build-docker (per-arch)
    participant Buildx as buildx bake
    participant Store as Actions Artifact Store
    participant Pub as build-gh-docker-publish
    participant Imagetools as docker buildx imagetools / skopeo
    participant Registry as GHCR/DockerHub

    GH->>BD: start job with matrix.arch (amd64, arm64)
    BD->>Buildx: run `docker buildx bake` (arch=<arch>) -> generate metadata.json + digest
    BD->>Store: upload digest artifact (per-arch)
    
    par After all arch jobs finish
      GH->>Pub: trigger publish job
      Pub->>Store: download all per-arch digests
      Pub->>Imagetools: assemble multi-arch manifest from digests
      Imagetools->>Registry: push combined manifest & tags
      Registry-->>Pub: confirm publish
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Areas to focus:
    • .github/workflows/ci.yml: new build-gh-docker-publish job, job dependency rewire, and matrix/runner changes.
    • .github/actions/build-docker/action.yml: buildx bake invocation, metadata/digest generation, and publish vs load logic.
    • Artifact handling and tag/manifest creation steps (digest upload/download paths and naming).

Possibly related PRs

Suggested labels

stat: ready to merge

Suggested reviewers

  • sampaiodiego

Poem

🐰 I hopped through CI fields today,

amd64 and arm64 in play,
I baked the builds, then tied the seams,
digests danced into multi-arch dreams,
Hooray — the registry now knows both ways! 🥕

Pre-merge checks and finishing touches

✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main objective: enabling multi-architecture Docker image builds in CI, which aligns with all the substantial changes across workflow and action files.
Linked Issues check ✅ Passed The changeset implements multi-architecture Docker builds (amd64 and arm64) across CI workflows and actions, directly addressing the ARM Github Docker Images requirement.
Out of Scope Changes check ✅ Passed All changes are focused on CI/CD infrastructure to support multi-architecture builds; no unrelated code modifications are present.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch chore/multi-arch-images

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@rodrigok rodrigok force-pushed the chore/multi-arch-images branch from 935c975 to b4c1a20 Compare October 31, 2025 01:15
@codecov
Copy link

codecov bot commented Oct 31, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 67.08%. Comparing base (64a4e8b) to head (bfcbd92).
⚠️ Report is 1 commits behind head on develop.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff            @@
##           develop   #37353   +/-   ##
========================================
  Coverage    67.07%   67.08%           
========================================
  Files         3419     3419           
  Lines       117938   117938           
  Branches     21578    21574    -4     
========================================
+ Hits         79111    79113    +2     
- Misses       36134    36140    +6     
+ Partials      2693     2685    -8     
Flag Coverage Δ
e2e 57.49% <ø> (+<0.01%) ⬆️
unit 72.15% <ø> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@rodrigok rodrigok force-pushed the chore/multi-arch-images branch 2 times, most recently from 1602240 to 454e920 Compare October 31, 2025 02:57
@rodrigok rodrigok force-pushed the chore/multi-arch-images branch from 454e920 to 140d6d8 Compare October 31, 2025 03:14
@rodrigok rodrigok force-pushed the chore/multi-arch-images branch from 140d6d8 to bf8721e Compare October 31, 2025 11:12
@rodrigok rodrigok force-pushed the chore/multi-arch-images branch from bf8721e to e85a139 Compare October 31, 2025 11:17
@rodrigok rodrigok force-pushed the improve/ci branch 10 times, most recently from 70b10fe to 34beeda Compare October 31, 2025 19:45
@rodrigok rodrigok force-pushed the chore/multi-arch-images branch from e85a139 to 6057fac Compare October 31, 2025 20:08
@rodrigok rodrigok force-pushed the improve/ci branch 2 times, most recently from f4b8428 to 77f98f5 Compare October 31, 2025 20:21
@rodrigok rodrigok force-pushed the improve/ci branch 2 times, most recently from 3464c6e to 0cadc31 Compare October 31, 2025 21:55
@rodrigok rodrigok force-pushed the chore/multi-arch-images branch from 95510ab to 0d2bc99 Compare November 3, 2025 19:19
@rodrigok rodrigok force-pushed the improve/ci branch 3 times, most recently from 834beb8 to e429ae8 Compare November 4, 2025 22:26
@rodrigok rodrigok force-pushed the chore/multi-arch-images branch from 0d2bc99 to a3ad0b9 Compare November 4, 2025 22:27
Base automatically changed from improve/ci to develop November 5, 2025 13:20
@rodrigok rodrigok force-pushed the chore/multi-arch-images branch from a3ad0b9 to 8096358 Compare November 5, 2025 13:31
@rodrigok rodrigok marked this pull request as ready for review November 5, 2025 13:34
@rodrigok rodrigok requested a review from a team as a code owner November 5, 2025 13:34
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
.github/workflows/ci.yml (1)

360-364: Fragile string manipulation for rocketchat-cov image name.

Lines 360–364 use a string-based approach to determine the image name:

if [ "$service" == "rocketchat-cov" ]; then
  IMAGE=$(docker compose ... | jq ... "rocketchat") ... -cov
else
  IMAGE=$(docker compose ... "service")
fi

This assumes the directory name rocketchat-cov exactly maps to service rocketchat with a -cov suffix applied to the image. If the service name ever changes or if other services need similar coverage variants, this logic becomes brittle.

Consider passing or deriving the service name and coverage flag more explicitly (e.g., via environment variables or structured metadata) rather than relying on directory name parsing.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Jira integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f488026 and 785032c.

📒 Files selected for processing (5)
  • .github/actions/build-docker/action.yml (4 hunks)
  • .github/actions/meteor-build/action.yml (1 hunks)
  • .github/actions/setup-node/action.yml (1 hunks)
  • .github/workflows/ci.yml (10 hunks)
  • docker-compose-ci.yml (8 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: CodeQL-Build
  • GitHub Check: CodeQL-Build
🔇 Additional comments (12)
.github/actions/meteor-build/action.yml (1)

132-132: Approve artifact action upgrade to v4.

The download-artifact action was upgraded from an earlier version to v6, consistent with the broader artifact handling improvements across the PR for multi-arch support.

.github/actions/setup-node/action.yml (1)

49-49: LGTM: Cache key now includes OS and architecture.

Adding runner.os and runner.arch to the cache key is essential for multi-arch builds to prevent cross-arch cache hits. This ensures ARM64 runners won't use AMD64-cached artifacts and vice versa.

docker-compose-ci.yml (1)

5-5: Potential platform specification conflict between runtime constraint and bake config.

Line 5 declares platform: linux/amd64 (runtime execution constraint), while lines 9–12 declare both linux/amd64 and linux/arm64 in the x-bake block. The runtime platform attribute may override the bake configuration during docker compose operations. During buildx bake builds, the --set *.platform=linux/${{ inputs.arch }} override in build-docker action should take precedence, but this could cause confusion during local testing or non-bake builds.

Please verify that the runtime platform constraints don't interfere with buildx bake builds by confirming:

  1. buildx bake respects --set *.platform overrides over docker-compose platform attributes
  2. Local docker compose up operations (if used outside CI) still respect the runtime platform constraint

Also applies to: 9-12

.github/actions/build-docker/action.yml (3)

16-19: LGTM: New arch input for per-architecture builds.

The arch input with default 'amd64' is properly scoped for building specific architectures. This integrates well with the per-arch matrix in ci.yml.


78-91: LGTM: Docker daemon and buildx setup for multi-arch support.

Docker setup with containerd-snapshotter enabled and buildx with garbage collection and storage limits is appropriate. The configuration enables efficient multi-arch builds with resource bounds.


106-106: Verify image name extraction and digest parsing robustness.

Lines 106 and 126–128 rely on:

  1. docker compose config output to extract the correct image name
  2. /tmp/meta.json containing a nested key structure [service][containerimage.digest]

These operations are brittle if the docker-compose-ci.yml format changes or if buildx outputs differ from expectations. Additionally, IMAGE_NO_TAG sed substitution (line 127) removes everything after the first colon, which assumes image names follow repo/name:tag format without spaces or special characters.

Can you confirm:

  1. What buildx version is used, and whether containerimage.digest is the correct key for extracting the digest from /tmp/meta.json?
  2. Whether the IMAGE extraction from docker-compose-ci.yml always returns a valid image reference in the format repo/name:tag?

Alternatively, add error handling or validation after extraction.

Also applies to: 126-128

.github/workflows/ci.yml (6)

270-270: Verify ARM64 runner availability and matrix coverage.

Line 270 assumes ubuntu-24.04-arm is available as a runner for ARM64 builds. GitHub Actions doesn't provide native ARM64 runners with this naming convention; this appears to be a custom/self-hosted runner configuration. Additionally, the matrix at line 279 includes both amd64 and arm64, but only line 270 conditionally selects the arm64 runner.

Confirm:

  1. Custom self-hosted runners named ubuntu-24.04-arm are registered in the repository settings
  2. These runners are appropriately labeled and have sufficient resources for Docker builds
  3. Consider documenting this configuration requirement in the repository

Also applies to: 279-279, 285-287


273-273: DOCKER_TAG inconsistency between build and publish jobs.

Line 273 appends matrix.arch to DOCKER_TAG:
DOCKER_TAG: ${{ needs.release-versions.outputs.gh-docker-tag }}-${{ matrix.arch }}

However, line 313 in the publish job resets it without arch:
DOCKER_TAG: ${{ needs.release-versions.outputs.gh-docker-tag }}

This means:

  • Per-arch images are tagged as <base-tag>-amd64 and <base-tag>-arm64
  • The manifest is tagged as <base-tag> (without arch)
  • Test jobs receive the arch-less tag but will run against per-arch images via the manifest

Clarify whether this is intentional (multi-arch manifests with per-arch suffixes) or if per-arch tags should be removed after manifest creation to avoid confusion.

Also applies to: 313-313


335-339: Verify artifact merge behavior for digest collection.

Line 335 downloads artifacts matching pattern digests-* with merge-multiple: true. This will collect digests from all per-arch, per-service, per-type builds. However, the path structure from build-docker (line 125) is /tmp/digests/${{ inputs.service }}${{ inputs.type == 'coverage' && '-cov' || '' }}/${{ inputs.arch }}/digest.txt.

Confirm that the artifact names from line 133 (digests-${{ inputs.service }}-${{ inputs.arch }}-${{ inputs.type }}) properly reconstruct the nested directory structure when merged, and that the manifest creation loop (lines 347–374) correctly traverses the merged directory layout.


714-718: LGTM: skopeo logic handles tag propagation correctly.

The skopeo copy command with --all flag and conditional --additional-tag arguments correctly handles multi-tag publishing. The logic ensures the first tag is primary and others are applied as additional tags. Array initialization and loop structure avoid edge cases.


410-410: LGTM: Test job dependencies updated to publish job.

All test jobs (test-api, test-ui, test-api-ee, test-ui-ee, test-ui-ee-watcher) now depend on build-gh-docker-publish instead of build-gh-docker, ensuring per-arch builds and manifest aggregation complete before testing begins. This is the correct dependency model for multi-arch workflows.

Also applies to: 426-426, 451-451, 470-470, 497-497


565-565: LGTM: Deploy job updated to depend on publish job.

The deploy job now correctly depends on build-gh-docker-publish, ensuring Docker manifests are published before asset deployment.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
.github/workflows/ci.yml (3)

343-375: Add error handling and validation to the manifest creation script.

The shell script that creates multi-arch manifests lacks robustness:

  1. No validation of digest format: Lines 352–355 read digest files without validating their content or handling malformed files.
  2. Single point of failure: If docker buildx imagetools create fails for one service (line 368), the entire script exits, preventing other services' manifests from being created.
  3. Silent success on empty digests: If the download found no artifacts or the directory structure is unexpected, the loop iterates zero times without warning.

Consider adding:

  • Validation that /tmp/digests contains expected service directories
  • Error handling per service with logging (continue on individual failures, aggregate results)
  • Digest file format validation before attempting manifest creation
  • Summary output showing which services succeeded/failed

Example improvement:

  set -o xtrace
  shopt -s nullglob

  FAILED_SERVICES=()

  for service_dir in /tmp/digests/*; do
    [[ -d "$service_dir" ]] || continue
    service="$(basename "$service_dir")"
    echo "Creating manifest for $service"

    mapfile -t refs < <(
      find "$service_dir" -type f -name 'digest.txt' -print0 \
        | xargs -0 -I{} sh -c "tr -d '\r' < '{}' | sed '/^[[:space:]]*$/d'"
    )

    if [[ ${#refs[@]} -eq 0 ]]; then
      echo "WARNING: No digests found for $service"
      FAILED_SERVICES+=("$service")
      continue
    fi

    # ... rest of logic ...
    
    if ! docker buildx imagetools create ...; then
      echo "ERROR: Failed to create manifest for $service"
      FAILED_SERVICES+=("$service")
    fi
  done

  if [[ ${#FAILED_SERVICES[@]} -gt 0 ]]; then
    echo "Failed services: ${FAILED_SERVICES[*]}"
    exit 1
  fi

359-365: Consider centralizing service-to-image name mapping for maintainability.

The special case for rocketchat-cov at line 360–365 handles the image name discrepancy (service name ≠ image name), but this logic is:

  • Fragile: If new services with similar naming conventions are added, the conditional logic must be updated.
  • Fragile: Depends on docker-compose-ci.yml format; no error handling if the jq query fails or service is not found.
  • Hard to extend: Future maintainers may not realize they need to add similar special cases.

Consider extracting the service-to-image mapping into a dedicated data structure or function, or documenting the mapping rule clearly as a comment. For example:

  # Service to image name mapping (e.g., 'rocketchat' → 'rocket.chat')
  # Special handling: 'rocketchat-cov' → 'rocket.chat-cov' (coverage image suffix)
  if [ "$service" == "rocketchat-cov" ]; then
    base_service="rocketchat"
    image_suffix="-cov"
  else
    base_service="$service"
    image_suffix=""
  fi

  IMAGE=$(docker compose -f docker-compose-ci.yml config --format json 2>/dev/null | \
    jq -r --arg s "$base_service" '.services[$s].image') || {
    echo "ERROR: Failed to resolve image for service $service"
    exit 1
  }
  IMAGE="${IMAGE}${image_suffix}"

662-722: Add error handling and validation to the DockerHub publish step.

The docker-image-publish job's refactor to use skopeo copy --all is sound for multi-arch images, but lacks resilience:

  1. No pre-flight validation: The job assumes the source image in GHCR exists (from build-gh-docker-publish), but doesn't verify it before attempting the copy.
  2. No retry logic: If skopeo copy fails transiently, the entire publish fails without retry.
  3. No error recovery: If the copy partially fails (e.g., one tag succeeds, another fails), there's no rollback or partial success tracking.

Consider adding:

  # Validate source image exists
  if ! docker pull "${SRC}" 2>/dev/null; then
    echo "ERROR: Source image not found in GHCR: ${SRC}"
    exit 1
  fi

  # Retry logic with backoff
  retry_count=0
  max_retries=3
  until skopeo copy --all \
    "docker://${SRC}" \
    "docker://${DEST_REPO}:${PRIMARY}" \
    "${EXTRA_ARGS[@]}"; do
    retry_count=$((retry_count + 1))
    if [[ $retry_count -ge $max_retries ]]; then
      echo "ERROR: Failed to copy image after $max_retries attempts"
      exit 1
    fi
    echo "Retry $retry_count/$max_retries..."
    sleep $((2 ** retry_count))
  done
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Jira integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 785032c and b7d13dc.

📒 Files selected for processing (2)
  • .github/actions/setup-node/action.yml (1 hunks)
  • .github/workflows/ci.yml (10 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/actions/setup-node/action.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
  • GitHub Check: 🚢 Build Docker (amd64, rocketchat, coverage)
  • GitHub Check: 🚢 Build Docker (arm64, rocketchat, production)
  • GitHub Check: 🚢 Build Docker (arm64, presence-service, production)
  • GitHub Check: 🚢 Build Docker (arm64, account-service, production)
  • GitHub Check: 🚢 Build Docker (arm64, ddp-streamer-service, production)
  • GitHub Check: 🚢 Build Docker (arm64, queue-worker-service, production)
  • GitHub Check: 🚢 Build Docker (amd64, stream-hub-service, production)
  • GitHub Check: 🚢 Build Docker (amd64, authorization-service, production)
  • GitHub Check: 🚢 Build Docker (amd64, rocketchat, production)
  • GitHub Check: 🔨 Test Storybook / Test Storybook
  • GitHub Check: 🔨 Test Unit / Unit Tests
  • GitHub Check: 🔎 Code Check / Code Lint
  • GitHub Check: 🔎 Code Check / TypeScript
  • GitHub Check: CodeQL-Build
  • GitHub Check: CodeQL-Build
🔇 Additional comments (3)
.github/workflows/ci.yml (3)

270-270: Verify the arm64 runner is configured in this GitHub organization.

The conditional runner selection assumes a ubuntu-24.04-arm runner label exists, but there's no documentation or validation. If this runner is missing, the job will fall back to default runners, potentially breaking the arm64 build silently.

Please confirm that GitHub Actions is configured with a runner explicitly labeled ubuntu-24.04-arm (or verify the exact label used), or add inline documentation referencing the runner configuration.


333-339: Verify that the build-docker action uploads digest artifacts matching the pattern.

The download step expects artifacts with pattern digests-*, but there's no visible artifact upload step in the build-gh-docker job's invocation of ./.github/actions/build-docker. If the action doesn't upload digests, this download will fail or succeed silently with no digests, causing the manifest creation to fail.

Please verify:

  1. The ./.github/actions/build-docker action uploads digests with the naming pattern digests-{service} or digests-*.
  2. Confirm the artifact upload uses an appropriate retention policy (currently the build job uploads artifacts with 5-day retention).

410-410: Dependency chain correctly serialized for multi-arch manifest aggregation.

All test jobs now depend on build-gh-docker-publish instead of build-gh-docker, ensuring tests run against the finalized multi-arch manifests rather than individual per-arch images. This is the correct approach; the tests will use the gh-docker-tag without the arch suffix (e.g., pr-12345 instead of pr-12345-amd64), pulling whichever platform is appropriate for the test runner.

Also applies to: 426-426, 451-451, 470-470, 497-497

@sampaiodiego sampaiodiego added this to the 7.13.0 milestone Nov 5, 2025
@sampaiodiego sampaiodiego added the stat: QA assured Means it has been tested and approved by a company insider label Nov 5, 2025
@dionisio-bot dionisio-bot bot added the stat: ready to merge PR tested and approved waiting for merge label Nov 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

stat: QA assured Means it has been tested and approved by a company insider stat: ready to merge PR tested and approved waiting for merge

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants