

@alliepiper
Contributor

The cuda-python team offered to let us use their hardware while we wait for our much larger order to arrive. This PR adds minimal CUB coverage to ensure that our Blackwell implementations don't regress.

@copy-pr-bot
Contributor

copy-pr-bot bot commented Dec 2, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-project-automation github-project-automation bot moved this to Todo in CCCL Dec 2, 2025
@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Progress in CCCL Dec 2, 2025
@alliepiper
Contributor Author

/ok to test

@github-project-automation github-project-automation bot moved this from In Progress to In Review in CCCL Dec 2, 2025

@bernhardmgruber
Contributor

bernhardmgruber commented Dec 2, 2025

I see a CI test failure in:

91/123 Test #336: cub.test.device.segmented_scan_api.lid_0 ......................***Failed    0.41 sec
  -- >> Running:
  	/home/coder/cccl/build/cuda13.0-gcc14/cub/bin/cub.test.device.segmented_scan_api.lid_0
  /home/coder/cccl/lib/cmake/libcudacxx/../../../libcudacxx/include/cuda/__cmath/fast_modulo_division.h:112: operator/: block: [0,0,0], thread: [0,0,0] Assertion `dividend must be non-negative` failed.

Funny, it's an API test. The test was added recently in #6022.

Also, the test runtime here seems a bit excessive. Is this normal?

          Start 455: cub.test.device.radix_sort_decomposer_fail.lid_0
  119/123 Test #455: cub.test.device.radix_sort_decomposer_fail.lid_0 ..............   Passed  1258.18 sec

@alliepiper
Contributor Author

alliepiper commented Dec 2, 2025

@oleksandr-pavlyk can you take a look at the segmented scan issue? This is failing on RTX PRO 6000 (sm120).

> Also, the test runtime here seems a bit excessive. Is this normal?
> cub.test.device.radix_sort_decomposer_fail.lid_0 1258.18 sec

Those _fail tests invoke the compiler internally and check that compilation fails with a specific error. That does seem excessive, even for the less powerful CPUs on the GPU runners. Looking at the nightly results from other runners, the same test takes about half as long on the RTX A6000 and H100 machines:

H100: https://github.com/NVIDIA/cccl/actions/runs/19846009437/job/56871704747
A6000: https://github.com/NVIDIA/cccl/actions/runs/19846009437/job/56871704740

It's a very simple TU, so it might be worth seeing whether we can trigger the failure sooner.

@bernhardmgruber
Contributor

I opened #6845 for the long test time.

@oleksandr-pavlyk
Contributor

I opened #6868 to fix the assertion failures in the segmented scan API test.

@alliepiper
Contributor Author

/ok to test


@alliepiper alliepiper marked this pull request as ready for review December 4, 2025 19:01
@alliepiper alliepiper requested a review from a team as a code owner December 4, 2025 19:01
@alliepiper alliepiper requested a review from jrhemstad December 4, 2025 19:01
@alliepiper alliepiper enabled auto-merge (squash) December 4, 2025 19:01

@github-actions
Contributor

github-actions bot commented Dec 5, 2025

🥳 CI Workflow Results

🟩 Finished in 12h 45m: Pass: 100%/270 | Total: 3d 18h | Max: 4h 15m | Hits: 97%/374695

See results here.

@alliepiper alliepiper merged commit afbd94d into NVIDIA:main Dec 5, 2025
637 of 642 checks passed
@github-project-automation github-project-automation bot moved this from In Review to Done in CCCL Dec 5, 2025
@bernhardmgruber
Contributor

Thank you @alliepiper and @kkraus14 for making that happen!
