@Binyang2014
Contributor

Fix: #654. Address the crash in correctness_test.py.

Copilot AI left a comment

Pull Request Overview

This PR fixes a crash in the correctness test script by initializing the distributed process group with explicit rank, world size, and device parameters, and by cleaning up the process group when the test finishes.

Key changes:

  • Explicit initialization of the distributed process group with the rank, world_size, and device_id parameters
  • Test cleanup that synchronizes all ranks with a barrier and then destroys the process group
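The two changes above can be sketched with torch.distributed as follows. This is a minimal illustration, not the actual diff to correctness_test.py. It assumes the script is launched with torchrun, which exports RANK, WORLD_SIZE, LOCAL_RANK (and MASTER_ADDR/MASTER_PORT for the default env:// rendezvous). It also assumes the "nccl" backend and a PyTorch release recent enough to accept the device_id argument of init_process_group. The helper names torchrun_env, setup_distributed, teardown_distributed, and run_distributed are hypothetical.

```python
import os
from typing import Callable, Mapping


def torchrun_env(env: Mapping[str, str] = os.environ) -> dict:
    """Read the rank layout that torchrun exports to each worker process."""
    return {
        "rank": int(env["RANK"]),
        "world_size": int(env["WORLD_SIZE"]),
        "local_rank": int(env["LOCAL_RANK"]),
    }


def setup_distributed(backend: str = "nccl") -> dict:
    """Initialize the default process group with explicit rank/world_size/device_id.

    Assumption: backend "nccl" and one GPU per local rank.
    """
    import torch
    import torch.distributed as dist

    cfg = torchrun_env()
    device = torch.device("cuda", cfg["local_rank"])
    torch.cuda.set_device(device)
    dist.init_process_group(
        backend=backend,
        rank=cfg["rank"],
        world_size=cfg["world_size"],
        device_id=device,  # bind the group to this rank's GPU up front
    )
    return cfg


def teardown_distributed() -> None:
    """Wait for every rank to finish, then release the process group."""
    import torch.distributed as dist

    if dist.is_initialized():
        dist.barrier()
        dist.destroy_process_group()


def run_distributed(test_fn: Callable[[], None]) -> None:
    """Hypothetical wrapper: set up, run the checks, then tear down."""
    setup_distributed()
    test_fn()
    # Teardown runs only on success: a barrier entered after another rank
    # has already failed could block the surviving ranks.
    teardown_distributed()
```

Keeping the environment parsing separate from the torch calls lets the rank layout be checked without GPUs. The barrier before destroy_process_group ensures that no rank tears down communicators while a peer is still using them.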

@Binyang2014 Binyang2014 enabled auto-merge (squash) October 21, 2025 19:57
@Binyang2014 Binyang2014 merged commit 610db6f into main Oct 21, 2025
14 checks passed
@Binyang2014 Binyang2014 deleted the binyli/test-fix branch October 21, 2025 19:57
Development

Successfully merging this pull request may close these issues.

[Bug] mscclpp can not replace nccl in torchrun cases

3 participants