Skip to content

Conversation

@ever4Kenny
Copy link

@ever4Kenny ever4Kenny commented Jan 9, 2026

What changes were proposed in this pull request?

This PR converts the static initialization of columnar shuffle class constructors
and skew shuffle method to lazy initialization using the initialization-on-demand
holder idiom (static inner class pattern) in SparkUtils.java.

Specifically, the following changes were made:

  1. Introduced ColumnarHashBasedShuffleWriterConstructorHolder static inner class
    to lazily initialize the constructor for ColumnarHashBasedShuffleWriter

  2. Introduced ColumnarShuffleReaderConstructorHolder static inner class to lazily
    initialize the constructor for CelebornColumnarShuffleReader

  3. Introduced CelebornSkewShuffleMethodHolder static inner class to lazily
    initialize the isCelebornSkewedShuffle method reference

  4. Modified createColumnarHashBasedShuffleWriter(), createColumnarShuffleReader(),
    and isCelebornSkewShuffleOrChildShuffle() methods to use the holder pattern for
    lazy initialization

  5. Added JavaDoc comments explaining the lazy loading mechanism

Why are the changes needed?

The current implementation statically initializes columnar shuffle class constructors
and the skew shuffle method at SparkUtils class loading time, which means these
classes/methods are loaded regardless of whether they are actually used.

This lazy loading approach ensures that:

  • Columnar shuffle classes are only loaded when actually needed (when
    celeborn.columnarShuffle.enabled is true and the create methods are called)
  • CelebornShuffleState class is only loaded when skew shuffle detection is needed
  • Reduces unnecessary class loading overhead for users not using these features
  • Improves startup performance and memory footprint
  • Aligns with the conditional usage pattern already present in SparkShuffleManager

The static holder pattern (initialization-on-demand holder idiom) provides several
advantages:

  • Thread-safe without explicit synchronization (guaranteed by JVM class loading mechanism)
  • No synchronization overhead at runtime (no volatile reads or lock acquisition)
  • Simpler and more concise code compared to double-checked locking
  • Recommended by Effective Java (Item 83) for lazy initialization

Does this PR resolve a correctness bug?

No, this is a performance optimization.

Does this PR introduce any user-facing change?

No. This change only affects when certain classes are loaded internally.
The functionality and API remain unchanged.

How was this patch tested?

  • Code review to verify correct implementation of the initialization-on-demand holder pattern
  • Verified that JVM class loading guarantees thread safety
  • The changes are backward compatible and don't alter functionality, only initialization timing

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes the initialization of optional columnar shuffle features by converting static initialization to lazy initialization using the initialization-on-demand holder idiom (static inner class pattern). This ensures that columnar shuffle classes and skew shuffle detection logic are only loaded when actually needed, reducing unnecessary class loading overhead for users not utilizing these features.

Changes:

  • Introduced three static inner holder classes to lazily initialize columnar shuffle constructors and skew shuffle method
  • Refactored createColumnarHashBasedShuffleWriter(), createColumnarShuffleReader(), and isCelebornSkewShuffleOrChildShuffle() to use the holder pattern
  • Added comprehensive JavaDoc comments explaining the lazy loading mechanism

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…nd skew shuffle method using static holder pattern

### What changes were proposed in this pull request?

This PR converts the static initialization of columnar shuffle class constructors
and skew shuffle method to lazy initialization using the initialization-on-demand
holder idiom (static inner class pattern) in SparkUtils.java.

Specifically, the following changes were made:

1. Introduced `ColumnarHashBasedShuffleWriterConstructorHolder` static inner class
   to lazily initialize the constructor for ColumnarHashBasedShuffleWriter

2. Introduced `ColumnarShuffleReaderConstructorHolder` static inner class to lazily
   initialize the constructor for CelebornColumnarShuffleReader

3. Introduced `CelebornSkewShuffleMethodHolder` static inner class to lazily
   initialize the `isCelebornSkewedShuffle` method reference

4. Modified `createColumnarHashBasedShuffleWriter()`, `createColumnarShuffleReader()`,
   and `isCelebornSkewShuffleOrChildShuffle()` methods to use the holder pattern for
   lazy initialization

5. Added JavaDoc comments explaining the lazy loading mechanism

### Why are the changes needed?

The current implementation statically initializes columnar shuffle class constructors
and the skew shuffle method at SparkUtils class loading time, which means these
classes/methods are loaded regardless of whether they are actually used.

This lazy loading approach ensures that:
- Columnar shuffle classes are only loaded when actually needed (when
  `celeborn.columnarShuffle.enabled` is true and the create methods are called)
- CelebornShuffleState class is only loaded when skew shuffle detection is needed
- Reduces unnecessary class loading overhead for users not using these features
- Improves startup performance and memory footprint
- Aligns with the conditional usage pattern already present in SparkShuffleManager

The static holder pattern (initialization-on-demand holder idiom) provides several
advantages:
- Thread-safe without explicit synchronization (guaranteed by JVM class loading mechanism)
- No synchronization overhead at runtime (no volatile reads or lock acquisition)
- Simpler and more concise code compared to double-checked locking
- Recommended by Effective Java (Item 83) for lazy initialization

### Does this PR resolve a correctness bug?

No, this is a performance optimization.

### Does this PR introduce any user-facing change?

No. This change only affects when certain classes are loaded internally.
The functionality and API remain unchanged.

### How was this patch tested?

- Code review to verify correct implementation of the initialization-on-demand holder pattern
- Verified that JVM class loading guarantees thread safety (JLS §12.4.2)
- Analyzed existing columnar shuffle and skew shuffle test coverage in the codebase
- The changes are backward compatible and don't alter functionality, only initialization timing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +237 to +247
private static final DynConstructors.Builder INSTANCE =
DynConstructors.builder()
.impl(
COLUMNAR_HASH_BASED_SHUFFLE_WRITER_CLASS,
int.class,
CelebornShuffleHandle.class,
TaskContext.class,
CelebornConf.class,
ShuffleClient.class,
ShuffleWriteMetricsReporter.class,
SendBufferPool.class);
Copy link

Copilot AI Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The holder should store the built DynConstructors.Ctor instead of the DynConstructors.Builder. This is inconsistent with the established pattern in this file where all other dynamic reflection fields store the built result (see GET_READER_METHOD at line 176, UnregisterAllMapAndMergeOutput_METHOD at line 312, and even CelebornSkewShuffleMethodHolder at line 563).

Change INSTANCE to private static final DynConstructors.Ctor<?> INSTANCE and append .build() at the end of line 247. Then update line 258 to remove the .build() call and invoke directly on INSTANCE. This avoids the unnecessary method call overhead on every invocation and follows the established convention.

Copilot uses AI. Check for mistakes.
Comment on lines +271 to +283
private static final DynConstructors.Builder INSTANCE =
DynConstructors.builder()
.impl(
COLUMNAR_SHUFFLE_READER_CLASS,
CelebornShuffleHandle.class,
int.class,
int.class,
int.class,
int.class,
TaskContext.class,
CelebornConf.class,
ShuffleReadMetricsReporter.class,
ExecutorShuffleIdTracker.class);
Copy link

Copilot AI Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The holder should store the built DynConstructors.Ctor instead of the DynConstructors.Builder. This is inconsistent with the established pattern in this file where all other dynamic reflection fields store the built result (see GET_READER_METHOD at line 176, UnregisterAllMapAndMergeOutput_METHOD at line 312, and even CelebornSkewShuffleMethodHolder at line 563).

Change INSTANCE to private static final DynConstructors.Ctor<?> INSTANCE and append .build() at the end of line 283. Then update line 296 to remove the .build() call and invoke directly on INSTANCE. This avoids the unnecessary method call overhead on every invocation and follows the established convention.

Copilot uses AI. Check for mistakes.
SendBufferPool.class);

/**
* Lazy holder for ColumnarHashBasedShuffleWriter constructor builder. The builder is initialized
Copy link

Copilot AI Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JavaDoc comment describes a "constructor builder" but should describe a "constructor" for consistency with the recommended implementation. After storing the built DynConstructors.Ctor instead of the builder (as suggested in the comment for lines 237-247), update this comment to read "Lazy holder for ColumnarHashBasedShuffleWriter constructor. The constructor is initialized..." to accurately reflect what is being stored.

Suggested change
* Lazy holder for ColumnarHashBasedShuffleWriter constructor builder. The builder is initialized
* Lazy holder for ColumnarHashBasedShuffleWriter constructor. The constructor is initialized

Copilot uses AI. Check for mistakes.
ExecutorShuffleIdTracker.class);

/**
* Lazy holder for CelebornColumnarShuffleReader constructor builder. The builder is initialized
Copy link

Copilot AI Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JavaDoc comment describes a "constructor builder" but should describe a "constructor" for consistency with the recommended implementation. After storing the built DynConstructors.Ctor instead of the builder (as suggested in the comment for lines 271-283), update this comment to read "Lazy holder for CelebornColumnarShuffleReader constructor. The constructor is initialized..." to accurately reflect what is being stored.

Suggested change
* Lazy holder for CelebornColumnarShuffleReader constructor builder. The builder is initialized
* Lazy holder for CelebornColumnarShuffleReader constructor. The constructor is initialized

Copilot uses AI. Check for mistakes.
* Lazy holder for ColumnarHashBasedShuffleWriter constructor builder. The builder is initialized
* only when this class is first accessed, ensuring lazy loading without explicit synchronization.
*/
private static class ColumnarHashBasedShuffleWriterConstructorBuilderHolder {
Copy link

Copilot AI Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The holder class is named ColumnarHashBasedShuffleWriterConstructorBuilderHolder, but it actually holds a DynConstructors.Builder instance, not a constructor. Consider renaming to ColumnarHashBasedShuffleWriterBuilderHolder for clarity and accuracy, as "ConstructorBuilder" is redundant when the type already indicates it's a Builder for constructors.

Copilot uses AI. Check for mistakes.
* Lazy holder for CelebornColumnarShuffleReader constructor builder. The builder is initialized
* only when this class is first accessed, ensuring lazy loading without explicit synchronization.
*/
private static class ColumnarShuffleReaderConstructorBuilderHolder {
Copy link

Copilot AI Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The holder class is named ColumnarShuffleReaderConstructorBuilderHolder, but it actually holds a DynConstructors.Builder instance, not a constructor. Consider renaming to ColumnarShuffleReaderBuilderHolder for clarity and accuracy, as "ConstructorBuilder" is redundant when the type already indicates it's a Builder for constructors.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant