-
Notifications
You must be signed in to change notification settings - Fork 416
[CELEBORN-2248] Implement lazy loading for columnar shuffle classes and skew shuffle method using static holder pattern #3581
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
212f823 to
77447b0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR optimizes the initialization of optional columnar shuffle features by converting static initialization to lazy initialization using the initialization-on-demand holder idiom (static inner class pattern). This ensures that columnar shuffle classes and skew shuffle detection logic are only loaded when actually needed, reducing unnecessary class loading overhead for users not utilizing these features.
Changes:
- Introduced three static inner holder classes to lazily initialize columnar shuffle constructors and skew shuffle method
- Refactored
createColumnarHashBasedShuffleWriter(),createColumnarShuffleReader(), andisCelebornSkewShuffleOrChildShuffle()to use the holder pattern - Added comprehensive JavaDoc comments explaining the lazy loading mechanism
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…nd skew shuffle method using static holder pattern ### What changes were proposed in this pull request? This PR converts the static initialization of columnar shuffle class constructors and skew shuffle method to lazy initialization using the initialization-on-demand holder idiom (static inner class pattern) in SparkUtils.java. Specifically, the following changes were made: 1. Introduced `ColumnarHashBasedShuffleWriterConstructorHolder` static inner class to lazily initialize the constructor for ColumnarHashBasedShuffleWriter 2. Introduced `ColumnarShuffleReaderConstructorHolder` static inner class to lazily initialize the constructor for CelebornColumnarShuffleReader 3. Introduced `CelebornSkewShuffleMethodHolder` static inner class to lazily initialize the `isCelebornSkewedShuffle` method reference 4. Modified `createColumnarHashBasedShuffleWriter()`, `createColumnarShuffleReader()`, and `isCelebornSkewShuffleOrChildShuffle()` methods to use the holder pattern for lazy initialization 5. Added JavaDoc comments explaining the lazy loading mechanism ### Why are the changes needed? The current implementation statically initializes columnar shuffle class constructors and the skew shuffle method at SparkUtils class loading time, which means these classes/methods are loaded regardless of whether they are actually used. This lazy loading approach ensures that: - Columnar shuffle classes are only loaded when actually needed (when `celeborn.columnarShuffle.enabled` is true and the create methods are called) - CelebornShuffleState class is only loaded when skew shuffle detection is needed - Reduces unnecessary class loading overhead for users not using these features - Improves startup performance and memory footprint - Aligns with the conditional usage pattern already present in SparkShuffleManager The static holder pattern (initialization-on-demand holder idiom) provides several advantages: - Thread-safe without explicit synchronization (guaranteed by JVM class loading mechanism) - No synchronization overhead at runtime (no volatile reads or lock acquisition) - Simpler and more concise code compared to double-checked locking - Recommended by Effective Java (Item 83) for lazy initialization ### Does this PR resolve a correctness bug? No, this is a performance optimization. ### Does this PR introduce any user-facing change? No. This change only affects when certain classes are loaded internally. The functionality and API remain unchanged. ### How was this patch tested? - Code review to verify correct implementation of the initialization-on-demand holder pattern - Verified that JVM class loading guarantees thread safety (JLS §12.4.2) - Analyzed existing columnar shuffle and skew shuffle test coverage in the codebase - The changes are backward compatible and don't alter functionality, only initialization timing 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
77447b0 to
b53a8cf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| private static final DynConstructors.Builder INSTANCE = | ||
| DynConstructors.builder() | ||
| .impl( | ||
| COLUMNAR_HASH_BASED_SHUFFLE_WRITER_CLASS, | ||
| int.class, | ||
| CelebornShuffleHandle.class, | ||
| TaskContext.class, | ||
| CelebornConf.class, | ||
| ShuffleClient.class, | ||
| ShuffleWriteMetricsReporter.class, | ||
| SendBufferPool.class); |
Copilot
AI
Feb 2, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The holder should store the built DynConstructors.Ctor instead of the DynConstructors.Builder. This is inconsistent with the established pattern in this file where all other dynamic reflection fields store the built result (see GET_READER_METHOD at line 176, UnregisterAllMapAndMergeOutput_METHOD at line 312, and even CelebornSkewShuffleMethodHolder at line 563).
Change INSTANCE to private static final DynConstructors.Ctor<?> INSTANCE and append .build() at the end of line 247. Then update line 258 to remove the .build() call and invoke directly on INSTANCE. This avoids the unnecessary method call overhead on every invocation and follows the established convention.
| private static final DynConstructors.Builder INSTANCE = | ||
| DynConstructors.builder() | ||
| .impl( | ||
| COLUMNAR_SHUFFLE_READER_CLASS, | ||
| CelebornShuffleHandle.class, | ||
| int.class, | ||
| int.class, | ||
| int.class, | ||
| int.class, | ||
| TaskContext.class, | ||
| CelebornConf.class, | ||
| ShuffleReadMetricsReporter.class, | ||
| ExecutorShuffleIdTracker.class); |
Copilot
AI
Feb 2, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The holder should store the built DynConstructors.Ctor instead of the DynConstructors.Builder. This is inconsistent with the established pattern in this file where all other dynamic reflection fields store the built result (see GET_READER_METHOD at line 176, UnregisterAllMapAndMergeOutput_METHOD at line 312, and even CelebornSkewShuffleMethodHolder at line 563).
Change INSTANCE to private static final DynConstructors.Ctor<?> INSTANCE and append .build() at the end of line 283. Then update line 296 to remove the .build() call and invoke directly on INSTANCE. This avoids the unnecessary method call overhead on every invocation and follows the established convention.
| SendBufferPool.class); | ||
|
|
||
| /** | ||
| * Lazy holder for ColumnarHashBasedShuffleWriter constructor builder. The builder is initialized |
Copilot
AI
Feb 2, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The JavaDoc comment describes a "constructor builder" but should describe a "constructor" for consistency with the recommended implementation. After storing the built DynConstructors.Ctor instead of the builder (as suggested in the comment for lines 237-247), update this comment to read "Lazy holder for ColumnarHashBasedShuffleWriter constructor. The constructor is initialized..." to accurately reflect what is being stored.
| * Lazy holder for ColumnarHashBasedShuffleWriter constructor builder. The builder is initialized | |
| * Lazy holder for ColumnarHashBasedShuffleWriter constructor. The constructor is initialized |
| ExecutorShuffleIdTracker.class); | ||
|
|
||
| /** | ||
| * Lazy holder for CelebornColumnarShuffleReader constructor builder. The builder is initialized |
Copilot
AI
Feb 2, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The JavaDoc comment describes a "constructor builder" but should describe a "constructor" for consistency with the recommended implementation. After storing the built DynConstructors.Ctor instead of the builder (as suggested in the comment for lines 271-283), update this comment to read "Lazy holder for CelebornColumnarShuffleReader constructor. The constructor is initialized..." to accurately reflect what is being stored.
| * Lazy holder for CelebornColumnarShuffleReader constructor builder. The builder is initialized | |
| * Lazy holder for CelebornColumnarShuffleReader constructor. The constructor is initialized |
| * Lazy holder for ColumnarHashBasedShuffleWriter constructor builder. The builder is initialized | ||
| * only when this class is first accessed, ensuring lazy loading without explicit synchronization. | ||
| */ | ||
| private static class ColumnarHashBasedShuffleWriterConstructorBuilderHolder { |
Copilot
AI
Feb 2, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The holder class is named ColumnarHashBasedShuffleWriterConstructorBuilderHolder, but it actually holds a DynConstructors.Builder instance, not a constructor. Consider renaming to ColumnarHashBasedShuffleWriterBuilderHolder for clarity and accuracy, as "ConstructorBuilder" is redundant when the type already indicates it's a Builder for constructors.
| * Lazy holder for CelebornColumnarShuffleReader constructor builder. The builder is initialized | ||
| * only when this class is first accessed, ensuring lazy loading without explicit synchronization. | ||
| */ | ||
| private static class ColumnarShuffleReaderConstructorBuilderHolder { |
Copilot
AI
Feb 2, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The holder class is named ColumnarShuffleReaderConstructorBuilderHolder, but it actually holds a DynConstructors.Builder instance, not a constructor. Consider renaming to ColumnarShuffleReaderBuilderHolder for clarity and accuracy, as "ConstructorBuilder" is redundant when the type already indicates it's a Builder for constructors.
What changes were proposed in this pull request?
This PR converts the static initialization of columnar shuffle class constructors
and skew shuffle method to lazy initialization using the initialization-on-demand
holder idiom (static inner class pattern) in SparkUtils.java.
Specifically, the following changes were made:
Introduced
ColumnarHashBasedShuffleWriterConstructorHolderstatic inner classto lazily initialize the constructor for ColumnarHashBasedShuffleWriter
Introduced
ColumnarShuffleReaderConstructorHolderstatic inner class to lazilyinitialize the constructor for CelebornColumnarShuffleReader
Introduced
CelebornSkewShuffleMethodHolderstatic inner class to lazilyinitialize the
isCelebornSkewedShufflemethod referenceModified
createColumnarHashBasedShuffleWriter(),createColumnarShuffleReader(),and
isCelebornSkewShuffleOrChildShuffle()methods to use the holder pattern forlazy initialization
Added JavaDoc comments explaining the lazy loading mechanism
Why are the changes needed?
The current implementation statically initializes columnar shuffle class constructors
and the skew shuffle method at SparkUtils class loading time, which means these
classes/methods are loaded regardless of whether they are actually used.
This lazy loading approach ensures that:
celeborn.columnarShuffle.enabledis true and the create methods are called)The static holder pattern (initialization-on-demand holder idiom) provides several
advantages:
Does this PR resolve a correctness bug?
No, this is a performance optimization.
Does this PR introduce any user-facing change?
No. This change only affects when certain classes are loaded internally.
The functionality and API remain unchanged.
How was this patch tested?