
Conversation

@HyukjinKwon (Member) commented Oct 21, 2025

What changes were proposed in this pull request?

TBD

Why are the changes needed?

TBD

Does this PR introduce any user-facing change?

TBD

How was this patch tested?

Screen.Recording.2025-10-24.at.12.18.24.PM.mov

Was this patch authored or co-authored using generative AI tooling?

No.

@holdenk (Contributor) left a comment

Looks really interesting. I did a start of a read-through, but I'll wait for a more complete description so I understand what's in and out of scope a bit better.

}
// allow the user to set the batch size for the BatchedSerializer on UDFs
envVars.put("PYTHON_UDF_BATCH_SIZE", batchSizeForPythonUDF.toString)
envVars.put("PYSPARK_RUNTIME_PROFILE", true.toString)
Contributor

Obviously this should be made configurable later.
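A minimal sketch of how the Python worker side could pick these settings up with configurable defaults. The env var names `PYTHON_UDF_BATCH_SIZE` and `PYSPARK_RUNTIME_PROFILE` come from the diff above; the default values and the helper name are assumptions for illustration:

```python
import os


def read_profiler_settings(env=os.environ):
    """Read the env vars set by the JVM side; defaults here are assumed."""
    batch_size = int(env.get("PYTHON_UDF_BATCH_SIZE", "100"))
    profiling = env.get("PYSPARK_RUNTIME_PROFILE", "false").lower() == "true"
    return batch_size, profiling


print(read_profiler_settings({"PYTHON_UDF_BATCH_SIZE": "256",
                              "PYSPARK_RUNTIME_PROFILE": "true"}))  # (256, True)
```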

case class PythonWorker(channel: SocketChannel) {
case class PythonWorker(
    channel: SocketChannel,
    extraChannel: Option[SocketChannel] = None) {
Contributor

Let's call this profile data channel or something?
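For illustration, a Python analog of the worker holding a main channel plus an optional second channel for profile data (the PR models this as the Scala case class above; the `profile_data_channel` name follows the review suggestion and `socketpair` stands in for the real accepted connections):

```python
import socket
from dataclasses import dataclass
from typing import Optional


@dataclass
class PythonWorker:
    channel: socket.socket
    # Optional side channel for profiler output, per the review suggestion.
    profile_data_channel: Optional[socket.socket] = None


# socketpair() gives two connected sockets, standing in for real connections.
main_a, main_b = socket.socketpair()
prof_a, prof_b = socket.socketpair()
worker = PythonWorker(channel=main_a, profile_data_channel=prof_a)

worker.channel.sendall(b"data")
worker.profile_data_channel.sendall(b"stats")
received_main = main_b.recv(4)
received_prof = prof_b.recv(5)
print(received_main, received_prof)
```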

time.sleep(1)


def main(infile, outfile):
Contributor

Maybe rename to outputs

pickled = pickle.dumps(stats)
write_with_length(pickled, outfile)
outfile.flush()
time.sleep(1)
Contributor

I know yappi says it's fast, but a 1-second sleep loop seems like it may be overkill, or should at least be configurable?
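One way the hard-coded `time.sleep(1)` could be made configurable, sketched with a hypothetical `PYSPARK_PROFILE_FLUSH_INTERVAL` env var (the name and the loop structure are assumptions, not from the PR):

```python
import os
import time


def profile_flush_loop(flush_once, stop, env=os.environ):
    """Flush profile stats every `interval` seconds until stop() is true."""
    # Hypothetical env var; 1.0s default matches the hard-coded sleep.
    interval = float(env.get("PYSPARK_PROFILE_FLUSH_INTERVAL", "1.0"))
    while not stop():
        flush_once()
        time.sleep(interval)


calls = []
profile_flush_loop(lambda: calls.append(1),
                   lambda: len(calls) >= 3,
                   env={"PYSPARK_PROFILE_FLUSH_INTERVAL": "0.01"})
print(len(calls))  # 3
```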

for thread in yappi.get_thread_stats():
    data = list(yappi.get_func_stats(ctx_id=thread.id))
    stats.extend([{str(k): str(v) for k, v in d.items()} for d in data])
pickled = pickle.dumps(stats)
Contributor

Why pickle? Would JSON maybe make more sense, so we can interpret it more easily in, say, the Spark UI in the future?
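A sketch of what the JSON alternative could look like while keeping length-prefixed framing. `write_with_length` here is a stand-in mimicking PySpark's framing (a 4-byte big-endian length followed by the payload); the stats dict contents are illustrative, shaped like the `{str(k): str(v)}` dicts built from yappi above:

```python
import io
import json
import struct


def write_with_length(payload: bytes, outfile) -> None:
    # Length-prefixed framing: 4-byte big-endian int, then the payload.
    outfile.write(struct.pack("!i", len(payload)))
    outfile.write(payload)


# Illustrative stats; real entries come from yappi's per-thread func stats.
stats = [{"name": "my_udf", "ncall": "10", "ttot": "0.123"}]
payload = json.dumps(stats).encode("utf-8")

buf = io.BytesIO()
write_with_length(payload, buf)

# Reader side: length prefix first, then decode the JSON body.
buf.seek(0)
(length,) = struct.unpack("!i", buf.read(4))
decoded = json.loads(buf.read(length))
print(decoded == stats)  # True
```

JSON keeps the payload language-neutral, so the JVM side (or the Spark UI) could parse it without unpickling Python objects.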

@HyukjinKwon (Member, Author)

Let me make a new PR when it's ready

@HyukjinKwon HyukjinKwon closed this Nov 4, 2025