GH-44910: [Swift] Fix IPC stream reader and writer impl #45029
Conversation
Force-pushed eb8d290 to fc1e28e
Could you add a comment that explains the difference between fromMemoryStream and fromFileStream?
Is this the size of the length data?
How about using UInt32 instead of Int32, since the length data is UInt32, not Int32?
I looked at the length var and it is already UInt32. From a couple of lines above: `var length = getUInt32(fileData, offset: offset)`. Please let me know if this matches what you are seeing.
Sorry. I don't remember this, but I think that I was referring to `var offset: Int = 0`...
I will change `offset += Int(MemoryLayout.size)` to `offset += Int(MemoryLayout.size)`. The variable `offset` is an `Int` due to the parameter type in the call to the buffer's `loadUnaligned`.
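For context, a minimal sketch of the pattern being discussed (the helper name `getUInt32` mirrors the snippet quoted above and is hypothetical, not the actual Arrow Swift API): the length prefix is read as `UInt32` to match the on-wire type, while the cursor stays an `Int` because `loadUnaligned(fromByteOffset:)` takes an `Int`.

```swift
import Foundation

// Hypothetical helper mirroring the discussion: read a UInt32 length prefix
// from raw bytes at an Int offset.
func getUInt32(_ data: Data, offset: Int) -> UInt32 {
    data.withUnsafeBytes { raw in
        raw.loadUnaligned(fromByteOffset: offset, as: UInt32.self)
    }
}

// Build a tiny buffer containing a single UInt32 length value.
var fileData = Data()
var lengthValue: UInt32 = 24
withUnsafeBytes(of: &lengthValue) { fileData.append(contentsOf: $0) }

var offset: Int = 0
let length = getUInt32(fileData, offset: offset)  // UInt32, as noted above
offset += MemoryLayout<UInt32>.size               // advance the Int cursor
```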
@kou I hope all is well. Please review again when you get a chance.
Can we use a different name for this? This name may be confusing because the Apache Arrow specification uses:
- "IPC Streaming Format" https://arrow.apache.org/docs/format/Columnar.html#ipc-streaming-format
- "IPC File Format" https://arrow.apache.org/docs/format/Columnar.html#ipc-file-format
If we use "File" and "Stream" in this method name, users may think that it is for the "IPC Streaming Format" stored in a file.
Gotcha, I will change the name to fromStream.
How about using readStreaming (for the Arrow streaming format) and readFile (for the Arrow file format) instead of fromMemoryStream (for the Arrow streaming format) and fromStream (for the Arrow file format)?
```diff
-    /*
-    The Memory stream format is for reading the arrow streaming protocol. This
-    format is slightly different from the File format protocol as it doesn't contain
-    a header and footer
-    */
-    public func fromMemoryStream( // swiftlint:disable:this function_body_length
+    /*
+    This is for reading the Arrow streaming format. The Arrow streaming format
+    is slightly different from the Arrow File format as it doesn't contain a header
+    and footer.
+    */
+    public func readStreaming( // swiftlint:disable:this function_body_length
```
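As an illustration of the framing that a streaming reader like `readStreaming` has to walk, here is a minimal sketch. Assumptions, not the actual PR implementation: each IPC message is preceded by a 0xFFFFFFFF continuation marker and a 32-bit metadata length, a zero length marks end-of-stream, and flatbuffer decoding of the metadata itself is omitted.

```swift
import Foundation

// Assumed framing: [0xFFFFFFFF][len: UInt32][len bytes of metadata] ... [0xFFFFFFFF][0]
let continuation: UInt32 = 0xFFFFFFFF

func readUInt32(_ data: Data, _ offset: Int) -> UInt32 {
    data.withUnsafeBytes { $0.loadUnaligned(fromByteOffset: offset, as: UInt32.self) }
}

// Collect the metadata lengths of each message until the end-of-stream marker.
func messageLengths(in stream: Data) -> [Int] {
    var lengths: [Int] = []
    var offset = 0
    while offset + 8 <= stream.count {
        guard readUInt32(stream, offset) == continuation else { break }
        let len = readUInt32(stream, offset + 4)
        if len == 0 { break }              // end-of-stream marker
        lengths.append(Int(len))
        offset += 8 + Int(len)             // marker + length + metadata bytes
    }
    return lengths
}

// Build a tiny fake stream: one 8-byte message, then the end-of-stream marker.
var stream = Data()
func appendU32(_ v: UInt32) {
    withUnsafeBytes(of: v) { stream.append(contentsOf: $0) }
}
appendU32(continuation); appendU32(8); stream.append(Data(count: 8))
appendU32(continuation); appendU32(0)
```

Note there is no header or footer to skip: the reader can start consuming messages at offset 0, which is the difference the comment above is drawing against the file format.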
```diff
-    /*
-    The File stream format supports random accessing the data. This format contains
-    a header and footer around the streaming format.
-    */
-    public func fromStream( // swiftlint:disable:this function_body_length
+    /*
+    This is for reading the Arrow file format. The Arrow file format supports
+    random accessing the data. The Arrow file format contains a header and
+    footer around the Arrow streaming format.
+    */
+    public func readFile( // swiftlint:disable:this function_body_length
```
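To make the "header and footer around the streaming format" distinction concrete, a minimal sketch: per the Arrow columnar spec, a file-format payload starts and ends with the magic bytes "ARROW1". This hypothetical check validates only that framing; a real `readFile` also parses the flatbuffer footer to support random access.

```swift
import Foundation

// The Arrow file format wraps the streaming format with "ARROW1" magic bytes
// at both ends (plus a footer before the trailing magic). This checks only
// the framing, not the footer contents.
let magic = Data("ARROW1".utf8)

func looksLikeArrowFile(_ data: Data) -> Bool {
    data.count >= magic.count * 2 &&
        data.prefix(magic.count) == magic &&
        data.suffix(magic.count) == magic
}

// A fake file: magic, stand-in bytes for the stream + footer, magic again.
var fake = Data()
fake.append(magic)
fake.append(Data(count: 16))
fake.append(magic)
```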
I see that @dongjoon-hyun is using this Swift Arrow implementation in the Spark Connect Client for Swift. Has this issue been fixed downstream in that repo?
I've been following Apache Arrow activity already in order to consume the official Apache Arrow release eventually when it's ready. 😄
For the record, Apache Spark Connect for Swift is a user of Apache Arrow. For the required changes, I've already contributed back, except one thing (Swift 6 compilation stuff). Other than that, there are no new features or bug fixes for this layer.
Thanks very much for your contributions @dongjoon-hyun!
dongjoon-hyun left a comment
Thank you, @abandy and all. It looks good to me.
It seems that apache/spark-connect-swift bundles Apache Arrow Swift (apache/spark-connect-swift@fe8322d) instead of referring to a package in https://github.com/apache/spark-connect-swift/blob/main/Package.swift. Is it only for backporting unreleased features/fixes? (Will apache/spark-connect-swift use Apache Arrow Swift as a package when we release 21.0.0?)
I do not have privileges to merge. @dongjoon-hyun or @kou, can you please merge when you get a chance?
To @kou and @abandy, as a user, I really appreciated your efforts on Apache Arrow.
To @kou, as a member of the Apache Spark PMC, I can say that the Apache Spark community has no intention to duplicate Apache Arrow. I clearly mentioned this in the following PR from the beginning, when I started with 19.0.1. The Apache Spark community uses only the committed Apache Arrow codebase. To be honest, I've been monitoring, evaluating, and waiting for Apache Arrow Swift for longer than you might guess, but it didn't meet my expectations. There are a few reasons why we couldn't start as a package consumer. The most important is the lack of Swift 6 support. In addition, there was some instability in Linux environments (due to a potential data race). I started inevitably …
To @abandy, …
As a side note, @kou, as a user, I hope the Apache Arrow community publishes … As of now, you can see that …
kou left a comment
Sorry for hijacking this PR for Apache Spark Connect Client for Swift.
@dongjoon-hyun Could you open an issue for the remaining issues for Apache Spark Connect Client for Swift? Let's use that issue for further discussion.
Can we use the same naming rules for the writer as for the reader?
Oh, not at all. The Apache Arrow community is a big ecosystem. I'm happy to monitor the community decision and collaborate as a user.
Definitely, will do in a proper way.
I clarify this: "remaining Apache Arrow Swift issues such as publishing to Swift Package Index"
@kou please review and merge when you get a chance.
kou left a comment
+1
Ah, we should have updated the PR description before merging this...
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 8893e88. There were no benchmark performance regressions. 🎉 The full Conbench report has more details.
Rationale for this change
Fixes the incorrect IPC stream format issue.
Changes have been tested with:
This PR includes breaking changes to public APIs.
Writer and reader APIs have changed:
Reader: `fromStream` -> `fromFileStream`
Writer: `toStream` -> `toFileStream`