That's interesting. A likely reason is that the footer is read once for every file group, which means it is being read 20 times. Normally that cost is outweighed by the additional parallelism, but it is possible that the parquet file has been written in a way that prevents this. Arrow-cpp had a bug for a very long time where it produced massive row groups, and DuckDB has an interesting approach to the spec 😅
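
To get a rough feel for how much work the repeated footer reads add up to, here is a minimal sketch using only the Python standard library. It relies on the Parquet layout where a plain (unencrypted footer) file ends with a 4-byte little-endian footer length followed by the `PAR1` magic. The helper names `footer_length` and `repeated_footer_bytes` are made up for illustration, and this is not how DataFusion or arrow-rs implement the read; it only estimates the bytes involved if each file group fetches the footer independently.

```python
import os
import struct

PARQUET_MAGIC = b"PAR1"  # trailing magic of a Parquet file with a plaintext footer


def footer_length(path):
    """Return the size in bytes of the footer metadata of a Parquet file.

    Hypothetical helper: reads only the last 8 bytes of the file, which hold
    the 4-byte little-endian footer length followed by the magic bytes.
    """
    if os.path.getsize(path) < 8:
        raise ValueError("file too small to be a Parquet file")
    with open(path, "rb") as f:
        f.seek(-8, os.SEEK_END)
        tail = f.read(8)
    if tail[4:] != PARQUET_MAGIC:
        raise ValueError("missing trailing PAR1 magic")
    (length,) = struct.unpack("<I", tail[:4])
    return length


def repeated_footer_bytes(path, file_groups):
    """Footer bytes fetched in total if every file group reads the footer once.

    Assumption for illustration: no caching or sharing of the parsed metadata
    between file groups.
    """
    return file_groups * footer_length(path)
```

For example, `repeated_footer_bytes("data.parquet", 20)` (with `data.parquet` standing in for your own file) gives the total footer bytes for 20 file groups. If that number is large relative to the data actually scanned, the footer reads are a plausible suspect.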

Couple of questions:

tustvold Oct 4, 2023
Collaborator

Answer selected by Jefffrey
tustvold Oct 7, 2023
Collaborator
