Download directly from S3 for faster query times #65

@jankaifer

Opportunity

Whenever you run a query, Athena saves the result as a .csv file in an S3 bucket. You can download this file like any other object in S3.

Another observation is that the Athena API returns at most 1000 rows per page. This has a significant performance impact if you try to download 100k+ rows: it takes 100+ requests, even if the result is just a few MB.

In our case, we are querying Athena from a different region (and a different continent), so the latency alone on those 100+ requests adds up to multiple seconds.

Downloading from S3 is a single request, which is faster. There are almost no downsides.

Result

After I implemented fetching directly from S3, we observed a significant speed-up in our query times. For queries that return ~100k rows, the time went from 38 seconds to just 18 seconds, which is more than a 2x improvement. The gain is even larger for queries that return more rows (in some places it was a 4x speed-up).

Request

It would be nice if some form of S3 fetching were implemented upstream. I have opened a PR with my implementation, but it's not in a mergeable state right now. I will not have time to clean it up and create a proper PR, but I wanted to share my code anyway in case it helps someone, or in case someone finds the time to properly integrate that functionality into the athenadriver API.
