Optimize Postgres compactor queries #446
base: main
Conversation
Force-pushed from 4189ea7 to 266ff72
Force-pushed from 266ff72 to 033b1bc
const COMPACT_ROW_CODEC = pick(models.BucketData, [
Moving this to global scope since it never changes.
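Hoisting an immutable value to module scope means it is built once at load time rather than on every call. A minimal sketch of the idea; the `pick` helper and the schema object below are stand-ins for the real `pick`/`models.BucketData`, not the actual code:

```typescript
// Stand-in for the real pick helper: copies a subset of keys from an object.
function pick<T extends object, K extends keyof T>(obj: T, keys: K[]): Pick<T, K> {
  const out = {} as Pick<T, K>;
  for (const key of keys) {
    out[key] = obj[key];
  }
  return out;
}

// Stand-in schema; the real models.BucketData has more fields.
const BucketData = { group_id: 'int4', bucket_name: 'text', op_id: 'int8', data: 'jsonb' };

// Module scope: evaluated once when the module loads, since it never changes.
const COMPACT_ROW_CODEC = pick(BucketData, ['bucket_name', 'op_id']);

console.log(COMPACT_ROW_CODEC); // -> { bucket_name: 'text', op_id: 'int8' }
```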
const paramPart = bucket.slice(bracketIndex);
try {
  const parsed = JSON.parse(paramPart);
I'm using JSON.parse to avoid writing a custom parsing function; let me know what you think.
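Since a bucket name's parameter section is JSON-shaped, `JSON.parse` handles quoting, escapes, and number parsing for free. A hedged sketch of the approach; the bucket-name format shown (e.g. `by_user["user1",42]`) and the function name are assumptions for illustration, not the actual code:

```typescript
// Hypothetical helper: split a bucket name like 'by_user["user1",42]'
// into its definition name and parameter list.
function parseBucketName(bucket: string): { name: string; params: unknown[] } {
  const bracketIndex = bucket.indexOf('[');
  if (bracketIndex < 0) {
    // No parameter section at all.
    return { name: bucket, params: [] };
  }
  const name = bucket.slice(0, bracketIndex);
  const paramPart = bucket.slice(bracketIndex);
  try {
    // The parameter section is a JSON array, so JSON.parse replaces
    // a hand-written tokenizer for strings, numbers and escapes.
    const parsed = JSON.parse(paramPart);
    if (!Array.isArray(parsed)) {
      throw new Error('parameters must be a JSON array');
    }
    return { name, params: parsed };
  } catch {
    throw new Error(`invalid bucket name: ${bucket}`);
  }
}

console.log(parseBucketName('by_user["user1",42]'));
// -> { name: 'by_user', params: [ 'user1', 42 ] }
```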
Problem Statement
In the context of #400, the current compaction query uses an OR condition with COLLATE "C" for cursor-based pagination:
The OR condition prevents Postgres from using tight index bounds, so most of the fetched rows are discarded by a filter step, as the query plan from EXPLAIN ANALYZE shows for a sample query executed during compaction:
Output:
PostgreSQL scans all rows matching the Index Cond and then applies the Filter, discarding most of them, as shown by the Rows Removed by Filter counts.
Performance Impact
Solution
To optimize this query, I propose splitting the code into 3 specialized queries, each handling a specific combination of parameters passed to the compact command. See PostgresCompactor.ts for the changes. I've also added unit and integration tests to verify correctness.
Results
Running both the current master-branch code and this branch locally against a database with 1 million bucket records, I got the following numbers.