feat: Optimize memory use and configure postgres for maximum performance #9
Open: polastre wants to merge 21 commits into main from memory-optimization
Conversation
This reduces memory use by not re-creating a buffer for every single tile compression. Instead, it reuses both the buffer and the gzip writer.
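A minimal sketch of that reuse pattern (illustrative names, not the PR's actual types):

```go
package export

import (
	"bytes"
	"compress/gzip"
)

// tileCompressor reuses one buffer and one gzip.Writer across tiles
// instead of allocating fresh ones for every compression.
type tileCompressor struct {
	buf bytes.Buffer
	gz  *gzip.Writer
}

func newTileCompressor() *tileCompressor {
	c := &tileCompressor{}
	c.gz = gzip.NewWriter(&c.buf)
	return c
}

// compress gzips one tile into the shared buffer. The returned slice
// aliases c.buf, so it is only valid until the next compress call.
func (c *tileCompressor) compress(tile []byte) ([]byte, error) {
	c.buf.Reset()
	c.gz.Reset(&c.buf) // rearm the writer against the cleared buffer
	if _, err := c.gz.Write(tile); err != nil {
		return nil, err
	}
	if err := c.gz.Close(); err != nil {
		return nil, err
	}
	return c.buf.Bytes(), nil
}
```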
Previously we scanned into a new `[]byte` slice that was allocated for every tile. Since the `pgx.Conn` stays open for the entirety of the export process (per worker), we can use its underlying buffer, because no other queries will be made during that time. If a `BulkWriter` is used (the only one is mbtiles), compression is also used, which means the byte buffer that is ultimately written is the gzip output rather than the raw result from postgres. If the mbtiles format is not selected, the tile is written to disk immediately.
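Roughly, the zero-copy scan with pgx v5's `pgtype.DriverBytes` might look like this (the query text and the `consume` callback are hypothetical):

```go
package export

import (
	"context"

	"github.com/jackc/pgx/v5"
	"github.com/jackc/pgx/v5/pgtype"
)

// scanTiles reads tile bytes without copying: pgtype.DriverBytes makes
// the scanned slice alias the connection's read buffer, so each tile
// must be fully consumed (compressed or written) before advancing.
func scanTiles(ctx context.Context, conn *pgx.Conn, sql string,
	z, x, y int, consume func([]byte) error) error {
	rows, err := conn.Query(ctx, sql, z, x, y)
	if err != nil {
		return err
	}
	defer rows.Close()
	for rows.Next() {
		var tile pgtype.DriverBytes
		if err := rows.Scan(&tile); err != nil {
			return err
		}
		// The bytes are invalidated by the next rows.Next/Close,
		// so hand them off (compress or write) right here.
		if err := consume(tile); err != nil {
			return err
		}
	}
	return rows.Err()
}
```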
Since tiles are written in batches, we need to hold on to each buffer until its tile is written. This creates an issue: the gzip implementation wants to release the buffer immediately and copy the results to yet another byte slice. Instead, keep track of the buffers and their corresponding byte slices and don't release them until the batch is written. Then release all the buffers, clear the references, and get ready for the next batch.
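A sketch of that bookkeeping, again with illustrative names: each tile's buffer and its aliased output slice are held until the batch flushes, then everything is recycled.

```go
package export

import (
	"bytes"
	"sync"
)

// batch holds each compressed tile together with the buffer its bytes
// alias. Nothing is recycled until the whole batch has been written.
type batch struct {
	pool    sync.Pool       // recycled *bytes.Buffer values
	buffers []*bytes.Buffer // one buffer per pending tile
	tiles   [][]byte        // each slice aliases the matching buffer
}

// get hands out a reset buffer, reusing a pooled one when available.
func (b *batch) get() *bytes.Buffer {
	if buf, ok := b.pool.Get().(*bytes.Buffer); ok {
		buf.Reset()
		return buf
	}
	return new(bytes.Buffer)
}

// add records a compressed tile and the buffer backing it.
func (b *batch) add(buf *bytes.Buffer, tile []byte) {
	b.buffers = append(b.buffers, buf)
	b.tiles = append(b.tiles, tile)
}

// flush writes the batch, then releases the buffers and clears the
// references so nothing pins the memory for the next batch.
func (b *batch) flush(write func([][]byte) error) error {
	err := write(b.tiles)
	for i, buf := range b.buffers {
		b.pool.Put(buf)
		b.buffers[i], b.tiles[i] = nil, nil
	}
	b.buffers, b.tiles = b.buffers[:0], b.tiles[:0]
	return err
}
```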
This is a merge of branch 'include-sqlite' into memory-optimization
The SQL query strings are the same for each zoom level; only the coordinates (z, x, y) change. Before, a new query string was generated for every tile. This had two bad side effects:

1. A new string was allocated (and had to be garbage collected) for each tile's query.
2. Postgres couldn't reuse the query plan it prepared, because the z/x/y coordinates were included in the query string itself.

The fix here is to:

1. Generate query strings prior to kicking off the export, store them in a map, then retrieve them from the map for each tile. This is O(1) time and zero space allocated per tile.
2. Use query strings with argument placeholders (like `$1`), then pass the z/x/y coordinates in as query parameters. This also has zero allocation and allows postgres/pgx to prepare the statement.

pgx works by caching the statements it has previously prepared and then reusing those prepared statements. If every tile has a different query, the cache grows like crazy. If each zoom has its own query, the cache size is only related to the number of zooms being processed.
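As a sketch (the SQL text is hypothetical; what matters is that the string for a given zoom never changes and the coordinates travel as bind parameters):

```go
package export

import (
	"context"
	"fmt"

	"github.com/jackc/pgx/v5"
)

// buildZoomQueries generates one parameterized query string per zoom
// before the export starts. In the real code each zoom's SQL can embed
// zoom-specific settings; this placeholder just tags the zoom.
func buildZoomQueries(minZoom, maxZoom int) map[int]string {
	queries := make(map[int]string, maxZoom-minZoom+1)
	for z := minZoom; z <= maxZoom; z++ {
		queries[z] = fmt.Sprintf(
			"SELECT tile_data FROM tiles WHERE z = $1 AND x = $2 AND y = $3 /* zoom %d */", z)
	}
	return queries
}

// queryTile looks up the zoom's cached query (an O(1) map lookup, no
// per-tile string allocation) and passes the coordinates as parameters,
// so pgx prepares one statement per zoom and reuses it for every tile.
func queryTile(ctx context.Context, conn *pgx.Conn,
	queries map[int]string, z, x, y int) (pgx.Rows, error) {
	return conn.Query(ctx, queries[z], z, x, y)
}
```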
… should be preferred
This allows sending custom session configuration and tuning commands to improve export speed for your specific tileset.
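Per worker, the wiring might look roughly like this (`connectWorker` and the `SET` statements are illustrative; only the `--init` flag name comes from the PR):

```go
package export

import (
	"context"

	"github.com/jackc/pgx/v5"
)

// connectWorker opens one connection per worker and runs the
// user-supplied --init SQL once at session start, before any tile
// queries. initSQL comes straight from the flag, e.g.
//   --init "SET work_mem = '256MB'; SET jit = off"
// (example settings only; tune for your own tileset).
func connectWorker(ctx context.Context, databaseURL, initSQL string) (*pgx.Conn, error) {
	conn, err := pgx.Connect(ctx, databaseURL)
	if err != nil {
		return nil, err
	}
	if initSQL != "" {
		// With no bind arguments pgx executes this over the simple
		// protocol, so semicolon-separated statements are allowed.
		if _, err := conn.Exec(ctx, initSQL); err != nil {
			conn.Close(ctx)
			return nil, err
		}
	}
	return conn, nil
}
```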
This change introduces a number of performance optimizations to reduce the time to generate tiles.
- Use `pgtype.DriverBytes` from pgx for postgres queries. Instead of allocating a buffer to write the tile data into from postgres, reuse the underlying pgx buffer, which results in no allocations.
- Use query strings with argument placeholders (`$1, $2, $3`) so that a single statement per zoom can be prepared and reused by pgx. Before, every tile query had the coordinates embedded in the query string, so each query was a new prepared statement and didn't benefit from the reuse of previous queries.
- Add an `--init` command that accepts a SQL statement to run on connection session start. This can be used to further optimize the performance of postgres queries based on the behavior of your tileset.