Commit 1f9c46b
perf: Avoid copying metadata for each data file in summary (#2674)
Resolves #2673
# Rationale for this change
`_SnapshotProducer._summary()` copies the metadata for _every_ added /
deleted DataFile, which gets expensive when a snapshot touches many files.
Instead, we copy it once at the beginning of the function and reuse the
same value for each DataFile.
On my data, which overwrites a few million rows at a time, I saw the
time for `table.overwrite` go from ~20 seconds to ~6 seconds.
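
For illustration, here is a minimal sketch of the hoisting pattern this change applies. It is not the actual PyIceberg implementation: `summary_before`, `summary_after`, `partition_summary`, and the dict-based `metadata` are hypothetical stand-ins for the real per-file work that `_SnapshotProducer._summary()` performs.

```python
# Hedged sketch of the optimization pattern, not the real PyIceberg code.
import copy


def partition_summary(schema, data_file):
    # Stand-in for the real per-file summary accounting.
    pass


def summary_before(metadata, added_files, deleted_files):
    # Before: the metadata copy is made once per DataFile, so a commit that
    # adds or deletes millions of files repeats the same expensive deepcopy.
    for data_file in added_files + deleted_files:
        schema = copy.deepcopy(metadata["schema"])
        partition_summary(schema, data_file)


def summary_after(metadata, added_files, deleted_files):
    # After: copy the metadata once up front and reuse it for every DataFile.
    schema = copy.deepcopy(metadata["schema"])
    for data_file in added_files + deleted_files:
        partition_summary(schema, data_file)
```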
## Are these changes tested?
Yes, existing unit / integration tests
## Are there any user-facing changes?
Just faster writes :)
1 file changed: +9 / -8 lines