Implement multipart copy and copying a particular version #308
bors try
Relevant: JuliaCloud/AWS.jl#695
| [multipart copy](https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html). | ||
| # Optional Arguments | ||
| - `part_size_mb`: maximum size per uploaded part, in mebibytes (MiB). |
I wonder if it's worth exposing an option that allows matching the part size between the source and destination. IIUC, that should make the range-based accesses faster while copying. If a file is big enough for a multipart copy, it was probably uploaded with a multipart upload, in which case the parts and their sizes can be obtained with `S3.get_object_attributes`. Lacking that permission, one can also get the part size with `S3.head_object` by passing `Dict("partNumber" => 1)` as a query parameter, and the number of parts will be in the entity tag of the source object.
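To make the last point concrete, here is a minimal sketch of reading the part count out of an entity tag. It assumes the common multipart ETag shape `"<hex digest>-<N>"`, where `N` is the number of parts; `etag_part_count` is a hypothetical helper name and is not part of this PR or of AWS.jl. Fetching the ETag itself (for example from a `HeadObject` response) is left out so the snippet runs without network access.

```julia
# Hypothetical helper (not part of this PR): extract the part count from an
# S3 entity tag. Assumes multipart ETags look like "<hex digest>-<N>", with
# N the number of parts; ETags without the "-N" suffix yield `nothing`.
function etag_part_count(etag::AbstractString)
    stripped = strip(etag, '"')  # S3 returns the ETag wrapped in double quotes
    m = match(r"^[0-9a-fA-F]+-(\d+)$", stripped)
    return m === nothing ? nothing : parse(Int, m.captures[1])
end
```

A `nothing` result would suggest the source was not uploaded in parts, in which case there is no source part size to match.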
| to_bucket, | ||
| to_path, | ||
| "$bucket/$path", | ||
| source, | ||
| Dict("headers" => headers); | ||
| aws_config=aws, | ||
| kwargs..., |
[JuliaFormatter] reported by reviewdog 🐶
| to_bucket, | |
| to_path, | |
| "$bucket/$path", | |
| source, | |
| Dict("headers" => headers); | |
| aws_config=aws, | |
| kwargs..., | |
| to_bucket, to_path, source, Dict("headers" => headers); aws_config=aws, kwargs... |
| "x-amz-copy-source-range" => string( | ||
| "bytes=", first(byte_range), '-', last(byte_range) | ||
| ) |
[JuliaFormatter] reported by reviewdog 🐶
| "x-amz-copy-source-range" => string( | |
| "bytes=", first(byte_range), '-', last(byte_range) | |
| ) | |
| "x-amz-copy-source-range" => | |
| string("bytes=", first(byte_range), '-', last(byte_range)), |
Summary of changes:
- `s3_copy` now supports a `version` keyword argument that facilitates copying a specified version of an object.
- A new function `s3_multipart_copy` to mirror `s3_multipart_upload` has been added, which calls `UploadPartCopy` in the API.
- An explicit `cp(::S3Path, ::S3Path)` method has been implemented, which avoids the fallback `cp(::AbstractPath, ::AbstractPath)` method that reads the source file into memory before writing to the destination.
- `cp(::S3Path, ::S3Path)` allows the user to opt into a multipart copy, in which case multipart is used when the source is larger than the specified part size (50 MiB by default). A multipart copy is unconditionally used when the source is at least 5 GiB. This behavior mimics that of the AWS CLI. Note that this now requires an additional API call to `HeadObject` in order to retrieve the source size.
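The size rule in the last bullet, and the byte ranges sent as `x-amz-copy-source-range`, can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's actual code: `use_multipart`, `part_ranges`, and `copy_source_range` are hypothetical names, the `multipart` keyword stands in for however the opt-in is exposed, and the source size is assumed to have already been retrieved via `HeadObject`.

```julia
const MiB = 1024^2
const GiB = 1024^3

# Hypothetical decision rule: multipart is forced at 5 GiB and above, and
# otherwise used only when the caller opted in and the source is larger
# than one part.
function use_multipart(source_size::Integer; multipart::Bool=false, part_size_mb::Integer=50)
    source_size >= 5 * GiB && return true
    return multipart && source_size > part_size_mb * MiB
end

# Hypothetical helper: split a source of `source_size` bytes into inclusive,
# zero-based byte ranges of at most `part_size_mb` MiB each.
function part_ranges(source_size::Integer, part_size_mb::Integer=50)
    part_size = part_size_mb * MiB
    return [
        start:min(start + part_size - 1, source_size - 1) for
        start in 0:part_size:(source_size - 1)
    ]
end

# Format a range the same way the diff builds the "x-amz-copy-source-range" header.
copy_source_range(byte_range) = string("bytes=", first(byte_range), '-', last(byte_range))
```

For a 120 MiB source with the default part size, `part_ranges` yields three ranges, the last one shorter than the others; each would correspond to one `UploadPartCopy` request.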
57bf305 to 27ad265
I've had these changes locally for months (possibly a year or more?) but hadn't committed or pushed them. I don't know if/when I'll have the bandwidth to ensure this gets over the finish line, so if someone is interested in picking this up then please feel free to do so.