
@das-Abroxas (Contributor) commented Jan 15, 2026

This PR integrates checksum integrity validation from the S3 specification into Aruna's S3-compatible interface for the implemented upload and get operations. The implementation validates data integrity using the checksums that clients provide during PutObject, UploadPart, and CompleteMultipartUpload operations, and it returns all calculated checksums in the HeadObject and GetObject responses.

Changes

  • Implemented ChecksumHandler utility for evaluating and validating checksums according to S3 spec
  • Integrated hash handling into protobuf Object to Dataproxy Object conversion
  • Extended UploadPart struct to optionally store checksums for multipart uploads
  • Added checksum calculation/validation in PutObject operation
  • Added checksum calculation/validation in UploadPart operation
  • Integrated checksum calculation in CompleteMultipartUpload operation
  • Provided all calculated checksums in the response headers of the HeadObject and GetObject operations
  • Extended ChecksumHandler to handle empty/zero-byte objects
  • Improved 0-byte object handling
  • Added local hash upserting functionality
  • Refactored to use only API-conform hashes for server communication
  • Added checksum type to output
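To make the PutObject/UploadPart flow above concrete, here is a minimal sketch of streaming validation, assuming CRC32 only and a body that arrives as ordered chunks. `StreamingChecksumHandler`, `Crc32` and `base64_encode` are hypothetical names for illustration, not the PR's code; only the idea (hash while streaming, base64-encode the big-endian digest as S3 does, compare it with the client's `x-amz-checksum-crc32` value) comes from the description. The CRC and base64 routines are hand-rolled so the example needs only the standard library.

```rust
use std::collections::HashMap;

/// Streaming CRC-32 (IEEE, reflected polynomial 0xEDB88320), the variant S3 calls CRC32.
/// Bitwise for brevity; production code would use a table-driven or SIMD implementation.
struct Crc32 {
    state: u32,
}

impl Crc32 {
    fn new() -> Self {
        Crc32 { state: 0xFFFF_FFFF }
    }

    /// Feed one chunk of the body; chunks must arrive in order.
    fn update(&mut self, data: &[u8]) {
        for &byte in data {
            self.state ^= u32::from(byte);
            for _ in 0..8 {
                // All-ones mask when the low bit is set, zero otherwise.
                let mask = (self.state & 1).wrapping_neg();
                self.state = (self.state >> 1) ^ (0xEDB8_8320 & mask);
            }
        }
    }

    fn finalize(&self) -> u32 {
        !self.state
    }
}

/// Standard base64 with padding; S3 transmits checksums in this encoding.
fn base64_encode(bytes: &[u8]) -> String {
    const ALPHABET: &[u8] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = String::with_capacity((bytes.len() + 2) / 3 * 4);
    for chunk in bytes.chunks(3) {
        let b1 = chunk.get(1).copied().unwrap_or(0);
        let b2 = chunk.get(2).copied().unwrap_or(0);
        let n = (u32::from(chunk[0]) << 16) | (u32::from(b1) << 8) | u32::from(b2);
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { ALPHABET[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { ALPHABET[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Hypothetical, CRC32-only stand-in for the PR's ChecksumHandler.
struct StreamingChecksumHandler {
    /// Base64 value the client sent in `x-amz-checksum-crc32`, if any.
    required_crc32: Option<String>,
    crc32: Crc32,
    /// Calculated values keyed by algorithm name, filled by `finish`.
    calculated: HashMap<String, String>,
}

impl StreamingChecksumHandler {
    fn new(required_crc32: Option<String>) -> Self {
        StreamingChecksumHandler { required_crc32, crc32: Crc32::new(), calculated: HashMap::new() }
    }

    fn update(&mut self, chunk: &[u8]) {
        self.crc32.update(chunk);
    }

    /// Called once after the last chunk. A 0-byte body never calls `update`
    /// and still gets a value: CRC32 0, i.e. "AAAAAA==".
    fn finish(&mut self) {
        let value = base64_encode(&self.crc32.finalize().to_be_bytes());
        self.calculated.insert("CRC32".to_string(), value);
    }

    /// True if no checksum was required or the calculated one matches.
    fn validate(&self) -> bool {
        match &self.required_crc32 {
            Some(expected) => self.calculated.get("CRC32") == Some(expected),
            None => true,
        }
    }
}
```

Feeding `1234` and then `56789` produces `y/Q5Jg==`, the base64 form of the standard CRC-32 check value 0xCBF43926 for `123456789`, so chunk boundaries do not affect the result. An empty body yields `AAAAAA==`, which is one way the 0-byte case can still be validated.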

Notes

Currently, only the FULL_OBJECT checksum type is supported.
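As background for this note, the sketch below contrasts the two S3 checksum types for a multipart upload, using CRC32 and our reading of the S3 documentation: FULL_OBJECT is one checksum over the entire object, while COMPOSITE is a checksum over the concatenated binary part checksums, suffixed with the part count. The function names are hypothetical, and the FULL_OBJECT variant assumes the part bodies can be fed in part-number order; the PR does not say how CompleteMultipartUpload actually combines parts.

```rust
/// Advance a reflected CRC-32 (IEEE) state over `data`.
fn crc32_update(mut state: u32, data: &[u8]) -> u32 {
    for &byte in data {
        state ^= u32::from(byte);
        for _ in 0..8 {
            let mask = (state & 1).wrapping_neg();
            state = (state >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    state
}

/// One-shot CRC-32 with the usual initial value and final inversion.
fn crc32(data: &[u8]) -> u32 {
    !crc32_update(0xFFFF_FFFF, data)
}

/// Standard base64 with padding.
fn base64_encode(bytes: &[u8]) -> String {
    const ALPHABET: &[u8] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = String::new();
    for chunk in bytes.chunks(3) {
        let b1 = chunk.get(1).copied().unwrap_or(0);
        let b2 = chunk.get(2).copied().unwrap_or(0);
        let n = (u32::from(chunk[0]) << 16) | (u32::from(b1) << 8) | u32::from(b2);
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { ALPHABET[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { ALPHABET[n as usize & 63] as char } else { '=' });
    }
    out
}

/// FULL_OBJECT: one CRC over the bytes of all parts in part-number order,
/// i.e. the same value a single PutObject of the whole body would get.
fn full_object_crc32(parts: &[&[u8]]) -> String {
    let state = parts.iter().fold(0xFFFF_FFFF, |state, part| crc32_update(state, part));
    base64_encode(&(!state).to_be_bytes())
}

/// COMPOSITE: a CRC over the concatenated big-endian part CRCs, suffixed with "-<part count>".
fn composite_crc32(parts: &[&[u8]]) -> String {
    let mut part_checksums = Vec::with_capacity(parts.len() * 4);
    for part in parts {
        part_checksums.extend_from_slice(&crc32(part).to_be_bytes());
    }
    format!("{}-{}", base64_encode(&crc32(&part_checksums).to_be_bytes()), parts.len())
}
```

Because FULL_OBJECT matches what a single PutObject of the same bytes would produce (`y/Q5Jg==` for parts `1234` and `56789`), clients can compare it directly with a locally computed value; a COMPOSITE value (ending in `-2` here) can only be reproduced if the part boundaries are known.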

Closes

Resolves #225

@das-Abroxas self-assigned this Jan 15, 2026
@das-Abroxas added the enhancement (New feature or request) and fix (Bug issue fix) labels Jan 15, 2026
@codecov bot commented Jan 15, 2026

Codecov Report

❌ Patch coverage is 63.79310% with 189 lines in your changes missing coverage. Please review.
✅ Project coverage is 34.56%. Comparing base (1719e11) to head (cab8bf3).

| File with missing lines | Patch % | Missing lines |
|---|---|---|
| ...nents/data_proxy/src/s3_frontend/utils/checksum.rs | 78.79% | 58 Missing and 16 partials ⚠️ |
| ...ponents/data_proxy/src/s3_frontend/data_handler.rs | 0.00% | 40 Missing ⚠️ |
| components/data_proxy/src/s3_frontend/s3service.rs | 0.00% | 33 Missing ⚠️ |
| components/data_proxy/src/structs.rs | 64.44% | 32 Missing ⚠️ |
| components/data_proxy/src/caching/cache.rs | 0.00% | 9 Missing ⚠️ |
| components/data_proxy/src/s3_frontend/s3server.rs | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #228      +/-   ##
==========================================
+ Coverage   33.44%   34.56%   +1.11%     
==========================================
  Files         129      130       +1     
  Lines       19570    20046     +476     
  Branches    19570    20046     +476     
==========================================
+ Hits         6546     6929     +383     
- Misses      12244    12316      +72     
- Partials      780      801      +21     


@lfbrehm (Member) left a comment


It would be nice to also include all calculated hashes for GetObject and HeadObject calls, but I am aware of the ongoing discussion about these and multipart objects.
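For reference, a sketch of how calculated checksums could be exposed on HeadObject/GetObject, assuming they are kept in a map keyed by algorithm name (e.g. `CRC32`) like the `calculated_checksums` field in the quoted code. `checksum_response_headers` is a hypothetical helper. The header names follow the S3 `x-amz-checksum-<algorithm>` and `x-amz-checksum-type` convention; the `checksum_mode_enabled` flag models S3's behaviour of returning checksums only when the request sends `x-amz-checksum-mode: ENABLED`, and whether Aruna applies the same rule is an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: S3-style checksum response headers for HeadObject/GetObject.
/// `calculated` maps algorithm names such as "CRC32" to base64 values.
fn checksum_response_headers(
    calculated: &BTreeMap<String, String>,
    checksum_mode_enabled: bool,
) -> Vec<(String, String)> {
    let mut headers = Vec::new();
    // Mirror S3: without `x-amz-checksum-mode: ENABLED`, send no checksum headers.
    if !checksum_mode_enabled {
        return headers;
    }
    for (algorithm, value) in calculated {
        // e.g. "CRC32" -> "x-amz-checksum-crc32"
        headers.push((format!("x-amz-checksum-{}", algorithm.to_lowercase()), value.clone()));
    }
    if !headers.is_empty() {
        // Only whole-object checksums are computed, so the type is always FULL_OBJECT.
        headers.push(("x-amz-checksum-type".to_string(), "FULL_OBJECT".to_string()));
    }
    headers
}
```

A `BTreeMap` is used only to get a deterministic header order; since only FULL_OBJECT is supported, the type header is constant.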

}

impl ChecksumHandler {
    pub fn new(required_checksum: Option<IntegrityChecksum>) -> Self {
Member

Why pub if you only use from_headers outside?

Contributor Author

But it is used outside, for example at:

ChecksumHandler::new(None),

😐
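A sketch of how `new` and a `from_headers` constructor might fit together, illustrating why `new` stays public: callers without a client-supplied checksum use `ChecksumHandler::new(None)`, while request handlers parse the headers. The `IntegrityChecksum` variants mirror the ones in the quoted code; the `Display` names, the header lookup, and the first-match precedence are assumptions, not the PR's implementation.

```rust
use std::collections::HashMap;
use std::fmt;

/// Mirrors the variants visible in the reviewed code; the payload is the
/// base64 value the client sent, if any.
#[derive(Debug, Clone, PartialEq)]
enum IntegrityChecksum {
    CRC32(Option<String>),
    CRC32C(Option<String>),
    CRC64NVME(Option<String>),
    SHA1(Option<String>),
    SHA256(Option<String>),
}

impl fmt::Display for IntegrityChecksum {
    /// Algorithm name only; the quoted code uses it as the key into the calculated-checksum map.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let name = match self {
            IntegrityChecksum::CRC32(_) => "CRC32",
            IntegrityChecksum::CRC32C(_) => "CRC32C",
            IntegrityChecksum::CRC64NVME(_) => "CRC64NVME",
            IntegrityChecksum::SHA1(_) => "SHA1",
            IntegrityChecksum::SHA256(_) => "SHA256",
        };
        f.write_str(name)
    }
}

struct ChecksumHandler {
    required_checksum: Option<IntegrityChecksum>,
    calculated_checksums: HashMap<String, String>,
}

impl ChecksumHandler {
    /// Public so callers that have no client checksum can use `ChecksumHandler::new(None)`.
    pub fn new(required_checksum: Option<IntegrityChecksum>) -> Self {
        ChecksumHandler { required_checksum, calculated_checksums: HashMap::new() }
    }

    /// Hypothetical: takes the first `x-amz-checksum-<algorithm>` header found,
    /// assuming header names are already lowercased.
    pub fn from_headers(headers: &HashMap<String, String>) -> Self {
        let candidates: [(&str, fn(Option<String>) -> IntegrityChecksum); 5] = [
            ("x-amz-checksum-crc32", IntegrityChecksum::CRC32),
            ("x-amz-checksum-crc32c", IntegrityChecksum::CRC32C),
            ("x-amz-checksum-crc64nvme", IntegrityChecksum::CRC64NVME),
            ("x-amz-checksum-sha1", IntegrityChecksum::SHA1),
            ("x-amz-checksum-sha256", IntegrityChecksum::SHA256),
        ];
        let required = candidates
            .iter()
            .find_map(|(name, make)| headers.get(*name).map(|value| make(Some(value.clone()))));
        Self::new(required)
    }
}
```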

Comment on lines 198 to 210
pub fn get_validation_checksum(&self) -> &Option<String> {
    if let Some(checksum) = &self.required_checksum {
        match checksum {
            IntegrityChecksum::CRC32(val)
            | IntegrityChecksum::CRC32C(val)
            | IntegrityChecksum::CRC64NVME(val)
            | IntegrityChecksum::SHA1(val)
            | IntegrityChecksum::SHA256(val) => val,
        }
    } else {
        &None
    }
}
Member

Why is this unused? If this is here for potential future use, can you at least disable the warnings?
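One way to address this without deleting the functions is to scope the lint to the item. A minimal sketch on a reduced, hypothetical `ChecksumHandler`: `#[allow(dead_code)]` suppresses the warning for that method only. On Rust 1.81 or newer, `#[expect(dead_code)]` is an alternative that warns again once the method becomes used, so the attribute does not linger.

```rust
/// Hypothetical, reduced handler used only to show where the attribute goes.
struct ChecksumHandler {
    required_checksum: Option<String>,
}

impl ChecksumHandler {
    // Silences the `dead_code` lint for this method only; other unused
    // items in the crate keep warning as usual.
    #[allow(dead_code)]
    pub fn get_validation_checksum(&self) -> &Option<String> {
        &self.required_checksum
    }
}
```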

Contributor Author

These functions were intended for a future, complete implementation of checksum integrity validation, i.e. validating the checksum provided in the request against the calculated checksum. However, I have now included this functionality in the PR, at least for PutObject and UploadPart.

Comment on lines 216 to 222
pub fn get_calculated_checksum(&self) -> Option<String> {
    if let Some(algo) = &self.required_checksum {
        self.calculated_checksums.get(&algo.to_string()).cloned()
    } else {
        None
    }
}
Member

Same as above.

Contributor Author

Explanation above.

Comment on lines 232 to 249
pub fn upsert_checksum(&mut self, key: &str, checksum: &str) -> Option<String> {
    self.calculated_checksums
        .insert(key.to_string(), checksum.to_string())
}

pub fn validate_checksum(&self) -> bool {
    if let Some(checksum) = &self.required_checksum {
        let calculated = self
            .calculated_checksums
            .get(&checksum.to_string())
            .cloned();
        return self.get_validation_checksum().eq(&calculated);
    }

    // If no checksum is required
    true
}
}
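The decision logic of the quoted `validate_checksum` can be checked against a small model. In this sketch `RequiredChecksum` is a hypothetical stand-in for `IntegrityChecksum` (algorithm name plus the client value), and the comparison is the same `Option`-to-`Option` equality as in the quoted code.

```rust
use std::collections::HashMap;

/// Hypothetical reduced model of `required_checksum`: the algorithm name that
/// `IntegrityChecksum::to_string()` would produce, plus the client-sent value.
struct RequiredChecksum {
    algorithm: String,
    value: Option<String>,
}

struct ChecksumHandler {
    required_checksum: Option<RequiredChecksum>,
    calculated_checksums: HashMap<String, String>,
}

impl ChecksumHandler {
    /// Same decision logic as the quoted `validate_checksum`.
    fn validate_checksum(&self) -> bool {
        match &self.required_checksum {
            // No checksum required: accept.
            None => true,
            // Compare client value and calculated value as Options, so a
            // missing calculation never matches a value the client sent.
            Some(required) => {
                let calculated = self.calculated_checksums.get(&required.algorithm);
                required.value.as_ref() == calculated
            }
        }
    }
}
```

The last test case shows how an algorithm requested without a client value is treated: the comparison is `None` against a calculated `Some`, so validation fails, and it would only pass if nothing had been calculated for that algorithm.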
Member

Same as above.

Contributor Author

I removed the unused upsert_checksum(...) function and refactored the validate_checksum(...) function, which is now in use.

@das-Abroxas changed the title from "Checksum Integrity Validation for Upload Operations" to "Checksum Integrity Validation" Jan 20, 2026

Labels

enhancement New feature or request fix Bug issue fix

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[feat] Extend data model for object/part checksums

3 participants