Conversation

@josephschorr (Member)

Description

Adds a chunk bytes library to the common datastore code for easily storing and loading byte data that will be chunked across multiple rows in the database, since some databases have soft or hard limits on the size of BLOB fields.

This will be used by singleton schemas to store schemas and, likely eventually, schema metadata.
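
For intuition, here is a minimal self-contained sketch of the chunking idea. The helper below is hypothetical, not the library's actual API (which lives in internal/datastore/common/chunkbytes.go); it shows the standard integer ceiling-division idiom used to count chunks.

package main

import "fmt"

// splitIntoChunks splits data into pieces of at most maxChunkSize bytes.
func splitIntoChunks(data []byte, maxChunkSize int) [][]byte {
	// Integer ceiling division: the number of chunks needed to hold all bytes.
	numChunks := (len(data) + maxChunkSize - 1) / maxChunkSize
	chunks := make([][]byte, 0, numChunks)
	for start := 0; start < len(data); start += maxChunkSize {
		end := start + maxChunkSize
		if end > len(data) {
			end = len(data)
		}
		chunks = append(chunks, data[start:end])
	}
	return chunks
}

func main() {
	for i, chunk := range splitIntoChunks([]byte("1234567890"), 4) {
		fmt.Printf("chunk %d: %s\n", i, chunk)
	}
	// chunk 0: 1234
	// chunk 1: 5678
	// chunk 2: 90
}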

Testing

Unit tests. This is very early and not used yet.

@josephschorr requested a review from a team as a code owner on October 20, 2025 at 22:17
@github-actions bot added the area/datastore (Affects the storage system) and area/tooling (Affects the dev or user toolchain, e.g. tests, ci, build tools) labels on Oct 20, 2025

codecov bot commented Oct 20, 2025

Codecov Report

❌ Patch coverage is 85.16484% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.42%. Comparing base (01d72a1) to head (a94abd9).

Files with missing lines                | Patch % | Lines
internal/datastore/common/chunkbytes.go | 85.17%  | 17 Missing and 10 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2627      +/-   ##
==========================================
+ Coverage   77.39%   77.42%   +0.03%     
==========================================
  Files         449      450       +1     
  Lines       56613    56795     +182     
==========================================
+ Hits        43811    43968     +157     
- Misses      10045    10059      +14     
- Partials     2757     2768      +11     


@tstirrat15 (Contributor) left a comment


See comments; looks good to me.

	}

	numChunks := (len(data) + c.maxChunkSize - 1) / c.maxChunkSize
	chunks := make([][]byte, 0, numChunks)
Contributor

For my own edification: when you allocate a slice of slices, is that essentially a slice of pointers?
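
For context, roughly yes: in Go, each element of a [][]byte is a slice header (a pointer to a backing array plus a length and a capacity), so allocating the outer slice allocates headers, not the underlying byte data. A minimal illustration:

package main

import "fmt"

func main() {
	data := []byte("hello world")
	// Subslicing copies only the header; both chunks share data's backing array.
	chunks := [][]byte{data[:5], data[6:]}
	chunks[0][0] = 'H'
	fmt.Println(string(data)) // "Hello world": the mutation is visible through data
}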

require.Contains(t, updateSQL, "SET deleted_at = ?")
require.Contains(t, updateSQL, "WHERE name = ?")
require.Contains(t, updateSQL, "AND deleted_at = ?")
require.Equal(t, []any{createdAt, "test-key", ^uint64(0)}, txn.capturedArgs[0])
Contributor

Why is this an any? And what is ^uint64(0) supposed to represent? Is that a clever way of writing max int?
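
For reference: unary ^ in Go is bitwise complement (NOT), so ^uint64(0) flips every bit of zero and yields the maximum uint64. A quick self-contained check:

package main

import (
	"fmt"
	"math"
)

func main() {
	fmt.Println(^uint64(0) == math.MaxUint64) // true
	fmt.Println(^uint64(0))                   // 18446744073709551615
}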

		},
		{
			name: "multiple exact chunks",
			data: []byte("1234567890"),
Contributor

Do these happen to line up because UTF-8 is one byte per char for ASCII characters?
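
For reference, yes: UTF-8 encodes each ASCII character in exactly one byte, so the ten-character string above is exactly ten bytes; non-ASCII runes would not line up this way. A quick self-contained check:

package main

import "fmt"

func main() {
	fmt.Println(len([]byte("1234567890"))) // 10: one byte per ASCII character
	fmt.Println(len([]byte("é")))          // 2: non-ASCII runes take multiple bytes
}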

@miparnisari changed the title from "feat: add chunk bytes library in prep for singleton schema work" to "chore: add chunk bytes library in prep for singleton schema work" on Oct 23, 2025
@miparnisari (Contributor) left a comment

I've amended your PR title from feat: to chore: because, for customers, this isn't a new feature yet.

Comment on lines +87 to +97
	tableName         string
	nameColumn        string
	chunkIndexColumn  string
	chunkDataColumn   string
	maxChunkSize      int
	placeholderFormat sq.PlaceholderFormat
	executor          ChunkedBytesExecutor
	writeMode         WriteMode
	createdAtColumn   string
	deletedAtColumn   string
	aliveValue        T
@miparnisari (Contributor) Oct 23, 2025

All these fields are already defined in SQLByteChunkerConfig. Why not just hold a pointer to it? This would reduce duplicate code and prevent you from having to write

return &SQLByteChunker[T]{
		tableName:         config.TableName,
		nameColumn:        config.NameColumn,
		chunkIndexColumn:  config.ChunkIndexColumn,
		chunkDataColumn:   config.ChunkDataColumn,
		maxChunkSize:      config.MaxChunkSize,
		placeholderFormat: config.PlaceholderFormat,
		executor:          config.Executor,
		writeMode:         config.WriteMode,
		createdAtColumn:   config.CreatedAtColumn,
		deletedAtColumn:   config.DeletedAtColumn,
		aliveValue:        config.AliveValue,
	}

which is dangerous, because if you add a new configuration field you have to remember to update that code too.
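
A minimal sketch of the suggested refactor (the SQLByteChunker and SQLByteChunkerConfig names come from the review; the config stub below elides the remaining fields):

type SQLByteChunkerConfig[T any] struct {
	TableName    string
	MaxChunkSize int
	// ... remaining fields elided
}

// SQLByteChunker holds a pointer to its config rather than copying every
// field, so adding a config field never requires touching the constructor.
type SQLByteChunker[T any] struct {
	config *SQLByteChunkerConfig[T]
}

func NewSQLByteChunker[T any](config *SQLByteChunkerConfig[T]) *SQLByteChunker[T] {
	return &SQLByteChunker[T]{config: config}
}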

Comment on lines +84 to +86
// SQLByteChunker provides methods for reading and writing byte data
// that is chunked across multiple rows in a SQL table.
type SQLByteChunker[T any] struct {
@miparnisari (Contributor) Oct 23, 2025

Suggested change
-// SQLByteChunker provides methods for reading and writing byte data
-// that is chunked across multiple rows in a SQL table.
-type SQLByteChunker[T any] struct {
+type IntOrString interface {
+	~uint64 | ~string
+}
+
+// SQLByteChunker provides methods for reading and writing byte data
+// that is chunked across multiple rows in a SQL table.
+type SQLByteChunker[T IntOrString] struct {

Comment on lines +331 to +356
// Validate that we have all chunks from 0 to N-1
maxIndex := -1
for index := range chunks {
	if index > maxIndex {
		maxIndex = index
	}
}

// Check for missing chunks
for i := 0; i <= maxIndex; i++ {
	if _, exists := chunks[i]; !exists {
		return nil, fmt.Errorf("missing chunk at index %d", i)
	}
}

// Calculate total size
totalSize := 0
for _, chunk := range chunks {
	totalSize += len(chunk)
}

// Reassemble
result := make([]byte, 0, totalSize)
for i := 0; i <= maxIndex; i++ {
	result = append(result, chunks[i]...)
}
Contributor

Suggested change
-// Validate that we have all chunks from 0 to N-1
-maxIndex := -1
-for index := range chunks {
-	if index > maxIndex {
-		maxIndex = index
-	}
-}
-// Check for missing chunks
-for i := 0; i <= maxIndex; i++ {
-	if _, exists := chunks[i]; !exists {
-		return nil, fmt.Errorf("missing chunk at index %d", i)
-	}
-}
-// Calculate total size
-totalSize := 0
-for _, chunk := range chunks {
-	totalSize += len(chunk)
-}
-// Reassemble
-result := make([]byte, 0, totalSize)
-for i := 0; i <= maxIndex; i++ {
-	result = append(result, chunks[i]...)
-}
+// Validate that we have all chunks from 0 to N-1 and calculate total size
+maxIndex := -1
+totalSize := 0
+for index := range chunks {
+	maxIndex = max(maxIndex, index)
+	totalSize += len(chunks[index])
+}
+// Reassemble, while checking for missing chunks
+result := make([]byte, 0, totalSize)
+for i := 0; i <= maxIndex; i++ {
+	if _, exists := chunks[i]; !exists {
+		return nil, fmt.Errorf("missing chunk at index %d", i)
+	}
+	result = append(result, chunks[i]...)
+}
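
One note on the single-pass version: the built-in max it relies on was added in Go 1.21; older toolchains would need a small local helper instead.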
