Commit 6b66c5a

kevinelliott and claude committed
Add type conversion for PostgreSQL driver to Parquet type matching
## Problem

Panic: "cannot create parquet value of type INT32 from go value of type int64"

The PostgreSQL driver (lib/pq) returns int64 for ALL integer types (int2, int4, int8) to avoid overflow issues. However, Parquet expects exact type matches:

- INT32 for PostgreSQL int2/int4
- INT64 for PostgreSQL int8

## Solution

Added `convertPostgreSQLValue()` function that converts driver types to appropriate Go types based on the PostgreSQL column type from the schema:

### Type Conversions

- **int2, int4**: int64 → int32 (safe, PostgreSQL int4 max is 2^31-1)
- **int8**: Keep as int64
- **float4**: float64 → float32
- **float8, numeric, decimal**: Keep as float64
- **timestamp, date**: Keep as time.Time
- **bool**: Keep as bool
- **bytea**: Keep as []byte
- **strings**: Keep as-is

### Integration

The conversion is called when building rowData in the streaming extraction:

```go
rowData[col.GetName()] = convertPostgreSQLValue(scanValues[i], col.GetType())
```

## Benefits

✅ **Type Precision**: Correct integer sizes (int32 vs int64)
✅ **Parquet Compatibility**: Types match schema expectations
✅ **No Panics**: Proper type conversions prevent runtime errors
✅ **Safe**: Conversions are within PostgreSQL type bounds

## Testing

- All tests pass
- Linters pass (with nolint for safe int64→int32 conversion)
- Build succeeds

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
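
The type mapping listed under "Type Conversions" can be exercised in isolation. The sketch below is a trimmed-down stand-in, not the code from `cmd/archiver.go`: the name `convertValue` and the sample values are assumptions made for illustration. It prints the dynamic Go type before and after conversion for a few PostgreSQL type names.

```go
package main

import "fmt"

// convertValue is a hypothetical, reduced version of the mapping described
// in the commit message: only int2/int4 and float4 are narrowed, everything
// else (including nil) is passed through unchanged.
func convertValue(value interface{}, pgType string) interface{} {
	switch pgType {
	case "int2", "int4":
		// The driver hands back int64; narrow it for a 32-bit target column.
		if v, ok := value.(int64); ok {
			return int32(v)
		}
	case "float4":
		// The driver hands back float64; narrow it to single precision.
		if v, ok := value.(float64); ok {
			return float32(v)
		}
	}
	return value
}

func main() {
	// Sample inputs are illustrative only.
	samples := []struct {
		pgType string
		value  interface{}
	}{
		{"int2", int64(7)},
		{"int4", int64(2147483647)},
		{"int8", int64(9223372036854775807)},
		{"float4", float64(1.5)},
		{"float8", float64(1.5)},
		{"text", "hello"},
		{"int4", nil},
	}
	for _, s := range samples {
		out := convertValue(s.value, s.pgType)
		fmt.Printf("%-6s %T -> %T\n", s.pgType, s.value, out)
	}
}
```

Keeping the pass-through as the final `return value` means an unexpected dynamic type (for example, a value that is not an `int64` in an `int4` column) is forwarded unchanged rather than dropped.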
1 parent 5ab8ac8 commit 6b66c5a

1 file changed: +46 -2 lines changed

cmd/archiver.go

Lines changed: 46 additions & 2 deletions
@@ -1852,6 +1852,49 @@ func (a *Archiver) extractRowsWithDateFilter(partition PartitionInfo, startTime,
 	return result, nil
 }
 
+// convertPostgreSQLValue converts PostgreSQL driver values to appropriate Go types
+// The pq driver returns int64 for all integer types, but formatters (especially Parquet)
+// need specific types based on the actual PostgreSQL column type
+func convertPostgreSQLValue(value interface{}, pgType string) interface{} {
+	if value == nil {
+		return nil
+	}
+
+	switch pgType {
+	case "int2", "int4":
+		// PostgreSQL driver returns int64, but we need int32 for Parquet INT32
+		// Safe conversion: PostgreSQL int2 is -32768 to 32767, int4 is -2147483648 to 2147483647
+		if v, ok := value.(int64); ok {
+			return int32(v) //nolint:gosec // G115: Safe conversion, PostgreSQL int4 fits in int32
+		}
+	case "int8":
+		// Keep as int64
+		return value
+	case "float4":
+		// PostgreSQL driver returns float64, convert to float32
+		if v, ok := value.(float64); ok {
+			return float32(v)
+		}
+	case "float8", "numeric", "decimal":
+		// Keep as float64
+		return value
+	case "bool":
+		// Keep as bool
+		return value
+	case "timestamp", "timestamptz", "date":
+		// Keep as time.Time
+		return value
+	case "bytea":
+		// Keep as []byte
+		return value
+	default:
+		// For strings and other types, keep as-is
+		return value
+	}
+
+	return value
+}
+
 // extractPartitionDataStreaming extracts partition data using streaming architecture
 // This streams data in chunks to a temp file, avoiding loading everything into memory
 //
@@ -2008,10 +2051,11 @@ func (a *Archiver) extractPartitionDataStreaming(partition PartitionInfo, progra
 	return
 }
 
-// Convert to map[string]interface{}
+// Convert to map[string]interface{} with type conversion
 rowData := make(map[string]interface{}, len(columns))
 for i, col := range columns {
-	rowData[col.GetName()] = scanValues[i]
+	// Convert PostgreSQL driver types to appropriate Go types for formatters
+	rowData[col.GetName()] = convertPostgreSQLValue(scanValues[i], col.GetType())
 }
 
 chunk = append(chunk, rowData)
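
The commit narrows `int64` to `int32` without a range check, relying on PostgreSQL's own bounds for int2 and int4. If the schema metadata and the scanned data could ever disagree (for example, a column widened to int8 after the schema was read), a checked variant of the row-building loop is one option. The following is a sketch only and is not part of this commit: `column`, `narrowInt`, and `buildRow` are hypothetical names standing in for the archiver's real types.

```go
package main

import (
	"fmt"
	"math"
)

// column is a hypothetical stand-in for the archiver's column metadata.
type column struct {
	name   string
	pgType string
}

// narrowInt converts v to int32 only when it fits, and reports an error
// instead of silently wrapping around otherwise.
func narrowInt(v int64) (int32, error) {
	if v < math.MinInt32 || v > math.MaxInt32 {
		return 0, fmt.Errorf("value %d does not fit in int32", v)
	}
	return int32(v), nil
}

// buildRow maps scanned driver values to column names, narrowing integers
// for int2/int4 columns and passing every other value through unchanged.
func buildRow(columns []column, scanValues []interface{}) (map[string]interface{}, error) {
	if len(columns) != len(scanValues) {
		return nil, fmt.Errorf("got %d values for %d columns", len(scanValues), len(columns))
	}
	row := make(map[string]interface{}, len(columns))
	for i, col := range columns {
		v, isInt := scanValues[i].(int64)
		if isInt && (col.pgType == "int2" || col.pgType == "int4") {
			n, err := narrowInt(v)
			if err != nil {
				return nil, fmt.Errorf("column %q: %w", col.name, err)
			}
			row[col.name] = n
			continue
		}
		row[col.name] = scanValues[i]
	}
	return row, nil
}

func main() {
	cols := []column{{name: "id", pgType: "int4"}, {name: "total", pgType: "int8"}}

	row, err := buildRow(cols, []interface{}{int64(42), int64(1) << 40})
	fmt.Printf("%T %T %v\n", row["id"], row["total"], err)

	// An int4 column carrying a value outside the int32 range is rejected.
	_, err = buildRow(cols[:1], []interface{}{int64(1) << 40})
	fmt.Println(err)
}
```

The trade-off is an extra comparison per integer value and an error path the caller must handle; the unchecked conversion in the commit is simpler and is sufficient as long as the column type reported by the schema is accurate.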
