From ad1af30c8e5f57825a2bafe07cab13ecaa2c704d Mon Sep 17 00:00:00 2001
From: xloya
Date: Wed, 17 Dec 2025 12:06:21 +0800
Subject: [PATCH 1/4] add docs

---
 docs/src/format/file/encoding.md | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/docs/src/format/file/encoding.md b/docs/src/format/file/encoding.md
index 61012b9230..b670bab01f 100644
--- a/docs/src/format/file/encoding.md
+++ b/docs/src/format/file/encoding.md
@@ -32,6 +32,38 @@ string array might both be stored in the same layout but, at read time, we will
 the size of the offsets returned to the user. There is no requirement the output Arrow type matches the input Arrow type.
 For example, it is acceptable to write an array as "large string" and then read it back as "string".
 
+Lance supports the following Arrow data types, organized by their encoding strategy:
+
+### Primitive Types
+These types are encoded using `PrimitiveStructuralEncoder`:
+
+| Data Type Category | Data Types                                               | Notes                   |
+|--------------------|----------------------------------------------------------|-------------------------|
+| **Integer**        | Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64 | Basic integer types     |
+| **Float**          | Float16, Float32, Float64                                | Floating-point types    |
+| **Boolean**        | Boolean                                                  | Boolean type            |
+| **Date/Time**      | Date32, Date64, Time32, Time64, Timestamp, Duration      | Temporal types          |
+| **Decimal**        | Decimal128, Decimal256                                   | High-precision decimals |
+| **Binary/String**  | Binary, LargeBinary, Utf8, LargeUtf8                     | Variable-width data     |
+| **Fixed-size**     | FixedSizeBinary, FixedSizeList                           | Fixed-width arrays      |
+| **Special**        | Null                                                     | Null type               |
+
+### Nested Types
+These types use specialized logical encoders:
+
+| Data Type                | Encoding Strategy                                                            | Version Notes                                          |
+|--------------------------|------------------------------------------------------------------------------|--------------------------------------------------------|
+| **Struct**               | `StructStructuralEncoder` (or `PrimitiveStructuralEncoder` for packed/empty) | Supported in 2.1+                                      |
+| **List** / **LargeList** | `ListStructuralEncoder`                                                      | Supported in 2.1+                                      |
+| **Dictionary**           | `PrimitiveStructuralEncoder` (for primitive values only)                     | Supported in 2.1+                                      |
+| **Map**                  | `MapStructuralEncoder`                                                       | **Supported in 2.2+** *(keys_sorted=false only)*       |
+| **Blob v2 struct**       | `BlobV2StructuralEncoder`                                                    | **Supported in 2.2+** *(must be marked as blob field)* |
+
+**Important limitations:**
+- Dictionary with logical/non-primitive value types is not supported
+- Map type requires `keys_sorted=false`
+- Blob v2 struct requires the field to be marked as blob metadata
+
 ## Search Cache
 
 The search cache is a key component of the Lance file reader. Random access requires that we locate the physical

From 9d52f9d7ff06994ae4a23e3ec26e3e9403dc8112 Mon Sep 17 00:00:00 2001
From: xloya
Date: Wed, 17 Dec 2025 12:24:48 +0800
Subject: [PATCH 2/4] update

---
 docs/src/format/file/encoding.md | 36 +++++++++++++++++++-----------------
 1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/docs/src/format/file/encoding.md b/docs/src/format/file/encoding.md
index b670bab01f..f6398c322d 100644
--- a/docs/src/format/file/encoding.md
+++ b/docs/src/format/file/encoding.md
@@ -37,31 +37,33 @@ Lance supports the following Arrow data types, organized by their encoding strat
 ### Primitive Types
 These types are encoded using `PrimitiveStructuralEncoder`:
 
-| Data Type Category | Data Types                                               | Notes                   |
-|--------------------|----------------------------------------------------------|-------------------------|
-| **Integer**        | Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64 | Basic integer types     |
-| **Float**          | Float16, Float32, Float64                                | Floating-point types    |
-| **Boolean**        | Boolean                                                  | Boolean type            |
-| **Date/Time**      | Date32, Date64, Time32, Time64, Timestamp, Duration      | Temporal types          |
-| **Decimal**        | Decimal128, Decimal256                                   | High-precision decimals |
-| **Binary/String**  | Binary, LargeBinary, Utf8, LargeUtf8                     | Variable-width data     |
-| **Fixed-size**     | FixedSizeBinary, FixedSizeList                           | Fixed-width arrays      |
-| **Special**        | Null                                                     | Null type               |
+| Data Type Category | Data Types                                               | Notes                     |
+|--------------------|----------------------------------------------------------|---------------------------|
+| **Integer**        | Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64 | Basic integer types       |
+| **Float**          | Float16, Float32, Float64                                | Floating-point types      |
+| **Boolean**        | Boolean                                                  | Boolean type              |
+| **Date/Time**      | Date32, Date64, Time32, Time64, Timestamp, Duration      | Temporal types            |
+| **Decimal**        | Decimal128, Decimal256                                   | High-precision decimals   |
+| **Binary/String**  | Binary, LargeBinary, Utf8, LargeUtf8                     | Variable-width data       |
+| **Fixed-size**     | FixedSizeBinary, FixedSizeList                           | Fixed-width arrays        |
+| **Special**        | Null                                                     | Null type                 |
+| **Dictionary**     | Dictionary                                               | For primitive values only |
 
 ### Nested Types
 These types use specialized logical encoders:
 
-| Data Type                | Encoding Strategy                                                            | Version Notes                                          |
-|--------------------------|------------------------------------------------------------------------------|--------------------------------------------------------|
-| **Struct**               | `StructStructuralEncoder` (or `PrimitiveStructuralEncoder` for packed/empty) | Supported in 2.1+                                      |
-| **List** / **LargeList** | `ListStructuralEncoder`                                                      | Supported in 2.1+                                      |
-| **Dictionary**           | `PrimitiveStructuralEncoder` (for primitive values only)                     | Supported in 2.1+                                      |
-| **Map**                  | `MapStructuralEncoder`                                                       | **Supported in 2.2+** *(keys_sorted=false only)*       |
-| **Blob v2 struct**       | `BlobV2StructuralEncoder`                                                    | **Supported in 2.2+** *(must be marked as blob field)* |
+| Logical Data Type        | Arrow Data Type      | Encoding Strategy                                                            | Version Notes                                          |
+|--------------------------|----------------------|------------------------------------------------------------------------------|--------------------------------------------------------|
+| **Struct**               | Struct               | `StructStructuralEncoder` (or `PrimitiveStructuralEncoder` for packed/empty) | Supported in 2.1+                                      |
+| **List** / **LargeList** | List / LargeList     | `ListStructuralEncoder`                                                      | Supported in 2.1+                                      |
+| **Map**                  | Map                  | `MapStructuralEncoder`                                                       | **Supported in 2.2+** *(keys_sorted=false only)*       |
+| **Blob v1**              | Binary / LargeBinary | `BlobStructuralEncoder`                                                      | Supported in 2.1+ (must be marked as blob field)       |
+| **Blob v2**              | Struct               | `BlobV2StructuralEncoder`                                                    | **Supported in 2.2+** *(must be marked as blob field)* |
 
 **Important limitations:**
 - Dictionary with logical/non-primitive value types is not supported
 - Map type requires `keys_sorted=false`
+- Blob v1 only supports Arrow Binary/LargeBinary data types
 - Blob v2 struct requires the field to be marked as blob metadata
 
 ## Search Cache

From 44b2242c9bb28d1b48f9b41fc63ad8dadff860ed Mon Sep 17 00:00:00 2001
From: xloya
Date: Wed, 17 Dec 2025 12:27:24 +0800
Subject: [PATCH 3/4] update

---
 docs/src/format/file/encoding.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/src/format/file/encoding.md b/docs/src/format/file/encoding.md
index f6398c322d..d9f95b6e7f 100644
--- a/docs/src/format/file/encoding.md
+++ b/docs/src/format/file/encoding.md
@@ -64,7 +64,7 @@ These types use specialized logical encoders:
 - Dictionary with logical/non-primitive value types is not supported
 - Map type requires `keys_sorted=false`
 - Blob v1 only supports Arrow Binary/LargeBinary data types
-- Blob v2 struct requires the field to be marked as blob metadata
+- Blob v2 only supports Arrow Struct data types
 
 ## Search Cache
 

From 1b7110eaf24b084aeda797fef38bb62c8a1892ac Mon Sep 17 00:00:00 2001
From: xloya
Date: Wed, 17 Dec 2025 12:28:36 +0800
Subject: [PATCH 4/4] update

---
 docs/src/format/file/encoding.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/src/format/file/encoding.md b/docs/src/format/file/encoding.md
index d9f95b6e7f..cae5e66b4a 100644
--- a/docs/src/format/file/encoding.md
+++ b/docs/src/format/file/encoding.md
@@ -52,7 +52,7 @@ These types are encoded using `PrimitiveStructuralEncoder`:
 ### Nested Types
 These types use specialized logical encoders:
 
-| Logical Data Type        | Arrow Data Type      | Encoding Strategy                                                            | Version Notes                                          |
+| Logical Data Type        | Arrow Data Type      | Encoding Strategy                                                            | Notes                                                  |
 |--------------------------|----------------------|------------------------------------------------------------------------------|--------------------------------------------------------|
 | **Struct**               | Struct               | `StructStructuralEncoder` (or `PrimitiveStructuralEncoder` for packed/empty) | Supported in 2.1+                                      |
 | **List** / **LargeList** | List / LargeList     | `ListStructuralEncoder`                                                      | Supported in 2.1+                                      |