34 changes: 34 additions & 0 deletions docs/src/format/file/encoding.md
@@ -32,6 +32,40 @@ string array might both be stored in the same layout but, at read time, we will
the size of the offsets returned to the user. There is no requirement that the output Arrow type match the input Arrow
type. For example, it is acceptable to write an array as "large string" and then read it back as "string".
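
The "large string" versus "string" example can be sketched with plain Python. This is only an illustration of Arrow's in-memory layout, where "string" uses 32-bit offsets and "large string" uses 64-bit offsets over the same value bytes. It does not use or describe Lance's on-disk format, and the helper names (`build_string_buffers`, `narrow_offsets`, `decode_strings`) are hypothetical:

```python
import struct

# Offset formats: 4 bytes -> Arrow "string" (int32), 8 bytes -> "large string" (int64).
_OFFSET_FORMAT = {4: "i", 8: "q"}


def build_string_buffers(strings, offset_width):
    """Return (offsets_bytes, values_bytes) for an Arrow-style string array."""
    encoded = [s.encode("utf-8") for s in strings]
    offsets = [0]
    for item in encoded:
        offsets.append(offsets[-1] + len(item))
    fmt = "<%d%s" % (len(offsets), _OFFSET_FORMAT[offset_width])
    return struct.pack(fmt, *offsets), b"".join(encoded)


def narrow_offsets(offsets_bytes):
    """Re-pack int64 offsets as int32; only valid if every offset fits."""
    count = len(offsets_bytes) // 8
    offsets = struct.unpack("<%dq" % count, offsets_bytes)
    if offsets[-1] > 2**31 - 1:
        raise OverflowError("values are too large for 32-bit offsets")
    return struct.pack("<%di" % count, *offsets)


def decode_strings(offsets_bytes, values, offset_width):
    """Decode the strings back out of an offsets buffer and a values buffer."""
    count = len(offsets_bytes) // offset_width
    fmt = "<%d%s" % (count, _OFFSET_FORMAT[offset_width])
    offsets = struct.unpack(fmt, offsets_bytes)
    return [values[a:b].decode("utf-8") for a, b in zip(offsets, offsets[1:])]


# "Write" as large string (64-bit offsets), "read" as string (32-bit offsets).
large_offsets, values = build_string_buffers(["lance", "", "héllo"], 8)
small_offsets = narrow_offsets(large_offsets)
round_trip = decode_strings(small_offsets, values, 4)
```

The value bytes are untouched in this sketch. Only the width of the offsets changes, which is why the input and output Arrow types are free to differ.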

Lance supports the following Arrow data types, organized by their encoding strategy:

### Primitive Types
These types are encoded using `PrimitiveStructuralEncoder`:

| Data Type Category | Data Types | Notes |
|--------------------|----------------------------------------------------------|---------------------------|
| **Integer** | Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64 | Basic integer types |
| **Float** | Float16, Float32, Float64 | Floating-point types |
| **Boolean** | Boolean | Boolean type |
| **Date/Time** | Date32, Date64, Time32, Time64, Timestamp, Duration | Temporal types |
| **Decimal** | Decimal128, Decimal256 | High-precision decimals |
| **Binary/String** | Binary, LargeBinary, Utf8, LargeUtf8 | Variable-width data |
| **Fixed-size** | FixedSizeBinary, FixedSizeList | Fixed-width arrays |
| **Special** | Null | Null type |
| **Dictionary** | Dictionary | For primitive values only |

### Nested Types
These types use specialized logical encoders:

| Logical Data Type | Arrow Data Type | Encoding Strategy | Notes |
|--------------------------|----------------------|------------------------------------------------------------------------------|--------------------------------------------------------|
| **Struct** | Struct | `StructStructuralEncoder` (or `PrimitiveStructuralEncoder` for packed/empty) | Supported in 2.1+ |
| **List** / **LargeList** | List / LargeList | `ListStructuralEncoder` | Supported in 2.1+ |
| **Map** | Map | `MapStructuralEncoder` | **Supported in 2.2+** *(keys_sorted=false only)* |
| **Blob v1** | Binary / LargeBinary | `BlobStructuralEncoder` | Supported in 2.1+ (must be marked as blob field) |
| **Blob v2** | Struct | `BlobV2StructuralEncoder` | **Supported in 2.2+** *(must be marked as blob field)* |

**Important limitations:**
- Dictionary arrays whose value type is logical/non-primitive are not supported.
- The Map type requires `keys_sorted=false`.
- Blob v1 only supports the Arrow Binary/LargeBinary data types.
- Blob v2 only supports the Arrow Struct data type.
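
The two tables and the limitations above can be summarized as a small lookup. The following is a minimal sketch, not Lance's actual dispatch code: the function `select_encoder`, its parameters, and the use of plain strings for type and encoder names are all assumptions made for illustration. Only the encoder names, version thresholds, and restrictions are taken from the tables.

```python
# Type names listed under "Primitive Types" above (Dictionary is handled separately).
PRIMITIVE_TYPES = {
    "Int8", "Int16", "Int32", "Int64", "UInt8", "UInt16", "UInt32", "UInt64",
    "Float16", "Float32", "Float64", "Boolean",
    "Date32", "Date64", "Time32", "Time64", "Timestamp", "Duration",
    "Decimal128", "Decimal256",
    "Binary", "LargeBinary", "Utf8", "LargeUtf8",
    "FixedSizeBinary", "FixedSizeList", "Null",
}


def _parse_version(version):
    """Turn "2.1" into (2, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def _require(version, minimum, what):
    if _parse_version(version) < _parse_version(minimum):
        raise ValueError("%s requires file version %s+" % (what, minimum))


def select_encoder(arrow_type, file_version, *, is_blob=False,
                   keys_sorted=False, packed_or_empty=False, value_type=None):
    """Hypothetical helper: map an Arrow type name to the encoder named in the tables."""
    if is_blob:
        if arrow_type in ("Binary", "LargeBinary"):
            _require(file_version, "2.1", "Blob v1")
            return "BlobStructuralEncoder"
        if arrow_type == "Struct":
            _require(file_version, "2.2", "Blob v2")
            return "BlobV2StructuralEncoder"
        raise ValueError("blob fields must be Binary, LargeBinary, or Struct")
    if arrow_type == "Struct":
        _require(file_version, "2.1", "Struct")
        if packed_or_empty:
            return "PrimitiveStructuralEncoder"
        return "StructStructuralEncoder"
    if arrow_type in ("List", "LargeList"):
        _require(file_version, "2.1", arrow_type)
        return "ListStructuralEncoder"
    if arrow_type == "Map":
        _require(file_version, "2.2", "Map")
        if keys_sorted:
            raise ValueError("Map requires keys_sorted=false")
        return "MapStructuralEncoder"
    if arrow_type == "Dictionary":
        if value_type not in PRIMITIVE_TYPES:
            raise ValueError("Dictionary values must be primitive")
        return "PrimitiveStructuralEncoder"
    if arrow_type in PRIMITIVE_TYPES:
        return "PrimitiveStructuralEncoder"
    raise ValueError("unsupported Arrow type: %s" % arrow_type)
```

For example, `select_encoder("Map", "2.2")` returns `"MapStructuralEncoder"`, while the same call with `"2.1"` or with `keys_sorted=True` raises `ValueError`.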

## Search Cache

The search cache is a key component of the Lance file reader. Random access requires that we locate the physical