[SEP-54] Ledger Metadata Storage #1678
@urvisavla pointed out the following differences between the spec and what we have implemented in galexie:
The reason I did not include the
Regarding the root directory, I thought it would be useful to have a ledgers directory to separate the ledger keys from the config key. But I am open to feedback on both these points.
In what use cases can clients expect data providers to batch more than one ledger into a file? As ledgers get bigger and scale increases over time, will it still make sense to batch ledgers? I noticed on the AWS public deployment it is already only one ledger per file and it's tempting to write code that makes the assumption it'll always be one. I'm asking not to change the format, but wondering if for the most part clients can just assume it'll be one ledger per batch in the near future.
For the Partition and Batch Formats, rather than expressing them in Go fmt syntax, could they be plain text like:
The ledgers path seems somewhat unnecessary. The parameter seems to be a way to store the config file in a different location from the ledger files, but I would think that has little utility. In the examples it's shown like:
Which implies the config is at the root, and then maybe there are ledgers for pubnet and testnet in different directories. However, it wouldn't be possible to have two config files at the root where one references testnet and one references pubnet. Another part of the SEP gives an example with the config located closer to the ledgers, which I think is more useful and really reduces the need for the ledger path.
It's a minor detail and I don't feel strongly about it, but it seems like an unnecessary field, and the config could be located right next to the partitions.
The Stellar history archives contain something similar to the config.json, in the form of the "History Archive State" or "HAS" JSON file, and we can learn a little from what that system found helpful to include:
When looking at the HAS file format I see it records the
For example:
{
"version": 1,
"server": "stellar-core 23.0.0.rc4 (dcf3669574084a50f30d0d5f4dbf35af548486b2)",
Ref: https://history.stellar.org/prd/core-live/core_live_001/.well-known/stellar-history.json
The version of the file format could be the version of the SEP. The name could be a space-separated string of the server name and version of the server.
The current Metadata section states:
This creates a gap for storage backends that don't support user-defined object metadata, such as:
When implementing a filesystem-based datastore for use in stellar/quickstart, I needed a way to store the per-object metadata (start-ledger, end-ledger, protocol-version, etc.) without relying on GCS/S3-specific features.
For storage backends that don't support user-defined object metadata, metadata could be stored in a sidecar JSON file adjacent to the data file with a
Example:
Sidecar file contents:
{
"start-ledger": "2",
"end-ledger": "3",
"start-ledger-close-time": "1622547800",
"end-ledger-close-time": "1622548900",
"protocol-version": "21",
"core-version": "v21.0.0",
"network-passphrase": "Public Global Stellar Network ; September 2015",
"compression-type": "zstd",
"version": "1.0.0"
}
A diff to add this to SEP-0054 could be something like:
### Metadata
Every `LedgerCloseMetaBatch` value also has the following metadata associated
with it:
- `start-ledger` - (string) the starting ledger of the range of ledgers
enclosed in the `LedgerCloseMetaBatch` value.
- `end-ledger` - (string) the last ledger of the range of ledgers enclosed in
the `LedgerCloseMetaBatch` value.
- `start-ledger-close-time` - (string) the unix time stamp of the close time
for the starting ledger.
- `end-ledger-close-time` - (string) the unix time stamp of the close time for
the last ledger.
- `protocol-version` - (string) the protocol version of the last ledger.
- `core-version` - (string) the version of stellar-core which produced the
ledgers enclosed in the `LedgerCloseMetaBatch` value.
- `network-passphrase` - (string) the network passphrase for the Stellar
network which produced the ledgers.
- `compression-type` - (string) the compression algorithm used to compress the
`LedgerCloseMetaBatch` value (currently only
[`zstd`](https://facebook.github.io/zstd/) is supported).
- `version` - (string) the version of
[galexie](https://github.com/stellar/go/tree/master/services/galexie) used to
insert the `LedgerCloseMetaBatch` value in the data store.
-These metadata key-value pairs can be implemented as user-defined object
-metadata in GCS or S3.
+#### Storage Methods
+
+Metadata can be stored using one of the following methods:
+
+1. **Native object metadata** (preferred for GCS/S3): Use user-defined object
+ metadata features provided by the storage backend.
+
+2. **Sidecar JSON files** (for filesystems and other backends): Store metadata
+ in a JSON file adjacent to the data file with a `.metadata.json` suffix
+ appended to the full filename (e.g., `FFFFFFFD--2-3.xdr.zst.metadata.json`).
+ The JSON file contains a single object with the metadata keys and string
+ values as specified above.
+
+When writing data with sidecar files, the metadata file SHOULD be written
+before the data file to ensure atomicity guarantees.
+
+When listing files, implementations SHOULD exclude `.metadata.json` files
+from the results.
Another alternative might be to say the metadata is optional for other datastores.
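To make the sidecar proposal concrete, here is a minimal Python sketch of a filesystem-backed store that follows the rules in the diff: the metadata file is written before the data file, and listings skip `.metadata.json` files. The function names (`write_batch`, `read_metadata`, `list_batches`) and the placeholder payload are hypothetical and are not taken from galexie or the SEP.

```python
import json
import os
import tempfile

# Suffix proposed in the diff; appended to the full data filename.
SIDECAR_SUFFIX = ".metadata.json"


def write_batch(directory, name, data, metadata):
    """Write the sidecar metadata first, then the data file."""
    sidecar_path = os.path.join(directory, name + SIDECAR_SUFFIX)
    with open(sidecar_path, "w", encoding="utf-8") as f:
        json.dump(metadata, f, indent=2)
    with open(os.path.join(directory, name), "wb") as f:
        f.write(data)


def read_metadata(directory, name):
    """Load the metadata object stored next to a data file."""
    sidecar_path = os.path.join(directory, name + SIDECAR_SUFFIX)
    with open(sidecar_path, encoding="utf-8") as f:
        return json.load(f)


def list_batches(directory):
    """List data files only, excluding sidecar metadata files."""
    return sorted(
        n for n in os.listdir(directory) if not n.endswith(SIDECAR_SUFFIX)
    )


# Usage with a temporary directory and placeholder (not real XDR) bytes.
with tempfile.TemporaryDirectory() as root:
    meta = {"start-ledger": "2", "end-ledger": "3", "compression-type": "zstd"}
    write_batch(root, "FFFFFFFD--2-3.xdr.zst", b"placeholder bytes", meta)
    listed = list_batches(root)
    loaded = read_metadata(root, "FFFFFFFD--2-3.xdr.zst")
```

Writing the sidecar first means a reader that sees a data file can expect its metadata to exist already, although a crash between the two writes can still leave an orphaned sidecar that a real implementation would need to tolerate.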
Discussion for #1677
Simple Summary
A standard for how LedgerCloseMeta objects can be stored so that ledgers can be easily and efficiently ingested by downstream systems.
Dependencies
None.
Motivation
galexie is a service which publishes LedgerCloseMeta XDR objects to a GCS (Google Cloud Storage) bucket. However, the data format and layout of the XDR objects are not formally documented. This SEP aims to provide a comprehensive specification for storing LedgerCloseMeta objects, enabling third-party developers to build compatible data stores and clients for retrieving ledger metadata.
Specification
The data store is a key-value store where:
LedgerCloseMetaBatch XDR values.
The key-value store must support:
Examples of compatible key-value stores include Google Cloud Storage (GCS) and Amazon S3.
Value Format
Each value in the key-value store is the Zstandard compressed binary encoding of
the following XDR structure:
A LedgerCloseMetaBatch represents a contiguous range of one or more consecutive ledgers.
All batches in a data store instance contain the same number of ledgers.
Currently only Zstandard compression is supported but it is possible to extend
the SEP in the future to allow other compression algorithms.
Key Format
Keys follow a hierarchical directory structure, effectively acting as file paths within the data store. All the ledgers are stored under a configurable /<ledgers-path>. Within the /<ledgers-path> directory there are subdirectories which represent partitions. Each partition contains a fixed number of batches:
If the partition size is 1, the partition is omitted, resulting in:
Partition Format:
Example for partitionStartLedgerSequence=0 and partitionEndLedgerSequence=15: FFFFFFFF--0-15
Batch Format:
Example for batchStartLedgerSequence=2 and batchEndLedgerSequence=3: FFFFFFFD--2-3.xdr.zst
If the batch size is 1, the format simplifies to:
Example for batchStartLedgerSequence=2: FFFFFFFD--2.xdr.zst
Note the .zst suffix is the filename extension defined in the Zstandard RFC. If this SEP is extended to support another compression algorithm then the standard filename extension for the given compression algorithm will be used as a suffix in the batch name.
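The naming pattern can be sketched in Python as below. The exact format strings are not shown in this copy of the post, so the pattern here is inferred from the three examples above and from the `math.MaxUint32 - startLedger` expression in the Design Rationale: eight uppercase hex digits of the reversed start sequence, then `--`, then the decimal range. The function names are hypothetical.

```python
MAX_UINT32 = 0xFFFFFFFF  # same value as Go's math.MaxUint32


def partition_name(start, end):
    """Directory name for a partition covering ledgers start..end."""
    return f"{MAX_UINT32 - start:08X}--{start}-{end}"


def batch_name(start, end, extension="zst"):
    """File name for a batch; the end is omitted when the batch holds one ledger."""
    name = f"{MAX_UINT32 - start:08X}--{start}"
    if end != start:
        name += f"-{end}"
    return f"{name}.xdr.{extension}"
```

For instance, `batch_name(2, 3)` reproduces the `FFFFFFFD--2-3.xdr.zst` example, and `batch_name(2, 2)` reproduces the single-ledger form.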
Configuration File
The data store includes a configuration JSON object stored under the key /<config-path>. This file contains the following properties:
- networkPassphrase - (string) the passphrase for the Stellar network associated with the ledgers.
- ledgersPath - (string) the /<ledgers-path> directory which contains all the partitions and batches.
- compression - (string) the compression algorithm used to compress ledger objects (currently only zstd is supported).
- ledgersPerBatch - (integer) the number of ledgers bundled into each LedgerCloseMetaBatch.
- batchesPerPartition - (integer) the number of batches in a partition.
Example Configuration:
{
  "networkPassphrase": "Public Global Stellar Network ; September 2015",
  "ledgersPath": "/stellar/pubnet/ledgers",
  "compression": "zstd",
  "ledgersPerBatch": 2,
  "batchesPerPartition": 8
}
Example Key Structure
Below is an example list of keys (with /<config-path> set to /stellar/pubnet/config.json) for ledger batches based on the above example configuration:
Note: The genesis ledger starts at sequence number 2, so the oldest batch must have a batchStartLedgerSequence of 2.
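As a rough illustration of how a client might turn the configuration into a key, the Python sketch below derives the key that would hold a given ledger. It rests on assumptions the text does not spell out: batch and partition boundaries fall on multiples of their sizes (which is consistent with the `0-15` and `2-3` examples), "partition size is 1" means `batchesPerPartition` is 1, and the `EXTENSIONS` mapping and `key_for_ledger` name are hypothetical.

```python
import json

MAX_UINT32 = 0xFFFFFFFF

# Hypothetical mapping from the config's compression value to a file extension.
EXTENSIONS = {"zstd": "zst"}

# The example configuration from this SEP.
CONFIG = json.loads("""
{
  "networkPassphrase": "Public Global Stellar Network ; September 2015",
  "ledgersPath": "/stellar/pubnet/ledgers",
  "compression": "zstd",
  "ledgersPerBatch": 2,
  "batchesPerPartition": 8
}
""")


def key_for_ledger(sequence, config):
    """Return the key of the batch expected to contain the given ledger."""
    per_batch = config["ledgersPerBatch"]
    per_partition = per_batch * config["batchesPerPartition"]

    # Assumption: batches are aligned to multiples of ledgersPerBatch.
    batch_start = (sequence // per_batch) * per_batch
    batch = f"{MAX_UINT32 - batch_start:08X}--{batch_start}"
    if per_batch > 1:
        batch += f"-{batch_start + per_batch - 1}"
    batch += ".xdr." + EXTENSIONS[config["compression"]]

    parts = [config["ledgersPath"].rstrip("/")]
    if config["batchesPerPartition"] > 1:
        # Assumption: partitions are aligned to multiples of their ledger count.
        part_start = (sequence // per_partition) * per_partition
        part_end = part_start + per_partition - 1
        parts.append(f"{MAX_UINT32 - part_start:08X}--{part_start}-{part_end}")
    parts.append(batch)
    return "/".join(parts)
```

With the example configuration, `key_for_ledger(2, CONFIG)` and `key_for_ledger(3, CONFIG)` resolve to the same batch, since each batch holds two ledgers.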
Design Rationale
Key Encoding (Reversed Ledger Sequence)
By encoding the most recent ledgers first, clients can efficiently retrieve the latest data without scanning the entire dataset.
Using math.MaxUint32 - startLedger as the key prefix ensures that newer ledgers (with higher sequence numbers) appear before older ones when sorted lexicographically. This avoids the need for additional metadata or indexes to determine the latest ledger.
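The ordering property can be checked with a short Python sketch. It assumes single-ledger batches and the eight-digit uppercase hex prefix seen in the examples; the `batch_name` helper and the chosen sequence numbers are illustrative only.

```python
MAX_UINT32 = 0xFFFFFFFF


def batch_name(start):
    """Single-ledger batch name with the reversed, zero-padded hex prefix."""
    return f"{MAX_UINT32 - start:08X}--{start}.xdr.zst"


# Build names in ascending ledger order, then sort them as plain strings,
# which is how a key-value store would list them lexicographically.
names = [batch_name(seq) for seq in (2, 9, 10, 255, 256, 100000)]
ordered = sorted(names)

# Recover the ledger sequence from each sorted name.
sequences = [int(n.split("--")[1].split(".")[0]) for n in ordered]
latest = ordered[0]
```

Because the prefix has a fixed width, string comparison agrees with numeric comparison of the reversed sequence, so the first key in a listing is the most recent batch.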
Compression Algorithm
zstd was chosen after evaluating zstd, lz4, and gzip. It provides the best balance between compression ratio and decompression speed.
Security Concerns
Verifying the validity of the ledgers contained within the data store is outside the scope of this SEP. In other words, this SEP does not provide any mechanism for validating that the ledgers obtained from a data store have not been altered.