 
 # Long-term store
 
+:::{toctree}
+:hidden:
+retention
+:::
+
 :::{div} sd-text-muted
 Never retire data just because your other systems can't handle the cardinality.
 :::
@@ -73,43 +78,6 @@ Set up CrateDB as a long-term observability backend for OpenTelemetry.
 
 ::::
 
-## Tools
-
-### Automatic retention and expiration
-
-When operating a system storing and processing large amounts of data,
-it is crucial to manage data flows and life-cycles well, which includes
-handling concerns of data expiry, size reduction, and archival.
-
-Optimally, corresponding tasks are automated rather than manually
-performed. CrateDB provides relevant integrations and standalone
-applications for automatic data retention purposes.
-
-:::{rubric} Apache Airflow
-:::
-
-{ref}`Build a hot/cold storage data retention policy <airflow-data-retention-hot-cold>`
-describes how to manage aging data by leveraging CrateDB cluster
-features to mix nodes with different hardware setups, i.e. hot
-nodes using the latest generation of NVMe drives for responding
-to analytics queries quickly, and cold nodes that have access to
-cheap mass storage for retaining historic data.
-
-:::{rubric} CrateDB Toolkit
-:::
-
-[CrateDB Toolkit Retention and Expiration] is a data retention and
-expiration policy management system for CrateDB, providing multiple
-retention strategies.
-
-:::{note}
-The system derives its concepts from [InfluxDB data retention] ideas and
-from the {ref}`Airflow-based data retention tasks for CrateDB <airflow-data-retention-policy>`,
-but aims to be usable as a standalone system in different software environments.
-Effectively, it is a Python library and CLI around a policy management
-table defined per [retention-policy-ddl.sql].
-:::
-
 ## Related sections
 
 {ref}`metrics-store` includes information about how to
@@ -120,6 +88,10 @@ like metrics and log data with CrateDB.
 CrateDB provides real-time analytics on raw data stored for the long term.
 Keep massive amounts of data ready in the hot zone for analytics purposes.
 
+{ref}`retention` illustrates how to optimally implement data retention
+procedures, to manage the life-cycle of data stored in CrateDB, handling
+concerns of data expiry, size reduction, and archival.
+
 [Optimizing storage efficiency for historic time series data]
 illustrates how to reduce table storage size by 80%,
 by using arrays for time-based bucketing, a historical table having
@@ -130,7 +102,4 @@ use CrateDB for mass storage of synoptic weather observations,
 allowing you to query them efficiently.
 
 
-[CrateDB Toolkit Retention and Expiration]: https://cratedb-toolkit.readthedocs.io/retention.html
-[InfluxDB data retention]: https://docs.influxdata.com/influxdb/v1/guides/downsample_and_retain/
 [Optimizing storage efficiency for historic time series data]: https://community.cratedb.com/t/optimizing-storage-for-historic-time-series-data/762
-[retention-policy-ddl.sql]: https://github.com/crate/cratedb-toolkit/blob/main/cratedb_toolkit/retention/setup/schema.sql