Commit 41f5039

Long-term store: "Automatic retention and expiration" to separate page
1 parent 5230f1a commit 41f5039

2 files changed: +49 -40 lines changed

docs/solution/longterm/index.md

Lines changed: 9 additions & 40 deletions
@@ -4,6 +4,11 @@
 
 # Long-term store
 
+:::{toctree}
+:hidden:
+retention
+:::
+
 :::{div} sd-text-muted
 Never retire data just because your other systems can't handle the cardinality.
 :::
@@ -73,43 +78,6 @@ Set up CrateDB as a long-term observability backend for OpenTelemetry.
 
 ::::
 
-## Tools
-
-### Automatic retention and expiration
-
-When operating a system storing and processing large amounts of data,
-it is crucial to manage data flows and life-cycles well, which includes
-handling concerns of data expiry, size reduction, and archival.
-
-Optimally, corresponding tasks are automated rather than manually
-performed. CrateDB provides relevant integrations and standalone
-applications for automatic data retention purposes.
-
-:::{rubric} Apache Airflow
-:::
-
-{ref}`Build a hot/cold storage data retention policy <airflow-data-retention-hot-cold>`
-describes how to manage aging data by leveraging CrateDB cluster
-features to mix nodes with different hardware setups, i.e. hot
-nodes using the latest generation of NVMe drives for responding
-to analytics queries quickly, and cold nodes that have access to
-cheap mass storage for retaining historic data.
-
-:::{rubric} CrateDB Toolkit
-:::
-
-[CrateDB Toolkit Retention and Expiration] is a data retention and
-expiration policy management system for CrateDB, providing multiple
-retention strategies.
-
-:::{note}
-The system derives its concepts from [InfluxDB data retention] ideas and
-from the {ref}`Airflow-based data retention tasks for CrateDB <airflow-data-retention-policy>`,
-but aims to be usable as a standalone system in different software environments.
-Effectively, it is a Python library and CLI around a policy management
-table defined per [retention-policy-ddl.sql].
-:::
-
 ## Related sections
 
 {ref}`metrics-store` includes information about how to
@@ -120,6 +88,10 @@ like metrics and log data with CrateDB.
 CrateDB provides real-time analytics on raw data stored for the long term.
 Keep massive amounts of data ready in the hot zone for analytics purposes.
 
+{ref}`retention` illustrates how to optimally implement data retention
+procedures, to manage the life-cycle of data stored in CrateDB, handling
+concerns of data expiry, size reduction, and archival.
+
 [Optimizing storage efficiency for historic time series data]
 illustrates how to reduce table storage size by 80%,
 by using arrays for time-based bucketing, a historical table having
@@ -130,7 +102,4 @@ use CrateDB for mass storage of synoptic weather observations,
 allowing you to query them efficiently.
 
 
-[CrateDB Toolkit Retention and Expiration]: https://cratedb-toolkit.readthedocs.io/retention.html
-[InfluxDB data retention]: https://docs.influxdata.com/influxdb/v1/guides/downsample_and_retain/
 [Optimizing storage efficiency for historic time series data]: https://community.cratedb.com/t/optimizing-storage-for-historic-time-series-data/762
-[retention-policy-ddl.sql]: https://github.com/crate/cratedb-toolkit/blob/main/cratedb_toolkit/retention/setup/schema.sql

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+(expiration)=
+(retention)=
+
+# Automatic retention and expiration
+
+When operating a system storing and processing large amounts of data,
+it is crucial to manage data flows and life-cycles well, which includes
+handling concerns of data expiry, size reduction, and archival.
+
+Optimally, corresponding tasks are automated rather than manually
+performed. CrateDB provides relevant integrations and standalone
+applications for automatic data retention purposes.
+
+:::{rubric} Apache Airflow
+:::
+
+{ref}`Build a hot/cold storage data retention policy <airflow-data-retention-hot-cold>`
+describes how to manage aging data by leveraging CrateDB cluster
+features to mix nodes with different hardware setups, i.e. hot
+nodes using the latest generation of NVMe drives for responding
+to analytics queries quickly, and cold nodes that have access to
+cheap mass storage for retaining historic data.
+
+:::{rubric} CrateDB Toolkit
+:::
+
+[CrateDB Toolkit Retention and Expiration] is a data retention and
+expiration policy management system for CrateDB, providing multiple
+retention strategies.
+
+The system derives its concepts from [InfluxDB data retention] ideas and
+from the {ref}`Airflow-based data retention tasks for CrateDB <airflow-data-retention-policy>`,
+but aims to be usable as a standalone system in different software environments.
+Effectively, it is a Python library and CLI around a policy management
+table defined per [retention-policy-ddl.sql].
+
+
+[CrateDB Toolkit Retention and Expiration]: https://cratedb-toolkit.readthedocs.io/retention.html
+[InfluxDB data retention]: https://docs.influxdata.com/influxdb/v1/guides/downsample_and_retain/
+[retention-policy-ddl.sql]: https://github.com/crate/cratedb-toolkit/blob/main/cratedb_toolkit/retention/setup/schema.sql