Commit 16d03d5

Solutions: Refurbish "Long-term store"
1 parent 9aa307a commit 16d03d5

6 files changed: +153 -132 lines changed

docs/integrate/airflow/data-retention-hot-cold.md

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 (airflow-data-retention-hot-cold)=
-# Build a hot and cold storage data retention policy in CrateDB with Apache Airflow
+# Build a hot/cold storage data retention policy in CrateDB with Apache Airflow
 
 This fourth article on automating recurring CrateDB queries with [Apache Airflow](https://airflow.apache.org/) presents a second data‑retention strategy. Previously, the {ref}`Data Retention Delete DAG <airflow-data-retention-policy>` dropped old partitions after a set period. This article adds a complementary hot/cold storage approach.
 

docs/solution/index.md

Lines changed: 16 additions & 1 deletion
@@ -7,6 +7,7 @@
 :hidden:
 time-series/index
 industrial/index
+longterm/index
 analytics/index
 machine-learning/index
 :::
@@ -15,7 +16,7 @@ machine-learning/index
 ## Explanations
 
 :::{div} sd-text-muted
-About time series data storage and analytics, and machine learning.
+About time series and long-term data storage, real-time analytics, and machine learning.
 :::
 
 ::::{grid} 1 2 2 2
@@ -36,6 +37,20 @@ and how to apply time series modeling and analysis procedures to your data.
 - Scientific computing
 :::
 
+:::{grid-item-card} {material-outlined}`manage_history;2em` Long-term store
+:link: longterm-store
+:link-type: ref
+:link-alt: About storing time series data for the long term
+Permanently keeping your raw data accessible for querying yields insightful
+analysis opportunities other systems can't provide easily.
++++
+**What's inside:**
+- Time-based bucketing.
+- Advanced querying.
+- Import data using Dask.
+- Optimizing storage for historic time series data.
+:::
+
 :::{grid-item-card} {material-outlined}`model_training;2em` Machine learning
 :link: machine-learning
 :link-type: ref

docs/solution/longterm/index.md

Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
+(longterm-store)=
+(timeseries-longterm)=
+(timeseries-long-term-storage)=
+
+# Long-term store
+
+:::{div} sd-text-muted
+Never retire data just because your other systems can't handle the cardinality.
+:::
+
+CrateDB stores large volumes of data, keeping it accessible for querying
+and insightful analysis, even for historic data records.
+
+Many organizations need to retain data for years or decades to meet regulatory
+requirements, support historical analysis, or preserve valuable insights for
+future use. However, traditional storage systems force you to choose between
+accessibility and affordability, often leading to data exports, archival
+systems, or downsampling that sacrifice query capabilities.
+
+CrateDB eliminates this trade-off by storing large volumes of data efficiently
+while keeping it fully accessible for querying and analysis. Unlike systems
+that struggle with high cardinality or require expensive tiered architectures,
+CrateDB handles billions of unique records in a single platform, maintaining
+fast query performance even on historic datasets spanning years.
+
+By keeping all your data in one place, you avoid the complexity and costs of
+exporting to specialized long-term storage systems, data lakes, or cold storage
+tiers. Your historical data remains as queryable as your recent data, enabling
+seamless analysis across any time range without data movement, ETL pipelines,
+or rehydration processes.
+
+With CrateDB, compatible with PostgreSQL, you can do all of that using plain SQL.
+Besides integrating well with commodity systems using standard database
+access interfaces like ODBC or JDBC, it provides a proprietary HTTP interface
+on top.
+
+## Use cases
+
+:::{rubric} Metrics and monitoring
+:::
+
+::::{grid} 1 1 1 2
+:gutter: 2
+:padding: 0
+
+:::{grid-item-card} Prometheus
+:link: prometheus
+:link-type: ref
+Prometheus and similar monitoring systems excel at real-time alerting but face challenges
+with long-term metric retention due to storage costs and query performance at scale. CrateDB
+addresses these challenges by providing:
+- **Scalable long-term storage**: Store years of metrics without compromising query performance.
+- **High cardinality support**: Handle millions of unique label combinations that would overwhelm traditional TSDBs.
+- **Rich SQL analytics**: Perform complex analytical queries on historic metrics using standard SQL.
+- **Seamless integration**: Use CrateDB's Prometheus Adapter for transparent remote write/read operations.
++++
+Set up CrateDB as a long-term metrics store for Prometheus.
+:::
+
+:::{grid-item-card} OpenTelemetry
+:link: opentelemetry
+:link-type: ref
+OpenTelemetry and similar observability frameworks excel at generating rich telemetry data
+but face challenges with long-term retention due to storage scale and query complexity.
+CrateDB addresses these challenges by providing:
+- **Scalable long-term storage**: Store large volumes of telemetry through CrateDB's distributed architecture.
+- **Vendor-neutral ingestion**: Use OpenTelemetry SDKs/agents and Telegraf to send telemetry into your CrateDB observability pipeline.
+- **Rich SQL analytics**: Run SQL/time-series queries, aggregations and joins on telemetry data for troubleshooting and analytics.
+- **Flexible attribute mapping**: Customize which span/log/profile attributes become columns/tags for dimensional queries.
++++
+Set up CrateDB as a long-term observability backend for OpenTelemetry.
+:::
+
+::::
+
+## Related sections
+
+{ref}`metrics-store` includes information about how to
+store and analyze high volumes of system monitoring information
+like metrics and log data with CrateDB.
+
+{ref}`analytics` describes how
+CrateDB provides real-time analytics on raw data stored for the long term.
+Keep massive amounts of data ready in the hot zone for analytics purposes.
+
+[Optimizing storage efficiency for historic time series data]
+illustrates how to reduce table storage size by 80%,
+by using arrays for time-based bucketing, a historical table having
+a dedicated layout, and querying using the UNNEST table function.
+
+{ref}`Build a hot/cold storage data retention policy <airflow-data-retention-hot-cold>`
+describes how to manage aging data by leveraging CrateDB cluster
+features to mix nodes with different hardware setups, i.e. hot
+nodes using the latest generation of NVMe drives for responding
+to analytics queries quickly, and cold nodes that have access to
+cheap mass storage for retaining historic data.
+
+{ref}`weather-data-storage` provides information about how to
+use CrateDB for mass storage of synoptic weather observations,
+allowing you to query them efficiently.
+
+
+[Optimizing storage efficiency for historic time series data]: https://community.cratedb.com/t/optimizing-storage-for-historic-time-series-data/762

docs/solution/time-series/index.md

Lines changed: 1 addition & 16 deletions
@@ -69,21 +69,6 @@ Machine Learning on Time Series Data: EDA, Decomposition, AutoML.
 :::
 
 
-:::{grid-item-card} {material-outlined}`manage_history;2em` Long-term storage
-:link: timeseries-longterm
-:link-type: ref
-:link-alt: About storing time series data for the long term
-
-Run efficient data operations for current and historical time series data.
-
-+++
-**What's inside:**
-Time-based bucketing.
-Import data using Dask.
-Optimizing storage for historic time series data.
-:::
-
-
 ::::
 
 
@@ -92,6 +77,7 @@ Optimizing storage for historic time series data.
 **Domains:**
 {ref}`analytics`
 {ref}`industrial`
+{ref}`longterm-store`
 {ref}`machine-learning`
 {ref}`metrics-store`
 
@@ -114,7 +100,6 @@ Optimizing storage for historic time series data.
 Fundamentals <fundamentals>
 Advanced analysis <analysis>
 video
-Long-term store <longterm>
 :::
 
 

docs/solution/time-series/longterm.md

Lines changed: 0 additions & 113 deletions
This file was deleted.

docs/start/application/index.md

Lines changed: 32 additions & 1 deletion
@@ -1,5 +1,5 @@
 (example-applications)=
-# Sample Applications
+# Sample applications
 
 
 :::{rubric} Starter
@@ -87,3 +87,34 @@ Users can ask questions of the knowledge base using natural language.
 :::
 
 ::::
+
+
+:::{rubric} Community
+:::
+
+:::::{grid} 1 2 2 3
+:gutter: 2
+
+::::{grid-item-card}
+:link: https://wetterdienst.readthedocs.io/en/latest/usage/python-api.html#export
+:link-type: url
+(weather-data-storage)=
+:::{rubric} Store and analyze massive amounts of synoptic weather data
+:::
+Wetterdienst uses CrateDB for mass storage of weather data, allowing you to
+query it efficiently. It provides access to data from more than ten canonical
+sources of raw weather data from domestic weather agencies.
++++
+**What's inside:**
+
+{tags-primary}`Earth observations`
+{tags-primary}`Metadata`
+{tags-primary}`Sensor data`
+{tags-primary}`Time series`
+
+{tags-secondary}`pandas`
+{tags-secondary}`Polars`
+{tags-secondary}`SQL`
+::::
+
+:::::
