An elastic and reliable cloud warehouse that offers blazing-fast queries and combines the elasticity, simplicity, and low cost of the cloud, built to make the Data Cloud easy.

ZhiHanZ/databend

Databend

ANY DATA. ANY SCALE. ONE DATABASE.

Blazing analytics, fast search, geo insights, vector AI — supercharged in a new-era Snowflake-compatible warehouse


Why Databend?

Multimodal Data Warehouse: Analyze structured, semi-structured, vector, and geospatial data with unified Snowflake-compatible SQL.

AI-Native Platform: Built-in vector search, AI functions, embedding generation, and full-text search - no separate systems needed.

10x Faster & 90% Cost Reduction: Rust-powered vectorized execution with S3-native storage eliminates vendor lock-in and proprietary overhead.

Deploy Anywhere, Connect Everything: 100% open source - run locally with pip install databend, self-host, or use managed cloud clusters. All instances share the same data seamlessly.

Production Proven: Trusted by world-class enterprises managing 800+ petabytes and 100+ million queries daily.

Enterprise Ready: Fine-grained access control, data masking, and audit logging with complete data sovereignty.

Quick Start

Option 1: Databend Cloud Warehouse (Recommended)

Start with Databend Cloud - Serverless warehouse clusters, production-ready in 60 seconds

Option 2: Local Development with Python

pip install databend

import databend

ctx = databend.SessionContext()

# Local table for quick testing
ctx.sql("CREATE TABLE products (id INT, name STRING, price FLOAT)").collect()
ctx.sql("INSERT INTO products VALUES (1, 'Laptop', 1299.99), (2, 'Phone', 899.50)").collect()
ctx.sql("SELECT * FROM products").show()

# S3 remote table (same as cloud warehouse)
ctx.create_s3_connection("s3", "your_key", "your_secret")
ctx.sql("CREATE TABLE sales (id INT, revenue FLOAT) 's3://bucket/sales/' CONNECTION=(connection_name='s3')").collect()
ctx.sql("SELECT COUNT(*) FROM sales").show()

Option 3: Docker (Self-Host Experience)

docker run -p 8000:8000 datafuselabs/databend

Experience the full warehouse capabilities locally - same features as cloud clusters.
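
Once the container is up, the published port 8000 can be queried over HTTP. The following is a minimal sketch, not a definitive client. It assumes the image's defaults: an HTTP query endpoint at /v1/query that accepts a JSON body with an "sql" field, and a root user with an empty password. The helper names build_query_request and run_query are hypothetical and exist only for this illustration.

```python
import base64
import json
import urllib.request


def build_query_request(sql, host="127.0.0.1", port=8000, user="root", password=""):
    """Build (but do not send) a POST request for the assumed /v1/query endpoint."""
    # HTTP basic auth; the default root user with an empty password is an assumption.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/v1/query",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def run_query(sql, **kwargs):
    """Send the query to the running container and return the decoded JSON response."""
    request = build_query_request(sql, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

With the container from the docker run command above still running, run_query("SELECT 1") should return the server's JSON response. If the container was started with a different user or password, pass them as keyword arguments.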

Benchmarks

Performance: TPC-H vs Snowflake | ClickBench Results

Cost: 90% Cost Reduction

Architecture

Multimodal Cloud Warehouse: Production clusters analyze structured, semi-structured, vector, and geospatial data with Snowflake-compatible SQL. Local development environments can attach to the same warehouse data for seamless development.

Use Cases

  • Data Analytics: Snowflake alternative with significant cost reduction
  • AI/ML Pipelines: Vector search and AI functions built-in
  • Real-time Analytics: High-performance queries on petabyte-scale data
  • Data Lake Analytics: Query Parquet, CSV, TSV, NDJSON, Avro, ORC directly from S3

Community

Contributors get immortalized in the system.contributors table! 🏆

📄 License

Apache License 2.0 + Elastic License 2.0 | Licensing FAQs


Built by engineers who redefine what's possible with data
🌐 Website | 🐦 Twitter | 🗺️ Roadmap
