A Kubernetes operator image for executing dbt workloads and CI/CD metadata operations, supporting multiple data warehouse platforms with comprehensive data transformation and metadata management capabilities.
This Docker image provides a complete dbt execution environment for Kubernetes-based data workflows and CI/CD pipelines. It's designed to be used as a Kubernetes operator for running dbt workloads efficiently, while also supporting metadata operations and sharing across different Fast.BI services. The image includes support for multiple data warehouse platforms, automated secret management, and comprehensive error handling for production-grade data transformation workflows.
DBT Execution Engine: Provides a complete dbt runtime environment with support for multiple data warehouse adapters including BigQuery, Snowflake, Redshift, and Fabric.
Kubernetes Operator: Designed to run as a Kubernetes pod with proper resource management, secret handling, and lifecycle management for dbt workloads.
Metadata Management System: Integrates with DataHub for metadata collection and sharing across Fast.BI services, enabling comprehensive data lineage tracking.
Secret Management: Handles secure authentication for multiple data warehouse platforms through Kubernetes secrets and service account management.
Git Integration: Supports repository cloning, branch management, and project configuration for CI/CD workflows.
- Base: Python 3.11.11-slim-bullseye
# Build the image
./build.sh
# Or manually
docker build -t dbt-workflow-core .| Argument | Description | Default |
|---|---|---|
build_for |
Target platform for the build | linux/amd64 |
The container expects the following environment variables:
GITLINK_SECRET- Git repository URL for dbt project cloningGIT_BRANCH- Git branch to clone (optional, defaults to main)DBT_PROJECT_DIRECTORY- Directory within the repository containing the dbt projectPROFILES_DIR_PATH- Path to dbt profiles (default, repo, or custom)DATA_WAREHOUSE_PLATFORM- Target data warehouse platform (bigquery, snowflake, redshift, fabric)DBT_COMMAND- dbt command to execute (run, test, seed, etc.)MODEL- Specific model or selection criteria for dbt executionGIT_URL- Git repository URL for DNS resolution testing
The image supports configuration through the following files:
macros/*.sql: Custom dbt macros for data transformation logicdbt_lint/*.py: Python scripts for dbt project linting and validationmetadata_cli/: DataHub metadata collection configurationsdbt-refresh-incremental/: Incremental model refresh utilities
The image orchestrates a complete dbt workflow execution process:
- Environment Setup: Configures data warehouse authentication and project environment
- Repository Cloning: Clones dbt project from Git repository with branch support
- Project Configuration: Sets up dbt profiles and project structure
- Secret Management: Handles authentication for multiple data warehouse platforms
- DBT Execution: Runs specified dbt commands with proper error handling
- Metadata Collection: Collects and shares metadata with DataHub for lineage tracking
BigQuery: Full support with GCP service account authentication and dataset management Snowflake: Complete dbt integration with Snowflake-specific optimizations Redshift: AWS Redshift support with proper connection management Fabric: Microsoft Fabric data warehouse integration
- Automated Repository Management: Git cloning with branch support and DNS resolution
- Secret Injection: Kubernetes secret management for secure authentication
- Error Handling: Comprehensive error logging and handling for production workflows
- Metadata Sharing: Integration with DataHub for cross-service metadata management
- Resource Cleanup: Proper cleanup of temporary files and authentication credentials
- Comprehensive error logging to dedicated log files
- Graceful handling of authentication failures
- DNS resolution testing for Git repositories
- Proper cleanup of temporary resources and secrets
- Detailed error reporting for troubleshooting
- Workflow Health Monitoring: Monitors dbt execution status and performance
- Resource Management: Manages Kubernetes resources and memory usage
- Secret Rotation: Supports service account key rotation and credential updates
- Metadata Synchronization: Ensures metadata consistency across Fast.BI services
The image includes built-in health checks:
# Check container health
docker inspect --format='{{.State.Health.Status}}' dbt-workflow-core
# View health check logs
docker inspect --format='{{range .State.Health.Log}}{{.Output}}{{end}}' dbt-workflow-core# Run dbt models
docker run dbt-workflow-core dbt run --select model_name
# Run dbt tests
docker run dbt-workflow-core dbt test --select model_name
# Run dbt seeds
docker run dbt-workflow-core dbt seed --select seed_name
# Run dbt docs
docker run dbt-workflow-core dbt docs generateProblem: Cannot clone dbt project from Git repository
Solution: Verify GITLINK_SECRET is correct and repository is accessible
Problem: Failed to authenticate with data warehouse platform
Solution: Check mounted secrets and service account configuration
Problem: dbt project structure or profiles configuration is invalid
Solution: Verify DBT_PROJECT_DIRECTORY and PROFILES_DIR_PATH settings
Problem: Cannot resolve Git repository URL
Solution: Check network connectivity and DNS configuration
Problem: Failed to collect or share metadata with DataHub
Solution: Verify DataHub connectivity and metadata configuration
- Documentation: Fast.BI Documentation
- Issues: GitHub Issues
- Email: [email protected]
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
MIT License
Copyright (c) 2025 Fast.BI
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
This project is part of the FastBI platform infrastructure.
For support and questions, contact: [email protected]