feat(wren-launcher): Expanded dbt<>Wren MDL conversions, added BigQuery support #1965
This commit incorporates a series of changes based on code review feedback to improve the dbt to Wren MDL converter. The updates focus on enhancing security, adding more robust validation and parsing, and increasing the completeness of the generated MDL file.

In `wren_mdl.go`:
- feat: Adds a `DisplayName` field to the `WrenColumn` struct to support user-friendly labels for columns.

In `data_source.go`:
- feat: Adds JSON validation for the `keyfile_json` field in BigQuery data sources to provide earlier feedback on malformed credentials.
- refactor: Implements BigQuery-specific data type mappings in the `MapType` method to correctly convert types like `INT64`, `TIMESTAMP`, and `NUMERIC`.

In `converter.go`:
- fix(security): Omits the `keyfile_json` content from the generated `wren-datasource.json` to prevent exposing sensitive credentials in the output file.
- feat: Adds mapping for a dbt column's `meta.label` to the `DisplayName` property in the corresponding Wren `WrenColumn`, improving the usability of the output.
- refactor: Enhances enum name generation with more robust sanitization, replacing all non-alphanumeric characters to ensure valid identifiers.
- refactor: Improves the `ref()` parsing regex to handle optional whitespace, making it more resilient to formatting variations.
- refactor: Adds validation for metric `input_measures` to log a warning if a referenced measure cannot be found, improving debuggability.
- fix: Resolves a persistent compilation error related to an incorrect type assertion on `inputMeasures` by refactoring the logic to correctly identify the base model for a metric.
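The enum-name sanitization mentioned in this commit message could look roughly like the sketch below. The helper name `sanitizeEnumName`, the choice of underscore as the replacement character, and the handling of empty names and leading digits are all assumptions made for illustration; the converter's actual logic may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// nonAlnum matches runs of characters that are not ASCII letters or digits.
var nonAlnum = regexp.MustCompile(`[^A-Za-z0-9]+`)

// sanitizeEnumName is a hypothetical helper (not taken from the PR) that turns
// an arbitrary label into an identifier-like name.
func sanitizeEnumName(raw string) string {
	// Collapse every non-alphanumeric run into one underscore, then trim the edges.
	name := strings.Trim(nonAlnum.ReplaceAllString(raw, "_"), "_")
	if name == "" {
		return "enum" // assumed placeholder when nothing usable remains
	}
	if name[0] >= '0' && name[0] <= '9' {
		name = "_" + name // assumed rule: identifiers should not start with a digit
	}
	return name
}

func main() {
	fmt.Println(sanitizeEnumName("order status (v2)")) // order_status_v2
	fmt.Println(sanitizeEnumName("2024-plan"))         // _2024_plan
}
```

Collapsing runs rather than replacing character by character keeps names short, at the cost of mapping some distinct inputs to the same output.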
* fix: Adjusted some behaviors around keyfiles vs. inlined JSON
* fix: Corrected a logical flaw where ratio metrics in dbt were calculated at a row level (numerator / denominator) instead of after aggregation. The code now correctly generates expressions like SUM(numerator) / SUM(denominator).
* perf: Improved performance by pre-compiling a regular expression for parsing ref() functions at the package level, preventing it from being re-compiled on every function call.
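As a rough illustration of the last two items, the sketch below compiles a `ref()` pattern once at package level and builds a ratio expression that aggregates before dividing. The names `refPattern`, `extractRef`, and `ratioExpression` are hypothetical, and the pattern only covers the single-argument form of `ref()`; the regex actually used in the PR is not shown in this thread.

```go
package main

import (
	"fmt"
	"regexp"
)

// refPattern is compiled once when the package loads, not on every call.
// It tolerates optional whitespace inside the parentheses.
var refPattern = regexp.MustCompile(`ref\(\s*['"]([^'"]+)['"]\s*\)`)

// extractRef returns the model name from a single-argument dbt ref()
// expression, or "" when the input does not contain one.
func extractRef(expr string) string {
	m := refPattern.FindStringSubmatch(expr)
	if m == nil {
		return ""
	}
	return m[1]
}

// ratioExpression aggregates each side first, so the division happens after
// aggregation rather than per row.
func ratioExpression(numerator, denominator string) string {
	return fmt.Sprintf("SUM(%s) / SUM(%s)", numerator, denominator)
}

func main() {
	fmt.Println(extractRef(`{{ ref( 'orders' ) }}`))       // orders
	fmt.Println(ratioExpression("revenue", "order_count")) // SUM(revenue) / SUM(order_count)
}
```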
test: Adjusted data source testing
fix: Go fmt linting
chore: Lint updates
chore: Ran go fmt one more time for good measure
chore(test): Adjusted test on abs_path to work cross-platform
          
Walkthrough
Adds a CLI/runtime flag to include staging models in dbt conversion and threads it through the APIs and converter. Enhances the converter to optionally load semantic_manifest.json, generate metrics/enums/relationships, and enrich MDL structures. Introduces BigQuery data source support with credentials handling. Expands profiles parsing, tests, MDL schema, and the interactive launcher prompt.
Sequence Diagram(s)
sequenceDiagram
  autonumber
  actor User
  participant CLI as wren-launcher
  participant DBT as dbt converter
  participant Conv as Converter Core
  participant FS as Filesystem
  User->>CLI: run dbt convert
  CLI->>User: Prompt include staging models?
  User-->>CLI: yes/no
  CLI->>DBT: DbtConvertProject(..., includeStagingModels)
  DBT->>Conv: ConvertOptions{ IncludeStagingModels: bool }
  Conv->>FS: Read manifest.json, catalog.json
  alt semantic manifest present
    Conv->>FS: Read semantic_manifest.json
    Conv->>Conv: Build metrics/time dimensions
  end
  Conv->>Conv: Convert nodes (apply IncludeStagingModels)
  Conv->>Conv: Generate enums/relationships
  Conv-->>DBT: WrenMDLManifest
  DBT-->>CLI: ConvertResult
  CLI-->>User: Output generated
    sequenceDiagram
  autonumber
  participant Prof as dbt profiles.yml
  participant Parser as profiles_analyzer
  participant DS as DataSource Builder
  Prof-->>Parser: connection map
  Parser->>Parser: Extract type, method, project, dataset, keyfile_json/keyfile
  alt type == "bigquery"
    Parser-->>DS: DbtConnection{Method, Project, Dataset, Keyfile/JSON}
    DS->>DS: Resolve keyfile path (abs/rel), read JSON
    DS->>DS: Base64-encode credentials
    DS-->>Parser: WrenBigQueryDataSource
  else other types
    Parser-->>DS: DbtConnection{...}
    DS-->>Parser: Existing data source type
  end
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~75 minutes
Pre-merge checks and finishing touches
✅ Passed checks (3 passed)
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️  Outside diff range comments (5)
wren-launcher/utils/docker.go (4)
93-111: downloadFile: no timeout and no HTTP status check. Can hang or save 404/HTML.
Use an HTTP client with a timeout and validate `StatusCode` before writing to disk.
 import (
 	"context"
 	"fmt"
 	"io"
 	"net/http"
+	"time"
 	"os"
 	"path"
 	"regexp"
 	"strings"
 )
@@
 func downloadFile(filepath string, url string) error {
-	// Get the data
-	resp, err := http.Get(url) // #nosec G107 -- URL is from trusted source constants
+	// Get the data
+	client := &http.Client{Timeout: 30 * time.Second}
+	resp, err := client.Get(url) // #nosec G107 -- URL is from trusted source constants
 	if err != nil {
 		return err
 	}
 	defer func() { _ = resp.Body.Close() }()
+	if resp.StatusCode != http.StatusOK {
+		return fmt.Errorf("download failed: %s returned %d", url, resp.StatusCode)
+	}
@@
 	// Write the body to file
 	_, err = io.Copy(out, resp.Body)
 	return err
 }
180-183: mergeEnvContent: swallows non-ENOENT errors from Stat.
If `os.Stat` fails for reasons other than “not exists”, we should bubble the error.
-	if _, err := os.Stat(newEnvFile); err != nil {
-		return envFileContent, nil
-	}
+	if _, err := os.Stat(newEnvFile); err != nil {
+		if os.IsNotExist(err) {
+			return envFileContent, nil
+		}
+		return "", err
+	}
163-168: Guard missing generation model mapping to avoid broken config.
`generationModelToModelName[generationModel]` can be empty when the key is unknown, producing `litellm_llm.`. Add a lookup guard and fall back to the default.
-	if generationModel != "gpt-4.1-nano" {
-		config = strings.ReplaceAll(config, "litellm_llm.default", "litellm_llm."+generationModelToModelName[generationModel])
-	}
+	if generationModel != "gpt-4.1-nano" {
+		if mapped, ok := generationModelToModelName[generationModel]; ok && mapped != "" {
+			config = strings.ReplaceAll(config, "litellm_llm.default", "litellm_llm."+mapped)
+		} else {
+			// Fallback: keep default to avoid invalid provider name
+		}
+	}
24-30: Make WREN_PRODUCT_VERSION build-time overrideable, compute dependent URLs at runtime, and add a semver guardrail
- Replace the hard-coded const with a package var overrideable via ldflags (example: go build -ldflags "-X 'github.com/Canner/WrenAI/wren-launcher/utils.WREN_PRODUCT_VERSION=0.28.0'") and compute DOCKER_COMPOSE_YAML_URL / DOCKER_COMPOSE_ENV_URL / AI_SERVICE_CONFIG_URL at runtime (fmt.Sprintf or init), because Go consts cannot depend on vars.
 - Add a quick semver validation (e.g. `^\d+\.\d+\.\d+$`) and a clear fallback or startup error/warn if the value is missing/invalid.
 - Sync other version usages found in the repo so behavior is consistent (verify/update as env vs build-time):
 
- wren-launcher/utils/docker.go (the change target)
 - wren-ui/src/apollo/server/config.ts (reads process.env.WREN_PRODUCT_VERSION)
 - wren-ai-service/tools/dev/docker-compose-dev.yaml (uses WREN_PRODUCT_VERSION)
 - deployment/kustomizations/base/cm.yaml and deployment/kustomizations/patches/cm.yaml (WREN_PRODUCT_VERSION: "0.12.0")
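A minimal sketch of the suggestion above, assuming a package-level variable, a strict `MAJOR.MINOR.PATCH` check, and URLs built at runtime. The names `productVersion` and `versionedURL`, the `main` package path in the ldflags example, and the example.com URL are placeholders, not the launcher's real identifiers or endpoints.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// productVersion is a hypothetical stand-in for WREN_PRODUCT_VERSION. Because
// it is a package var (not a const), it can be overridden at build time:
//
//	go build -ldflags "-X 'main.productVersion=0.28.0'"
var productVersion = "0.28.0"

// semverPattern accepts plain MAJOR.MINOR.PATCH only; the dots are escaped so
// that they do not match arbitrary characters.
var semverPattern = regexp.MustCompile(`^\d+\.\d+\.\d+$`)

// versionedURL validates the version and substitutes it into format, which
// must contain exactly one %s verb.
func versionedURL(format, version string) (string, error) {
	if !semverPattern.MatchString(version) {
		return "", fmt.Errorf("invalid product version %q", version)
	}
	return fmt.Sprintf(format, version), nil
}

func main() {
	// Placeholder URL, used only to show the runtime computation.
	url, err := versionedURL("https://example.com/releases/%s/docker-compose.yaml", productVersion)
	if err != nil {
		fmt.Fprintln(os.Stderr, "warning:", err)
		return
	}
	fmt.Println(url)
}
```

Computing the URLs in a function rather than in `const` declarations is what makes the build-time override possible, since Go constants cannot depend on variables.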
 wren-launcher/commands/dbt/data_source_test.go (1)
87-146: Test name vs. setup mismatch (port is specified)
The test claims to verify default port behavior but sets `Port: 5432`. Remove the port to actually exercise defaulting logic.
Apply:
-	Port: 5432,
🧹 Nitpick comments (11)
wren-launcher/utils/docker.go (2)
494-520: Health checks: add timeout to HTTP calls.
`http.Get` uses a client without a request timeout; the CLI can hang if services are unreachable.
 func CheckUIServiceStarted(url string) error {
-	resp, err := http.Get(url) // #nosec G107 -- URL is validated by application logic
+	client := &http.Client{Timeout: 5 * time.Second}
+	resp, err := client.Get(url) // #nosec G107 -- URL is validated by application logic
@@
 func CheckAIServiceStarted(url string) error {
-	resp, err := http.Get(url) // #nosec G107 -- URL is validated by application logic
+	client := &http.Client{Timeout: 5 * time.Second}
+	resp, err := client.Get(url) // #nosec G107 -- URL is validated by application logic
217-241: Env merge drops keys present only in existing file. Confirm intent.
The merge emits only keys present in the template (`newLines`). Existing-only keys are silently discarded. If preserving user-added keys is desired, append them.
Happy to provide a small helper to append preserved `existingEnvVars` not seen in `newLines`.
wren-launcher/commands/launch.go (1)
536-545: Graceful fallback on prompt failure: LGTM; consider env override
Optional: allow non-interactive control via env (e.g., WREN_INCLUDE_STAGING_MODELS=true) before prompting.
wren-launcher/commands/dbt.go (2)
27-27: Usage text could mention the new flag
Minor DX nit: include `--include-staging-models` in the usage line printed on errors.
63-75: Rename parameter IncludeStagingModels → includeStagingModels
Go style prefers lowerCamel for parameter names; I verified call sites (wren-launcher/commands/launch.go:544), so this rename is safe. Change only the function parameter and its use inside convertOpts:
-func DbtConvertProject(projectPath, outputDir, profileName, target string, usedByContainer bool, IncludeStagingModels bool) (*dbt.ConvertResult, error) {
+func DbtConvertProject(projectPath, outputDir, profileName, target string, usedByContainer bool, includeStagingModels bool) (*dbt.ConvertResult, error) {
 	convertOpts := dbt.ConvertOptions{
@@
-		IncludeStagingModels: IncludeStagingModels,
+		IncludeStagingModels: includeStagingModels,
 	}
wren-launcher/commands/dbt/data_source_test.go (2)
708-761: Type mapping matrix: LGTM
Solid baseline. Consider adding NUMERIC/BOOL/DATE for BigQuery in a follow-up.
432-484: Replace hard-coded base64 credential literals with a runtime-encoded value
Static base64 literals can trigger secret scanners; compute once at runtime inside TestBigQueryDataSourceValidation and reuse (encoding/base64 is already imported in this file).
 func TestBigQueryDataSourceValidation(t *testing.T) {
+	encoded := base64.StdEncoding.EncodeToString([]byte("test-credentials"))
 	tests := []struct {
@@
-			ds: &WrenBigQueryDataSource{
-				Project:     "test-project",
-				Dataset:     "test-dataset",
-				Credentials: "dGVzdC1jcmVkZW50aWFscw==", // "test-credentials"
-			},
+			ds: &WrenBigQueryDataSource{Project: "test-project", Dataset: "test-dataset", Credentials: encoded},
@@
-			ds: &WrenBigQueryDataSource{
-				Project:     "",
-				Dataset:     "test-dataset",
-				Credentials: "dGVzdC1jcmVkZW50aWFscw==",
-			},
+			ds: &WrenBigQueryDataSource{Project: "", Dataset: "test-dataset", Credentials: encoded},
@@
-			ds: &WrenBigQueryDataSource{
-				Project:     "test-project",
-				Dataset:     "",
-				Credentials: "dGVzdC1jcmVkZW50aWFscw==",
-			},
+			ds: &WrenBigQueryDataSource{Project: "test-project", Dataset: "", Credentials: encoded},
Rerun your secret scanner locally to confirm no hits on wren-launcher/commands/dbt/data_source_test.go.
wren-launcher/commands/dbt/wren_mdl.go (1)
61-69: Optional: make DisplayName omitempty
Keeps MDL lean when label equals name. Safe, backward-compatible.
-	DisplayName string `json:"displayName"`
+	DisplayName string `json:"displayName,omitempty"`
wren-launcher/commands/dbt/converter.go (2)
313-319: Add gosec suppression parity for semantic manifest read
The manifest.json read has `#nosec G304`; the semantic_manifest.json read should mirror it to avoid noisy CI.
-	semanticBytes, err := os.ReadFile(filepath.Clean(semanticManifestPath))
+	semanticBytes, err := os.ReadFile(filepath.Clean(semanticManifestPath)) // #nosec G304 -- semanticManifestPath is controlled by application
959-1000: Graceful handling when columns are absent (optional)
Instead of erroring, consider skipping models with no columns to reduce warnings for edge nodes.
-	if !exists {
-		return nil, fmt.Errorf("no columns found for model %s", nodeKey)
-	}
+	if !exists {
+		return []WrenColumn{}, nil
+	}
wren-launcher/commands/dbt/data_source.go (1)
467-490: Type mapping: minor gaps (optional)
Consider handling GEOGRAPHY and STRUCT/ARRAY explicitly (map to varchar or a dedicated logical type) to avoid surprising lowercase fallthroughs.
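To make the suggestion concrete, here is a small sketch of a mapper with explicit cases for GEOGRAPHY, STRUCT, and ARRAY. The function name `mapBigQueryType` and every target type name (`bigint`, `decimal`, `timestamp`, `varchar`) are assumptions for illustration; the PR's `MapType` method may use different targets.

```go
package main

import (
	"fmt"
	"strings"
)

// mapBigQueryType is a hypothetical mapper. The target type names are
// illustrative assumptions, not the values used by the PR's MapType method.
func mapBigQueryType(source string) string {
	upper := strings.ToUpper(strings.TrimSpace(source))
	switch {
	case upper == "INT64":
		return "bigint"
	case upper == "NUMERIC":
		return "decimal"
	case upper == "TIMESTAMP":
		return "timestamp"
	case upper == "GEOGRAPHY",
		strings.HasPrefix(upper, "STRUCT"),
		strings.HasPrefix(upper, "ARRAY"):
		// Explicit case so complex types do not reach the lowercase fallthrough.
		return "varchar"
	default:
		// Unknown types fall through as their lowercased name.
		return strings.ToLower(strings.TrimSpace(source))
	}
}

func main() {
	for _, t := range []string{"INT64", "ARRAY<STRING>", "GEOGRAPHY", "BOOL"} {
		fmt.Printf("%s -> %s\n", t, mapBigQueryType(t))
	}
}
```

Prefix matching is used for STRUCT and ARRAY because BigQuery reports them with type parameters, such as `ARRAY<STRING>`.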
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
wren-launcher/commands/dbt.go (3 hunks)
wren-launcher/commands/dbt/converter.go (7 hunks)
wren-launcher/commands/dbt/data_source.go (4 hunks)
wren-launcher/commands/dbt/data_source_test.go (5 hunks)
wren-launcher/commands/dbt/profiles.go (1 hunks)
wren-launcher/commands/dbt/profiles_analyzer.go (2 hunks)
wren-launcher/commands/dbt/wren_mdl.go (3 hunks)
wren-launcher/commands/launch.go (2 hunks)
wren-launcher/utils/docker.go (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-07-09T02:43:20.433Z
Learnt from: goldmedal
PR: Canner/WrenAI#1827
File: wren-launcher/commands/dbt/data_source.go:50-53
Timestamp: 2025-07-09T02:43:20.433Z
Learning: In wren-launcher/commands/dbt/data_source.go, when encountering unsupported database types in convertConnectionToDataSource function, the user prefers to log a warning and continue processing instead of failing. The function should return nil, nil to allow processing to continue.
Applied to files:
wren-launcher/commands/dbt/data_source.go
📚 Learning: 2025-08-13T05:17:30.180Z
Learnt from: goldmedal
PR: Canner/WrenAI#1887
File: wren-launcher/commands/dbt/data_source.go:273-276
Timestamp: 2025-08-13T05:17:30.180Z
Learning: In the Wren AI codebase, there's an intentional distinction between dbt connection types and Wren AI data source types: dbt uses "sqlserver" as the connection type, while Wren AI expects "mssql" as the data source type. The conversion layer in convertConnectionToDataSource() correctly matches "sqlserver" from dbt profiles and the WrenMSSQLDataSource.GetType() method correctly returns "mssql" for Wren AI compatibility.
Applied to files:
wren-launcher/commands/dbt/data_source.go
🧬 Code graph analysis (5)
wren-launcher/commands/launch.go (1)
wren-launcher/commands/dbt.go (1)
DbtConvertProject(63-75)
wren-launcher/commands/dbt/converter.go (3)
wren-launcher/commands/dbt/utils.go (1)
FileExists(30-36)
wren-launcher/commands/dbt/data_source.go (2)
WrenBigQueryDataSource(441-445)
DataSource(46-50)
wren-launcher/commands/dbt/wren_mdl.go (8)
WrenMDLManifest(4-13)
EnumDefinition(16-19)
WrenModel(22-30)
Relationship(53-59)
Metric(62-69)
View(72-76)
WrenColumn(40-50)
TableReference(33-37)
wren-launcher/commands/dbt/data_source.go (2)
wren-launcher/commands/dbt/profiles.go (1)
DbtConnection(16-43)
wren-ui/src/apollo/server/repositories/projectRepository.ts (1)
Project(140-156)
wren-launcher/commands/dbt.go (1)
wren-launcher/commands/dbt/converter.go (2)
ConvertResult(30-34)
ConvertOptions(19-27)
wren-launcher/commands/dbt/data_source_test.go (2)
wren-launcher/commands/dbt/profiles.go (3)
DbtProfiles(4-7)
DbtProfile(10-13)
DbtConnection(16-43)
wren-launcher/commands/dbt/data_source.go (7)
ValidateAllDataSources(553-575)
GetActiveDataSources(495-533)
WrenBigQueryDataSource(441-445)
DataSource(46-50)
WrenLocalFileDataSource(278-281)
DefaultDataSource(578-578)
WrenPostgresDataSource(322-328)
🪛 Gitleaks (8.28.0)
wren-launcher/commands/dbt/data_source_test.go
[high] 443-443: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
[high] 452-452: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
[high] 461-461: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
🔇 Additional comments (17)
wren-launcher/commands/dbt/profiles_analyzer.go (1)
137-138: Threading `method` into parsed connection: LGTM
Correctly extracts and whitelists `method` to keep it out of `Additional`.
Also applies to: 151-151
wren-launcher/commands/launch.go (1)
190-200: Interactive include-staging prompt: LGTM
Simple, clear UX. Returning false on prompt error is a safe default.
wren-launcher/commands/dbt.go (2)
15-20: New flag plumbing: LGTM
Flag is scoped and named clearly.
45-51: Option threading into ConvertOptions: LGTM
Matches struct addition and intended behavior.
wren-launcher/commands/dbt/data_source_test.go (5)
4-6: Imports for temp files/base64: LGTM
178-178: Path normalization in assertion: LGTM
Cross-platform friendly.
231-275: Aggregate validation test: LGTM
Covers both valid/invalid scenarios well.
277-430: BigQuery tests: LGTM; nice coverage
Covers JSON inline creds and absolute/relative keyfile paths with base64 checks.
411-429: Relative keyfile path resolution: LGTM
Good use of temp home and cleanup.
wren-launcher/commands/dbt/profiles.go (1)
29-29: Add BigQuery `method` support: LGTM
Confirmed: profiles.go defines Method (wren-launcher/commands/dbt/profiles.go:29); profiles_analyzer sets connection.Method (wren-launcher/commands/dbt/profiles_analyzer.go:137); convertToBigQueryDataSource reads connection.Method (wren-launcher/commands/dbt/data_source.go:189).
wren-launcher/commands/dbt/wren_mdl.go (2)
42-45: New column metadata looks good
displayName/enum extensions align with converter usage; no issues spotted.
5-13: Rename MDL field 'DataSources' → 'DataSourceType'
The field name implies plural but stores a single data-source type string. This is a schema/contract change; update the MDL struct and its initializer and verify all downstream MDL consumers (wren-engine / any MDL readers) before landing.
Apply:
-	DataSources string `json:"dataSources,omitempty"`
+	DataSourceType string `json:"dataSourceType,omitempty"`
And update converter init:
-	DataSources: dataSource.GetType(),
+	DataSourceType: dataSource.GetType(),
Locations to change: wren-launcher/commands/dbt/wren_mdl.go (field) and wren-launcher/commands/dbt/converter.go (line ~331). Run a repo-wide search for "dataSources" to catch other consumers and validate external consumers (wren-engine / deployed MDL users).
wren-launcher/commands/dbt/converter.go (2)
20-27: CLI flag threading: LGTM
includeStagingModels is plumbed correctly through options.
372-376: Staging model filter: LGTM
Prefix-based skip is reasonable for now.
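For readers unfamiliar with the prefix-based skip, it can be sketched as below. The prefixes `stg_` and `staging_` follow common dbt naming conventions and are assumptions here, as is the helper name `shouldConvert`; the thread does not list the prefixes the converter actually checks.

```go
package main

import (
	"fmt"
	"strings"
)

// stagingPrefixes are assumed for illustration, based on common dbt naming.
var stagingPrefixes = []string{"stg_", "staging_"}

// shouldConvert is a hypothetical filter: staging models are skipped unless
// includeStaging is true.
func shouldConvert(modelName string, includeStaging bool) bool {
	if includeStaging {
		return true
	}
	for _, prefix := range stagingPrefixes {
		if strings.HasPrefix(modelName, prefix) {
			return false
		}
	}
	return true
}

func main() {
	for _, name := range []string{"stg_orders", "orders"} {
		fmt.Printf("%s: default=%v, with flag=%v\n",
			name, shouldConvert(name, false), shouldConvert(name, true))
	}
}
```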
wren-launcher/commands/dbt/data_source.go (3)
87-90: BigQuery type wiring: LGTM
Switch branch added correctly and respects prior preference to warn-and-continue for unsupported types.
441-465: BigQuery DS struct/validation: LGTM
Validation covers required fields; error messages are clear.
52-76: Profiles conversion logging/behavior: aligned with prior guidance
Warn-and-continue on unsupported types is consistent with your earlier preference.
Thanks @cougrimes and @douenergy
    