Skip to content

Conversation

@goldmedal
Copy link
Collaborator

@goldmedal goldmedal commented Sep 22, 2025

Description

Summary by CodeRabbit

  • New Features

    • Added option to include staging models during dbt conversion (new CLI flag and interactive prompt).
    • BigQuery data source support, including service account credential handling.
    • Optional semantic manifest ingestion to enrich outputs with metrics, relationships, primary keys, enums, and time dimensions.
    • Enhanced model output: column display names and enum references; manifest now includes metrics and enum definitions.
  • Tests

    • Added comprehensive tests for BigQuery handling, data source validation, type mapping, and default port behavior.

cougrimes and others added 18 commits September 22, 2025 11:32
This commit incorporates a series of changes based on code review feedback to improve the dbt to Wren MDL converter. The updates focus on enhancing security, adding more robust validation and parsing, and increasing the completeness of the generated MDL file.

In `wren_mdl.go`:
- feat: Adds a `DisplayName` field to the `WrenColumn` struct to support user-friendly labels for columns.

In `data_source.go`:
- feat: Adds JSON validation for the `keyfile_json` field in BigQuery data sources to provide earlier feedback on malformed credentials.
- refactor: Implements BigQuery-specific data type mappings in the `MapType` method to correctly convert types like `INT64`, `TIMESTAMP`, and `NUMERIC`.

In `converter.go`:
- fix(security): Omits the `keyfile_json` content from the generated `wren-datasource.json` to prevent exposing sensitive credentials in the output file.
- feat: Adds mapping for a dbt column's `meta.label` to the `DisplayName` property in the corresponding Wren `WrenColumn`, improving the usability of the output.
- refactor: Enhances enum name generation with more robust sanitization, replacing all non-alphanumeric characters to ensure valid identifiers.
- refactor: Improves the `ref()` parsing regex to handle optional whitespace, making it more resilient to formatting variations.
- refactor: Adds validation for metric `input_measures` to log a warning if a referenced measure cannot be found, improving debuggability.
- fix: Resolves a persistent compilation error related to an incorrect type assertion on `inputMeasures` by refactoring the logic to correctly identify the base model for a metric.
* fix: Adjusted some behaviors around keyfiles vs. inlined JSON
* fix: Corrected a logical flaw where ratio metrics in dbt were calculated at a row level (numerator / denominator) instead of after aggregation. The code now correctly generates expressions like SUM(numerator) / SUM(denominator).
* perf: Improved performance by pre-compiling a regular expression for parsing ref() functions at the package level, preventing it from being re-compiled on every function call.
test: Adjusted data source testing
fix: Go fmt linting
chore: Lint updates
chore: Ran go fmt one more time for good measure
chore(test): Adjusted test on abs_path to work cross-platform
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 22, 2025

Walkthrough

Adds a CLI/runtime flag to include staging models in dbt conversion, threads it through APIs and converter. Enhances converter to optionally load semantic_manifest.json, generate metrics/enums/relationships, and enrich MDL structures. Introduces BigQuery data source support with credentials handling. Expands profiles parsing, tests, MDL schema, and interactive launcher prompt.

Changes

Cohort / File(s) Summary of changes
CLI and Public API flag plumbing
wren-launcher/commands/dbt.go, wren-launcher/commands/launch.go
Adds --include-staging-models flag and interactive prompt; extends DbtConvertProject signature to accept includeStagingModels; forwards option through DbtAutoConvert and ConvertOptions.
Converter enhancements and semantic manifest
wren-launcher/commands/dbt/converter.go
Extends ConvertOptions with IncludeStagingModels; updates ConvertDbtCatalogToWrenMDL signature to accept semanticManifestPath and includeStagingModels; loads semantic_manifest.json when present; generates metrics, relationships, enums; enriches models/columns; adds helpers and logging.
BigQuery data source support
wren-launcher/commands/dbt/data_source.go
Adds WrenBigQueryDataSource with GetType/Validate/MapType; implements convertToBigQueryDataSource; handles service-account JSON and keyfile paths with base64-encoded credentials; integrates into convertConnectionToDataSource.
Profiles schema and parsing
wren-launcher/commands/dbt/profiles.go, wren-launcher/commands/dbt/profiles_analyzer.go
Adds Method field to DbtConnection; parser populates method and treats it as a known field.
MDL schema extensions
wren-launcher/commands/dbt/wren_mdl.go
Adds EnumDefinition and Metric types; extends WrenMDLManifest with EnumDefinitions and Metrics; extends WrenColumn with DisplayName and Enum.
Tests for data sources and mapping
wren-launcher/commands/dbt/data_source_test.go
Adds BigQuery tests (credentials paths/encoding, validation); adds MapType tests; adjusts Postgres default port test; normalizes path comparisons.
Misc formatting
wren-launcher/utils/docker.go
Whitespace-only formatting tweak to a constant; no behavioral change.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant CLI as wren-launcher
  participant DBT as dbt converter
  participant Conv as Converter Core
  participant FS as Filesystem

  User->>CLI: run dbt convert
  CLI->>User: Prompt include staging models?
  User-->>CLI: yes/no
  CLI->>DBT: DbtConvertProject(..., includeStagingModels)
  DBT->>Conv: ConvertOptions{ IncludeStagingModels: bool }
  Conv->>FS: Read manifest.json, catalog.json
  alt semantic manifest present
    Conv->>FS: Read semantic_manifest.json
    Conv->>Conv: Build metrics/time dimensions
  end
  Conv->>Conv: Convert nodes (apply IncludeStagingModels)
  Conv->>Conv: Generate enums/relationships
  Conv-->>DBT: WrenMDLManifest
  DBT-->>CLI: ConvertResult
  CLI-->>User: Output generated
Loading
sequenceDiagram
  autonumber
  participant Prof as dbt profiles.yml
  participant Parser as profiles_analyzer
  participant DS as DataSource Builder

  Prof-->>Parser: connection map
  Parser->>Parser: Extract type, method, project, dataset, keyfile_json/keyfile
  alt type == "bigquery"
    Parser-->>DS: DbtConnection{Method, Project, Dataset, Keyfile/JSON}
    DS->>DS: Resolve keyfile path (abs/rel), read JSON
    DS->>DS: Base64-encode credentials
    DS-->>Parser: WrenBigQueryDataSource
  else other types
    Parser-->>DS: DbtConnection{...}
    DS-->>Parser: Existing data source type
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Suggested reviewers

  • wwwy3y3
  • douenergy
  • onlyjackfrost

Poem

A rabbit taps keys with delighted squeaks,
Staging stars hop into conversion peaks.
Metrics bloom, enums twirl in tune,
BigQuery burrows under a JSON moon.
Manifests whisper, relations align—
Thump-thump! Our MDL now shines fine. 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title succinctly and accurately summarizes the primary work in this PR—expanded dbt⇄Wren MDL conversion functionality and added BigQuery support—and aligns with the changes to converters, data sources, CLI flags, and tests. It is concise, focused, and clearly conveys the main intent for a teammate scanning history.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch cougrimes/main

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
wren-launcher/utils/docker.go (4)

93-111: downloadFile: no timeout and no HTTP status check. Can hang or save 404/HTML.

Use an HTTP client with timeout and validate StatusCode before writing to disk.

 import (
   "context"
   "fmt"
   "io"
   "net/http"
+  "time"
   "os"
   "path"
   "regexp"
   "strings"
 )
@@
 func downloadFile(filepath string, url string) error {
-  // Get the data
-  resp, err := http.Get(url) // #nosec G107 -- URL is from trusted source constants
+  // Get the data
+  client := &http.Client{Timeout: 30 * time.Second}
+  resp, err := client.Get(url) // #nosec G107 -- URL is from trusted source constants
   if err != nil {
     return err
   }
   defer func() { _ = resp.Body.Close() }()
+  if resp.StatusCode != http.StatusOK {
+    return fmt.Errorf("download failed: %s returned %d", url, resp.StatusCode)
+  }
@@
   // Write the body to file
   _, err = io.Copy(out, resp.Body)
   return err
 }

180-183: mergeEnvContent: swallows non-ENOENT errors from Stat.

If os.Stat fails for reasons other than “not exists”, we should bubble the error.

- if _, err := os.Stat(newEnvFile); err != nil {
-   return envFileContent, nil
- }
+ if _, err := os.Stat(newEnvFile); err != nil {
+   if os.IsNotExist(err) {
+     return envFileContent, nil
+   }
+   return "", err
+ }

163-168: Guard missing generation model mapping to avoid broken config.

generationModelToModelName[generationModel] can be empty when key is unknown, producing litellm_llm.. Add a lookup guard and fallback to default.

- if generationModel != "gpt-4.1-nano" {
-   config = strings.ReplaceAll(config, "litellm_llm.default", "litellm_llm."+generationModelToModelName[generationModel])
- }
+ if generationModel != "gpt-4.1-nano" {
+   if mapped, ok := generationModelToModelName[generationModel]; ok && mapped != "" {
+     config = strings.ReplaceAll(config, "litellm_llm.default", "litellm_llm."+mapped)
+   } else {
+     // Fallback: keep default to avoid invalid provider name
+   }
+ }

24-30: Make WREN_PRODUCT_VERSION build-time overrideable, compute dependent URLs at runtime, and add a semver guardrail

  • Replace the hard-coded const with a package var overrideable via ldflags (example: go build -ldflags "-X 'github.com/Canner/WrenAI/wren-launcher/utils.WREN_PRODUCT_VERSION=0.28.0'") and compute DOCKER_COMPOSE_YAML_URL / DOCKER_COMPOSE_ENV_URL / AI_SERVICE_CONFIG_URL at runtime (fmt.Sprintf or init), because Go consts cannot depend on vars.
  • Add a quick semver validation (e.g. ^\d+.\d+.\d+$) and a clear fallback or startup error/warn if the value is missing/invalid.
  • Sync other version usages found in the repo so behavior is consistent (verify/update as env vs build-time):
    • wren-launcher/utils/docker.go (the change target)
    • wren-ui/src/apollo/server/config.ts (reads process.env.WREN_PRODUCT_VERSION)
    • wren-ai-service/tools/dev/docker-compose-dev.yaml (uses WREN_PRODUCT_VERSION)
    • deployment/kustomizations/base/cm.yaml and deployment/kustomizations/patches/cm.yaml (WREN_PRODUCT_VERSION: "0.12.0")
wren-launcher/commands/dbt/data_source_test.go (1)

87-146: Test name vs. setup mismatch (port is specified)

The test claims to verify default port behavior but sets Port: 5432. Remove the port to actually exercise defaulting logic.

Apply:

-                    Port:     5432,
🧹 Nitpick comments (11)
wren-launcher/utils/docker.go (2)

494-520: Health checks: add timeout to HTTP calls.

http.Get uses a client without a request timeout; CLI can hang if services are unreachable.

 func CheckUIServiceStarted(url string) error {
-  resp, err := http.Get(url) // #nosec G107 -- URL is validated by application logic
+  client := &http.Client{Timeout: 5 * time.Second}
+  resp, err := client.Get(url) // #nosec G107 -- URL is validated by application logic
@@
 func CheckAIServiceStarted(url string) error {
-  resp, err := http.Get(url) // #nosec G107 -- URL is validated by application logic
+  client := &http.Client{Timeout: 5 * time.Second}
+  resp, err := client.Get(url) // #nosec G107 -- URL is validated by application logic

217-241: Env merge drops keys present only in existing file. Confirm intent.

The merge emits only keys present in the template (newLines). Existing-only keys are silently discarded. If preserving user-added keys is desired, append them.

Happy to provide a small helper to append preserved existingEnvVars not seen in newLines.

wren-launcher/commands/launch.go (1)

536-545: Graceful fallback on prompt failure—LGTM; consider env override

Optional: allow non‑interactive control via env (e.g., WREN_INCLUDE_STAGING_MODELS=true) before prompting.

wren-launcher/commands/dbt.go (2)

27-27: Usage text could mention the new flag

Minor DX nit: include --include-staging-models in the usage line printed on errors.


63-75: Rename parameter IncludeStagingModels → includeStagingModels

Go style prefers lowerCamel for parameter names; I verified call sites (wren-launcher/commands/launch.go:544) so this rename is safe — change only the function parameter and its use inside convertOpts:

-func DbtConvertProject(projectPath, outputDir, profileName, target string, usedByContainer bool, IncludeStagingModels bool) (*dbt.ConvertResult, error) {
+func DbtConvertProject(projectPath, outputDir, profileName, target string, usedByContainer bool, includeStagingModels bool) (*dbt.ConvertResult, error) {
   convertOpts := dbt.ConvertOptions{
@@
-    IncludeStagingModels: IncludeStagingModels,
+    IncludeStagingModels: includeStagingModels,
   }
wren-launcher/commands/dbt/data_source_test.go (2)

708-761: Type mapping matrix—LGTM

Solid baseline. Consider adding NUMERIC/BOOL/DATE for BigQuery in a follow‑up.


432-484: Replace hard-coded base64 credential literals with a runtime-encoded value

Static base64 literals can trigger secret scanners; compute once at runtime inside TestBigQueryDataSourceValidation and reuse (encoding/base64 is already imported in this file).

 func TestBigQueryDataSourceValidation(t *testing.T) {
+    encoded := base64.StdEncoding.EncodeToString([]byte("test-credentials"))
     tests := []struct {
@@
-      ds: &WrenBigQueryDataSource{
-        Project:     "test-project",
-        Dataset:     "test-dataset",
-        Credentials: "dGVzdC1jcmVkZW50aWFscw==", // "test-credentials"
-      },
+      ds: &WrenBigQueryDataSource{Project: "test-project", Dataset: "test-dataset", Credentials: encoded},
@@
-      ds: &WrenBigQueryDataSource{
-        Project:     "",
-        Dataset:     "test-dataset",
-        Credentials: "dGVzdC1jcmVkZW50aWFscw==",
-      },
+      ds: &WrenBigQueryDataSource{Project: "", Dataset: "test-dataset", Credentials: encoded},
@@
-      ds: &WrenBigQueryDataSource{
-        Project:     "test-project",
-        Dataset:     "",
-        Credentials: "dGVzdC1jcmVkZW50aWFscw==",
-      },
+      ds: &WrenBigQueryDataSource{Project: "test-project", Dataset: "", Credentials: encoded},

Rerun your secret scanner locally to confirm no hits on wren-launcher/commands/dbt/data_source_test.go.

wren-launcher/commands/dbt/wren_mdl.go (1)

61-69: Optional: make DisplayName omitempty

Keeps MDL lean when label equals name. Safe, backward-compatible.

-    DisplayName string   `json:"displayName"`
+    DisplayName string   `json:"displayName,omitempty"`
wren-launcher/commands/dbt/converter.go (2)

313-319: Add gosec suppression parity for semantic manifest read

manifest.json read has #nosec G304; semantic_manifest.json read should mirror to avoid noisy CI.

-        semanticBytes, err := os.ReadFile(filepath.Clean(semanticManifestPath))
+        semanticBytes, err := os.ReadFile(filepath.Clean(semanticManifestPath)) // #nosec G304 -- semanticManifestPath is controlled by application

959-1000: Graceful handling when columns are absent (optional)

Instead of erroring, consider skipping models with no columns to reduce warnings for edge nodes.

-    if !exists {
-        return nil, fmt.Errorf("no columns found for model %s", nodeKey)
-    }
+    if !exists {
+        return []WrenColumn{}, nil
+    }
wren-launcher/commands/dbt/data_source.go (1)

467-490: Type mapping: minor gaps (optional)

Consider handling GEOGRAPHY and STRUCT/ARRAY explicitly (map to varchar or a dedicated logical type) to avoid surprising lowercase fallthroughs.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3edba06 and e2286cb.

📒 Files selected for processing (9)
  • wren-launcher/commands/dbt.go (3 hunks)
  • wren-launcher/commands/dbt/converter.go (7 hunks)
  • wren-launcher/commands/dbt/data_source.go (4 hunks)
  • wren-launcher/commands/dbt/data_source_test.go (5 hunks)
  • wren-launcher/commands/dbt/profiles.go (1 hunks)
  • wren-launcher/commands/dbt/profiles_analyzer.go (2 hunks)
  • wren-launcher/commands/dbt/wren_mdl.go (3 hunks)
  • wren-launcher/commands/launch.go (2 hunks)
  • wren-launcher/utils/docker.go (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-07-09T02:43:20.433Z
Learnt from: goldmedal
PR: Canner/WrenAI#1827
File: wren-launcher/commands/dbt/data_source.go:50-53
Timestamp: 2025-07-09T02:43:20.433Z
Learning: In wren-launcher/commands/dbt/data_source.go, when encountering unsupported database types in convertConnectionToDataSource function, the user prefers to log a warning and continue processing instead of failing. The function should return nil, nil to allow processing to continue.

Applied to files:

  • wren-launcher/commands/dbt/data_source.go
📚 Learning: 2025-08-13T05:17:30.180Z
Learnt from: goldmedal
PR: Canner/WrenAI#1887
File: wren-launcher/commands/dbt/data_source.go:273-276
Timestamp: 2025-08-13T05:17:30.180Z
Learning: In the Wren AI codebase, there's an intentional distinction between dbt connection types and Wren AI data source types: dbt uses "sqlserver" as the connection type, while Wren AI expects "mssql" as the data source type. The conversion layer in convertConnectionToDataSource() correctly matches "sqlserver" from dbt profiles and the WrenMSSQLDataSource.GetType() method correctly returns "mssql" for Wren AI compatibility.

Applied to files:

  • wren-launcher/commands/dbt/data_source.go
🧬 Code graph analysis (5)
wren-launcher/commands/launch.go (1)
wren-launcher/commands/dbt.go (1)
  • DbtConvertProject (63-75)
wren-launcher/commands/dbt/converter.go (3)
wren-launcher/commands/dbt/utils.go (1)
  • FileExists (30-36)
wren-launcher/commands/dbt/data_source.go (2)
  • WrenBigQueryDataSource (441-445)
  • DataSource (46-50)
wren-launcher/commands/dbt/wren_mdl.go (8)
  • WrenMDLManifest (4-13)
  • EnumDefinition (16-19)
  • WrenModel (22-30)
  • Relationship (53-59)
  • Metric (62-69)
  • View (72-76)
  • WrenColumn (40-50)
  • TableReference (33-37)
wren-launcher/commands/dbt/data_source.go (2)
wren-launcher/commands/dbt/profiles.go (1)
  • DbtConnection (16-43)
wren-ui/src/apollo/server/repositories/projectRepository.ts (1)
  • Project (140-156)
wren-launcher/commands/dbt.go (1)
wren-launcher/commands/dbt/converter.go (2)
  • ConvertResult (30-34)
  • ConvertOptions (19-27)
wren-launcher/commands/dbt/data_source_test.go (2)
wren-launcher/commands/dbt/profiles.go (3)
  • DbtProfiles (4-7)
  • DbtProfile (10-13)
  • DbtConnection (16-43)
wren-launcher/commands/dbt/data_source.go (7)
  • ValidateAllDataSources (553-575)
  • GetActiveDataSources (495-533)
  • WrenBigQueryDataSource (441-445)
  • DataSource (46-50)
  • WrenLocalFileDataSource (278-281)
  • DefaultDataSource (578-578)
  • WrenPostgresDataSource (322-328)
🪛 Gitleaks (8.28.0)
wren-launcher/commands/dbt/data_source_test.go

[high] 443-443: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


[high] 452-452: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)


[high] 461-461: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

🔇 Additional comments (17)
wren-launcher/commands/dbt/profiles_analyzer.go (1)

137-138: Threading method into parsed connection—LGTM

Correctly extracts and whitelists method to keep it out of Additional.

Also applies to: 151-151

wren-launcher/commands/launch.go (1)

190-200: Interactive include‑staging prompt—LGTM

Simple, clear UX. Returning false on prompt error is a safe default.

wren-launcher/commands/dbt.go (2)

15-20: New flag plumbing—LGTM

Flag is scoped and named clearly.


45-51: Option threading into ConvertOptions—LGTM

Matches struct addition and intended behavior.

wren-launcher/commands/dbt/data_source_test.go (5)

4-6: Imports for temp files/base64—LGTM


178-178: Path normalization in assertion—LGTM

Cross‑platform friendly.


231-275: Aggregate validation test—LGTM

Covers both valid/invalid scenarios well.


277-430: BigQuery tests—LGTM; nice coverage

Covers JSON inline creds and absolute/relative keyfile paths with base64 checks.


411-429: Relative keyfile path resolution—LGTM

Good use of temp home and cleanup.

wren-launcher/commands/dbt/profiles.go (1)

29-29: Add BigQuery method support — LGTM

Confirmed: profiles.go defines Method (wren-launcher/commands/dbt/profiles.go:29); profiles_analyzer sets connection.Method (wren-launcher/commands/dbt/profiles_analyzer.go:137); convertToBigQueryDataSource reads connection.Method (wren-launcher/commands/dbt/data_source.go:189).

wren-launcher/commands/dbt/wren_mdl.go (2)

42-45: New column metadata looks good

displayName/enum extensions align with converter usage; no issues spotted.


5-13: Rename MDL field 'DataSources' → 'DataSourceType'

Field name implies plural but stores a single data-source type string — this is a schema/contract change; update the MDL struct and its initializer and verify all downstream MDL consumers (wren-engine / any MDL readers) before landing.

Apply:

-    DataSources     string           `json:"dataSources,omitempty"`
+    DataSourceType  string           `json:"dataSourceType,omitempty"`

And update converter init:

-        DataSources:     dataSource.GetType(),
+        DataSourceType:  dataSource.GetType(),

Locations to change: wren-launcher/commands/dbt/wren_mdl.go (field) and wren-launcher/commands/dbt/converter.go (line ~331). Run a repo-wide search for "dataSources" to catch other consumers and validate external consumers (wren-engine / deployed MDL users).

wren-launcher/commands/dbt/converter.go (2)

20-27: CLI flag threading: LGTM

includeStagingModels is plumbed correctly through options.


372-376: Staging model filter: LGTM

Prefix-based skip is reasonable for now.

wren-launcher/commands/dbt/data_source.go (3)

87-90: BigQuery type wiring: LGTM

Switch branch added correctly and respects prior preference to warn-and-continue for unsupported types.


441-465: BigQuery DS struct/validation: LGTM

Validation covers required fields; error messages are clear.


52-76: Profiles conversion logging/behavior: aligned with prior guidance

Warn-and-continue on unsupported types is consistent with your earlier preference.

@goldmedal goldmedal merged commit 5b312ff into main Sep 22, 2025
7 checks passed
@goldmedal
Copy link
Collaborator Author

Thanks @cougrimes and @douenergy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants