

Eppo-exp/dbt-eppo-sync

 
 

dbt-Eppo Sync Tool

License: MIT

A Python utility to synchronize semantic definitions from your dbt project (semantic models and metrics) to your Eppo instance using Eppo's bulk metrics sync API (/api/v1/metrics/sync).

Features

  • Parses dbt Artifacts: Reads metrics: and semantic_models: definitions from .yml files, and uses dbt's manifest.json artifact to access model metadata and compiled SQL.
  • Maps Concepts: Translates dbt semantic models into Eppo fact_sources (including compiled SQL) and dbt metrics into Eppo metrics definitions.
  • Eppo Bulk Sync API Integration: Generates a single payload and sends it to the Eppo /api/v1/metrics/sync endpoint.
  • Dry Run Mode: Allows previewing the generated bulk payload without actually sending it to Eppo.

Prerequisites

Before you begin, ensure you have the following:

  1. Python: Version 3.9 or higher.
  2. Poetry: For managing dependencies and the virtual environment (recommended for development). Install via pip install poetry.
  3. dbt Project: A dbt project with semantic layer definitions (metrics:, semantic_models:) defined in YAML files.
  4. dbt manifest.json: You need the manifest.json artifact generated by dbt. Run dbt parse or dbt compile in your dbt project to generate/update this file (usually located in the target/ directory). This tool requires the manifest to get model relationships and compiled SQL.
  5. Eppo API Key: Generate an API key from your Eppo instance. Admins can create and manage REST API Keys by visiting Admin > API Keys.
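Before running a sync, it can help to confirm that the manifest actually contains semantic layer definitions. The following is a minimal sketch, not part of this tool: summarize_manifest is a hypothetical helper, and it assumes the manifest exposes top-level "semantic_models" and "metrics" keys, as recent dbt versions do.

```python
import json
from pathlib import Path


def summarize_manifest(manifest):
    """Count the semantic layer objects in an already-loaded manifest dict."""
    # Assumption: semantic models and metrics live under these top-level keys.
    # Older manifests may lack them, so missing keys count as zero.
    return {
        "semantic_models": len(manifest.get("semantic_models", {})),
        "metrics": len(manifest.get("metrics", {})),
    }


def load_manifest(manifest_path):
    """Read manifest.json from disk and return it as a dict."""
    return json.loads(Path(manifest_path).read_text(encoding="utf-8"))


# Example usage (path is a placeholder):
#   counts = summarize_manifest(load_manifest("target/manifest.json"))
#   print(counts)
```

If both counts are zero, re-run dbt parse or dbt compile and check that your YAML files define metrics: and semantic_models:.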

Installation

  1. Clone the repository:
    git clone <your-repository-url> 
    cd dbt-eppo-sync
  2. Install dependencies using Poetry:
    poetry install
    This creates a virtual environment, installs all necessary packages, and makes the dbt-eppo-sync command available.

Configuration

The tool is configured via command-line arguments.

Required Arguments:

  • --dbt-project-dir: Path to the root directory of your dbt project (containing dbt_project.yml). The parser uses this to locate dbt definition files.
  • --manifest-path: Path to the dbt manifest.json file (e.g., ./your_dbt_project/target/manifest.json).
  • --eppo-api-key: Your Eppo API key. Recommended: set the EPPO_API_KEY environment variable instead of passing the key on the command line:
export EPPO_API_KEY="your_actual_api_key"
# The tool will pick this up automatically
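For readers curious how a flag with an environment-variable fallback is typically wired up, here is a rough sketch using argparse. It is an illustration only: resolve_api_key is a hypothetical function, and the precedence shown (flag first, then EPPO_API_KEY) is an assumption rather than a description of this tool's internals.

```python
import argparse
import os


def resolve_api_key(argv, environ=None):
    """Return the API key from --eppo-api-key, else from EPPO_API_KEY."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(prog="dbt-eppo-sync")
    parser.add_argument("--eppo-api-key", default=None)
    # parse_known_args ignores the other flags, which this sketch does not model.
    args, _unknown = parser.parse_known_args(argv)
    # Assumption: an explicit flag wins over the environment variable.
    key = args.eppo_api_key or environ.get("EPPO_API_KEY")
    if not key:
        raise SystemExit("Provide --eppo-api-key or set EPPO_API_KEY")
    return key
```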

Optional Arguments:

  • --sync-tag: A string tag to identify this sync operation in Eppo logs or UI. Defaults to dbt-sync-<timestamp>.
  • --dry-run: A flag to perform parsing and mapping but print the payload instead of sending it to Eppo.
  • --eppo-base-url: Override the default Eppo API base URL (https://eppo.cloud/api).
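The default sync tag has the form dbt-sync-<timestamp>. The exact timestamp format is not documented here, so the sketch below assumes a compact UTC layout purely for illustration; default_sync_tag is a hypothetical name.

```python
from datetime import datetime, timezone


def default_sync_tag(now=None):
    """Build a tag shaped like dbt-sync-<timestamp>."""
    now = now or datetime.now(timezone.utc)
    # Assumption: YYYYMMDDHHMMSS in UTC. The real tool may format this differently.
    return f"dbt-sync-{now.strftime('%Y%m%d%H%M%S')}"
```

Passing an explicit --sync-tag is the safer choice if you need to find a specific sync in Eppo later.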

Usage

Run the sync command using poetry run, which executes the command within the project's virtual environment managed by Poetry.

Basic Sync:

# Ensure EPPO_API_KEY environment variable is set

# Optional: append --sync-tag "my-custom-tag" to the command below
poetry run dbt-eppo-sync \
    --dbt-project-dir "/path/to/your/dbt/project" \
    --manifest-path "/path/to/your/dbt/project/target/manifest.json"

Alternatively, provide the API key directly (less secure):

poetry run dbt-eppo-sync \
    --dbt-project-dir "/path/to/your/dbt/project" \
    --manifest-path "/path/to/your/dbt/project/target/manifest.json" \
    --eppo-api-key "your_api_key_here"

Dry Run:

To generate the bulk payload and print it without sending it to Eppo, use the --dry-run flag:

# Ensure EPPO_API_KEY environment variable is set (or use --eppo-api-key)

poetry run dbt-eppo-sync \
    --dbt-project-dir "/path/to/your/dbt/project" \
    --manifest-path "/path/to/your/dbt/project/target/manifest.json" \
    --dry-run
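Conceptually, a dry run builds the same payload as a real sync and stops just before the HTTP call. The sketch below shows that branch under stated assumptions: sync_or_preview is a hypothetical function, the URL is the documented default base URL joined with /v1/metrics/sync, and the X-Eppo-Token header name is an assumption that should be checked against Eppo's API docs.

```python
import json
import urllib.request


def sync_or_preview(payload, api_key, dry_run, base_url="https://eppo.cloud/api"):
    """Print the payload on a dry run; otherwise POST it and return the HTTP status."""
    body = json.dumps(payload, indent=2)
    if dry_run:
        # Dry run: show exactly what would be sent, and send nothing.
        print(body)
        return None
    request = urllib.request.Request(
        url=f"{base_url}/v1/metrics/sync",
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Eppo-Token": api_key,  # Assumption: header name for the API key.
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```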

Mapping Details

This tool maps dbt artifacts to the structure required by Eppo's /api/v1/metrics/sync endpoint:

  • dbt Semantic Model -> Eppo fact_source:
    • The name of the semantic model becomes the fact_source.name.
    • The compiled SQL for the underlying dbt model (extracted from manifest.json based on the semantic model's model reference) is placed in fact_source.sql.
    • The dbt primary entity (type: 'primary') is mapped to fact_source.entities, using the entity's name as entity_name and expr as column. Other entity types are currently ignored.
    • dbt measures are mapped to fact_source.facts. Specifically:
      • measure.name -> fact.name
      • measure.expr -> fact.column (if expr exists)
      • measure.description -> fact.description
      • measure.meta.eppo_desired_change -> fact.desired_change (defaults to 'increase' if meta tag is absent)
    • dbt dimensions are mapped to fact_source.properties. Specifically:
      • dimension.name -> property.name
      • dimension.expr -> property.column
      • dimension.description -> property.description
    • A timestamp_column is required by Eppo and is automatically inferred by looking for a dimension with type: 'time' or common names like timestamp, event_timestamp, created_at. An error is raised if none can be found.
  • dbt Metric -> Eppo metric:
    • The dbt metric name becomes the Eppo metric.name. The label field is currently ignored for naming.
    • The dbt metric type determines the Eppo metric.type and structure:
      • dbt sum, count, count_distinct map to Eppo type: simple, with the operation set to sum, count, or distinct_entity respectively. The fact_name links back to the corresponding measure in the source semantic model.
      • dbt average maps to Eppo type: ratio, constructing the numerator and denominator objects linked to the appropriate measure-derived fact_names.
      • dbt percentile maps to Eppo type: percentile, constructing the percentile object linked to the appropriate fact_name and including the percentile_value.
    • The primary entity for the Eppo metric is derived from the primary entity of the source dbt semantic model.
    • Basic dbt filter expressions matching the pattern {{ Dimension('dimension_name') }} = 'value' or != 'value' are translated to Eppo filters on the corresponding fact_property. More complex filters are currently ignored with a warning.
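The semantic model rules above can be sketched as a small pure function. This is an illustration under assumptions, not the code in mapper.py: the function names are hypothetical, the input is a plain dict shaped like a dbt semantic model, and the fallback to an entity's or dimension's name when expr is absent is an assumption.

```python
TIMESTAMP_FALLBACK_NAMES = ("timestamp", "event_timestamp", "created_at")


def infer_timestamp_column(dimensions):
    """Prefer a dimension with type 'time', then fall back to common names."""
    for dim in dimensions:
        if dim.get("type") == "time":
            return dim.get("expr") or dim["name"]
    for dim in dimensions:
        if dim["name"] in TIMESTAMP_FALLBACK_NAMES:
            return dim.get("expr") or dim["name"]
    raise ValueError("No timestamp column could be inferred")


def to_fact(measure):
    """Map one dbt measure to an Eppo fact."""
    fact = {
        "name": measure["name"],
        "description": measure.get("description"),
        # Defaults to 'increase' when the meta tag is absent.
        "desired_change": measure.get("meta", {}).get("eppo_desired_change", "increase"),
    }
    if measure.get("expr"):
        fact["column"] = measure["expr"]  # Only set when expr exists.
    return fact


def to_fact_source(semantic_model, compiled_sql):
    """Map one dbt semantic model (plus its compiled SQL) to an Eppo fact_source."""
    dimensions = semantic_model.get("dimensions", [])
    return {
        "name": semantic_model["name"],
        "sql": compiled_sql,
        "timestamp_column": infer_timestamp_column(dimensions),
        # Only the primary entity is mapped; other entity types are skipped.
        "entities": [
            {"entity_name": e["name"], "column": e.get("expr") or e["name"]}
            for e in semantic_model.get("entities", [])
            if e.get("type") == "primary"
        ],
        "facts": [to_fact(m) for m in semantic_model.get("measures", [])],
        "properties": [
            {"name": d["name"], "column": d.get("expr"), "description": d.get("description")}
            for d in dimensions
        ],
    }
```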

Important: The accuracy of the mapping depends on the structure of your dbt metrics and semantic models matching the expectations outlined above. Complex dbt features (e.g., intricate filters, certain derived metric types not listed) or specific Eppo features (e.g., threshold, conversion, retention operations, funnel metrics) may require adjustments to the mapping logic in mapper.py or may not be fully supported yet. Always review the generated payload (using --dry-run), consult Eppo's API docs (https://eppo.cloud/api/docs#/Metrics%20Sync/syncMetrics), and/or reach out to Eppo Support.
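To make the limits of the metric mapping concrete, here is a rough sketch of the simple-metric and basic-filter rules listed under Mapping Details. All names are hypothetical, and the output field layout (numerator, operation values equals / not_equals, values list) is an assumption to verify against Eppo's API docs rather than a statement of what mapper.py emits.

```python
import re

# dbt aggregation -> Eppo operation, per the mapping rules in this README.
SIMPLE_OPERATIONS = {"sum": "sum", "count": "count", "count_distinct": "distinct_entity"}

# Matches only: {{ Dimension('name') }} = 'value'  or  != 'value'
FILTER_PATTERN = re.compile(
    r"^\s*\{\{\s*Dimension\(\s*'(?P<dimension>[^']+)'\s*\)\s*\}\}"
    r"\s*(?P<op>!=|=)\s*'(?P<value>[^']*)'\s*$"
)


def parse_simple_filter(expression):
    """Return an Eppo-style filter dict, or None if the expression is too complex."""
    match = FILTER_PATTERN.match(expression)
    if match is None:
        return None  # The caller would log a warning and skip this filter.
    return {
        "fact_property": match["dimension"],
        "operation": "equals" if match["op"] == "=" else "not_equals",
        "values": [match["value"]],
    }


def to_simple_metric(name, dbt_type, fact_name, entity):
    """Build a 'simple' metric for sum, count, or count_distinct."""
    if dbt_type not in SIMPLE_OPERATIONS:
        raise ValueError(f"Not a simple metric type: {dbt_type}")
    return {
        "name": name,
        "type": "simple",
        "entity": entity,
        # Assumption: the fact reference and operation sit under 'numerator'.
        "numerator": {"fact_name": fact_name, "operation": SIMPLE_OPERATIONS[dbt_type]},
    }
```

Anything parse_simple_filter rejects, such as a range comparison or a compound condition, is the kind of filter that would be dropped with a warning.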

Development

  1. Follow the Installation steps using Poetry.
  2. Activate the virtual environment:
    $ eval $(poetry env activate)
    (test-project-for-test) $  # Virtualenv entered
  3. Run tests (once implemented): pytest
  4. Make your changes and contribute!
  5. Have an enhancement request or idea, or noticed a bug? Create a GitHub issue!

License

This project is licensed under the MIT License; see the LICENSE file for details.

Troubleshooting

  • 400 Bad Request Error with "SQL validation failed": If you encounter a 400 error and the detailed response from Eppo indicates an SQL validation failure (often mentioning "Unexpected token"), check the SQL queries being sent in the payload (use the --dry-run option).
    • Cause: This commonly occurs if your dbt project is configured to use non-standard SQL quoting (like backticks `) for identifiers, especially when using Snowflake. Eppo's SQL validator might not recognize this quoting style.
    • Solution: Review your dbt project's quoting configuration (e.g., in dbt_project.yml) and ensure it generates standard SQL identifiers (usually double-quoted " " if quoting is needed, or unquoted). You may need to adjust settings related to quoting strategies for databases, schemas, and identifiers.
    • Workaround: You could manually edit the compiled_code in your manifest.json before running the sync tool, but configuring dbt correctly is the recommended long-term fix.
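To find which models are affected before syncing, you can scan the manifest for backticks in compiled SQL. This is a small diagnostic sketch, not part of the tool: find_backtick_models is a hypothetical helper, and it assumes compiled SQL is stored under each node's compiled_code key, as referenced in the workaround above.

```python
def find_backtick_models(manifest):
    """Return the unique_ids of nodes whose compiled SQL contains a backtick."""
    offenders = []
    for unique_id, node in manifest.get("nodes", {}).items():
        sql = node.get("compiled_code") or ""  # Missing or null SQL is skipped.
        if "`" in sql:
            offenders.append(unique_id)
    return sorted(offenders)


# Example usage (path is a placeholder):
#   import json
#   with open("target/manifest.json", encoding="utf-8") as fh:
#       print(find_backtick_models(json.load(fh)))
```

A backtick inside a string literal would also be reported, so treat the result as a list of models to inspect rather than a definitive diagnosis.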
