A Python utility to synchronize semantic definitions from your dbt project (semantic models and metrics) to your Eppo instance using Eppo's bulk metrics sync API (`/api/v1/metrics/sync`).
- **Parses dbt Artifacts**: Reads `metrics:` and `semantic_models:` definitions from `.yml` files, and uses dbt's `manifest.json` artifact to access model metadata and compiled SQL.
- **Maps Concepts**: Translates dbt semantic models into Eppo `fact_sources` (including compiled SQL) and dbt metrics into Eppo `metrics` definitions.
- **Eppo Bulk Sync API Integration**: Generates a single payload and sends it to the Eppo `/api/v1/metrics/sync` endpoint.
- **Dry Run Mode**: Previews the generated bulk payload without actually sending it to Eppo.
Before you begin, ensure you have the following:
- **Python**: Version 3.9 or higher.
- **Poetry**: For managing dependencies and the virtual environment (recommended for development). Install via `pip install poetry`.
- **dbt Project**: A dbt project with semantic layer definitions (`metrics:`, `semantic_models:`) defined in YAML files.
- **dbt `manifest.json`**: You need the `manifest.json` artifact generated by dbt. Run `dbt parse` or `dbt compile` in your dbt project to generate/update this file (usually located in the `target/` directory). This tool requires the manifest to get model relationships and compiled SQL.
- **Eppo API Key**: Generate an API key from your Eppo instance. Admins can create and manage REST API Keys by visiting **Admin > API Keys**.
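To illustrate why the manifest is required, here is a minimal sketch of pulling compiled SQL out of `manifest.json`. The node id and manifest contents below are hypothetical; recent dbt versions store a model's compiled SQL under the `compiled_code` key of each entry in `nodes`.

```python
import json

# A tiny, hypothetical slice of a dbt manifest.json. Real manifests key
# model nodes by unique id ("model.<project>.<name>") and carry the
# compiled SQL the sync tool needs under "compiled_code".
manifest_text = """
{
  "nodes": {
    "model.my_project.orders": {
      "relation_name": "analytics.orders",
      "compiled_code": "select order_id, created_at from raw.orders"
    }
  }
}
"""

manifest = json.loads(manifest_text)

def compiled_sql_for(manifest, node_id):
    """Return the compiled SQL for a model node, or None if absent."""
    node = manifest["nodes"].get(node_id)
    return node.get("compiled_code") if node else None
```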
- Clone the repository:

  ```bash
  git clone <your-repository-url>
  cd dbt-eppo-sync
  ```

- Install dependencies using Poetry:

  ```bash
  poetry install
  ```

  This creates a virtual environment, installs all necessary packages, and makes the `dbt-eppo-sync` command available.
The tool is configured via command-line arguments.
Required Arguments:
- `--dbt-project-dir`: Path to the root directory of your dbt project (containing `dbt_project.yml`). The parser uses this to locate dbt definition files.
- `--manifest-path`: Path to the dbt `manifest.json` file (e.g., `./your_dbt_project/target/manifest.json`).
- `--eppo-api-key`: Your Eppo API key. **Recommendation**: use the `EPPO_API_KEY` environment variable instead of passing the key on the command line:

  ```bash
  export EPPO_API_KEY="your_actual_api_key"
  # The tool will pick this up automatically
  ```
Optional Arguments:
- `--sync-tag`: A string tag to identify this sync operation in Eppo logs or the UI. Defaults to `dbt-sync-<timestamp>`.
- `--dry-run`: Perform parsing and mapping, but print the payload instead of sending it to Eppo.
- `--eppo-base-url`: Override the default Eppo API base URL (`https://eppo.cloud/api`).
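The key-resolution order described above (flag first, then the `EPPO_API_KEY` environment variable) can be sketched as follows. The function name is illustrative, not the tool's actual code:

```python
import os

def resolve_api_key(cli_value=None):
    """Resolve the Eppo API key: --eppo-api-key wins, then EPPO_API_KEY."""
    key = cli_value or os.environ.get("EPPO_API_KEY")
    if not key:
        raise SystemExit("No Eppo API key provided (flag or EPPO_API_KEY)")
    return key
```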
Run the sync command using `poetry run`, which executes the command within the project's virtual environment managed by Poetry.
Basic Sync:

```bash
# Ensure the EPPO_API_KEY environment variable is set.
# Optionally add: --sync-tag "my-custom-tag"
poetry run dbt-eppo-sync \
  --dbt-project-dir "/path/to/your/dbt/project" \
  --manifest-path "/path/to/your/dbt/project/target/manifest.json"
```

Alternatively, provide the API key directly (less secure):

```bash
poetry run dbt-eppo-sync \
  --dbt-project-dir "/path/to/your/dbt/project" \
  --manifest-path "/path/to/your/dbt/project/target/manifest.json" \
  --eppo-api-key "your_api_key_here"
```
Dry Run:

To generate the bulk payload and print it without sending it to Eppo, use the `--dry-run` flag:

```bash
# Ensure EPPO_API_KEY environment variable is set (or use --eppo-api-key)
poetry run dbt-eppo-sync \
  --dbt-project-dir "/path/to/your/dbt/project" \
  --manifest-path "/path/to/your/dbt/project/target/manifest.json" \
  --dry-run
```
This tool maps dbt artifacts to the structure required by Eppo's `/api/v1/metrics/sync` endpoint:
- **dbt Semantic Model -> Eppo `fact_source`**:
  - The `name` of the semantic model becomes the `fact_source.name`.
  - The compiled SQL for the underlying dbt model (extracted from `manifest.json` based on the semantic model's `model` reference) is placed in `fact_source.sql`.
  - The dbt primary entity (`type: 'primary'`) is mapped to `fact_source.entities`, using the entity's `name` as `entity_name` and `expr` as `column`. Other entity types are currently ignored.
  - dbt `measures` are mapped to `fact_source.facts`. Specifically:
    - `measure.name` -> `fact.name`
    - `measure.expr` -> `fact.column` (if `expr` exists)
    - `measure.description` -> `fact.description`
    - `measure.meta.eppo_desired_change` -> `fact.desired_change` (defaults to `increase` if the meta tag is absent)
  - dbt `dimensions` are mapped to `fact_source.properties`. Specifically:
    - `dimension.name` -> `property.name`
    - `dimension.expr` -> `property.column`
    - `dimension.description` -> `property.description`
  - A `timestamp_column` is required by Eppo and is automatically inferred by looking for a dimension with `type: 'time'` or common names like `timestamp`, `event_timestamp`, `created_at`. An error is raised if none can be found.
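The timestamp inference described above can be sketched roughly as follows. This is a simplified illustration of the behavior, not the tool's actual implementation:

```python
COMMON_TIMESTAMP_NAMES = {"timestamp", "event_timestamp", "created_at"}

def infer_timestamp_column(dimensions):
    """Pick the Eppo timestamp_column from a semantic model's dimensions.

    Prefers an explicit type: 'time' dimension, then falls back to
    common column names; raises if nothing matches.
    """
    for dim in dimensions:
        if dim.get("type") == "time":
            return dim.get("expr") or dim["name"]
    for dim in dimensions:
        if dim["name"] in COMMON_TIMESTAMP_NAMES:
            return dim.get("expr") or dim["name"]
    raise ValueError("No timestamp dimension found for fact source")
```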
- **dbt Metric -> Eppo `metric`**:
  - The dbt metric `name` becomes the Eppo `metric.name`. The `label` field is currently ignored for naming.
  - The dbt metric `type` determines the Eppo `metric.type` and structure:
    - dbt `sum`, `count`, and `count_distinct` map to Eppo `type: simple`, with the `operation` set to `sum`, `count`, or `distinct_entity` respectively. The `fact_name` links back to the corresponding measure in the source semantic model.
    - dbt `average` maps to Eppo `type: ratio`, constructing the `numerator` and `denominator` objects linked to the appropriate measure-derived `fact_name`s.
    - dbt `percentile` maps to Eppo `type: percentile`, constructing the `percentile` object linked to the appropriate `fact_name` and including the `percentile_value`.
  - The primary `entity` for the Eppo metric is derived from the primary entity of the source dbt semantic model.
  - Basic dbt `filter` expressions matching the pattern `{{ Dimension('dimension_name') }} = 'value'` or `!= 'value'` are translated to Eppo `filters` on the corresponding `fact_property`. More complex filters are currently ignored with a warning.
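The filter translation described above can be sketched with a simple regular expression. The field names in the returned dict are illustrative and not necessarily Eppo's exact schema:

```python
import re

# Matches the simple filter shape described above, e.g.
# {{ Dimension('country') }} = 'US'   or   {{ Dimension('plan') }} != 'free'
_FILTER_RE = re.compile(
    r"\{\{\s*Dimension\('(?P<dim>\w+)'\)\s*\}\}\s*(?P<op>!?=)\s*'(?P<val>[^']*)'"
)

def translate_filter(expr):
    """Translate a basic dbt filter into an Eppo-style filter dict.

    Returns None for expressions the pattern does not cover (the tool
    skips those with a warning).
    """
    m = _FILTER_RE.fullmatch(expr.strip())
    if m is None:
        return None
    return {
        "fact_property": m.group("dim"),
        "operation": "equals" if m.group("op") == "=" else "not_equals",
        "values": [m.group("val")],
    }
```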
**Important:** The accuracy of the mapping depends on the structure of your dbt metrics and semantic models matching the expectations outlined above. Complex dbt features (e.g., intricate filters, certain derived metric types not listed) or specific Eppo features (e.g., `threshold`, `conversion`, and `retention` operations, or `funnel` metrics) may require adjustments to the mapping logic in `mapper.py` or may not be fully supported yet. Always review the generated payload (using `--dry-run`), consult Eppo's API docs (https://eppo.cloud/api/docs#/Metrics%20Sync/syncMetrics), and/or reach out to Eppo Support.
- Follow the Installation steps using Poetry.
- Activate the virtual environment:

  ```bash
  eval $(poetry env activate)
  ```

- Run tests (once implemented): `pytest`
- Make your changes and contribute!
- Have an enhancement request, idea, or notice a bug? Create a GitHub Issue!
This project is licensed under the MIT License; see the LICENSE file for details.
- **400 Bad Request Error with "SQL validation failed"**: If you encounter a 400 error and the detailed response from Eppo indicates an SQL validation failure (often mentioning "Unexpected token"), check the SQL queries being sent in the payload (use the `--dry-run` option).
  - **Cause**: This commonly occurs if your dbt project is configured to use non-standard SQL quoting (like backticks `` ` ``) for identifiers, especially when using Snowflake. Eppo's SQL validator might not recognize this quoting style.
  - **Solution**: Review your dbt project's quoting configuration (e.g., in `dbt_project.yml`) and ensure it generates standard SQL identifiers (usually double-quoted `"..."` if quoting is needed, or unquoted). You may need to adjust the `quoting` settings for databases, schemas, and identifiers.
  - **Workaround**: You could manually edit the `compiled_code` in your `manifest.json` before running the sync tool, but configuring dbt correctly is the recommended long-term fix.
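The manual workaround could be automated with a small, hypothetical helper that rewrites backtick-quoted identifiers to standard double quotes before syncing. Use with care; configuring dbt's quoting correctly remains the recommended fix:

```python
import re

def strip_backticks(sql):
    """Rewrite `identifier` quoting to standard "identifier" quoting.

    A naive sketch: it does not understand string literals or comments,
    so review the output (e.g., via --dry-run) before trusting it.
    """
    return re.sub(r"`([^`]+)`", r'"\1"', sql)
```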