[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to main content

About MetricFlow

This guide introduces MetricFlow's fundamental ideas for people new to this feature. MetricFlow, which powers the dbt Semantic Layer, helps you define and manage the logic for your company's metrics. It's an opinionated set of abstractions and helps data consumers retrieve metric datasets from a data platform quickly and efficiently.

MetricFlow handles SQL query construction and defines the specification for dbt semantic models and metrics. It allows you to define metrics in your dbt project and query them with MetricFlow commands whether in dbt Cloud or dbt Core.

Before you start, consider the following guidelines:

MetricFlow

MetricFlow is a SQL query generation tool designed to streamline metric creation across different data dimensions for diverse business needs.

  • It operates through YAML files, where a semantic graph links language to data. This graph comprises semantic models (data entry points) and metrics (functions for creating quantitative indicators).
  • MetricFlow is a BSL package with code source available, and compatible with dbt version 1.6 and higher. Data practitioners and enthusiasts are highly encouraged to contribute.
  • As a part of the dbt Semantic Layer, MetricFlow empowers organizations to define metrics using YAML abstractions.
  • To query metric dimensions, dimension values, and validate configurations, use MetricFlow commands.

Note — MetricFlow doesn't support dbt builtin functions or packages at this time, however, support is planned for the future.

MetricFlow abides by these principles:

  • Flexibility with completeness: Define metric logic using flexible abstractions on any data model.
  • DRY (Don't Repeat Yourself): Minimize redundancy by enabling metric definitions whenever possible.
  • Simplicity with gradual complexity: Approach MetricFlow using familiar data modeling concepts.
  • Performance and efficiency: Optimize performance while supporting centralized data engineering and distributed logic ownership.

Semantic graph

We're introducing a new concept: a "semantic graph". It's the relationship between semantic models and YAML configurations that creates a data landscape for building metrics. You can think of it like a map, where tables are like locations, and the connections between them (edges) are like roads. Although it's under the hood, the semantic graph is a subset of the DAG, and you can see the semantic models as nodes on the DAG.

The semantic graph helps us decide which information is available to use for consumption and which is not. The connections between tables in the semantic graph are more about relationships between the information. This is different from the DAG, where the connections show dependencies between tasks.

When MetricFlow generates a metric, it uses its SQL engine to figure out the best path between tables using the framework defined in YAML files for semantic models and metrics. When these models and metrics are correctly defined, they can be used downstream with dbt Semantic Layer's integrations.

Semantic models

Semantic models are the starting points of data and correspond to models in your dbt project. You can create multiple semantic models from each model. Semantic models have metadata, like a data table, that define important information such as the table name and primary keys for the graph to be navigated correctly.

For a semantic model, there are three main pieces of metadata:

  • Entities — The join keys of your semantic model (think of these as the traversal paths, or edges between semantic models).
  • Dimensions — These are the ways you want to group or slice/dice your metrics.
  • Measures — The aggregation functions that give you a numeric result and can be used to create your metrics.
A semantic model is made up of different components: Entities, Measures, and Dimensions.A semantic model is made up of different components: Entities, Measures, and Dimensions.

Metrics

Metrics, which is a key concept, are functions that combine measures, constraints, or other mathematical functions to define new quantitative indicators. MetricFlow uses measures and various aggregation types, such as average, sum, and count distinct, to create metrics. Dimensions add context to metrics and without them, a metric is simply a number for all time. You can define metrics in the same YAML files as your semantic models, or create a new file.

MetricFlow supports different metric types:

  • Conversion — Helps you track when a base event and a subsequent conversion event occurs for an entity within a set time period.
  • Cumulative — Aggregates a measure over a given window.
  • Derived — An expression of other metrics, which allows you to do calculations on top of metrics.
  • Ratio — Create a ratio out of two measures, like revenue per customer.
  • Simple — Metrics that refer directly to one measure.

Use case

In the upcoming sections, we'll show how data practitioners currently calculate metrics and compare it to how MetricFlow makes defining metrics easier and more flexible.

The following example data is based on the Jaffle Shop repo. You can view the complete dbt project. The tables we're using in our example model are:

  • orders is a production data platform export that has been cleaned up and organized for analytical consumption
  • customers is a partially denormalized table in this case with a column derived from the orders table through some upstream process

To make this more concrete, consider the metric order_total, which is defined using the SQL expression:

select sum(order_total) as order_total from orders This expression calculates the total revenue for all orders by summing the order_total column in the orders table. In a business setting, the metric order_total is often calculated according to different categories, such as"

  • Time, for example date_trunc(ordered_at, 'day')
  • Order Type, using is_food_order dimension from the orders table.

Calculate metrics

Next, we'll compare how data practitioners currently calculate metrics with multiple queries versus how MetricFlow simplifies and streamlines the process.

The following example displays how data practitioners typically would calculate the order_total metric aggregated. It's also likely that analysts are asked for more details on a metric, like how much revenue came from new customers.

Using the following query creates a situation where multiple analysts working on the same data, each using their own query method — this can lead to confusion, inconsistencies, and a headache for data management.

select
date_trunc('day',orders.ordered_at) as day,
case when customers.first_ordered_at is not null then true else false end as is_new_customer,
sum(orders.order_total) as order_total
from
orders
left join
customers
on
orders.customer_id = customers.customer_id
group by 1, 2

FAQs

 Do my datasets need to be normalized?
 Why is normalized data the ideal input?
 Why not just make metrics the same as measures?
 How does the dbt Semantic Layer handle joins?
 Are entities and join keys the same thing?
 Can a table without a primary or unique entities have dimensions?
0