What's the use case?
I have a set of heterogeneous unpartitioned assets which are normalized downstream into a single partitioned asset.
import dagster as dg


@dg.asset(partitions_def=dg.StaticPartitionsDefinition(["a", "b", "c"]))
def merged_asset(context: dg.AssetExecutionContext, a, b, c):
    # Each partition simply forwards the upstream asset with the same name.
    if context.partition_key == "a":
        return a
    elif context.partition_key == "b":
        return b
    elif context.partition_key == "c":
        return c
    else:
        raise NotImplementedError(f"partition key {context.partition_key} is not supported")
I would like to be able to express that a single upstream asset corresponds to a single downstream partition. Think IdentityPartitionMapping, but instead of upstream partitions I have upstream assets.
This is important because otherwise a change to one of the upstream assets marks all downstream asset partitions as stale. This can be worked around by specifying DataVersion manually, but that is still not ideal.
cc @cmpadden @schrockn as discussed on our call
Ideas of implementation
We can support a bunch of predefined strategies for such mapping. For example, we could match the last element of the upstream asset key with partition keys.
["my_prefix", "a"] -> "a"
["my_prefix", "b"] -> "b"
Not sure if this will require any changes in the current mapping framework (since the entity being mapped is not a partition key anymore but an asset key).
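To make the "last element of the asset key" strategy concrete, here is a minimal sketch in plain Python. It does not use any Dagster API: `partition_key_from_asset_key` is a hypothetical helper name, and asset keys are represented as plain lists of strings. How such a strategy would plug into the existing mapping framework is exactly the open question above.

```python
from typing import Optional, Sequence


def partition_key_from_asset_key(
    asset_key_path: Sequence[str],
    partition_keys: Sequence[str],
) -> Optional[str]:
    """Hypothetical strategy: use the last asset key component as the partition key.

    Returns None when the last component is not a known partition key.
    """
    if not asset_key_path:
        return None
    last = asset_key_path[-1]
    return last if last in partition_keys else None


partition_keys = ["a", "b", "c"]
upstream_asset_keys = [["my_prefix", "a"], ["my_prefix", "b"], ["my_prefix", "c"]]

# Build the asset-to-partition mapping once, keyed by the full asset key path.
mapping = {
    tuple(path): partition_key_from_asset_key(path, partition_keys)
    for path in upstream_asset_keys
}
```

With a mapping like this, a change to `["my_prefix", "a"]` would only need to invalidate partition `"a"` of the downstream asset.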
Additional information
I can only get away with the current implementation because my assets are loaded as polars.LazyFrame. It won't scale well for a lot of non-lazy input assets.
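To illustrate why laziness matters here, the following sketch uses zero-argument callables as a stand-in for lazy inputs. This is an assumption for illustration only: it is plain Python, not polars or Dagster, and `make_lazy_input` and `merged` are hypothetical names. The point is that only the input matching the requested partition is ever materialized, whereas eager inputs would all be loaded on every partition run.

```python
from typing import Callable, Dict, List

loaded: List[str] = []  # records which inputs were actually materialized


def make_lazy_input(name: str) -> Callable[[], str]:
    """Return a zero-argument loader; the data is only produced when it is called."""

    def load() -> str:
        loaded.append(name)
        return f"data for {name}"

    return load


def merged(partition_key: str, inputs: Dict[str, Callable[[], str]]) -> str:
    # Only the input matching the partition is materialized.
    if partition_key not in inputs:
        raise NotImplementedError(f"partition key {partition_key} is not supported")
    return inputs[partition_key]()


inputs = {name: make_lazy_input(name) for name in ["a", "b", "c"]}
result = merged("b", inputs)
```

After the call, `loaded` contains only `"b"`; the loaders for `"a"` and `"c"` were never invoked.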
It would be great to discuss alternative approaches to this problem here.
Message from the maintainers
Impacted by this issue? Give it a 👍! We factor engagement into prioritization.