[Proposal] User-configurable PartitionKey for Actors · Issue #8376 · dapr/dapr · GitHub
[Proposal] User-configurable PartitionKey for Actors #8376
Closed
@ScottArbeit

Description

Hi friends! I'd like to open the discussion of user-configurable partitionKey values for Actors.

TL;DR

Without user-configurable partitionKey values, every query I write for Cosmos DB (and maybe for other Actor State Stores) will be a cross-partition query, which means slower performance and higher cost in Request Units/second.

Use case

I'm using Dapr Actors extensively (through the .NET SDK) for my version control system, Grace.

My first state store for Actors is Azure Cosmos DB. As I've gotten further into performance testing, I've realized that the default way Dapr creates partitionKey values works fine for reading and writing an individual actor's state, but forces a cross-partition query every time I write a Cosmos DB SQL query across actor instances.

My data schema looks something like this:

erDiagram
    Owner ||--|{ Organization : "has 1:N"
    Organization ||--|{ Repository : "has 1:N"
    Repository ||--|{ Branch : "has 1:N"
    Branch ||--|{ Reference : "has 1:N"
    Repository ||--|{ DirectoryVersion : "has 1:N"
    Reference ||--|| DirectoryVersion : "refers to exactly 1"
    DirectoryVersion ||--o{ FileVersion : "has 0:N"

So, let's say I'm performing a query to find all branches in a repository.

The (simplified here) SQL query I have looks like:

SELECT TOP @maxCount event.Event.created.branchId
  FROM c JOIN event IN c["value"] 
  WHERE event.Event.created.repositoryId = @repositoryId

-- Example Branch `partitionKey`: "grace-server||BranchActor||6b5f66c7-4fe6-4ebe-81fc-0a0d8da22882"
--   i.e. the BranchId I'm querying for is in the `partitionKey` so there's no way to know it before the query.

That WHERE clause does not contain a partitionKey value - and I can't specify one because of Dapr's default partitionKey values - and so every time I run this query, it's going to be a cross-partition query. Same for things like Get all references for a branch or Get DirectoryVersion that matches a RelativePath. Essentially, I don't think there are any queries in Grace where I can specify the partitionKey in the WHERE clause, so they're all cross-partition queries.
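
To make the problem concrete, here is a minimal sketch of how the default key appears to be composed, inferred only from the example partitionKey shown above (app ID, actor type, and actor ID joined by `||`). The helper name `default_actor_partition_key` is hypothetical and is not a Dapr API.

```python
def default_actor_partition_key(app_id: str, actor_type: str, actor_id: str) -> str:
    """Hypothetical helper that mirrors the key layout seen in the example above."""
    return "||".join([app_id, actor_type, actor_id])


# Values taken from the example partitionKey in the query comment.
branch_id = "6b5f66c7-4fe6-4ebe-81fc-0a0d8da22882"
key = default_actor_partition_key("grace-server", "BranchActor", branch_id)

# The key embeds the BranchId, which is exactly what the query is trying to find.
# Knowing only the RepositoryId, there is no way to compute this key up front.
print(key)
```

Because the actor ID is the last component of the key, each Branch actor ends up in its own logical partition, and a filter on `repositoryId` has to fan out across all of them.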

Desired state

If I could specify my own partitionKey value, for instance, using the RepositoryId as the partitionKey for all Branches / References / DirectoryVersions in a repository, I'd be able to include it in the query, limiting my query to a single partition, improving performance and lowering cost.
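
As a rough sketch of what that could look like from the query side, the function below builds the arguments for a single-partition query. It assumes, hypothetically, that Dapr stored every Branch actor's state under a partitionKey equal to its RepositoryId, which is not possible today. The function name `build_branch_query` is illustrative, and no Cosmos DB call is made here.

```python
from typing import Any, Dict


def build_branch_query(repository_id: str, max_count: int) -> Dict[str, Any]:
    """Build query arguments scoped to one partition (hypothetical helper)."""
    query = (
        "SELECT TOP @maxCount event.Event.created.branchId "
        'FROM c JOIN event IN c["value"] '
        "WHERE event.Event.created.repositoryId = @repositoryId"
    )
    return {
        "query": query,
        "parameters": [
            {"name": "@maxCount", "value": max_count},
            {"name": "@repositoryId", "value": repository_id},
        ],
        # Assumption: the partitionKey for Branch state is the RepositoryId,
        # so the caller already knows it before running the query.
        "partition_key": repository_id,
    }
```

With the azure-cosmos Python SDK, these arguments could be passed as `container.query_items(**build_branch_query(repo_id, 50))`. Supplying `partition_key` is what keeps the query inside one logical partition instead of fanning out.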

Considerations

User-configurable partitionKey values already exist in Dapr for non-Actor State storage, so we have prior art.
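
For reference, the prior art I have in mind is the `partitionKey` request metadata that the Cosmos DB state store accepts on the regular (non-Actor) state API. The sketch below only builds the JSON body that would be POSTed to the sidecar's `/v1.0/state/<store name>` endpoint. The helper name and the sample key, value, and partition key are made up for illustration.

```python
import json
from typing import Any, Dict, List


def build_state_payload(key: str, value: Any, partition_key: str) -> List[Dict[str, Any]]:
    """Build a state-save request body carrying a partitionKey (hypothetical helper)."""
    return [
        {
            "key": key,
            "value": value,
            # Per-request metadata; the state store uses this value as the
            # partition key instead of deriving one from the state key.
            "metadata": {"partitionKey": partition_key},
        }
    ]


# Illustrative values only.
payload = build_state_payload("branch-settings", {"name": "main"}, "repo-1")
print(json.dumps(payload, indent=2))
```

The proposal is essentially to offer an equivalent knob for Actor state, where today the key is derived from the actor's identity.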

I'm aware that implementing this would require non-trivial changes both in Dapr and in the SDKs. That's why I want to open a discussion before I write any code for it. I don't expect that I'm aware of all the downstream effects of such a change yet, and I don't propose it lightly.

Thoughts? I'm absolutely willing to help work on this to get it across the finish line, but I won't start without some indication that the changes would be welcomed. 😀

Metadata

Labels: stale (Issues and PRs without response)