Description
Hi friends! I'd like to open the discussion of user-configurable partitionKey values for Actors.
TL;DR
Without user-configurable partitionKey values, every query I write for Cosmos DB (and maybe for other Actor State Stores) will be a cross-partition query, which means slower performance and higher cost in Request Units/second.
Use case
I'm using Dapr Actors extensively (through the .NET SDK) for my version control system, Grace.
My first state store for Actors is Azure Cosmos DB. As I've gotten more into performance testing, I've realized that the default way that Dapr creates partitionKey values is fine for individual actor states, but leads to a cross-partition query every time I write a Cosmos DB SQL query across actor instances.
My data schema looks something like this:
erDiagram
Owner ||--|{ Organization : "has 1:N"
Organization ||--|{ Repository : "has 1:N"
Repository ||--|{ Branch : "has 1:N"
Branch ||--|{ Reference : "has 1:N"
Repository ||--|{ DirectoryVersion : "has 1:N"
Reference ||--|| DirectoryVersion : "refers to exactly 1"
DirectoryVersion ||--|{ FileVersion : "has 0:N"
So, let's say I'm performing a query to find all branches in a repository.
The (simplified here) SQL query I have looks like:
SELECT TOP @maxCount event.Event.created.branchId
FROM c JOIN event IN c["value"]
WHERE event.Event.created.repositoryId = @repositoryId
-- Example Branch `partitionKey`: "grace-server||BranchActor||6b5f66c7-4fe6-4ebe-81fc-0a0d8da22882"
-- i.e. the BranchId I'm querying for is in the `partitionKey` so there's no way to know it before the query.
That WHERE clause does not contain a partitionKey value, and I can't specify one because of Dapr's default partitionKey values, so every time I run this query, it's going to be a cross-partition query. The same goes for queries like "Get all references for a branch" or "Get DirectoryVersion that matches a RelativePath". Essentially, I don't think there are any queries in Grace where I can specify the partitionKey in the WHERE clause, so they're all cross-partition queries.
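To make the problem concrete, here is a minimal Python sketch of how a key in that shape is put together. The appId||actorType||actorId layout is inferred only from the example partitionKey in the query comment above; the function names are hypothetical and this is not Dapr's actual implementation.

```python
# Assumption: the layout below is inferred from the example partitionKey
# "grace-server||BranchActor||<actor id>" shown above, not taken from Dapr's source.
SEPARATOR = "||"
KEY_COMPONENTS = {"app_id", "actor_type", "actor_id"}


def default_actor_partition_key(app_id: str, actor_type: str, actor_id: str) -> str:
    """Compose a partition key in the appId||actorType||actorId shape."""
    return SEPARATOR.join([app_id, actor_type, actor_id])


def can_target_partition(known_fields: set) -> bool:
    """A query can be pinned to one partition only if every key component is known up front."""
    return KEY_COMPONENTS <= known_fields


key = default_actor_partition_key(
    "grace-server", "BranchActor", "6b5f66c7-4fe6-4ebe-81fc-0a0d8da22882"
)
print(key)

# When listing branches in a repository, the repository id is known,
# but the branch (actor) ids are exactly what the query is trying to find.
print(can_target_partition({"app_id", "actor_type", "repository_id"}))
```

Because the actor id is part of the key, a query that only knows the RepositoryId has nothing to put in a partitionKey filter.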
Desired state
If I could specify my own partitionKey value, for instance using the RepositoryId as the partitionKey for all Branches / References / DirectoryVersions in a repository, I'd be able to include it in the query, limiting my query to a single partition, improving performance and lowering cost.
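The difference can be illustrated with a small in-memory simulation. This is only a sketch: the partitions are plain Python dictionaries, the sample branch and repository ids are made up, and "partitions scanned" is a rough stand-in for the fan-out a real Cosmos DB query would incur.

```python
from collections import defaultdict


def partition_by(items, key_fn):
    """Group items into partitions using the given partition-key function."""
    partitions = defaultdict(list)
    for item in items:
        partitions[key_fn(item)].append(item)
    return partitions


def branches_in_repository(partitions, repository_id, partition_key=None):
    """Return (branch ids, number of partitions scanned).

    Without a partition_key every partition must be visited;
    with one, only that single partition is read.
    """
    targets = [partition_key] if partition_key is not None else list(partitions)
    found = [
        branch["branchId"]
        for pk in targets
        for branch in partitions.get(pk, [])
        if branch["repositoryId"] == repository_id
    ]
    return sorted(found), len(targets)


# Hypothetical sample data.
branches = [
    {"branchId": "b1", "repositoryId": "r1"},
    {"branchId": "b2", "repositoryId": "r1"},
    {"branchId": "b3", "repositoryId": "r2"},
]

# Current behaviour: one partition per actor instance.
by_actor = partition_by(branches, lambda b: f"grace-server||BranchActor||{b['branchId']}")
# Desired behaviour: RepositoryId as the partition key.
by_repo = partition_by(branches, lambda b: b["repositoryId"])

fan_out = branches_in_repository(by_actor, "r1")
single = branches_in_repository(by_repo, "r1", partition_key="r1")
print(fan_out, single)
```

Both calls return the same two branches, but the per-actor layout has to visit all three partitions while the per-repository layout reads one.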
Considerations
User-configurable partitionKey values already exist in Dapr for non-Actor State storage, so we have prior art.
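For reference, a sketch of what that prior art looks like for a regular (non-Actor) save against the Cosmos DB state store, as far as I understand it: the request item carries a partitionKey entry in its metadata. The key, value, and partition key below are hypothetical, and the snippet only builds the JSON body rather than calling a sidecar.

```python
import json


def build_state_request(key, value, partition_key):
    """Build the JSON body for a state save that overrides the partition key.

    Assumption: the state store honours a "partitionKey" metadata entry,
    as the Cosmos DB component does for non-Actor state.
    """
    return [
        {
            "key": key,
            "value": value,
            "metadata": {"partitionKey": partition_key},
        }
    ]


# Hypothetical values for illustration only.
body = build_state_request("branch-main", {"name": "main"}, "repo-1234")
print(json.dumps(body, indent=2))
```

An equivalent hook for Actor state is what this proposal is asking about.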
I'm aware that implementing this would require non-trivial changes both in Dapr and in the SDKs. That's why I want to open a discussion about it before I write any code for it. I don't expect that I'm aware yet of all the downstream effects of such a change, and I don't propose it lightly.
Thoughts? I'm absolutely willing to help work on this to get it across the finish line, but I won't start without some indication that the changes would be welcomed. 😀