8000 [Feature] FederatedResourceQuota Enhancement · Issue #6350 · karmada-io/karmada · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

[Feature] FederatedResourceQuota Enhancement #6350

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
2 of 15 tasks
mszacillo opened this issue May 6, 2025 · 16 comments
Open
2 of 15 tasks

[Feature] FederatedResourceQuota Enhancement #6350

mszacillo opened this issue May 6, 2025 · 16 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.
Milestone

Comments

@mszacillo
Copy link
Contributor
mszacillo commented May 6, 2025

Summary
This feature aims to enhance the existing FederatedResourceQuota so it can impose namespaced resource limits directly on the Karmada control-plane level. This feature enhancement has several benefits, including:

  1. Track resource consumption and limits from a unified place
  2. Prevent users from over-consuming aggregate resources across clusters
  3. Easier for users of application and cluster failover to ensure that fallback clusters have enough resources to house all the user's workflows

Proposal:

Iteration Tasks -- Part-1: FederatedResourceQuota Enhancement MVP

Iteration Tasks -- Part-2: Further Enhancements beyond MVP

  • [API Change] Add Scope to the FederatedResourceQuotaSpec
  • Consider improvements to reconcile strategy for overall FRQ status controller (use cache instead of listing all RB each reconcile)
  • FederatedResourceQuota controller should requeue unschedulable RBs to be scheduled in the case of a change to overall quota usage
  • Determine if changes necessary to scheduler when reacting to denied RB updates
@mszacillo mszacillo added the kind/feature Categorizes issue or PR as related to a new feature. label May 6, 2025
@mszacillo
Copy link
Contributor Author

@RainbowMango Let me know if this looks alright!

Also, @zhzhuang-zju, do you have any preference on how to subdivide this work between us?

@RainbowMango RainbowMango moved this from Accepted to Planned In Release 1.14 in Karmada Overall Backlog May 7, 2025
@RainbowMango RainbowMango added this to the v1.14 milestone May 7, 2025
@RainbowMango
Copy link
Member

Generally looks great! Thanks.
I will leave comments if I want to twist or add something.
For now, I'm thinking of introducing a feature gate, like what we did for the StatefulFailoverInjection, do you have any suggestion for the feature name?

@RainbowMango
Copy link
Member

@mszacillo The origin behavior of staticAssignments might be a distraction for this feature as with this field unset, the controller still creates an empty ResourceQuota to all clusters, that doesn't make sense, because empty ResourceQuota can not do anything.

Sent #6351 for updating the comments, please take a look, and proposing two iterated tasks:

  • (API) Adjust the behavior of the legacy staticAssignments to skip creating redundant ResourceQuotas to member clusters.
  • (Controller) Adjust the behavior of the legacy staticAssignments to skip creating redundant ResourceQuotas to member clusters.

@mszacillo
Copy link
Contributor Author

Great! I've gone ahead and updated the iteration tasks to include the two other staticAssignments tasks.

Just to double check, do these iteration tasks make sense as well:

  • Update federated-resource-quota-status-controller to skip reconcile if staticAssignment is not set
  • FederatedResourceQuota validating webhook should block updates to enable staticAssignment on Update requests

The thought it is that the staticAssignment controller and new federated resource quota controller will both attempt to update the status of the federated resource quota resource. So we want to prevent users from toggling between the two features. If someone wants to use staticAssignment, they can delete their quota and reapply using staticAssignment.

For now, I'm thinking of introducing a feature gate, like what we did for the StatefulFailoverInjection, do you have any suggestion for the feature name?

Perhaps FederatedQuotaEnforcement? Or GlobalQuotaEnforcement?

@RainbowMango
Copy link
Member

The thought it is that the staticAssignment controller and new federated resource quota controller will both attempt to update the status of the federated resource quota resource.

This is a technical issue and can be resolved. Concurrent updates to the same resource object by multiple controllers are a common scenario. By clearly defining the responsibilities of each controller to ensure they do not modify the same fields simultaneously, we can resolve this problem.

In this case, we can let the federated-resource-quota-status-controller only update the .status.aggregatedStatus:

// AggregatedStatus is the observed quota usage of each cluster.
// +optional
AggregatedStatus []ClusterQuotaStatus `json:"aggregatedStatus,omitempty"`
, and the .status.overallUsed will be managed by the new controller.

FederatedResourceQuota validating webhook should block updates to enable staticAssignment on Update requests

So, we might not need to do this. The staticAssignments field is still an optional field. Once it is set, the origin controller will create ResourceQuota for member clusters.

Update federated-resource-quota-status-controller to skip reconcile if staticAssignment is not set

Basically yes, both the federated-resource-quota-status-controller and federated-resource-quota-sync-controller are for reconciling staticAssignment (I'm thinking of renaming it, by the way).

But there is a corner case that we probably can not just skip reconciliation even staticAssignment is unset, consider the following case: what if the staticAssignment is removed during an update operation, if we skip the reconciliation, the status(.status.aggregatedStatus) will not be cleaned up.

@RainbowMango
Copy link
Member

Perhaps FederatedQuotaEnforcement? Or GlobalQuotaEnforcement?

I like the FederatedQuotaEnforcement as the feature name and feature gate name.

@mszacillo
Copy link
Contributor Author

In this case, we can let the federated-resource-quota-status-controller only update the .status.aggregatedStatus:

Gotcha, I should have been more clear. The concern wasn't that two controllers are editing the status, but rather that they will be editing the same field OverallUsed. I'll edit the iteration tasks to account for this!

So, we might not need to do this. The staticAssignments field is still an optional field. Once it is set, the origin controller will create ResourceQuota for member clusters.

But there is a corner case that we probably can not just skip reconciliation even staticAssignment is unset, consider the following case: what if the staticAssignment is removed during an update operation, if we skip the reconciliation, the status(.status.aggregatedStatus) will not be cleaned up.

Makes sense! I'll edit this iteration task as well. Should I include the renaming of staticAssignment as part of this umbrella list, or should it be standalone?

@RainbowMango
Copy link
Member
RainbowMango commented May 9, 2025

I'll edit this iteration task as well. Should I include the renaming of staticAssignment as part of this umbrella list, or should it be standalone?

OK, we can have one. Refactoring/Twisting the legacy feature is also part of the new feature.

[edit]:
Renaming the legacy controller may not be mandatory. Given that the new controller might be named federated-resource-quota-enforcement-controller, it seems unlikely to cause much misunderstanding. Then we will have 3 controllers:

  • (legacy) federated-resource-quota-sync-controller: synchronizes Kubernetes ResourceQuota to member clusters based on staticAssignment field
  • (legacy) federated-resource-quota-status-controller: aggregates and reports the usage status of ResourceQuota across all member clusters.
  • (new) federated-resource-quota-enforcement-controller: Calculates and reports the usage based on the enforcement at the Federation side.

What do you think? Do you have any preference for the new controller name? @mszacillo @zhzhuang-zju

@RainbowMango
Copy link
Member

By the way, please assign the task to yourself and @liwang0513 once you have selected it. PS: Tasks without an assignment are open for anyone to pick.

@zhzhuang-zju
Copy link
Contributor
zhzhuang-zju commented May 9, 2025

@mszacillo Thanks for your work. The tasks you have outlined are very specific and detailed.

Based on that, I only have a few minor comments:

  • Add feature documentation and testing tasks to Iteration Tasks -- Part-1.
  • User awareness: If the detector fails to update the ResourceBinding, or if the karmada-scheduler fails to update the ResourceBinding, we can improve user awareness by using events and conditions. This can also be added to Iteration Tasks -- Part-1.
  • Can the task FederatedResourceQuota controller should requeue unschedulable RBs to be scheduled in the case of a change to overall quota usage be moved to Iteration Tasks -- Part-2?
    Preventing the scheduler from repeatedly attempting to schedule RBs that would exceed the remaining quota, and figuring out how to requeue unschedulable RBs, is more of a performance optimization for this feature. The design and development effort for this part is relatively significant, so it could be excluded from Part-1 and handled simply with retry-on-failure logic for now.

Besides, should we limit the resource names that FederatedResourceQuota supports? FYI, there are three types of ResourceQuota: Compute Resource Quota, Storage Resource Quota, and Object Count Quota. In Iteration Tasks -- Part-1, w 8000 e may only be able to support Compute Resource Quota.

Also, @zhzhuang-zju, do you have any preference on how to subdivide this work between us?

Please feel free to pick the tasks you're interested in. I'm glad to join in design discussions and help with code reviews. If I see something I like, I’ll just assign it to myself too.

@zhzhuang-zju
Copy link
Contributor

@mszacillo @RainbowMango I would like to improve the output of kubectl/karmadactl get federatedResourceQuota. The current output looks like this:

✗ kubectl get federatedresourcequotas.policy.karmada.io                                       
NAME   AGE
test   3m59s

There is no useful information currently. We can also add the overall and overallUsed fields to the output, like:

kubectl get federatedresourcequotas.policy.karmada.io
NAME   OVERALL          OVERALL_USED
test   {"cpu":"100m"}   

If that makes sense to you, we can include this task in Iteration Tasks -- Part-1.

@zhzhuang-zju
Copy link
Contributor

[Controller change] Adjust the behavior of the legacy staticAssignments to skip creating redundant ResourceQuotas to member clusters.

@mszacillo Can you assign this task to me? Thanks~

@mszacillo
Copy link
Contributor Author

Sounds great. I have updated the iteration tasks! Will start preparing necessary PRs shortly.

@RainbowMango
Copy link
Member

Proposal one more taskValidate FederatedResourceQuota length to no more than 63 chars (#2168), @likakuli sent about 3 years ago :).

@seanlaii
Copy link
Contributor

Hi, I am interested in

Add ResourceBinding validation webhook to monitor updates to resourcebinding.spec and resourcebinding.spec.replicaRequirements after scheduling result

  • Check delta in quota usage and deny request if above limits
  • Otherwise update federatedresourcequota.status.overallUsed and then approve update request

, could I take it?

@mszacillo
Copy link
Contributor Author

@seanlaii Please go ahead! I will assign it to you. Let me know if you have questions - we can discuss more on Monday. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature.
Projects
Status: Planned In Release 1.14
Development

No branches or pull requests

4 participants
0