-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
propose a way to get service account token for csi driver #1855
Conversation
Hi @zshihang. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@zshihang: GitHub didn't allow me to request PR reviews from the following users: seh, kfox1111. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
4. The token is not guaranteed to be available (e.g. | ||
`automountServiceAccountToken=false`). | ||
|
||
### User stories |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AWS would like this for granting projected tokens to our EFS CSI driver so the driver can get a pod's AWS credentials (https://github.com/kubernetes-sigs/aws-efs-csi-driver & aws/efs-utils#60)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@micahhausler can you clarify if efs driver would need this at CreateVolume time, or is NodePublish sufficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NodePublish
5. Every 0.1 second, the reconciler component of volume manager will remount | ||
the volume in case the vault secrets expire and re-login is required. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if one of these remounts (NodePublish?) fails? What if kubelet is not able to refresh a token after expiration? There is no facility in kubelet to kill the affected pod, do you plan to add it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Starting separate "thread")
Does this mean that kubelet must cache "vault tokens" of all running pods and their lifetime, to be able to refresh a token when it expires?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if one of these remounts (NodePublish?) fails?
return errors? do you have any suggestion?
What if kubelet is not able to refresh a token after expiration?
the token will be refreshed regularly before expiration.
Does this mean that kubelet must cache "vault tokens" of all running pods and their lifetime, to be able to refresh a token when it expires?
the token is already being cached and refreshed before expiration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning errors only produces events to users. A pod keeps running, possibly with expired / stale data, without really knowing about it. Is it something that's acceptable? Or are there cases when a pod that uses a secret must be killed when it looses access to a secret? In that case you should add some note here + code to kill pods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking at the code, the pod will not be killed if the first volume mount fails. i expect the remounts will adopt the same behavior. @seh what will happen if the mounted secrets expire? will the secrets be deleted so that the users can be notified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
killing the container if they ever get stale
if the secret can ever get stale, the secrets provider (e.g. vault) should revoke those secrets, rather than the secrets consumers (e.g. k8s). we can not rely on the credential consumers not to use expired secrets. e.g. the pod cache the secret in memory even if the secret is replaced with another fresh one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those are fine ideas, but that's not how all the secret engines in Vault work.
Consider an administrator who stores an API key in a Vault K/V secret. Perhaps that administrator usually rotates the key about every other month, leaving a couple of weeks of overlap before revoking the previous key. She might set an advisory TTL on the secret read from Vault for one week, advising consuming applications that they should probably check for a fresh key every week so. An old key might still work for a long while after that week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
going back to the original question on what to do when the remount, i would still like to keep the current behavior where the error will only be logged because:
- stale secrets might still work and it is not security critical for the consumer of credentials (k8s) to stop using the stale secrets.
- the repeated remount will overcome transient errors.
- there is no guarantee that killing the container/pod will solve fatal errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot of good discussion here. Can we capture all this somewhere in the KEP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
`RequiresRemount` is optional, which should be only set when the mounted volumes | ||
by the CRI driver have TTL and require re-validation on the token. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please elaborate on interaction between RequiresRemount
and token lifetime? What should happen when
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for vault, the token is used to authenticate as the pod's service account so that the driver can fetch the secrets stored in vault. however, the secrets have TTL, so it will authenticate again to fetch the newer secrets. in this case, RequiresRemount==true
.
for other csi that doesn't need re authentication, RequiresRemount==false
, e.g. cert manager csi driver wants to create CertificateRequest on behalf of the pod the credential of which is only needed in the first mount if the certificate doesn't expire.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean remounting the volume set up by the driver that consumes the service account token? In my driver that refreshes its service account tokens, I don't remount the volume; I use (a port of) AtomicWriter
to replace the files holding the secrets fetched from Vault.
With secrets (or X.509 certificates) acquired from Vault, having authenticated to Vault with a Kubernetes service account token, there are three things that can expire:
- The secret data
(By way of an optional advisory TTL.) - The Vault token
(Usually you can refresh it, but when that offer runs out, you need to authenticate again.) - The Kubernetes service account token
(Minimum validity period from the TokenRequest API is ten minutes.)
Even when the secret data changes, though, it's not necessary to unmount and remount volumes. The CSI driver can replace that data on disk "atomically".
If I've misunderstood what is being remounted here, feel free to ignore my critique.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean remounting the volume set up by the driver that consumes the service account token?
the remount is triggered by kubelet, which means a sequence of NodePublishVolume
will be called. this way
can assure the token not going stale and move the common token fetching logic from different csi drivers to kubelet.
I use (a port of) AtomicWriter to replace the files holding the secrets fetched from Vault.
this is good. many existing volumes which requires remount (downwardAPI, projected service account token) are also using AtomicWriter.
it is true that there is extra overhead incurred by the periodic remount operation in this approach, but you can still use AtomicWriter right? what the difference in overhead between the writes via AtomicWriter and mounts for the secrets store csi driver? the difference in overhead seems minimal for other in tree volumes using AtomicWriter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no remounting of volumes necessary, though. The process running in the container may be reading the files that would be unmounted. It introduces a gap during which the files would not be available any longer, and in most cases slightly stale files are better than no files.
For most drivers, unmounting involves recursively deleting files from a directory, and mounting requires creating directories. There may be a mixture of files within such a mounted volume, with differing expiration times. Sweeping it all away and writing it again a short time later is not what these other drivers are doing with AtomicWriter
, and it's not how my driver works. AtomicWriter
helps facilitate in-place updates of a tree of files. Deleting all the files and writing them again later is not an atomic update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair enough, i can add the option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, term "remount" comes from Kubernetes RequiresRemount handling
It has never remounted anything, it refreshes atomic writer volumes. Refactoring to "Refresh" would be better, but orthogonal to this enhancement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jsafrane do you think we need to clarify in the csi spec this new refresh behavior or leave this as a Kubernetes-specific behavior? Although already, the drivers that support this would have to be Kubernetes-specific anyway since they need to make the TokenRequest API call. Right now I don't think it's expected the NodePublishVolume could get called repeatedly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the csi driver would not make api calls. TokenRequest will be called by kubelet and TokenReview is called by storage provider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still, NodePublish will be called frequently. So far, NodePublish was called only once when mounting the volume for the first time. As long as the NodePublish is called in ephemeral CSI volumes (secrets and similar stuff), they're Kubernetes extension anyway and we don't need to update CSI spec.
AWS EFS does not seem to be interested in periodic NodePublish when the token expires, right @micahhausler?
This KEP proposes a way to obtain service account token for pods that the CSI | ||
drivers are provisioning volumes for. Since these tokens are valid only for a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first sentence implies that the token is available in CreateVolume, while the token is passed in NodePublish (which makes more sense).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will reword it. :)
`mountedPods` will have `requiresRemound=true` after `MarkRemountRequired` | ||
is called. `MarkRemountRequired` will call into `RequiresRemount` of the | ||
in-tree csi plugin to fetch the `CSIDriver` object. | ||
3. Before `NodePublishVolume` call, kubelet will request token from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some CSI drivers need ControllerPublish and NodeStage calls. Do you plan to add the token to these too? If not,
can we safely assume that these calls won't ever need such a token?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you plan to add the token to these too?
at this time it is a no. in the current collected user stories, vault and cert manager only need this token in NodePublishVolume
calls to authenticate.
can we safely assume that these calls won't ever need such a token?
i didn't find a good reason to use the pod's token in the controller rpc services and NodeStage/NodeUnstage
as they are cluster operations or early stage node operations which not much related to pod.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's used in EFS, a real persistent volume that coincidentally uses only NodePublish, I think that there is only tiny step to other volume plugins that may require NodeStage / ControllerPublish.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think NodeStage/ControllerPublish may add additional challenges because multiple pods can share the same volume at those operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think an explicit warning that tokens are do not cover ControllerPublish/NodeStage in the KEP (and in subsequent docs in github.com/kubernetes-csi/docs) will do for now.
4. The token is not guaranteed to be available (e.g. | ||
`automountServiceAccountToken=false`). | ||
|
||
### User stories |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@micahhausler can you clarify if efs driver would need this at CreateVolume time, or is NodePublish sufficient?
... // existing fields | ||
|
||
RequiresRemount *bool | ||
ServiceAccountTokenAudience *string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the valid values for this string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is just an arbitrary non empty identifier.
the SP is supposed to send a TokenReview request with that audience included to validate the token.
/ok-to-test |
8f4198f
to
7ab5beb
Compare
/assign @msau42 any more comments? shall we proceed with implementation? |
e4ad6d4
to
07f052b
Compare
`TokenRequestStatus` will be passed to CSI driver. | ||
|
||
**`RequiresRemount`**: is optional, which should be only set when the mounted | ||
volumes by the CRI driver have TTL and require re-validation on the token. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: csi
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
... // existing fields | ||
|
||
RequiresRemount *bool | ||
ServiceAccountTokenAudiences []string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to put the two service account related fields into another struct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
5. Every 0.1 second, the reconciler component of volume manager will remount | ||
the volume in case the vault secrets expire and re-login is required. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot of good discussion here. Can we capture all this somewhere in the KEP?
plugin to do necessary validation and every csi driver needs to reimplement | ||
the same functionality. | ||
|
||
## Implementation History |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need to fill out the test plan, graduation criteria, and production readiness sections before we can move the kep to implementable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
regarding criteria and production readiness, seems they need FeatureGate. i am not sure we need feature gate for this one as it doesn't change existing behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think any API change we need to make would require going through the alpha/beta/ga feature gates so that we have the ability to make changes after testing and receiving feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
`mountedPods` will have `requiresRemound=true` after `MarkRemountRequired` | ||
is called. `MarkRemountRequired` will call into `RequiresRemount` of the | ||
in-tree csi plugin to fetch the `CSIDriver` object. | ||
3. Before `NodePublishVolume` call, kubelet will request token from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think NodeStage/ControllerPublish may add additional challenges because multiple pods can share the same volume at those operations.
02480b1
to
963b297
Compare
/approve /hold |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: msau42, zshihang The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@kubernetes/sig-auth-proposals |
keps/sig-storage/1855-csi-driver-service-account-token/README.md
Outdated
Show resolved
Hide resolved
keps/sig-storage/1855-csi-driver-service-account-token/README.md
Outdated
Show resolved
Hide resolved
keps/sig-storage/1855-csi-driver-service-account-token/README.md
Outdated
Show resolved
Hide resolved
|
||
// ServiceAccountToken contains parameters of a token. | ||
type ServiceAccountToken struct { | ||
Audiences []string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should probably only allow a single audience. I'd like @liggitt to weigh in on the API review. I don't think this needs to block though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the use case to consider is secrets store csi driver which possibly has multiple providers. there are two ways:
- currently the approach is to distribute one token with multiple audiences to the secrets store csi driver.
- another way would be distribute multiple tokens (each has single audience).
FYI @seh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestion to allow multiple audiences was for parity with the TokenRequest API, and to allow operators greater flexibility in talking to, say, more than one Vault authentication role, where they each demand a different audience. Doing so would also allow changing the configuration of such a role, and allowing multiple CSI driver daemons to play along through the transition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both ways can suffice the multi-audience in a csi driver scenario. take the secrets store csi driver as an example, the csi driver only needs to pass the single token with multiple audiences to all the providers, while if adopting the second approach, the csi driver needs to hardcode a mapping between the tokens from kubelet and the providers. the advantage i can think of of the second approach is every provider gets its own token but i am not sure how much security sugar we get here. seems to me that the convenience of the first approach outweighs the advantages that the second approach provides to us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the csi driver only needs to pass the single token with multiple audiences to all the providers
passing replayable credentials which are considered valid by multiple providers to all the providers does not seem like a good idea
allow operators greater flexibility in talking to, say, more than one Vault authentication role, where they each demand a different audience
I'm not very familiar with this... is this a scenario where you are presenting the token to a single vault server, using multiple audiences as some sort of scoping mechanism?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this a scenario where you are presenting the token to a single vault server, using multiple audiences as some sort of scoping mechanism?
Yes. A Vault operator can mount the Kubernetes auth method any number of times. Within each mounted auth method, you can create any number of roles. Each such role defines which service account names, within which namespaces, and bound to which audience (and only a single audience, questioned here and here, alas too late) to accept.
In my CSI driver, a pod can read multiple Vault KV secrets and request multiple X.509 certificates, each involving authenticating as a different Vault role. Each of these roles could demand that the principal present a service account token bound to a different audience. If a given service account token is bound to multiple audiences—perhaps the union of what these Vault roles demand—then a single token could suffice for authenticating as each Vault role.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
passing replayable credentials which are considered valid by multiple providers to all the providers does not seem like a good idea
agreed. when passing multiple tokens to the csi driver, in the case of secrets store csi driver, it is required to implement a mapping between the tokens and the providers. my current idea is to pass a map in VolumeAttributes
:
csi.storage.k8s.io/serviceAccount.tokens: {
'audience-0': {
'token': token-0,
'expiry': expiry-0,
},
'audience-1': {
'token': token-1,
'expiry': expiry-1,
},
...
}
regarding Vault roles which might have different audiences, the csi driver will pass all the tokens with audience in the format 'vault-*' to Vault provider. of course, it is a non-goal in this KEP but it is something that the csi driver needs to implement to accommodate the Vault case.
... // existing fields | ||
|
||
RequiresRemount *bool | ||
ServiceAccountToken *ServiceAccountToken |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there ever a usecase for multiple service account tokens? If I'm using the secret driver with both AWS and valut?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
/lgtm |
This is being implemented in kubernetes/kubernetes#93130 and currently targeted for v1.20 /milestone v1.20 |
@micahhausler: You must be a member of the kubernetes/milestone-maintainers GitHub team to set the milestone. If you believe you should be able to issue the /milestone command, please contact your and have them propose you as an additional delegate for this responsibility. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@micahhausler is there an enhancement issue open for this that is being tracked by the release team? |
Ref: kubernetes/kubernetes#86448