8000 Fix teeserver context reset issue & add container signature cache by yawangwang · Pull Request #397 · google/go-tpm-tools · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Fix teeserver context reset issue & add container signature cache #397

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jan 30, 2024

Conversation

yawangwang
Copy link
Collaborator
@yawangwang yawangwang commented Dec 5, 2023

Discussed offline, we decide to introduce container signatures cache to attestation agent.
So we should refresh the signatures cache every time the token refresher goroutine runs (~1/hr), and ODA will only read from the cache.
This PR also includes the following changes:

  • fixes teeserver context reset issue
  • removes the existing experiment flag EnableSignedContainerImage, add a new flag EnableSignedContainerCache

@@ -4,7 +4,7 @@ substitutions:
'_CLEANUP': 'true'
'_VM_NAME_PREFIX': 'discover-signatures'
'_ZONE': 'us-west1-a'
'_WORKLOAD_IMAGE': 'us-west1-docker.pkg.dev/confidential-space-images-dev/cs-integ-test-images/basic-test:latest'
'_WORKLOAD_IMAGE': 'us-west1-docker.pkg.dev/confidential-space-images-dev/cs-integ-test-images/ipc/happypath:latest'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also have a test for the default token to ensure signed container works with it as well

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I created a separate test for ODA and signed container, and keep this test for the default token.

@yawangwang yawangwang force-pushed the teeserver-context-reset branch 2 times, most recently from 73390b7 to ec9ef03 Compare December 5, 2023 21:38
@yawangwang yawangwang force-pushed the teeserver-context-reset branch 2 times, most recently from 5b07759 to 99ff543 Compare December 6, 2023 00:22
@yawangwang yawangwang requested a review from alexmwu January 5, 2024 21:18
- name: 'gcr.io/cloud-builders/gcloud'
id: BasicDiscoverSignaturesTest
entrypoint: 'bash'
args: ['scripts/test_launcher_workload_discover_signatures.sh', '${_VM_NAME_PREFIX}-${BUILD_ID}', '${_ZONE}', '2']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we expect 2 signatures? I only see one cosign call above.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is to check how many times Found container image signatures is being logged. Fetching signatures will occur on refresh the default token, and on teeserver receiving ODA requests, so the expected number is 2.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case better comments in the test and/or in launcher/image/test/scripts/test_launcher_workload_discover_signatures.sh would be good. When reading it, I assumed it was trying to find the number of signatures.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should compare the signatures fetched are the same somehow

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@yawangwang yawangwang force-pushed the teeserver-context-reset branch 2 times, most recently from bf31121 to 8e4f201 Compare January 8, 2024 22:48
@yawangwang yawangwang changed the title Fix teeserver context reset issue Fix teeserver context reset issue & add container signature cache Jan 8, 2024
@yawangwang yawangwang requested a review from alexmwu January 8, 2024 23:43
tc := tc
t.Run(tc.name, func(t *testing.T) {
// Read cache from a separate goroutine.
t.Parallel()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we also read the cache before setting?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments below.

@@ -255,3 +253,58 @@ func TestFetchContainerImageSignatures(t *testing.T) {
})
}
}

func TestCache(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow why we need this test. I think I would ideally want to a few things for this change: synchronization on the mutex (throwing goroutines and trying to catch a race condition), making sure Refresh updates appropriately when the backing store changes (e.g., add sig, delete sig), and finally refreshToken does indeed update the default token and the cache (unit test and perhaps a manual test).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

  • Added unit tests TestCacheConcurrentSet and TestCacheConcurrentGet that spawn goroutines for synchronization checks.
  • Added a unit test TestRefreshTokenWithSignedContainerCacheEnabled in container_runner_test.go to check refreshToken updates the default token and cache appropriately when the backing store changes (signatures get updated.)

- name: 'gcr.io/cloud-builders/gcloud'
id: BasicDiscoverSignaturesTest
entrypoint: 'bash'
args: ['scripts/test_launcher_workload_discover_signatures.sh', '${_VM_NAME_PREFIX}-${BUILD_ID}', '${_ZONE}', '2']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should compare the signatures fetched are the same somehow

@yawangwang yawangwang force-pushed the teeserver-context-reset branch 4 times, most recently from 8027f4e to d442c4c Compare January 25, 2024 18:30
@yawangwang yawangwang requested a review from alexmwu January 25, 2024 18:54
Comment on lines 260 to 309
func TestCacheConcurrentGet(t *testing.T) {
ctx := context.Background()

cache := &sigsCache{}
if sigs := cache.get(); len(sigs) != 0 {
t.Errorf("signature cache should be empty, but got: %v", sigs)
}

sdClient := signaturediscovery.NewFakeClient()
targetRepos := []string{signaturediscovery.FakeRepoWithSignatures}
original := fetchContainerImageSignatures(ctx, sdClient, targetRepos, log.Default())
cache.set(original)

wantBase64Sigs := convertOCISignatureToBase64(t, original)

group, _ := errgroup.WithContext(ctx)
for i := 0; i < runtime.NumCPU(); i++ {
group.Go(func() error {
cached := cache.get()
gotBase64Sigs := convertOCISignatureToBase64(t, cached)
if !cmp.Equal(wantBase64Sigs, gotBase64Sigs) {
return fmt.Errorf("cached base64 sigs not equal to original base64 sigs, got %v, but want %v", gotBase64Sigs, wantBase64Sigs)
}
return nil
})
}

if err := group.Wait(); err != nil {
t.Errorf("cache concurrent get failed: %v", err)
}
}

func TestCacheConcurrentSet(t *testing.T) {
ctx := context.Background()

cache := &sigsCache{}
if sigs := cache.get(); len(sigs) != 0 {
t.Errorf("signature cache should be empty, but got: %v", sigs)
}

sdClient := signaturediscovery.NewFakeClient()
targetRepos := []string{signaturediscovery.FakeRepoWithSignatures}

var wg sync.WaitGroup
for i := 0; i < runtime.NumCPU(); i++ {
wg.Add(1)
go func() {
defer wg.Done()
sigs := fetchContainerImageSignatures(ctx, sdClient, targetRepos, log.Default())
cache.set(sigs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we aren't intermingling sets and gets in a concurrent manner, we aren't actually testing the sync primitive. Consider intermingling data accesses and using the -race flag: https://go.dev/doc/articles/race_detector

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

logger: log.Default(),
}

if err := os.MkdirAll(launcherfile.HostTmpPath, 0744); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since refreshToken will always write OIDC token to a fixed location, I think we'll have to create a fixed directory here instead of a random temp test directory.

t.Fatalf("refreshToken returned with error: %v", err)
}

// simulate adding signatures.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: here and below capitalize comments

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

'_CLEANUP': 'true'
'_VM_NAME_PREFIX': 'oda-signedcontainer'
'_ZONE': 'us-east1-b'
'_WORKLOAD_IMAGE': 'us-west1-docker.pkg.dev/confidential-space-images-dev/cs-integ-test-images/ipc/happypath@sha256:999831a7b8f8afd323e2359f3c1192206be2aa1d4f3b19f0739eff5f01f83b9e'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment here that if the workload image changes, the commit author should change the cosign signature as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@yawangwang yawangwang force-pushed the teeserver-context-reset branch 4 times, most recently from 0688008 to fe1d199 Compare January 30, 2024 19:08
@yawangwang yawangwang force-pushed the teeserver-context-reset branch from fe1d199 to 235f223 Compare January 30, 2024 19:51
@yawangwang
Copy link
Collaborator Author

/gcbrun

@yawangwang yawangwang merged commit fd156ad into google:main Jan 30, 2024
alexmwu added a commit to alexmwu/go-tpm-tools that referenced this pull request Feb 22, 2024
New Features:
[launcher] Add TEE server IPC implementation google#367
[launcher] Enable memory monitoring in CS google#391
Use TDX quote provider to attest and verify google#405
Integrate nonce verification as part of the TDX quote validation procedure. google#395
Add RISC V support google#407
[launcher] Use resizable integrity-fs with in-memory tags google#412

Bug Fixes:
[launcher] Fix launcher exit code google#384
[launcher] Handle exit code checking during deferral evaluation google#392
[cmd] Skip tests that call setGCEAKTemplate google#402
[launcher] Fix teeserver context reset issue & add container signature cache google#397
Set all unused parameters as _ to fix CI lint failure google#411
[launcher] Make customtoken test sleep to mitigate clock skew google#413

Other Changes:
Add eventlog parse logics for memory monitoring google#404
[launcher]: Add memory monitor measurement logics google#408
Update go-tdx-guest version to v0.3.1 google#414

New Contributors:
@KeithMoyer in google#392
@vbalain in google#405
@aimixsaka in google#407
@alexmwu alexmwu mentioned this pull request Feb 22, 2024
alexmwu added a commit that referenced this pull request Feb 22, 2024
New Features:
[launcher] Add TEE server IPC implementation #367
[launcher] Enable memory monitoring in CS #391
Use TDX quote provider to attest and verify #405
Integrate nonce verification as part of the TDX quote validation procedure. #395
Add RISC V support #407
[launcher] Use resizable integrity-fs with in-memory tags #412

Bug Fixes:
[launcher] Fix launcher exit code #384
[launcher] Handle exit code checking during deferral evaluation #392
[cmd] Skip tests that call setGCEAKTemplate #402
[launcher] Fix teeserver context reset issue & add container signature cache #397
Set all unused parameters as _ to fix CI lint failure #411
[launcher] Make customtoken test sleep to mitigate clock skew #413

Other Changes:
Add eventlog parse logics for memory monitoring #404
[launcher]: Add memory monitor measurement logics #408
Update go-tdx-guest version to v0.3.1 #414

New Contributors:
@KeithMoyer in #392
@vbalain in #405
@aimixsaka in #407
alexmwu added a commit to alexmwu/go-tpm-tools that referenced this pull request Mar 29, 2024
New Features:
[launcher] Add TEE server IPC implementation google#367
[launcher] Enable memory monitoring in CS google#391
Use TDX quote provider to attest and verify google#405
Integrate nonce verification as part of the TDX quote validation procedure. google#395
Add RISC V support google#407
[launcher] Use resizable integrity-fs with in-memory tags google#412

Bug Fixes:
[launcher] Fix launcher exit code google#384
[launcher] Handle exit code checking during deferral evaluation google#392
[cmd] Skip tests that call setGCEAKTemplate google#402
[launcher] Fix teeserver context reset issue & add container signature cache google#397
Set all unused parameters as _ to fix CI lint failure google#411
[launcher] Make customtoken test sleep to mitigate clock skew google#413

Other Changes:
Add eventlog parse logics for memory monitoring google#404
[launcher]: Add memory monitor measurement logics google#408
Update go-tdx-guest version to v0.3.1 google#414

New Contributors:
@KeithMoyer in google#392
@vbalain in google#405
@aimixsaka in google#407
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0