-
Notifications
You must be signed in to change notification settings - Fork 18.8k
Windows: Try to detach sandbox on cleanup to avoid permission denied #37712
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Note this might not (won't) pass vendoring until #37674 is merged. |
@carlfischer1. Another backport candidate. |
// before retrying the rename. | ||
if os.IsPermission(err) { | ||
f := filepath.Join(layerPath, "sandbox.vhdx") | ||
if _, exists := os.Stat(f); exists == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kind of strange to call the error exists
, since the true case is nil
, but okay, I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree this is confusing to read. Does can we just call DetachVhd
directly instead of checking existence first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll update to address both of these.
return err // the original error from rename | ||
} | ||
} | ||
err = os.Rename(layerPath, tmpLayerPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displa 8000 yed to describe this comment to others. Learn more.
Is it really worth retrying if nothing changed (the sandbox file did not exist)?
Codecov Report
@@ Coverage Diff @@
## master #37712 +/- ##
=========================================
Coverage ? 36.02%
=========================================
Files ? 610
Lines ? 45071
Branches ? 0
=========================================
Hits ? 16238
Misses ? 26603
Partials ? 2230 |
Rebased. Windows CI and vendor passed (the only path affected here). Rest are completing. @cpuguy83 are you OK with this now? @thaJeztah Could you take a look too please? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add some additional context to the error messages?
if err = os.Rename(layerPath, tmpLayerPath); err != nil && !os.IsNotExist(err) { | ||
return err | ||
} | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, but a little cleaner if we return early when "!os.IsPermission(err)"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oooh, missed that comment @cpuguy83. Yes, that's a lot cleaner. Updated.
Signed-off-by: John Howard <jhoward@microsoft.com>
Can you just add some extra context to the new errors, e.g. |
@cpuguy83 Done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -6,7 +6,7 @@ import ( | |||
"bufio" | |||
"bytes" | |||
"encoding/json" | |||
"errors" | |||
//"errors" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doh. Fixing....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed now. Good catch 👍
Signed-off-by: John Howard <jhoward@microsoft.com> This is a fix for a few related scenarios where it's impossible to remove layers or containers until the host is rebooted. Generally (or at least easiest to repro) through a forced daemon kill while a container is running. Possibly slightly worse than that, as following a host reboot, the scratch layer would possibly be leaked and left on disk under the dataroot\windowsfilter directory after the container is removed. One such example of a failure: 1. run a long running container with the --rm flag docker run --rm -d --name test microsoft/windowsservercore powershell sleep 30 2. Force kill the daemon not allowing it to cleanup. Simulates a crash or a host power-cycle. 3. (re-)Start daemon 4. docker ps -a PS C:\control> docker ps -a CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 7aff773d782b malloc "powershell start-sl…" 11 seconds ago Removal In Progress malloc 5. Try to remove PS C:\control> docker rm 7aff Error response from daemon: container 7aff773d782bbf35d95095369ffcb170b7b8f0e6f8f65d5aff42abf61234855d: driver "windowsfilter" failed to remove root filesystem: rename C:\control\windowsfilter\7aff773d782bbf35d95095369ffcb170b7b8f0e6f8f65d5aff42abf61234855d C:\control\windowsfilter\7aff773d782bbf35d95095369ffcb170b7b8f0e6f8f65d5aff42abf61234855d-removing: Access is denied. PS C:\control> Step 5 fails.
@thaJeztah Do you know if this will make it into 18.09 EE? Not sure what the cut-off was. |
@jhowardmsft looks like it didn't (but will double check when I'm back at my keyboard), I can see I opened a backport request for it, but it slipped of my radar. I think a patch release is scheduled, so I'll open backports where needed, so that it will be included in that release |
@thaJeztah Thanks. Kind of an important one as I know it fixes multiple production issues I've had to debug. |
Still seeing this with 19.03.2 on a clean Windows 2016 VM created from ISO image and fully patched. driver "windowsfilter" failed to remove root filesystem: hcsshim::GetComputeSystems: Access is denied. |
We had the same issue as @drnybble. (We used Unlocker Windows to delete the files) |
I am still facing this problem. I use the "Docker Desktop" in Windows 10 OS, I built image with "mcr.microsoft.com/windows/servercore:1709", during the build I got the following message
I could not remove the left-behind containers with command "docker container prune --force", after I restarted my computer, I could use that command to delete the left-behind containers. Below is the information of "Docker" that I am using
|
I have the same issue as #37712 (comment) .
aka
running the |
Fixes #36218
A few different scenarios, but it's possible that the daemon can leak sandbox/scratch layers in the Windows graphdriver, and have side effects where containers can't be removed. Here's one such case:
Before:
After: