8000 Windows: Try to detach sandbox on cleanup to avoid permission denied by lowenna · Pull Request #37712 · moby/moby · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Windows: Try to detach sandbox on cleanup to avoid permission denied #37712

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Sep 9, 2018

Conversation

lowenna
Copy link
Member
@lowenna lowenna commented Aug 23, 2018

Fixes #36218

A few different scenarios, but it's possible that the daemon can leak sandbox/scratch layers in the Windows graphdriver, and have side effects where containers can't be removed. Here's one such case:

Before:

C:\>start dockerd -g c:\control

C:\>docker run --rm -d --name test microsoft/windowsservercore powershell sleep 30
8058fa79bb1c1b580b32bc2e4e27f22090a27fd8cef17aaa31ae02342982e7d3

C:\>taskkill /T /IM dockerd.exe
SUCCESS: Sent termination signal to process with PID 21396, child of PID 21248.
SUCCESS: Sent termination signal to process with PID 21248, child of PID 29336.

C:\>start dockerd -g c:\control

C:\>docker ps -a
CONTAINER ID        IMAGE                         COMMAND                 CREATED             STATUS                PORTS               NAMES
8058fa79bb1c        microsoft/windowsservercore   "powershell sleep 30"   16 seconds ago      Removal In Progress       
8000
                test

C:\>docker rm -f 8058
Error response from daemon: container 8058fa79bb1c1b580b32bc2e4e27f22090a27fd8cef17aaa31ae02342982e7d3: driver "windowsfilter" failed to remove root filesystem: rename C:\control\windowsfilter\8058fa79bb1c1b580b32bc2e4e27f22090a27fd8cef17aaa31ae02342982e7d3 C:\control\windowsfilter\8058fa79bb1c1b580b32bc2e4e27f22090a27fd8cef17aaa31ae02342982e7d3-removing: Access is denied.

C:\>dir control\windowsfilter
 Volume in drive C has no label.
 Volume Serial Number is 2E9A-473A

 Directory of C:\control\windowsfilter

08/23/2018  04:16 PM    <DIR>          .
08/23/2018  04:16 PM    <DIR>          ..
08/23/2018  04:16 PM    <DIR>          8058fa79bb1c1b580b32bc2e4e27f22090a27fd8cef17aaa31ae02342982e7d3
08/23/2018  09:04 AM    <DIR>          efa7700f8e4b54cdb068d0c08d600b483846091611cf913f3ba411525207bf93
               0 File(s)              0 bytes
               4 Dir(s)  117,058,809,856 bytes free

C:\>

After:

C:\>start dockerd -g c:\control

C:\>docker run --rm -d --name test microsoft/windowsservercore powershell sleep 30
01779097b29c3d0b04373c6f1b929f79ccce72a6f43689ade7e9f4cce9a977cd

C:\>taskkill /T /IM dockerd.exe
SUCCESS: Sent termination signal to process with PID 6172, child of PID 27248.
SUCCESS: Sent termination signal to process with PID 27248, child of PID 29336.

C:\>start dockerd -g c:\control

C:\>docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

C:\>dir control\windowsfilter
 Volume in drive C has no label.
 Volume Serial Number is 2E9A-473A

 Directory of C:\control\windowsfilter

08/23/2018  04:14 PM    <DIR>          .
08/23/2018  04:14 PM    <DIR>          ..
08/23/2018  09:04 AM    <DIR>          efa7700f8e4b54cdb068d0c08d600b483846091611cf913f3ba411525207bf93
               0 File(s)              0 bytes
               3 Dir(s)  117,096,996,864 bytes free

C:\>

@lowenna
Copy link
Member Author
lowenna commented Aug 23, 2018

Note this might not (won't) pass vendoring until #37674 is merged.

@lowenna
Copy link
Member Author
lowenna commented Aug 23, 2018

@johnstep PTAL. @jterry75 As mentioned offline, FYI.

@lowenna
Copy link
Member Author
lowenna commented Aug 23, 2018

@carlfischer1. Another backport candidate.

// before retrying the rename.
if os.IsPermission(err) {
f := filepath.Join(layerPath, "sandbox.vhdx")
if _, exists := os.Stat(f); exists == nil {
Copy link
Member
@johnstep johnstep Aug 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of strange to call the error exists, since the true case is nil, but okay, I guess.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree this is confusing to read. Does can we just call DetachVhd directly instead of checking existence first?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update to address both of these.

return err // the original error from rename
}
}
err = os.Rename(layerPath, tmpLayerPath)
Copy link
Member
@johnstep johnstep Aug 24, 2018

Choose a reason for hiding this comment

The reason will be displa 8000 yed to describe this comment to others. Learn more.

Is it really worth retrying if nothing changed (the sandbox file did not exist)?

@codecov
Copy link
codecov bot commented Aug 24, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@7aa797f). Click here to learn what that means.
The diff coverage is n/a.

@@            Coverage Diff            @@
##             master   #37712   +/-   ##
=========================================
  Coverage          ?   36.02%           
=========================================
  Files             ?      610           
  Lines             ?    45071           
  Branches          ?        0           
=========================================
  Hits              ?    16238           
  Misses            ?    26603           
  Partials          ?     2230

@lowenna
Copy link
Member Author
lowenna commented Aug 24, 2018

@cpuguy83 @johnstep Updated. Still requires a rebase after #37674 is merged before it will fully pass CI

@lowenna
Copy link
Member Author
lowenna commented Aug 27, 2018

Rebased. Windows CI and vendor passed (the only path affected here). Rest are completing.

@cpuguy83 are you OK with this now?

@thaJeztah Could you take a look too please?

Copy link
Member
@cpuguy83 cpuguy83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add some additional context to the error messages?

if err = os.Rename(layerPath, tmpLayerPath); err != nil && !os.IsNotExist(err) {
return err
}
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, but a little cleaner if we return early when "!os.IsPermission(err)"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oooh, missed that comment @cpuguy83. Yes, that's a lot cleaner. Updated.

Signed-off-by: John Howard <jhoward@microsoft.com>
@cpuguy83
Copy link
Member
cpuguy83 commented Sep 6, 2018

Can you just add some extra context to the new errors, e.g. errors.Wrap(err, "failed to detach VHD")

@lowenna
Copy link
Member Author
lowenna commented Sep 6, 2018

Can you just add some extra context to the new errors, e.g. errors.Wrap(err, "failed to detach VHD")

@cpuguy83 Done.

Copy link
Member
@cpuguy83 cpuguy83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -6,7 +6,7 @@ import (
"bufio"
"bytes"
"encoding/json"
"errors"
//"errors"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doh. Fixing....

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed now. Good catch 👍

Signed-off-by: John Howard <jhoward@microsoft.com>

This is a fix for a few related scenarios where it's impossible to remove layers or containers
until the host is rebooted. Generally (or at least easiest to repro) through a forced daemon kill
while a container is running.

Possibly slightly worse than that, as following a host reboot, the scratch layer would possibly be leaked and
left on disk under the dataroot\windowsfilter directory after the container is removed.

One such example of a failure:

1. run a long running container with the --rm flag
docker run --rm -d --name test microsoft/windowsservercore powershell sleep 30
2. Force kill the daemon not allowing it to cleanup. Simulates a crash or a host power-cycle.
3. (re-)Start daemon
4. docker ps -a
PS C:\control> docker ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS                PORTS               NAMES
7aff773d782b        malloc              "powershell start-sl…"   11 seconds ago      Removal In Progress                       malloc
5. Try to remove
PS C:\control> docker rm 7aff
Error response from daemon: container 7aff773d782bbf35d95095369ffcb170b7b8f0e6f8f65d5aff42abf61234855d: driver "windowsfilter" failed to remove root filesystem: rename C:\control\windowsfilter\7aff773d782bbf35d95095369ffcb170b7b8f0e6f8f65d5aff42abf61234855d C:\control\windowsfilter\7aff773d782bbf35d95095369ffcb170b7b8f0e6f8f65d5aff42abf61234855d-removing: Access is denied.
PS C:\control>

Step 5 fails.
@lowenna
Copy link
Member Author
lowenna commented Nov 9, 2018

@thaJeztah Do you know if this will make it into 18.09 EE? Not sure what the cut-off was.

@thaJeztah
Copy link
Member

@jhowardmsft looks like it didn't (but will double check when I'm back at my keyboard), I can see I opened a backport request for it, but it slipped of my radar.

I think a patch release is scheduled, so I'll open backports where needed, so that it will be included in that release

@lowenna
Copy link
Member Author
lowenna commented Nov 9, 2018

@thaJeztah Thanks. Kind of an important one as I know it fixes multiple production issues I've had to debug.

@drnybble
Copy link

Still seeing this with 19.03.2 on a clean Windows 2016 VM created from ISO image and fully patched.

driver "windowsfilter" failed to remove root filesystem: hcsshim::GetComputeSystems: Access is denied.

@thaJeztah
Copy link
Member

@drnybble that error message looks slightly different than the one mentioned here; perhaps it's a different issue; could you open a new ticket? (and include a link to this PR and the related issue #36218 to make it easier to find)?

@Bernolt
Copy link
Bernolt commented Nov 20, 2019

We had the same issue as @drnybble. (We used Unlocker Windows to delete the files)

@leiwang008
DA90
Copy link
leiwang008 commented Mar 24, 2020

I am still facing this problem. I use the "Docker Desktop" in Windows 10 OS, I built image with "mcr.microsoft.com/windows/servercore:1709", during the build I got the following message

**Error removing intermediate container 720292babe10: container 720292babe105f947428d0ca5fd02533277fc8788f9303e0752ee3eadf2b4439: driver "windowsfilter" failed to remove root filesystem: failed to detach VHD: The device is not ready.: rename C:\ProgramData\Docker\windowsfilter\720292babe105f947428d0ca5fd02533277fc8788f9303e0752ee3eadf2b4439 C:\ProgramData\Docker\windowsfilter\720292babe105f947428d0ca5fd02533277fc8788f9303e0752ee3eadf2b4439-removing: Access is denied.**

I could not remove the left-behind containers with command "docker container prune --force", after I restarted my computer, I could use that command to delete the left-behind containers.

Below is the information of "Docker" that I am using

Client: Docker Engine - Community
 Version:           19.03.5
 API version:       1.40
 Go version:        go1.12.12
 Git commit:        633a0ea
 Built:             Wed Nov 13 07:22:37 2019
 OS/Arch:           windows/amd64
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          19.03.5
  API version:      1.40 (minimum version 1.24)
  Go version:       go1.12.12
  Git commit:       633a0ea
  Built:            Wed Nov 13 07:36:50 2019
  OS/Arch:          windows/amd64
  Experimental:     true

@aries1980
Copy link

I have the same issue as #37712 (comment) .

PS C:\Windows\system32> [System.Environment]::OSVersion.Version

Major  Minor  Build  Revision
-----  -----  -----  --------
10     0      17763  0

aka /aws/service/ami-windows-latest/Windows_Server-2019-English-Core-ECS_Optimized

PS C:\Windows\system32> docker version
Client: Mirantis Container Runtime
 Version:           19.03.13
 API version:       1.40
 Go version:        go1.13.15
 Git commit:        0c38b2d
 Built:             11/12/2020 21:33:08
 OS/Arch:           windows/amd64
 Experimental:      false

Server: Mirantis Container Runtime
 Engine:
  Version:          19.03.13
  API version:      1.40 (minimum version 1.24)
  Go version:       go1.13.15
  Git commit:       5b5efb2d49
  Built:            11/12/2020 21:31:44
  OS/Arch:          windows/amd64
  Experimental:     false

running the jenkins/inbound-agent:windowsservercore-ltsc2019 container image.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to Remove Containers: driver "windowsfilter" failed to remove root filesystem
0