-
Notifications
You must be signed in to change notification settings - Fork 18.8k
[DO NOT MERGE] daemon.ContainerLogs(): fix --follow #37656
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
e2df134
to
4acbb67
Compare
@@ -95,7 +95,7 @@ github.com/philhofer/fwd 98c11a7a6ec829d672b03833c3d69a7fae1ca972 | |||
github.com/tinylib/msgp 3b556c64540842d4f82967be066a7f7fffc3adad | |||
|
|||
# fsnotify | |||
github.com/fsnotify/fsnotify v1.4.7 | |||
github.com/fsnotify/fsnotify c9e9bfb647855178ec5f3947c02e6bd47a379eb9 https://github.com/kolyshkin/fsnotify/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per fsnotify/fsnotify#183 (comment) and fsnotify/fsnotify#245 (comment) the project is looking for new maintainers; perhaps we can assist in that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll start with a few PRs :)
4acbb67
to
b05dacb
Compare
Codecov Report
@@ Coverage Diff @@
## master #37656 +/- ##
=========================================
Coverage ? 36.04%
=========================================
Files ? 609
Lines ? 45018
Branches ? 0
=========================================
Hits ? 16225
Misses ? 26565
Partials ? 2228 |
cb3c07d
to
63244e8
Compare
Windows CI failure looks legit:
|
docker.py test case hang is also legit; made a couple of fixes to tests: docker/docker-py#2121; will retest. |
63244e8
to
57af395
Compare
z ci needs to be restarted |
Merge conflict needs to be resolved |
57af395
to
1bad926
Compare
Rebased; local log driver patched as well (to remove readers). |
This code has many return statements, for some of them the "end logs" or "end stream" message was not printed, giving the impression that this "for" loop never ended. Make sure that "begin logs" is to be followed by "end logs". Signed-off-by: Kir Kolyshkin <kolyshk 8000 in@gmail.com>
This ia a test case for issue moby#37391. The idea is to start a container that produces lots of output, then run "docker logs --follow" on the above container, than kill "docker logs" and check whether the number of - daemon goroutines, - daemon opened file descriptors are back to whatever they were. Currently, this test reliably detects the leak. PS here's what it takes to run the test case against the local daemon: for i in busybox busybox:glibc hello-world debian:jessie; do docker pull $i; done cd integration ln -s ../Dockerfile . cd container go test -v -run TestLogsFollowGoroutineLeak Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
To include fsnotify/fsnotify#260 (fix for fsnotify/fsnotify#194). Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This commit kills three birds with one stone. I. Goroutine leak on docker logs --follow (moby#37391). When daemon.ContainerLogs() is called with options.follow=true (as in "docker logs --follow"), the "loggerutils.followLogs()" function never returns (even then the logs consumer is gone). As a result, all the resources associated with it (including an opened file descriptor for the log file being read, two FDs for a pipe, and two FDs for inotify watch) are never released. If this is repeated (such as by running "docker logs --follow" and pressing Ctrl-C a few times), this results in DoS caused by either hitting the limit of inotify watches, or the limit of opened files. The only cure is daemon restart. Apparently, what happens is: 1. logs producer (a container) is gone, calling (*LogWatcher).Close() for all its readers (daemon/logger/jsonfilelog/jsonfilelog.go:175). 2. WatchClose() is properly handled by a dedicated goroutine in followLogs(), cancelling the context. 3. Upon receiving the ctx.Done(), the code in followLogs() (daemon/logger/loggerutils/logfile.go#L626-L638) keeps to send messages _synchronously_ (which is OK for now). 4. Logs consumer is gone (Ctrl-C is pressed on a terminal running "docker logs --follow"). Method (*LogWatcher).Close() is properly called (see daemon/logs.go:114). Since it was called before and due to to once.Do(), nothing happens (which is kinda good, as otherwise it will panic on closing a closed channel). 5. A goroutine (see item 3 above) keeps sending log messages synchronously to the logWatcher.Msg channel. Since the channel reader is gone, the channge send operation blocks forever, and resource cleanup set up in defer statements at the beginning of followLogs() never happens. II. Premature exit of docker logs --follow (moby#37630). docker logs --follow should mimic tail -f, which only exits once the file it watches is gone. Same here: even if a container is stopped, docker logs --follow should not exit since the container can be restarted and continue producing logs. The two ways to exit "docker logs --follow" should be: 1. Kill it (as in Ctrl-C or kill). 2. Remove the container. III. Complicated logic of following logs. This is not an issue per se, but followLogs() is "pretty gnarly" (C) @cpuguy83. While we're at it, let's try to improve things a bit. Now onto the fix. 1. It appears to be that LogWatcher should not be bothered with log producer (i.e. container) being gone, so remove calls to `(*LogWatcher).Close()` from the log drivers, as well as the accompaniying data structures. This helps to solve issues II and III. 2. To clarify what `(*LogWatcher).Close()` actually is, rename it to `ConsumerGone()` (note before that patch it used to mean both `ProducerGone()` (which is eliminated in 1 above) and `ConsumerGone()`. Similarly, WatchClose() is now WatchConsumerGone(), etc. This is not required, but improves code readability a bit. 3. followLogs() is modified to - remove the context and the associated goroutine, which is no longer needed (fixes II and III); - remove blocking msg send (fixes I); - watch for Chmod event (which, if the file being watched is no more, means that the (opened) file is removed), exiting if the log file is removed (fixes II); - exit once ConsumerGone() is received, freeing all the resources (fixes I). 4. Test cases are modified due to II being fixed (i.e. now the container should be removed if we want "docker logs -f" to finish by itself). Should fix moby#37391 and moby#37630. [v2: fix conflicts after local driver merge, patch local driver] Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
TestLogsFollowNonStop is a test case for moby#3763. It checsk that ContainerLogs(opts.Follow=true): - won't stop even if the container is stopped; - keep reading logs once the container is restarted; - only stops when the container is removed. TestLogsFollowStopped checks that ContainerLogs(opts.Follow=true) works fine (i.e. waits for more logs) for a container that is currently stopped. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This fixes "docker logs --follow" to not exit (but wait for more logs) in case the container is currently not running. 1. ContainerLogs() does not modify the value of "follow" option. 2. ReadLogs() calls followLogs() even if the writer is closed (so that "docker logs --follow" will not exit on stopped container). Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
to include docker/docker-py#2121 Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
1bad926
to
865104b
Compare
rebased; should help CI |
As much as I like this PR (it is relatively simple and elegant, and removes a lot of code), I am afraid it won't work, for the following reasons:
With that said, I see no other way but to return to #37576 -- which is complicated and ugly, but backward-compatible and solves the initial issue (#37391). |
This is a continuation of #37576, much simplified and improved, also easier to review
This commit tires to kill three birds with one stone.
I. Goroutine leak on docker logs --follow (#37391).
When daemon.ContainerLogs() is called with options.follow=true
(as in "docker logs --follow"), the "loggerutils.followLogs()"
function never returns (even then the logs consumer is gone).
As a result, all the resources associated with it (including
an opened file descriptor for the log file being read, two FDs
for a pipe, and two FDs for inotify watch) are never released.
If this is repeated (such as by running "docker logs --follow"
and pressing Ctrl-C a few times), this results in DoS caused by
either hitting the limit of inotify watches, or the limit of
opened files. The only cure is daemon restart.
Apparently, what happens is:
logs producer (a container) is gone, calling (*LogWatcher).Close()
for all its readers (daemon/logger/jsonfilelog/jsonfilelog.go:175).
WatchClose() is properly handled by a dedicated goroutine in
followLogs(), cancelling the context.
Upon receiving the ctx.Done(), the code in followLogs()
(daemon/logger/loggerutils/logfile.go#L626-L638) keeps to
send messages synchronously (which is OK for now).
Logs consumer is gone (Ctrl-C is pressed on a terminal running
"docker logs --follow"). Method (*LogWatcher).Close() is properly
called (see daemon/logs.go:114). Since it was called before and
due to to once.Do(), nothing happens (which is kinda good, as
otherwise it will panic on closing a closed channel).
A goroutine (see item 3 above) keeps sending log messages
synchronously to the logWatcher.Msg channel. Since the
channel reader is gone, the channge send operation blocks forever,
and resource cleanup set up in defer statements at the beginning
of followLogs() never happens.
II. Premature exit of docker logs --follow (#37630).
docker logs --follow should mimic tail -f, which only exits once
the file it watches is gone. Same here: even if a container is
stopped, docker logs --follow should not exit since the container
can be restarted and continue producing logs. The two ways to
exit "docker logs --follow" should be:
Kill it (as in Ctrl-C or kill).
Remove the container.
III. Complicated logic of following logs.
This is not an issue per se, but followLogs() is "pretty gnarly"
(C) @cpuguy83. While we're at it, let's try to improve things a bit.
Now onto the fix.
It appears to be that LogWatcher should not be bothered with
log producer (i.e. container) being gone, so remove calls to
(*LogWatcher).Close()
from the log drivers, as well as theaccompaniying data structures. This helps to solve issues
II
andIII
.To clarify what
(*LogWatcher).Close()
actually is, rename itto
ConsumerGone()
(note before that patch it used to mean bothProducerGone()
(which is eliminated in 1 above) andConsumerGone()
.Similarly, WatchClose() is now WatchConsumerGone(), etc.
This is not required, but improves code readability a bit.
followLogs()
is modified to:needed (helps to fix
II
andIII
);I
);means that the (opened) file is removed), exiting if the log file
is removed (helps to fix
II
);resources (fixes
I
).Some existing test cases are modified due to
II
being fixed (i.e. now the containershould be removed if we want
docker logs -f
to finish by itself).A test case
TestLogsFollowGoroutineLeak
is added to check ifI
is fixed.ContainerLogs()
andReadLogs()
are modified to honor "follow" flageven if the container is not running.