Starting container fails with 'System error: read parent: connection reset by peer' · Issue #14203 · moby/moby · GitHub
Starting container fails with 'System error: read parent: connection reset by peer' #14203

Closed
mfrister opened this issue Jun 26, 2015 · 90 comments · Fixed by #19751
Assignees
Labels
kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed. priority/P2 Normal priority: default priority applied.
Milestone

Comments

@mfrister

On our CI server, we run tests in Docker containers using docker-compose. We link 2-15 containers during one run. We ensure that test jobs running concurrently have different docker-compose project names.

Since we upgraded Docker to 1.7.0 (from 1.6) and docker-compose to 1.3.1 (from 1.2), and started killing containers instead of stopping them (it's faster, and we remove them anyway), containers have twice failed to start with the following message from compose:

Creating x_db_1...
Creating x_1...
Cannot start container 10bbc5af8ec0d3bb39b207a6474ec70a0954bff01ff94389684a8b9f52df6067: [8] System error: read parent: connection reset by peer

/var/log/docker.log contains the following:

time="2015-06-25T10:29:44.322521665+02:00" level=info msg="POST /v1.18/containers/19aec1ddb8a5cd771771f16a1f8929bb58eea2cf7e877425a7812f6c6e5756a2/start" 
time="2015-06-25T10:29:44.690044235+02:00" level=warning msg="signal: killed" 
time="2015-06-25T10:29:44.915997839+02:00" level=error msg="Handler for POST /containers/{name:.*}/start returned error: Cannot start container 19aec1ddb8a5cd771771f16a1f8929bb58eea2cf7e877425a7812f6c6e5756a2: [8] System error: read parent: connection reset by peer" 
time="2015-06-25T10:29:44.916111471+02:00" level=error msg="HTTP Error" err="Cannot start container 19aec1ddb8a5cd771771f16a1f8929bb58eea2cf7e877425a7812f6c6e5756a2: [8] System error: read parent: connection reset by peer" statusCode=500 

The container is created, but doesn't start. Trying to manually start it using docker start fails with the same error message. Memory is available and the kernel log doesn't show any message from the OOM killer.
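
To see how often the daemon has hit this, the log can be searched for the error string. A minimal sketch, assuming a POSIX shell and a daemon that logs to /var/log/docker.log as quoted above; the function name count_start_failures is made up for illustration:

```shell
#!/bin/sh
# Count failed container starts caused by this error in a Docker daemon log.
# The default path is the log file quoted above; pass another path as $1
# if your daemon logs elsewhere.
count_start_failures() {
    log_file="${1:-/var/log/docker.log}"
    # Each failure is logged twice (a "Handler for POST" line and an
    # "HTTP Error" line), so only the handler lines are counted.
    grep 'read parent: connection reset by peer' "$log_file" \
        | grep -c 'Handler for POST'
}
```

Calling `count_start_failures` with no argument prints the number of affected start requests in the default log file.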

Restarting docker temporarily solves the problem, so I assume this is a problem with docker itself, not with docker-compose.

docker version:

Client version: 1.7.0
Client API version: 1.19
Go version (client): go1.4.2
Git commit (client): 0baf609
OS/Arch (client): linux/amd64
Server version: 1.7.0
Server API version: 1.19
Go version (server): go1.4.2
Git commit (server): 0baf609
OS/Arch (server): linux/amd64

docker info:

Containers: 30
Images: 451
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 517
Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.16.0-0.bpo.4-amd64
Operating System: Debian GNU/Linux 7 (wheezy)
CPUs: 4
Total Memory: 15.52 GiB
Name: <hostname>
ID: HYYT:WNZW:UPU7:VI2O:HUTP:EZVV:2MQ2:WCRJ:3SHJ:LZXF:MVLS:P3XC
WARNING: No memory limit support
WARNING: No swap limit support

uname -a:

<hostname> 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt9-3~deb8u1~bpo70+1 (2015-04-27) x86_64 GNU/Linux

Environment details (AWS, VirtualBox, physical, etc.): Physical machine

Steps to Reproduce:

  1. docker-compose run ..., kill and rm 50-200 (estimated) containers with links using docker-compose
  2. At some point, docker fails to start a container.

Actual Results: Starting the container fails.

Expected Results: Container starts.

Additional info:

Kubernetes seemed to have a similar problem, apparently some sort of race. The issue also links to a few occurrences where people had a similar problem (mainly on IRC).

@mfrister mfrister changed the title Starting container fails with System error: read parent: connection reset by peer Starting container fails with 'System error: read parent: connection reset by peer' Jun 26, 2015
eremite added a commit to eremite/docker_rails_app that referenced this issue Jun 26, 2015
In the hopes that it'll prevent "System error: read parent: connection
reset by peer" errors. See moby/moby#14203
@cjcullen
cjcullen commented Jul 6, 2015

I've included my repro instructions here: kubernetes/kubernetes#9822 (comment)

Unfortunately, I don't know how to get docker into the magic state where this is reproducible.

@dchen1107
Contributor

Please note that the issue we found with Kubernetes is with the Docker 1.6.2 release.

@shapiroj

We see this exact error in several of our containers. Our only workaround is to rename the container. Interested in hearing other workarounds or solutions.

@njuicsgz

Any advice for this bug? I have suffered from it for a long time with Docker v1.6.2 in our production environment.

@cpuguy83
Member

Seems like an error connecting to sqlite, or rather while reading it.

@jjelev
jjelev commented Aug 4, 2015

Docker 1.7.1 and Ubuntu 14.04. Updated my environment through apt-get and restarted. Nginx container no longer failed to start.

@mfrister
Author
mfrister commented Aug 4, 2015

@jjelev Yep, as stated in the original description, restarting Docker temporarily fixes the problem. Unfortunately, the problem reappears later.

@airhorns
airhorns commented Aug 4, 2015

We're seeing this too, will try and dig up some more information. We're simultaneously executing a lot of the same container, might have something to do with it? Most executions work fine but this has just started happening.

@airhorns
airhorns commented Aug 4, 2015

Wow, as @cjcullen has seen, the length of the docker run command seems to have something to do with this. I changed the length of one of the -e vars and my deterministic container launch failure went away.

@emslade
emslade commented Aug 5, 2015

I've been having the same issue. Added an extra space to the failing command and it worked. So odd.

@rflynn
rflynn commented Aug 5, 2015

We've hit this bug twice this morning after never seeing it in months of Docker use. Re: string length, we have made tweaks to our command recently.

@mfrister
Author

I didn't expect the workaround to work, but after adding a few spaces in an environment variable, we have now had a full week without the error occurring. Previously, it occurred almost daily.

We added this environment variable to all our compose containers:

DOCKER_FIX: '                                        '
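
For containers started with plain docker run rather than compose, the same kind of padding can be passed with -e. A minimal sketch, assuming a POSIX shell; the padding helper and the example/image name are made up for illustration, and the default width of 40 is arbitrary:

```shell
#!/bin/sh
# Print a string of N spaces (default 40) to use as a padding value.
padding() {
    printf "%${1:-40}s" ''
}

# Usage (example/image is a placeholder, not a real image):
#   docker run -e "DOCKER_FIX=$(padding)" example/image
```

Since the failure appears to depend on the length of the start request, a different width may be needed on a given host.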

@andrecp
andrecp commented Aug 20, 2015

I am having this same problem, and I have the same Docker version as the OP.

@andrecp
andrecp commented Aug 23, 2015

I also had to add

        - DOCKER_FIX='                                        '

to all my dockerbuild files for this error to go away...

@chrisjhoughton

+1 for the weird DOCKER_FIX!

@airhorns

@burke or @sirupsen do you guys have any ideas on this one?

@burke
Contributor
burke commented Sep 16, 2015

Nope, this is weird, I can't imagine what it would be.

@AlbertodelaCruz

+1 adding a FOO env variable. Docker version 1.8.2 and docker-compose 1.4.0.
Curiously, not all hosts suffer this behaviour and need the variable.

alapidas added a commit to control-center/serviced that referenced this issue Oct 1, 2015
alapidas added a commit to control-center/serviced that referenced this issue Oct 1, 2015
This is a fix for moby/moby#14203

(cherry picked from commit 491e35b)
@haoyangz
haoyangz commented Oct 5, 2015

@meeee @andrecp @AlbertodelaCruz @chrisjhoughton Just to clarify, by adding a FOO env variable do you mean adding a line like the following to the Dockerfile?

ENV DOCKER_FIX randomvalue

I tried this but it doesn't work for me...

@rachit1arora

Hello, I have also encountered this problem in Docker version 1.9.1. I am encountering it in our production environment when we do a docker exec, not during docker run.

I know that the fix is available in Docker 1.10, but is there a way to get the fix in Docker 1.9.1? We may not be able to migrate to Docker 1.10 soon.

Is there a workaround we can try for the docker exec command? Many people reported that docker run -e DOCKER_FIX='' worked for them. How can we resolve this for docker exec?

@extemporalgenome

You could try passing in a longer command line, for example:

docker exec the-container sh -c "intended-command args # extra padding junk in comment"
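
The padded command string can also be generated rather than typed by hand. A minimal sketch, assuming a POSIX shell; the padded_command helper and the filler width are made up for illustration, and nothing in this thread confirms that padding reliably avoids the error for docker exec:

```shell
#!/bin/sh
# Append a shell comment of N filler characters (default 40) to a command
# string. The " # " starts a comment, so the filler is never executed.
padded_command() {
    cmd="$1"
    n="${2:-40}"
    filler="$(printf "%${n}s" '' | tr ' ' 'x')"
    printf '%s # %s' "$cmd" "$filler"
}

# Usage (the-container and intended-command are placeholders):
#   docker exec the-container sh -c "$(padded_command 'intended-command args')"
```

Because the filler sits behind a comment marker, the command that runs inside the container is unchanged; only the length of the request differs.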


yujuhong added a commit to yujuhong/kubernetes that referenced this issue Apr 26, 2016
k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this issue Apr 27, 2016
Automatic merge from submit-queue

e2e: add a dummy environment variable in the service tests

This works around the docker bug:
moby/moby#14203
yujuhong added a commit to yujuhong/kubernetes that referenced this issue Apr 27, 2016
chrislovecnm pushed a commit to chrislovecnm/kubernetes that referenced this issue Apr 28, 2016
alena1108 pushed a commit to rancher/kubernetes that referenced this issue May 20, 2016
@loretoparisi

I get the same error with this Dockerfile:

FROM ubuntu:16.04
COPY . /app
VOLUME /app

I was doing a long running task using tensorflow with

./nvidia-docker-run --volumes-from myImage --rm -it tensorflow/tensorflow:0.10.0-gpu bash

@vielmetti

I'm getting the error

System error: json: cannot unmarshal object into Go value of type libcontainer.syncType.

on CoreOS. docker -v reports Docker version 1.10.3, build 1f8f545

@thaJeztah
Member

@vielmetti it looks like the JSON of one of your containers possibly got corrupted. It might be worth trying to find which one and either removing that container or trying to fix the JSON. Also keep in mind that CoreOS ships with a modified version of Docker (see coreos@1f8f545 for the commit it's built from), and issues should be reported in their issue tracker first.

@vielmetti

Thanks @thaJeztah, I opened a CoreOS issue, which appears to be unrelated to this particular issue (the error text is the same but the reproduction is different).

shyamjvs pushed a commit to shyamjvs/kubernetes that referenced this issue Dec 1, 2016
shouhong pushed a commit to shouhong/kubernetes that referenced this issue Feb 14, 2017
@shawntoxu

In which version is this bug resolved?

@thaJeztah
Member

@shawntoxu see the milestone attached to this issue; it's in docker 1.10.0
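
A quick way to compare a host against that milestone is to check the version string. A minimal sketch, assuming a POSIX shell and a version of the form MAJOR.MINOR.PATCH; the has_fix name is made up for illustration, and the check says nothing about distribution forks, which may carry different code under the same version number:

```shell
#!/bin/sh
# Succeed if the given Docker version is 1.10.0 or newer, the release this
# issue's milestone points at. Only the major and minor parts are compared.
has_fix() {
    major="${1%%.*}"
    rest="${1#*.}"
    minor="${rest%%.*}"
    [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 10 ]; }
}

# Usage, on releases whose client supports --format:
#   has_fix "$(docker version --format '{{.Server.Version}}')" && echo "fix included"
```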

@shawntoxu
shawntoxu commented Jun 29, 2017 via email

@jfdoerre
jfdoerre commented Jul 6, 2017

We are still seeing this sporadic error "... [9] System error: read parent: connection reset by peer" with Docker 1.10.3. Maybe it is because we are on CentOS 7.2?

It seems the fix for this error requires opencontainers/runc, but I guess we don't use that in our setup.
What we are using are the docker RPMs from the CentOS repository:

# cat /etc/centos-release
CentOS Linux release 7.2.1511 (Core)
# uname -a
Linux ilgnext-jenkins.svl.ibm.com 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
# docker -v
Docker version 1.10.3, build 9419b24-unsupported

Any chances that this error is fixed in a more recent version for CentOS, e.g. in CentOS7.3?

Otherwise, is there a reliable way to test whether the error no longer appears on my upgraded system?

@thaJeztah
Member

@jfdoerre docker 1.10 is no longer maintained, and the version in the CentOS repository is the Red Hat fork of docker, for which the code doesn't live in this repository.

> Any chances that this error is fixed in a more recent version for CentOS, e.g. in CentOS7.3?

Be aware that CentOS is a rolling release, which means that once 7.3 was released, 7.2 stopped receiving updates, so it's indeed recommended to be on the current version.

If you're still seeing this on the current (17.03 or 17.06) release of the official Docker packages, please open a new issue.

@thaJeztah
Member

I'm locking the conversation on this issue, because the original issue was resolved; if you encounter this issue on an up to date version of docker, please open a new issue instead.

@moby moby locked and limited conversation to collaborators Jul 6, 2017