bugfix:close container io when runtime create failed by ningmingxiao · Pull Request #11885 · containerd/containerd · GitHub

bugfix:close container io when runtime create failed #11885


Merged 1 commit into containerd:main on Jun 22, 2025


ningmingxiao
Contributor
@ningmingxiao ningmingxiao commented May 22, 2025

@k8s-ci-robot

Hi @ningmingxiao. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ningmingxiao
Contributor Author
ningmingxiao commented May 24, 2025

How it happened (@apostasie @AkihiroSuda):
Two nerdctl processes are hanging. The old container exited and, because of restart=always, the container is recreated.

[root@host]# nerdctl run -d --name testbug --restart always --network foo busybox:1.28
FATA[0000] failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running createRuntime hook #0: exit status 1, stdout: , stderr: time="2025-05-24T20:36:47+08:00" level=fatal msg="failed to call cni.Setup: plugin type=\"ipvlan\" failed (add): error dialing DHCP daemon: dial unix /run/cni/dhcp.sock: connect: no such file or directory

root      101602  0.3  0.0 1253360 26012 pts/1   Sl   20:41   0:00 /usr/local/bin/nerdctl _NERDCTL_INTERNAL_LOGGING /var/lib/nerdctl/1935db59
root      101714  0.0  0.0 1232680 11132 pts/1   Sl   20:41   0:00 /usr/local/bin/containerd-shim-runc-v2 -namespace default -id 64b95ba09dcd8e519
root      101727  0.0  0.0 1251760 23796 pts/1   Sl   20:41   0:00  \_ /usr/local/bin/nerdctl _NERDCTL_INTERNAL_LOGGING /var/lib/nerdctl/1935db59

goroutine 5 [syscall]:
syscall.Syscall(0x49, 0x9, 0x2, 0x0)
/usr/local/go/src/syscall/syscall_linux.go:73 +0x25
golang.org/x/sys/unix.Flock(0xc00019fcf8?, 0x50021e?)
/root/go/pkg/mod/golang.org/x/sys@v0.33.0/unix/zsyscall_linux.go:821 +0x32
github.com/containerd/nerdctl/v2/pkg/internal/filesystem.flock(0xc0000a4280?, 0x2)
/home/nmx/github.com/ningmingxiao/nerdctl/pkg/internal/filesystem/lockutil_unix.go:51 +0x65
github.com/containerd/nerdctl/v2/pkg/internal/filesystem.WithDirLock({0xc0000a4280, 0x79}, 0xc00019fed8)
/home/nmx/github.com/ningmingxiao/nerdctl/pkg/internal/filesystem/lockutil_unix.go:37 +0xa8
github.com/containerd/nerdctl/v2/pkg/logging.Main.loggerFunc.func1({0x157a9c0, 0xc00048e000}, 0xc000494000, 0xc000046040)
/home/nmx/github.com/ningmingxiao/nerdctl/pkg/logging/logging.go:317 +0x38d
github.com/containerd/containerd/v2/core/runtime/v2/logging.Run.func1()
/root/go/pkg/mod/github.com/containerd/containerd/v2@v2.1.0/core/runtime/v2/logging/logging_unix.go:49 +0x9b
created by github.com/containerd/containerd/v2/core/runtime/v2/logging.Run in goroutine 1
/root/go/pkg/mod/github.com/containerd/containerd/v2@v2.1.0/core/runtime/v2/logging/logging_unix.go:48 +0x298

The new nerdctl (pid 101727) hangs in flock, waiting for the lock to be released; the old nerdctl (pid 101602) still holds it.

func WithDirLock(dir string, fn func() error) error {
    _ = os.MkdirAll(dir, 0700)
    dirFile, err := os.Open(dir)
    if err != nil {
        return err
    }
    defer dirFile.Close()

    // The new nerdctl (pid 101727) hangs here.
    if err := flock(dirFile, unix.LOCK_EX); err != nil {
        return fmt.Errorf("failed to lock %q: %w", dir, err)
    }
    defer func() {

        if err := flock(dirFile, unix.LOCK_UN); err != nil {
            log.L.WithError(err).Errorf("failed to unlock %q", dir)
        }
    }()
    return fn()
}
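
The contention described above can be reproduced in isolation. The following is a minimal sketch, not nerdctl code: it assumes Linux, uses the standard library's syscall.Flock instead of the flock helper shown above, and adds LOCK_NB so the second attempt fails immediately instead of hanging. The names tryLockDir and demoLock are made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

// tryLockDir (hypothetical helper) opens dir and takes an exclusive,
// non-blocking flock on it. The returned handle holds the lock until closed.
func tryLockDir(dir string) (*os.File, error) {
	f, err := os.Open(dir)
	if err != nil {
		return nil, err
	}
	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX|syscall.LOCK_NB); err != nil {
		f.Close()
		return nil, err
	}
	return f, nil
}

// demoLock reports whether a second locker is refused while the first one
// holds the lock, and whether it succeeds once the first one lets go.
func demoLock() (bool, bool, error) {
	dir, err := os.MkdirTemp("", "dirlock")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	first, err := tryLockDir(dir)
	if err != nil {
		return false, false, err
	}

	// A separate open of the same directory is a separate lock owner,
	// even inside one process, so this attempt conflicts with `first`.
	_, err2 := tryLockDir(dir)
	refused := errors.Is(err2, syscall.EWOULDBLOCK)

	// Closing the descriptor releases the lock.
	first.Close()

	second, err := tryLockDir(dir)
	if err != nil {
		return refused, false, err
	}
	defer second.Close()
	return refused, true, nil
}

func main() {
	refused, acquired, err := demoLock()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("second locker refused while lock held:", refused)
	fmt.Println("second locker acquired after release:", acquired)
}
```

Without LOCK_NB, the second call would block indefinitely, which matches the state of pid 101727 in the trace above.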

The old nerdctl (pid 101602) hangs here:

goroutine 23 [select]:
io.(*pipe).read(0xc000480540, {0xc000568000, 0x1000, 0x1e9caa0?})
/usr/local/go/src/io/pipe.go:57 +0xa5
io.(*PipeReader).Read(0x7f4e4e2e3a68?, {0xc000568000?, 0x7f4e07752aa8?, 0x1000?})
/usr/local/go/src/io/pipe.go:134 +0x1a
bufio.(*Reader).fill(0xc000124f00)
/usr/local/go/src/bufio/bufio.go:110 +0x103
bufio.(*Reader).ReadSlice(0xc000124f00, 0xa)
/usr/local/go/src/bufio/bufio.go:376 +0x29
bufio.(*Reader).collectFragments(0xc000124f00, 0xa)
/usr/local/go/src/bufio/bufio.go:451 +0x70
bufio.(*Reader).ReadString(0x0?, 0x0?)
/usr/local/go/src/bufio/bufio.go:498 +0x1f
github.com/containerd/nerdctl/v2/pkg/logging.loggingProcessAdapter.func3({0x156ca40, 0xc000480540?}, 0xc000490150)
/home/nmx/github.com/ningmingxiao/nerdctl/pkg/logging/logging.go:256 +0x1ff
created by github.com/containerd/nerdctl/v2/pkg/logging.loggingProcessAdapter in goroutine 18
/home/nmx/github.com/ningmingxiao/nerdctl/pkg/logging/logging.go:267 +0x5c6

func loggingProcessAdapter(ctx context.Context, driver Driver, dataStore, address string, getContainerWait ContainerWaitFunc, config *logging.Config) error {
....
    processLogFunc := func(reader io.Reader, dataChan chan string) {
        defer wg.Done()
        defer close(dataChan)
        r := bufio.NewReader(reader)

        var err error

        for err == nil {
            var s string

            // The old nerdctl (pid 101602) hangs here, reading the old container's log.
            s, err = r.ReadString('\n')
            if len(s) > 0 {
                dataChan <- s
            }

            if err != nil && err != io.EOF {
                log.L.WithError(err).Error("failed to read log")
            }
        }
    }
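
Why this loop never returns can be shown with a plain io.Pipe. This is a minimal sketch under the assumption that the log stream behaves like a pipe whose write side stays open; readUntilError and demoPipe are illustrative names, not nerdctl functions.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"time"
)

// readUntilError has the same shape as the log-reading loop: it only
// returns once ReadString reports an error, which for a pipe means the
// write side was closed.
func readUntilError(r io.Reader, done chan<- []string) {
	var lines []string
	br := bufio.NewReader(r)
	for {
		s, err := br.ReadString('\n')
		if len(s) > 0 {
			lines = append(lines, s)
		}
		if err != nil {
			break
		}
	}
	done <- lines
}

// demoPipe writes one line, waits briefly to see whether the reader
// returns on its own, then closes the writer and collects the lines.
func demoPipe() (returnedBeforeClose bool, lines []string) {
	pr, pw := io.Pipe()
	done := make(chan []string, 1)
	go readUntilError(pr, done)

	fmt.Fprintln(pw, "hello")

	select {
	case lines = <-done:
		return true, lines
	case <-time.After(100 * time.Millisecond):
		// Reader is still blocked: nobody has closed the write side.
	}

	pw.Close() // only now does the reader see io.EOF
	return false, <-done
}

func main() {
	early, lines := demoPipe()
	fmt.Println("reader returned before writer closed:", early)
	fmt.Println("lines read:", len(lines))
}
```

The reader makes progress on data but cannot finish until whoever owns the write side closes it.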

The old nerdctl (pid 101602) is waiting for the shim to close the IO, but the shim does not close it when runtime create fails.
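
The general shape of such a fix, closing the IO on the error path, can be sketched as follows. This is not the actual patch in this PR and does not use containerd's types: containerIO, createTask, and runtimeCreate are hypothetical stand-ins chosen for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"time"
)

// containerIO is a hypothetical stand-in for the streams a shim keeps open.
type containerIO struct {
	stdout io.WriteCloser
	closed bool
}

func (c *containerIO) Close() error {
	c.closed = true
	return c.stdout.Close()
}

// createTask (hypothetical) calls runtimeCreate and, if it fails, closes
// the container IO so that any reader on the other end sees EOF.
func createTask(cio *containerIO, runtimeCreate func() error) (err error) {
	defer func() {
		if err != nil {
			cio.Close()
		}
	}()
	if err = runtimeCreate(); err != nil {
		return fmt.Errorf("runtime create failed: %w", err)
	}
	return nil
}

// demoCreateFailure wires a reader to the IO, makes create fail, and
// reports whether the IO was closed and whether the reader finished.
func demoCreateFailure() (closed bool, readerDone bool) {
	pr, pw := io.Pipe()
	cio := &containerIO{stdout: pw}

	done := make(chan struct{})
	go func() {
		io.Copy(io.Discard, pr) // returns once the write side is closed
		close(done)
	}()

	_ = createTask(cio, func() error { return errors.New("hook failed") })

	select {
	case <-done:
		readerDone = true
	case <-time.After(time.Second):
	}
	return cio.closed, readerDone
}

func main() {
	closed, readerDone := demoCreateFailure()
	fmt.Println("io closed after failed create:", closed)
	fmt.Println("reader unblocked:", readerDone)
}
```

Using a deferred check on the named error return keeps the cleanup in one place even if more failure points are added to the create path later.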

@ningmingxiao
Contributor Author
ningmingxiao commented May 27, 2025

Could you review this PR? Thank you. @fuweid @samuelkarp @mikebrow

@ningmingxiao ningmingxiao force-pushed the fix_nerdctl_unknown branch from 8457b25 to 353831b Compare June 14, 2025 16:50
@ningmingxiao ningmingxiao force-pushed the fix_nerdctl_unknown branch 2 times, most recently from 6ba97f6 to 56b10ce Compare June 14, 2025 16:57
Signed-off-by: ningmingxiao <ning.mingxiao@zte.com.cn>
@ningmingxiao ningmingxiao force-pushed the fix_nerdctl_unknown branch from 56b10ce to e6708bd Compare June 15, 2025 03:35
@ningmingxiao
Contributor Author

ping @fuweid @AkihiroSuda

@dmcgowan dmcgowan added the cherry-pick/2.1.x Change to be cherry picked to release/2.1 branch label Jun 19, 2025
@github-project-automation github-project-automation bot moved this from Needs Triage to Review In Progress in Pull Request Review Jun 19, 2025
@AkihiroSuda AkihiroSuda added this pull request to the merge queue Jun 22, 2025
@AkihiroSuda
Member

/ok-to-test

Merged via the queue into containerd:main with commit 9300d03 Jun 22, 2025
87 of 90 checks passed
@github-project-automation github-project-automation bot moved this from Review In Progress to Done in Pull Request Review Jun 22, 2025
@AkihiroSuda
Member

/cherry-pick release/2.1

@k8s-infra-cherrypick-robot

@AkihiroSuda: new pull request created: #12009

In response to this:

/cherry-pick release/2.1


@ningmingxiao ningmingxiao deleted the fix_nerdctl_unknown branch June 23, 2025 07:53
@austinvazquez austinvazquez added cherry-picked/2.1.x PR commits are cherry picked into the release/2.1 branch and removed cherry-pick/2.1.x Change to be cherry picked to release/2.1 branch labels Jul 1, 2025
@austinvazquez
Member

/cherry-pick release/2.0

@k8s-infra-cherrypick-robot

@austinvazquez: new pull request created: #12051

In response to this:

/cherry-pick release/2.0


Labels
area/runtime: Runtime
cherry-picked/1.6.x: PR commits are cherry-picked into release/1.6 branch
cherry-picked/2.1.x: PR commits are cherry picked into the release/2.1 branch
kind/bug
ok-to-test
size/M
Development

Successfully merging this pull request may close these issues.

nerdctl rm -f does not work with containers in unknown state
8 participants