TTL scan task will leak when down scaling the scan workers if it failed to cancel the task. · Issue #57708 · pingcap/tidb
TTL scan task will leak when down scaling the scan workers if it failed to cancel the task. #57708

Closed
YangKeao opened this issue Nov 26, 2024 · 0 comments · Fixed by #57718
Labels
- affects-7.1: This bug affects the 7.1.x (LTS) versions.
- affects-7.5: This bug affects the 7.5.x (LTS) versions.
- affects-8.1: This bug affects the 8.1.x (LTS) versions.
- affects-8.5: This bug affects the 8.5.x (LTS) versions.
- impact/leak
- severity/moderate
- sig/sql-infra: SIG: SQL Infra
- type/bug: The issue is confirmed as a bug.

Comments

@YangKeao

Bug Report

If a scan worker fails to cancel its task while the scan workers are being scaled down, task.result stays nil forever, so the task is never regarded as finished and leaks. See the following function:

func (m *taskManager) resizeScanWorkers(count int) error {
	var err error
	var canceledWorkers []worker
	m.scanWorkers, canceledWorkers, err = m.resizeWorkers(m.scanWorkers, count, func() worker {
		return newScanWorker(m.delCh, m.notifyStateCh, m.sessPool)
	})
	for _, w := range canceledWorkers {
		...

		result := s.PollTaskResult()
		if result != nil {
			jobID = result.task.JobID
			scanID = result.task.ScanID

			scanErr = result.err
		} else {
			// if the scan worker failed to poll the task, it's possible that the `WaitStopped` has timeout
			// we still consider the scan task as finished
			curTask := s.CurrentTask()
			if curTask == nil {
				continue
			}
			jobID = curTask.JobID
			scanID = curTask.ScanID
			scanErr = errors.New("timeout to cancel scan task")
		}

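		...

		// In the timeout branch above, result is still nil at this point, so
		// task.result stays nil and the task is never regarded as finished.
		task.result = result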
	}
	return err
}

It's easy to fix: we need to assign a non-nil result to the scan task even when canceling it timed out.

@YangKeao added the labels type/bug, sig/sql-infra, severity/moderate, affects-7.1, affects-7.5, affects-8.1, affects-8.5 on Nov 26, 2024
@ti-chi-bot closed this as completed in 0215550 on Nov 28, 2024