Admin alter ddl jobs behavior is not expected #57697

Open
fzzf678 opened this issue Nov 26, 2024 · 2 comments
Labels
component/ddl This issue is related to DDL of TiDB. type/enhancement The issue or PR belongs to an enhancement.

Comments

Contributor
fzzf678 commented Nov 26, 2024

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

Use sysbench to prepare a table with 10,000,000 rows, create an index, and alter the thread count while the job is running.
The global system variable values:

mysql> select @@global.tidb_ddl_enable_fast_reorg, @@global.tidb_enable_dist_task, @@global.tidb_ddl_reorg_worker_cnt, @@global.tidb_ddl_reorg_batch_size, @@global.tidb_ddl_reorg_max_write_speed\G
*************************** 1. row ***************************
    @@global.tidb_ddl_enable_fast_reorg: 1
         @@global.tidb_enable_dist_task: 0
     @@global.tidb_ddl_reorg_worker_cnt: 1
     @@global.tidb_ddl_reorg_batch_size: 32
@@global.tidb_ddl_reorg_max_write_speed: 0
1 row in set (0.00 sec)
mysql> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> alter table sbtest1 add index(c);
Query OK, 0 rows affected (43.47 sec)

In another session, get the job ID and alter the thread count:

admin alter ddl jobs 137 thread=4;

2. What did you expect to see? (Required)

The DDL job finishes successfully and the alter takes effect.

3. What did you see instead (Required)

The DDL job finished and the alter took effect, but there are error messages in the log: the job took two attempts to succeed.

[2024/11/26 00:32:22.332 +08:00] [INFO] [index.go:1110] ["run add index job done"] [category=ddl] [charset=utf8mb4] [collation=utf8mb4_general_ci]
[2024/11/26 00:32:22.334 +08:00] [WARN] [session.go:819] ["can not retry txn"] [label=internal] [error="[kv:9007]Write conflict, txnStartTS=454178193874354182, conflictStartTS=454178194306891786, conflictCommitTS=454178194306891787, key={tableID=281474976710654, tableName=mysql.tidb_ddl_job, handle=119}, originalKey=748000fffffffffffe5f728000000000000077, primary={metaKey=true, key=DB:2, field=Table:110}, originalPrimaryKey=6d44423a3200000000fb00000000000000685461626c653a3131ff3000000000000000f8, reason=Optimistic [try again later]"] [IsBatchInsert=false] [IsPessimistic=false] [InRestrictedSQL=true] [tidb_retry_limit=10] [tidb_disable_txn_auto_retry=true]
[2024/11/26 00:32:22.334 +08:00] [WARN] [session.go:835] ["commit failed"] ["finished txn"="Txn{state=invalid}"] [error="[kv:9007]Write conflict, txnStartTS=454178193874354182, conflictStartTS=454178194306891786, conflictCommitTS=454178194306891787, key={tableID=281474976710654, tableName=mysql.tidb_ddl_job, handle=119}, originalKey=748000fffffffffffe5f728000000000000077, primary={metaKey=true, key=DB:2, field=Table:110}, originalPrimaryKey=6d44423a3200000000fb00000000000000685461626c653a3131ff3000000000000000f8, reason=Optimistic [try again later]"]
[2024/11/26 00:32:22.334 +08:00] [INFO] [job_scheduler.go:587] ["handle ddl job failed"] [category=ddl] [jobID=119] [conn=3271557130] [error="[kv:9007]Write conflict, txnStartTS=454178193874354182, conflictStartTS=454178194306891786, conflictCommitTS=454178194306891787, key={tableID=281474976710654, tableName=mysql.tidb_ddl_job, handle=119}, originalKey=748000fffffffffffe5f728000000000000077, primary={metaKey=true, key=DB:2, field=Table:110}, originalPrimaryKey=6d44423a3200000000fb00000000000000685461626c653a3131ff3000000000000000f8, reason=Optimistic [try again later]"] [job="ID:119, Type:add index, State:done, SchemaState:public, SchemaID:2, TableID:110, RowCount:1000000, ArgLen:1, start time: 2024-11-26 00:31:54.129 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:454178193861246994, Version: v2, UniqueWarnings:0"]
[2024/11/26 00:32:22.335 +08:00] [INFO] [job_scheduler.go:497] ["run job failed"] [category=ddl] [error="[kv:9007]Write conflict, txnStartTS=454178193874354182, conflictStartTS=454178194306891786, conflictCommitTS=454178194306891787, key={tableID=281474976710654, tableName=mysql.tidb_ddl_job, handle=119}, originalKey=748000fffffffffffe5f728000000000000077, primary={metaKey=true, key=DB:2, field=Table:110}, originalPrimaryKey=6d44423a3200000000fb00000000000000685461626c653a3131ff3000000000000000f8, reason=Optimistic [try again later]"] [job="ID:119, Type:add index, State:done, SchemaState:public, SchemaID:2, TableID:110, RowCount:1000000, ArgLen:1, start time: 2024-11-26 00:31:54.129 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:454178193861246994, Version: v2, UniqueWarnings:0"]
[2024/11/26 00:32:22.335 +08:00] [INFO] [job_worker.go:784] ["run DDL job"] [category=ddl] [jobID=119] [conn=3271557130] [job="ID:119, Type:add index, State:running, SchemaState:write reorganization, SchemaID:2, TableID:110, RowCount:0, ArgLen:0, start time: 2024-11-26 00:31:54.129 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:454178193861246994, Version: v2, UniqueWarnings:0"]

4. What is your TiDB version? (Required)

master

@fzzf678 fzzf678 added type/bug The issue is confirmed as a bug. component/ddl This issue is related to DDL of TiDB. affects-8.5 This bug affects the 8.5.x(LTS) versions. type/enhancement The issue or PR belongs to an enhancement. and removed affects-8.5 This bug affects the 8.5.x(LTS) versions. type/bug The issue is confirmed as a bug. labels Nov 26, 2024
Contributor Author
fzzf678 commented Nov 26, 2024

This behavior is expected: the admin alter ddl jobs statement takes effect by modifying the job meta in the tidb_ddl_job table, which can cause a write conflict with the DDL job itself. However, we already have a retry mechanism and checkpoints to handle the write conflict. After the write-conflict error is thrown, the retry log looks like the following, which means the job is retried from the checkpoint.

2024-11-26 14:55:27 (UTC+08:00)TiDB 172.16.6.95:4604[job_worker.go:784] ["run DDL job"] [category=ddl] [jobID=153] [conn=1277165660] [job="ID:153, Type:add index, State:running, SchemaState:write reorganization, SchemaID:2, TableID:110, RowCount:0, ArgLen:0, start time: 2024-11-26 01:52:48.312 -0500 EST, Err:<nil>, ErrCount:0, SnapshotVersion:454191734682288156, Version: v2, UniqueWarnings:0"]
2024-11-26 14:55:27 (UTC+08:00)TiDB 172.16.6.95:4604[index.go:1322] ["index backfill state running"] [category=ddl] ["job ID"=153] [table=sbtest1] ["ingest mode"=true] [index=c]
2024-11-26 14:55:27 (UTC+08:00)TiDB 172.16.6.95:4604[job_scheduler.go:722] ["resume physical table ID from checkpoint"] [category=ddl-ingest] [jobID=153] ["global sync key"=74800000000000006e5f728000000005feda0800] ["checkpoint physical ID"=110]
2024-11-26 14:55:27 (UTC+08:00)TiDB 172.16.6.95:4604[index.go:2352] ["start to add table index"] [category=ddl] [job="ID:153, Type:add index, State:running, SchemaState:write reorganization, SchemaID:2, TableID:110, RowCount:0, ArgLen:1, start time: 2024-11-26 01:52:48.312 -0500 EST, Err:<nil>, ErrCount:0, SnapshotVersion:454191734682288156, Version: v2, UniqueWarnings:0"] [reorgInfo="CurrElementType:_idx_,CurrElementID:15,StartKey:74800000000000006e5f728000000005feda0800,EndKey:74800000000000006e5f728000000005feda0800,First:false,PhysicalTableID:110,Ingest mode:false"]

or

[2024/11/26 14:14:02.657 +08:00] [INFO] [index.go:2352] ["start to add table index"] [category=ddl] [job="ID:129, Type:add index, State:running, SchemaState:write reorganization, SchemaID:2, TableID:110, RowCount:0, ArgLen:1, start time: 2024-11-26 14:13:10.919 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:454191111475822599, Version: v2, UniqueWarnings:0"] [reorgInfo="CurrElementType:_idx_,CurrElementID:10,StartKey:74800000000000006e5f7280000000000e62f700,EndKey:74800000000000006e5f7280000000000f589f00,First:false,PhysicalTableID:110,Ingest mode:false"]
[2024/11/26 14:14:02.659 +08:00] [INFO] [backfilling.go:487] ["load table ranges from PD"] [category=ddl] [physicalTableID=110] ["start key"=74800000000000006e5f7280000000000e62f700] ["end key"=74800000000000006e5f7280000000000f589f00]
[2024/11/26 14:14:02.659 +08:00] [INFO] [backfilling.go:514] ["load table ranges from PD done"] [category=ddl] [physicalTableID=110] ["range start"=74800000000000006e5f7280000000000e62f700] ["range end"=74800000000000006e5f7280000000000f589f00] ["range count"=3]
[2024/11/26 14:14:02.659 +08:00] [INFO] [backfilling.go:1023] ["start backfill workers to reorg record"] [category=ddl] [type="add index"] [workerCnt=4] [regionCnt=3] [startKey=74800000000000006e5f7280000000000e62f700] [endKey=74800000000000006e5f7280000000000f589f00]
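The retry-from-checkpoint behavior described above can be sketched as a small Go program. This is an illustrative model only, not TiDB's actual code: errWriteConflict, checkpoint, runReorgOnce, and runWithRetry are hypothetical names, and the keys are plain integers standing in for reorg keys. The point is that the second attempt resumes from the persisted checkpoint instead of re-scanning from the start.

```go
package main

import (
	"errors"
	"fmt"
)

// errWriteConflict stands in for TiDB's kv:9007 optimistic write conflict.
var errWriteConflict = errors.New("[kv:9007]Write conflict, try again later")

// checkpoint records how far the backfill has progressed, so that a
// retry does not re-scan rows that were already processed.
type checkpoint struct {
	startKey int // hypothetical stand-in for the reorg start key
}

// runReorgOnce backfills from the checkpoint to endKey. It fails with a
// write conflict once, simulating admin alter ddl jobs touching the job
// row mid-flight.
func runReorgOnce(cp *checkpoint, endKey int, failOnce *bool) error {
	for k := cp.startKey; k < endKey; k++ {
		cp.startKey = k + 1 // persist progress as the checkpoint
		if *failOnce && k == endKey/2 {
			*failOnce = false
			return errWriteConflict
		}
	}
	return nil
}

// runWithRetry is the retry mechanism: on a write conflict the job is
// re-run, resuming from the checkpoint rather than from the beginning.
func runWithRetry(endKey int) (attempts int, cp checkpoint) {
	failOnce := true
	for {
		attempts++
		err := runReorgOnce(&cp, endKey, &failOnce)
		if err == nil {
			return attempts, cp
		}
		if !errors.Is(err, errWriteConflict) {
			panic(err) // non-retryable errors would fail the job
		}
	}
}

func main() {
	attempts, cp := runWithRetry(1000)
	// The job succeeds on the second attempt, and the backfill still
	// covers the full key range exactly once.
	fmt.Println(attempts, cp.startKey) // → 2 1000
}
```

This mirrors what the logs show: one "run job failed" with kv:9007, then a second "run DDL job" that resumes from the checkpoint and finishes.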

Contributor Author
fzzf678 commented Nov 26, 2024

Maybe the write-conflict problem can be solved so that the retry is avoided entirely. Converting this to an enhancement.
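One way the conflict could in principle be avoided is to stop having two writers race on the same job row: instead of the admin session updating the job meta directly, it could hand the change to the job worker, which applies it inside its own transaction. The sketch below is only an illustration of that single-writer idea, not a proposed TiDB implementation; metaChange and jobWorker are hypothetical names, and a Go channel stands in for whatever handoff mechanism would actually be used.

```go
package main

import "fmt"

// metaChange is a requested adjustment to a running job's parameters,
// e.g. "admin alter ddl jobs 137 thread=4".
type metaChange struct{ workerCnt int }

// jobWorker owns the job meta. Other sessions send changes through the
// channel instead of writing the tidb_ddl_job row concurrently, so only
// one writer ever touches the row and no write conflict can arise.
func jobWorker(changes <-chan metaChange, done chan<- int) {
	workerCnt := 1 // initial tidb_ddl_reorg_worker_cnt
	// Drain any pending change before committing the next transaction,
	// so the meta update and the job's own write never race on the row.
	select {
	case c := <-changes:
		workerCnt = c.workerCnt // single writer applies the change
	default:
	}
	done <- workerCnt
}

func main() {
	changes := make(chan metaChange, 1)
	done := make(chan int)
	changes <- metaChange{workerCnt: 4} // admin session requests thread=4
	go jobWorker(changes, done)
	fmt.Println(<-done) // → 4
}
```

Under this scheme the alter still takes effect promptly, but without aborting the job's in-flight transaction, so no retry is needed.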
