Admin alter ddl jobs behavior is not expected #57697

Open
fzzf678 opened this issue Nov 26, 2024 · 2 comments
Labels
component/ddl This issue is related to DDL of TiDB. type/enhancement The issue or PR belongs to an enhancement.

Comments

Contributor
fzzf678 commented Nov 26, 2024

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

Use sysbench to prepare a table with 10,000,000 rows, create an index, and alter the thread count while the job is running.
The global system variable values:

mysql> select @@global.tidb_ddl_enable_fast_reorg, @@global.tidb_enable_dist_task, @@global.tidb_ddl_reorg_worker_cnt, @@global.tidb_ddl_reorg_batch_size, @@global.tidb_ddl_reorg_max_write_speed\G
*************************** 1. row ***************************
    @@global.tidb_ddl_enable_fast_reorg: 1
         @@global.tidb_enable_dist_task: 0
     @@global.tidb_ddl_reorg_worker_cnt: 1
     @@global.tidb_ddl_reorg_batch_size: 32
@@global.tidb_ddl_reorg_max_write_speed: 0
1 row in set (0.00 sec)
mysql> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> alter table sbtest1 add index(c);
Query OK, 0 rows affected (43.47 sec)

In another session, get the job ID and alter the thread count:

admin alter ddl jobs 137 thread=4;

2. What did you expect to see? (Required)

The DDL job finishes successfully and the alter takes effect.

3. What did you see instead (Required)

The DDL job finished and the alter took effect, but there are error messages in the log: the job took two attempts to succeed.

[2024/11/26 00:32:22.332 +08:00] [INFO] [index.go:1110] ["run add index job done"] [category=ddl] [charset=utf8mb4] [collation=utf8mb4_general_ci]
[2024/11/26 00:32:22.334 +08:00] [WARN] [session.go:819] ["can not retry txn"] [label=internal] [error="[kv:9007]Write conflict, txnStartTS=454178193874354182, conflictStartTS=454178194306891786, conflictCommitTS=454178194306891787, key={tableID=281474976710654, tableName=mysql.tidb_ddl_job, handle=119}, originalKey=748000fffffffffffe5f728000000000000077, primary={metaKey=true, key=DB:2, field=Table:110}, originalPrimaryKey=6d44423a3200000000fb00000000000000685461626c653a3131ff3000000000000000f8, reason=Optimistic [try again later]"] [IsBatchInsert=false] [IsPessimistic=false] [InRestrictedSQL=true] [tidb_retry_limit=10] [tidb_disable_txn_auto_retry=true]
[2024/11/26 00:32:22.334 +08:00] [WARN] [session.go:835] ["commit failed"] ["finished txn"="Txn{state=invalid}"] [error="[kv:9007]Write conflict, txnStartTS=454178193874354182, conflictStartTS=454178194306891786, conflictCommitTS=454178194306891787, key={tableID=281474976710654, tableName=mysql.tidb_ddl_job, handle=119}, originalKey=748000fffffffffffe5f728000000000000077, primary={metaKey=true, key=DB:2, field=Table:110}, originalPrimaryKey=6d44423a3200000000fb00000000000000685461626c653a3131ff3000000000000000f8, reason=Optimistic [try again later]"]
[2024/11/26 00:32:22.334 +08:00] [INFO] [job_scheduler.go:587] ["handle ddl job failed"] [category=ddl] [jobID=119] [conn=3271557130] [error="[kv:9007]Write conflict, txnStartTS=454178193874354182, conflictStartTS=454178194306891786, conflictCommitTS=454178194306891787, key={tableID=281474976710654, tableName=mysql.tidb_ddl_job, handle=119}, originalKey=748000fffffffffffe5f728000000000000077, primary={metaKey=true, key=DB:2, field=Table:110}, originalPrimaryKey=6d44423a3200000000fb00000000000000685461626c653a3131ff3000000000000000f8, reason=Optimistic [try again later]"] [job="ID:119, Type:add index, State:done, SchemaState:public, SchemaID:2, TableID:110, RowCount:1000000, ArgLen:1, start time: 2024-11-26 00:31:54.129 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:454178193861246994, Version: v2, UniqueWarnings:0"]
[2024/11/26 00:32:22.335 +08:00] [INFO] [job_scheduler.go:497] ["run job failed"] [category=ddl] [error="[kv:9007]Write conflict, txnStartTS=454178193874354182, conflictStartTS=454178194306891786, conflictCommitTS=454178194306891787, key={tableID=281474976710654, tableName=mysql.tidb_ddl_job, handle=119}, originalKey=748000fffffffffffe5f728000000000000077, primary={metaKey=true, key=DB:2, field=Table:110}, originalPrimaryKey=6d44423a3200000000fb00000000000000685461626c653a3131ff3000000000000000f8, reason=Optimistic [try again later]"] [job="ID:119, Type:add index, State:done, SchemaState:public, SchemaID:2, TableID:110, RowCount:1000000, ArgLen:1, start time: 2024-11-26 00:31:54.129 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:454178193861246994, Version: v2, UniqueWarnings:0"]
[2024/11/26 00:32:22.335 +08:00] [INFO] [job_worker.go:784] ["run DDL job"] [category=ddl] [jobID=119] [conn=3271557130] [job="ID:119, Type:add index, State:running, SchemaState:write reorganization, SchemaID:2, TableID:110, RowCount:0, ArgLen:0, start time: 2024-11-26 00:31:54.129 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:454178193861246994, Version: v2, UniqueWarnings:0"]

4. What is your TiDB version? (Required)

master

@fzzf678 fzzf678 added type/bug The issue is confirmed as a bug. component/ddl This issue is related to DDL of TiDB. affects-8.5 This bug affects the 8.5.x(LTS) versions. type/enhancement The issue or PR belongs to an enhancement. and removed affects-8.5 This bug affects the 8.5.x(LTS) versions. type/bug The issue is confirmed as a bug. labels Nov 26, 2024
Contributor Author
fzzf678 commented Nov 26, 2024

This behavior is expected: the admin alter ddl jobs statement takes effect by modifying the job meta in the tidb_ddl_job table, which can cause a write conflict with the DDL job itself. However, we already have a retry mechanism and checkpoints to handle the write conflict. After the write-conflict error is thrown, the retry log looks like the following, which means the job is retried from the checkpoint.

2024-11-26 14:55:27 (UTC+08:00)TiDB 172.16.6.95:4604[job_worker.go:784] ["run DDL job"] [category=ddl] [jobID=153] [conn=1277165660] [job="ID:153, Type:add index, State:running, SchemaState:write reorganization, SchemaID:2, TableID:110, RowCount:0, ArgLen:0, start time: 2024-11-26 01:52:48.312 -0500 EST, Err:<nil>, ErrCount:0, SnapshotVersion:454191734682288156, Version: v2, UniqueWarnings:0"]
2024-11-26 14:55:27 (UTC+08:00)TiDB 172.16.6.95:4604[index.go:1322] ["index backfill state running"] [category=ddl] ["job ID"=153] [table=sbtest1] ["ingest mode"=true] [index=c]
2024-11-26 14:55:27 (UTC+08:00)TiDB 172.16.6.95:4604[job_scheduler.go:722] ["resume physical table ID from checkpoint"] [category=ddl-ingest] [jobID=153] ["global sync key"=74800000000000006e5f728000000005feda0800] ["checkpoint physical ID"=110]
2024-11-26 14:55:27 (UTC+08:00)TiDB 172.16.6.95:4604[index.go:2352] ["start to add table index"] [category=ddl] [job="ID:153, Type:add index, State:running, SchemaState:write reorganization, SchemaID:2, TableID:110, RowCount:0, ArgLen:1, start time: 2024-11-26 01:52:48.312 -0500 EST, Err:<nil>, ErrCount:0, SnapshotVersion:454191734682288156, Version: v2, UniqueWarnings:0"] [reorgInfo="CurrElementType:_idx_,CurrElementID:15,StartKey:74800000000000006e5f728000000005feda0800,EndKey:74800000000000006e5f728000000005feda0800,First:false,PhysicalTableID:110,Ingest mode:false"]

or

[2024/11/26 14:14:02.657 +08:00] [INFO] [index.go:2352] ["start to add table index"] [category=ddl] [job="ID:129, Type:add index, State:running, SchemaState:write reorganization, SchemaID:2, TableID:110, RowCount:0, ArgLen:1, start time: 2024-11-26 14:13:10.919 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:454191111475822599, Version: v2, UniqueWarnings:0"] [reorgInfo="CurrElementType:_idx_,CurrElementID:10,StartKey:74800000000000006e5f7280000000000e62f700,EndKey:74800000000000006e5f7280000000000f589f00,First:false,PhysicalTableID:110,Ingest mode:false"]
[2024/11/26 14:14:02.659 +08:00] [INFO] [backfilling.go:487] ["load table ranges from PD"] [category=ddl] [physicalTableID=110] ["start key"=74800000000000006e5f7280000000000e62f700] ["end key"=74800000000000006e5f7280000000000f589f00]
[2024/11/26 14:14:02.659 +08:00] [INFO] [backfilling.go:514] ["load table ranges from PD done"] [category=ddl] [physicalTableID=110] ["range start"=74800000000000006e5f7280000000000e62f700] ["range end"=74800000000000006e5f7280000000000f589f00] ["range count"=3]
[2024/11/26 14:14:02.659 +08:00] [INFO] [backfilling.go:1023] ["start backfill workers to reorg record"] [category=ddl] [type="add index"] [workerCnt=4] [regionCnt=3] [startKey=74800000000000006e5f7280000000000e62f700] [endKey=74800000000000006e5f7280000000000f589f00]
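The retry-from-checkpoint behavior described above can be sketched as a small Go program. This is an illustrative model only, not TiDB's actual code: errWriteConflict, checkpoint, runReorgOnce, and runWithRetry are hypothetical names, and the keys are plain integers standing in for reorg keys. The point is that the second attempt resumes from the persisted checkpoint instead of re-scanning from the start.

```go
package main

import (
	"errors"
	"fmt"
)

// errWriteConflict stands in for TiDB's kv:9007 optimistic write conflict.
var errWriteConflict = errors.New("[kv:9007]Write conflict, try again later")

// checkpoint records how far the backfill has progressed, so that a
// retry does not re-scan rows that were already processed.
type checkpoint struct {
	startKey int // hypothetical stand-in for the reorg start key
}

// runReorgOnce backfills from the checkpoint to endKey. It fails with a
// write conflict once, simulating admin alter ddl jobs touching the job
// row mid-flight.
func runReorgOnce(cp *checkpoint, endKey int, failOnce *bool) error {
	for k := cp.startKey; k < endKey; k++ {
		cp.startKey = k + 1 // persist progress as the checkpoint
		if *failOnce && k == endKey/2 {
			*failOnce = false
			return errWriteConflict
		}
	}
	return nil
}

// runWithRetry is the retry mechanism: on a write conflict the job is
// re-run, resuming from the checkpoint rather than from the beginning.
func runWithRetry(endKey int) (attempts int, cp checkpoint) {
	failOnce := true
	for {
		attempts++
		err := runReorgOnce(&cp, endKey, &failOnce)
		if err == nil {
			return attempts, cp
		}
		if !errors.Is(err, errWriteConflict) {
			panic(err) // non-retryable errors would fail the job
		}
	}
}

func main() {
	attempts, cp := runWithRetry(1000)
	// The job succeeds on the second attempt, and the backfill still
	// covers the full key range exactly once.
	fmt.Println(attempts, cp.startKey) // → 2 1000
}
```

This mirrors what the logs show: one "run job failed" with kv:9007, then a second "run DDL job" that resumes from the checkpoint and finishes.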

Contributor Author
fzzf678 commented Nov 26, 2024

Maybe the write-conflict problem can be solved so that the retry is avoided entirely. Converting this to an enhancement.
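One way the conflict could in principle be avoided is to stop having two writers race on the same job row: instead of the admin session updating the job meta directly, it could hand the change to the job worker, which applies it inside its own transaction. The sketch below is only an illustration of that single-writer idea, not a proposed TiDB implementation; metaChange and jobWorker are hypothetical names, and a Go channel stands in for whatever handoff mechanism would actually be used.

```go
package main

import "fmt"

// metaChange is a requested adjustment to a running job's parameters,
// e.g. "admin alter ddl jobs 137 thread=4".
type metaChange struct{ workerCnt int }

// jobWorker owns the job meta. Other sessions send changes through the
// channel instead of writing the tidb_ddl_job row concurrently, so only
// one writer ever touches the row and no write conflict can arise.
func jobWorker(changes <-chan metaChange, done chan<- int) {
	workerCnt := 1 // initial tidb_ddl_reorg_worker_cnt
	// Drain any pending change before committing the next transaction,
	// so the meta update and the job's own write never race on the row.
	select {
	case c := <-changes:
		workerCnt = c.workerCnt // single writer applies the change
	default:
	}
	done <- workerCnt
}

func main() {
	changes := make(chan metaChange, 1)
	done := make(chan int)
	changes <- metaChange{workerCnt: 4} // admin session requests thread=4
	go jobWorker(changes, done)
	fmt.Println(<-done) // → 4
}
```

Under this scheme the alter still takes effect promptly, but without aborting the job's in-flight transaction, so no retry is needed.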
