Problem: Pipeline sync might deadlock by arajkumar · Pull Request #823 · dimitri/pgcopydb

Problem: Pipeline sync might deadlock #823

Merged

@arajkumar (Contributor) commented Jun 24, 2024

**Deadlock call stack:**

```
%0  0x0000787ac168fe16 in select () from target:/lib/x86_64-linux-gnu/libc.so.6
%1  0x000057db30f718ad in pgsql_sync_pipeline (pgsql=pgsql@entry=0x7ffc8ce1f9c8) at pgsql.c:1979
%2  0x000057db30f5bc2c in stream_apply_file (context=context@entry=0x7ffc8ce1d880) at ld_apply.c:657
%3  0x000057db30f5c495 in stream_apply_catchup (specs=specs@entry=0x7ffc8d030050) at ld_apply.c:112
%4  0x000057db30f549d2 in follow_start_catchup (specs=0x7ffc8d030050) at follow.c:809
%5  0x000057db30f54baa in follow_start_subprocess (specs=specs@entry=0x7ffc8d030050,
%   subprocess=subprocess@entry=0x7ffc8d138da8) at follow.c:860
%6  0x000057db30f55383 in follow_prepare_mode_switch (streamSpecs=streamSpecs@entry=0x7ffc8d030050,
%   previousMode=previousMode@entry=STREAM_MODE_CATCHUP, currentMode=currentMode@entry=STREAM_MODE_REPLAY) at follow.c:488
%7  0x000057db30f55a0b in follow_main_loop (copySpecs=copySpecs@entry=0x7ffc8d13bd50,
%   streamSpecs=streamSpecs@entry=0x7ffc8d030050) at follow.c:340
%8  0x000057db30f37cbb in cli_follow (argc=<optimized out>, argv=<optimized out>) at cli_clone_follow.c:464
%9  0x000057db30f8a17a in commandline_run (command=command@entry=0x7ffc8d24f0d0, argc=0, argc@entry=4, argv=0x7ffc8d24f248,
%   argv@entry=0x7ffc8d24f228) at /usr/src/pgcopydb/src/bin/pgcopydb/../lib/subcommands.c/commandline.c:71
%10 0x000057db30f36cf0 in main (argc=4, argv=0x7ffc8d24f228) at main.c:142

```

**Root cause:**

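PQpipelineSync might clear the select read condition, because it can itself
read the pending data from the server. When that happens, the select call
that follows never sees the socket become readable and can block indefinitely.
Code path: PQpipelineSync -> pqPipelineSyncInternal -> pqFlush -> pqSendSome -> pqReadData [1]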
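
The effect described above can be reproduced without libpq. The following is a minimal sketch, not pgcopydb or libpq code: `flush_and_drain` and `select_after_drain` are hypothetical names, and a Unix socket pair stands in for the server connection. Once an earlier call has pulled the reply into a user-space buffer, `select` no longer reports the socket as readable. A timeout is used here so the example returns; without one, the call would wait forever, as in the stack above.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Hypothetical stand-in for a flush step that, as a side effect, also
 * reads whatever the peer already sent into a user-space buffer. */
static ssize_t flush_and_drain(int fd, char *buf, size_t len)
{
    return recv(fd, buf, len, MSG_DONTWAIT);
}

/* Returns what select() reports once the reply has already been drained:
 * 0 means "timed out, nothing readable"; -1 means the setup failed. */
int select_after_drain(void)
{
    int sv[2];
    char buf[64];
    const char reply[] = "sync-reply";

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    /* The "server" end answers before the client starts waiting. */
    ssize_t sent = write(sv[1], reply, sizeof(reply));
    assert(sent == (ssize_t) sizeof(reply));

    /* Side effect: the reply now lives in buf, not in the socket. */
    ssize_t got = flush_and_drain(sv[0], buf, sizeof(buf));
    assert(got == sent);

    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(sv[0], &readfds);

    /* 200 ms timeout so the demo terminates instead of hanging. */
    struct timeval tv = { 0, 200000 };
    int ready = select(sv[0] + 1, &readfds, NULL, NULL, &tv);

    close(sv[0]);
    close(sv[1]);
    return ready;
}
```

Calling `select_after_drain()` returns 0: the data was delivered, yet `select` times out because nothing is left in the socket to read.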

**Solution:**

Get rid of the select call and read results in blocking mode. This also
reduces CPU utilization, since the function never polls.

Along with fixing the deadlock, this commit also fixes the following:
1) Use-after-free while dealing with PGresult
2) Handle notifications while reading results
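
The shape of a blocking drain loop can be sketched as follows. This is an illustrative model under stated assumptions, not the code from this commit: the one-byte tags, `DrainStats`, `drain_until_sync`, and `drain_demo` are all hypothetical, and a socket pair replaces the PostgreSQL connection. In libpq terms, the analogous blocking call would be `PQgetResult`, with `PQnotifies` for pending notifications; how the commit actually uses them is not shown here.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical one-byte message tags standing in for result kinds. */
enum { TAG_RESULT = 'R', TAG_NOTIFY = 'N', TAG_SYNC = 'S' };

typedef struct DrainStats
{
    int results;
    int notifications;
} DrainStats;

/* Read messages with a plain blocking read() until the sync marker.
 * Returns 0 on success, -1 on EOF, error, or an unknown tag. */
int drain_until_sync(int fd, DrainStats *stats)
{
    for (;;)
    {
        char tag;
        ssize_t n = read(fd, &tag, 1);  /* blocks; no select(), no polling */

        if (n < 0 && errno == EINTR)
            continue;                   /* interrupted by a signal: retry */
        if (n != 1)
            return -1;

        switch (tag)
        {
            case TAG_RESULT:
                stats->results++;
                break;
            case TAG_NOTIFY:
                /* notifications are consumed in the same loop */
                stats->notifications++;
                break;
            case TAG_SYNC:
                return 0;               /* pipeline sync point reached */
            default:
                return -1;
        }
    }
}

/* Feeds a small canned stream through a socket pair and drains it. */
int drain_demo(DrainStats *stats)
{
    int sv[2];
    const char stream[] = { TAG_RESULT, TAG_NOTIFY, TAG_RESULT, TAG_SYNC };

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    ssize_t sent = write(sv[1], stream, sizeof(stream));
    assert(sent == (ssize_t) sizeof(stream));

    int rc = drain_until_sync(sv[0], stats);

    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

With a zero-initialized `DrainStats`, `drain_demo` returns 0 and counts two results and one notification. The loop never asks whether the socket is readable, so it cannot be misled by data that an earlier call already consumed.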

[1] https://github.com/postgres/postgres/blob/fd49e8f32325c675d9bb6e26fcdbe9754249932f/src/interfaces/libpq/fe-misc.c#L856-L926

Fixes timescale/team-data-onboarding#149

Signed-off-by: Arunprasad Rajkumar <ar.arunprasad@gmail.com>

@arajkumar changed the title from "Problem: Pipeline sync deadlock" to "Problem: Pipeline sync might deadlock" Jun 24, 2024

@dimitri (Owner) commented Jun 24, 2024

Fixes #794

@dimitri added the bug label (Something isn't working) Jun 24, 2024
@dimitri added this to the v0.17 milestone Jun 24, 2024
@dimitri merged commit ce11b3a into dimitri:main Jun 24, 2024
19 checks passed

@arajkumar (Contributor, Author)

> Fixes #794

Ah, I didn’t know that there is already a bug for it. Thanks, @dimitri.
