8000 "malloc: Heap corruption detected" after running merge_adjacent_files() on PG+S3 partitioned ducklake · Issue #87 · duckdb/ducklake · GitHub

"malloc: Heap corruption detected" after running merge_adjacent_files() on PG+S3 partitioned ducklake #87

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
kiwialec opened this issue May 29, 2025 · 3 comments

Comments

@kiwialec
kiwialec commented May 29, 2025

I've managed to reproduce this a couple of times, but only with my data:

  1. Create a new ducklake (PG catalog, S3 object store, partitioned on 2 fields)
  2. Copy in ~40GB of data from an Iceberg lake (INSERT INTO ... FROM s3tables WHERE ..)
  3. Everything works fine - at this point I can query etc. and get the expected results
  4. Run CALL main.merge_adjacent_files() - it appears to succeed but does nothing
  5. All subsequent queries to the ducklake cause the process to crash with the malloc error below (a SQL sketch of this setup follows the error output)
duckdb(36187,0x16b99f000) malloc: Heap corruption detected, free list is damaged at 0x6000037c53e0
*** Incorrect guard value: 105553178690160
duckdb(36187,0x16b99f000) malloc: *** set a breakpoint in malloc_error_break to debug
zsh: abort      duckdb

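For reference, here is a minimal SQL sketch of the setup above (the attach string, bucket, table, and partition columns are placeholders, not the real workload):

INSTALL ducklake; INSTALL postgres; INSTALL httpfs;
LOAD ducklake; LOAD postgres; LOAD httpfs;

-- S3 credentials are assumed to be configured already (e.g. via CREATE SECRET).
-- Step 1: new ducklake with a Postgres catalog and an S3 data path.
ATTACH 'ducklake:postgres:dbname=ducklake_catalog host=localhost' AS lake
    (DATA_PATH 's3://my-bucket/lake/');

CREATE TABLE lake.events (field_a VARCHAR, field_b VARCHAR, payload VARCHAR);
ALTER TABLE lake.events SET PARTITIONED BY (field_a, field_b);

-- Step 2: copy in ~40GB from the Iceberg source
-- (the real query reads from s3tables; source_table is a placeholder).
INSERT INTO lake.events SELECT * FROM source_table;

-- Step 3: queries work fine at this point.
SELECT count(*) FROM lake.events;

-- Step 4: compaction appears to succeed but does nothing.
CALL lake.merge_adjacent_files();

-- Step 5: any subsequent query crashes with the malloc error.
SELECT count(*) FROM lake.events;
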
This happens whether I run the queries from my local Mac or a remote Ubuntu machine.

Confusingly, I exported the contents of the Postgres catalog before and after, and none of the data appears different. Looking through S3, I can't see any modified files.

Happy to run this a couple of times if you let me know how to get useful debugging info out of DuckDB.

@Mytherin
Contributor

Thanks for the report!

Does this behavior only happen when using Postgres/S3, or does it also happen locally when using DuckDB + local storage?

Does the behavior persist after reconnecting as well? Or does calling merge_adjacent_files only affect the running process, and everything is fine again after reconnecting?
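
To make the comparison concrete, here is a sketch of the local-storage variant and the reconnect test, assuming the ducklake:<file> attach form with a DuckDB file as the catalog (paths, catalog file, and table name are placeholders):

-- Same workload, but with a DuckDB-file catalog and a local data path.
ATTACH 'ducklake:metadata.ducklake' AS local_lake (DATA_PATH 'lake_data/');
CALL local_lake.merge_adjacent_files();
SELECT count(*) FROM local_lake.events;

-- Reconnect test: detach, re-attach, and query again from a fresh connection.
DETACH local_lake;
ATTACH 'ducklake:metadata.ducklake' AS local_lake (DATA_PATH 'lake_data/');
SELECT count(*) FROM local_lake.events;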

@kiwialec
Author

The behaviour happens when S3 is used as the storage - it did not appear when I used the local SSD as storage (with either DuckDB or Postgres as the catalog).

When it happens, it spoils the ducklake completely - after restarting the process, it will crash any time the ducklake is queried.

@Mytherin
Contributor

Could you try querying the Parquet files directly? Perhaps there's a particular Parquet file that is causing issues here.
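
For example, something along these lines (the bucket and prefix are placeholders for the lake's actual DATA_PATH; a sketch, not an exact recipe):

-- Scan the data files directly, bypassing the ducklake catalog.
SELECT count(*) FROM read_parquet('s3://my-bucket/lake/**/*.parquet');

-- List the files, then inspect a suspect file's metadata individually.
SELECT file FROM glob('s3://my-bucket/lake/**/*.parquet');
SELECT * FROM parquet_file_metadata('s3://my-bucket/lake/part-00000.parquet');  -- substitute a path from the glob output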
