Description
What happens?
DuckDB is leaking memory even after closing connections and statements. We have a multi-threaded architecture in which each thread opens a connection to the in-memory DuckDB, creates a unique table, ingests data into it, writes it to file, and then drops the table once complete. Each thread creates a new connection and closes it when finished. No two threads ever act on the same table.
Please see the attached image for a visual of the Docker container's resource usage.
Perhaps somewhat related to #9263 or #109? Maybe a leak in the httpfs or nested s3fs extension layer?
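If it is httpfs, one way to isolate it would be to swap the S3 read in the repro below for a read of a local copy of the same file, so the httpfs extension is never loaded. A minimal sketch of that variant (the local path is hypothetical; I have not had a chance to confirm this yet):

// Hypothetical isolation step: read a local copy of the file instead of the
// S3 URL, so the httpfs extension is never loaded for this connection
final String readLocalSql = """
        INSERT INTO table_%s
        SELECT * FROM read_csv_auto('/tmp/file.csv.gz', ALL_VARCHAR=true);
        """.formatted(ingestionId);
statement.executeUpdate(readLocalSql);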
To Reproduce
Launch the following code in parallel threads (I was executing 5-10 concurrently in my workload); a minimal harness sketch follows the snippet.
// Requires: import java.sql.DriverManager; import java.util.UUID;
try (var connection = DriverManager.getConnection("jdbc:duckdb:")) {
    // Used for unique table name
    final String ingestionId = UUID.randomUUID().toString().replace("-", "");

    // Set up connection to S3
    try (var statement = connection.createStatement()) {
        final String setupSql = """
                SET s3_region='us-east-1';
                SET s3_access_key_id='***';
                SET s3_secret_access_key='***';
                SET s3_session_token='***';
                """;
        statement.executeUpdate(setupSql);
    }

    // Create temporary table
    try (var statement = connection.createStatement()) {
        final String createTableSql = """
                CREATE TABLE table_%s (
                    id VARCHAR,
                    name VARCHAR,
                    description VARCHAR
                );
                """.formatted(ingestionId);
        statement.executeUpdate(createTableSql);
    }

    // Read GZIP-compressed CSV data from S3 into the temporary table
    try (var statement = connection.createStatement()) {
        final String readSql = """
                INSERT INTO table_%s
                SELECT * FROM read_csv_auto('s3://bucket/file.csv.gz', ALL_VARCHAR=true);
                """.formatted(ingestionId);
        statement.executeUpdate(readSql);
    }

    // Export the data from DuckDB to a Parquet file on the file system
    try (var statement = connection.createStatement()) {
        final String writeSql = """
                COPY (
                    SELECT *
                    FROM table_%s
                    ORDER BY id ASC
                )
                TO '/tmp/%s.parquet'
                WITH (FORMAT PARQUET, COMPRESSION GZIP);
                """.formatted(ingestionId, ingestionId);
        statement.executeUpdate(writeSql);
    }

    // Drop the temporary table to clean up memory (or at least try to)
    try (var statement = connection.createStatement()) {
        final String dropSql = "DROP TABLE table_%s;".formatted(ingestionId);
        statement.executeUpdate(dropSql);
    }
}
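For completeness, a minimal sketch of the kind of harness I run this under (the pool size, iteration count, and the runIngestion method name are illustrative, not my exact production code):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class IngestionHarness {

    public static void main(String[] args) throws InterruptedException {
        // 5-10 concurrent workers in my real workload; 8 chosen here as an example
        final ExecutorService executor = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 1000; i++) {
            executor.submit(() -> {
                try {
                    runIngestion(); // the try-with-resources snippet shown above
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.HOURS);
    }

    private static void runIngestion() throws Exception {
        // body: the snippet from "To Reproduce" above
    }
}

Container memory climbs steadily across iterations even though every connection and statement is closed and every table is dropped.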
OS:
Host Machine:
macOS x64
Darwin Version 22.4.0: Mon Mar 6 21:00:17 PST 2023; root:xnu-8796.101.5~3/RELEASE_X86_64 x86_64
Docker Container:
Linux 6.4.16-linuxkit #1 SMP PREEMPT_DYNAMIC Fri Nov 10 14:51:57 UTC 2023 x86_64 GNU/Linux
DuckDB Version:
0.9.2
DuckDB Client:
Java JDBC
Full Name:
Brian Wyka
Affiliation:
LogicMonitor
Have you tried this on the latest main branch?
I have tested with a release build (and could not test with a main build)
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?
Yes, I have.