JDBC Java - Memory Leak - Linux in Docker · Issue #9712 · duckdb/duckdb · GitHub
Open
@brianwyka

Description


What happens?

DuckDB is leaking memory even after closing connections and statements. We have a multi-threaded architecture that opens connections to the in-memory DuckDB, creates unique tables, ingests data into them, writes them to file, and then drops the tables when complete. Each thread creates a new connection and closes it when complete. No two threads ever act upon the same table.

Please see the attached image for a visual of the Docker container's resource usage.

Perhaps somewhat related or linked to #9263 or #109? Maybe a leak in the httpfs extension or the underlying S3 filesystem layer?

[Image: Docker container resource usage]

To Reproduce

Launch the following code in parallel threads (I was executing 5-10 concurrently in my workload).

try (var connection = DriverManager.getConnection("jdbc:duckdb:")) {

    // Used for unique table name
    final String ingestionId = UUID.randomUUID().toString().replace("-", "");

    // Setup connection to S3
    try (var statement = connection.createStatement()) {
        final String setupSql = """
            SET s3_region="us-east-1";
            SET s3_access_key_id="***";
            SET s3_secret_access_key="***";
            SET s3_session_token="***";
        """;
        statement.executeUpdate(setupSql);
    }

    // Create temporary table
    try (var statement = connection.createStatement()) {
        final String createTableSql = """
            CREATE TABLE table_%s (
                id VARCHAR,
                name VARCHAR, 
                description VARCHAR
            );
        """.formatted(ingestionId);
        statement.executeUpdate(createTableSql);
    }

    // Read GZIP compressed CSV data from S3 into temporary table
    try (var statement = connection.createStatement()) {
        final String readSql = """
            INSERT INTO table_%s
            SELECT * FROM read_csv_auto('s3://bucket/file.csv.gz', ALL_VARCHAR=true);
        """.formatted(ingestionId);
        statement.executeUpdate(readSql);
    }

    // Export the data from DuckDB to parquet files on file system
    try (var statement = connection.createStatement()) {
        final String writeSql = """
            COPY (
                SELECT *
                FROM table_%s 
                ORDER BY 
                  column1 ASC
            )
            TO '/tmp/%s.parquet'
            WITH (FORMAT PARQUET, COMPRESSION GZIP);
        """.formatted(ingestionId, ingestionId);
        statement.executeUpdate(writeSql);
    }

    // Drop the temporary table to cleanup memory (or at least try to)
    try (var statement = connection.createStatement()) {
        final String dropSql = "DROP TABLE table_%s;".formatted(ingestionId);
        statement.executeUpdate(dropSql);
    }

}
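For completeness, the parallel execution described above can be driven with a small harness like the following sketch. The pool size and the runIngestion stub are assumptions for illustration: the stub stands in for the try-with-resources block shown above, so the harness itself runs without the DuckDB JDBC driver on the classpath.

```java
import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LeakRepro {

    // Stand-in for the ingestion block above: in the real reproduction this
    // opens jdbc:duckdb:, creates table_<ingestionId>, ingests from S3,
    // copies to Parquet, and drops the table.
    static void runIngestion(String ingestionId) {
        System.out.println("ingested " + ingestionId);
    }

    public static void main(String[] args) throws InterruptedException {
        // 5-10 concurrent workers, per the workload described in the report
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 100; i++) {
            // Each task gets its own unique table name, as in the repro code
            pool.submit(() -> runIngestion(UUID.randomUUID().toString().replace("-", "")));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

With the stub replaced by the real ingestion block, container memory can then be watched over many iterations to observe the leak.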

OS:

Host Machine:
macOS x64
Darwin Version 22.4.0: Mon Mar 6 21:00:17 PST 2023; root:xnu-8796.101.5~3/RELEASE_X86_64 x86_64

Docker Container:
Linux 6.4.16-linuxkit #1 SMP PREEMPT_DYNAMIC Fri Nov 10 14:51:57 UTC 2023 x86_64 GNU/Linux

DuckDB Version:

0.9.2

DuckDB Client:

Java JDBC

Full Name:

Brian Wyka

Affiliation:

LogicMonitor

Have you tried this on the latest main branch?

I have tested with a release build (and could not test with a main build)

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • Yes, I have
