Process memory grows when loading parquet files even with -Xmx · Issue #876 · atoti/atoti · GitHub
Process memory grows when loading parquet files even with -Xmx #876
Open
@dufferzafar

Description


Steps to reproduce

As a continuation of #866, I have this snippet that loads parquet files (compressed or otherwise) in a separate thread:

import sys
import glob
import time
import threading
import atoti as tt

# jars folder contains zstd jar
session = tt.Session(extra_jars=["atoti/jars"], java_options=["-Xmx32G", "-Xms32G"])

# Create or update the table with parquet file
def load_parquet(pqfile, table_name, keys=None):
    table = session.tables.get(table_name, None)
    if table is None:
        table = session.read_parquet(pqfile, table_name=table_name, keys=keys)
    else:
        table.load_parquet(pqfile)
    return table

pq_files = sorted(glob.glob("/path/to/compressed/files_*.parquet"))

# Load the first file & create a cube
tbl = load_parquet(pq_files[0], "mytable", keys=["id"])
cube = session.create_cube(tbl)

# Load positions in a separate thread
def loader():
    for idx, pq in enumerate(pq_files):
        sys.stdout.write(f"\rLoading file #{idx}: {pq}")
        sys.stdout.flush()  # make the progress line show up immediately
        load_parquet(pq, "mytable")
        time.sleep(1)  # wait a second between loads
threading.Thread(target=loader).start()
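
(What I actually watched was htop; for anyone reproducing this, below is a small psutil-based sketch for sampling the combined resident memory of the Python process and its children — in my setup the JVM started by the session shows up as a child process. psutil and the process_tree_rss_gib helper are extras of mine, not part of atoti.)

import psutil  # extra dependency, only used for this measurement sketch

def process_tree_rss_gib():
    # Sum the resident set size of this Python process and all of its
    # child processes (the session's JVM appears as a child here).
    me = psutil.Process()
    procs = [me] + me.children(recursive=True)
    return sum(p.memory_info().rss for p in procs) / 1024**3

# For example, call it inside loader() after each load_parquet(pq, "mytable"):
#     print(f"  rss={process_tree_rss_gib():.1f} GiB")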

Actual Result

Process memory continues to grow when my loader thread runs!

The process starts with 40 GB VIRT & 2.6 GB RSS (observed via htop)

The initial server log reports different numbers (roughly 1.5 GB heap used + 3 GB direct):

2024-04-02T10:55:32.129-04:00  INFO 3615 --- [activepivot-health-event-dispatcher] c.a.h.m.ILoggingHealthEventHandler       : [jvm, memory] INFO 2024-04-02T14:55:32.127Z uptime=34570ms com.activeviam.health.monitor.impl.JvmHealthCheck.createEvent:61 thread=activeviam-health-check-worker thread_id=52 event_type=JvmMemoryReport JVM Memory Usage report: G1 Young Generation[count=10 (+0), time=0s (+0)]  G1 Old Generation[count=0 (+0), time=0s (+0)]  Heap[used=1 GiB 465 MiB (1561350664) (+(0)), committed=32 GiB (34359738368) (+(0)), max=32 GiB (34359738368) (+(0))]  Direct[used=3 GiB 46 MiB (3269516297) (+(0)), count=11569 (+0), max=32 GiB (34359738368) (+(0))]  Threads[count=103 (+0), peak=104 (+0)]

As the loader thread runs and loads file after file, I see that the total process memory keeps rising, and it only stops rising when the loader finishes.

At the end it reached ~85 GB VIRT & ~74 GB RSS (seen via htop).

But the last memory line of the server log says the heap in use is about 13 GB and direct memory about 16 GB:

2024-04-02T12:56:32.199-04:00  INFO 3615 --- [activepivot-health-event-dispatcher] c.a.h.m.ILoggingHealthEventHandler       : [jvm, memory] INFO 2024-04-02T16:56:32.199Z uptime=7294642ms com.activeviam.health.monitor.impl.JvmHealthCheck.createEvent:61 thread=activeviam-health-check-worker thread_id=52 event_type=JvmMemoryReport JVM Memory Usage report: G1 Young Generation[count=108 (+0), time=2s (+0)]  G1 Old Generation[count=3 (+0), time=1s (+0)]  Heap[used=12 GiB 980 MiB (13912595624) (+(0)), committed=32 GiB (34359738368) (+(0)), max=32 GiB (34359738368) (+(0))]  Direct[used=16 GiB 380 MiB (17578900307) (+(0)), count=286824 (+0), max=32 GiB (34359738368) (+(0))]  Threads[count=40 (+0), peak=113 (+0)]

So the numbers don't add up. The heap and direct usage reported in these log lines sum to well under the 32 GB upper bound I set initially.

But the process is actually using far more RAM than that.
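
Spelling out the gap, using the byte values from that last log line and the RSS htop showed (treating htop's 74G as GiB):

# Byte values copied from the last JvmMemoryReport line; RSS as shown by htop.
heap_used = 13_912_595_624      # Heap[used=12 GiB 980 MiB]
direct_used = 17_578_900_307    # Direct[used=16 GiB 380 MiB]
rss = 74 * 1024**3

jvm_accounted = heap_used + direct_used
print(f"heap + direct = {jvm_accounted / 1024**3:.1f} GiB")         # ~29.3 GiB
print(f"unaccounted   = {(rss - jvm_accounted) / 1024**3:.1f} GiB")  # ~44.7 GiB

So around 45 GB of resident memory is not covered by anything in the JVM memory report.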

Expected Result

I expected the -Xmx option to set a maximum RAM size on the process.

But I think that only caps the JVM heap? I have read that Atoti allocates data off-heap as well. Is that what is happening?

Is there a way to restrict the TOTAL memory usage of the process?

Could it perhaps be a "leak" in the parquet loader?
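
For reference, on the "restrict total memory" question, this is the kind of thing I can pass via java_options (just a sketch: -XX:MaxDirectMemorySize and -XX:MaxMetaspaceSize are standard HotSpot flags and the 16G/1G values are arbitrary examples, but as far as I understand these only cap memory the JVM itself tracks, so they would not explain or limit growth coming from native allocations outside the JVM's accounting):

# Sketch only: caps for JVM-tracked memory pools; native allocations made by
# JNI libraries are not covered by these flags.
session = tt.Session(
    extra_jars=["atoti/jars"],
    java_options=[
        "-Xmx32G",                      # heap cap
        "-Xms32G",
        "-XX:MaxDirectMemorySize=16G",  # cap for NIO direct buffers
        "-XX:MaxMetaspaceSize=1G",      # cap for class metadata
    ],
)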

Environment

  • atoti: 0.8.10

  • Python: 3.12.2

  • Operating system: Linux

  • Machine: 32 cores & 256 GB RAM

Logs

I have detailed logs as well; please let me know what additional info you require and I'll be happy to provide it!

Metadata

Labels: 🐛 bug (unexpected or wrong behavior)
