Currently, chDB executes queries by fetching the entire result set at once through the query_conn interface. This approach may lead to high memory usage and latency for large datasets. To address this, we propose adding streaming query capabilities to chDB.
The existing LocalServer in chDB initializes the execution engine via Connection::sendQuery and retrieves all results in one go using receiveResult, storing them in WriteBufferFromVector.
Proposed Changes
chDB Interface Modifications
New send_query Interface: Introduce a send_query method to initialize a streaming query. This method returns a stream_local_result object with a fetch method.
fetch Method in stream_local_result: Each call to fetch returns a single row (or a chunk) in the specified format (e.g., JSON, Arrow), enabling incremental data consumption.
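The shape of the proposed interface can be illustrated with a small Python sketch. This is not chDB's API: `send_query`, `StreamLocalResult`, and the `chunk_size` parameter are hypothetical stand-ins that mirror the names in the proposal, and a plain generator takes the place of the execution engine. It assumes `fetch` returns one chunk per call, serialized as JSON lines, and `None` once the stream is exhausted.

```python
import json
from typing import Iterable, Iterator, List, Optional


class StreamLocalResult:
    """Hypothetical stand-in for the proposed stream_local_result object."""

    def __init__(self, chunks: Iterator[List[dict]]) -> None:
        # The iterator is lazy: no chunk is produced until fetch() asks for one.
        self._chunks = chunks

    def fetch(self) -> Optional[str]:
        """Return the next chunk as JSON lines, or None when the stream ends."""
        chunk = next(self._chunks, None)
        if chunk is None:
            return None
        return "\n".join(json.dumps(row) for row in chunk)


def send_query(rows: Iterable[dict], chunk_size: int = 2) -> StreamLocalResult:
    """Hypothetical send_query: set up the pipeline and return immediately."""

    def produce() -> Iterator[List[dict]]:
        chunk: List[dict] = []
        for row in rows:
            chunk.append(row)
            if len(chunk) == chunk_size:
                yield chunk
                chunk = []
        if chunk:
            # Emit the final, possibly shorter, chunk.
            yield chunk

    return StreamLocalResult(produce())


def drain(result: StreamLocalResult) -> List[str]:
    """Consume the remaining chunks one fetch() call at a time."""
    collected: List[str] = []
    while True:
        data = result.fetch()
        if data is None:
            return collected
        collected.append(data)
```

With this shape, the caller holds at most one chunk in memory at a time and can stop calling `fetch` early without materializing the rest of the result.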
LocalServer (ClientBase) Adjustments
Deferred Result Retrieval: During the first initialization, only call Connection::sendQuery to set up the execution engine without fetching results immediately.
On-Demand receiveResult Calls: When fetch is invoked, trigger receiveResult to retrieve a chunk of data. Once the chunk is exhausted, call receiveResult again for the next chunk.
Handling Blocking: If receiveResult is not called for an extended period, the execution engine may block.
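The three adjustments above can be pictured with a toy Python model. Everything here is an assumption made for illustration: `LocalServerSketch`, `RowStream`, and `max_pending_chunks` are hypothetical names, the engine is a background thread, and blocking is modeled as a bounded buffer that stalls the producer when the consumer stops calling `receive_result`. The real LocalServer and Connection code may behave differently.

```python
import queue
import threading
from typing import Iterator, List, Optional

_END = object()  # sentinel marking the end of the result stream


class LocalServerSketch:
    """Toy model of deferred retrieval; not chDB's actual LocalServer."""

    def __init__(self, max_pending_chunks: int = 1) -> None:
        self._pending: queue.Queue = queue.Queue(maxsize=max_pending_chunks)
        self._finished = False
        self.chunks_produced = 0

    def send_query(self, chunks: Iterator[List[int]]) -> None:
        """Start the engine only; no results are pulled here."""

        def run() -> None:
            for chunk in chunks:
                # put() blocks while the buffer is full, so the engine
                # stalls until the consumer calls receive_result().
                self._pending.put(chunk)
                self.chunks_produced += 1
            self._pending.put(_END)

        threading.Thread(target=run, daemon=True).start()

    def receive_result(self) -> Optional[List[int]]:
        """Pull exactly one chunk, or None once the stream has ended."""
        if self._finished:
            return None
        item = self._pending.get()
        if item is _END:
            self._finished = True
            return None
        return item


class RowStream:
    """Serves rows from the current chunk and refills it on demand."""

    def __init__(self, server: LocalServerSketch) -> None:
        self._server = server
        self._rows: List[int] = []
        self._pos = 0

    def fetch(self) -> Optional[int]:
        # Refill only when the current chunk is exhausted; the loop also
        # skips over any empty chunks.
        while self._pos >= len(self._rows):
            chunk = self._server.receive_result()
            if chunk is None:
                return None
            self._rows, self._pos = chunk, 0
        row = self._rows[self._pos]
        self._pos += 1
        return row
```

In this model a stalled producer is the intended backpressure, but it also means an abandoned stream keeps its engine thread parked, so a real implementation would likely need an explicit cancel or timeout path.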
BTW, https://github.com/timeplus-io/proton is an implementation of a streaming SQL engine (like Apache Flink) built on the ClickHouse codebase. New results can be pushed to the client via HTTP/TCP.
The proposal can also address #265