Proposal for Implementing Streaming Query Support in chDB · Issue #322 · chdb-io/chdb · GitHub
Open
wudidapaopao opened this issue Apr 15, 2025 · 1 comment
wudidapaopao commented Apr 15, 2025

Currently, chDB executes queries by fetching the entire result set at once through the query_conn interface. For large datasets, this approach can lead to high memory usage and latency. To address this, we propose adding streaming query capabilities to chDB.

The existing LocalServer in chDB initializes the execution engine via Connection::sendQuery and retrieves all results in one go using receiveResult, storing them in WriteBufferFromVector.
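
To make the memory concern concrete, here is a minimal Python sketch of the all-at-once pattern described above. It is only a model, not chDB code: `fake_engine` and `query_all_at_once` are hypothetical names, and a `bytearray` stands in for WriteBufferFromVector.

```python
from typing import Iterator


def fake_engine(total_rows: int, chunk_size: int) -> Iterator[bytes]:
    """Hypothetical stand-in for the execution engine: yields serialized chunks."""
    for start in range(0, total_rows, chunk_size):
        stop = min(start + chunk_size, total_rows)
        yield "".join(f"{i}\n" for i in range(start, stop)).encode()


def query_all_at_once(total_rows: int, chunk_size: int) -> bytes:
    """Model of the current flow: drain every chunk before returning anything."""
    buffer = bytearray()  # plays the role of WriteBufferFromVector (assumption)
    for chunk in fake_engine(total_rows, chunk_size):
        buffer.extend(chunk)  # the whole result accumulates in memory
    return bytes(buffer)


if __name__ == "__main__":
    result = query_all_at_once(total_rows=5, chunk_size=2)
    print(len(result), "bytes held in memory before the caller sees any row")
```

In this model the caller receives nothing until the last chunk has been appended, so peak memory grows with the size of the full result.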

Proposed Changes

  1. chDB Interface Modifications
    New send_query Interface: Introduce a send_query method to initialize a streaming query. This method returns a stream_local_result object with a fetch method.
    fetch Method in stream_local_result: Each call to fetch returns a single row (or a chunk) in the specified format (e.g., JSON, Arrow), enabling incremental data consumption.

  2. LocalServer (ClientBase) Adjustments
    Deferred Result Retrieval: During the first initialization, only call Connection::sendQuery to set up the execution engine without fetching results immediately.
    On-Demand receiveResult Calls: When fetch is invoked, trigger receiveResult to retrieve a chunk of data. Once the chunk is exhausted, call receiveResult again for the next chunk.
    Handling Blocking: If receiveResult is not called for an extended period, the execution engine may block.

The proposal can also address #265
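
The proposed send_query / fetch flow could look roughly like the Python mock below. This is a sketch under assumptions, not the actual chDB API: `StreamLocalResult`, `fake_engine`, the `receive_calls` counter, and the row-as-dict format are all hypothetical, and a generator stands in for the execution engine so that nothing is produced until it is pulled.

```python
from typing import Dict, Iterator, List, Optional

Row = Dict[str, int]


def fake_engine(total_rows: int, chunk_size: int) -> Iterator[List[Row]]:
    """Hypothetical stand-in for the execution engine: one chunk per pull."""
    for start in range(0, total_rows, chunk_size):
        stop = min(start + chunk_size, total_rows)
        yield [{"n": i} for i in range(start, stop)]


class StreamLocalResult:
    """Mock of the proposed stream_local_result object (not chDB's real class)."""

    def __init__(self, chunks: Iterator[List[Row]]) -> None:
        self._chunks = chunks
        self._current: List[Row] = []
        self._pos = 0
        self.receive_calls = 0  # instrumentation for this sketch only

    def _receive_result(self) -> bool:
        """Model of an on-demand receiveResult call: pull exactly one chunk."""
        self.receive_calls += 1
        chunk = next(self._chunks, None)
        if chunk is None:
            return False
        self._current, self._pos = chunk, 0
        return True

    def fetch(self) -> Optional[Row]:
        """Return the next row, or None once the stream is exhausted."""
        while self._pos >= len(self._current):
            if not self._receive_result():
                return None
        row = self._current[self._pos]
        self._pos += 1
        return row


def send_query(total_rows: int, chunk_size: int) -> StreamLocalResult:
    """Set up the engine only; no chunk is retrieved until fetch is called."""
    return StreamLocalResult(fake_engine(total_rows, chunk_size))


if __name__ == "__main__":
    result = send_query(total_rows=5, chunk_size=2)
    while (row := result.fetch()) is not None:
        print(row)
```

Here at most one chunk is held at a time, and the engine advances only when fetch needs more data. A real engine running on its own threads would not pause this cleanly, which is where the blocking concern in point 2 comes in.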

@wudidapaopao wudidapaopao self-assigned this Apr 15, 2025
@jovezhong

BTW, https://github.com/timeplus-io/proton is an implementation of a streaming SQL engine (like Apache Flink), built on the ClickHouse codebase. New results can be pushed to the client via HTTP/TCP.
