StarRocks Roadmap 2025 · Issue #55526 · StarRocks/starrocks · GitHub
StarRocks Roadmap 2025 #55526

Open
35 tasks
Dshadowzh opened this issue Feb 4, 2025 · 2 comments
Labels
type/enhancement Make an enhancement to StarRocks

Comments

@Dshadowzh
Contributor
Dshadowzh commented Feb 4, 2025

Refer to the previous roadmaps: 2024, 2023, 2022.

Execution Engine

  • Query Stability

    • Query Plan Manager: Enhance the robustness of the query plan generator to minimize plan instability.
    • Data Skew Handling: Develop dynamic algorithms to detect and adjust for data skew, optimizing query execution.
    • Cache Resilience: Implement smarter caching mechanisms to reduce query jittering during CN changes.
  • Performance Tuning

    • Operator Improvements: Introduce poller-free execution and runtime filter pushdown to the storage layer.
    • History-Based Optimizer: Leverage query feedback to refine optimization strategies.
    • ARM Performance Tuning: Resolve performance bottlenecks and edge cases for ARM architectures.
  • Query Optimizer

    • Improve NDV (Number of Distinct Values) Accuracy: Enhance the precision of NDV statistical information.
    • Improve Multi-Column Statistics Accuracy: Optimize the accuracy of statistics for multi-column data.
    • Optimize Sampling Estimation Algorithms: Refine algorithms for estimating statistics through sampling.
    • Column Property Propagation Refactoring.
  • Batch Processing

    • Adaptive Concurrency: Dynamically adjust the number of concurrent tasks based on system load and resource availability.
    • Query Queue and Spill Stability: Improve stability and efficiency for large-scale batch processing on 1000+ core clusters.
  • Materialized Views

    • Incremental MV Framework: Reduce full recomputation costs by enabling incremental updates for materialized views.
  • Data Types

    • New Data Types: Support advanced data types such as BigString and Datetime/Timestamp with timezone, and possibly Geo.
  • Functions

    • Trino-Compatible Functions: Expand function compatibility with Trino (see #40894).
    • Causal Inference: Introduce functions for causal analysis and inference.
    • Others
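
To illustrate what the Incremental MV Framework item above aims at, here is a minimal Python sketch of incremental maintenance for a grouped SUM/COUNT aggregate. It is not StarRocks code: the class `IncrementalSumView`, its methods, and the `(key, value)` row shape are hypothetical names chosen for this example. The idea is that applying only the inserted and deleted rows should yield the same result as recomputing the aggregate from the full base table.

```python
from collections import defaultdict


class IncrementalSumView:
    """Hypothetical stand-in for an MV defined as
    SELECT key, SUM(value), COUNT(*) FROM base GROUP BY key."""

    def __init__(self):
        self.sums = defaultdict(int)
        self.counts = defaultdict(int)

    def apply_delta(self, inserted=(), deleted=()):
        """Fold a batch of base-table changes into the stored aggregates."""
        for key, value in inserted:
            self.sums[key] += value
            self.counts[key] += 1
        for key, value in deleted:
            self.sums[key] -= value
            self.counts[key] -= 1
            if self.counts[key] == 0:
                # The group no longer has any rows, so drop it from the view.
                del self.sums[key]
                del self.counts[key]

    def rows(self):
        """Return the current view contents as {key: (sum, count)}."""
        return {k: (self.sums[k], self.counts[k]) for k in self.counts}


def full_recompute(base_rows):
    """Baseline: rebuild the aggregate from every base row."""
    view = IncrementalSumView()
    view.apply_delta(inserted=base_rows)
    return view.rows()


# Initial load, then one small change batch.
view = IncrementalSumView()
view.apply_delta(inserted=[("a", 1), ("b", 2), ("a", 3)])
view.apply_delta(inserted=[("b", 5)], deleted=[("a", 1)])

# The incrementally maintained result matches a full recomputation.
new_base = [("b", 2), ("a", 3), ("b", 5)]
assert view.rows() == full_recompute(new_base)
```

The cost of `apply_delta` depends on the size of the change batch rather than the size of the base table, which is the saving the roadmap item refers to. Aggregates such as MIN/MAX or joins need more state than this sketch keeps.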

LakeHouse

  • Iceberg as a Fully Featured LakeHouse

    • Performant and Cost-Effective Query Engine: Enhance statistics collection, indexing, and materialized view support.
    • Iceberg V3 Spec Compliance: Support for Variant, deletion vectors, geo types, and auth specifications.
    • Full Operation Support: Enable DDL, DML, procedures, and seamless table migration.
    • Compaction and Layout Optimization: Introduce compaction services and automatic layout arrangement.
  • Paimon as a Fully Featured Streaming LakeHouse

    • Query: Metadata optimization, manifest cache, index, point lookup optimization
    • Full operation support: time-travel, management for tagging & branching, DDL, DML...
    • Paimon new features: variant type, view, materialized view, incremental MV
  • Other Open Lake Formats
    For other formats, we will prioritize query performance improvements:

    • Hudi: Enhance RLI (Record-Level Indexing), bloom filters on Parquet, and metadata table support.
    • Delta Lake: Implement optimizations as needed based on user demand.

Shared Data

  • Make shared-data the default architecture. Focus on improving stability and real-time/search capabilities.

    • Batch data ingestion stability issues
    • Cost reduction for both batch and streaming data ingestion
  • Enhanced Functionality

    • Time Travel and Snapshots: Improve support for time travel and snapshot functionality (see "Snapshot for shared-data", #53999).
    • Merge Into: Enable efficient data merging operations.
    • Hybrid Search: Improve the mixed vector/full-text/scalar search capability.
  • Real-Time Storage Engine

    • Data Freshness: Improve data freshness with readable memtables.
    • Compaction Optimization: Optimize compaction for time-series data.
    • Better Pipe: Expand the use of Pipe for continuous data ingestion.
  • Multi-Statement Transactions

    • Extend multi-statement transactions to cover delete and update, and handle transaction conflicts better.
@Dshadowzh Dshadowzh added the type/enhancement Make an enhancement to StarRocks label Feb 4, 2025
@Dshadowzh Dshadowzh pinned this issue Feb 10, 2025
@kateshaowanjou
Contributor

Anyone interested in Paimon can also see this doc: StarRocks Community 2025 Roadmap for Paimon

@jimdowling
I see that the ASOF Join, which was on the 2024 roadmap, is no longer here.
It is an important join for using StarRocks to create training data for AI systems:
https://www.hopsworks.ai/dictionary/point-in-time-correct-joins
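
For readers unfamiliar with the term, the following Python sketch shows the semantics usually meant by an ASOF (point-in-time correct) join: each left-side row is matched with the most recent right-side row for the same key whose timestamp is not later than the left row's timestamp. The function `asof_join` and the tuple layouts are hypothetical and only illustrate the behavior; this is not StarRocks syntax or its implementation.

```python
from bisect import bisect_right


def asof_join(left, right):
    """left:  iterable of (key, ts, label)
    right: iterable of (key, ts, feature)
    Returns (key, ts, label, feature) per left row, where feature is the
    latest right-side value for that key with right.ts <= left.ts,
    or None when no such row exists."""
    # Build per-key timestamp and value lists, sorted by timestamp.
    by_key = {}
    for key, ts, feature in sorted(right, key=lambda r: (r[0], r[1])):
        times, values = by_key.setdefault(key, ([], []))
        times.append(ts)
        values.append(feature)

    out = []
    for key, ts, label in left:
        times, values = by_key.get(key, ([], []))
        # Number of right rows with timestamp <= ts.
        i = bisect_right(times, ts)
        out.append((key, ts, label, values[i - 1] if i else None))
    return out


features = [("u1", 10, 0.1), ("u1", 20, 0.2), ("u2", 5, 0.9)]
events = [("u1", 15, "click"), ("u1", 20, "buy"), ("u1", 5, "view")]
result = asof_join(events, features)
assert result == [
    ("u1", 15, "click", 0.1),  # latest feature at or before ts=15
    ("u1", 20, "buy", 0.2),    # an exact timestamp match is included
    ("u1", 5, "view", None),   # no feature exists yet at ts=5
]
```

Because a row never sees feature values recorded after its own timestamp, training data built this way avoids leaking future information into the labels.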
