Since version 43.0.0, DataFusion Ballista supports extending its core components and functionalities.
This example demonstrates extending DataFusion Ballista to support delta.rs read operations.
Note
This project is part of the "Extending DataFusion Ballista" showcase series.
Important
This is just a showcase project; it is not meant to be maintained.
Setting up standalone Ballista:
use ballista::prelude::{SessionConfigExt, SessionContextExt};
use ballista_delta::{BallistaDeltaLogicalCodec, BallistaDeltaPhysicalCodec};
use datafusion::{
    common::Result,
    execution::SessionStateBuilder,
    prelude::{SessionConfig, SessionContext},
};
use std::sync::Arc;

#[tokio::main]
async fn main() -> Result<()> {
    // Register the Delta-aware codecs so plans that reference Delta tables can be serialized.
    let config = SessionConfig::new_with_ballista()
        .with_ballista_logical_extension_codec(Arc::new(BallistaDeltaLogicalCodec::default()))
        .with_ballista_physical_extension_codec(Arc::new(BallistaDeltaPhysicalCodec::default()));

    // NOTE: `custom_session_state` is not defined in this snippet. It is expected to
    // build a session state from `config` (the `SessionStateBuilder` import suggests how).
    let state = custom_session_state(config)?;
    let ctx = SessionContext::standalone_with_state(state).await?;

    // Open a local Delta table and register it under the name `demo`.
    let table = deltalake::open_table("./data/people_countries_delta_dask")
        .await
        .unwrap();
    ctx.register_table("demo", Arc::new(table))?;
    ctx.sql("select * from demo").await?.show().await?;

    // Alternatively, declare an external Delta table in SQL and query it.
    ctx.sql("create external table c stored as delta location 's3://ballista/people_countries_delta_dask/'")
        .await?
        .show()
        .await?;
    ctx.sql("select * from c").await?.show().await?;

    Ok(())
}
Other examples show how to extend the client, scheduler, and executor for cluster deployment.
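In a cluster, a plan is serialized on one component and deserialized on another, so every component needs codecs that understand the custom nodes. The sketch below illustrates that idea using only the standard library. `PlanCodec`, `DeltaScan`, `DeltaCodec`, `DefaultCodec`, and `ship` are hypothetical names invented for illustration; they are not Ballista's API, and the real codecs operate on DataFusion plans and protobuf bytes.

```rust
/// Hypothetical stand-in for a custom plan node such as a Delta table scan.
#[derive(Debug, Clone, PartialEq)]
struct DeltaScan {
    table_uri: String,
}

/// Hypothetical, simplified codec trait (not Ballista's real trait).
trait PlanCodec {
    fn try_encode(&self, node: &DeltaScan) -> Result<Vec<u8>, String>;
    fn try_decode(&self, bytes: &[u8]) -> Result<DeltaScan, String>;
}

/// Tag byte that marks an encoded `DeltaScan`.
const DELTA_TAG: u8 = 1;

/// Codec that understands the custom node: one tag byte followed by the UTF-8 URI.
struct DeltaCodec;

impl PlanCodec for DeltaCodec {
    fn try_encode(&self, node: &DeltaScan) -> Result<Vec<u8>, String> {
        let mut buf = vec![DELTA_TAG];
        buf.extend_from_slice(node.table_uri.as_bytes());
        Ok(buf)
    }

    fn try_decode(&self, bytes: &[u8]) -> Result<DeltaScan, String> {
        match bytes.split_first() {
            Some((&DELTA_TAG, rest)) => String::from_utf8(rest.to_vec())
                .map(|table_uri| DeltaScan { table_uri })
                .map_err(|e| e.to_string()),
            _ => Err("unknown plan node".to_string()),
        }
    }
}

/// Codec of a component that was not extended: it rejects the custom node.
struct DefaultCodec;

impl PlanCodec for DefaultCodec {
    fn try_encode(&self, _node: &DeltaScan) -> Result<Vec<u8>, String> {
        Err("unsupported plan node".to_string())
    }

    fn try_decode(&self, _bytes: &[u8]) -> Result<DeltaScan, String> {
        Err("unsupported plan node".to_string())
    }
}

/// Simulates sending a node between components: the sender encodes with its
/// codec and the receiver decodes with its own.
fn ship(
    sender: &dyn PlanCodec,
    receiver: &dyn PlanCodec,
    node: &DeltaScan,
) -> Result<DeltaScan, String> {
    let bytes = sender.try_encode(node)?;
    receiver.try_decode(&bytes)
}

fn main() {
    let scan = DeltaScan {
        table_uri: "./data/people_countries_delta_dask".to_string(),
    };
    // Both sides are extended: the round trip succeeds.
    println!("{:?}", ship(&DeltaCodec, &DeltaCodec, &scan));
    // The receiver was not extended: decoding fails.
    println!("{:?}", ship(&DeltaCodec, &DefaultCodec, &scan));
}
```

The failing second call mirrors what would presumably happen if only the client registered the Delta codecs while the scheduler or executor kept its defaults.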