BLAZE

The Blaze accelerator for Apache Spark leverages native vectorized execution to accelerate query processing. It combines the power of the Apache Arrow-DataFusion library and the scale of the Spark distributed computing framework.

Blaze takes a fully optimized physical plan from Spark, maps it into DataFusion's execution plan, and performs native plan computation in Spark executors.

Blaze is composed of the following high-level components:

Spark Extension: hooks the whole accelerator into the Spark execution lifecycle.
Spark Shims: version-specific code for different versions of Spark.
Native Engine: implements the native engine in Rust, including:
- ExecutionPlan protobuf specification
- JNI gateway
- Customized operators, expressions, functions

Based on the inherent well-defined extensibility of DataFusion, Blaze can be easily extended to support:

Various object stores.
Operators.
Simple and Aggregate functions.
File formats.

We encourage you to extend DataFusion's capabilities directly and then add support in Blaze with simple modifications to plan-serde and extension translation.

Build from source

To build Blaze, please follow the steps below:

Install Rust

The native execution library is written in Rust, so you need to install Rust (nightly) before compiling. We recommend using rustup.

Install JDK+Maven

Blaze has been well tested on JDK 8 and Maven 3.5, and should work fine with higher versions.

Check out the source code.

git clone git@github.com:blaze-init/blaze.git
cd blaze

Build the project.

Specify the shims package for the Spark version you would like to run on. The following shims are currently supported:

spark303 - for Spark 3.0.x
spark313 - for Spark 3.1.x
spark324 - for Spark 3.2.x
spark333 - for Spark 3.3.x
spark351 - for Spark 3.5.x

You can build Blaze either in dev mode for debugging or in release mode to unlock its full potential.

SHIM=spark333 # or spark303/spark313/spark320/spark324/spark333/spark351
MODE=release # or pre
mvn package -P"${SHIM}" -P"${MODE}"

After the build finishes, a fat JAR that contains all the dependencies is generated in the target directory.
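The shim choice can also be scripted. The following is a minimal sketch, not part of Blaze: it maps a Spark version string to one of the shim profiles listed above and prints the corresponding Maven command instead of running it. The function names blaze_shim_for and blaze_build_cmd are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of Blaze): map a Spark version to a shim
# profile, using only the shims listed in this README.
blaze_shim_for() {
    case "$1" in
        3.0.*) echo spark303 ;;
        3.1.*) echo spark313 ;;
        3.2.*) echo spark324 ;;
        3.3.*) echo spark333 ;;
        3.5.*) echo spark351 ;;
        *) echo "unsupported Spark version: $1" >&2; return 1 ;;
    esac
}

# Print (rather than run) the Maven command for a Spark version and a
# build mode; the mode defaults to "release".
blaze_build_cmd() {
    shim=$(blaze_shim_for "$1") || return 1
    mode=${2:-release}
    echo "mvn package -P${shim} -P${mode}"
}
```

For example, blaze_build_cmd 3.3.2 prints "mvn package -Pspark333 -Prelease". Printing the command keeps the helper free of side effects, so it can be checked before a long build is started.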

Build with docker

You can use the following command to build a CentOS 7 compatible release:

SHIM=spark333 MODE=release ./release-docker.sh

Run Spark Job with Blaze Accelerator

This section describes how to submit and configure a Spark Job with Blaze support.

Move the Blaze JAR package to the Spark client classpath (normally spark-xx.xx.xx/jars/).
Add the following settings to the Spark configuration in spark-xx.xx.xx/conf/spark-defaults.conf:

spark.blaze.enable true
spark.sql.extensions org.apache.spark.sql.blaze.BlazeSparkSessionExtension
spark.shuffle.manager org.apache.spark.sql.execution.blaze.shuffle.BlazeShuffleManager
spark.memory.offHeap.enabled false

# suggested executor memory configuration
spark.executor.memory 4g
spark.executor.memoryOverhead 4096

Submit a query with spark-sql, or with other tools such as spark-thriftserver:

spark-sql -f tpcds/q01.sql
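The installation steps above can be sketched as a small script. This is a minimal sketch under stated assumptions, not an official installer: SPARK_HOME is assumed to point at the Spark client installation, the JAR path is supplied by the caller, and the function names blaze_write_conf and blaze_install are hypothetical. The settings written are exactly the ones listed above.

```shell
#!/bin/sh
# Hypothetical helper: append the Blaze settings from this README to a
# Spark configuration file (for example "$SPARK_HOME/conf/spark-defaults.conf").
blaze_write_conf() {
    conf_file=$1
    cat >> "$conf_file" <<'EOF'
spark.blaze.enable true
spark.sql.extensions org.apache.spark.sql.blaze.BlazeSparkSessionExtension
spark.shuffle.manager org.apache.spark.sql.execution.blaze.shuffle.BlazeShuffleManager
spark.memory.offHeap.enabled false
spark.executor.memory 4g
spark.executor.memoryOverhead 4096
EOF
}

# Hypothetical helper: copy the built JAR into the Spark client classpath,
# then write the configuration. Assumes SPARK_HOME is set by the caller.
blaze_install() {
    jar=$1
    cp "$jar" "$SPARK_HOME/jars/" && blaze_write_conf "$SPARK_HOME/conf/spark-defaults.conf"
}
```

Calling blaze_install with the path of the fat JAR from the target directory covers the first two steps; the query is then submitted with spark-sql as shown. The settings are appended rather than overwritten, so an existing configuration file should be checked for duplicate keys first.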

Performance

See the most recent Benchmark Results for a performance comparison with vanilla Spark 3.3.3. The results show that Blaze saves about 50% of query time on the TPC-DS and TPC-H 1TB datasets. Stay tuned and join us for more upcoming numbers.

TPC-DS Query time: (How can I run TPC-DS benchmark?)

TPC-H Query time:

We also encourage you to benchmark Blaze and share the results with us. 🤗
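As a starting point for such a comparison, the sketch below computes the percentage of time saved from two wall-clock timings. It is a minimal sketch, not the project's benchmark harness: the function name pct_time_saved is hypothetical, and the two timings are assumed to come from your own runs of the same query with spark.blaze.enable set to false and to true.

```shell
#!/bin/sh
# Hypothetical helper: percentage of time saved, given a baseline time and
# an accelerated time in the same unit (for example seconds).
pct_time_saved() {
    awk -v base="$1" -v fast="$2" 'BEGIN {
        if (base <= 0) { print "baseline must be positive" > "/dev/stderr"; exit 1 }
        printf "%.1f\n", (base - fast) / base * 100
    }'
}
```

For example, pct_time_saved 120 60 prints 50.0, which corresponds to a 2x speedup. These numbers are illustrative only and are not measured results.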

Community

We're using Discussions to connect with other members of our community. We hope that you:

Ask questions you're wondering about.
Share ideas.
Engage with other community members.
Welcome others who are open-minded. Remember that this is a community we build together 💪 .

License

Blaze is licensed under the Apache 2.0 License. A copy of the license can be found here.
