Stars
REST job server for Apache Spark
A Scala API for Apache Beam and Google Cloud Dataflow.
Base classes to use when writing tests with Spark
Project SnappyData - memory optimized analytics database, based on Apache Spark™ and Apache Geode™. Stream, Transact, Analyze, Predict in one cluster
A collection of open source Apache 2.0 Kafka Connectors maintained by Lenses.io.
A tool for data sampling, data generation, and data diffing
Google BigQuery support for Spark, SQL, and DataFrames
Spark Extension: ML transformers, SQL aggregations, etc. that are missing in Apache Spark
Spark-based approximate nearest neighbor search using locality-sensitive hashing
A fast and generic immutable radix tree for Scala
GCS support for avro-tools, parquet-tools and protobuf
Interactive Audience Analytics with Spark and HyperLogLog
Based on the design of SparkOnHBase. This repo will support Spark, Spark Streaming, and Spark SQL integration with Kudu.
Maven archetype used to bootstrap a Spark Scala project
Spark examples for parsing common crawl data