

SparkMonitor

An extension for Jupyter Lab & Jupyter Notebook to monitor Apache Spark (pyspark) job execution from notebooks.

About

SparkMonitor is an extension for Jupyter Notebook & Lab that enables the live monitoring of Apache Spark Jobs spawned from a notebook. The extension provides several features to monitor and debug a Spark job from within the notebook interface.

[Animation: live monitoring display shown below a notebook cell]

Requirements

  • JupyterLab 4 or Jupyter Notebook 4.4.0 or later
  • PySpark 3.x or 4.x
    • SparkMonitor requires Spark API mode "Spark Classic" (default in Spark 3.x and 4.0).
    • Not compatible with Spark Connect, which uses the new decoupled client-server architecture.

Features

  • Live Monitoring: Automatically displays an interactive monitoring panel below each cell that runs Spark jobs in your Jupyter notebook.
  • Job and Stage Table: View a real-time table of Spark jobs and stages, each with progress bars for easy tracking.
  • Timeline Visualization: Explore a dynamic timeline showing the execution flow of jobs, stages, and tasks.
  • Resource Graphs: Monitor active tasks and executor core usage over time with intuitive graphs.

Quick Start

Installation

pip install sparkmonitor # install the extension

# set up an ipython profile and add our kernel extension to it
ipython profile create # if it does not exist
echo "c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')" >> $(ipython profile locate default)/ipython_kernel_config.py

# When using JupyterLab, the frontend extension is enabled automatically.

# When using older versions of Jupyter Notebook, install and enable the nbextension:
jupyter nbextension install sparkmonitor --py
jupyter nbextension enable  sparkmonitor --py

How to use SparkMonitor in your notebook

Create your Spark session with the extra configuration needed to activate the SparkMonitor listener: set spark.extraListeners to sparkmonitor.listener.JupyterSparkMonitorListener, and set spark.driver.extraClassPath to the listener jar bundled with the sparkmonitor Python package: path/to/package/sparkmonitor/listener_<scala_version>.jar
Example:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config('spark.extraListeners', 'sparkmonitor.listener.JupyterSparkMonitorListener')
    .config('spark.driver.extraClassPath', 'venv/lib/python3.12/site-packages/sparkmonitor/listener_2.13.jar')
    .getOrCreate()
)
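The jar path above is environment-specific (it depends on your virtualenv layout, Python version, and the jar's Scala version suffix). As a sketch, assuming sparkmonitor is installed in the active environment, you can locate the bundled jar programmatically instead of hard-coding the path:

```python
import glob
import importlib.util
import os

# Locate the listener jar shipped inside the installed sparkmonitor package,
# so the classpath does not have to be hard-coded per virtualenv.
spec = importlib.util.find_spec("sparkmonitor")
if spec is not None and spec.origin:
    pkg_dir = os.path.dirname(spec.origin)
    # The Scala version suffix varies (e.g. listener_2.12.jar, listener_2.13.jar).
    jars = sorted(glob.glob(os.path.join(pkg_dir, "listener_*.jar")))
else:
    # sparkmonitor is not importable in this environment
    jars = []

print(jars)
```

If jars is non-empty, you could pass jars[0] as the value of spark.driver.extraClassPath in the builder call above.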

Legacy: with the extension installed, a SparkConf object called conf will be usable from your notebooks. You can use it as follows:

from pyspark import SparkContext
# Start the spark context using the SparkConf object named `conf` the extension created in your kernel.
sc = SparkContext.getOrCreate(conf=conf)

Development

If you'd like to develop the extension:

# See package.json scripts for building the frontend
yarn run build:<action>

# Install the package in editable mode
pip install -e .

# Symlink jupyterlab extension
jupyter labextension develop --overwrite .

# Watch for frontend changes
yarn run watch

# Build the spark JAR files
sbt +package

History

  • The first version of SparkMonitor was written by krishnan-r as a Google Summer of Code project with the SWAN Notebook Service team at CERN.

  • Further fixes and improvements were made by the team at CERN and members of the community, maintained at swan-cern/jupyter-extensions/tree/master/SparkMonitor

  • Jafer Haider worked on updating the extension to be compatible with JupyterLab as part of his internship at Yelp.

    • Jafer's work at the fork jupyterlab-sparkmonitor has since been merged into this repository to provide a single package for both JupyterLab and Jupyter Notebook.
  • Further development and maintenance is being done by the SWAN team at CERN and the community.

Changelog

This repository is published to PyPI as sparkmonitor.
