An extension for JupyterLab and Jupyter Notebook to monitor Apache Spark (PySpark) job execution from notebooks.
- JupyterLab 4 or Jupyter Notebook 4.4.0 or later
- PySpark 3.x or 4.x
- SparkMonitor requires Spark API mode "Spark Classic" (default in Spark 3.x and 4.0).
- Not compatible with Spark Connect, which uses the new decoupled client-server architecture (a quick way to check which mode a running session uses is shown below).
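If you are unsure which API mode an existing session uses, one quick, unofficial check is to inspect the session object itself: a classic session is implemented in pyspark.sql.session, while a Spark Connect session comes from the pyspark.sql.connect package and has no local JVM-backed SparkContext for the listener to attach to. A minimal sketch, assuming a session object named spark already exists:

# Assuming an existing SparkSession object named `spark`:
# 'pyspark.sql.session' indicates Spark Classic;
# 'pyspark.sql.connect.session' indicates Spark Connect, which SparkMonitor cannot monitor.
print(type(spark).__module__)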
- Live Monitoring: Automatically displays an interactive monitoring panel below each cell that runs Spark jobs in your Jupyter notebook (see the example after this list).
- Job and Stage Table: View a real-time table of Spark jobs and stages, each with progress bars for easy tracking.
- Timeline Visualization: Explore a dynamic timeline showing the execution flow of jobs, stages, and tasks.
- Resource Graphs: Monitor active tasks and executor core usage over time with intuitive graphs.
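For example, once a SparkSession is configured as shown further below, any cell that triggers Spark actions will render the panel. A minimal illustrative cell, assuming a session named spark:

# Running an action creates Spark jobs, which populate the jobs/stages table,
# timeline and resource graphs shown below the cell.
df = spark.range(0, 10_000_000).selectExpr("id % 100 AS key", "id AS value")
df.groupBy("key").count().collect()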
pip install sparkmonitor # install the extension
# set up an ipython profile and add our kernel extension to it
ipython profile create # if it does not exist
echo "c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')" >> $(ipython profile locate default)/ipython_kernel_config.py
# With JupyterLab, the frontend extension is enabled automatically
# With older versions of Jupyter Notebook, install and enable the nbextension:
jupyter nbextension install sparkmonitor --py
jupyter nbextension enable sparkmonitor --py
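Alternatively, if you prefer not to modify the IPython profile, the kernel extension can typically be loaded for a single session from within a notebook. This assumes the module exposes the standard IPython load_ipython_extension hook, which is the same mechanism the profile-based configuration above relies on:

# Load the SparkMonitor kernel extension for the current kernel session only.
%load_ext sparkmonitor.kernelextension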
Create your Spark session with the extra configurations below to activate the SparkMonitor listener. You will need to set spark.extraListeners to sparkmonitor.listener.JupyterSparkMonitorListener and spark.driver.extraClassPath to the listener JAR bundled inside the sparkmonitor Python package: path/to/package/sparkmonitor/listener_<scala_version>.jar
Example:
from pyspark.sql import SparkSession
spark = SparkSession.builder \
    .config('spark.extraListeners', 'sparkmonitor.listener.JupyterSparkMonitorListener') \
    .config('spark.driver.extraClassPath', 'venv/lib/python3.12/site-packages/sparkmonitor/listener_2.13.jar') \
    .getOrCreate()
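The site-packages path above is environment-specific; a small sketch like the following can locate the bundled JAR at runtime instead of hard-coding it, assuming the installed package keeps the listener_<scala_version>.jar naming shown above:

import glob
import os

import sparkmonitor

# Resolve the listener JAR that ships inside the installed sparkmonitor package.
pkg_dir = os.path.dirname(sparkmonitor.__file__)
jar_path = glob.glob(os.path.join(pkg_dir, 'listener_*.jar'))[0]
print(jar_path)  # pass this value to spark.driver.extraClassPath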
Legacy: with the extension installed, a SparkConf object called conf will be available in your notebooks. You can use it as follows:
from pyspark import SparkContext
# Start the spark context using the SparkConf object named `conf` the extension created in your kernel.
sc = SparkContext.getOrCreate(conf=conf)
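To see exactly what the extension pre-configured on that object (including the listener and class path settings described above), you can list its contents with the standard SparkConf API:

# Print every (key, value) pair the kernel extension set on `conf`.
for key, value in conf.getAll():
    print(key, '=', value)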
If you'd like to develop the extension:
# See package.json scripts for building the frontend
yarn run build:<action>
# Install the package in editable mode
pip install -e .
# Symlink jupyterlab extension
jupyter labextension develop --overwrite .
# Watch for frontend changes
yarn run watch
# Build the spark JAR files
sbt +package
- The first version of SparkMonitor was written by krishnan-r as a Google Summer of Code project with the SWAN Notebook Service team at CERN.
- Further fixes and improvements were made by the team at CERN and members of the community, maintained at swan-cern/jupyter-extensions/tree/master/SparkMonitor.
- Jafer Haider worked on updating the extension to be compatible with JupyterLab as part of his internship at Yelp.
- Jafer's work at the fork jupyterlab-sparkmonitor has since been merged into this repository to provide a single package for both JupyterLab and Jupyter Notebook.
- Further development and maintenance is being done by the SWAN team at CERN and the community.
This repository is published to PyPI as sparkmonitor.
- 2.x: see the GitHub releases page of this repository
- 1.x and below were published from swan-cern/jupyter-extensions, with some initial versions from krishnan-r/sparkmonitor