

SparkMonitor

An extension for Jupyter Lab & Jupyter Notebook to monitor Apache Spark (pyspark) job execution from notebooks.

About

SparkMonitor is an extension for Jupyter Notebook & Lab that enables the live monitoring of Apache Spark Jobs spawned from a notebook. The extension provides several features to monitor and debug a Spark job from within the notebook interface.

[Animation: live monitoring display shown below a notebook cell]

Requirements

  • JupyterLab 4 or Jupyter Notebook 4.4.0 or later
  • PySpark 3.x or 4.x
    • SparkMonitor requires Spark API mode "Spark Classic" (default in Spark 3.x and 4.0).
    • Not compatible with Spark Connect, which uses the new decoupled client-server architecture.

Features

  • Live Monitoring: Automatically displays an interactive monitoring panel below each cell that runs Spark jobs in your Jupyter notebook.
  • Job and Stage Table: View a real-time table of Spark jobs and stages, each with progress bars for easy tracking.
  • Timeline Visualization: Explore a dynamic timeline showing the execution flow of jobs, stages, and tasks.
  • Resource Graphs: Monitor active tasks and executor core usage over time with intuitive graphs.

Quick Start

Installation

pip install sparkmonitor # install the extension

# set up an ipython profile and add our kernel extension to it
ipython profile create # if it does not exist
echo "c.InteractiveShellApp.extensions.append('sparkmonitor.kernelextension')" >> $(ipython profile locate default)/ipython_kernel_config.py

# When using JupyterLab, the frontend extension is enabled automatically.

# When using older versions of Jupyter Notebook, install and enable the nbextension:
jupyter nbextension install sparkmonitor --py
jupyter nbextension enable  sparkmonitor --py

How to use SparkMonitor in your notebook

Create your Spark session with the extra configuration needed to activate the SparkMonitor listener: set spark.extraListeners to sparkmonitor.listener.JupyterSparkMonitorListener, and set spark.driver.extraClassPath to the listener jar bundled with the sparkmonitor Python package: path/to/package/sparkmonitor/listener_<scala_version>.jar
Example:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config('spark.extraListeners', 'sparkmonitor.listener.JupyterSparkMonitorListener')
    .config('spark.driver.extraClassPath', 'venv/lib/python3.12/site-packages/sparkmonitor/listener_2.13.jar')
    .getOrCreate()
)
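The jar path above is environment-specific (it depends on your virtualenv layout, Python version, and the jar's Scala version suffix). As a sketch, assuming sparkmonitor is installed in the active environment, you can locate the bundled jar programmatically instead of hard-coding the path:

```python
import glob
import importlib.util
import os

# Locate the listener jar shipped inside the installed sparkmonitor package,
# so the classpath does not have to be hard-coded per virtualenv.
spec = importlib.util.find_spec("sparkmonitor")
if spec is not None and spec.origin:
    pkg_dir = os.path.dirname(spec.origin)
    # The Scala version suffix varies (e.g. listener_2.12.jar, listener_2.13.jar).
    jars = sorted(glob.glob(os.path.join(pkg_dir, "listener_*.jar")))
else:
    # sparkmonitor is not importable in this environment
    jars = []

print(jars)
```

If jars is non-empty, you could pass jars[0] as the value of spark.driver.extraClassPath in the builder call above.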

Legacy: with the extension installed, a SparkConf object called conf will be usable from your notebooks. You can use it as follows:

from pyspark import SparkContext
# Start the spark context using the SparkConf object named `conf` the extension created in your kernel.
sc = SparkContext.getOrCreate(conf=conf)

Development

If you'd like to develop the extension:

# See package.json scripts for building the frontend
yarn run build:<action>

# Install the package in editable mode
pip install -e .

# Symlink jupyterlab extension
jupyter labextension develop --overwrite .

# Watch for frontend changes
yarn run watch

# Build the spark JAR files
sbt +package

History

  • The first version of SparkMonitor was written by krishnan-r as a Google Summer of Code project with the SWAN Notebook Service team at CERN.

  • Further fixes and improvements were made by the team at CERN and members of the community, maintained at swan-cern/jupyter-extensions/tree/master/SparkMonitor

  • Jafer Haider worked on updating the extension to be compatible with JupyterLab as part of his internship at Yelp.

    • Jafer's work at the fork jupyterlab-sparkmonitor has since been merged into this repository to provide a single package for both JupyterLab and Jupyter Notebook.
  • Further development and maintenance is being done by the SWAN team at CERN and the community.

Changelog

This repository is published to PyPI as sparkmonitor.
