GitHub - illinoisdata/kishu

Kishu: Versioned and Undoable Notebook System



Kishu is a system for intelligent versioning of notebook session states on Jupyter-based platforms (e.g., JupyterLab, JupyterHub). Kishu efficiently creates checkpoints of both the variable and code states of a notebook session. This lets users undo cell executions and manage branching states containing objects such as machine learning models, plots, and dataframes through a Git-like commit and checkout interface.

Getting Started

Kishu can be installed from PyPI:

pip install kishu jupyterlab_kishu

Note: installing jupyterlab_kishu will also install jupyterlab into your environment.

Using Kishu

Once Kishu is installed, an additional Kishu tab should appear in JupyterLab's toolbar. This tab gives access to Kishu's various functionalities.



Step 1: Initializing Kishu on a Notebook

To start protecting your notebook session, initialize Kishu and attach it to the notebook with the Kishu > Initialize/Re-attach option under the Kishu tab. Alternatively, use the shortcut Ctrl+K then Ctrl+I (⌘+K then ⌘+I on macOS).



Step 2: Run Cells as Normal

Once initialized, you can proceed to execute cells in the session as normal. Kishu will automatically and transparently checkpoint your variable state (imported libraries, loaded dataframes, drawn plots, fitted models, etc.) after each cell execution.
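The idea of checkpointing the variable state after every cell can be illustrated with a toy model. This is a minimal sketch, not Kishu's implementation: the names `namespace`, `checkpoints`, and `run_cell` are hypothetical, and pickling the whole namespace after each cell is a simplifying assumption (Kishu itself uses low-overhead monitoring and deduplicated storage).

```python
import pickle

# Hypothetical toy model: the session's variables live in a namespace dict,
# and after each cell runs we store a pickled snapshot of the user variables.
namespace: dict = {}
checkpoints: list[bytes] = []


def run_cell(code: str) -> None:
    """Execute one cell, then checkpoint the resulting variable state."""
    exec(code, namespace)
    # Skip dunder entries such as __builtins__, which exec() inserts.
    user_vars = {k: v for k, v in namespace.items() if not k.startswith("__")}
    checkpoints.append(pickle.dumps(user_vars))


run_cell("data = [1, 2, 3]")
run_cell("data.append(4)")

# Each snapshot is unaffected by later in-place mutations of live objects.
print([pickle.loads(c)["data"] for c in checkpoints])  # [[1, 2, 3], [1, 2, 3, 4]]
```

Because each snapshot is serialized at checkpoint time, the in-place `append` in the second cell does not alter the first checkpoint.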



Undoing cell executions

To undo your latest cell execution, use the Kishu > Undo Execution option under the Kishu tab.



Undoing cell executions only affects the variable state. The code state (i.e., the cells you write) is untouched. This can be useful, for example, to 'un-drop' a dataframe column dropped by a cell while keeping the cell code itself intact.
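The split between variable state and code state during an undo can be sketched with a toy session. This is a hypothetical illustration (the `Session` class and its `execute`/`undo` methods are not Kishu's API), and it uses a dict of lists as a stand-in for a dataframe so that it runs without pandas.

```python
import copy


class Session:
    """Toy session: cell sources (code state) plus per-cell variable snapshots."""

    def __init__(self) -> None:
        self.cells: list[str] = []       # code state: the cells the user wrote
        self.variables: dict = {}        # live variable state
        self.history: list[dict] = [{}]  # variable snapshot after each execution

    def execute(self, source: str) -> None:
        self.cells.append(source)
        exec(source, {}, self.variables)
        self.history.append(copy.deepcopy(self.variables))

    def undo(self) -> None:
        """Roll back the latest execution's effect on variables only."""
        if len(self.history) > 1:
            self.history.pop()
            self.variables = copy.deepcopy(self.history[-1])
        # self.cells is deliberately left untouched.


s = Session()
s.execute('df = {"a": [1, 2], "b": [3, 4]}')
s.execute('del df["b"]')           # "drop" column b
s.undo()                           # "un-drop" it
print(sorted(s.variables["df"]))   # ['a', 'b']
print(len(s.cells))                # 2: the dropping cell's code is kept
```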

Checkpointing and Checking out Notebook States

Kishu can also manage branching code and variable states. It can checkpoint both the code and variable state at any point during a notebook session, and you can later return to any checkpoint via a checkout.

Step 1: Committing to make a checkpoint

Kishu can store the current state of your notebook, including both the variable state and your code state, with the Kishu > Commit option under the Kishu tab. Alternatively, use the shortcut Ctrl+K then Ctrl+C (⌘+K then ⌘+C on macOS). You will be prompted to enter a commit message.



Step 2: Checkout to a checkpoint

You can return to a commit with the Kishu > Checkout option under the Kishu tab. Alternatively, use the shortcut Ctrl+K then Ctrl+V (⌘+K then ⌘+V on macOS). This opens a menu where you can select the commit to check out.



Checking out replaces both the current variable state and code state with those of the selected checkpoint, overwriting your current notebook state. If you want to keep your current state, commit it first to create another checkpoint before checking out.
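The commit and checkout semantics (a commit captures both code and variables; a checkout replaces both) can be modeled with a small in-memory store. This is a hypothetical sketch: `Commit`, `commit`, `checkout`, and the integer commit IDs are illustrative names and do not reflect Kishu's API or storage format.

```python
import pickle
from dataclasses import dataclass


@dataclass
class Commit:
    message: str
    cells: list[str]   # code state
    variables: bytes   # pickled variable state


commits: list[Commit] = []


def commit(message: str, cells: list[str], variables: dict) -> int:
    """Store copies of both states; return a hypothetical integer commit ID."""
    commits.append(Commit(message, list(cells), pickle.dumps(variables)))
    return len(commits) - 1


def checkout(commit_id: int) -> tuple[list[str], dict]:
    """Return the committed code and variables, which replace the current ones."""
    c = commits[commit_id]
    return list(c.cells), pickle.loads(c.variables)


cells, variables = ["x = 1"], {"x": 1}
first = commit("baseline", cells, variables)

cells.append("x = 2")
variables["x"] = 2
second = commit("keep current work", cells, variables)  # commit before checkout

cells, variables = checkout(first)
print(cells, variables)  # ['x = 1'] {'x': 1}
```

Committing `second` before checking out is what makes the `x = 2` state recoverable afterwards; without it, the checkout would have discarded that state.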

Configuring Kishu

Kishu can be configured by editing the ~/.kishu/config.ini file. A full list of configurable options can be found here.
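Since the file is in INI format, you can inspect it with Python's standard `configparser` module. The sketch below assumes nothing about which sections or options Kishu defines; it only lists whatever the file contains. The helper name `show_config` is hypothetical.

```python
import configparser
from pathlib import Path


def show_config(path: Path = Path.home() / ".kishu" / "config.ini") -> dict:
    """Return {section: {option: value}} for an INI file (empty if missing)."""
    parser = configparser.ConfigParser()
    parser.read(path)  # a nonexistent file is silently skipped
    # Note: each section's mapping also includes any [DEFAULT] values.
    return {section: dict(parser[section]) for section in parser.sections()}


for section, options in show_config().items():
    print(f"[{section}]")
    for key, value in options.items():
        print(f"{key} = {value}")
```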

Kishuboard: Interactive GUI for Kishu

Optionally, you can install Kishuboard, a graphical extension for Kishu. Its interactive GUI lets you browse, compare, and search commits, check out code and kernel variables from previous commits, create branches, and more. A full list of supported features and development instructions for Kishuboard can be found here.

To install Kishuboard, run:

pip install kishuboard

And then, launch it with:

kishuboard

You should now be able to visit it at http://localhost:4999.
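If the page does not load, a plain TCP connection attempt shows whether anything is listening on port 4999. This is a minimal standard-library sketch; `port_is_open` is a hypothetical helper, and a successful connection only shows that some server is listening there, not that it is Kishuboard.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 4999, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host not resolvable
        return False


print("Port 4999 reachable:", port_is_open())
```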

When Kishu is attached to a new notebook, refresh the notebook list. To enter the GUI of a specific notebook, simply click on its entry in the list.

Supported Libraries

This is the current list of libraries, their versions, and their classes supported by Kishu:

- ✅ supported: all changes to instances of this class are always captured.
- 🟨 too conservative: Kishu may report changes to instances of this class when none occurred (false positives).
- ❌ failing: some changes to instances of this class may not be captured.

    ✅ arrow==1.3.0, arrow.arrow.Arrow
    ✅ astropy==5.2.2, astropy.convolution.Box2DKernel
    ✅ astropy==5.2.2, astropy.convolution.Gaussian2DKernel
    ✅ astropy==5.2.2, astropy.io.fits.HDUList
    ✅ astropy==5.2.2, astropy.io.fits.PrimaryHDU
    ✅ astropy==5.2.2, astropy.modeling.fitting.LinearLSQFitter
    ✅ astropy==5.2.2, astropy.modeling.functional_models.Ellipse2D
    ✅ astropy==5.2.2, astropy.modeling.functional_models.Linear1D
    ✅ astropy==5.2.2, astropy.modeling.polynomial.Polynomial1D
    ✅ astropy==5.2.2, astropy.modeling.polynomial.Polynomial2D
    ✅ astropy==5.2.2, astropy.nddata.NDData
    ✅ astropy==5.2.2, astropy.nddata.NDDataRef
    ✅ astropy==5.2.2, astropy.stats.SigmaClip
    ✅ astropy==5.2.2, astropy.table.QTable
    ✅ astropy==5.2.2, astropy.units.Quantity
    ✅ astropy==5.2.2, astropy.visualization.PercentileInterval
    ✅ astropy==5.2.2, astropy.wcs.WCS
    ✅ bokeh==2.4.3, bokeh.plotting._figure.figure
    ✅ catboost==1.2.3, catboost
    ✅ dask==2023.5.0, dask
    ✅ dataprep==0.4.5, dataprep.datasets
    ✅ dataprep==0.4.5, dataprep.eda.intermediate.Intermediate
    ✅ dill==0.3.8, dill
    ✅ gensim==4.3.2, gensim
    ✅ gym==0.26.2, gym
    ✅ ipywidgets==7.8.5, ipywidgets
    ✅ keras==2.13.1, keras.src.initializers.initializers.RandomNormal
    ✅ keras==2.13.1, keras.src.initializers.initializers.RandomUniform
    ✅ keras==2.13.1, keras.src.layers.core.dense.Dense
    ✅ keras==2.13.1, keras.src.optimizers.schedules.learning_rate_schedule.ExponentialDecay
    ✅ lightgbm==4.3.0, lightgbm.basic.Dataset
    ✅ llm==0.13.1, llm.default_plugins.openai_models.Chat
    ✅ lmfit==1.2.2, lmfit.parameter.Parameters
    ✅ matplotlib==3.7.5, matplotlib.colors.ListedColormap
    ✅ matplotlib==3.7.5, matplotlib.dates.AutoDateFormatter
    ✅ matplotlib==3.7.5, matplotlib.dates.WeekdayLocator
    ✅ matplotlib==3.7.5, matplotlib.ticker.AutoLocator
    ✅ networkx==3.1, networkx.classes.digraph.DiGraph
    ✅ networkx==3.1, networkx.classes.graph.Graph
    ✅ nltk==3.8.1, nltk.stem.porter.PorterStemmer
    ✅ numpy==1.24.3, ast
    ✅ numpy==1.24.3, copy
    ✅ numpy==1.24.3, datetime.time
    ✅ numpy==1.24.3, datetime.timedelta
    ✅ numpy==1.24.3, hashlib
    ✅ numpy==1.24.3, itertools
    ✅ numpy==1.24.3, json
    ✅ numpy==1.24.3, numpy.ndarray
    ✅ numpy==1.24.3, pickle
    ✅ numpy==1.24.3, random.Random
    ✅ numpy==1.24.3, re.Pattern
    ✅ numpy==1.24.3, urllib.request.Request
    ✅ numpy==1.24.3, uuid.UUID
    ✅ opencv-python==4.9.0.80, cv2
    ✅ optuna==3.5.0, optuna.Study
    ✅ pandas==1.5.3, pandas.DataFrame
    ✅ pandas==1.5.3, pandas.Series
    ✅ pathlib==1.0.1, pathlib.PosixPath
    ✅ photutils==0.0.1, photutils.psf.matching.CosineBellWindow
    ✅ photutils==0.0.1, photutils.psf.matching.HanningWindow
    ✅ photutils==0.0.1, photutils.utils.CutoutImage
    ✅ photutils==0.0.1, photutils.utils.ImageDepth
    ✅ plotly==5.18.0, plotly.express
    ✅ plotly==5.18.0, plotly.figure_factory
    ✅ plotly==5.18.0, plotly.graph_objects
    ✅ plotly==5.18.0, plotly.graph_objs
    ✅ plotly==5.18.0, plotly.io
    ✅ plotly==5.18.0, plotly.offline
    ✅ plotly==5.18.0, plotly.subplots
    ✅ polars==0.14.29, polars.DataFrame
    ✅ prophet==1.1.5, prophet.Prophet
    ✅ pyspark==3.5.1, pyspark.sql
    ✅ qiskit==0.45.0, qiskit.QuantumCircuit
    ✅ scikit-image==0.21.0, skimage
    ✅ scikit-image==0.21.0, skimage.morphology
    ✅ scikit-learn==1.3.2, sklearn.cluster
    ✅ scikit-learn==1.3.2, sklearn.compose
    ✅ scikit-learn==1.3.2, sklearn.datasets
    ✅ scikit-learn==1.3.2, sklearn.decomposition
    ✅ scikit-learn==1.3.2, sklearn.discriminant_analysis
    ✅ scikit-learn==1.3.2, sklearn.dummy
    ✅ scikit-learn==1.3.2, sklearn.ensemble
    ✅ scikit-learn==1.3.2, sklearn.feature_extraction.text
    ✅ scikit-learn==1.3.2, sklearn.feature_selection
    ✅ scikit-learn==1.3.2, sklearn.impute
    ✅ scikit-learn==1.3.2, sklearn.kernel_ridge
    ✅ scikit-learn==1.3.2, sklearn.linear_model
    ✅ scikit-learn==1.3.2, sklearn.manifold
    ✅ scikit-learn==1.3.2, sklearn.metrics
    ✅ scikit-learn==1.3.2, sklearn.metrics.pairwise
    ✅ scikit-learn==1.3.2, sklearn.mixture
    ✅ scikit-learn==1.3.2, sklearn.model_selection
    ✅ scikit-learn==1.3.2, sklearn.multiclass
    ✅ scikit-learn==1.3.2, sklearn.naive_bayes
    ✅ scikit-learn==1.3.2, sklearn.neighbors
    ✅ scikit-learn==1.3.2, sklearn.neural_network
    ✅ scikit-learn==1.3.2, sklearn.pipeline
    ✅ scikit-learn==1.3.2, sklearn.preprocessing
    ✅ scikit-learn==1.3.2, sklearn.random_projection
    ✅ scikit-learn==1.3.2, sklearn.svm
    ✅ scikit-learn==1.3.2, sklearn.tree
    ✅ scikit-learn==1.3.2, sklearn.utils
    ✅ scipy==1.10.1, scipy.interpolate
    ✅ scipy==1.10.1, scipy.ndimage
    ✅ scipy==1.10.1, scipy.ndimage.interpolate
    ✅ scipy==1.10.1, scipy.optimize
    ✅ scipy==1.10.1, scipy.signal
    ✅ scipy==1.10.1, scipy.signal.windows
    ✅ scipy==1.10.1, scipy.sparse
    ✅ scipy==1.10.1, scipy.spatial
    ✅ scipy==1.10.1, scipy.spatial.distance
    ✅ scipy==1.10.1, scipy.spatial.distance._hausdorff
    ✅ scipy==1.10.1, scipy.special
    ✅ scipy==1.10.1, scipy.stats
    ✅ statsmodels==0.14.1, statsmodels.api
    ✅ tensorflow==2.13.1, tensorflow
    ✅ tensorflow==2.13.1, tensorflow.keras.models
    ✅ tensorflow==2.13.1, tensorflow.keras.optimizers
    ✅ textblob==0.17.1, textblob.TextBlob
    ✅ torch==2.4.1, torch
    ✅ torch==2.4.1, torch.nn
    ✅ torch==2.4.1, torch.nn.functional
    ✅ torch==2.4.1, torch.utils.data
    ✅ transformers==4.38.2, huggingface
    ✅ transformers==4.38.2, transformers
    ✅ typing==3.7.4.3, typing
    ✅ wordcloud==1.9.3, wordcloud.WordCloud
    🟨 matplotlib==3.7.5, matplotlib.Axes
    🟨 seaborn==0.13.0, seaborn
    🟨 torch==2.4.1, torch.optim
    🟨 polars==0.14.29, polars.LazyFrame
    🟨 matplotlib==3.7.5, matplotlib.colors.BoundaryNorm
    🟨 matplotlib==3.7.5, matplotlib.lines.Line2D
    🟨 matplotlib==3.7.5, matplotlib.patches.Ellipse
    🟨 matplotlib==3.7.5, matplotlib.patches.Arrow
    🟨 matplotlib==3.7.5, matplotlib.image.AxesImage
    🟨 matplotlib==3.7.5, matplotlib.image.FigureImage
    🟨 matplotlib==3.7.5, matplotlib.offsetbox.AnchoredOffsetbox
    🟨 astropy==5.2.2, astropy.visualization.mpl_normalize.ImageNormalize
    🟨 astropy==5.2.2, astropy.wcs.Celprm
    🟨 matplotlib==3.7.5, mpl_toolkits.mplot3d.art3d.Line3DCollection
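To see whether your environment matches the versions listed above, you can compare installed distribution versions using the standard library's `importlib.metadata`. This is a sketch: the `TESTED` mapping copies only a few entries from the list by hand, `compare_versions` is a hypothetical helper, and a version mismatch does not by itself mean a class is unsupported, only that it differs from the version listed.

```python
from importlib.metadata import PackageNotFoundError, version

# A few (distribution, listed version) pairs copied by hand from the list above.
TESTED = {
    "numpy": "1.24.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.3.2",
    "torch": "2.4.1",
}


def compare_versions(tested: dict) -> dict:
    """Map each distribution to (listed_version, installed_version or None)."""
    report = {}
    for dist, listed in tested.items():
        try:
            installed = version(dist)
        except PackageNotFoundError:
            installed = None  # not installed in this environment
        report[dist] = (listed, installed)
    return report


for dist, (listed, installed) in compare_versions(TESTED).items():
    if installed is None:
        status = "not installed"
    elif installed == listed:
        status = "matches the listed version"
    else:
        status = "differs from the listed version"
    print(f"{dist}: listed {listed}, installed {installed} ({status})")
```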

Limitations

Kishu may fail to correctly checkpoint notebook sessions containing the following items:

Silent Pickling Errors

Kishu assumes that any object, when pickled and then unpickled, is identical to the original object, and it does not automatically detect cases where this assumption is violated (i.e., silent pickling errors). Such errors are typically caused by a bug in the class's __reduce__ method, which acts as its pickling instructions. For example, the reduction below rebuilds the object without its state, so unpickling silently yields Counter(0) instead of the original value.

  class Counter:
      def __init__(self, n=0): self.n = n
      def __reduce__(self):
          return (Counter, ())  # bug: drops self.n

As a potential workaround, you can add classes with incorrect reductions to a blocklist in Kishu's config so that Kishu never tries to store (and always recomputes) objects of these classes.
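Because Kishu does not detect these cases automatically, one way to vet a suspect class yourself, for example before adding it to the blocklist, is a pickle round-trip that compares instance attributes. This is a minimal sketch: `Model`, `Healthy`, and `roundtrip_preserves_state` are hypothetical, `Model` loses its data through `__getstate__` (another pickling hook), and comparing `vars()` only works for objects that have a `__dict__`.

```python
import pickle


class Model:
    """Hypothetical class whose pickling hook silently drops its data."""

    def __init__(self) -> None:
        self.weights = [0.1, 0.2]

    def __getstate__(self):
        # Bug: returns no state, so unpickled copies have no .weights.
        return {}


class Healthy:
    """Same data, default pickling behavior."""

    def __init__(self) -> None:
        self.weights = [0.1, 0.2]


def roundtrip_preserves_state(obj) -> bool:
    """Pickle then unpickle obj; compare type and instance attributes."""
    clone = pickle.loads(pickle.dumps(obj))
    return type(clone) is type(obj) and vars(clone) == vars(obj)


print(roundtrip_preserves_state(Model()))    # False
print(roundtrip_preserves_state(Healthy()))  # True
```

Note that no exception is raised for `Model`: the round-trip succeeds and returns an object with missing attributes, which is exactly the silent failure described above.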

Non-Deterministic and Unpicklable Objects

Kishu relies on cell replay to reconstruct unpicklable objects (e.g., generators). However, if an unpicklable object is itself created through non-deterministic means, Kishu will fail to recreate it exactly on undo/checkout. For example (assuming the seed for random was not set):

  import random
  nondet_gen = (i for i in range(random.randint(5, 10)))
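Both halves of this limitation can be reproduced with the standard library alone: a generator cannot be pickled, so it has to be rebuilt by re-running its defining code, and that replay only reproduces the same object if the random seed is fixed. A minimal sketch; `make_gen` is a hypothetical stand-in for replaying the cell.

```python
import pickle
import random


def make_gen():
    # Stands in for replaying the cell that created nondet_gen.
    return (i for i in range(random.randint(5, 10)))


# Generators cannot be stored directly, so replay is the only option.
try:
    pickle.dumps(make_gen())
except TypeError as err:
    print("unpicklable:", err)

# With a fixed seed, replaying the cell rebuilds an identical generator.
random.seed(0)
original = list(make_gen())
random.seed(0)
replayed = list(make_gen())
print(original == replayed)  # True
```

Without the two `random.seed(0)` calls, the replayed generator may yield a different number of items than the original.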

FAQ

Q1: I am getting a "Kernel for the notebook not found" error when initializing Kishu on a new notebook file. What is going on?



A1: This happens when no kernel is running for the current notebook file. Click the Kernel icon to start a new kernel for the notebook, then initialize Kishu as usual.



Learn More

Kishu's efficiency is enabled by its low-overhead session state monitoring, deduplicated variable storage, and optimized recomputation-assisted checkout. Our papers on Kishu are listed below; don't forget to star our repository and cite our papers if you like our work!

@article{li2024kishu,
  title={Kishu: Time-Traveling for Computational Notebooks},
  author={Li, Zhaoheng and Chockchowwat, Supawit and Sahu, Ribhav and Sheth, Areet and Park, Yongjoo},
  journal={Proceedings of the VLDB Endowment},
  volume={18},
  number={4},
  pages={970--985},
  year={2024},
  doi={10.14778/3717755.3717759},
  publisher={VLDB Endowment}
}

@inproceedings{fang2025enhancing,
  title={Enhancing Computational Notebooks with Code+Data Space Versioning},
  author={Fang, Hanxi and Chockchowwat, Supawit and Sundaram, Hari and Park, Yongjoo},
  booktitle={CHI Conference on Human Factors in Computing Systems (CHI '25)},
  year={2025},
  doi={10.1145/3706598.3714141}
}

@article{li2023elasticnotebook,
  title={ElasticNotebook: Enabling Live Migration for Computational Notebooks},
  author={Li, Zhaoheng and Gor, Pranav and Prabhu, Rahul and Yu, Hui and Mao, Yuzhou and Park, Yongjoo},
  journal={Proceedings of the VLDB Endowment},
  volume={17},
  number={2},
  pages={119--133},
  year={2023},
  doi={10.14778/3626292.3626296},
  publisher={VLDB Endowment}
}

@inproceedings{fang2025large,
  title={Large-scale Evaluation of Notebook Checkpointing with AI Agents},
  author={Fang, Hanxi and Chockchowwat, Supawit and Sundaram, Hari and Park, Yongjoo},
  booktitle={Late-breaking work in CHI Conference on Human Factors in Computing Systems (CHI '25)},
  year={2025}
}

@inproceedings{chockchowwat2023transactional,
  title={Transactional python for durable machine learning: Vision, challenges, and feasibility},
  author={Chockchowwat, Supawit and Li, Zhaoheng and Park, Yongjoo},
  booktitle={Proceedings of the Seventh Workshop on Data Management for End-to-End Machine Learning},
  pages={1--5},
  year={2023},
  doi={10.1145/3595360.3595855}
}

@inproceedings{li2024demonstration,
  title={Demonstration of ElasticNotebook: Migrating Live Computational Notebook States},
  author={Li, Zhaoheng and Chockchowwat, Supawit and Fang, Hanxi and Sahu, Ribhav and Thakurdesai, Sumay and Pridaphatrakun, Kantanat and Park, Yongjoo},
  booktitle={Companion of the 2024 International Conference on Management of Data},
  pages={540--543},
  year={2024},
  doi={10.1145/3626246.3654752}
}

Contributing

To get started with developing Kishu, see CONTRIBUTING.md.
