Inconsistencies, rolling back edits, and keeping track of the document's global state · Issue #333 · jupyter/jupyter · GitHub

Inconsistencies, rolling back edits, and keeping track of the document's global state #333


Open
cpitclaudel opened this issue Apr 12, 2018 · 10 comments


@cpitclaudel

Hi,

I'm looking at the kernel development docs to get a sense of how hard it would be to support proof assistants like Coq, Lean, or F*.

One feature that these languages have in common (unlike, say, Python or F#) is that they are able to roll back changes to the environment. That is, the typical way in which you interact with a Coq or F* document is that you send small chunks of the document, one by one, to the theorem prover / language server. Every time you edit a previously-processed chunk, the editor sends a message to the prover to tell it to roll back that particular section of the document, ensuring that the prover's internal state is always in sync with the contents of the document.

In other words, if I write let x = 1 and send that to F*, then change my mind and rename x to y, then the old binding for x does not persist (I only get y).

This seems at odds with Jupyter's model, in which processing a definition x = 1 in Python, then renaming x to y and resending the definition (now y = 1) causes two bindings to coexist (x and y). That model can of course lead to inconsistencies, where I rename a function after processing other definitions that depend on it and forget to update those definitions: everything will keep working until I restart Jupyter, at which point the outdated definitions will stop working.
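For instance, running the following as two successive versions of the same cell in a Python kernel shows the stale binding surviving (a minimal illustration, not Jupyter-specific):

# Cell 1, first version:
x = 1

# Now edit Cell 1, renaming x to y, and re-run it:
y = 1

# The kernel namespace still holds the old binding:
print(x, y)  # prints "1 1", although x no longer appears in the document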

This shadowing without rolling back is not a big problem in Python, but in C++, for example, it is an issue: I tried running the cling-zeus kernel advertised at https://blog.jupyter.org/interactive-workflows-for-c-with-jupyter-fe9b54227d92, first writing void test() { return; } before changing it to int test() { return 1; }. That change was rejected with the following message:

input_line_8:1:5: error: functions that differ only in their return type cannot be overloaded
int test() {
~~~ ^
input_line_7:1:6: note: previous definition is here
void test() {
~~~~ ^

So, the TL;DR, as three questions:

  • Can kernels request that code cells be sent in order?
  • Can kernels know where in the document the cell that's being executed is?
  • Does the protocol give kernels a way to get notifications when a previously-executed cell is edited?

Thanks for Jupyter, btw — it's great :)

@Carreau
Member
Carreau commented Apr 12, 2018

Hello @cpitclaudel, thanks for your question and interest.

Your analysis of what Jupyter currently does is spot on, and feedback like this is typically what we are looking for to improve the various protocols. It is true that the current protocol has some limitations – I don't think the (zmq) protocol will be updated to completely cover your use case, to keep it simple – but I think intermediate layers and APIs may allow what you are looking for.

I'll try to detail various points:

  • We try to keep the kernel as unaware as possible of the document being executed. You can have several clients connected to the same kernel, so the notion of "one document" may not make sense. Thus:

Can kernels request that code cells be sent in order?

No (at least not directly).

Can kernels know where in the document the cell that's being executed is?

No (at least not directly).

Does the protocol give kernels a way to get notifications when a previously-executed cell is edited?

And no (at least not directly).

While the above answers may be disappointing (and know that these assumptions are probably not going to change), there is a potential change that could happen in the notebook interface.

Currently the notebook interface stores the whole notebook in the frontend, and all execution logic happens in the frontend. With real-time collaboration being one of our goals, there is some desire to have a server-side model. With a server-side model, a finer-grained execution model is likely possible, with custom logic to invalidate later cells if earlier ones are modified and/or to decide to send extra information to the kernel/execution engine.

The speed and likelihood of this being integrated into Jupyter depend, of course, on the interest, funding, and contributions we get. You already saw nodebook, and I also suggest looking at https://github.com/dataflownb/dfkernel. Both have JupyterCon talks in August.

These kinds of questions are coming up more and more regularly, so there will likely be an opportunity for more formal work on this at some point.

What I can suggest is to monkey-patch the execute request in the classic notebook frontend; it is not too difficult, and it would let you send the full notebook (along with the cursor position) with each request; you can then extract the information you need in the kernel or in a kernel provider.
You can also attach arbitrary metadata to cells, so attaching a random id to each cell at cell creation (and on cut/copy/paste events) may help. Doing this by default is likely hard because of the amount of data that would transit each time, which is why partial synchronization with a server-side model would be better suited.
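For the metadata idea, here is a minimal sketch of stamping ids into a notebook file with nbformat; the cell_id key is an invented name, not a standard notebook field:

import uuid
import nbformat

# Give every cell a stable random id in its metadata ("cell_id" is a
# hypothetical key; adapt it to whatever the frontend expects).
nb = nbformat.read("notebook.ipynb", as_version=4)
for cell in nb.cells:
    cell.metadata.setdefault("cell_id", uuid.uuid4().hex)
nbformat.write(nb, "notebook.ipynb")

A frontend extension would do the equivalent at cell creation and cut/copy/paste time, so that ids survive reordering.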

I don't have enough knowledge of Coq/Lean/F* to know exactly what would be needed, but I would be happy to give extra hints and get more feedback.

Hope that helps a bit.

@cpitclaudel
Author

Hi @Carreau,

Thanks a lot for the detailed answer. It's extremely useful.

With a server-side model, a finer-grained execution model is likely possible, with custom logic to invalidate later cells if earlier ones are modified and/or to decide to send extra information to the kernel/execution engine.

This sounds very nice, indeed.

I also suggest looking at https://github.com/dataflownb/dfkernel.

That link is very useful too, thanks. The type of dependency tracking offered there sounds very similar to what I need: in F* a simple approach would be to have a dependency from each cell to the next, but Coq is smarter about dependencies and a Coq kernel would be able to inform the UI about the dependencies of each snippet. Interestingly, I think the focus there would not just be on input-output dependencies ("cell A's result depends on the output of cell B"), but also on code dependencies ("cell A defines a constant a used by cell B"). The latter type of dependency tracking likely makes more sense for fairly static languages, where such dependencies are easy to compute (the video presentation of dataflownb mentions global variable inconsistencies in the 'current ipython behavior' part, but I don't think it offers a solution for it).

What does dfkernel do if one of the dependencies contains an error? (I'll experiment with it as soon as I manage to run it, but I haven't been able to yet).

What I can suggest is to monkey-patch the execute request in the classic notebook frontend; it is not too difficult, and it would let you send the full notebook (along with the cursor position) with each request; you can then extract the information you need in the kernel or in a kernel provider.

That's an interesting approach, thanks. We've had some versions of this debate (smart server + thin client vs smart client) in the proof assistants world for a long time: the current Coq model is that users specify how far to execute, with document segments being processed asynchronously along edges of the dependency graph, but in Isabelle (and Coq with Coqoon/PIDE), Lean, and Dafny the whole document is resent to the server upon edits (and the server keeps a global view of the document and sends updates about new segments). That latter approach sounds similar to what you're describing.

One difference, though, is that Jupyter notebooks have a natural notion of code fragments, so even when sending the whole document to the server one would likely still keep the cell structure.

You can also attach arbitrary metadata to cells, so attaching a random id to each cell at cell creation (and on cut/copy/paste events) may help.

With your previous suggestion, I think I might not even need to attach metadata to the cells; it may be sufficient to send the whole chain of previous cells along with each execute request. The kernel side can diff that chain against previous chains, roll back to the most recent common ancestor, and re-execute from there.
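Concretely, a minimal sketch of that kernel-side logic; ProverSession, rollback_to, and execute are hypothetical stand-ins for the prover's actual API:

class ProverSession:
    def __init__(self):
        self.chain = []  # code of every cell processed so far, in order

    def rollback_to(self, n):
        # Undo everything after the first n cells (prover-specific).
        self.chain = self.chain[:n]

    def execute(self, code):
        # Send one fragment to the prover (prover-specific).
        self.chain.append(code)

    def sync(self, new_chain):
        # Find the longest common prefix of the old and new chains:
        common = 0
        for old, new in zip(self.chain, new_chain):
            if old != new:
                break
            common += 1
        self.rollback_to(common)         # roll back to the common ancestor
        for code in new_chain[common:]:  # re-execute only what changed
            self.execute(code)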

I might have a look into implementing this for F*. Thanks a lot for your help!

Hope that helps a bit.

More than a bit! :) Thanks again.

@westurner

IIUC, something like 'Restart and Run All [up to here]' by default would solve this.

Interestingly, I think the focus there would not just be on input-output dependencies ("cell A's result depends on the output of cell B"), but also on code dependencies ("cell A defines a constant a used by cell B").

Is source-order evaluation not a good assumption (for this run of the notebook)?

The ability to specify (and infer) dependencies between cells or code fragments / expressions / variables therein could make it easier to model (relevant) premises of an inductive/deductive argument. ("#StructuredPremises")

There are often irrelevant bits.

@cpitclaudel
Author

IIUC, something like 'Restart and Run All [up to here]' by default would solve this.

I don't think so; proof assistants like Coq and F* are typically rather slow, so reprocessing everything is very costly; that's why they offer this stateful interface.

Is source-order evaluation not a good assumption (for this run of the notebook)?

Source order works, of course, but it's not optimal; Coq in particular is able to process multiple fragments in parallel, along the dependency graph.
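To illustrate the scheduling pattern (in Python rather than Coq): independent fragments can be processed concurrently via a topological traversal of the dependency graph. Here process_fragment is a made-up placeholder for sending a fragment to the prover:

from graphlib import TopologicalSorter  # standard library, Python 3.9+
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

# Fragments and their dependencies: b and c both need only a; d needs b and c.
deps = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}

def process_fragment(name):
    print("processed", name)  # placeholder for real prover work
    return name

ts = TopologicalSorter(deps)
ts.prepare()
with ThreadPoolExecutor() as pool:
    pending = set()
    while ts.is_active():
        # b and c become ready together once a is done, so they run in parallel.
        pending |= {pool.submit(process_fragment, n) for n in ts.get_ready()}
        done, pending = wait(pending, return_when=FIRST_COMPLETED)
        for fut in done:
            ts.done(fut.result())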

@westurner
westurner commented Sep 4, 2018

I don't think so; proof assistants like Coq and F* are typically rather slow, so reprocessing everything is very costly; that's why they offer this stateful interface.

Got it.

I assume you've already read the Jupyter kernel docs?

https://jupyter-client.readthedocs.io/en/latest/kernels.html

Specifically, what additional messages are needed?

Source order works, of course, but it's not optimal; Coq in particular is able to process multiple fragments in parallel, along the dependency graph.

FWIW, Dask and TensorBoard have some decent CFG visualizations.

While there's now an indicator that a cell has been modified since its last output, it's still possible to save a notebook that won't produce the same outputs when restarted and run again.

I took a look at jscoq + hott (rhino-hott) the other day and was impressed. The IJavascript kernel would also still need additional kernel messages AFAIU.

@cpitclaudel
Author

I assume you've already read the Jupyter kernel docs?

I have :) Or more precisely, I did when I posted the original issue. I don't remember everything right now :)

Specifically, what additional messages are needed?

Roughly, what's needed is some way for the prover to understand the current state of the document.
There are many ways to achieve this: sending all cells instead of just the latest one (letting the prover compute the difference), or giving an identifier to each cell and including the id of the previous cell when sending each new cell.
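For concreteness, here's a sketch of what the second option might look like on the wire. This is a hypothetical extension, not part of the actual Jupyter message spec, and cell_id / previous_cell_id are invented field names:

# Hypothetical execute_request carrying document-position info in the
# message's metadata dict (a real but unconstrained part of the format):
message = {
    "header": {"msg_type": "execute_request"},  # other header fields elided
    "metadata": {
        "cell_id": "3f2a...",            # stable id of the cell being run
        "previous_cell_id": "9c41...",   # id of the cell just before it
    },
    "content": {"code": "let y = 1", "silent": False},
}

The prover-side kernel could then rebuild the chain from these links and roll back whenever a known id reappears with different code.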

UIs would also need to be able to display the status of a cell (sent or unsent); and, if a user tries to send a cell, the UI should make sure to send all preceding unsent cells, too.

I took a look at jscoq + hott (rhino-hott) the other day and was impressed. The IJavascript kernel would also still need additional kernel messages AFAIU.

You may be curious to try https://people.csail.mit.edu/cpitcla/fstar.js/stlc.html , too :)

@SylvainCorlay
Member

@cpitclaudel you may be interested in the Jupyter Community Workshop on building upon the Jupyter Kernel Protocol:

https://blog.jupyter.org/jupyter-community-workshop-building-upon-the-jupyter-kernel-protocol-2409409ed7ec

@cpitclaudel
Author

Ah, very exciting! I think it'll be tricky for me to be in Paris at that time, though :/

@ColmBhandal

I briefly skimmed this issue and didn't see any mention of coq_jupyter, which seems to have arisen after the last comment here (based on the commit history), so I thought I'd mention it. Maybe it's relevant.

@westurner

https://github.com/EugeneLoy/coq_jupyter :

Backtracking

There are a number of convenience improvements over standard Jupyter notebook behaviour that are implemented to support Coq-specific use cases.

By default, running a cell will roll back any code that was executed in that cell before. If needed, this can be disabled on a per-cell basis (using the Auto rollback checkbox).

Manual cell rollback is also available using the Rollback cell button (at the bottom of an executed cell) or a shortcut (Ctrl+Backspace).
