Add Runtime Dataflow Viewer by saulshanabrook · Pull Request #1023 · vega/editor · GitHub

Add Runtime Dataflow Viewer #1023

Merged
merged 183 commits into from
Oct 23, 2021

saulshanabrook
Contributor
@saulshanabrook saulshanabrook commented May 27, 2021

This pull request adds a visual viewer/debugger for the runtime dataflow.

Screen Shot 2021-08-29 at 9 26 35 PM

Motivation

Vega (Lite) provides a wonderful declarative mechanism for specifying visualizations, but by default, it requires all data to be loaded in memory in the client's browser. It is often impractical or impossible to let Vega handle all of the data transforms, and instead we wish to "push down" the queries that a Vega visualization needs to some other data system. This could be another system in your browser, like Arquero, or a remote database.

Previously, I achieved this by transforming the Vega spec itself, extracting the existing transforms and replacing them with a combined transform that executes on a remote database (ibis-vega-transform). This approach was able to move some interactive visualizations to SQL, but it was tied to Python. When we wanted to explore a strictly client-side version of this that could work without a Python kernel, @domoritz from the Vega team recommended that we look into operating at the Vega runtime dataflow level, instead of the Vega spec level, in order to more accurately capture all of the data transformations.

In order to move forward with this, I wanted to have a better understanding of how this runtime dataflow operated. I also wanted to be able to debug how each node in it functioned, which is vital to being able to modify the graph and the nodes.

There are also a number of related issues others have opened on debugging Vega (Lite): vega/vega-lite#4134, vega/vega#407, vega/vega#1879

Background

There was an existing article, by @jheer, called "How Vega Works" that showed a visual representation of the dataflow and let you see how different "pulses" of the graph executed it. @chengluyu then took that code and created an interactive vega-inspector to add more information and support more node types.

These tools helped for smaller graphs, but I knew that I needed to be able to inspect graphs like the "Interactive Layered Crossfilter" example, which became very hard to read and interact with in those tools.

Features

So in this pull request, I have added a visual runtime dataflow viewer. In this video, we look at the data pipeline of the Interactive Layered Crossfilter example, seeing what changes as you interact with the diagram and which nodes are executed:

Untitled.mp4

It currently includes:

  • Zooming, with mouse wheel
  • Selecting a node or edge to filter to the related nodes (ancestors and children)
  • Selecting a pulse to filter to those nodes touched in that pulse
  • Hovering over a node, to see the parameters used to instantiate it
  • If a pulse is selected, on hover the current value of the node will be shown as well
  • Filtering by type of node. We add nodes to the graph for all operators in the graph, as well as the updates, bindings, streams, and data.
  • Background processing of the node layout in a webworker, to allow for continued interaction
  • Caching of layouts to speed up switching to an existing one
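
The "filter to the related nodes" behaviour can be illustrated with a small graph traversal. This is only a sketch, not the code in this PR: the `Edge` type and the `relatedNodes` function are hypothetical names, and it assumes that "children" means every node downstream of the selection.

```typescript
// Hypothetical edge shape: a directed edge from one node ID to another.
type Edge = { source: string; target: string };

// Build an adjacency map keyed by one end of each edge.
function adjacency(edges: Edge[], from: keyof Edge, to: keyof Edge): Map<string, string[]> {
  const map = new Map<string, string[]>();
  for (const e of edges) {
    const list = map.get(e[from]) ?? [];
    list.push(e[to]);
    map.set(e[from], list);
  }
  return map;
}

// Collect every node reachable from `start` by following `next` (breadth-first).
function reachable(start: string, next: Map<string, string[]>): Set<string> {
  const seen = new Set<string>([start]);
  const queue = [start];
  while (queue.length > 0) {
    const id = queue.shift()!;
    for (const n of next.get(id) ?? []) {
      if (!seen.has(n)) {
        seen.add(n);
        queue.push(n);
      }
    }
  }
  return seen;
}

// The selected node plus everything upstream and downstream of it.
function relatedNodes(selected: string, edges: Edge[]): Set<string> {
  const down = reachable(selected, adjacency(edges, "source", "target"));
  const up = reachable(selected, adjacency(edges, "target", "source"));
  return new Set(Array.from(up).concat(Array.from(down)));
}
```

Nodes that only share a descendant with the selection (siblings feeding the same downstream node) are left out, which is what keeps the filtered view small.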

Details

To render the graph, I use Cytoscape JS, which is a popular canvas-based graph renderer. To lay out the nodes, I use the Eclipse Layout Kernel. Originally, I used the Cytoscape ELK layout plugin, but I switched to using elkjs directly, in order to have greater control over the layout timing and caching.
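
As a rough illustration of what controlling layout caching can look like, here is a minimal sketch. It does not reproduce the PR's code: `Graph`, `Positions`, `LayoutFn`, and `cachedLayout` are hypothetical names, the `layout` argument stands in for whatever async call hands the graph to ELK, and keying the cache on `JSON.stringify` of the graph is an assumption.

```typescript
// Hypothetical, simplified shapes for a graph and its computed layout.
type Graph = { nodes: string[]; edges: [string, string][] };
type Positions = Record<string, { x: number; y: number }>;
type LayoutFn = (graph: Graph) => Promise<Positions>;

// Wrap an async layout function so that each distinct graph is laid out once.
// The promise itself is cached, so a second request made while the first is
// still running shares the same in-flight computation.
function cachedLayout(layout: LayoutFn): LayoutFn {
  const cache = new Map<string, Promise<Positions>>();
  return (graph) => {
    const key = JSON.stringify(graph); // assumption: the graph serializes deterministically
    let result = cache.get(key);
    if (result === undefined) {
      result = layout(graph);
      cache.set(key, result);
    }
    return result;
  };
}
```

In this sketch a rejected promise stays in the cache; a real implementation would probably want to evict failed entries so the layout can be retried.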

This PR depends on a corresponding PR in the Vega main repo to add typing for the runtime: vega/vega#3237, which it uses to properly type the function that turns the runtime into a graph.

The state management is a bit complex in this code, unfortunately, primarily due to the need to interface with an async layout engine (ELK) and an imperative view layer (Cytoscape). I have gone through a number of different iterations on how to synchronize all of this properly (component state, React's useReducer, Cytoscape state), and have currently settled on moving as much state to Redux as possible, since the application is already using Redux for the rest of its state.

I tried not to disturb any of the existing application code, but I did create all of the necessary state management code for this viewer in a "feature" subfolder, instead of following the existing pattern in the code base of keeping all reducers in the same file. I did this because this functionality is tightly coupled, and splitting it off into its own folder made it much easier to iterate on and add to gradually. I tried to follow Redux best practices and pulled in the Redux toolkit to help implement those.

Future work

I have found this debugger useful to get a better grasp of the Vega runtime dataflow, through particular examples, but there are a number of areas for follow-up work. Since this PR is already quite large (too large?), I hope that any additional features could be added afterward. A few I have collected are:

  • Add more nuanced time profiling for each pulse, to understand how much time is spent on each node. We could then size the nodes by time.
  • For any action that would change the selection, grey out on hover all the nodes that wouldn't be selected. This would apply to nodes themselves, to pulses, and to types.
  • Try filtering out axis and legends to reduce graph size
  • Improve styles of side panel, to make them more consistent and in line with the application
  • Only record pulses when dataflow panel is open, to reduce memory consumption and CPU usage normally
  • Move node parameter details and values to side panel from popup
  • Auto select first pulse when loading, to show those values by default
  • When selecting a pulse, also show streams that caused that pulse to run
  • Add details to the graph to show semantics of nested nodes better, by showing what the special parent signal is for and the root node.

TODO

  • Consolidate tooltip libraries, react-tooltip and tippy
  • Move all deps to ^
  • Try removing web-worker dep
  • Try animating node positions
  • Remove tooltip when clicking on node
  • Add unselect button for node selection
  • Clarify existing clear button for pulses
  • Fix clear button not causing re-layout when no pulse is selected
  • Fix scrolling on long list of pulses
  • Make element selection darker
  • Set selected nodes on relayout
  • Fix clicking on background to unselect when selected node not visible
  • Improve layout
    • move unconnected nodes out of middle
    • add more padding between nodes
    • Try adding more vertical alignment, possibly by aligning all render nodes

chengluyu and others added 30 commits September 20, 2019 10:33
Now if you hover any node in the scene graph tree, the corresponding element will be highlighted.
@domoritz
Member

You can click on a row in the table, and that selects a pulse. I should make it more clear somehow what it's doing! Let me know if you have suggestions.

Oh, I got confused since my chart doesn't have pulses so the table is empty. You should hide the table when it has no rows.