LINX

This repository contains the source code and experiments used to evaluate LINX, a framework for auto-generating goal-oriented exploration notebooks using a natural language interface. The repository is free to use for academic purposes. Please contact the repository owners before usage.

The Goal-oriented ADE problem: Generate an exploration session given a user's dataset and analysis goal

One of the most effective ways to ease the exploration of a dataset is to examine pre-existing data exploration notebooks, which contain curated exploration sessions that demonstrate interesting hypotheses and conjectures on the data. While numerous Automated Data Exploration (ADE) systems have been devised in previous work, they focus on assisting users in exploring a new dataset by showing generally interesting patterns. Such patterns are ineffective for goal-oriented exploration, where users need to answer specific questions about the data.

LINX is a generative system augmented with a natural language interface for goal-oriented ADE. It takes as input a dataset and an analysis goal in natural language, then generates a personalized exploration session that is relevant to the user's goal.

LINX has two main components: (1) an LLM-based solution that interprets the analysis goal and derives a set of specifications for the output exploration session, and (2) a modular ADE engine based on Constrained Deep Reinforcement Learning (CDRL), which considers the specifications in its optimization process, thus yielding high-utility sessions that are relevant to the analysis goal.

Source Code

The source code is located here (LINX/src)
Under this directory, there are two folders:

LLM-Based Solution for Deriving Specifications: contains all the source code of the LLM component that derives exploration specifications from a user's goal and dataset.
CDRL ADE Engine: contains all the source code of the CDRL engine that generates the personalized exploration notebooks.

For the installation guide, running instructions, and further details, please refer to the documentation under the source code directory linked above.

Documentation

LINX Technical Report - Including a comprehensive appendix with complete technical details.
LDX Language Guide - Full details and various examples for our specification language for data exploration.

Experiment Datasets

The datasets used in our empirical evaluation are located here.
LINX is tested on 3 different datasets:

Netflix Movies and TV-Shows: A list of Netflix titles. Each title is described using 11 features, such as the country of production, duration/num. of seasons, etc.
Flight-delays: Each record describes a domestic US flight, using 12 attributes such as origin/destination airport, flight duration, issuing airline, departure delay times, delay reasons, etc.
Google Play Store Apps: A collection of mobile apps available on the Google Play Store. Each app is described using 11 features, such as name, category, price, num. of installs, reviews, etc.

Goal-oriented ADE Benchmark

We provide a new benchmark dataset for the task of goal-oriented ADE, available here.
The benchmark contains 182 instances of analytical goals and corresponding exploration specifications. The folder also includes a dedicated notebook for evaluating models on this benchmark, located here, as well as the evaluation results of our models.

Additional Experiments

This folder contains information about:

User Study - The exploration notebooks generated by LINX and the baselines are located here.
In the given link you can find the exploratory sessions that were presented to each participant of the user study. The directory structure is <Dataset>/<Task>/<Baseline>.ipynb (the identity of the baseline was not disclosed to the participants). For the ChatGPT-based notebooks, we also provide the prompt and raw output.
Convergence Test - The convergence and running times of our CDRL engine are located here.
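
To browse the user-study notebooks programmatically, the <Dataset>/<Task>/<Baseline>.ipynb layout described above can be walked with a few lines of Python. This is a minimal sketch, not part of the repository: the helper name list_study_notebooks, the StudyNotebook record, and the root path in the usage note are hypothetical, and the only thing assumed about the folder is the three-level layout stated above.

```python
from pathlib import Path
from typing import NamedTuple


class StudyNotebook(NamedTuple):
    """One user-study notebook (hypothetical record type, for illustration only)."""
    dataset: str
    task: str
    baseline: str
    path: Path


def list_study_notebooks(root: Path) -> list[StudyNotebook]:
    """Collect notebooks laid out as <Dataset>/<Task>/<Baseline>.ipynb under root."""
    notebooks = []
    # Exactly two directory levels (dataset, task) above each notebook file.
    for path in sorted(root.glob("*/*/*.ipynb")):
        dataset, task = path.relative_to(root).parts[:2]
        # The file name without its extension identifies the baseline.
        notebooks.append(StudyNotebook(dataset, task, path.stem, path))
    return notebooks
```

For example, `list_study_notebooks(Path("path/to/user_study"))` (a placeholder path; substitute wherever the user-study folder is checked out) returns one record per notebook, which can then be grouped by dataset or task to compare the sessions produced by the different baselines.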
