Neural editor

Source code accompanying our paper, "Generating Sentences by Editing Prototypes" (paper, slides).

Authors: Kelvin Guu*, Tatsunori B. Hashimoto*, Yonatan Oren, Percy Liang (* equal contribution)

NOTES:

  • The instructions below are still a work in progress.
    • If you encounter any problems, please open a GitHub issue, or submit a pull request if you know the fix!
  • This code requires data directories that we have not uploaded yet.
  • This is research code meant to serve as a reference implementation. We do not recommend heavily extending or modifying this codebase for other purposes.

If you have questions, please email Kelvin at guu.kelvin at gmail.com.

Related resources

Datasets

Each line of each TSV file is a (prototype, revision) edit pair, separated by a tab.
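
For concreteness, here is a minimal sketch of loading such a file in Python. The filename train.tsv is a placeholder, not a file we ship under that name; use the actual files from the downloaded splits:

    # Sketch: load (prototype, revision) edit pairs from one TSV file.
    # "train.tsv" is a placeholder name; substitute a real split file.
    def load_edit_pairs(path):
        pairs = []
        with open(path) as f:
            for line in f:
                prototype, revision = line.rstrip("\n").split("\t")
                pairs.append((prototype, revision))
        return pairs

    pairs = load_edit_pairs("train.tsv")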

Setup

  1. Install Docker. If you want to use GPUs, also install nvidia-docker.

  2. Download the repository and necessary data.

    DATA_DIR=$HOME/neural-editor-data
    REPO_DIR=$HOME/neural-editor
    
    # Download repository
    git clone https://github.com/kelvinguu/neural-editor.git $REPO_DIR
    
    # Set up data directory
    mkdir -p $DATA_DIR
    cd $DATA_DIR
    
    # Download word vectors
    wget http://nlp.stanford.edu/data/glove.6B.zip  # GloVe vectors
    unzip glove.6B.zip -d word_vectors
    
    # Download expanded set of word vectors
    cd word_vectors
    wget https://worksheets.codalab.org/rest/bundles/0xa57f59ab786a4df2b86344378c17613b/contents/blob/ -O glove.6B.300d_yelp.txt
    # TODO: do the same for glove.6B.300d_onebil.txt
    cd ..
    
    # Download datasets into data directory
    wget https://worksheets.codalab.org/rest/bundles/0x99d0557925b34dae851372841f206b8a/contents/blob/ -O yelp_dataset_large_split.tar.gz
    mkdir yelp_dataset_large_split
    tar xvf yelp_dataset_large_split.tar.gz -C yelp_dataset_large_split
    # TODO: do the same for one_billion_split
    
    # TODO: install NLTK
    
    # our code uses this variable to locate the data
    export TEXTMORPH_DATA=$DATA_DIR
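
As a sanity check after running the commands above, you can confirm that TEXTMORPH_DATA points at a directory with the expected layout. This is only a sketch; it assumes the subfolder names created by the commands above:

    # Sketch: verify the data layout created in Setup.
    # Assumes TEXTMORPH_DATA was exported as shown above.
    import os

    data_dir = os.environ["TEXTMORPH_DATA"]
    for sub in ["word_vectors", "yelp_dataset_large_split"]:
        path = os.path.join(data_dir, sub)
        print(path, "exists" if os.path.isdir(path) else "MISSING")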

Quick Start

Before you begin, be sure to set the TEXTMORPH_DATA environment variable (see "Setup" above).

Start a Docker container:

$ python run_docker.py --root --gpu $CUDA_VISIBLE_DEVICES  # enter Docker
  • run_docker.py pulls the latest version of our Docker image (kelvinguu/textmorph:1.2) and then starts an interactive Docker container.
    • --root: flag to run as root inside the container (optional)
    • --gpu $CUDA_VISIBLE_DEVICES: if you do not have a GPU, you can skip this argument.
  • Inside the container, $DATA_DIR is mounted at /data and $REPO_DIR is mounted at /code (a rough equivalent of this setup is sketched after this list).
  • The current working directory will be /code.
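
The container setup described above corresponds roughly to the following docker run invocation, expressed here as a Python sketch. This is not the actual implementation of run_docker.py, and the repository path is an assumption carried over from the Setup step; use run_docker.py in practice:

    # Sketch: approximately what run_docker.py sets up, per the notes above.
    # Not the real script; the exact flags it passes may differ.
    import os
    import subprocess

    data_dir = os.environ["TEXTMORPH_DATA"]
    repo_dir = os.path.expanduser("~/neural-editor")  # assumes the clone path from Setup

    subprocess.call([
        "docker", "run", "-it",
        "-v", "{}:/data".format(data_dir),  # $DATA_DIR mounted at /data
        "-v", "{}:/code".format(repo_dir),  # $REPO_DIR mounted at /code
        "-w", "/code",                      # working directory inside the container
        "kelvinguu/textmorph:1.2",
        "bash",
    ])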

Once you are inside the container, start a training run:

$ python textmorph/edit_model/main.py configs/edit_model/edit_baseline.txt
  • textmorph/edit_model/main.py is the main script for training an edit model.
  • It takes a config file as input: configs/edit_model/edit_baseline.txt
  • The script will dump checkpoints into $DATA_DIR/edit_runs
    • inside edit_runs, each training run is assigned its own folder.
    • The folders are numbered 0, 1, 2, ... (a sketch for locating the latest run's folder follows this list).
  • main.py will complain if the Git working tree is dirty (because it logs the current commit as a record of the code's current state)
    • to override this, pass --check_commit disable
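
Since runs are assigned numbered folders, a short sketch like the following locates the most recent run's checkpoint directory (assuming the layout described above):

    # Sketch: find the newest run folder under $TEXTMORPH_DATA/edit_runs.
    # Assumes runs are numbered 0, 1, 2, ... as described above.
    import os

    runs_dir = os.path.join(os.environ["TEXTMORPH_DATA"], "edit_runs")
    run_ids = sorted(int(d) for d in os.listdir(runs_dir) if d.isdigit())
    latest = os.path.join(runs_dir, str(run_ids[-1]))
    print("latest run:", latest)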
