Error Running make_datafiles.sh · Issue #13 · allenai/scitldr · GitHub
Error Running make_datafiles.sh #13
Open
@Yatin97hrc

Description

Hello authors, awesome work! I am trying to replicate your results for my project, but I am stuck at the step below. Could you please take a look?

Running
cd SciTLDR-Data
export TASK=SciTLDR-A
chmod +x make_datafiles.sh
./make_datafiles.sh # BPE preprocess
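
One thing worth checking with these commands: if each one is run as a separate `!` line in a notebook (the `/content/drive` paths in the log below suggest a Colab-style setup), every line may get its own shell, so the `export` would not reach `make_datafiles.sh`. A minimal sketch, assuming that is what happens here, passes `TASK` explicitly through the child process environment. The helper names `build_env` and `run_with_task` are hypothetical and not part of the repository.

```python
import os
import subprocess
import sys


def build_env(task, base=None):
    """Return a copy of the environment with TASK set (hypothetical helper)."""
    if not task:
        raise ValueError("TASK must be non-empty, e.g. 'SciTLDR-A'")
    env = dict(os.environ if base is None else base)
    env["TASK"] = task
    return env


def run_with_task(command, task, cwd=None):
    """Run `command` so that the child process sees TASK."""
    return subprocess.run(
        command, env=build_env(task), cwd=cwd,
        capture_output=True, text=True, check=False,
    )


# Harmless stand-in for ./make_datafiles.sh: the child just echoes TASK.
child = [sys.executable, "-c", "import os; print(os.environ['TASK'])"]
result = run_with_task(child, "SciTLDR-A")
print(result.stdout.strip())
```

For the real run, the command would be `["./make_datafiles.sh"]` with `cwd` pointing at the `SciTLDR-Data` directory.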

Error
`usage: to_stories.py [-h] [--mapping_dir MAPPING_DIR] [--out_dir OUT_DIR]
[--num_cores NUM_CORES]
data_dir
to_stories.py: error: the following arguments are required: data_dir
usage: make_datafiles.py [-h] [--stories_dir STORIES_DIR] [--urldir URLDIR]
[--finished_files_dir FINISHED_FILES_DIR]
make_datafiles.py: error: argument --finished_files_dir: expected one argument
--2021-03-28 10:36:22-- https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/encoder.json
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 104.22.75.142, 104.22.74.142, 172.67.9.4, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|104.22.75.142|:443... connected.
HTTP request sent, awaiting response... 304 Not Modified
File ‘encoder.json’ not modified on server. Omitting download.

--2021-03-28 10:36:23-- https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/vocab.bpe
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 104.22.75.142, 104.22.74.142, 172.67.9.4, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|104.22.75.142|:443... connected.
HTTP request sent, awaiting response... 304 Not Modified
File ‘vocab.bpe’ not modified on server. Omitting download.

--2021-03-28 10:36:23-- https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/dict.txt
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 104.22.75.142, 104.22.74.142, 172.67.9.4, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|104.22.75.142|:443... connected.
HTTP request sent, awaiting response... 304 Not Modified
File ‘dict.txt’ not modified on server. Omitting download.

Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in <module>
main()
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main
for input in args.inputs
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in <listcomp>
for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/train.source'
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in <module>
main()
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main
for input in args.inputs
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in <listcomp>
for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/train.target'
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in <module>
main()
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main
for input in args.inputs
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in <listcomp>
for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/val.source'
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in <module>
main()
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main
for input in args.inputs
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in <listcomp>
for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/val.target'
usage: fairseq-preprocess [-h] [--no-progress-bar]
[--log-interval LOG_INTERVAL]
[--log-format {json,none,simple,tqdm}]
[--tensorboard-logdir TENSORBOARD_LOGDIR]
[--wandb-project WANDB_PROJECT] [--azureml-logging]
[--seed SEED] [--cpu] [--tpu] [--bf16]
[--memory-efficient-bf16] [--fp16]
[--memory-efficient-fp16] [--fp16-no-flatten-grads]
[--fp16-init-scale FP16_INIT_SCALE]
[--fp16-scale-window FP16_SCALE_WINDOW]
[--fp16-scale-tolerance FP16_SCALE_TOLERANCE]
[--min-loss-scale MIN_LOSS_SCALE]
[--threshold-loss-scale THRESHOLD_LOSS_SCALE]
[--user-dir USER_DIR]
[--empty-cache-freq EMPTY_CACHE_FREQ]
[--all-gather-list-size ALL_GATHER_LIST_SIZE]
[--model-parallel-size MODEL_PARALLEL_SIZE]
[--quantization-config-path QUANTIZATION_CONFIG_PATH]
[--profile] [--reset-logging] [--suppress-crashes]
[--use-plasma-view] [--plasma-path PLASMA_PATH]
[--criterion {sentence_ranking,wav2vec,model,label_smoothed_cross_entropy,latency_augmented_label_smoothed_cross_entropy,legacy_masked_lm_loss,nat_loss,ctc,label_smoothed_cross_entropy_with_alignment,cross_entropy,sentence_prediction,composite_loss,masked_lm,adaptive_loss,vocab_parallel_cross_entropy}]
[--tokenizer {space,moses,nltk}]
[--bpe {bytes,gpt2,byte_bpe,sentencepiece,bert,characters,hf_byte_bpe,fastbpe,subword_nmt}]
[--simul-type {hard_aligned,infinite_lookback,waitk,waitk_fixed_pre_decision,hard_aligned_fixed_pre_decision,infinite_lookback_fixed_pre_decision}]
[--optimizer {nag,adafactor,adam,composite,adagrad,adamax,sgd,cpu_adam,lamb,adadelta}]
[--lr-scheduler {cosine,pass_through,polynomial_decay,reduce_lr_on_plateau,inverse_sqrt,tri_stage,triangular,manual,fixed}]
[--scoring {sacrebleu,bleu,wer,chrf}] [--task TASK]
[-s SRC] [-t TARGET] [--trainpref FP]
[--validpref FP] [--testpref FP] [--align-suffix FP]
[--destdir DIR] [--thresholdtgt N]
[--thresholdsrc N] [--tgtdict FP] [--srcdict FP]
[--nwordstgt N] [--nwordssrc N] [--alignfile ALIGN]
[--dataset-impl FORMAT] [--joined-dictionary]
[--only-source] [--padding-factor N] [--workers N]
fairseq-preprocess: error: argument --destdir: expected one argument
usage: build_ctrl_datasets.py [-h] [--outdir OUTDIR] datadir
build_ctrl_datasets.py: error: the following arguments are required: datadir
Times to run script: 2.3285547892252606e-06 min
Making bin file for URLs listed in /ctrl/mapping/mapping_test.txt...
Traceback (most recent call last):
File "make_datafiles.py", line 117, in <module>
write_to_bin(all_test_urls, args.stories_dir, os.path.join(args.finished_files_dir, "test"))
File "make_datafiles.py", line 76, in write_to_bin
url_list = read_text_file(url_file)
File "make_datafiles.py", line 17, in read_text_file
with open(text_file, "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/ctrl/mapping/mapping_test.txt'
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in <module>
main()
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main
for input in args.inputs
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in <listcomp>
for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/ctrl/train.source'
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in <module>
main()
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main
for input in args.inputs
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in <listcomp>
for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/ctrl/train.target'
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in <module>
main()
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main
for input in args.inputs
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in <listcomp>
for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/ctrl/val.source'
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in <module>
main()
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main
for input in args.inputs
File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in <listcomp>
for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/ctrl/val.target'
2021-03-28 10:36:44 | INFO | fairseq_cli.preprocess | Namespace(align_suffix=None, alignfile=None, all_gather_list_size=16384, azureml_logging=False, bf16=False, bpe=None, cpu=False, criterion='cross_entropy', dataset_impl='mmap', destdir='/ctrl-bin/', empty_cache_freq=0, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, joined_dictionary=False, log_format=None, log_interval=100, lr_scheduler='fixed', memory_efficient_bf16=False, memory_efficient_fp16=False, min_loss_scale=0.0001, model_parallel_size=1, no_progress_bar=False, nwordssrc=-1, nwordstgt=-1, only_source=False, optimizer=None, padding_factor=8, plasma_path='/tmp/plasma', profile=False, quantization_config_path=None, reset_logging=False, scoring='bleu', seed=1, simul_type=None, source_lang='source', srcdict='dict.txt', suppress_crashes=False, target_lang='target', task='translation', tensorboard_logdir=None, testpref=None, tgtdict='dict.txt', threshold_loss_scale=None, thresholdsrc=0, thresholdtgt=0, tokenizer=None, tpu=False, trainpref='/ctrl/train.bpe', use_plasma_view=False, user_dir=None, validpref='/ctrl/val.bpe', wandb_project=None, workers=60)
2021-03-28 10:36:45 | INFO | fairseq_cli.preprocess | [source] Dictionary: 50264 types
Traceback (most recent call last):
File "/usr/local/bin/fairseq-preprocess", line 33, in <module>
sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-preprocess')())
File "/content/fairseq/fairseq_cli/preprocess.py", line 394, in cli_main
main(args)
File "/content/fairseq/fairseq_cli/preprocess.py", line 284, in main
make_all(args.source_lang, src_dict)
File "/content/fairseq/fairseq_cli/preprocess.py", line 252, in make_all
make_dataset(vocab, args.trainpref, "train", lang, num_workers=args.workers)
File "/content/fairseq/fairseq_cli/preprocess.py", line 248, in make_dataset
make_binary_dataset(vocab, input_prefix, output_prefix, lang, num_workers)
File "/content/fairseq/fairseq_cli/preprocess.py", line 133, in make_binary_dataset
offsets = Binarizer.find_offsets(input_file, num_workers)
File "/content/fairseq/fairseq/binarizer.py", line 106, in find_offsets
with open(PathManager.get_local_path(filename), "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/ctrl/train.bpe.source'`
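
Every missing path in the log above begins at `/` exactly where a dataset directory would be expected (`/train.source`, `/ctrl/mapping/mapping_test.txt`, `destdir='/ctrl-bin/'`), which looks consistent with `TASK` being empty when the script ran. The sketch below illustrates that reading. The path templates are assumptions made for illustration and are not copied from `make_datafiles.sh`.

```python
from string import Template

# Illustrative templates only; the real script may build its paths differently.
TEMPLATES = [
    "${TASK}/train.source",
    "${TASK}/ctrl/mapping/mapping_test.txt",
    "${TASK}/ctrl-bin/",
]


def expand(task):
    """Substitute TASK into each template, as a shell would expand $TASK."""
    return [Template(t).substitute(TASK=task) for t in TEMPLATES]


# With an empty TASK the results match the paths reported in the log.
print(expand(""))
# With TASK set, the paths point inside the dataset directory.
print(expand("SciTLDR-A"))
```

If this is the cause, running `echo $TASK` in the same shell, immediately before `./make_datafiles.sh`, should print an empty line.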
