Transform norm_quote_commas & norm_numbers | Extra inputs are not permitted #83


Closed
HURIMOZ opened this issue Aug 16, 2024 · 4 comments · Fixed by #87
Labels
bug Something isn't working

Comments

HURIMOZ commented Aug 16, 2024

Hi guys, I get an error when using the transforms norm_quote_commas and norm_numbers. See below:

eole build_vocab --config wmt17_frty.yaml --n_sample -1 # --num_threads 4
Traceback (most recent call last):
  File "/home/ubuntu/TY-EN/TY-EN/bin/eole", line 33, in <module>
    sys.exit(load_entry_point('eole', 'console_scripts', 'eole')())
  File "/home/ubuntu/TY-EN/eole/eole/bin/main.py", line 39, in main
    bin_cls.run(args)
  File "/home/ubuntu/TY-EN/eole/eole/bin/run/build_vocab.py", line 272, in run
    config = cls.build_config(args)
  File "/home/ubuntu/TY-EN/eole/eole/bin/run/__init__.py", line 42, in build_config
    config = cls.config_class(**config_dict)
  File "/home/ubuntu/TY-EN/TY-EN/lib/python3.10/site-packages/pydantic/main.py", line 193, in __init__
    self.__pydantic_validator__.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 2 validation errors for BuildVocabConfig
data.corpus_1.norm_quote_commas
  Extra inputs are not permitted [type=extra_forbidden, input_value=True, input_type=bool]
    For further information visit https://errors.pydantic.dev/2.8/v/extra_forbidden
data.corpus_1.norm_numbers
  Extra inputs are not permitted [type=extra_forbidden, input_value=True, input_type=bool]
    For further information visit https://errors.pydantic.dev/2.8/v/extra_forbidden
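
(For context: the two error paths in the traceback, data.corpus_1.norm_quote_commas and data.corpus_1.norm_numbers, point at keys placed directly under a corpus entry. A minimal corpus block along these lines would trigger exactly these two validation errors; only the two flagged keys are taken from the traceback, the file paths and the transforms line are illustrative assumptions:)

data:
    corpus_1:
        path_src: data/train.src
        path_tgt: data/train.tgt
        transforms: [normalize]
        norm_quote_commas: true
        norm_numbers: true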
francoishernandez (Member) commented

I would suspect there is some issue in your config syntax. Can you share your config? (Or at least the part regarding transforms.)
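
(For context on why a config syntax issue surfaces this way: pydantic v2 models declared with extra="forbid" reject any key not defined on the model, with exactly this extra_forbidden error. A minimal sketch of the mechanism; the class and fields here are illustrative, not eole's actual config models:)

from pydantic import BaseModel, ConfigDict, ValidationError

class CorpusConfig(BaseModel):
    # Illustrative stand-in; eole's real corpus config defines more fields.
    model_config = ConfigDict(extra="forbid")  # unknown keys become errors
    path_src: str
    path_tgt: str

try:
    # A key the model does not declare triggers extra_forbidden.
    CorpusConfig(path_src="a.txt", path_tgt="b.txt", norm_numbers=True)
except ValidationError as err:
    print(err)  # "norm_numbers: Extra inputs are not permitted ..."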

HURIMOZ (Author) commented Aug 21, 2024

Hi François, this is my config:

## IO
save_data: wmt17_en_ty
overwrite: true
seed: 1234
report_every: 100
valid_metrics: ["BLEU"]

### Vocab
src_vocab: data/vocab.shared
tgt_vocab: data/vocab.shared
src_vocab_size: 32000
tgt_vocab_size: 28000
vocab_size_multiple: 8
src_words_min_frequency: 1
tgt_words_min_frequency: 1
share_vocab: true
n_sample: 0

data:
    corpus_1:
        path_src: processed_data/train.src.bpe.shuf
        path_tgt: processed_data/train.trg.bpe.shuf
        #transforms: [normalize, filtertoolong]
    valid:
        path_src: processed_data/dev.src.bpe
        path_tgt: processed_data/dev.trg.bpe

training:
    # Model configuration
    model_path: models
    keep_checkpoint: 50
    save_checkpoint_steps: 1000
    average_decay: 0
    train_steps: 100000
    valid_steps: 10000

    # bucket_size: 
    bucket_size: 2048
    num_workers: 4
    prefetch_factor: 4
    world_size: 1
    gpu_ranks: [0]
    batch_type: "tokens"
    batch_size: 2048
    valid_batch_size: 1024
    batch_size_multiple: 8
    accum_count: [10]
    accum_steps: [0]
    dropout_steps: [0]
    dropout: [0.2]
    attention_dropout: [0.2]
    #compute_dtype: 16
    optim: "adam"
    learning_rate: 2
    warmup_steps: 4000
    decay_method: "noam"
    adam_beta2: 0.998
    max_grad_norm: 0
    label_smoothing: 0.1
    param_init: 0
    param_init_glorot: true
    normalization: "tokens"

model:
    architecture: "transformer"
    hidden_size: 256
    share_decoder_embeddings: true
    share_embeddings: true
    layers: 6
    heads: 8
    transformer_ff: 256

embeddings_type: "word2vec"
src_embeddings: data/cc.en.256.txt
word_vec_size: 256
position_encoding_type: "SinusoidalInterleaved"

francoishernandez (Member) commented

The config you provided is apparently not the one that triggered the error, but investigating further led me to the issue.
It should be fixed in #87. Can you try this branch to make sure it fixes all your issues before we merge?
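
(One way to test the unmerged branch from #87 locally, assuming the eole repository is cloned from GitHub with the default origin remote and installed from source; the local branch name here is arbitrary:)

git fetch origin pull/87/head:fix-87
git checkout fix-87
pip install -e .  # re-install in editable mode if eole was not already installed with -e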

francoishernandez added the bug label Aug 21, 2024
HURIMOZ (Author) commented Aug 23, 2024

Hi François, yes the error is gone now. Thanks for that.
