Transform norm_quote_commas & norm_numbers | Extra inputs are not permitted #83


Closed
HURIMOZ opened this issue Aug 16, 2024 · 4 comments · Fixed by #87
Labels
bug Something isn't working

Comments

HURIMOZ commented Aug 16, 2024

Hi guys, I get an error when using the transforms norm_quote_commas and norm_numbers. See below:

eole build_vocab --config wmt17_frty.yaml --n_sample -1 # --num_threads 4
Traceback (most recent call last):
  File "/home/ubuntu/TY-EN/TY-EN/bin/eole", line 33, in <module>
    sys.exit(load_entry_point('eole', 'console_scripts', 'eole')())
  File "/home/ubuntu/TY-EN/eole/eole/bin/main.py", line 39, in main
    bin_cls.run(args)
  File "/home/ubuntu/TY-EN/eole/eole/bin/run/build_vocab.py", line 272, in run
    config = cls.build_config(args)
  File "/home/ubuntu/TY-EN/eole/eole/bin/run/__init__.py", line 42, in build_config
    config = cls.config_class(**config_dict)
  File "/home/ubuntu/TY-EN/TY-EN/lib/python3.10/site-packages/pydantic/main.py", line 193, in __init__
    self.__pydantic_validator__.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 2 validation errors for BuildVocabConfig
data.corpus_1.norm_quote_commas
  Extra inputs are not permitted [type=extra_forbidden, input_value=True, input_type=bool]
    For further information visit https://errors.pydantic.dev/2.8/v/extra_forbidden
data.corpus_1.norm_numbers
  Extra inputs are not permitted [type=extra_forbidden, input_value=True, input_type=bool]
    For further information visit https://errors.pydantic.dev/2.8/v/extra_forbidden
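
(For context: the two error paths in the traceback, data.corpus_1.norm_quote_commas and data.corpus_1.norm_numbers, point at keys placed directly under a corpus entry. A minimal corpus block along these lines would trigger exactly these two validation errors; only the two flagged keys are taken from the traceback, the file paths and the transforms line are illustrative assumptions:)

data:
    corpus_1:
        path_src: data/train.src
        path_tgt: data/train.tgt
        transforms: [normalize]
        norm_quote_commas: true
        norm_numbers: true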
francoishernandez (Member) commented

I would suspect there is some issue in your config syntax. Can you share your config? (Or at least the part regarding transforms.)
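
(For context on why a config syntax issue surfaces this way: pydantic v2 models declared with extra="forbid" reject any key not defined on the model, with exactly this extra_forbidden error. A minimal sketch of the mechanism; the class and fields here are illustrative, not eole's actual config models:)

from pydantic import BaseModel, ConfigDict, ValidationError

class CorpusConfig(BaseModel):
    # Illustrative stand-in; eole's real corpus config defines more fields.
    model_config = ConfigDict(extra="forbid")  # unknown keys become errors
    path_src: str
    path_tgt: str

try:
    # A key the model does not declare triggers extra_forbidden.
    CorpusConfig(path_src="a.txt", path_tgt="b.txt", norm_numbers=True)
except ValidationError as err:
    print(err)  # "norm_numbers: Extra inputs are not permitted ..."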

HURIMOZ (Author) commented Aug 21, 2024

Hi François, this is my config:

## IO
save_data: wmt17_en_ty
overwrite: true
seed: 1234
report_every: 100
valid_metrics: ["BLEU"]

### Vocab
src_vocab: data/vocab.shared
tgt_vocab: data/vocab.shared
src_vocab_size: 32000
tgt_vocab_size: 28000
vocab_size_multiple: 8
src_words_min_frequency: 1
tgt_words_min_frequency: 1
share_vocab: true
n_sample: 0

data:
    corpus_1:
        path_src: processed_data/train.src.bpe.shuf
        path_tgt: processed_data/train.trg.bpe.shuf
        #transforms: [normalize, filtertoolong]
    valid:
        path_src: processed_data/dev.src.bpe
        path_tgt: processed_data/dev.trg.bpe

training:
    # Model configuration
    model_path: models
    keep_checkpoint: 50
    save_checkpoint_steps: 1000
    average_decay: 0
    train_steps: 100000
    valid_steps: 10000

    # bucket_size: 
    bucket_size: 2048
    num_workers: 4
    prefetch_factor: 4
    world_size: 1
    gpu_ranks: [0]
    batch_type: "tokens"
    batch_size: 2048
    valid_batch_size: 1024
    batch_size_multiple: 8
    accum_count: [10]
    accum_steps: [0]
    dropout_steps: [0]
    dropout: [0.2]
    attention_dropout: [0.2]
    #compute_dtype: 16
    optim: "adam"
    learning_rate: 2
    warmup_steps: 4000
    decay_method: "noam"
    adam_beta2: 0.998
    max_grad_norm: 0
    label_smoothing: 0.1
    param_init: 0
    param_init_glorot: true
    normalization: "tokens"

model:
    architecture: "transformer"
    hidden_size: 256
    share_decoder_embeddings: true
    share_embeddings: true
    layers: 6
    heads: 8
    transformer_ff: 256

embeddings_type: "word2vec"
src_embeddings: data/cc.en.256.txt
word_vec_size: 256
position_encoding_type: "SinusoidalInterleaved"

francoishernandez (Member) commented

The config you provided is apparently not the one that triggered the error, but investigating further led me to the issue.
It should be fixed in #87. Can you try this branch to make sure it fixes all your issues before we merge?
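
(One way to test the unmerged branch from #87 locally, assuming the eole repository is cloned from GitHub with the default origin remote and installed from source; the local branch name here is arbitrary:)

git fetch origin pull/87/head:fix-87
git checkout fix-87
pip install -e .  # re-install in editable mode if eole was not already installed with -e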

francoishernandez added the bug label Aug 21, 2024
HURIMOZ (Author) commented Aug 23, 2024

Hi François, yes the error is gone now. Thanks for that.
