how to run spm_train in multilingual_preprocess.sh · Issue #3 · cordercorder/nmt-multi · GitHub

how to run spm_train in multilingual_preprocess.sh #3


Open
altctrl00 opened this issue Oct 13, 2022 · 10 comments

Comments

@altctrl00

In scripts/ted/data_process/multilingual_preprocess.sh, you use spm_train to train the SentencePiece model. Is that from fairseq? How can I run it?

@cordercorder
Owner

spm_train is a command-line tool from the sentencepiece toolkit. Please install sentencepiece first; the shell script you mentioned will then run without problems.
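For reference, a minimal spm_train invocation looks like the sketch below. The corpus path, model prefix, vocabulary size, and model type are illustrative placeholders, not values from the repository; check multilingual_preprocess.sh for the settings it actually uses.

```shell
# Train a SentencePiece model on a plain-text corpus (one sentence per line).
spm_train \
    --input=train.all.txt \
    --model_prefix=spm_model \
    --vocab_size=32000 \
    --model_type=unigram

# This writes spm_model.model and spm_model.vocab; spm_encode can then
# use the .model file to segment text.
```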

@altctrl00
Author

Thanks for the quick response. I found I can't install sentencepiece as a command-line tool because I am not root. Sorry, I am a beginner; is it possible to install it as a non-root user?

@cordercorder
Owner

Yes. You can run pip install sentencepiece or conda install sentencepiece to install sentencepiece. After that, the command-line tools provided by sentencepiece can be used directly. There is no need to build and install sentencepiece from source, which may require root privileges to install the build tools.
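A non-root install can be sketched as follows, assuming pip or conda is already available (the conda-forge channel here is an assumption, not something the repository prescribes):

```shell
# pip installs into the user site; conda installs into the active
# environment. Neither needs root privileges.
pip install --user sentencepiece
# or, in a conda environment:
# conda install -c conda-forge sentencepiece

# Check whether the spm_train command ended up on PATH:
command -v spm_train || echo "spm_train not on PATH; the wheel may ship only the Python bindings"
```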

@altctrl00
Author

Thanks a lot. conda install sentencepiece should be the solution; pip install may not be compatible with a conda environment.

@cordercorder
Owner

There may be some discrepancy between the sentencepiece packages from pip and conda. Since sentencepiece in my Python environment was installed through conda install sentencepiece and the command-line tools work well, I assumed pip install sentencepiece would also work 😥.

@altctrl00
Author
altctrl00 commented Oct 14, 2022

When I run bash scripts/ted/data_process/multilingual_preprocess.sh from the nmt-multi directory, Python can't find the nmt module in python -u ${project_dir}/nmt/data_handling/corpus_manager.py. My project directory is /home/.../nmt-multi.
I wondered whether adding __init__.py would help, but it did not.
I edited corpus_manager.py, changing nmt.data_handling to data_utils, and then it worked.

@altctrl00
Author

In data_handling/data_utils there is an import, from nmt.tools import Converter, but I couldn't find nmt.tools.

@cordercorder
Owner

Thanks for reporting these issues.

> When I run bash scripts/ted/data_process/multilingual_preprocess.sh from the nmt-multi directory, Python can't find the nmt module in python -u ${project_dir}/nmt/data_handling/corpus_manager.py. My project directory is /home/.../nmt-multi. I wondered whether adding __init__.py would help, but it did not. I edited corpus_manager.py, changing nmt.data_handling to data_utils, and then it worked.

You can prepend the path of the nmt-multi directory to the PYTHONPATH environment variable so that the Python interpreter can find the nmt package; python -u ${project_dir}/nmt/data_handling/corpus_manager.py will then work. Below is an example:

export PYTHONPATH=/path/to/nmt-multi:${PYTHONPATH}

> In data_handling/data_utils there is an import, from nmt.tools import Converter, but I couldn't find nmt.tools.

Sorry, this is a mistake made while cleaning up the source code. Please delete that import line.
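As a sketch of why the PYTHONPATH export helps: directories listed in PYTHONPATH are prepended to Python's module search path (sys.path). The snippet below mimics the repository layout in a temporary directory (the directory names are illustrative, not the real project path) and shows the import succeeding once the root is on the path:

```python
import os
import sys
import tempfile

# Mimic the repository layout in a throwaway directory:
#   <root>/nmt/__init__.py
#   <root>/nmt/data_handling/__init__.py
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "nmt", "data_handling")
os.makedirs(pkg_dir)
for d in (os.path.join(root, "nmt"), pkg_dir):
    with open(os.path.join(d, "__init__.py"), "w"):
        pass

# Directories listed in PYTHONPATH are prepended to sys.path for every
# Python process started from that shell; sys.path.insert(0, ...) has
# the same effect for the current process.
sys.path.insert(0, root)

import nmt.data_handling  # resolvable now that <root> is on the search path

print(nmt.data_handling.__name__)  # → nmt.data_handling
```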

@cordercorder
Owner

Hi, I pushed a new commit to this repository; the changed files can be found here. Does the script run well now?

@altctrl00
Author

Thanks, it runs well now.
