how to run spm_train in multilingual_preprocess.sh · Issue #3 · cordercorder/nmt-multi · GitHub

how to run spm_train in multilingual_preprocess.sh #3


Open
altctrl00 opened this issue Oct 13, 2022 · 10 comments

Comments

@altctrl00

In scripts/ted/data_process/multilingual_preprocess.sh, you use spm_train to train the SentencePiece model. Is that from fairseq? How can I run it?

@cordercorder
Owner

spm_train is a command-line tool from the sentencepiece toolkit. Please install sentencepiece first; the shell script you mentioned will then run without problems.
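For reference, a minimal spm_train invocation looks like the sketch below. The corpus path, model prefix, vocabulary size, and model type are illustrative placeholders, not values from the repository; check multilingual_preprocess.sh for the settings it actually uses.

```shell
# Train a SentencePiece model on a plain-text corpus (one sentence per line).
spm_train \
    --input=train.all.txt \
    --model_prefix=spm_model \
    --vocab_size=32000 \
    --model_type=unigram

# This writes spm_model.model and spm_model.vocab; spm_encode can then
# use the .model file to segment text.
```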

@altctrl00
Author

Thanks for the quick response. I found I can't install sentencepiece as a command-line tool because I am not root. Sorry, I am a beginner; is it possible to install it as a non-root user?

@cordercorder
Owner

Yes. You can run pip install sentencepiece or conda install sentencepiece to install sentencepiece. After that, the command-line tools provided by sentencepiece can be used directly. There is no need to build and install sentencepiece from source, which may require root privileges to install the build tools.
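A non-root install can be sketched as follows, assuming pip or conda is already available (the conda-forge channel here is an assumption, not something the repository prescribes):

```shell
# pip installs into the user site; conda installs into the active
# environment. Neither needs root privileges.
pip install --user sentencepiece
# or, in a conda environment:
# conda install -c conda-forge sentencepiece

# Check whether the spm_train command ended up on PATH:
command -v spm_train || echo "spm_train not on PATH; the wheel may ship only the Python bindings"
```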

@altctrl00
Author

Thanks a lot. conda install sentencepiece should be the solution; pip install may not be compatible with a conda environment.

@cordercorder
Owner

There may be some discrepancy between the sentencepiece packages from pip and conda. Since sentencepiece in my Python environment was installed through conda install sentencepiece and the command-line tools work well, I assumed pip install sentencepiece would also work 😥.

@altctrl00
Author
altctrl00 commented Oct 14, 2022

When I run bash scripts/ted/data_process/multilingual_preprocess.sh from the nmt-multi directory, Python can't find the nmt module in python -u ${project_dir}/nmt/data_handling/corpus_manager.py. My project directory is /home/.../nmt-multi.
I wondered whether adding __init__.py would help, but it did not.
I edited corpus_manager.py, changing nmt.data_handling to data_utils, and then it worked.

@altctrl00
Author

In data_handling/data_utils there is an import, from nmt.tools import Converter, but I couldn't find nmt.tools.

@cordercorder
Owner

Thanks for reporting these issues.

> When I run bash scripts/ted/data_process/multilingual_preprocess.sh from the nmt-multi directory, Python can't find the nmt module in python -u ${project_dir}/nmt/data_handling/corpus_manager.py. My project directory is /home/.../nmt-multi. I wondered whether adding __init__.py would help, but it did not. I edited corpus_manager.py, changing nmt.data_handling to data_utils, and then it worked.

You can prepend the path of the nmt-multi directory to the PYTHONPATH environment variable so that the Python interpreter can find the nmt package; python -u ${project_dir}/nmt/data_handling/corpus_manager.py will then work. Below is an example:

export PYTHONPATH=/path/to/nmt-multi:${PYTHONPATH}

> In data_handling/data_utils there is an import, from nmt.tools import Converter, but I couldn't find nmt.tools.

Sorry, this is a mistake made while cleaning up the source code. Please delete that import line.
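As a sketch of why the PYTHONPATH export helps: directories listed in PYTHONPATH are prepended to Python's module search path (sys.path). The snippet below mimics the repository layout in a temporary directory (the directory names are illustrative, not the real project path) and shows the import succeeding once the root is on the path:

```python
import os
import sys
import tempfile

# Mimic the repository layout in a throwaway directory:
#   <root>/nmt/__init__.py
#   <root>/nmt/data_handling/__init__.py
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "nmt", "data_handling")
os.makedirs(pkg_dir)
for d in (os.path.join(root, "nmt"), pkg_dir):
    with open(os.path.join(d, "__init__.py"), "w"):
        pass

# Directories listed in PYTHONPATH are prepended to sys.path for every
# Python process started from that shell; sys.path.insert(0, ...) has
# the same effect for the current process.
sys.path.insert(0, root)

import nmt.data_handling  # resolvable now that <root> is on the search path

print(nmt.data_handling.__name__)  # → nmt.data_handling
```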

@cordercorder
Owner

Hi, I pushed a new commit to this repository; the changed files can be found here. Does the script run well now?

@altctrl00
Author

Thanks, it runs well now.
