Error building index with bm25s · Issue #178 · RUC-NLPIR/FlashRAG · GitHub

Error building index with bm25s #178

Open
thunderbolt-fire opened this issue May 10, 2025 · 1 comment
@thunderbolt-fire
Finding newlines for mmindex: 0.00B [00:00, ?B/s]
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/root/siton-data-0553377b2d664236bad5b5d0ba8aa419/workspace/FlashRAG/flashrag/retriever/index_builder.py", line 415, in <module>
    main()
  File "/root/siton-data-0553377b2d664236bad5b5d0ba8aa419/workspace/FlashRAG/flashrag/retriever/index_builder.py", line 411, in main
    index_builder.build_index()
  File "/root/siton-data-0553377b2d664236bad5b5d0ba8aa419/workspace/FlashRAG/flashrag/retriever/index_builder.py", line 127, in build_index
    self.build_bm25_index_bm25s()
  File "/root/siton-data-0553377b2d664236bad5b5d0ba8aa419/workspace/FlashRAG/flashrag/retriever/index_builder.py", line 213, in build_bm25_index_bm25s
    tokenizer.save_vocab(self.save_dir)
    ^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Tokenizer' object has no attribute 'save_vocab'. Did you mean: 'reset_vocab'?

Script used:

CUDA_VISIBLE_DEVICES=0 python -m flashrag.retriever.index_builder \
    --retrieval_method bm25 \
    --corpus_path /root/siton-data-0553377b2d664236bad5b5d0ba8aa419/workspace/FlashRAG/FlashRAG_Dataset/mmqa/train.parquet \
    --save_dir indexes/mmqa \
    --max_length 512 \
    --batch_size 512 \
    --bm25_backend bm25s
@ignorejjj
Member

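The bm25s library's own version management is somewhat problematic. First try downgrading to 0.2.0 or lower. If it still does not run correctly, switch the BM25 backend to pyserini.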
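
Before changing versions, it may help to confirm what is actually installed. The following is a minimal diagnostic sketch, not part of FlashRAG: the helper name `bm25s_diagnostics` is hypothetical, and it assumes the `Tokenizer` in the traceback is the class exposed by `bm25s.tokenization`. It only reports the installed bm25s version and whether that class has a `save_vocab` attribute.

```python
from importlib import metadata


def bm25s_diagnostics():
    """Return (installed_version, has_save_vocab) for the local bm25s install.

    Both values are None when bm25s is not installed at all.
    (Hypothetical helper for illustration; not part of FlashRAG or bm25s.)
    """
    try:
        version = metadata.version("bm25s")
    except metadata.PackageNotFoundError:
        return None, None

    # Assumption: the Tokenizer from the traceback lives in bm25s.tokenization.
    try:
        import bm25s.tokenization as tokenization
    except ImportError:
        # Package metadata exists but the module cannot be imported.
        return version, False

    tokenizer_cls = getattr(tokenization, "Tokenizer", None)
    return version, tokenizer_cls is not None and hasattr(tokenizer_cls, "save_vocab")


if __name__ == "__main__":
    version, has_save_vocab = bm25s_diagnostics()
    if version is None:
        print("bm25s is not installed")
    else:
        print(f"bm25s version: {version}")
        print(f"Tokenizer has save_vocab: {has_save_vocab}")
```

If the second line prints `False`, the installed version does not match what `index_builder.py` calls. The suggested downgrade would then presumably be something like `pip install "bm25s<=0.2.0"`, and the fallback would be re-running the command above with `--bm25_backend pyserini` (assuming the flag accepts that value, as the reply implies).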
