Support using baidu vdb as doc engine by letterbeezps · Pull Request #7380 · infiniflow/ragflow · GitHub

Support using baidu vdb as doc engine #7380


Open · letterbeezps wants to merge 4 commits into base: main

@letterbeezps commented Apr 28, 2025

What problem does this PR solve?

Support using Baidu VectorDB as the doc_engine.

Type of change

  • New Feature (non-breaking change which adds functionality)

@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. 💞 feature Feature request, pull request that fullfill a new feature. labels Apr 28, 2025
@yingfeng yingfeng added the ci Continue Integration label Apr 28, 2025
@yingfeng (Member)

CI failed due to the static check.

@yingfeng (Member)

Run astral-sh/ruff-action@v2
  
Found ruffDir in tool-cache for 0.11.7
Added /home/infiniflow/runners_work/inf128-de3e5ec46472/_tool/ruff/0.11.7/x86_64 to the path
Successfully installed ruff version 0.11.7
/home/infiniflow/runners_work/inf128-de3e5ec46472/_tool/ruff/0.11.7/x86_64/ruff check /home/infiniflow/runners_work/inf128-de3e5ec46472/ragflow/ragflow
rag/utils/baidu_vdb_conn.py:20:8: F401 [*] `time` imported but unused
   |
18 | import re
19 | import json
20 | import time
   |        ^^^^ F401
21 | import os
   |
   = help: Remove unused import: `time`
rag/utils/baidu_vdb_conn.py:49:26: F401 [*] `rag.settings.TAG_FLD` imported but unused
   |
48 | from rag import settings
49 | from rag.settings import TAG_FLD, PAGERANK_FLD
   |                          ^^^^^^^ F401
50 | from rag.utils import singleton
51 | import pandas as pd
   |
   = help: Remove unused import
rag/utils/baidu_vdb_conn.py:49:35: F401 [*] `rag.settings.PAGERANK_FLD` imported but unused
   |
48 | from rag import settings
49 | from rag.settings import TAG_FLD, PAGERANK_FLD
   |                                   ^^^^^^^^^^^^ F401
50 | from rag.utils import singleton
51 | import pandas as pd
   |
   = help: Remove unused import
rag/utils/baidu_vdb_conn.py:55:21: F401 [*] `rag.nlp.is_english` imported but unused
   |
53 | from rag.utils.doc_store_conn import DocStoreConnection, MatchExpr, OrderByExpr, MatchTextExpr, MatchDenseExpr, \
54 |     FusionExpr
55 | from rag.nlp import is_english, rag_tokenizer
   |                     ^^^^^^^^^^ F401
56 |
57 | ATTEMPT_TIME = 2
   |
   = help: Remove unused import
rag/utils/baidu_vdb_conn.py:55:33: F401 [*] `rag.nlp.rag_tokenizer` imported but unused
   |
53 | from rag.utils.doc_store_conn import DocStoreConnection, MatchExpr, OrderByExpr, MatchTextExpr, MatchDenseExpr, \
54 |     FusionExpr
55 | from rag.nlp import is_english, rag_tokenizer
   |                                 ^^^^^^^^^^^^^ F401
56 |
57 | ATTEMPT_TIME = 2
   |
   = help: Remove unused import
rag/utils/baidu_vdb_conn.py:218:31: F841 [*] Local variable `e` is assigned to but never used
    |
216 |             table_name_list = [table.table_name for table in table_list]
217 |             return table_name in table_name_list
218 |         except ClientError as e:
    |                               ^ F841
219 |             return False
220 |         except Exception as e:
    |
    = help: Remove assignment to unused variable `e`
rag/utils/baidu_vdb_conn.py:308:21: E731 Do not assign a `lambda` expression, use a `def`
    |
306 |                     sort_fields.append((field, min, True if desc else False))
307 |                 if field == "create_timestamp_flt":
308 |                     f = lambda x : -x if desc else lambda x : x
    |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ E731
309 |                     sort_fields.append((field, f, True if desc else False))
310 |             def build_sort_key(entry):
    |
    = help: Rewrite `f` as a `def`
rag/utils/baidu_vdb_conn.py:404:24: E713 [*] Test for membership should be `not in`
    |
402 |                 assert isinstance(field, str)
403 |                 assert isinstance(field_attribute, dict)
404 |                 if not field in d:
    |                        ^^^^^^^^^^ E713
405 |                     d[field] = field_attribute["default"]
    |
    = help: Convert to `not in`
Found 8 errors.
[*] 7 fixable with the `--fix` option (1 hidden fix can be enabled with the `--unsafe-fixes` option).
Error: The process '/home/infiniflow/runners_work/inf128-de3e5ec46472/_tool/ruff/0.11.7/x86_64/ruff' failed with exit code 1
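
Most of these findings are mechanical: the F401, F841, and E713 fixes are the seven that `ruff check --fix` reports as auto-fixable. The E731 line deserves a closer look, because `lambda x : -x if desc else lambda x : x` parses as `lambda x: (-x if desc else (lambda x: x))`, so in the ascending case calling `f` returns a function rather than `x`. A minimal sketch of one possible rewrite follows; the helper names `make_sort_transform` and `fill_defaults` are hypothetical and not taken from the PR.

```python
def make_sort_transform(desc: bool):
    """Return a sort-key transform: negation for descending, identity otherwise."""

    def negate(x):
        return -x

    def identity(x):
        return x

    # Choose between two named functions instead of assigning a lambda (E731).
    return negate if desc else identity


def fill_defaults(d: dict, field_defaults: dict) -> dict:
    """Fill fields missing from d using each attribute dict's "default" entry."""
    for field, field_attribute in field_defaults.items():
        # `field not in d` is the membership form that E713 asks for.
        if field not in d:
            d[field] = field_attribute["default"]
    return d
```

With this shape, `make_sort_transform(False)(3)` gives `3` and `make_sort_transform(True)(3)` gives `-3`, which appears to be what the original line intended.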

@yingfeng yingfeng changed the title support use baidu vdb as doc engine Support use baidu vdb as doc engine May 13, 2025
@yingfeng yingfeng changed the title Support use baidu vdb as doc engine Support using baidu vdb as doc engine May 13, 2025
@KevinHuSh (Collaborator) commented May 15, 2025

Much appreciated!
It seems that this does not support field weight assignments, so it does not meet the relevance calculation requirements.
Could you figure out how to improve it?
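
For context, field weight assignment generally means that a match in one field (for example a title) counts more toward relevance than a match in another (for example the body text). A minimal sketch of the idea, applied client-side to per-field scores, is shown here; the field names, the weights, and the `weighted_score` and `rank` helpers are all hypothetical and do not reflect RAGFlow's or Baidu VectorDB's actual API.

```python
def weighted_score(field_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-field match scores, scaling each by its field weight (default 1.0)."""
    return sum(weights.get(field, 1.0) * score for field, score in field_scores.items())


def rank(hits: list[dict], weights: dict[str, float]) -> list[dict]:
    """Order hits by weighted score, highest first."""
    return sorted(hits, key=lambda h: weighted_score(h["field_scores"], weights), reverse=True)


# Hypothetical weights: a title match counts five times as much as a content match.
WEIGHTS = {"title": 5.0, "content": 1.0}
```

Without such weights every field contributes equally, which is the gap the comment above points at.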

@letterbeezps (Author)

@KevinHuSh Field weight assignment is not supported by Baidu_Vector_Database right now; we will add this feature in the near future.
