HuixiangDou is a group chat assistant built on an LLM (Large Language Model).
Advantages:
- A three-stage pipeline of preprocessing, rejection and response, designed for group chat scenarios, answers user questions without flooding the group with messages; see 2401.08772, 2405.02817 and the Precision Report.
- Low cost: requires only 1.5GB of memory and no training
- Offers a complete suite of Web, Android, and pipeline source code, which is industrial-grade and commercially viable
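A minimal sketch of how the three stages could fit together. Every name here (`preprocess`, `should_reject`, `respond`, `handle`, the `KNOWLEDGE` dictionary) is a hypothetical placeholder rather than HuixiangDou's actual API, and the toy word-overlap score merely stands in for the feature-based scoring the real project uses.

```python
from typing import Optional

# Hypothetical knowledge base: topic keywords mapped to canned answers.
KNOWLEDGE = {
    "install mmpose": "To install mmpose, follow the installation guide in its docs.",
}


def preprocess(message: str) -> Optional[str]:
    """Stage 1: normalize the message and keep only question-like text."""
    text = message.strip().lower()
    return text if text.endswith("?") else None


def relevance(query: str, topic: str) -> float:
    """Toy score: fraction of topic words found in the query."""
    words = topic.split()
    query_words = set(query.rstrip("?").split())
    return sum(w in query_words for w in words) / len(words)


def should_reject(query: str, threshold: float = 0.5) -> bool:
    """Stage 2: reject queries that match no topic well enough."""
    return max(relevance(query, t) for t in KNOWLEDGE) < threshold


def respond(query: str) -> str:
    """Stage 3: answer from the best-matching topic."""
    best = max(KNOWLEDGE, key=lambda t: relevance(query, t))
    return KNOWLEDGE[best]


def handle(message: str) -> Optional[str]:
    """Run all three stages; None means the assistant stays silent."""
    query = preprocess(message)
    if query is None or should_reject(query):
        return None
    return respond(query)
```

Returning `None` for rejected messages is what keeps the assistant from flooding the group: casual chat never reaches the response stage.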
Check out the scenes in which HuixiangDou is running, and join the WeChat group to try the AI assistant inside.
If this helps you, please give it a star ⭐
The web portal is available on OpenXLab, where you can build your own knowledge assistant without any coding and connect it to WeChat and Feishu groups.
Watch the web portal usage video on YouTube and BiliBili.
- [2024/05] wkteam WeChat access, support image, URL and reference resolution in group chat
- [2024/05] Add Coreference Resolution fine-tune: 🤗 LoRA-Qwen1.5-14B, LoRA-Qwen1.5-32B, alpaca data, arXiv
- [2024/04] Add SFT data annotation and examples
- [2024/04] Update technical report
- [2024/04] Release web server source code 👍
- [2024/03] New WeChat integration method with prebuilt Android APK!
- [2024/02] [experimental] Integrated multimodal model into our WeChat group for OCR
The hardware requirements for each edition are listed below. We suggest following this document in order, starting with the basic edition and gradually trying the advanced features.
Version | GPU Memory Requirements | Features | Tested on Linux |
---|---|---|---|
Cost-effective Edition | 1.5GB | Use an openai-style API (e.g., kimi and deepseek) to handle source code-level issues; free within quota | |
Standard Edition | 19GB | Deploy a local LLM that can answer basic questions | |
Complete Edition | 40GB | Fully utilize search + long-text to answer source code-level questions | |
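As a quick illustration of how to read the table, the sketch below maps available GPU memory to the largest edition that fits. The `pick_edition` helper is hypothetical and not part of HuixiangDou; only the three thresholds come from the table.

```python
# Thresholds copied from the table above (GPU memory in GB), largest first.
EDITIONS = [
    (40.0, "Complete Edition"),
    (19.0, "Standard Edition"),
    (1.5, "Cost-effective Edition"),
]


def pick_edition(gpu_memory_gb: float) -> str:
    """Return the largest edition whose requirement fits the given memory."""
    for required_gb, name in EDITIONS:
        if gpu_memory_gb >= required_gb:
            return name
    raise ValueError(f"{gpu_memory_gb}GB is below the 1.5GB minimum")
```

For example, a 24GB card falls between the 19GB and 40GB rows, so `pick_edition(24)` selects the Standard Edition.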
First, agree to the BCE license and log in to Hugging Face.
huggingface-cli login
Then install requirements.
# parsing `word` format requirements
apt update
apt install python-dev libxml2-dev libxslt1-dev antiword unrtf poppler-utils pstotext tesseract-ocr flac ffmpeg lame libmad0 libsox-fmt-mp3 sox libjpeg-dev swig libpulse-dev
# python requirements
pip install -r requirements.txt
The standard edition runs text2vec, rerank and a 7B model locally.
STEP1. First, run the test cases without the rejection pipeline:
# Standalone mode
# main creates a subprocess to run the LLM API, then sends requests to the subprocess
python3 -m huixiangdou.main --standalone
..
..Topics unrelated to the knowledge base.."How to install mmpose?"
..Topics unrelated to the knowledge base.."How's the weather tomorrow?"
You can see that main.py handles the example questions in the same way, whether the question is about `mmpose installation` or `How's the weather tomorrow?`
STEP2. Use mmpose and test documents to build a knowledge base and enable the rejection pipeline
Copy all the commands below (including the '#' symbol) and execute them.
# Download knowledge base documents
cd HuixiangDou
mkdir repodir
git clone https://github.com/open-mmlab/mmpose --depth=1 repodir/mmpose
git clone https://github.com/tpoisonooo/huixiangdou-testdata --depth=1 repodir/testdata
# Save the features of repodir to workdir
mkdir workdir
python3 -m huixiangdou.service.feature_store
Then rerun `main`. HuixiangDou will now be able to answer the `mmpose installation` question and reject casual chats.
python3 -m huixiangdou.main --standalone
..success.. To install mmpose, you should..
..Topics unrelated to the knowledge base.."How's the weather tomorrow?"
Please adjust the `repodir` documents, good_questions, and bad_questions to try your own domain knowledge (medical, financial, power, etc.).
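The rejection pipeline ultimately compares a relevance score against a threshold, which is why the good and bad question lists matter. The sketch below shows one plausible way to pick such a threshold from labelled examples. It assumes each question has already been scored in [0, 1]; the scores and the `choose_threshold` helper are illustrative, and `feature_store` may well compute its threshold differently.

```python
from typing import List


def choose_threshold(good_scores: List[float], bad_scores: List[float]) -> float:
    """Pick the candidate threshold that classifies the most questions correctly.

    A question is accepted when its score is >= threshold.
    """
    candidates = sorted(set(good_scores + bad_scores))
    if not candidates:
        raise ValueError("need at least one scored question")
    best_threshold, best_correct = candidates[0], -1
    for threshold in candidates:
        accepted_good = sum(s >= threshold for s in good_scores)
        rejected_bad = sum(s < threshold for s in bad_scores)
        correct = accepted_good + rejected_bad
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


# Hypothetical relevance scores for questions that should be answered / rejected.
good = [0.62, 0.55, 0.71]
bad = [0.12, 0.30, 0.41]
threshold = choose_threshold(good, bad)
```

With these made-up scores the two groups separate cleanly, so the chosen threshold is the lowest "good" score. Real data usually overlaps, which is when adding more representative questions to both lists helps most.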
STEP3. Test sending messages to Feishu group (optional)
This step only tests the algorithm pipeline; STEP4 also supports IM applications.
Click Create Feishu Custom Bot to obtain the callback WEBHOOK_URL and fill it in config.ini
# config.ini
...
[frontend]
type = "lark"
webhook_url = "${YOUR-LARK-WEBHOOK-URL}"
Run the command below. When it finishes, the technical assistant's response is sent to the Feishu group.
python3 -m huixiangdou.main --standalone
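To check a webhook independently of HuixiangDou, a plain-text message can be posted to it directly. The sketch below assumes the standard Feishu custom bot payload (`msg_type` plus `content.text`); `build_text_request` and `send_text` are hypothetical helper names, not functions from this repository.

```python
import json
import urllib.request


def build_text_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a plain-text message."""
    payload = {"msg_type": "text", "content": {"text": text}}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_text(webhook_url: str, text: str, timeout: float = 10.0) -> str:
    """Send the message and return the raw response body."""
    request = build_text_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `send_text(webhook_url, "hello")` with your own webhook URL should make the message appear in the group if the bot is set up correctly.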
- Integrate Feishu group receiving, sending, and withdrawal
- Integrate personal WeChat access
- Integrate wkteam WeChat access
STEP4. WEB service and IM applications
We provide a complete front-end UI and backend service that supports:
- Multi-tenant management
- Zero-programming access to Feishu and WeChat groups
See the effect at the OpenXLab app; for setup, please read the web deployment document.
If your machine has only 2GB of GPU memory, or if you want the most cost-effective setup, you only need to read this Zhihu document.
The cost-effective edition differs from the standard edition only in replacing the local LLM with a remote one; all other functions are the same.
Taking kimi as an example, fill the API key obtained from the official website into config-2G.ini
# config-2G.ini
[llm]
enable_local = 0
enable_remote = 1
...
remote_type = "kimi"
remote_api_key = "YOUR-API-KEY-HERE"
Execute the command to get the Q&A result
python3 -m huixiangdou.main --standalone --config-path config-2G.ini # Start all services at once
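Before launching, it can help to sanity-check the three settings that matter for remote-only operation. The sketch below assumes the fragment shown above can be read as INI with the standard `configparser`; the project itself may use a different parser, and `check_remote_only` is a hypothetical helper.

```python
import configparser

# Fragment mirroring the keys shown above; the real file contains more settings.
SAMPLE = """
[llm]
enable_local = 0
enable_remote = 1
remote_type = "kimi"
remote_api_key = "YOUR-API-KEY-HERE"
"""


def check_remote_only(config_text: str) -> list:
    """Return a list of problems that would prevent remote-only operation."""
    # Disable interpolation so '%' characters in keys are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    llm = parser["llm"]
    problems = []
    if llm.get("enable_local", "1").strip() != "0":
        problems.append("enable_local should be 0")
    if llm.get("enable_remote", "0").strip() != "1":
        problems.append("enable_remote should be 1")
    # Values keep their surrounding quotes under configparser, so strip them.
    api_key = llm.get("remote_api_key", "").strip().strip('"')
    if not api_key or api_key == "YOUR-API-KEY-HERE":
        problems.append("remote_api_key is still a placeholder")
    return problems
```

An empty list means the fragment looks ready; the unedited sample reports only the placeholder API key.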
The HuixiangDou deployed in the WeChat group is the complete version.
When 40GB of GPU memory is available, long text + retrieval capabilities can be used to improve accuracy.
Please read the following topics:
- Refer to config-advanced.ini to improve precision
- Use rag.py to annotate SFT training data
- Coreference resolution fine-tune
- Using the commercial WeChat integration, add image analysis, public account parsing, and reference resolution
- What if the robot is too cold/too chatty?
  - Fill in the questions that should be answered in the real scenario into `resource/good_questions.json`, and fill the ones that should be rejected into `resource/bad_questions.json`.
  - Adjust the theme content in `repodir` to ensure that the markdown documents in the main library do not contain irrelevant content.

  Re-run `feature_store` to update thresholds and feature libraries.

  ⚠️ You can directly modify `reject_throttle` in config.ini. Generally speaking, 0.5 is a high value; 0.2 is too low.
- Launch is normal, but out of memory during runtime?

  Long-text inference with an LLM based on the transformers architecture requires more memory. In this case, apply kv cache quantization to the model (see the lmdeploy quantization description), then use docker to deploy the Hybrid LLM Service independently.
- How to access another local LLM, or what if the effect is not ideal after access?
  - Open hybrid llm service and add a new LLM inference implementation.
  - Refer to test_intention_prompt and the test data, adjust the prompt and threshold for the new model, and update them in worker.py.
- What if the response is too slow/request always fails?
  - Refer to hybrid llm service to add exponential backoff and retransmission.
  - Replace local LLM with an inference framework such as lmdeploy, instead of the native huggingface/transformers.
- What if the GPU memory is too low?

  In this case the local LLM cannot run; only a remote LLM can be used together with text2vec to execute the pipeline. Please make sure that `config.ini` uses only the remote LLM and turns off the local LLM.

- No module named 'faiss.swigfaiss_avx2'

  Locate the installed `faiss` package:

import faiss
print(faiss.__file__)
# /root/.conda/envs/InternLM2_Huixiangdou/lib/python3.10/site-packages/faiss/__init__.py

  Add a soft link:

# cd your_python_path/site-packages/faiss
cd /root/.conda/envs/InternLM2_Huixiangdou/lib/python3.10/site-packages/faiss/
ln -s swigfaiss.py swigfaiss_avx2.py
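For the slow or failing request case in the FAQ above, exponential backoff with retransmission can be sketched generically as follows. This is not the code in hybrid llm service; `call_with_backoff` and its parameters are hypothetical, and the 10% jitter is one common choice among several.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on any exception with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 10% jitter so concurrent clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("max_attempts must be at least 1")
```

Wrapping a remote LLM call as `call_with_backoff(lambda: client_call(prompt))`, where `client_call` is whatever function issues the request, waits roughly 1s, 2s, 4s and so on between attempts, capped at `max_delay`.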
- KIMI: long context LLM
- BCEmbedding: Bilingual and Crosslingual Embedding (BCEmbedding) in English and Chinese
- Langchain-ChatChat: ChatGLM Application based on Langchain
- GrabRedEnvelope: Grab Wechat RedEnvelope
@misc{kong2024huixiangdou,
title={HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance},
author={Huanjun Kong and Songyang Zhang and Jiaying Li and Min Xiao and Jun Xu and Kai Chen},
year={2024},
eprint={2401.08772},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
@misc{kong2024huixiangdoucr,
title={HuixiangDou-CR: Coreference Resolution in Group Chats},
author={Huanjun Kong},
year={2024},
eprint={2405.02817},
archivePrefix={arXiv},
primaryClass={cs.CL}
}