Insights: InternLM/lmdeploy
Overview
1 Release published by 1 person
- v0.6.4: LMDeploy Release v0.6.4 (published Dec 9, 2024)
8 Pull requests merged by 5 people
- Fix VLM batch inference error (#2879, merged Dec 10, 2024)
- [Feature] Support for loading LoRA adapter weights in safetensors format (#2860, merged Dec 9, 2024; see the sketch after this list)
- Bump version to v0.6.4 (#2864, merged Dec 9, 2024)
- Fix vision model batch inference (#2868, merged Dec 9, 2024)
- Update dlinfer-ascend version in runtime_ascend.txt (#2865, merged Dec 9, 2024)
- Update supported models (#2849, merged Dec 6, 2024)
- [ascend] feat: support KV int8 (#2736, merged Dec 6, 2024)
- [ascend] Convert KV cache to ND format in Ascend graph mode (#2853, merged Dec 4, 2024)
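PR #2860 lets the PyTorch engine load LoRA adapter weights saved as safetensors (see also #2851 and #2852 below). A minimal sketch of how such an adapter is wired in, assuming the `adapters` option of `PytorchEngineConfig`; the adapter name `mylora` and both paths are placeholders:

```python
# Sketch: serve a base model with a LoRA adapter whose weights are stored
# as safetensors (the capability added by #2860). `adapters` maps adapter
# names to adapter directories for the PyTorch engine.
from lmdeploy import pipeline, GenerationConfig, PytorchEngineConfig

backend_config = PytorchEngineConfig(
    adapters={'mylora': '/path/to/lora'},  # dir containing adapter_model.safetensors
)
# Selecting the adapter per request via GenerationConfig is assumed here
# from lmdeploy's PyTorch-engine LoRA examples.
gen_config = GenerationConfig(adapter_name='mylora')

pipe = pipeline('/path/to/base-model', backend_config=backend_config)
print(pipe(['Hello'], gen_config=gen_config))
```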
7 Pull requests opened by 6 people
- Support Medusa speculative decoding (#2859, opened Dec 5, 2024)
- Fix llama3.1 chat template (#2862, opened Dec 6, 2024)
- Refactor PyTorchEngine env check (#2870, opened Dec 9, 2024)
- Support tp > n_kv_heads for PyTorch models (#2872, opened Dec 9, 2024; see the sketch after this list)
- Replicate KV heads for some models when tp is divisible by kv_head_num (#2874, opened Dec 9, 2024)
- Refine multi-backend setup.py (#2880, opened Dec 10, 2024)
- Fix CPU cache (#2881, opened Dec 11, 2024)
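#2872 and #2874 both deal with tensor parallelism when the TP world size exceeds the number of KV heads: each KV head must then be replicated so that every rank owns one. A toy illustration of that mapping; the helper `kv_head_for_rank` is hypothetical, not lmdeploy code:

```python
# Hypothetical helper illustrating KV-head replication under tensor
# parallelism (the situation addressed by #2872/#2874): when the TP world
# size tp exceeds num_kv_heads, groups of tp // num_kv_heads consecutive
# ranks share one replicated KV head.
def kv_head_for_rank(rank: int, tp: int, num_kv_heads: int) -> int:
    assert tp % num_kv_heads == 0 or num_kv_heads % tp == 0
    if tp <= num_kv_heads:
        # Normal sharding: each rank holds num_kv_heads // tp heads;
        # return the first head of this rank's shard.
        return rank * (num_kv_heads // tp)
    # Replication: tp // num_kv_heads ranks share one head.
    return rank // (tp // num_kv_heads)

# Example: 8-way TP with 2 KV heads -> ranks 0-3 hold head 0, ranks 4-7 head 1.
print([kv_head_for_rank(r, tp=8, num_kv_heads=2) for r in range(8)])
```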
9 Issues closed by 4 people
- [Bug] Problem when AWQ-quantizing InternVL 2.5 78B (#2873, closed Dec 10, 2024; see the sketch after this list)
- [Docs] Model inference uses more GPU memory than with transformers (#2875, closed Dec 10, 2024)
- [Bug] Llama 3.3 70B support (#2867, closed Dec 9, 2024)
- [Feature] Desire to use the latest version of lmdeploy with PyTorch 2.1.0 (#2843, closed Dec 9, 2024)
- [Bug] glm-4v-9b multi-GPU error (#2855, closed Dec 9, 2024)
- [Bug] glm-4v-9b inference error on Ascend cards (#2819, closed Dec 5, 2024)
- [Bug] Error when serving qwen2-vl (#2858, closed Dec 5, 2024)
- [Bug] Event loop error during serve (#2101, closed Dec 4, 2024)
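For context on #2873: lmdeploy's AWQ workflow quantizes with the `lmdeploy lite auto_awq` CLI and then loads the result with `model_format='awq'`. A minimal sketch with placeholder paths:

```python
# Quantize first with the CLI (run in a shell):
#   lmdeploy lite auto_awq /path/to/InternVL2_5-78B --work-dir /path/to/awq-model
# Then load the quantized weights; model_format='awq' tells the TurboMind
# backend to expect 4-bit AWQ weights.
from lmdeploy import pipeline, TurbomindEngineConfig

pipe = pipeline(
    '/path/to/awq-model',
    backend_config=TurbomindEngineConfig(model_format='awq'),
)
print(pipe(['Describe AWQ in one sentence.']))
```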
9 Issues opened by 9 people
- [Bug] llama3.2-11b batch inference returns garbled output for text-only items (#2878, opened Dec 10, 2024)
- [Bug] Error raised with transformers==4.47.0 when running AWQ quantization or chatting with AWQ models (#2877, opened Dec 10, 2024)
- [Bug] lmdeploy serve InternVL2_5 reports errors (#2876, opened Dec 10, 2024)
- Questions about lmdeploy model details (#2871, opened Dec 9, 2024)
- [Bug] lmdeploy[432]: OSError: image file is truncated (#2869, opened Dec 9, 2024; see the sketch after this list)
- Qwen2-VL-72B-Instruct-AWQ abnormal inference results (#2863, opened Dec 6, 2024)
- [Feature] How to add a new model, as in vLLM (#2861, opened Dec 6, 2024)
- LLaVA INT4 quantization (#2857, opened Dec 5, 2024)
- Poor performance of Molmo pointing function (#2856, opened Dec 4, 2024)
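The `OSError: image file is truncated` in #2869 is raised by Pillow when an image's data stream ends early. A common client-side workaround (plain Pillow behavior, nothing lmdeploy-specific) is to opt in to decoding truncated files; the path is a placeholder:

```python
# Pillow raises "OSError: image file is truncated" on incomplete image data.
# Opting in to LOAD_TRUNCATED_IMAGES makes Pillow pad the missing tail
# instead of raising, at the cost of a partially gray image.
from PIL import Image, ImageFile

ImageFile.LOAD_TRUNCATED_IMAGES = True  # decode what is there, pad the rest

with Image.open('/path/to/image.jpg') as im:
    im.load()  # would raise OSError on a truncated file without the flag
    print(im.size)
```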
9 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- Refactor VLM modules (#2810, commented on Dec 11, 2024; 17 new comments)
- [Bug] docker+lmdeploy deploying a multimodal large model reports: "AssertionError: failed to match chat template, please explicit set chat_template_config" (#2805, commented on Dec 4, 2024; 0 new comments; see the sketch after this list)
- [Bug] Deployment of Llama3.1-70b getting stuck (#2724, commented on Dec 4, 2024; 0 new comments)
- [Bug] 0.6.2 vs 0.4.2 with the qwen1.5b model: 0.6.2 inference is about 3x slower (#2752, commented on Dec 4, 2024; 0 new comments)
- [Feature] Load safetensors of a LoRA adapter in the PyTorch engine (#2851, commented on Dec 5, 2024; 0 new comments)
- [Bug] Cannot install torch-npu==2.3.1, torch==2.3.1, and torchvision==0.18.1 because these package versions have conflicting dependencies (#2745, commented on Dec 5, 2024; 0 new comments)
- How to deploy a model with its LoRA adapter (#2852, commented on Dec 9, 2024; 0 new comments)
- [Docs] Questions about vLLM performance testing (#2838, commented on Dec 9, 2024; 0 new comments)
- Refactor turbomind attention by precomputing rotary embed (#2801, commented on Dec 10, 2024; 0 new comments)
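The assertion in #2805 means lmdeploy could not infer a chat template from the model directory. A minimal sketch of setting one explicitly via `ChatTemplateConfig`; the model path and template name are placeholders (valid names come from `lmdeploy list`):

```python
# When lmdeploy cannot match a chat template to the model directory, it
# raises "failed to match chat template ...". Passing ChatTemplateConfig
# with an explicit built-in template name resolves it.
from lmdeploy import pipeline, ChatTemplateConfig

pipe = pipeline(
    '/path/to/multimodal-model',
    chat_template_config=ChatTemplateConfig(model_name='internvl2-internlm2'),
)
print(pipe(['Hello']))
```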