Insights: InternLM/lmdeploy
Overview
1 Release published by 1 person
- v0.6.4: LMDeploy Release v0.6.4 (published Dec 9, 2024)
8 Pull requests merged by 5 people
- Fix VLM batch inference error (#2879, merged Dec 10, 2024)
- [Feature] Support for loading LoRA adapter weights in safetensors format (#2860, merged Dec 9, 2024; see the sketch after this list)
- Bump version to v0.6.4 (#2864, merged Dec 9, 2024)
- Fix vision model batch inference (#2868, merged Dec 9, 2024)
- Update dlinfer-ascend version in runtime_ascend.txt (#2865, merged Dec 9, 2024)
- Update supported models (#2849, merged Dec 6, 2024)
- [ascend] feat: support KV int8 (#2736, merged Dec 6, 2024)
- [ascend] Convert KV cache to ND format in Ascend graph mode (#2853, merged Dec 4, 2024)
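PR #2860 lets the PyTorch engine load LoRA adapter weights saved as safetensors (see also #2851 and #2852 below). A minimal sketch of how such an adapter is wired in, assuming the `adapters` option of `PytorchEngineConfig`; the adapter name `mylora` and both paths are placeholders:

```python
# Sketch: serve a base model with a LoRA adapter whose weights are stored
# as safetensors (the capability added by #2860). `adapters` maps adapter
# names to adapter directories for the PyTorch engine.
from lmdeploy import pipeline, GenerationConfig, PytorchEngineConfig

backend_config = PytorchEngineConfig(
    adapters={'mylora': '/path/to/lora'},  # dir containing adapter_model.safetensors
)
# Selecting the adapter per request via GenerationConfig is assumed here
# from lmdeploy's PyTorch-engine LoRA examples.
gen_config = GenerationConfig(adapter_name='mylora')

pipe = pipeline('/path/to/base-model', backend_config=backend_config)
print(pipe(['Hello'], gen_config=gen_config))
```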
7 Pull requests opened by 6 people
- Support Medusa speculative decoding (#2859, opened Dec 5, 2024)
- Fix llama3.1 chat template (#2862, opened Dec 6, 2024)
- Refactor PyTorchEngine env check (#2870, opened Dec 9, 2024)
- Support tp > n_kv_heads for PyTorch models (#2872, opened Dec 9, 2024; see the sketch after this list)
- Replicate KV heads for some models when tp is divisible by kv_head_num (#2874, opened Dec 9, 2024)
- Refine multi-backend setup.py (#2880, opened Dec 10, 2024)
- Fix CPU cache (#2881, opened Dec 11, 2024)
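#2872 and #2874 both deal with tensor parallelism when the TP world size exceeds the number of KV heads: each KV head must then be replicated so that every rank owns one. A toy illustration of that mapping; the helper `kv_head_for_rank` is hypothetical, not lmdeploy code:

```python
# Hypothetical helper illustrating KV-head replication under tensor
# parallelism (the situation addressed by #2872/#2874): when the TP world
# size tp exceeds num_kv_heads, groups of tp // num_kv_heads consecutive
# ranks share one replicated KV head.
def kv_head_for_rank(rank: int, tp: int, num_kv_heads: int) -> int:
    assert tp % num_kv_heads == 0 or num_kv_heads % tp == 0
    if tp <= num_kv_heads:
        # Normal sharding: each rank holds num_kv_heads // tp heads;
        # return the first head of this rank's shard.
        return rank * (num_kv_heads // tp)
    # Replication: tp // num_kv_heads ranks share one head.
    return rank // (tp // num_kv_heads)

# Example: 8-way TP with 2 KV heads -> ranks 0-3 hold head 0, ranks 4-7 head 1.
print([kv_head_for_rank(r, tp=8, num_kv_heads=2) for r in range(8)])
```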
9 Issues closed by 4 people
- [Bug] Problem when AWQ-quantizing InternVL 2.5 78B (#2873, closed Dec 10, 2024; see the sketch after this list)
- [Docs] Model inference uses more GPU memory than with transformers (#2875, closed Dec 10, 2024)
- [Bug] Llama 3.3 70B support (#2867, closed Dec 9, 2024)
- [Feature] Desire to use the latest version of lmdeploy with PyTorch 2.1.0 (#2843, closed Dec 9, 2024)
- [Bug] glm-4v-9b multi-GPU error (#2855, closed Dec 9, 2024)
- [Bug] glm-4v-9b inference error on Ascend cards (#2819, closed Dec 5, 2024)
- [Bug] Error when serving qwen2-vl (#2858, closed Dec 5, 2024)
- [Bug] Event loop error during serve (#2101, closed Dec 4, 2024)
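For context on #2873: lmdeploy's AWQ workflow quantizes with the `lmdeploy lite auto_awq` CLI and then loads the result with `model_format='awq'`. A minimal sketch with placeholder paths:

```python
# Quantize first with the CLI (run in a shell):
#   lmdeploy lite auto_awq /path/to/InternVL2_5-78B --work-dir /path/to/awq-model
# Then load the quantized weights; model_format='awq' tells the TurboMind
# backend to expect 4-bit AWQ weights.
from lmdeploy import pipeline, TurbomindEngineConfig

pipe = pipeline(
    '/path/to/awq-model',
    backend_config=TurbomindEngineConfig(model_format='awq'),
)
print(pipe(['Describe AWQ in one sentence.']))
```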
9 Issues opened by 9 people
- [Bug] llama3.2-11b batch inference returns garbled output for text-only items (#2878, opened Dec 10, 2024)
- [Bug] Error raised with transformers==4.47.0 when running AWQ quantization or chatting with AWQ models (#2877, opened Dec 10, 2024)
- [Bug] lmdeploy serve InternVL2_5 reports errors (#2876, opened Dec 10, 2024)
- Questions about lmdeploy model details (#2871, opened Dec 9, 2024)
- [Bug] lmdeploy[432]: OSError: image file is truncated (#2869, opened Dec 9, 2024; see the sketch after this list)
- Qwen2-VL-72B-Instruct-AWQ abnormal inference results (#2863, opened Dec 6, 2024)
- [Feature] How to add a new model, as in vLLM (#2861, opened Dec 6, 2024)
- LLaVA INT4 quantization (#2857, opened Dec 5, 2024)
- Poor performance of Molmo pointing function (#2856, opened Dec 4, 2024)
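The `OSError: image file is truncated` in #2869 is raised by Pillow when an image's data stream ends early. A common client-side workaround (plain Pillow behavior, nothing lmdeploy-specific) is to opt in to decoding truncated files; the path is a placeholder:

```python
# Pillow raises "OSError: image file is truncated" on incomplete image data.
# Opting in to LOAD_TRUNCATED_IMAGES makes Pillow pad the missing tail
# instead of raising, at the cost of a partially gray image.
from PIL import Image, ImageFile

ImageFile.LOAD_TRUNCATED_IMAGES = True  # decode what is there, pad the rest

with Image.open('/path/to/image.jpg') as im:
    im.load()  # would raise OSError on a truncated file without the flag
    print(im.size)
```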
9 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- Refactor VLM modules (#2810, commented on Dec 11, 2024; 17 new comments)
- [Bug] docker+lmdeploy deploying a multimodal large model reports: "AssertionError: failed to match chat template, please explicit set chat_template_config" (#2805, commented on Dec 4, 2024; 0 new comments; see the sketch after this list)
- [Bug] Deployment of Llama3.1-70b getting stuck (#2724, commented on Dec 4, 2024; 0 new comments)
- [Bug] 0.6.2 vs 0.4.2 with the qwen1.5b model: 0.6.2 inference is about 3x slower (#2752, commented on Dec 4, 2024; 0 new comments)
- [Feature] Load safetensors of a LoRA adapter in the PyTorch engine (#2851, commented on Dec 5, 2024; 0 new comments)
- [Bug] Cannot install torch-npu==2.3.1, torch==2.3.1, and torchvision==0.18.1 because these package versions have conflicting dependencies (#2745, commented on Dec 5, 2024; 0 new comments)
- How to deploy a model with its LoRA adapter (#2852, commented on Dec 9, 2024; 0 new comments)
- [Docs] Questions about vLLM performance testing (#2838, commented on Dec 9, 2024; 0 new comments)
- Refactor turbomind attention by precomputing rotary embed (#2801, commented on Dec 10, 2024; 0 new comments)
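The assertion in #2805 means lmdeploy could not infer a chat template from the model directory. A minimal sketch of setting one explicitly via `ChatTemplateConfig`; the model path and template name are placeholders (valid names come from `lmdeploy list`):

```python
# When lmdeploy cannot match a chat template to the model directory, it
# raises "failed to match chat template ...". Passing ChatTemplateConfig
# with an explicit built-in template name resolves it.
from lmdeploy import pipeline, ChatTemplateConfig

pipe = pipeline(
    '/path/to/multimodal-model',
    chat_template_config=ChatTemplateConfig(model_name='internvl2-internlm2'),
)
print(pipe(['Hello']))
```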