
Quantization warning: Token indices sequence length is longer than the specified maximum sequence length for this model (1085165 > 16384). Running this sequence through the model will result in indexing errors #2866

Closed
lingyezhixing opened this issue Dec 8, 2024 · 3 comments

@lingyezhixing

While quantizing InternVL2_5-8B, my network connection was unstable, so I first downloaded the ptb dataset and loaded it from local disk for calibration. The following warning then appeared: Token indices sequence length is longer than the specified maximum sequence length for this model (1085165 > 16384). Running this sequence through the model will result in indexing errors.
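For reference, a minimal sketch of one way to cache ptb once and reload it offline with the `datasets` library (the dataset id and config follow the Hugging Face `ptb_text_only` dataset; the local paths are illustrative, and the calibration loader would still need to be pointed at the local copy):

```python
# Sketch: cache the ptb calibration dataset once, then reuse it offline.
# Paths are illustrative.
from datasets import load_dataset, load_from_disk

# Run once while the network is up:
ptb = load_dataset("ptb_text_only", "penn_treebank")
ptb.save_to_disk("D:/LLM/datasets/ptb")

# Afterwards, fully offline:
ptb = load_from_disk("D:/LLM/datasets/ptb")
print(ptb["train"][0]["sentence"])
```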

@lingyezhixing (Author)

(lmdeploy) C:\Users\31940>lmdeploy lite auto_awq D:\LLM\models\LLM\OpenGVLab\InternVL2_5-8B --calib-dataset ptb --calib-samples 128 --calib-seqlen 2048 --w-bits 4 --w-group-size 128 --batch-size 1 --work-dir D:\LLM\models\LLM\OpenGVLab\InternVL2_5-8B-AWQ
2024-12-08 15:23:36,569 - lmdeploy - INFO - builder.py:64 - matching vision model: InternVLVisionModel
E:\Programming\pycodes\miniconda3\envs\lmdeploy\lib\site-packages\timm\models\layers\__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
warnings.warn(f"Importing from {name} is deprecated, please import via timm.layers", FutureWarning)
InternLM2ForCausalLM has generative capabilities, as prepare_inputs_for_generation is explicitly overwritten. However, it doesn't directly inherit from GenerationMixin. From 👉v4.50👈 onwards, PreTrainedModel will NOT inherit from GenerationMixin, and this model will lose the ability to call generate and other related functions.

  • If you're using trust_remote_code=True, you can get rid of this warning by loading the model with an auto class. See https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes
  • If you are the owner of the model architecture code, please modify your model class such that it inherits from GenerationMixin (after PreTrainedModel, otherwise you'll get an exception).
  • If you are not the owner of the model architecture class, please contact the model code owner to update it.
2024-12-08 15:24:03,429 - lmdeploy - INFO - internvl.py:120 - using InternVL-Chat-V1-5 vision preprocess
Move model.tok_embeddings to GPU.
Move model.layers.0 to CPU.
Move model.layers.1 to CPU.
Move model.layers.2 to CPU.
Move model.layers.3 to CPU.
Move model.layers.4 to CPU.
Move model.layers.5 to CPU.
Move model.layers.6 to CPU.
Move model.layers.7 to CPU.
Move model.layers.8 to CPU.
Move model.layers.9 to CPU.
Move model.layers.10 to CPU.
Move model.layers.11 to CPU.
Move model.layers.12 to CPU.
Move model.layers.13 to CPU.
Move model.layers.14 to CPU.
Move model.layers.15 to CPU.
Move model.layers.16 to CPU.
Move model.layers.17 to CPU.
Move model.layers.18 to CPU.
Move model.layers.19 to CPU.
Move model.layers.20 to CPU.
Move model.layers.21 to CPU.
Move model.layers.22 to CPU.
Move model.layers.23 to CPU.
Move model.layers.24 to CPU.
Move model.layers.25 to CPU.
Move model.layers.26 to CPU.
Move model.layers.27 to CPU.
Move model.layers.28 to CPU.
Move model.layers.29 to CPU.
Move model.layers.30 to CPU.
Move model.layers.31 to CPU.
Move model.norm to GPU.
Move output to CPU.
Loading calibrate dataset ...
Token indices sequence length is longer than the specified maximum sequence length for this model (1085165 > 16384). Running this sequence through the model will result in indexing errors
model.layers.0, samples: 128, max gpu memory: 7.14 GB
model.layers.1, samples: 128, max gpu memory: 9.14 GB
model.layers.2, samples: 128, max gpu memory: 9.14 GB
model.layers.3, samples: 128, max gpu memory: 9.14 GB
model.layers.4, samples: 128, max gpu memory: 9.14 GB
model.layers.5, samples: 128, max gpu memory: 9.14 GB
model.layers.6, samples: 128, max gpu memory: 9.14 GB
model.layers.7, samples: 128, max gpu memory: 9.14 GB
model.layers.8, samples: 128, max gpu memory: 9.14 GB
model.layers.9, samples: 128, max gpu memory: 9.14 GB
model.layers.10, samples: 128, max gpu memory: 9.14 GB
model.layers.11, samples: 128, max gpu memory: 9.14 GB
model.layers.12, samples: 128, max gpu memory: 9.14 GB
model.layers.13, samples: 128, max gpu memory: 9.14 GB
model.layers.14, samples: 128, max gpu memory: 9.14 GB
model.layers.15, samples: 128, max gpu memory: 9.14 GB
model.layers.16, samples: 128, max gpu memory: 9.14 GB
model.layers.17, samples: 128, max gpu memory: 9.14 GB
model.layers.18, samples: 128, max gpu memory: 9.14 GB
model.layers.19, samples: 128, max gpu memory: 9.14 GB
model.layers.20, samples: 128, max gpu memory: 9.14 GB
model.layers.21, samples: 128, max gpu memory: 9.14 GB
model.layers.22, samples: 128, max gpu memory: 9.14 GB
model.layers.23, samples: 128, max gpu memory: 9.14 GB
model.layers.24, samples: 128, max gpu memory: 9.14 GB
model.layers.25, samples: 128, max gpu memory: 9.14 GB

@AllentDan (Collaborator)

Just ignore the warning. It does not affect the quantization.
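For anyone curious why it is safe to ignore (my understanding of the flow, not an authoritative description of lmdeploy internals): the calibration loader tokenizes the entire joined ptb corpus as one long sequence, which is what trips the tokenizer's model_max_length check (1085165 > 16384), and only afterwards slices it into --calib-seqlen windows, so the model itself never sees the over-long sequence. A minimal sketch that reproduces the warning under these assumptions:

```python
# Sketch reproducing the tokenizer length warning (assumes transformers is
# installed; the repeated text stands in for the concatenated ptb corpus).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "OpenGVLab/InternVL2_5-8B", trust_remote_code=True)

corpus = "a sentence from ptb\n\n" * 200000  # far longer than model_max_length
enc = tokenizer(corpus, return_tensors="pt")  # the length warning fires here
ids = enc.input_ids

# Calibration then takes fixed-length windows, so nothing longer than
# --calib-seqlen (2048 in the command above) ever reaches the model:
seqlen = 2048
chunks = [ids[:, i:i + seqlen] for i in range(0, ids.shape[1], seqlen)]
print(len(chunks), chunks[0].shape)
```

The warning fires at the tokenizer call, not during the forward pass; every tensor fed to the model is at most seqlen tokens, which is why the quantization result is unaffected.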
