New iteration on convert HF, fix some models, support Qwen3/Qwen3MoE #238
We'll probably never have a perfect solution that handles every HF case, but it doesn't hurt to keep rationalizing a few things.
Addressed topics
Some notes:
- While testing this, I checked Mixtral quickly, and it appears to have been broken for a while (even before the previous refactoring); not sure if we'll fix this here or later. EDIT: MoE (Mixtral/Qwen3) seems fine after a few patches, but AWQ is not -- though AWQ is deprecated, so not sure we want to dive back into it (we might be better off investigating llm-compressor, which replaces it).
- Did not test all architectures yet (e.g. gpt2/nllb/xlmroberta). EDIT: only XLM-RoBERTa is not fully tested.
- The transformer decoder refactoring a while ago introduced `post_attention_layernorm`, which should probably be made optional (e.g. for phi-2). EDIT: introduced a `post_attention_layernorm` flag (default `True`); see the sketch after this list.
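
To make the last point concrete, here is a minimal sketch of how such a flag can gate the norm between attention and MLP. This is not the project's actual layer code; the class and argument names are illustrative. With the flag disabled, the MLP reuses the pre-attention normalized input, which matches the parallel attention/MLP structure of models like phi-2:

```python
import torch
import torch.nn as nn

class DecoderLayer(nn.Module):
    """Simplified decoder layer with an optional post-attention layernorm."""

    def __init__(self, d_model: int, n_heads: int,
                 post_attention_layernorm: bool = True):
        super().__init__()
        self.input_layernorm = nn.LayerNorm(d_model)
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Optional extra norm between the attention and MLP blocks.
        self.post_attention_layernorm = (
            nn.LayerNorm(d_model) if post_attention_layernorm else None
        )
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        normed = self.input_layernorm(x)
        attn_out, _ = self.self_attn(normed, normed, normed, need_weights=False)
        x = x + attn_out
        # Flag enabled (default): standard sequential pre-norm, re-normalize
        # before the MLP. Flag disabled: feed the pre-attention normalized
        # input to the MLP instead (parallel attention/MLP, phi-2 style).
        mlp_in = (
            self.post_attention_layernorm(x)
            if self.post_attention_layernorm is not None
            else normed
        )
        return x + self.mlp(mlp_in)
```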