Request for Official HF → OLMoE Checkpoint Conversion Script #26
Comments
Can you share which one you are referring to? Maybe you could also share the script you already wrote? I'm not sure at the moment whether such a script exists, but I'm tagging some people in case they know. cc @soldni @swj0419
Hey @Muennighoff, I meant loading the pretrained checkpoint to continually pretrain it further; the model needs to be in the OLMo class format, right? I wrote the code to convert it to that class using the attached script.
I wanted to know whether the code I wrote is safe for this conversion, as I want to pretrain the allenai/OLMoE-1B-7B-0924-Instruct checkpoint from HF. After converting this way and pointing the pretraining code at the model path, it expects a metadata.json file, which I am attaching; I am not sure how to create this file.
Basically, I want to know how I can continually pretrain the allenai/OLMoE-1B-7B-0924-Instruct checkpoint from HF using the available OLMoE pretraining code.
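For reference, a minimal sketch of this kind of HF → OLMo conversion; the renaming rules and output layout below are assumptions for illustration, not a verified mapping:

```python
# Rough sketch of an HF -> OLMo state-dict conversion.
# The key mapping is ILLUSTRATIVE only; the real one must be derived by comparing
# the HF checkpoint's keys against the keys the OLMo/OLMoE model actually expects.
import os
import torch
from transformers import AutoModelForCausalLM

hf_model = AutoModelForCausalLM.from_pretrained("allenai/OLMoE-1B-7B-0924-Instruct")
hf_state = hf_model.state_dict()

def rename_key(hf_key: str) -> str:
    """Hypothetical renaming rules -- placeholders, not a verified mapping."""
    key = hf_key
    key = key.replace("model.embed_tokens", "transformer.wte")  # assumption
    key = key.replace("model.norm", "transformer.ln_f")         # assumption
    key = key.replace("model.layers", "transformer.blocks")     # assumption
    return key

olmo_state = {rename_key(k): v for k, v in hf_state.items()}

# Save in the layout an unsharded OLMo-style checkpoint directory uses (assumption).
os.makedirs("converted_checkpoint", exist_ok=True)
torch.save(olmo_state, "converted_checkpoint/model.pt")
```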
Looks pretty good! I think the metadata file specifies some shapes for FSDP; to recreate it, I think you can simply train a dummy model from scratch for one step with the FSDP distribution setup you want, save its checkpoint, and take its metadata.json.
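Something along these lines should work for the copy step, assuming the dummy run saved a checkpoint under dummy_run/step1 (both paths below are placeholders):

```python
# Copy the metadata.json produced by a 1-step dummy run into the converted
# checkpoint directory; both paths are placeholders for your own setup.
import shutil

shutil.copy(
    "dummy_run/step1/metadata.json",       # from the dummy FSDP run
    "converted_checkpoint/metadata.json",  # next to the converted weights
)
```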
I actually realized that the script I pasted above is imperfect and leads to errors. I tried improving it but couldn't succeed in loading the converted model using:
Now I'm more confused about a few things: for pretraining OLMoE, does the model class still remain OLMo? Are there any new keys introduced for OLMoE? Or am I loading it incorrectly? Here is the updated script I tried:
Yes! There are just some new keys related to the experts and gating that you should have from the ckpt.
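If it helps, a quick inspection sketch to list those keys straight from the HF checkpoint, so a conversion script can be checked against the real names:

```python
# Print the parameter names related to experts / gating in the HF checkpoint,
# along with their shapes, to compare against what the conversion produces.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("allenai/OLMoE-1B-7B-0924-Instruct")
for name, tensor in model.state_dict().items():
    if "expert" in name or "gate" in name:
        print(name, tuple(tensor.shape))
```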
Hey Niklas, after multiple attempts to convert the HF checkpoint to the OLMo class, I am still encountering issues like Missing key(s) in state_dict: "transformer.wpe.weight" and shape mismatch errors. Although I printed all the shapes and they look fine, the script still isn't working. Can you provide the checkpoint in the OLMo format so that I can load it directly, like the checkpoints available on Hugging Face that load without the transformers library? Or could you share the script?
Tagging @aman-17 here 🙌
@aditi184, did you solve this? Errors like the missing transformer.wpe.weight key and unexpected ones like q_norm.weight are expected if there's a structural mismatch between the HF checkpoint and what the OLMo class expects. It's likely due to architectural differences.
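One way to pin down such a mismatch is to diff the converted state dict against a reference model.pt saved from a short from-scratch run (paths below are placeholders):

```python
# Compare the converted checkpoint against a reference model.pt produced by a
# short from-scratch run, reporting missing / unexpected keys and shape mismatches.
import torch

converted = torch.load("converted_checkpoint/model.pt", map_location="cpu")
reference = torch.load("dummy_run/step1-unsharded/model.pt", map_location="cpu")

print("missing keys:", sorted(set(reference) - set(converted)))
print("unexpected keys:", sorted(set(converted) - set(reference)))

for key in set(reference) & set(converted):
    if reference[key].shape != converted[key].shape:
        print(f"shape mismatch for {key}: "
              f"{tuple(reference[key].shape)} vs {tuple(converted[key].shape)}")
```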
Yes, I have written a conversion script for it. I can create a PR if you would like to include it in the codebase.
Hi OLMo team,
I'm currently working on converting a Hugging Face model (allenai/OLMoE-1B-7B-0924-Instruct) into OLMo/OLMoE's pretraining checkpoint format to resume pretraining. While I was able to convert the model weights, I ran into a missing metadata.json, which seems to be a critical component for loading the checkpoint in restore_checkpoint().
After examining the metadata.json generated when training OLMo from scratch, I realized that it contains non-trivial fields and additional metadata about the architecture and checkpoint format. Reconstructing this file seems error-prone, and I couldn't find documentation or scripts for safely performing this conversion.
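For reference, a minimal sketch of how I dump the fields for inspection (the path points at a from-scratch run and is a placeholder):

```python
# Pretty-print the metadata.json from a from-scratch OLMo run to see which
# fields a converted checkpoint would need to reproduce.
import json

with open("scratch_run/step1/metadata.json") as f:
    metadata = json.load(f)
print(json.dumps(metadata, indent=2))
```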
Would it be possible for you to provide an official Hugging Face → OLMo checkpoint conversion script, or any guidance on the exact format required for metadata.json? This would be extremely helpful for me, as I am working on a project that requires continual pretraining of the OLMoE-Instruct model using OLMo from existing HF checkpoints.
Alternatively, it would also be helpful if you could share the OLMoE checkpoint that is accepted by the pretraining script (https://github.com/allenai/OLMo/blob/Muennighoff/MoE/scripts/train.py).
Thanks for your help!