- Montreal
- in/kantwalamonil
Pinned
- DeepSeek-R1-TrainingSuite (Public): Advanced implementation of DeepSeek-R1 featuring Group Relative Policy Optimization (GRPO) for mathematical reasoning AI. Integrates safe distillation, modular reward systems, and efficient LoRA fi…