Training does not converge: is the dataset too small? #334
copy config here from #333 for convenience
Short answer: yes. You may set the number of epochs to 10000 to actually see sufficient updates to the weights. Even that does not guarantee a good model.
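A minimal sketch of where that change would go, assuming the standard mmpose 0.x config layout (this is not the exact config from #333):

```python
# Sketch only: in mmpose-style configs the training length is set by `total_epochs`.
# With a dataset this small, each epoch is roughly a single iteration, so a very
# large epoch count is needed before the optimizer has taken enough steps.
total_epochs = 10000  # the COCO top-down configs usually use 210
```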
But after setting log interval=1, the model is able to infer keypoints after 250 epochs.
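For reference, the log interval mentioned here is the `interval` field of `log_config` in mmcv-style configs; a minimal sketch (hook list abbreviated):

```python
log_config = dict(
    interval=1,  # emit a log line every iteration; with ~1 iteration per epoch, that is every epoch
    hooks=[
        dict(type='TextLoggerHook'),
    ])
```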
Ideally logging should not affect the training process. @jin-s13 could you investigate this? |
Please note that the learning-rate warm-up is based on the number of iterations, which means that if the number of iterations is small, the lr scheduling might still be in the warm-up stage.
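To make the warm-up point concrete, below is a small sketch of mmcv's linear warm-up rule. The `base_lr`, `warmup_iters` and `warmup_ratio` values are the common defaults from the COCO top-down configs, not necessarily the ones used in #333:

```python
def linear_warmup_lr(base_lr, cur_iter, warmup_iters=500, warmup_ratio=0.001):
    """mmcv-style linear warm-up: the lr ramps from base_lr * warmup_ratio
    at iteration 0 up to base_lr at iteration `warmup_iters`."""
    k = (1 - cur_iter / warmup_iters) * (1 - warmup_ratio)
    return base_lr * (1 - k)

# With ~1 iteration per epoch, epoch N corresponds to iteration N of the
# warm-up, so after 180 epochs the lr is still well inside the ramp
# (iteration 0 gives 5.0e-07 with these default values).
for it in (0, 180, 500):
    print(it, linear_warmup_lr(5e-4, it))
```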
As addressed in #333, with the same tiny dataset, which only has 38 images and 10 annotations, the training does not give a useful model (no keypoints are detected when using 'top_down_img_demo_with_mmdet.py'). When I watch the log info on screen, I see that the "mse_loss" and "loss" almost do not change across epochs.
[INFO ] text:_log_info:122 - Epoch [1][1/1] lr: 5.000e-07, eta: 0:14:13, time: 3.428, data_time: 2.963, memory: 3805, mse_loss: 0.0019, acc_pose: 0.0469, loss: 0.0019
...
[INFO ] text:_log_info:122 - Epoch [180][1/1] lr: 1.793e-05, eta: 0:03:53, time: 3.306, data_time: 2.907, memory: 4560, mse_loss: 0.0004, acc_pose: 0.9302, loss: 0.0004