Fine-tuning forgetfulness #163
Comments
It seems like your model is experiencing catastrophic forgetting, where it heavily overfits to the new data (EntC) and forgets the previously learned entities. This is a common issue in continual learning and fine-tuning scenarios. To mitigate it, you can use experience replay: maintain a buffer of the original data (in this case the Pile-NER dataset) and periodically replay samples from it during training. By doing this, you can ensure that the model retains knowledge of the previously learned entities while learning new ones.
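For illustration only, a minimal sketch of what per-batch replay could look like, assuming the training loop consumes batches of GLiNER-style examples; `replay_batches`, the buffer contents, and the `replay_fraction` value are hypothetical placeholders, not part of the GLiNER training code.

```python
import random

def replay_batches(new_data, replay_buffer, batch_size=8, replay_fraction=0.5):
    """Yield batches that mix the new examples (e.g. the EntC data) with
    examples replayed from a buffer of the original (Pile-NER) data."""
    random.shuffle(new_data)
    n_replay = int(batch_size * replay_fraction)   # replayed examples per batch
    n_new = max(1, batch_size - n_replay)          # new examples per batch
    for i in range(0, len(new_data), n_new):
        batch = new_data[i:i + n_new]
        batch += random.sample(replay_buffer, min(n_replay, len(replay_buffer)))
        random.shuffle(batch)
        yield batch
```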
Adding Pile-NER data to my training data fixed this issue. @urchade What is the best ratio for mixing the Pile-NER dataset with our training dataset? Pile-NER has 45K+ entries, while my training data has only 200+ entries.
@urchade Thank you. I really appreciate you sharing your knowledge with me and the broader community by answering these questions. I found the Pile-NER data, so as @KUMBLE mentioned, is there a preferred way of mixing the Pile-NER data with our custom datasets? The software you have developed, GLiNER, GraphER, etc., is simply fabulous.
Hi @davidress-ILW. You can try this. Let Sample A be your custom training data and Sample B be a random sample drawn from the Pile-NER data.
Then, mix Sample A and Sample B to create a new dataset for training. Optionally, draw a fresh Sample B after each epoch.
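As a rough sketch of this mixing step, assuming Sample A is the custom JSON training file and Sample B is drawn from a local copy of the Pile-NER data; the file names and the ratio value are placeholders to adapt to your setup.

```python
import json
import random

# Sample A: the custom dataset; Sample B: examples drawn from Pile-NER.
with open("sample_data.json") as f:       # placeholder: your custom data
    sample_a = json.load(f)
with open("pilener_train.json") as f:     # placeholder: local Pile-NER file
    pile_ner = json.load(f)

ratio = 5                                 # |Sample B| = ratio * |Sample A|
sample_b = random.sample(pile_ner, min(len(pile_ner), ratio * len(sample_a)))

# Mix and shuffle so the custom examples are spread through each epoch.
train_data = sample_a + sample_b
random.shuffle(train_data)

def resample_epoch():
    """Optionally redraw Sample B before each epoch to vary the replayed data."""
    fresh_b = random.sample(pile_ner, min(len(pile_ner), ratio * len(sample_a)))
    epoch_data = sample_a + fresh_b
    random.shuffle(epoch_data)
    return epoch_data
```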
Hello @urchade Thank you for the reply on mixing training data with Pile-NER data. For my testing, I found Sample B needed to be 5x the size of Sample A. I then mixed Sample A and Sample B (shuffled) to randomize the data. I say 5x because that ratio enabled GLiNER to predict everything found before fine-tuning at high scores, plus entities that were previously missed, with the "best" model found during fine-tuning. So the fine-tuning appeared to work. However, I noticed that the eval_loss metric was always between 220 and 270 (regardless of the mix, i.e., 2x, 3x, 4x, and 5x), which I do not understand. Is there a way to extract all the training metrics from a fine-tuning run? Should I be concerned about the high eval_loss values? Thank you again for the efforts you and your team have put into GLiNER. It is so much easier to fine-tune than other NER models. I also appreciate the support.
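If the fine-tuning notebook runs through the Hugging Face Trainer (as the recent GLiNER training examples do), every logged metric is kept in `trainer.state.log_history` and also written to `trainer_state.json` next to each checkpoint. A small sketch of reading it back; the checkpoint path is a placeholder.

```python
import json

# trainer_state.json is written by the Hugging Face Trainer into each
# checkpoint directory; its "log_history" list holds all logged metrics.
with open("models/checkpoint-500/trainer_state.json") as f:  # placeholder path
    state = json.load(f)

for entry in state["log_history"]:
    # Training log entries carry "loss"; evaluation entries carry "eval_loss".
    if "eval_loss" in entry:
        print(f"step {entry['step']}: eval_loss = {entry['eval_loss']:.3f}")
    elif "loss" in entry:
        print(f"step {entry['step']}: loss = {entry['loss']:.3f}")
```

As a general point (not specific to GLiNER), the absolute eval_loss value depends on how the loss is reduced over spans and batches, so the trend across checkpoints is usually more informative than the raw number.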
Hi @davidress-ILW, |
Hi dear @davidress |
Hi @ChristinaPetschnig Did you get the script to calculate the F1 score while fine-tuning? @urchade It would be great if you could modify the training script to include an F1 score metric, as the original script only has the training loss and validation loss.
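Until the training script exposes this, here is a hedged sketch of an entity-level micro F1 you could run on a held-out set between checkpoints. It assumes each eval example provides its raw text and gold entities as (entity_text, label) pairs, and it matches on exact text and label rather than character spans, so adapt the matching to your data format for a stricter score.

```python
from gliner import GLiNER

def evaluate_f1(model, eval_examples, labels, threshold=0.5):
    """Entity-level micro precision/recall/F1 over a held-out set.
    eval_examples: list of {"text": str, "gold": [(entity_text, label), ...]}."""
    tp = fp = fn = 0
    for ex in eval_examples:
        preds = model.predict_entities(ex["text"], labels, threshold=threshold)
        pred_set = {(p["text"], p["label"]) for p in preds}
        gold_set = set(ex["gold"])
        tp += len(pred_set & gold_set)
        fp += len(pred_set - gold_set)
        fn += len(gold_set - pred_set)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Usage (paths and labels are placeholders):
# model = GLiNER.from_pretrained("path/to/checkpoint")
# p, r, f1 = evaluate_f1(model, eval_examples, labels=["Person", "Award"])
```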
I am working on fine-tuning a model and running into a "forgetful" situation that I wanted to bring to your attention.
The two changes we made to the fine-tuning Jupyter notebook are:
model: urchade/gliner_small
json: sample_data.json
num_steps = 500
batch_size = 8
data_size = 57
num_batches = 7
num_epochs = 7
Before training results:
Cristiano Ronaldo > Person > 0.9846
Ballon d'Or > Award > 0.9413
UEFA Men's Player of the Year Awards > Award > 0.8620
European Golden Shoes > Award > 0.9594
After training, using the final model:
Cristiano Ronaldo dos Santos Aveiro > Person > 0.9472
Ballon d'Or awards > Award > 0.8051
UEFA Men's Player of the Year Awards > Award > 0.9852
European Golden Shoes > Award > 0.9863
outfield player > Person > 0.8722
The model retained the original entities (although the scores changed) and even predicted a new entity, so I think the fine-tuning Jupyter notebook works just fine for your sample data.
Our dataset is composed of 72 records; after the 90% split, there are 64 records in the training set and 8 in the test set. All records are for a single label, EntC.
num_steps = 500
batch_size = 8
data_size = 64
num_batches = 8
num_epochs = 62
Before training, results are:
EntA > OurLabel > 0.8799
EntA > OurLabel > 0.8288
EntB > OurLabel > 0.7210
EntA > OurLabel > 0.8052
EntA > OurLabel > 0.7026
EntC > OurLabel > 0.5243
EntA > OurLabel > 0.7475
After training, results are:
EntC > OurLabel > 1.0000
The model now finds EntC with a score of 1.0000, but it is as if the final model completely forgot all the other entities except EntC.
Any thoughts as to why the forgetfulness could be happening?
While I cannot disclose the entity names or label, I can say that all entities are three characters long.
Any suggestions are appreciated, thank you.