Fine-tuning forgetfulness #163
Comments
It seems like your model is experiencing catastrophic forgetting, where it heavily overfits to the new data (EntC) and forgets the previously learned entities. This is a common issue in continual learning and fine-tuning scenarios. To mitigate it, you can use experience replay: maintain a buffer of the original data (in this case the Pile-NER dataset) and periodically replay samples from it during training. By doing this, you can ensure that the model retains knowledge of the previously learned entities while learning new ones.
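For illustration only, a minimal sketch of what per-batch replay could look like, assuming the training loop consumes batches of GLiNER-style examples; `replay_batches`, the buffer contents, and the `replay_fraction` value are hypothetical placeholders, not part of the GLiNER training code.

```python
import random

def replay_batches(new_data, replay_buffer, batch_size=8, replay_fraction=0.5):
    """Yield batches that mix the new examples (e.g. the EntC data) with
    examples replayed from a buffer of the original (Pile-NER) data."""
    random.shuffle(new_data)
    n_replay = int(batch_size * replay_fraction)   # replayed examples per batch
    n_new = max(1, batch_size - n_replay)          # new examples per batch
    for i in range(0, len(new_data), n_new):
        batch = new_data[i:i + n_new]
        batch += random.sample(replay_buffer, min(n_replay, len(replay_buffer)))
        random.shuffle(batch)
        yield batch
```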
Adding Pile-NER data to my training data fixed this issue. @urchade What is the best ratio for mixing the Pile-NER dataset with our training dataset? Pile-NER has 45K+ entries, while my training data has only 200+ entries.
@urchade Thank you. I really appreciate you sharing your knowledge with me and the broader community by answering these questions. I found the Pile-NER data, so as @KUMBLE mentioned, is there a preferred way of mixing the Pile-NER data with our custom datasets? The software you have developed, GLiNER, GraphER, etc., is simply fabulous.
Hi @davidress-ILW. You can try this. Let Sample A be your custom training data and Sample B be a random sample drawn from the Pile-NER data.
Then, mix Sample A and Sample B to create a new dataset for training. Optionally, draw a fresh Sample B after each epoch.
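As a rough sketch of this mixing step, assuming Sample A is the custom JSON training file and Sample B is drawn from a local copy of the Pile-NER data; the file names and the ratio value are placeholders to adapt to your setup.

```python
import json
import random

# Sample A: the custom dataset; Sample B: examples drawn from Pile-NER.
with open("sample_data.json") as f:       # placeholder: your custom data
    sample_a = json.load(f)
with open("pilener_train.json") as f:     # placeholder: local Pile-NER file
    pile_ner = json.load(f)

ratio = 5                                 # |Sample B| = ratio * |Sample A|
sample_b = random.sample(pile_ner, min(len(pile_ner), ratio * len(sample_a)))

# Mix and shuffle so the custom examples are spread through each epoch.
train_data = sample_a + sample_b
random.shuffle(train_data)

def resample_epoch():
    """Optionally redraw Sample B before each epoch to vary the replayed data."""
    fresh_b = random.sample(pile_ner, min(len(pile_ner), ratio * len(sample_a)))
    epoch_data = sample_a + fresh_b
    random.shuffle(epoch_data)
    return epoch_data
```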
Hello @urchade Thank you for the reply on mixing training data with Pile-NER data. For my testing, I found Sample B needed to be 5x the size of Sample A. I then mixed Sample A and Sample B (shuffled) to randomize the data. I say 5x because that ratio enabled GLiNER to predict everything found before fine-tuning at high scores, plus entities that were previously missed, with the "best" model found during fine-tuning. So the fine-tuning appeared to work. However, I noticed that the eval_loss metric was always between 220 and 270 (regardless of the mix, i.e., 2x, 3x, 4x, and 5x), which I do not understand. Is there a way to extract all the training metrics from a fine-tuning run? Should I be concerned about the high eval_loss values? Thank you again for the efforts you and your team have put into GLiNER. It is so much easier to fine-tune than other NER models. I also appreciate the support.
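If the fine-tuning notebook runs through the Hugging Face Trainer (as the recent GLiNER training examples do), every logged metric is kept in `trainer.state.log_history` and also written to `trainer_state.json` next to each checkpoint. A small sketch of reading it back; the checkpoint path is a placeholder.

```python
import json

# trainer_state.json is written by the Hugging Face Trainer into each
# checkpoint directory; its "log_history" list holds all logged metrics.
with open("models/checkpoint-500/trainer_state.json") as f:  # placeholder path
    state = json.load(f)

for entry in state["log_history"]:
    # Training log entries carry "loss"; evaluation entries carry "eval_loss".
    if "eval_loss" in entry:
        print(f"step {entry['step']}: eval_loss = {entry['eval_loss']:.3f}")
    elif "loss" in entry:
        print(f"step {entry['step']}: loss = {entry['loss']:.3f}")
```

As a general point (not specific to GLiNER), the absolute eval_loss value depends on how the loss is reduced over spans and batches, so the trend across checkpoints is usually more informative than the raw number.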
Hi @davidress-ILW, |
Hi dear @davidress |
Hi @ChristinaPetschnig Did you get the script to calculate the F1 score while fine-tuning? @urchade It would be great if you could modify the training script to include an F1 score metric, as the original script only has the training loss and validation loss.
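Until the training script exposes this, here is a hedged sketch of an entity-level micro F1 you could run on a held-out set between checkpoints. It assumes each eval example provides its raw text and gold entities as (entity_text, label) pairs, and it matches on exact text and label rather than character spans, so adapt the matching to your data format for a stricter score.

```python
from gliner import GLiNER

def evaluate_f1(model, eval_examples, labels, threshold=0.5):
    """Entity-level micro precision/recall/F1 over a held-out set.
    eval_examples: list of {"text": str, "gold": [(entity_text, label), ...]}."""
    tp = fp = fn = 0
    for ex in eval_examples:
        preds = model.predict_entities(ex["text"], labels, threshold=threshold)
        pred_set = {(p["text"], p["label"]) for p in preds}
        gold_set = set(ex["gold"])
        tp += len(pred_set & gold_set)
        fp += len(pred_set - gold_set)
        fn += len(gold_set - pred_set)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Usage (paths and labels are placeholders):
# model = GLiNER.from_pretrained("path/to/checkpoint")
# p, r, f1 = evaluate_f1(model, eval_examples, labels=["Person", "Award"])
```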
I am working on fine-tuning a model and running into a "forgetful" situation that I wanted to bring to your attention.
The two changes we made to the fine-tuning Jupyter notebook are:
model: urchade/gliner_small
json: sample_data.json
num_steps = 500
batch_size = 8
data_size = 57
num_batches = 7
num_epochs = 7
Before training results:
Cristiano Ronaldo > Person > 0.9846
Ballon d'Or > Award > 0.9413
UEFA Men's Player of the Year Awards > Award > 0.8620
European Golden Shoes > Award > 0.9594
After training, using the final model:
Cristiano Ronaldo dos Santos Aveiro > Person > 0.9472
Ballon d'Or awards > Award > 0.8051
UEFA Men's Player of the Year Awards > Award > 0.9852
European Golden Shoes > Award > 0.9863
outfield player > Person > 0.8722
The model retained the original entities (although the scores changed) and even predicted a new entity, so I think the fine-tuning Jupyter notebook works just fine for your sample data.
Our dataset is composed of 72 records; after the 90% split, there are 64 records in the training set and 8 in the test set. All records are for a single label, EntC.
num_steps = 500
batch_size = 8
data_size = 64
num_batches = 8
num_epochs = 62
Before training, results are:
EntA > OurLabel > 0.8799
EntA > OurLabel > 0.8288
EntB > OurLabel > 0.7210
EntA > OurLabel > 0.8052
EntA > OurLabel > 0.7026
EntC > OurLabel > 0.5243
EntA > OurLabel > 0.7475
After training, results are:
EntC > OurLabel > 1.0000
The model now finds EntC with a score of 1.0000, but it is as if the final model completely forgot all the other entities except EntC.
Any thoughts as to why the forgetfulness could be happening?
While I cannot disclose the entity names or label, I can say that all entities are three characters long.
Any suggestions are appreciated, thank you.