How to handle tags · Issue #189 · eole-nlp/eole

Closed
chillum-codeX opened this issue Jan 20, 2025 · 6 comments

Comments

@chillum-codeX commented Jan 20, 2025

I am training a model for English to Spanish and Spanish to English, but the tags are not handled during translation.
Input
for sutures is an art`<g1> </g1>`learned through practice, the knowledge of what`<g2> </g2>`happens at the cellular and

The output should look like
Les sutures sont un art`<g1> </g1>` appris par la pratique, la connaissance de ce qui`<g2> </g2>` se passe au niveau cellulaire.

But I am getting:
Les coutures sont un art appris par la pratique, la connaissance de ce qui se passe au niveau cellulaire et pour

@vince62s (Contributor)

What is probably super clear in your head is not clear in your post above. Please clarify what you mean by tag handling and post your config.

@chillum-codeX (Author) commented Jan 20, 2025

Input
for sutures is an art`<g1> </g1>`learned through practice, the knowledge of what`<g2> </g2>`happens at the cellular and

Output
`fra_Latn`
Les coutures sont un art appris par la pratique, la connaissance de ce qui se passe au niveau cellulaire et pour

The translation does not preserve the tags.

I am using the pretrained weights `nllb-200-1.3B-onmt.pt`.

Any suggestions for fine-tuning the pretrained weights on my data would help.
I also need the tags to be handled while translating.

@chillum-codeX (Author)

Any suggestions will help me a lot.

@francoishernandez (Member)

@chillum-codeX I edited your comments with backticks to properly show the HTML-like tags, which are otherwise hidden by GitHub rendering.

Most models should be able to naively output some form of tags in examples such as the one provided, even if not placed properly. The fact that there are none at all might indicate some tokenization/vocab-related issue (e.g. if the tags are tokenized improperly and can't be handled as-is by the model).
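A quick way to check this is to tokenize a tagged sentence directly. A minimal sketch, assuming a SentencePiece model such as the one shipped with the NLLB checkpoint (the file name below is an assumption; adjust it to your setup):

```python
# Sketch: check whether the inline tags survive tokenization.
# Assumes SentencePiece; the model file name is an assumption
# (the NLLB tokenizer model in your setup may be named differently).
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="flores200_sacrebleu_tokenizer_spm.model")

text = "for sutures is an art<g1> </g1>learned through practice"
print(sp.encode(text, out_type=str))
# If "<g1>" comes back split into pieces such as "<", "g", "1", ">",
# there is no single vocab entry for the tag, and the model is very
# unlikely to emit it intact in the translation.
```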

Also, you mention the pretrained model `nllb-200-1.3B-onmt.pt`. Support for the .pt model format was dropped quite a while ago, so you might want to start by migrating to a more recent codebase and models. If such a conversion has already been done, provide details that might help us understand the root cause of the issue.

@chillum-codeX
Copy link
Author

Then how do I fine-tune NLLB with EOLE?
I want to train English to Spanish and Spanish to English on domain-specific data.

Previously I used OpenNMT-py.

@francoishernandez (Member) commented Jan 30, 2025

I just opened #204 to facilitate the conversion of NLLB models. I'll close this issue as it's not very specific. If you need further assistance, please open a topic in the Discussions tab, which is more appropriate for broad support requests.

I probably won't add any fine-tuning recipe there for now, but it should be quite straightforward to adapt from the OpenNMT ones (#69). Feel free to PR a valid fine-tuning config once you get it running!
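If tag splitting turns out to be the culprit, one common approach (not EOLE-specific, and only a sketch under assumed names) is to register the tags as atomic special tokens before fine-tuning, e.g. with Hugging Face transformers:

```python
# Sketch: make the tags atomic tokens so the tokenizer can never split them.
# Uses Hugging Face transformers; the checkpoint id and tag list are
# assumptions -- adapt them to your own setup and tag inventory.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-1.3B")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-1.3B")

tags = ["<g1>", "</g1>", "<g2>", "</g2>"]
tokenizer.add_special_tokens({"additional_special_tokens": tags})
model.resize_token_embeddings(len(tokenizer))  # add rows for the new ids

# The new embedding rows start out random, so the fine-tuning data must
# contain tagged sentence pairs for the model to learn to copy the tags.
```

Whichever route you take, the vocabulary used at fine-tuning time and at translation time must match, otherwise the tags will be split again at inference.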
