GitHub - cceyda/SpanMarkerNER: SpanMarker for Named Entity Recognition

SpanMarker for Named Entity Recognition

SpanMarker is a framework for training powerful Named Entity Recognition models using familiar encoders such as BERT, RoBERTa and DeBERTa. It is implemented directly on top of the 🤗 Transformers library, so it can reuse that library's functionality, such as the TrainingArguments class used in the Quick Start.

Based on the PL-Marker paper, SpanMarker breaks the mold through its accessibility and ease of use. Crucially, SpanMarker works out of the box with many common encoders such as bert-base-cased and roberta-large, and automatically works with datasets that use the IOB, IOB2, BIOES or BILOU labeling scheme, or no labeling scheme at all.
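
To make the labeling schemes concrete, here is a minimal sketch of how IOB2 tags map to entity spans. It is an illustration only and not SpanMarker's internal code; the function name iob2_to_spans and the example sentence are hypothetical.

```python
from typing import List, Tuple


def iob2_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert IOB2 tags into (label, start, end) spans; end is exclusive."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # The open entity continues only on an "I-" tag carrying the same label.
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            spans.append((label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        elif tag.startswith("I-") and label is None:
            # Lenient handling (an assumption of this sketch): an "I-" tag
            # without a preceding "B-" opens a new entity.
            start, label = i, tag[2:]
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


tokens = ["Ada", "Lovelace", "lives", "in", "New", "York"]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "I-LOC"]
for label, start, end in iob2_to_spans(tags):
    print(label, tokens[start:end])
```

Schemes such as BIOES and BILOU add explicit end-of-entity and single-token tags, but they describe the same kind of spans.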

Documentation

Feel free to have a look at the documentation.

Installation

You may install the span_marker Python module via pip like so:

pip install span_marker

Quick Start

Please have a look at our Getting Started notebook for details on how SpanMarker is commonly used. It explains the following snippet in more detail.

from datasets import load_dataset
from span_marker import SpanMarkerModel, Trainer
from transformers import TrainingArguments

dataset = load_dataset("DFKI-SLT/few-nerd", "supervised")
labels = dataset["train"].features["ner_tags"].feature.names

model_name = "bert-base-cased"
model = SpanMarkerModel.from_pretrained(model_name, labels=labels)

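args = TrainingArguments(
    output_dir="my_span_marker_model",
    learning_rate=5e-5,
    gradient_accumulation_steps=2,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=1,
    save_strategy="steps",
    # eval_steps only takes effect when the evaluation strategy is set to "steps"
    eval_steps=200,
    logging_steps=50,
    # Mixed precision training; requires hardware with fp16 support, such as a CUDA GPU
    fp16=True,
    warmup_ratio=0.1,
)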

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"].select(range(8000)),
    eval_dataset=dataset["validation"].select(range(2000)),
)

trainer.train()
trainer.save_model("my_span_marker_model/checkpoint-final")

metrics = trainer.evaluate()
print(metrics)
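
As a rough guide to what the training arguments above imply, the following back-of-the-envelope sketch estimates the effective batch size, the number of optimizer steps and the number of warmup steps. It assumes a single device and that every dataset row becomes exactly one training sample, which the library's preprocessing may not preserve, so treat the numbers as approximate. The helper name estimate_schedule is hypothetical and not part of span_marker or transformers.

```python
import math


def estimate_schedule(num_samples, per_device_batch_size, grad_accum_steps,
                      num_epochs, warmup_ratio, num_devices=1):
    """Estimate optimizer steps for a run (hypothetical helper, approximate)."""
    # Samples consumed per optimizer update across all devices.
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    total_steps = steps_per_epoch * num_epochs
    # Learning rate warmup covers this fraction of the total steps.
    warmup_steps = int(total_steps * warmup_ratio)
    return {
        "effective_batch_size": effective_batch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


# Values taken from the Quick Start snippet: 8000 training rows, batch size 4,
# gradient accumulation 2, one epoch, warmup ratio 0.1.
print(estimate_schedule(8000, 4, 2, 1, 0.1))
```

Under these assumptions, logging_steps=50 would log about twenty times over the run.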

Because this work is based on PL-Marker, you may expect results similar to those on its Papers with Code leaderboard. Tests, documentation and further information on expected performance will come soon.

Pretrained Models

Changelog

See CHANGELOG.md for news on all SpanMarker versions.
