SpanMarker is a framework for training powerful Named Entity Recognition models using familiar encoders such as BERT, RoBERTa and DeBERTa. Because it is built tightly on top of the 🤗 Transformers library, SpanMarker can take advantage of its valuable functionality.
Based on the PL-Marker paper, SpanMarker breaks the mold through its accessibility and ease of use. Crucially, SpanMarker works out of the box with many common encoders such as bert-base-cased and roberta-large, and automatically works with datasets using the IOB, IOB2, BIOES, BILOU or no label annotation scheme.
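For instance, initializing a model only requires a supported encoder checkpoint and the label names of your dataset, whatever annotation scheme those labels follow. The snippet below is a minimal sketch: the roberta-large encoder and the IOB2-style label names are purely illustrative.

from span_marker import SpanMarkerModel

# Illustrative IOB2-style label names; IOB, BIOES, BILOU or schemeless labels
# from your own dataset work the same way.
labels = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]
model = SpanMarkerModel.from_pretrained("roberta-large", labels=labels)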
Feel free to have a look at the documentation.
You may install the span_marker Python module via pip like so:

pip install span_marker
Please have a look at our Getting Started notebook for details on how SpanMarker is commonly used. It explains the following snippet in more detail.
from datasets import load_dataset
from span_marker import SpanMarkerModel, Trainer
from transformers import TrainingArguments

# Load the supervised Few-NERD dataset from the Hugging Face Hub
# and grab its label names.
dataset = load_dataset("DFKI-SLT/few-nerd", "supervised")
labels = dataset["train"].features["ner_tags"].feature.names

# Initialize a SpanMarker model from a pretrained encoder and the dataset labels.
model_name = "bert-base-cased"
model = SpanMarkerModel.from_pretrained(model_name, labels=labels)

# Regular 🤗 Transformers TrainingArguments control the training loop.
args = TrainingArguments(
    output_dir="my_span_marker_model",
    learning_rate=5e-5,
    gradient_accumulation_steps=2,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=1,
    save_strategy="steps",
    eval_steps=200,
    logging_steps=50,
    fp16=True,
    warmup_ratio=0.1,
)

# Train on a subset of the data to keep this example quick.
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"].select(range(8000)),
    eval_dataset=dataset["validation"].select(range(2000)),
)
trainer.train()
trainer.save_model("my_span_marker_model/checkpoint-final")

# Compute and print evaluation metrics on the evaluation split.
metrics = trainer.evaluate()
print(metrics)
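After training, the saved checkpoint can be loaded back like any other SpanMarker model and used for inference with model.predict. This is a hedged sketch: the example sentence is made up, and the exact fields returned per predicted entity may vary between versions.

# Load the final checkpoint that was saved above and run inference with it.
loaded_model = SpanMarkerModel.from_pretrained("my_span_marker_model/checkpoint-final")
entities = loaded_model.predict("Amelia Earhart flew her Lockheed Vega 5B across the Atlantic to Paris.")
print(entities)  # e.g. a list of predicted entity spans with their labels and scores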
Because this work is based on PL-Marker, you may expect performance similar to the results on its Papers with Code leaderboard. Tests, documentation and further information on expected performance will come soon.
tomaarsen/span-marker-bert-base-fewnerd-fine-super is a model that I have trained in just 4 hours on the fine-grained, supervised Few-NERD dataset. It reached a 0.7020 Test F1, competitive in the all-time Few-NERD leaderboard. My training script resembles the one that you can see above.
- See this Weights and Biases report for training details.
- Try the model out online using this 🤗 Space.
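If you would rather query that model locally than through the Space, it can also be loaded straight from the Hub in the same way as above. A small sketch, assuming span_marker is installed; the example sentence is illustrative.

from span_marker import SpanMarkerModel

# Download the trained fine-grained Few-NERD model from the Hugging Face Hub.
model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-bert-base-fewnerd-fine-super")
entities = model.predict("Leonardo da Vinci painted the Mona Lisa in the early 16th century.")
print(entities)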
See CHANGELOG.md for news on all SpanMarker versions.