keyword_extractor

⚠️ Experimental Package — Not ready for production

keyword_extractor is a Dart package for extracting keywords from structured text data.

It supports:

✅ Basic word-based tokenization
✅ Word prefix and phrase n-gram generation
✅ Field-specific keyword extraction

Features

- Extracts keywords from `Map<String, dynamic>` data
- Works with any object that provides `.toMap()` or `.toJson()`
- Swappable tokenizer strategies:
  - `DefaultTokenizer`: simple word splitting
  - `AdvancedTokenizer`: word prefixes + phrase n-grams
- `SelectiveKeywordExtractor` for targeting specific fields
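Accepting "any object that provides `.toMap()` or `.toJson()`" implies a normalization step before extraction. The sketch below shows one plausible way to do that in plain Dart, using dynamic dispatch. It is an assumption about the approach, not the package's code, and `toFieldMap` and `Article` are hypothetical names.

```dart
// Hypothetical helper (not part of keyword_extractor): turns the input into
// the Map<String, dynamic> form that an extractor can walk.
Map<String, dynamic> toFieldMap(Object input) {
  // Plain maps are used as-is.
  if (input is Map<String, dynamic>) return input;

  // Calling a method that does not exist on a `dynamic` receiver throws
  // NoSuchMethodError, which lets us probe for toMap() and then toJson().
  final dynamic obj = input;
  try {
    return Map<String, dynamic>.from(obj.toMap() as Map);
  } on NoSuchMethodError {
    // No toMap(); fall through and try toJson().
  }
  try {
    return Map<String, dynamic>.from(obj.toJson() as Map);
  } on NoSuchMethodError {
    // No toJson() either.
  }
  throw ArgumentError('Expected a Map or an object with toMap()/toJson().');
}

// Example model exposing the conventional toJson() method.
class Article {
  final String title;
  final String summary;
  const Article(this.title, this.summary);

  Map<String, dynamic> toJson() => {'title': title, 'summary': summary};
}

void main() {
  final article = Article(
    'Improving search accuracy with keyword extraction',
    'This article explores simple and advanced tokenization techniques.',
  );
  print(toFieldMap(article));
}
```

Catching `NoSuchMethodError` keeps the sketch short but also hides that error if it is thrown *inside* a model's own `toMap()`. An explicit interface or a caller-supplied converter function would be a stricter design.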

Getting Started

import 'package:keyword_extractor/keyword_extractor.dart';

void main() {
  final data = {
    'title': 'Improving search accuracy with keyword extraction',
    'summary': 'This article explores simple and advanced tokenization techniques.',
  };

  final extractor = DefaultKeywordExtractor(
    tokenizer: const DefaultTokenizer(),
  );

  final keywords = extractor.extract(data);
  print(keywords);
}

Selective Field Extraction

// `data` is the map defined in the Getting Started example above.
final extractor = SelectiveKeywordExtractor(
  tokenizer: const AdvancedTokenizer(),
  fields: ['title'], // extract only from the 'title' field
);

final keywords = extractor.extract(data);
print(keywords);
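Conceptually, field selection filters the map's keys before any tokenizing happens. Here is a minimal sketch with a hypothetical `selectFields` helper. The package's actual logic may differ, for example in how it treats fields that are missing or non-string.

```dart
// Hypothetical helper (not the package's API): keeps only the entries whose
// keys are listed in `fields`, preserving the order of `data`.
Map<String, dynamic> selectFields(
  Map<String, dynamic> data,
  List<String> fields,
) {
  return {
    for (final entry in data.entries)
      if (fields.contains(entry.key)) entry.key: entry.value,
  };
}

void main() {
  final data = {
    'title': 'Improving search accuracy with keyword extraction',
    'summary': 'This article explores simple and advanced tokenization techniques.',
  };

  // Only the 'title' entry survives; the tokenizer would then see just it.
  print(selectFields(data, ['title']));
  // {title: Improving search accuracy with keyword extraction}
}
```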

Input & Output Example

Input Map:

{
  "title": "Improving search accuracy with keyword extraction",
  "summary": "This article explores simple and advanced tokenization techniques."
}

DefaultTokenizer Output:

[
  "improving",
  "search",
  "accuracy",
  "with",
  "keyword",
  "extraction",
  "this",
  "article",
  "explores",
  "simple",
  "and",
  "advanced",
  "tokenization",
  "techniques"
]
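The `DefaultTokenizer` output above can be approximated with a few lines of Dart: lowercase the text, split on any run of non-alphanumeric characters, and drop empty strings. This is a hypothetical reimplementation for illustration (`splitWords` and `extractWords` are not package names). Dropping duplicates and the ASCII-only character class are assumptions, not documented behavior.

```dart
// Hypothetical approximation of DefaultTokenizer-style splitting.
List<String> splitWords(String text) => text
    .toLowerCase()
    .split(RegExp(r'[^a-z0-9]+'))
    // Trailing punctuation such as "." leaves an empty string; drop it.
    .where((token) => token.isNotEmpty)
    .toList();

// Tokenizes every String value in the map. Dart's default Set keeps
// insertion order, so tokens stay in first-seen order without duplicates.
List<String> extractWords(Map<String, dynamic> data) {
  final seen = <String>{};
  for (final value in data.values) {
    if (value is String) seen.addAll(splitWords(value));
  }
  return seen.toList();
}

void main() {
  final data = {
    'title': 'Improving search accuracy with keyword extraction',
    'summary': 'This article explores simple and advanced tokenization techniques.',
  };
  // Prints the same 14 tokens, in the same order, as the listing above.
  print(extractWords(data));
}
```

Because the character class is `[a-z0-9]`, letters outside ASCII (for example "é") would be treated as separators. A Unicode-aware pattern would be needed for non-English text.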

AdvancedTokenizer Output (partial):

[
  "imp",
  "impr",
  "impro",
  "improv",
  "improvi",
  "improvin",
  "improving",
  "sea",
  "sear",
  "searc",
  "search",
  "keyword extraction",
  "tokenization techniques",
  "simple and advanced",
  "improving search",
  "search accuracy",
  "accuracy with keyword"
]
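The two token kinds in the `AdvancedTokenizer` output are word prefixes (edge n-grams) and phrase n-grams. The sketch below reproduces them under assumptions inferred from the example, not from the package source: prefixes start at 3 characters, phrases are 2 or 3 words long, and phrases are built within a single field value. All names are hypothetical.

```dart
// Assumptions inferred from the example output, not from the package.
const minPrefixLength = 3; // shortest prefix emitted ("imp", "sea")
const maxPhraseLength = 3; // longest phrase in words ("accuracy with keyword")

// Lowercase words with punctuation stripped.
List<String> words(String text) => text
    .toLowerCase()
    .split(RegExp(r'[^a-z0-9]+'))
    .where((w) => w.isNotEmpty)
    .toList();

// Edge n-grams: "search" -> [sea, sear, searc, search].
List<String> prefixes(String word) {
  // Assumption: words shorter than the minimum are kept whole.
  if (word.length < minPrefixLength) return [word];
  return [
    for (var end = minPrefixLength; end <= word.length; end++)
      word.substring(0, end),
  ];
}

// Contiguous word sequences of length 2..maxPhraseLength.
List<String> phrases(List<String> ws) => [
      for (var n = 2; n <= maxPhraseLength; n++)
        for (var i = 0; i + n <= ws.length; i++) ws.sublist(i, i + n).join(' '),
    ];

// Prefixes first, then phrases; the Set drops duplicates but keeps order.
List<String> advancedTokens(String text) {
  final ws = words(text);
  return <String>{
    for (final w in ws) ...prefixes(w),
    ...phrases(ws),
  }.toList();
}

void main() {
  print(advancedTokens('Improving search accuracy with keyword extraction'));
}
```

Prefix tokens are what make partial input such as "sea" match "search", at the cost of many more tokens per word. This is likely why the roadmap lists token ranking and weighting.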

Tokenizers

| Tokenizer | Description |
| --- | --- |
| `DefaultTokenizer` | Splits text on spaces and punctuation |
| `AdvancedTokenizer` | Adds word prefixes and phrase n-gram tokens |
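Swappable tokenizers follow the strategy pattern: the extractor depends only on a tokenizer interface, so any implementation can be passed in. A minimal sketch of that shape follows. `TextTokenizer`, `WhitespaceTokenizer`, `LowercaseWordTokenizer`, and `SketchExtractor` are hypothetical names, and the package's real base type may look different.

```dart
// Hypothetical strategy interface; the package's actual base type may differ.
abstract class TextTokenizer {
  const TextTokenizer();
  List<String> tokenize(String text);
}

// Strategy 1: split on whitespace only, keeping case and punctuation.
class WhitespaceTokenizer extends TextTokenizer {
  const WhitespaceTokenizer();

  @override
  List<String> tokenize(String text) =>
      text.split(RegExp(r'\s+')).where((t) => t.isNotEmpty).toList();
}

// Strategy 2: lowercase words with punctuation stripped.
class LowercaseWordTokenizer extends TextTokenizer {
  const LowercaseWordTokenizer();

  @override
  List<String> tokenize(String text) => text
      .toLowerCase()
      .split(RegExp(r'[^a-z0-9]+'))
      .where((t) => t.isNotEmpty)
      .toList();
}

// The extractor only knows the interface, so strategies are interchangeable.
class SketchExtractor {
  final TextTokenizer tokenizer;
  const SketchExtractor({required this.tokenizer});

  // Tokenizes every String value; non-string values are skipped.
  List<String> extract(Map<String, dynamic> data) => [
        for (final value in data.values)
          if (value is String) ...tokenizer.tokenize(value),
      ];
}

void main() {
  const data = {'title': 'Keyword extraction, simplified.'};
  print(const SketchExtractor(tokenizer: WhitespaceTokenizer()).extract(data));
  // [Keyword, extraction,, simplified.]
  print(const SketchExtractor(tokenizer: LowercaseWordTokenizer()).extract(data));
  // [keyword, extraction, simplified]
}
```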

Disclaimer

This package is experimental and under active development.
Do not use it in production environments. APIs may change, and edge cases may not be fully covered yet.

Roadmap

- Stopword filtering
- Fuzzy variant generation
- Nested field/key support
- Token ranking and weighting

License

MIT

plokmij/keyword_extractor

Folders and files

Latest commit

History

Repository files navigation

keyword_extractor

Features

Getting Started

Selective Field Extraction

Input & Output Example

Tokenizers

Disclaimer

Roadmap

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages