Stars
A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON.
A Comprehensive Toolkit for High-Quality PDF Content Extraction
Interact with your documents using the power of GPT, 100% privately, no data leaks
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
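Fengshenbang-LM is an open-source large-model ecosystem led by the Cognitive Computing and Natural Language Research Center of the IDEA Research Institute, serving as infrastructure for Chinese AIGC and cognitive intelligence.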
BELLE: Be Everyone's Large Language model Engine (an open-source Chinese conversational large language model)
One-click Chinese data augmentation package; NLP data augmentation, BERT data augmentation, EDA: pip install nlpcda
Keras implementation of transformers for humans
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Graphene SQLAlchemy integration
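pytorch handbook is an open-source book that aims to help anyone who wants to use PyTorch for deep learning development and research get started quickly. All of the PyTorch tutorials it contains have been tested to ensure they run successfully.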
"table2JSON", "table2XML", "table2PNG","table2CSV","table2Excel","table2Word","table2Powerpoint","table2txt","table2PDF"
A Chinese spell-checking tool that detects incorrect wording in Chinese text and suggests corrections.
hankcs / text-classification-svm
The missing SVM-based text classification module implementing HanLP's interface
Extract keywords from sentences or replace keywords in sentences.
HanLP Analyzer for Elasticsearch
for algorithm implementation and testing.
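A production-grade, high-performance, modular, extensible Chinese NLP toolkit. (Chinese word segmentation, averaged perceptron, fastText, pinyin, new word discovery, segmentation correction, BM25, person name recognition, named entities, custom dictionaries)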
Samples for Spring Cloud Contract project