ocr
Awesome multilingual OCR and document parsing toolkits based on PaddlePaddle (practical ultra-lightweight OCR system, supports recognition of 80+ languages, provides data annotation and synthesis tools,…
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
Tesseract Open Source OCR Engine (main repository)
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, and more.
Trained models with a fast variant of the "best" LSTM models, plus legacy models
OCR, layout analysis, reading order, and table recognition in 90+ languages
Toolkit for linearizing PDFs for LLM datasets/training