GitHub - karhunenloeve/TesseRACT: OCR recognition for PDFs.
This repository has been archived by the owner on Jun 1, 2024. It is now read-only.

karhunenloeve/TesseRACT

ocrTESSERACT

This repository is a tutorial for using the Tesseract OCR software. It consists of a simple shell script that you can run on any of your directories. Before you start, note that a config.json file is required to launch the script.
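Because the script cannot start without config.json, a pre-flight check at the top of setup.sh makes that requirement explicit. The sketch below is an assumption, not code from the repository; the function name require_config is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of the repository):
# fail with a readable message if the config file does not exist.
require_config() {
  local config="$1"
  if [[ ! -f "$config" ]]; then
    echo "error: $config not found; create it before running setup.sh" >&2
    return 1
  fi
  return 0
}
```

Inside setup.sh it could be called right after the script directory is computed, for example `require_config "$dir/config.json" || exit 1`.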

The following table explains which properties you have to set.

Argument    config.json attribute    Description
results     resultFiles              directory containing the images
createTXT   createTXT                true or false: whether a .txt file is created
createPDF   createPDF                true or false: whether a searchable .pdf file is created
deleteIMG   deleteIMG                true or false: whether the original images are deleted
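For illustration, a config.json matching the table above can be written from the shell with a here-document. The image path is a placeholder. setup.sh also reads a predictionFiles key whose purpose the table does not describe, so it is left out here; jq then simply returns null for it.

```shell
#!/usr/bin/env bash
# Write an example config.json with the attributes from the table.
# "/path/to/images" is a placeholder: replace it with your image directory.
cat > config.json <<'EOF'
{
  "resultFiles": "/path/to/images",
  "createTXT": true,
  "createPDF": true,
  "deleteIMG": false
}
EOF
```

The flags are JSON booleans (unquoted true/false), which jq prints as the plain words true and false.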
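The script reads these attributes from config.json with jq:

# Directory that contains setup.sh
dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
# Assumption: config.json is stored next to setup.sh
config="$dir/config.json"

# jq -r prints raw strings (without JSON quotes), so no xargs step is needed
predictions=$(jq -r '.predictionFiles' "$config")
results=$(jq -r '.resultFiles' "$config")
createTXT=$(jq -r '.createTXT' "$config")
createPDF=$(jq -r '.createPDF' "$config")
deleteIMG=$(jq -r '.deleteIMG' "$config")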
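After these lines the flag variables hold the strings true or false, or null if a key is missing from config.json. A small helper can reject anything else before the script branches on the flags. It is a hypothetical sketch, not taken from the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed for "true", fail for "false",
# and stop the script with exit code 2 for any other value (e.g. "null").
flag_enabled() {
  local name="$1" value="$2"
  case "$value" in
    true)  return 0 ;;
    false) return 1 ;;
    *)
      echo "error: $name must be true or false, got '$value'" >&2
      exit 2
      ;;
  esac
}
```

Calling it as `if flag_enabled createTXT "$createTXT"; then` keeps the branching readable and stops the run on a misspelled or missing value instead of silently treating it as false.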

Finally, put the directory containing the images to be recognized into the resultFiles attribute of config.json and run the script with the following command:

bash setup.sh
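The README does not show the recognition step itself. The following is only a sketch of how the flags could drive the Tesseract command line, not the repository's actual code. It assumes the images are PNG, JPEG or TIFF files directly inside the resultFiles directory and that each output is written next to its image. It relies on the standard CLI form `tesseract imagename outputbase [configfile...]`, where the `txt` and `pdf` config names request a text file and a searchable PDF; the function names tesseract_outputs and ocr_directory are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the per-image OCR step; not the repository's code.

# Map the two flags ("true"/"false") to Tesseract output config names.
tesseract_outputs() {
  local create_txt="$1" create_pdf="$2"
  local -a outputs=()
  [[ "$create_txt" == true ]] && outputs+=(txt)
  [[ "$create_pdf" == true ]] && outputs+=(pdf)
  echo "${outputs[*]}"
}

# Run Tesseract on every image in a directory, optionally deleting originals.
ocr_directory() {
  local dir="$1" create_txt="$2" create_pdf="$3" delete_img="$4"
  local -a outputs=()
  read -r -a outputs <<< "$(tesseract_outputs "$create_txt" "$create_pdf")"
  # No output requested: leave the images untouched.
  (( ${#outputs[@]} > 0 )) || return 0

  # nullglob: unmatched patterns expand to nothing; nocaseglob: match .PNG too.
  # Note: these options stay enabled in the calling shell afterwards.
  shopt -s nullglob nocaseglob
  local img
  for img in "$dir"/*.png "$dir"/*.jpg "$dir"/*.jpeg "$dir"/*.tif "$dir"/*.tiff; do
    # Output base is the image path without its extension, e.g. page.png -> page
    tesseract "$img" "${img%.*}" "${outputs[@]}" || continue
    # Only delete an image after Tesseract succeeded on it.
    if [[ "$delete_img" == true ]]; then
      rm -- "$img"
    fi
  done
}
```

With the variables read from config.json, the call would be `ocr_directory "$results" "$createTXT" "$createPDF" "$deleteIMG"`. Skipping the deletion when Tesseract fails (`|| continue`) avoids losing an original image on a failed recognition.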

Tools used

  • jq – reading and writing JSON from shell scripts
  • ImageMagick – image processing
  • Tesseract – OCR engine
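Since the script depends on these programs, checking for them up front gives a clearer error than a command failing halfway through a directory. A hypothetical helper built on the shell builtin `command -v`:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every command name that is not on PATH.
# Returns 0 if all are found, 1 if at least one is missing.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example `check_tools jq tesseract convert || exit 1`. The ImageMagick name to check depends on the installed version and on whichever command your copy of setup.sh calls: ImageMagick 7 installs `magick`, while `convert` is the ImageMagick 6 name that many ImageMagick 7 packages still provide as a legacy alias.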

