CoverUP PDF Redaction Software

CoverUP is free software, developed in Python, designed to provide a secure and straightforward method for redacting PDF files and then running optical character recognition (OCR) on them.

This version is a fork of the original work at https://github.com/digidigital/CoverUP. The two major changes are OCR of documents and the possibility to specify the name of the document after the program name. This makes it possible to open files from a file manager or from the command line.

The name of the actual application is pdfanon on Linux and pdfanon.exe on Windows.
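As an illustration of passing the document name after the program name, here is a minimal sketch of how an optional file argument could be parsed. The function name parse_args and the argument name pdf are hypothetical; the actual handling in CoverUP.py may differ.

```python
import argparse


def parse_args(argv=None):
    """Parse an optional PDF path given after the program name.

    Hypothetical helper for illustration; not taken from CoverUP.py.
    """
    parser = argparse.ArgumentParser(prog="pdfanon")
    # nargs="?" makes the file optional, so the program can also start
    # without a document (for example from an application menu).
    parser.add_argument("pdf", nargs="?", default=None,
                        help="PDF file to open on startup")
    return parser.parse_args(argv)
```

Calling parse_args(["file.pdf"]) yields a namespace whose pdf attribute is the given path, while parse_args([]) leaves it as None.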

A PDF document opened in pdfanon is first converted into images, and all textual information is removed. Users can then conceal sensitive text passages visible in the images by overlaying them with black or white bars. Finally, when the document is saved, the overlaid parts of the images are deleted and the remaining visible text is OCR-ed.
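The key point of the workflow above is that the covered pixels are overwritten rather than merely hidden under a drawing layer. The following sketch shows that idea on a plain grayscale pixel grid. It is a simplified illustration under assumed data structures (a list of rows), not the code used by CoverUP.py.

```python
def redact(image, box, fill=0):
    """Return a copy of `image` with the pixels inside `box` overwritten.

    image: list of rows, each a list of grayscale values (0-255)
    box:   (left, top, right, bottom); right and bottom are exclusive
    fill:  0 for a black bar, 255 for a white bar
    """
    left, top, right, bottom = box
    out = [row[:] for row in image]  # copy so the input stays untouched
    # Clamp the box to the image so out-of-range coordinates are ignored.
    for y in range(max(top, 0), min(bottom, len(out))):
        row = out[y]
        for x in range(max(left, 0), min(right, len(row))):
            row[x] = fill  # the original pixel value is gone after this
    return out
```

Because the original values inside the box no longer exist in the output, neither OCR nor a later copy-and-paste can recover the concealed text.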

The OCR feature requires that the Tesseract OCR engine is installed on the system.
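To check whether Tesseract can be found before relying on the OCR feature, a small test such as the following may help. It assumes the executable is named tesseract and is reachable through the system path; the helper name find_tesseract is hypothetical.

```python
import shutil


def find_tesseract(name="tesseract"):
    """Return the full path of the Tesseract executable, or None.

    shutil.which searches the directories listed in the PATH (Path on
    Windows) environment variable, the same way the shell does.
    """
    return shutil.which(name)
```

A return value of None indicates that Tesseract is either not installed or that its directory is missing from the path.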

Installation on Linux

Installation requires administrator rights. If you do not have sudo access, ask the administrator to install the software.

Steps:

1. Download the content of this repository (either using git or the provided zip file).
2. Unzip the zip file, if using it.
3. Enter the created directory and run the install.sh script. The pdfanon script will be placed in the /usr/local/bin directory.

Comment 1: Do not delete the directory after installation. It contains the code.

Comment 2: If you want to hide the directory by adding . (dot) as the first character of its name, do that before running the script.

Comment 3: The script was developed on a Debian-like system and tested on a recent Ubuntu (Python 3.12) and an older Mint (Python 3.8). To use it on, e.g., an rpm-based system, you need to change the script (replacing apt with that system's package manager, and perhaps more).

Installation on Windows

Installation requires administrator rights. If you do not have them, ask the administrator to install the software.

Steps:

1. Install Tesseract OCR from GitHub. During installation, select the languages you need.
2. Add the path to Tesseract OCR (most likely C:\Program Files\Tesseract-OCR) to the Path system variable (a guide for Windows 11).
3. Download the most recent zip file from a Google Drive folder, extract the pdfanon.exe file and place it in a directory where the system can find it, e.g. in the C:\Program Files\Tesseract-OCR directory.
4. If required, set your default OCR language by setting the COVERUP_OCR_LANG environment variable.

Building the `pdfanon.exe` program

1. Install Python.
2. Repeat the first two steps of the previous section (install Tesseract OCR and add it to the Path system variable).
3. Add the path to the Python programs to the Path system variable (e.g. C:\Users\username\AppData\Local\Programs\Python\Python312\Scripts). If set correctly, the pip program should be accessible in a cmd window.
4. In a cmd window, install pyinstaller by running pip install pyinstaller
5. Download the content of this repository (either using git or the provided zip file).
6. Unzip the zip file, if using it.
7. In a cmd window, enter the directory and install the dependencies by running pip install -r requirements.txt
8. In the same window, run


pyinstaller --add-data "Fonts\MaterialSymbolsOutlined[FILL,GRAD,opsz,wght].ttf":Fonts --onefile --windowed --splash splash.png -n pdfanon CoverUP.py

The `pdfanon.exe` file will be placed in the `dist` subdirectory.

Specification of the OCR languages

The default OCR language is slk (Slovak), which is hardcoded in the CoverUP.py file. Documents in English seem to be OCR-ed correctly as well. On Linux, the install.sh script installs the eng, slk, ces, deu and fra languages; more can be added with apt (or a similar package manager). On Windows, additional languages can be installed with the Tesseract OCR installer.
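To see which languages are actually installed, Tesseract can be asked directly with its --list-langs option. The sketch below wraps that call; the helper names parse_langs and installed_langs are hypothetical, and the header-line format is an assumption that may vary between Tesseract versions.

```python
import subprocess


def parse_langs(output):
    """Extract language codes from `tesseract --list-langs` output."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the header, which is assumed to start
        # with 'List of available languages'.
        if not line or line.startswith("List of available languages"):
            continue
        langs.append(line)
    return langs


def installed_langs(executable="tesseract"):
    """Run Tesseract and return the installed language codes."""
    result = subprocess.run([executable, "--list-langs"],
                            capture_output=True, text=True, check=True)
    # Older Tesseract versions are assumed to print the list to stderr,
    # so stderr is used as a fallback when stdout is empty.
    return parse_langs(result.stdout or result.stderr)
```

If a code such as deu is missing from the returned list, that language has to be installed before it can be used for OCR.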

Currently, the language cannot be selected from within the program itself.

The OCR language can be specified by setting the COVERUP_OCR_LANG environment variable, for example by adding the line

export COVERUP_OCR_LANG=deu

to the user's .bashrc file if using the bash shell, or by performing an equivalent action for other shells or on Windows. Refer to the Tesseract documentation for the proper code for your language.
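The lookup described above can be pictured as follows: read COVERUP_OCR_LANG and fall back to the default slk when it is not set. This is a minimal sketch with a hypothetical function name; treating an empty value as unset is an assumption, and the actual logic in CoverUP.py may differ.

```python
import os

DEFAULT_LANG = "slk"  # default language named in this README


def ocr_language(environ=None):
    """Return the OCR language code to pass to Tesseract.

    `environ` defaults to the real process environment; a plain dict
    can be passed instead, which is convenient for testing.
    """
    env = os.environ if environ is None else environ
    value = env.get("COVERUP_OCR_LANG", "").strip()
    # Assumption: an empty or whitespace-only value counts as unset.
    return value or DEFAULT_LANG
```

Multiple languages in Tesseract are conventionally joined with a plus sign (for example slk+eng); whether pdfanon accepts such a value is not stated here.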

To set the language only for the current shell session, run the following in bash:

export COVERUP_OCR_LANG=deu; pdfanon file.pdf
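When pdfanon is started from another Python script rather than from bash, the same effect can be achieved by passing a modified environment to the child process only. The helper names below are hypothetical, and the sketch assumes pdfanon is on the system path.

```python
import os
import subprocess


def env_with_lang(lang, base=None):
    """Return a copy of the environment with COVERUP_OCR_LANG set."""
    env = dict(os.environ if base is None else base)
    env["COVERUP_OCR_LANG"] = lang
    return env


def run_pdfanon(pdf_path, lang):
    """Start pdfanon on `pdf_path` with the given OCR language.

    Only the child process sees the changed variable; the environment
    of the calling script is left as it was.
    """
    return subprocess.run(["pdfanon", pdf_path], env=env_with_lang(lang))
```

For example, run_pdfanon("file.pdf", "deu") would open file.pdf with German as the OCR language for that one run.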

The OCR language actually in use and the file name are displayed in the program's title bar.

