epitar.gz

Highly customizable archive and index framework for EPITA.

Get started

1. Create a config.yml file (see config.sample.yml) listing the EPITA services you want to archive, each with its associated archive module.
2. Configure your Sonic instance in sonic.cfg.
3. Run the provided docker-compose.yml to start your Sonic instance and a docconv container (a text extractor for PDF files).
4. Run ./epitar start to start archiving and indexing.

How does it work

Archive modules

An archive module scrapes, downloads, or archives a website or service. Modules are highly customizable because each one runs in its own Docker container.

Index

Archived files can be scanned to build a search index. Words in PDF files are extracted with standard text extraction, or with OCR for scanned documents.
The extracted words are then sent to a Sonic instance, which builds a fast search index.

UI & API

A UI and an API are exposed to quickly search for archived files.

Contributing

Add an archive module

An archive module is highly customizable: it can be written in any programming language, as long as a valid Dockerfile is provided.
Your archive module must include a Dockerfile, a module.json and a README.md.

Dockerfile

Your Dockerfile can use any base image, but try to keep the image size small.

The output directory for archived files must be /output.
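As an example of a module written in Go, the sketch below downloads each URL given on the command line and stores it under /output, mirroring the source host and path. The file naming scheme is an assumption; the rooted path.Clean call keeps ".." segments from escaping /output.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"path"
	"path/filepath"
)

// outputDir is where the module must write archived files.
const outputDir = "/output"

// outputPath maps rawURL to a file inside dir, keeping the URL's host and
// path so files from different sources do not collide.
func outputPath(dir, rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	p := path.Clean("/" + u.Path) // rooted Clean drops any ".." segments
	if p == "/" {
		p = "/index.html"
	}
	return filepath.Join(dir, u.Host, filepath.FromSlash(p)), nil
}

// download saves rawURL under dir, creating parent directories as needed.
func download(dir, rawURL string) error {
	dst, err := outputPath(dir, rawURL)
	if err != nil {
		return err
	}
	resp, err := http.Get(rawURL)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("GET %s: %s", rawURL, resp.Status)
	}
	if err := os.MkdirAll(filepath.Dir(dst), 0o755); err != nil {
		return err
	}
	f, err := os.Create(dst)
	if err != nil {
		return err
	}
	if _, err := io.Copy(f, resp.Body); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

func main() {
	for _, u := range os.Args[1:] {
		if err := download(outputDir, u); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
	}
}
```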

module.json

Your module.json must provide information about the website or service being archived.
Here is an example (the logo field is optional):

{
    "name": "Past-Exams",
    "slug": "past-exams",
    "url": "https://github.com/Epidocs/Past-Exams",
    "description": "Past subjects and other files, for the benefit of EPITA students.",
    "logo": "https://github.com/fluidicon.png",
    "authors": [
        {
            "name": "Aurele Oules",
            "email": "aurele@oules.com"
        }
    ]
}

README.md

You must provide a simple README.md that explains how to use this module.
An archive module may take options as environment variables; document them in this README.

Other files

You may add any other files to the module directory, but keep it organized and only commit necessary files.

You must also edit config.sample.yml to provide an example of how to use your archive module.

License

MIT - Aurèle Oulès
