This example shows how to create a Java application that downloads the HTML code of a web page via Commons IO and jsoup, saves it to the hard drive, then computes word statistics and saves them to a PostgreSQL database.
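The word-statistics step can be sketched with the standard library alone. This is a minimal sketch, not the project's code. It assumes the page's visible text has already been extracted (for instance with jsoup's Jsoup.parse(html).text(), though the project may do this differently). It also assumes a "word" is a run of Unicode letters compared case-insensitively. The class name WordStats is hypothetical.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical word-frequency counter. Assumption: a word is a maximal run of
 * Unicode letters, compared case-insensitively; the real project may tokenize differently.
 */
class WordStats {

    /** Counts occurrences of each lower-cased word in the given text. */
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys for stable output
        // Split on any run of non-letter characters (digits, punctuation, whitespace).
        for (String token : text.split("[^\\p{L}]+")) {
            if (token.isEmpty()) {
                continue; // a leading separator (or empty input) yields an empty token
            }
            counts.merge(token.toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {again=1, hello=2, world=1}
        System.out.println(count("Hello, world! Hello again."));
    }
}
```

A TreeMap keeps the output alphabetical. If the statistics should be ordered by frequency instead, sort the entries by value before saving them.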
First, clone the repository by running the following in a terminal:
git clone https://github.com/egrva/html-parser.git
Then change into the jar-file directory:
cd jar-file
To run the application, pass any URL (for example, https://www.google.com) as the command-line argument:
java -jar html-parser-1.0-SNAPSHOT.jar https://www.example.com
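The link is read from the command line. A minimal sketch of how an entry point might read and check that argument is shown below. It assumes exactly one absolute http(s) URL is expected. The class CliArgs and its checks are hypothetical, and the real entry point may validate differently. Note that a bare host such as www.google.com has no scheme, so this sketch rejects it.

```java
import java.net.URI;
import java.net.URISyntaxException;

/**
 * Hypothetical argument check. Assumption: exactly one absolute http(s) URL is
 * expected; the project's actual main method may behave differently.
 */
class CliArgs {

    /** Returns the validated URL, or throws IllegalArgumentException with a hint. */
    static URI parseTarget(String[] args) {
        if (args.length != 1) {
            throw new IllegalArgumentException(
                    "usage: java -jar html-parser-1.0-SNAPSHOT.jar <url>");
        }
        try {
            URI uri = new URI(args[0]);
            String scheme = uri.getScheme();
            boolean isHttp = scheme != null
                    && (scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"));
            // Require a scheme and a host, so "www.google.com" alone is rejected.
            if (!isHttp || uri.getHost() == null) {
                throw new IllegalArgumentException("expected an absolute http(s) URL, got: " + args[0]);
            }
            return uri;
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("malformed URL: " + args[0], e);
        }
    }
}
```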
The prebuilt JAR does not save results to the database; database saving is implemented only in the source code.
Downloaded HTML files are saved in the storage directory.
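Saving a page into the storage directory can be sketched with java.nio. This is an illustration under assumptions, not the project's implementation. The project uses Commons IO, whose FileUtils.copyURLToFile(URL, File) can download and save a page in one call, but whether it uses that method is not stated. The file-naming scheme here (URL host plus .html) and the class name HtmlStorage are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical storage helper. Assumption: one file per page, named after the
 * URL host; the project may name files differently.
 */
class HtmlStorage {

    /** Writes html under storageDir (created if missing) and returns the file path. */
    static Path save(Path storageDir, String url, String html) throws IOException {
        Files.createDirectories(storageDir);
        String host = URI.create(url).getHost();
        // Replace characters that are unsafe in file names with '_'.
        String fileName = (host == null ? "page" : host).replaceAll("[^A-Za-z0-9.-]", "_") + ".html";
        Path target = storageDir.resolve(fileName);
        // Creates the file or overwrites an earlier download of the same host.
        Files.writeString(target, html, StandardCharsets.UTF_8);
        return target;
    }
}
```

With this naming, saving https://www.example.com produces storage/www.example.com.html. Two pages from the same host would overwrite each other, so a real implementation may want to include the path or a timestamp in the name.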
To run the application with database saving enabled, you should:
- Create the words table in the database:
create table words
(
word varchar,
num_of_occur int
);
- Change the connection settings in the db.properties file
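The two database steps above (the words table and the db.properties settings) can be tied together in a short JDBC sketch. It is not the project's code. The property keys db.url, db.user and db.password are hypothetical, so check the repository's own db.properties for the real names. The class name WordRepository is also hypothetical. The insert targets the words(word, num_of_occur) table exactly as defined above.

```java
import java.io.IOException;
import java.io.Reader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;
import java.util.Properties;

/**
 * Hypothetical repository. Assumption: db.properties contains the keys
 * db.url, db.user and db.password (names not confirmed by the project).
 */
class WordRepository {

    /** Reads connection settings from any Reader, e.g. a FileReader over db.properties. */
    static Properties loadProperties(Reader reader) throws IOException {
        Properties props = new Properties();
        props.load(reader);
        return props;
    }

    /** Inserts every (word, count) pair as one batch inside a single transaction. */
    static void saveAll(Properties props, Map<String, Integer> counts) throws SQLException {
        String sql = "INSERT INTO words (word, num_of_occur) VALUES (?, ?)";
        try (Connection conn = DriverManager.getConnection(
                     props.getProperty("db.url"),
                     props.getProperty("db.user"),
                     props.getProperty("db.password"));
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            conn.setAutoCommit(false);
            try {
                for (Map.Entry<String, Integer> e : counts.entrySet()) {
                    stmt.setString(1, e.getKey());
                    stmt.setInt(2, e.getValue());
                    stmt.addBatch();
                }
                stmt.executeBatch();
                conn.commit();
            } catch (SQLException ex) {
                conn.rollback(); // undo partial inserts before rethrowing
                throw ex;
            }
        }
    }
}
```

Running saveAll requires the PostgreSQL JDBC driver (org.postgresql:postgresql) on the classpath. The table as defined has no unique constraint on word, so running the application twice against the same database inserts the words again rather than updating their counts.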
The project is built with:
- Maven to build the project
- Java
- SLF4J and Logback for logging
- jsoup for working with HTML files
- Commons IO to download files
- PostgreSQL as the database
- JUnit for testing