HTML Parser

This example shows how to create a Java application that downloads the HTML code from a link via Commons IO and jsoup, saves it to the hard drive, then builds word statistics and saves them to a PostgreSQL database.
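
The word-statistics step could look roughly like the sketch below. It is not the project's code: the class and method names are hypothetical, it assumes the page text has already been extracted from the HTML (the project uses jsoup for that, for example via Jsoup.parse(html).text()), and it assumes words are compared case-insensitively and split on anything that is not a letter or digit.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not the project's implementation.
// Assumes the input is plain text that was already extracted from the HTML.
class WordStats {

    // Counts how often each word occurs; words are lower-cased (assumption)
    // and separated by any run of characters that are not Unicode letters or digits.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            // split() yields an empty first token when the text starts with a separator
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {again=1, hello=2, world=1}
        System.out.println(countWords("Hello, world! Hello again."));
    }
}
```

A TreeMap keeps the words sorted, which makes the statistics easy to read; a HashMap would work just as well if order does not matter.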

Getting Started
Development environment
Used technologies

Getting Started

First, clone the repository by typing this in a terminal:

git clone https://github.com/egrva/html-parser.git

Then go to the jar-file folder:

cd jar-file

To run the application, choose any link (for example, https://www.google.com) and pass it as an argument:

java -jar html-parser-1.0-SNAPSHOT.jar https://www.example.com
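
The application takes a single link as its argument. A minimal sketch of validating that argument with java.net.URI is shown below; the class name is hypothetical, and rejecting links without an http or https scheme (such as a bare www.google.com) is an assumption about what the downloader needs, not documented behavior of this project.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical entry-point helper: checks the command-line URL before downloading.
class UrlArgument {

    // Returns the parsed URI, or throws IllegalArgumentException for unusable input.
    static URI parse(String[] args) {
        if (args.length != 1) {
            throw new IllegalArgumentException(
                    "Usage: java -jar html-parser-1.0-SNAPSHOT.jar <url>");
        }
        try {
            URI uri = new URI(args[0]);
            String scheme = uri.getScheme();
            // Assumption: only absolute http(s) links can be downloaded.
            // "www.google.com" alone parses as a relative URI with no scheme or host.
            boolean httpLike = scheme != null
                    && (scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"));
            if (!httpLike || uri.getHost() == null) {
                throw new IllegalArgumentException("Expected an absolute http(s) URL: " + args[0]);
            }
            return uri;
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Malformed URL: " + args[0], e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Would download: " + parse(args));
    }
}
```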

The JAR file does not save results to the database, but the saving logic is present in the source code. Downloaded HTML files can be found in the storage directory.
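
Downloading a page and storing it could be sketched as follows. The project uses Commons IO for the download (that library provides FileUtils.copyURLToFile, for instance); this sketch swaps in the JDK's java.net.http.HttpClient (Java 11+) so it has no dependencies. The class name and the host-based file-naming scheme are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch only: JDK HttpClient stands in for Commons IO, and the
// file-naming scheme below is a hypothetical choice.
class HtmlStorage {

    // Fetches the HTML body of the given URI, following normal redirects.
    static String download(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Hypothetical naming scheme: host name with unsafe characters replaced, plus ".html".
    static String fileNameFor(URI uri) {
        return uri.getHost().replaceAll("[^A-Za-z0-9.-]", "_") + ".html";
    }

    // Writes the HTML into storageDir (created if missing) and returns the file path.
    static Path save(Path storageDir, URI uri, String html) throws IOException {
        Files.createDirectories(storageDir);
        Path target = storageDir.resolve(fileNameFor(uri));
        Files.writeString(target, html, StandardCharsets.UTF_8);
        return target;
    }

    public static void main(String[] args) throws Exception {
        URI uri = URI.create(args.length > 0 ? args[0] : "https://www.example.com");
        Path saved = save(Path.of("storage"), uri, download(uri));
        System.out.println("Saved " + saved);
    }
}
```

Naming files by host means a second download of the same site overwrites the earlier file; a timestamp suffix would keep both if that matters.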

Development environment

If you want to launch the app with saving to the database, you should:

Create a table in the database:

create table words
(
	word varchar,
	num_of_occur int
);
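
With the table in place, the word counts can be written with plain JDBC. This is a sketch under assumptions: the class name and connection details are hypothetical, and it needs the PostgreSQL JDBC driver on the classpath at runtime. It inserts all rows in one batch inside a single transaction.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;

// Sketch: batch-inserts word statistics into the "words" table above.
// The PostgreSQL JDBC driver must be on the classpath when main() runs.
class WordRepository {

    static final String INSERT_SQL = "insert into words (word, num_of_occur) values (?, ?)";

    // Inserts every (word, count) pair in one batch; rolls back if any insert fails.
    static void saveAll(Connection connection, Map<String, Integer> counts) throws SQLException {
        boolean previousAutoCommit = connection.getAutoCommit();
        connection.setAutoCommit(false);
        try (PreparedStatement statement = connection.prepareStatement(INSERT_SQL)) {
            for (Map.Entry<String, Integer> entry : counts.entrySet()) {
                statement.setString(1, entry.getKey());
                statement.setInt(2, entry.getValue());
                statement.addBatch();
            }
            statement.executeBatch();
            connection.commit();
        } catch (SQLException e) {
            connection.rollback();
            throw e;
        } finally {
            connection.setAutoCommit(previousAutoCommit);
        }
    }

    public static void main(String[] args) throws SQLException {
        // Hypothetical connection details; use the values from db.properties instead.
        try (Connection connection = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/postgres", "postgres", "postgres")) {
            saveAll(connection, Map.of("hello", 2, "world", 1));
        }
    }
}
```

Because the table has no primary key or unique constraint, running this twice inserts duplicate rows.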

Change the connection settings in the db.properties file.
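
Reading db.properties could be done with java.util.Properties, as in the sketch below. The key names db.url, db.user and db.password, and loading the file from the classpath, are assumptions; check the actual db.properties in the repository for the real keys.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Sketch: reads connection settings from db.properties.
// Key names db.url, db.user and db.password are hypothetical.
class DbConfig {

    final String url;
    final String user;
    final String password;

    DbConfig(String url, String user, String password) {
        this.url = url;
        this.user = user;
        this.password = password;
    }

    // Parses the properties; url and user are required, password defaults to "".
    static DbConfig load(Reader reader) throws IOException {
        Properties props = new Properties();
        props.load(reader);
        return new DbConfig(require(props, "db.url"), require(props, "db.user"),
                props.getProperty("db.password", ""));
    }

    // Assumption: db.properties is on the classpath (e.g. src/main/resources in Maven).
    static DbConfig fromClasspath() throws IOException {
        try (InputStream in = DbConfig.class.getResourceAsStream("/db.properties")) {
            if (in == null) {
                throw new IOException("db.properties not found on the classpath");
            }
            return load(new InputStreamReader(in, StandardCharsets.UTF_8));
        }
    }

    // Fails fast with a clear message instead of passing null to the JDBC driver.
    private static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing property: " + key);
        }
        return value;
    }

    public static void main(String[] args) throws IOException {
        String sample = "db.url=jdbc:postgresql://localhost:5432/postgres\ndb.user=postgres\n";
        DbConfig config = load(new StringReader(sample));
        // prints jdbc:postgresql://localhost:5432/postgres as postgres
        System.out.println(config.url + " as " + config.user);
    }
}
```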

Used technologies

Maven to build the project
Java language
SLF4J and Logback for logging
jsoup to work with HTML files
Commons IO to download files
PostgreSQL as the database
JUnit for testing
