8000 GitHub - SaudAljaloud/RdfLiteralStats: A tool to generate some stats about RDF files
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

SaudAljaloud/RdfLiteralStats

Repository files navigation

RdfLiteralStats

This is a tool for providing statistics about RDF data files in terms of literal usage.

Parsing RDF is done by the Jena arq (Apache License, Version 2.0).

To compile the code:

mvn clean install
mvn dependency:copy-dependencies

give yourself a permission to run the script below, if needed. Then,

sh run2.sh <dir/to/rdf>

The result will be printed to screen and to three files (termsMap.txt, Histogram.txt stats.txt).
stats.txt contains the same form as the sample below.
Histogram.txt contains the distribution of the word usage across the literal values within the RDF.
termsMap.txt contains the words (while space tokened) included in the literal values.

Licensing:

This code is licensed under the MIT License attached in the parent directory.

Output Sample:

Triples Count: 17686178
Number of Literal objects with duplicates: 9129307
Number of Literal objects NO duplicates: 2046362
Average literals against triples: 0.51618314
http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/productPropertyTextual6	 10043
http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/productPropertyNumeric6	 10170
http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/productPropertyNumeric4	 24939
http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/productPropertyTextual4	 24998
http://xmlns.com/foaf/0.1/mbox_sha1sum                      	 25592
http://xmlns.com/foaf/0.1/name                              	 25592
http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/productPropertyTextual5	 27430
http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/productPropertyNumeric5	 27569
http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/productPropertyTextual1	 50000
http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/productPropertyTextual2	 50000
http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/productPropertyTextual3	 50000
http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/productPropertyNumeric3	 50000
http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/productPropertyNumeric1	 50000
http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/productPropertyNumeric2	 50000
http://www.w3.org/2000/01/rdf-schema#label                  	 76085
http://www.w3.org/2000/01/rdf-schema#comment                	 76085
http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/rating3	 349500
http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/rating1	 349625
http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/rating2	 349912
http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/rating4	 350090
http://purl.org/dc/elements/1.1/title                       	 500000
http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/reviewDate	 500000
http://purl.org/stuff/rev#text                              	 500000
http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/price	 1000000
http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/deliveryDays	 1000000
http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/validFrom	 1000000
http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/validTo	 1000000
http://purl.org/dc/elements/1.1/date                        	 1601677
Plain Literals: 1203354
Typed-Literals: 7925953
The average length of literals: 89
http://www.w3.org/2001/XMLSchema#string                     	 212471
http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/USD	 1000000
http://www.w3.org/2001/XMLSchema#date                       	 1601677
http://www.w3.org/2001/XMLSchema#dateTime                   	 2500000
http://www.w3.org/2001/XMLSchema#integer                    	 2611805
Number of words with duplicates: 83003158
Word count histogram:
1              	 412372
2              	 186427
3              	 61793
4              	 15380
5              	 3100
6              	 508
7              	 65
8              	 12
9              	 2
10             	 3238
100            	 90880
1000           	 406
10000          	 219
100000         	 10

About

A tool to generate some stats about RDF files

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published
0