A standalone Java command-line tool that converts DOC, DOCX, PPT and PPTX documents to PDF files. (Requires JRE 7.)
Why?
I wanted a simple program that converts Microsoft Office documents to PDF without dependencies like LibreOffice or expensive proprietary solutions. Since the code to convert each individual format is scattered around the web, I decided to combine those solutions into a single program.
Usage:
java -jar doc-converter.jar -type "type" -input "path" -output "path" -verbose
Examples:
java -jar doc-converter.jar -input test.doc
java -jar doc-converter.jar -i test.ppt -o ~\output.pdf
java -jar doc-converter.jar -i ~\no-extension-file -o ~\output.pdf -t docx
Parameters:
-inputPath (-i, -in, -input) "path" : Specifies the path of the input file.
-outputPath (-o, -out, -output) "path" : Specifies the path of the output PDF. If not specified, the PDF is written to the input file's directory under the input file's name with a .pdf extension. (Optional)
-type (-t) [doc | docx | ppt | pptx] : Specifies which converter to use. Leave blank to let the program infer it from the input file's extension. (Optional)
-verbose (-v) : Prints intermediate processing messages. (Optional)
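The two defaults described above (inferring the type from the input extension, and deriving the output path from the input path) can be sketched roughly as follows. This is only an illustration of the documented behaviour, not the tool's actual source: the class and method names (ConverterDefaults, inferType, defaultOutputPath) are hypothetical, and the case-insensitive extension match is an assumption.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; names are illustrative and not taken from doc-converter.jar.
public class ConverterDefaults {

    // The converter types listed for the -type parameter.
    static final List<String> SUPPORTED_TYPES = Arrays.asList("doc", "docx", "ppt", "pptx");

    // Returns the converter type implied by the file extension,
    // or null when there is no extension or it is not supported.
    // Assumption: the extension is matched case-insensitively.
    public static String inferType(String inputPath) {
        String name = new File(inputPath).getName();
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) {
            return null; // no extension: the caller would have to pass -type
        }
        String ext = name.substring(dot + 1).toLowerCase(Locale.ROOT);
        return SUPPORTED_TYPES.contains(ext) ? ext : null;
    }

    // Builds the default output path: same directory and base name as the input,
    // with the extension replaced by ".pdf".
    public static String defaultOutputPath(String inputPath) {
        File input = new File(inputPath);
        String name = input.getName();
        int dot = name.lastIndexOf('.');
        String base = (dot > 0) ? name.substring(0, dot) : name;
        // getParentFile() is null for a bare file name; File(null, child) handles that.
        return new File(input.getParentFile(), base + ".pdf").getPath();
    }

    public static void main(String[] args) {
        String input = args.length > 0 ? args[0] : "test.doc";
        System.out.println("type:   " + inferType(input));
        System.out.println("output: " + defaultOutputPath(input));
    }
}
```

A file without an extension yields no inferred type here, which corresponds to the third usage example, where -t docx is passed explicitly.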
Caveats:
This tool relies on Apache POI and docx4j libraries. They are not 100% reliable and the output format may not always be what you desire.
DOC and DOCX:
Generally OK. I have noticed that paragraph spacing tends to increase after conversion, which can affect the page layout.
PPT and PPTX:
The resulting file is a PDF whose pages consist of PNG images rather than text. This is a limitation of the Apache POI and docx4j libraries.
Main Libraries:
Apache POI: https://poi.apache.org/
docx4j: http://www.docx4java.org/
and others...
The MIT License (MIT)
Copyright (c) 2013-2014 Yeo Kheng Meng