What's the best program / API for converting Word docs to PDF that does NOT require Office to be installed? [closed]

StackOverflow https://stackoverflow.com/questions/3815983

Question

Well, really the title of the question says it all. There are similar questions on SO already, but here are some differences: I absolutely cannot use Office on the server, and I must be able to convert the document programmatically. I don't really care how much it costs. Obviously cheaper is better, but if you have a good suggestion that happens to be pricey, please feel free to include it.

------ Edit ------

I accepted the OpenOffice.org answer because it seemed like the most intriguing. However, I really am curious what other people think. Keep posting answers and voting and I'll accept whichever has the most votes.


Solution

OpenOffice can be run in a GUI-less server mode. Using it that way, you can connect to it, stream a document to it, and then convert to any type it supports and stream it back.

OTHER TIPS

You might want to have a look at the CloudConvert API. They use native Office and are way cheaper than Aspose.

If you upload a document (.doc, .docx, .odf) to Google Docs, you can download it as a PDF. This is an easy, free solution, but it might be hard to integrate.

You might want to try Aspose, which is also used by Google.

IMO, PDFCreator (open source) is the best bet for your purpose. Install PDFCreator, then have a look at the COM subfolder; there you can find examples of how to use its API.

If you want to use OpenOffice, you can run unoconv from the command line with the latest LibreOffice. This works in general, but be prepared to:

  • lose some of the fancier formatting,
  • repeat the conversion after an occasional exception on the LibreOffice side.
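
Since a conversion may need to be repeated, one way to cope is a small retry wrapper around the unoconv call. The following is only a sketch under assumptions: the function names, the attempt count, the delay and the 120-second timeout are all made up for illustration, and it assumes `unoconv` is on the PATH and writes the PDF next to the source file.

```python
import subprocess
import time
from typing import Callable, List


def unoconv_command(src: str, fmt: str = "pdf") -> List[str]:
    """Build the unoconv invocation; -f selects the output format."""
    return ["unoconv", "-f", fmt, src]


def run_command(cmd: List[str]) -> bool:
    """Run the command and report success (exit status 0) as a boolean."""
    try:
        # The timeout is an arbitrary guard against a hung office process.
        return subprocess.run(cmd, timeout=120).returncode == 0
    except (subprocess.TimeoutExpired, OSError):
        return False


def convert_with_retry(
    src: str,
    attempts: int = 3,
    delay: float = 2.0,
    runner: Callable[[List[str]], bool] = run_command,
) -> int:
    """Return the number of the attempt that succeeded; raise if all fail."""
    for attempt in range(1, attempts + 1):
        if runner(unoconv_command(src)):
            return attempt
        if attempt < attempts:
            time.sleep(delay)  # give the office process time to recover
    raise RuntimeError(f"conversion of {src!r} failed after {attempts} attempts")
```

Calling `convert_with_retry("report.docx")` would then try up to three times before giving up. The `runner` parameter exists only so that the retry logic can be exercised without a real LibreOffice installation.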

I would discourage using the UNO API directly. It is possible (and you can keep a pool of running OpenOffice servers to perform the conversions), but it is definitely not easy.

Google Docs is of no use to me, as the conversion file size limit is 2 MB.

Using Convert API, it is as simple as this HTTP request:

POST https://v2.convertapi.com/doc/to/pdf?Secret=XXX&File=http://example.com/myfile.doc

if the file is accessible from the internet. If it is not, Convert API supports several other ways of passing a file for conversion.
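
For illustration, the same request could be assembled in Python with only the standard library. This is a rough sketch: the endpoint and the `Secret` and `File` parameters are taken from the request above, while the function name is hypothetical. Note that `urlencode` percent-encodes the file URL, which is the safe form for a query string.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint as given in the answer above.
ENDPOINT = "https://v2.convertapi.com/doc/to/pdf"


def build_request(secret: str, file_url: str) -> Request:
    """Build (but do not send) the POST request for a doc-to-PDF conversion."""
    query = urlencode({"Secret": secret, "File": file_url})
    # An empty body is used because both parameters travel in the query string.
    return Request(f"{ENDPOINT}?{query}", data=b"", method="POST")
```

The request can then be sent with `urllib.request.urlopen(build_request("XXX", "http://example.com/myfile.doc"))`. The shape of the response is not described here, so check the service's documentation before parsing it.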

I'd recommend using the conversion engine that ships with LibreOffice - unlike OpenOffice, it lets you push through multi-threaded conversions. It supports both of the main flavours of Word docs (i.e. .doc and .docx) with pretty good fidelity and is under active development.

Working out the right incantation to use from the command line can be tricky, so to make life easier you can use unoconv, which acts as a wrapper and does some of the heavy lifting for you. It needs some hand-holding from time to time, so it is not a completely unattended solution.
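
As a starting point for the command-line route without unoconv, here is a minimal sketch that calls LibreOffice's own headless converter from Python. It assumes the `soffice` binary is on the PATH (the name and location vary by platform); the function names and the timeout are invented for this example.

```python
import subprocess
from pathlib import Path
from typing import List


def soffice_command(src: str, outdir: str, binary: str = "soffice") -> List[str]:
    """Build the headless LibreOffice command that converts src to PDF."""
    return [binary, "--headless", "--convert-to", "pdf", "--outdir", outdir, src]


def expected_output(src: str, outdir: str) -> Path:
    """LibreOffice keeps the base name and swaps the extension for .pdf."""
    return Path(outdir) / (Path(src).stem + ".pdf")


def convert(src: str, outdir: str, timeout: float = 120.0) -> Path:
    """Run the conversion and return the path of the resulting PDF."""
    subprocess.run(soffice_command(src, outdir), check=True, timeout=timeout)
    out = expected_output(src, outdir)
    if not out.exists():
        # A zero exit status alone is not treated as proof of success here.
        raise FileNotFoundError(out)
    return out
```

Checking that the output file exists is a deliberate extra step, so that a silent failure is caught rather than passed on as a missing PDF.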

Alternatively, if you want an even easier life, you could use a commercial-grade API such as https://developers.zamzar.com. This service has been around for 10+ years, provides an API for file conversion from any language (PHP, Python, Ruby, Java, .NET etc.) and has bells and whistles that let you import and export files to and from Amazon S3, FTP servers etc.

Full disclosure: I'm the lead developer for the Zamzar API.

Licensed under: CC-BY-SA with attribution