Question

Convert a .doc or .pdf to an image and display a thumbnail in Ruby?
Does anyone know how to generate document thumbnails in Ruby (or C, python...)

Solution

A simple RMagick example to convert a PDF to a PNG would be:

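require 'rmagick' # the capitalized 'RMagick' require is deprecated
pdf = Magick::ImageList.new("doc.pdf[0]") # "[0]" reads only the first page
thumb = pdf.first.resize_to_fit(300, 300) # fit inside 300x300, keeping the aspect ratio
thumb.write "doc.png"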

Converting an MS Word document is not as easy. Your best option may be to convert it to a PDF first and then generate the thumbnail from that. How you generate the PDF depends heavily on the OS you're running on. One option is OpenOffice with the Python Open Document Converter. There are also online conversion services you could try, including http://Zamzar.com.
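
As a rough sketch of the convert-to-PDF-first route: this assumes LibreOffice (a successor to OpenOffice, swapped in here and not named in the answer) is installed with its `soffice` binary on the PATH. The helper names `doc_to_pdf_command` and `doc_to_pdf` are made up for illustration.

```ruby
# Build the argument list for a headless LibreOffice conversion.
# Assumption: `soffice` is on the PATH; adjust the binary name for your system.
def doc_to_pdf_command(doc_path, out_dir)
  ["soffice", "--headless", "--convert-to", "pdf", "--outdir", out_dir, doc_path]
end

# Run the conversion and return the path of the PDF that should be written.
def doc_to_pdf(doc_path, out_dir)
  # The array form of system bypasses the shell, so odd file names are safe.
  ok = system(*doc_to_pdf_command(doc_path, out_dir))
  raise "conversion failed for #{doc_path}" unless ok
  # LibreOffice names the output after the input file, with a .pdf extension.
  File.join(out_dir, File.basename(doc_path, ".*") + ".pdf")
end
```

Something like `doc_to_pdf("report.doc", "/tmp")` would then hand back a PDF path that can be passed to the RMagick code above.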

OTHER TIPS

Sample code to answer the comment by @aisensiy above:

require 'rmagick'
pdf_path = "/path/to/interesting/file.pdf"
page_index_path = pdf_path + "[0]" # first page in PDF
pdf_page = Magick::Image.read( page_index_path ).first # first item in Magick::ImageList
pdf_page.write( "/tmp/indexed-page.png" ) # implicit conversion based on file extension

Based on the path clue in an answer to another question:

https://stackoverflow.com/a/6369524/765063
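
Combining the page index with a resize gives a thumbnail of a single page. This is only a sketch: `page_spec` and `pdf_thumbnail` are hypothetical helper names, and the default size of 300 pixels is arbitrary.

```ruby
# Append ImageMagick's page selector to a path, e.g. "file.pdf[0]".
def page_spec(pdf_path, page = 0)
  "#{pdf_path}[#{page}]"
end

# Write a thumbnail of one PDF page, fitted inside size x size pixels.
def pdf_thumbnail(pdf_path, out_path, size: 300, page: 0)
  require 'rmagick' # loaded lazily so page_spec works without the gem
  image = Magick::Image.read(page_spec(pdf_path, page)).first
  # resize_to_fit keeps the aspect ratio; the output format follows the extension.
  image.resize_to_fit(size, size).write(out_path)
end
```

A call such as `pdf_thumbnail("/path/to/interesting/file.pdf", "/tmp/thumb.png")` would read only the first page instead of loading the whole document.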

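I'm not sure about .doc support in any open source library, but ImageMagick (and the RMagick gem) can be compiled with PDF support (I think it's on by default). ImageMagick hands PDF rendering to Ghostscript, so Ghostscript needs to be installed as well.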

PDF support is a little buggy in ImageMagick, but it's by far the best open source option for Ruby. There's also a Google Summer of Code project for pure Ruby PDF support.

I've read about running OpenOffice without the GUI to convert .doc files, but it'll be complicated at best.

As the two previous posters said, ImageMagick is probably the easiest way to generate the thumbnails.

You could exec something like:

`convert "doc.pdf[0]" -thumbnail 300x300 doc.png`

(The backquotes tell Ruby to run the command in a shell.)
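
Backquotes pass the whole string through a shell, which gets awkward once file names come from users. One possible alternative, sketched here, is the array form of `system`. The helper names are invented for illustration, `-thumbnail` is used because it actually resizes the output, and on newer ImageMagick installs the binary may be `magick` rather than `convert`.

```ruby
# Build an ImageMagick command that thumbnails the first page of a PDF.
# Assumption: the `convert` binary is on the PATH.
def thumbnail_command(pdf_path, png_path, size = 300)
  ["convert", "#{pdf_path}[0]", "-thumbnail", "#{size}x#{size}", png_path]
end

# Run the command without a shell and fail loudly if it does not succeed.
def shell_thumbnail(pdf_path, png_path, size = 300)
  # system returns true only when the command exits with status zero.
  ok = system(*thumbnail_command(pdf_path, png_path, size))
  raise "convert failed for #{pdf_path}" unless ok
  png_path
end
```

Because the arguments never pass through a shell, the brackets in `doc.pdf[0]` and any spaces in the path need no quoting.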

If you don't want to shell out for the conversion, you could use the RMagick gem instead, but it's probably a bit more code.

If you don't mind paying for Imgix, it handles PDFs too. You get all the benefits of a fast CDN with it.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow