Question

So, I have the option of sending a document from a database to print as either PDF or XPS. I need to extract specific data (name, date, etc.) from one of those formats and insert it into a Word template. The Word template is not editable: you can only type within fields, and each field has a heading before it, such as name, dob, etc.

Basically I need to be able to automate transferring that information from the PDF or XPS file into the word template.

I'm familiar enough with C++, Python and Java, so I have no language preference -- whatever gets the job done.

Could you suggest a way to accomplish this? I'm having some difficulty figuring out how to parse/extract data from one of those file types, and which file type would be the better candidate. And I definitely have no idea how to automate populating the fields in the Word template.

Oh, and I forgot to mention: this is on Windows 7 (and maybe 8, but mostly 7) machines.

Thanks a lot for your help in advance!

Solution

This is for anyone who has the same sort of question. Here is how I did it:

I used PDFBox (http://pdfbox.apache.org/) to parse the document and extract the needed data, and then I used docx4j (http://www.docx4java.org/trac/docx4j) to insert the data into the Word template. Both are incredible tools and have excellent communities that help out almost immediately.
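
For anyone wondering what the step in between might look like, here is a rough sketch, not my exact code. It assumes the PDF text has already been pulled out as one plain string (PDFBox's PDFTextStripper.getText can do that) and that each value sits on its own line as "Heading: value". The class name FieldExtractor, the extract helper and the sample headings are made up for illustration; a real document will probably need a different pattern.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls "Heading: value" pairs out of already-extracted text.
class FieldExtractor {

    // Returns the first value found for each requested heading, in the order asked.
    // Assumption: every field occupies a single line of the form "Heading: value".
    static Map<String, String> extract(String text, String... headings) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String heading : headings) {
            // (?im): case-insensitive, and ^/$ match at line boundaries.
            Pattern p = Pattern.compile(
                "(?im)^[ \\t]*" + Pattern.quote(heading) + "[ \\t]*:[ \\t]*(.+?)[ \\t]*$");
            Matcher m = p.matcher(text);
            if (m.find()) {
                fields.put(heading, m.group(1));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Stand-in for the text a PDF library would hand back.
        String text = "Name: Jane Doe\nDOB: 1980-01-02\nNotes: none\n";
        System.out.println(extract(text, "Name", "DOB"));
    }
}
```

The resulting map (heading to value) is a convenient shape to hand to whatever fills in the template, since the template fields are labelled by the same headings.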

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow