Question

We are going to develop a tool that has to initialize a large folder structure (for an engineering project) with many structured MS Office documents (Word, Excel). So the question is: which MS technology is best suited to this task? The task is very similar to building static content from templates in web applications.
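To make the task concrete, here is a minimal sketch of the folder initialization step only. It assumes a hypothetical layout description (a nested dict) in which a dict value is a subfolder and any other value is the path of a template file to copy; the name `init_project` and that layout format are made up for illustration.

```python
from pathlib import Path
import shutil


def init_project(root: Path, layout: dict) -> list[Path]:
    """Create folders and copy template files described by a nested dict.

    Hypothetical layout format: a dict value is a subfolder, any other
    value is the path of a template file copied under the given name.
    Returns the list of files that were copied.
    """
    created = []
    for name, entry in layout.items():
        target = root / name
        if isinstance(entry, dict):
            # A folder: create it, then descend into its own layout.
            target.mkdir(parents=True, exist_ok=True)
            created.extend(init_project(target, entry))
        else:
            # A document: make sure the parent exists, then copy the template.
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(entry, target)
            created.append(target)
    return created
```

Filling the copied documents with project data is a separate step and is the actual subject of the question.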

I am even thinking about a {{CustomTemplateEngine}} inside Office documents, but that is surely a bad idea...

I know about VSTO, but it seems to be intended mainly for extending Office with add-ins. Am I right?

It would also be preferable to implement this document generation module as a workflow and invoke it from various interfaces.

Well, any suggestions are welcome.


Solution

For docx, my presentation gives an overview of approaches: http://www.slideshare.net/plutext/document-generation-2012osdcsydney

For xlsx, see the overview of XML in Excel: http://office.microsoft.com/en-au/excel-help/overview-of-xml-in-excel-HA010206396.aspx

I know about VSTO, but it seems to be intended mainly for extending Office with add-ins. Am I right?

Correct. From a document generation perspective, you might use VSTO to create the authoring tool; that's the technology I use for authoring in the content control data binding approach.

At runtime (bulk generation), you can (and arguably should) avoid a dependency on Word. That would mean not using VSTO in your runtime component.
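To illustrate what "no dependency on Word" can look like: a .docx file is a zip package whose main text lives in the part `word/document.xml`, so it can be rewritten with ordinary zip and string tools. The sketch below is a naive marker substitution in the spirit of the {{...}} idea from the question, not the content control data binding approach mentioned above. The function name `fill_docx` and the marker syntax are assumptions, and it only works when each marker sits inside a single text run; Word frequently splits typed text across several runs, which is one reason the more structured approaches exist.

```python
import io
import zipfile
from xml.sax.saxutils import escape


def fill_docx(template: bytes, values: dict[str, str]) -> bytes:
    """Return a copy of a .docx package with {{name}} markers replaced.

    Assumption: every marker is stored unbroken in word/document.xml.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(template)) as src, \
         zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                xml = data.decode("utf-8")
                for name, value in values.items():
                    # Escape &, < and > so the part stays well-formed XML.
                    xml = xml.replace("{{" + name + "}}", escape(value))
                data = xml.encode("utf-8")
            # All other parts (styles, headers, ...) are copied unchanged.
            dst.writestr(item, data)
    return out.getvalue()
```

Called as `fill_docx(open("template.docx", "rb").read(), {"project": "Pump station"})`, this returns the bytes of a new document that can be written to the target folder, with Word never being started.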

OTHER TIPS

After spending a number of hours creating a fully templated data merging tool by automating MS Word via C# and VB.Net (with no small amount of vexation), I found that it is very slow for generating documents in bulk. MS Word does sneaky things behind the scenes while you are busy duplicating, deleting and replacing via code, which leads to headaches:

  • Performing replace operations at document positions before a Range can cause that Range to change not only its location but its size as well. Creating a data merge with a hierarchy of tags (as I am) will cause no small amount of pain managing the Ranges associated with them.
  • Word is effectively brain dead in its ability to search and replace within itself. Removing redundant blank lines (as in addresses) is a simple task in plain text, but in Word it is a serious chore.
  • From a performance perspective, you are dealing with COM automation and an application that is always busy wanting to do other things while you work. Also, the bigger and more detailed the document, the slower Word becomes.
  • Finally, from a deployment perspective, who wants to have MS Word installed on their server, or to try to ensure that a client has the correct (or complete) installation of Word?
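The position problem from the first bullet is not specific to Word, and it can be shown with plain strings. The sketch below uses (start, end, replacement) tuples on a Python string rather than Word Range objects, so it illustrates the offset issue only; `replace_spans` is a hypothetical helper. Applying the edits from the end of the text towards the start means a replacement never moves a span that is still waiting to be processed.

```python
def replace_spans(text, edits):
    """Apply non-overlapping (start, end, replacement) edits to text.

    The offsets refer to the ORIGINAL text. Working from the last span to
    the first keeps every remaining offset valid, because a replacement
    only shifts the characters that come after it.
    """
    for start, end, replacement in sorted(edits, reverse=True):
        text = text[:start] + replacement + text[end:]
    return text
```

With the same edits applied front to back, the first replacement would change the length of the text and the second span would point at the wrong characters, which is the kind of bookkeeping the bullet complains about.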

Again, after completing a full template processing system built around Word, I found I could load a document and try to generate 3,700 or so PDFs in 3 hours before Word itself crashed on a 69-page master/detail document. Without crashing I can get about 2 documents per second on a REAL DOCUMENT.

[Image: WordMerge tool]

Contrast this with a commercial library I found on the web. I was able to convert my code to use the library in 2 days, give or take. The speed increase was stellar - almost 20 documents per second on an impressive three-page master/detail with headers, footers, page numbers etc. The same input that crashed Word after 3 hours sailed through the commercial library in under 5 minutes - including the 69-page doc. I also gained the ability to create one large document (easily) rather than thousands of individual ones.

Overall, I'd say that if you're doing this for business and your number of docs is small, your feature list is simple, and you don't mind dealing with Word's quirks, then go with Word. Otherwise, create your docs in Word and build your app around a solid commercial library.

As a last resort, you can build your docs in Word or Google Docs and use one of the many cloud-based services for creating and emailing documents in bulk.

Licensed under: CC-BY-SA with attribution