Question

What solutions are there for programmatically editing Word and PowerPoint documents? The only ones I know of replace bookmarks in Word (.doc) files with Apache POI.

Are there also ways to change images, layouts, and text styles in .doc and .ppt documents?

I'm thinking about replacing areas of Word and PowerPoint documents for bulk processing.

Platform: MS-Office 2003


Solution

What are your platform limitations?

Obviously Apache POI will get you at least part of the way there.

Microsoft's own COM APIs are fairly powerful and are documented here. I would recommend using them if a) you are not running in a server (many users, multithreaded) environment; b) you can have a proper version of PowerPoint installed on the production machine; and c) you can code against a COM object model.

OTHER TIPS

It's a bit pricey, but Aspose.Slides is a very powerful library for manipulating PowerPoint files.

If you include using other office suites as an option, here's a list of possible solutions:

With POI (at least in the versions available when this was written) you can't edit the .pptx file format, but you don't depend on any applications installed on the system. The other two options, on the contrary, rely on other applications, but they are definitely better for dealing with presentations. OpenOffice has better compatibility with older formats, by the way. Also, if you use UNO, you have a wide choice of languages: UNO bindings exist for Java, C++, Python, and others.

My experience is not directly with PowerPoint, but I've actually rolled my own WordML (XML) generator. It a) removed all dependencies on Word, b) was very fast, and c) let me build up documents from scratch.

But it was a lot of work to create, and I only built a write-only implementation.
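To give a feel for what a write-only generator involves, here is a minimal Java sketch, assuming the Word 2003 XML (WordML) schema. The class name `WordMlWriter` is hypothetical and this is not the generator described above; it only emits plain paragraphs, while styles, tables, and images are where most of the real work would go.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical minimal write-only WordML (Word 2003 XML) generator: one plain paragraph per string. */
final class WordMlWriter {
    private static final String W_NS = "http://schemas.microsoft.com/office/word/2003/wordml";

    /** Escapes the five XML special characters so arbitrary text can go inside <w:t>. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds a WordML document in which each input string becomes one paragraph. */
    static String build(List<String> paragraphs) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\n");
        // This processing instruction tells Windows to open the .xml file with Word.
        xml.append("<?mso-application progid=\"Word.Document\"?>\n");
        xml.append("<w:wordDocument xmlns:w=\"").append(W_NS).append("\">\n");
        xml.append("  <w:body>\n");
        for (String p : paragraphs) {
            // paragraph -> run -> text: the smallest structure that carries visible text
            xml.append("    <w:p><w:r><w:t>").append(escape(p)).append("</w:t></w:r></w:p>\n");
        }
        xml.append("  </w:body>\n");
        xml.append("</w:wordDocument>\n");
        return xml.toString();
    }

    /** Writes the generated document to disk as UTF-8; no Word installation is involved. */
    static void write(Path target, List<String> paragraphs) throws IOException {
        Files.write(target, build(paragraphs).getBytes(StandardCharsets.UTF_8));
    }
}
```

Because the output is just text, generation is fast and has no dependency on Word; the trade-off, as noted above, is that every formatting feature you need has to be written by hand against the schema.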

I'm not as familiar with PowerPoint, so this is conjecture, but you may be able to roll your own by reading the XML format (PowerPoint 2003?) and/or cracking open the Office Open XML file (zipped XML), then using XPath to manipulate the data, and finally saving everything back to disk.
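A minimal sketch of the "crack open the zip and use XPath" idea for a .pptx, using only JDK classes. It assumes slide text lives in DrawingML `<a:t>` elements inside `ppt/slides/slideN.xml` parts, and uses a hypothetical `{{NAME}}`-style placeholder convention; the class name `OoxmlTextReplacer` is also hypothetical. Every other part of the package is copied unchanged.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Collections;
import java.util.Iterator;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical helper: replaces a text placeholder on every slide of a .pptx (a ZIP of XML parts). */
final class OoxmlTextReplacer {
    // DrawingML namespace: visible slide text is stored in <a:t> elements.
    static final String DRAWINGML_NS = "http://schemas.openxmlformats.org/drawingml/2006/main";

    static byte[] replace(byte[] pptx, String placeholder, String value) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(pptx));
             ZipOutputStream zout = new ZipOutputStream(out)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                byte[] data = readAll(zin);
                // Only slide parts are rewritten; all other parts are copied byte for byte.
                if (entry.getName().matches("ppt/slides/slide\\d+\\.xml")) {
                    data = rewriteSlide(data, placeholder, value);
                }
                // A fresh entry is used because the old entry's sizes no longer match.
                zout.putNextEntry(new ZipEntry(entry.getName()));
                zout.write(data);
                zout.closeEntry();
            }
        } // closing zout writes the ZIP central directory
        return out.toByteArray();
    }

    private static byte[] rewriteSlide(byte[] xml, String placeholder, String value) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // needed so the "a:" prefix can be resolved by XPath
        Document doc = dbf.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override public String getNamespaceURI(String prefix) {
                return "a".equals(prefix) ? DRAWINGML_NS : XMLConstants.NULL_NS_URI;
            }
            @Override public String getPrefix(String uri) { return null; }
            @Override public Iterator<String> getPrefixes(String uri) { return Collections.emptyIterator(); }
        });

        NodeList texts = (NodeList) xpath.evaluate("//a:t", doc, XPathConstants.NODESET);
        for (int i = 0; i < texts.getLength(); i++) {
            Node t = texts.item(i);
            // Caveat: PowerPoint may split one placeholder across several <a:t> runs;
            // this sketch only handles placeholders that sit inside a single run.
            t.setTextContent(t.getTextContent().replace(placeholder, value));
        }

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) buf.write(chunk, 0, n); // reads to end of current entry
        return buf.toByteArray();
    }
}
```

Working on byte arrays keeps the sketch easy to test; for bulk processing you would read each file with `Files.readAllBytes`, call `replace`, and write the result to a new file. Changing images or layouts would follow the same pattern but touch other parts (for example the slide's relationships and media entries).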

This won't work on older, OLE Compound Document-based PowerPoint (.ppt) files, though.
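Since the zipped-XML route only applies to the newer format, a bulk job first has to tell the two kinds of file apart. A minimal Java sketch that does this by checking the file signature; the class and enum names are hypothetical. The OLE2 Compound Document signature (`D0 CF 11 E0 A1 B1 1A E1`) and the ZIP local-file-header signature (`PK\3\4`) are the standard magic numbers.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper that guesses the container type of an Office file from its first bytes. */
final class OfficeFormatSniffer {
    enum Kind { OLE2_COMPOUND, ZIP_PACKAGE, UNKNOWN }

    // Signature of an OLE2 Compound Document (.doc, .ppt, .xls).
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header "PK\3\4". OOXML (.docx/.pptx) files are ZIP packages,
    // but so are OpenDocument files, so this is only a first check.
    private static final byte[] ZIP = { 0x50, 0x4B, 0x03, 0x04 };

    static Kind sniff(byte[] header) {
        if (startsWith(header, OLE2)) return Kind.OLE2_COMPOUND;
        if (startsWith(header, ZIP)) return Kind.ZIP_PACKAGE;
        return Kind.UNKNOWN;
    }

    static Kind sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] header = new byte[8];
            int n = 0;
            int r;
            // read() may return fewer bytes than requested, so loop until 8 bytes or EOF
            while (n < header.length && (r = in.read(header, n, header.length - n)) != -1) n += r;
            return sniff(Arrays.copyOf(header, n));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

In a batch run, `OLE2_COMPOUND` files would go to a binary-format tool (POI, COM, or OpenOffice), while `ZIP_PACKAGE` files could take the XML route; checking the content rather than the extension also catches files that were renamed.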

I've done something like that before: programmatically accessing and manipulating PowerPoint presentations. Back then it was all in C++ using COM, but similar principles apply to C# and VB.NET apps, since they do COM interop very easily.

What you're looking for is called the Office object model. Basically, Office applications expose their documents programmatically as trees of objects that define their contents. These objects are accessible via an API, and you can manipulate them, add new ones, and do whatever other processing you want. It's exceedingly powerful; you can use it to manipulate pretty much every aspect of a document. But you'll need an installation of Office and Visual Studio to be able to use it.

Some links:

Hope this helps!

Apparently new users can only include one link per posting. How lame! :)

Here's the other link I meant to include:

Licensed under: CC-BY-SA with attribution