Question

I'm trying to open MS Word 2003 document in java, search for a specified String and replace it with a new String. I use APACHE POI to do that. My code is like the following one:

public void searchAndReplace(String inputFilename, String outputFilename,
            HashMap<String, String> replacements) {
    File outputFile = null;
    File inputFile = null;
    FileInputStream fileIStream = null;
    FileOutputStream fileOStream = null;
    BufferedInputStream bufIStream = null;
    BufferedOutputStream bufOStream = null;
    POIFSFileSystem fileSystem = null;
    HWPFDocument document = null;
    Range docRange = null;
    Paragraph paragraph = null;
    CharacterRun charRun = null;
    Set<String> keySet = null;
    Iterator<String> keySetIterator = null;
    int numParagraphs = 0;
    int numCharRuns = 0;
    String text = null;
    String key = null;
    String value = null;
        try {
            // Create an instance of the POIFSFileSystem class and
            // attach it to the Word document using an InputStream.
            inputFile = new File(inputFilename);
            fileIStream = new FileInputStream(inputFile);
            bufIStream = new BufferedInputStream(fileIStream);
            fileSystem = new POIFSFileSystem(bufIStream);
            document = new HWPFDocument(fileSystem);
            docRange = document.getRange();
            numParagraphs = docRange.numParagraphs();
            keySet = replacements.keySet();
            for (int i = 0; i < numParagraphs; i++) {
                paragraph = docRange.getParagraph(i);
                text = paragraph.text();
                numCharRuns = paragraph.numCharacterRuns();
                for (int j = 0; j < numCharRuns; j++) {
                    charRun = paragraph.getCharacterRun(j);
                    text = charRun.text();
                    System.out.println("Character Run text: " + text);
                    keySetIterator = keySet.iterator();
                    while (keySetIterator.hasNext()) {
                        key = keySetIterator.next();
                        if (text.contains(key)) {
                            value = replacements.get(key);
                            charRun.replaceText(key, value);
                            docRange = document.getRange();
                            paragraph = docRange.getParagraph(i);
                            charRun = paragraph.getCharacterRun(j);
                            text = charRun.text();
                        }
                    }
                }
            }
            bufIStream.close();
            bufIStream = null;
            outputFile = new File(outputFilename);
            fileOStream = new FileOutputStream(outputFile);
            bufOStream = new BufferedOutputStream(fileOStream);
            document.write(bufOStream);
        } catch (Exception ex) {
            System.out.println("Caught an: " + ex.getClass().getName());
            System.out.println("Message: " + ex.getMessage());
            System.out.println("Stacktrace follows.............");
            ex.printStackTrace(System.out);
        }
}

I call this function with following arguments:

HashMap<String, String> replacements = new HashMap<String, String>();
replacements.put("AAA", "BBB");
searchAndReplace("C:/Test.doc", "C:/Test1.doc", replacements);

When the Test.doc file contains a simple line like this : "AAA EEE", it works successfully, but when i use a complicated file it will read the content successfully and generate the Test1.doc file but when I try to open it, it will give me the following error:

Word unable to read this document. It may be corrupt. Try one or more of the following: * Open and repair the file. * Open the file with Text Recovery converter. (C:\Test1.doc)

Please tell me what to do, because I'm a beginner in POI and I have not found a good tutorial for it.

Was it helpful?

Solution

You could try OpenOffice API, but there arent many resources out there to tell you how to use it.

OTHER TIPS

First of all you should be closing your document.

Besides that, what I suggest doing is resaving your original Word document as a Word XML document, then changing the extension manually from .XML to .doc . Then look at the XML of the actual document you're working with and trace the content to make sure you're not accidentally editing hexadecimal values (AAA and EEE could be hex values in other fields).

Without seeing the actual Word document it's hard to say what's going on.

There is not much documentation about POI at all, especially for Word document unfortunately.

I don't know : is its OK to answer myself, but Just to share the knowledge, I'll answer myself.

After navigating the web, the final solution i found is : The Library called docx4j is very good for dealing with MS docx file, although its documentation is not enough till now and its forum is still in a beginning steps, but overall it help me to do what i need..

Thanks 4 all who help me..

You can also try this one: http://www.dancrintea.ro/doc-to-pdf/

Looks like this could be the issue.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top