Question

I am trying to remove paragraphs that contain "{SomeText}". The method below clears the content of those paragraphs, but I noticed that afterwards there are empty paragraph elements left over.

How can I remove <w:p /> elements programmatically?

Below is what I initially used to remove paragraphs.

    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(file, true))
    {
        MainDocumentPart mainPart = wordDoc.MainDocumentPart;
        Document D = mainPart.Document;

        foreach (Paragraph P in D.Descendants<Paragraph>())
        {
            if (P.InnerText.Contains("{SomeText}"))
            {
                P.RemoveAllChildren();
                //P.Remove();   //doesn't remove
            }
        }
        D.Save();
    }

This is what document.xml looks like afterwards:

<w:p />
<w:p />
<w:p />
<w:p />
<w:p />
<w:p />
<w:p />

Solution

The problem is in this loop:

        foreach (Paragraph P in D.Descendants<Paragraph>())
        {
            if (P.InnerText.Contains("{SomeText}"))
            {
                P.Remove();   //doesn't remove
            }
        }

You are removing an item from the collection while you are still iterating over it. A List<T> would throw an InvalidOperationException in this situation, but the OpenXML SDK doesn't throw an exception here; it just silently quits the foreach loop. Attaching a debugger and stepping through will show you that. The fix is simple:

        foreach (Paragraph P in D.Descendants<Paragraph>().ToList())
        {
            if (P.InnerText.Contains("{SomeText}"))
            {
                P.Remove();   //will now remove
            }
        }

Adding ToList() (from System.Linq) copies the paragraph references (a shallow copy) into a separate list, and the loop iterates over that list. When you remove a paragraph, it is removed from the document tree but not from your list, so the iteration continues.
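If you want to try the snapshot approach without touching a real file, here is a minimal sketch that builds a Body in memory. It assumes the DocumentFormat.OpenXml package is referenced and that top-level statements (C# 9 or later) are available; the paragraph texts are made up for illustration. As an extra, it also drops paragraphs that are already empty, such as the <w:p /> elements left behind by RemoveAllChildren().

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Wordprocessing;

// In-memory stand-in for a document body (illustrative content only).
var body = new Body(
    new Paragraph(new Run(new Text("Keep me"))),
    new Paragraph(new Run(new Text("Drop {SomeText} here"))),
    new Paragraph(),                                   // serializes as <w:p />
    new Paragraph(new Run(new Text("Keep me too"))));

// ToList() takes a snapshot, so removing nodes from the tree
// does not disturb the collection being iterated.
foreach (var p in body.Descendants<Paragraph>().ToList())
{
    bool hasPlaceholder = p.InnerText.Contains("{SomeText}");
    bool isEmpty = !p.HasChildren;                     // no runs, no properties

    if (hasPlaceholder || isEmpty)
    {
        p.Remove();
    }
}

// Count the paragraphs that survived.
int remaining = body.Elements<Paragraph>().Count();
Console.WriteLine(remaining);
```

In a real document, be careful with the empty-paragraph check: a table cell has to contain at least one paragraph, so you may want to skip paragraphs whose parent is a TableCell.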

OTHER TIPS

The answer above helped me create the following code snippet, which deletes the paragraphs between a begin marker and an end marker (the begin and end paragraphs themselves are kept). This approach is quite handy when you must use a template as input but do not want some parts of it in the output.

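// Requires: using System.Linq; using DocumentFormat.OpenXml.Packaging; using DocumentFormat.OpenXml.Wordprocessing;
// OutputPath is assumed to be a member holding the path of the document to edit.
public void RemoveParagraphsFromDocument(string begin, string end)
{
    using (var wordDoc = WordprocessingDocument.Open(OutputPath, true))
    {
        var mainPart = wordDoc.MainDocumentPart;
        var doc = mainPart.Document;
        // Snapshot the paragraphs so removals do not disturb the iteration.
        var paragraphs = doc.Descendants<Paragraph>().ToList();
        var beginIndex = paragraphs.FindIndex(par => par.InnerText.Equals(begin));
        var endIndex = paragraphs.FindIndex(par => par.InnerText.Equals(end));

        // FindIndex returns -1 when a marker is missing; leave the document untouched then.
        if (beginIndex < 0 || endIndex < 0)
        {
            return;
        }

        // Remove everything strictly between the two markers.
        for (var i = beginIndex + 1; i < endIndex; i++)
        {
            paragraphs[i].Remove();
        }

        doc.Save();
    }
}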
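
To see the effect of the range removal without a template file, the same idea can be sketched against an in-memory Body. This assumes the DocumentFormat.OpenXml package and top-level statements (C# 9 or later); the "{Begin}" and "{End}" marker texts are hypothetical and only serve the example.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Wordprocessing;

// Illustrative body: two marker paragraphs around template-only content.
var body = new Body(
    new Paragraph(new Run(new Text("Intro"))),
    new Paragraph(new Run(new Text("{Begin}"))),
    new Paragraph(new Run(new Text("Template-only text 1"))),
    new Paragraph(new Run(new Text("Template-only text 2"))),
    new Paragraph(new Run(new Text("{End}"))),
    new Paragraph(new Run(new Text("Outro"))));

// Snapshot first, then locate the markers in the snapshot.
var paragraphs = body.Descendants<Paragraph>().ToList();
var beginIndex = paragraphs.FindIndex(p => p.InnerText == "{Begin}");
var endIndex = paragraphs.FindIndex(p => p.InnerText == "{End}");

// Only act when both markers exist and are in the expected order.
if (beginIndex >= 0 && endIndex > beginIndex)
{
    for (var i = beginIndex + 1; i < endIndex; i++)
    {
        paragraphs[i].Remove();
    }
}

// Collect the text of the paragraphs that are still in the body.
var remainingTexts = body.Elements<Paragraph>().Select(p => p.InnerText).ToList();
Console.WriteLine(string.Join(" | ", remainingTexts));
// prints "Intro | {Begin} | {End} | Outro"
```
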
Licensed under: CC-BY-SA with attribution