Question

I need to retain paragraph breaks in a .docx file but get rid of line breaks, which often end up in the wrong place when copying from one file to another (because of different page sizes, or when the font is changed).

Using the DocX Library, I'm trying this:

private void ReplaceLineBreaksWithBoo(string filename)
{
    List<string> lineBreaks;
    using (DocX document = DocX.Load(filename))
    {
        lineBreaks = document.FindUniqueByPattern("\n", System.Text.RegularExpressions.RegexOptions.None);
        if (lineBreaks.Count > 0)
        {
            foreach (string s in lineBreaks)
            {
                document.ReplaceText(s, string.Empty); // <-- or a space?
            }
        }
        document.Save();
    }
}

...but it doesn't work. I suspect "\n" is not the right thing to pass; I don't know what the first argument to the FindUniqueByPattern() method should be. Documentation is nil, and the discussion forum there resembles Bodie, California.

Solution

I don't think you can do this with FindUniqueByPattern or FindAll. An empty line is not represented by a newline character; it is stored as a paragraph with empty text. You can inspect the document's XML representation through the document.Xml property, where an empty line appears as a single <w:p> element.

So instead of searching for a newline character, you can search for paragraphs with empty text:

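// Requires "using System.Linq;" for Where() and ToList().
using (DocX document = DocX.Load(filename))
{
    // Collect the empty paragraphs first so the list is fixed before anything is removed.
    var emptyLines = document.Paragraphs.Where(o => string.IsNullOrEmpty(o.Text)).ToList();
    foreach (var paragraph in emptyLines)
    {
        // false: do not record the removal as a tracked change.
        paragraph.Remove(false);
    }
    document.Save();
}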
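
If the unwanted breaks are manual line breaks inside a paragraph (Shift+Enter in Word), they will not show up as empty paragraphs. WordprocessingML stores those as <w:br/> elements inside a run. Below is a rough sketch of that distinction using only System.Xml.Linq on a made-up XML fragment; it does not use the DocX library. The class name BreakSketch, the Sample string, and both helper methods are hypothetical names for illustration, and I have not tried this against the XML that document.Xml actually returns.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class BreakSketch
{
    // WordprocessingML main namespace (the "w:" prefix).
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical sample: one paragraph containing a manual line break,
    // followed by one empty paragraph.
    public const string Sample =
        "<w:body xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">" +
        "<w:p><w:r><w:t>first line</w:t><w:br/><w:t>second line</w:t></w:r></w:p>" +
        "<w:p/>" +
        "</w:body>";

    // Counts paragraphs that contain no non-empty <w:t> text.
    public static int CountEmptyParagraphs(XElement body)
    {
        return body.Descendants(W + "p")
                   .Count(p => !p.Descendants(W + "t").Any(t => t.Value.Length > 0));
    }

    // Replaces each text-wrapping <w:br/> with a <w:t> holding one space.
    // Page and column breaks (w:type="page" / "column") are left alone.
    public static int ReplaceBreaksWithSpaces(XElement body)
    {
        var breaks = body.Descendants(W + "br")
            .Where(b =>
            {
                var type = (string)b.Attribute(W + "type");
                return type == null || type == "textWrapping";
            })
            .ToList(); // materialize before mutating the tree

        foreach (var br in breaks)
        {
            br.ReplaceWith(new XElement(W + "t",
                new XAttribute(XNamespace.Xml + "space", "preserve"), " "));
        }
        return breaks.Count;
    }

    public static void Main()
    {
        XElement body = XElement.Parse(Sample);
        Console.WriteLine(CountEmptyParagraphs(body));
        Console.WriteLine(ReplaceBreaksWithSpaces(body));
        Console.WriteLine(body.Value); // concatenated text after the replacement
    }
}
```

Whether a space or nothing should replace the break depends on the text, which is the same "string.Empty or a space?" choice from the question. Line wraps caused purely by page width or font size are not stored in the file at all, so there is nothing to remove for those.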
Licensed under: CC-BY-SA with attribution