OpenXML - Infopath RichText Box to Word Document gives formatting errors

https://stackoverflow.com/questions/16171488

11-04-2022

Question

I've set up a Rich Text Box in an InfoPath form. My program parses the InfoPath XML as follows:

    XPathNavigator formNameNode = root.SelectSingleNode("/my:myFields/my:Responses/my:Q1", nsMgr);
    string response1 = formNameNode.InnerXml;

The following code is then used to open a word document and get a Plain Text Content Control called response1:

    using (WordprocessingDocument myDoc = WordprocessingDocument.Open(ms, true))
    {
        MainDocumentPart mainPart = myDoc.MainDocumentPart;

        List<OpenXmlElement> sdtList = InfoPathToWord.GetContentControl(mainPart.Document, "response1");
        InfoPathToWord.AddRichText(0, response1, ref mainPart, ref sdtList);
    }

The code then calls InfoPathToWord.AddRichText, shown below:

    public static void AddRichText(int id, string rtfValue,
      ref MainDocumentPart mainPart, ref List<OpenXmlElement> sdtList)
    {
        if (sdtList.Count != 0)
        {
            id++;
            string altChunkId = "AltChunkId" + id;
            AlternativeFormatImportPart chunk =
              mainPart.AddAlternativeFormatImportPart(
              AlternativeFormatImportPartType.Xhtml, altChunkId);

            using (MemoryStream ms = new MemoryStream(System.Text.Encoding.Default.GetBytes(rtfValue)))
            {
                chunk.FeedData(ms);
                ms.Close();
            }

            AltChunk altChunk = new AltChunk();
            altChunk.Id = altChunkId;

            InfoPathToWord.ReplaceContentControl(sdtList, altChunk);
        }
    }

Finally, the altChunk replaces the "response1" content control:

    public static void ReplaceContentControl(
      List<OpenXmlElement> sdtList, OpenXmlElement element)
    {
        if (sdtList.Count != 0)
        {
            foreach (OpenXmlElement sdt in sdtList)
            {
                OpenXmlElement parent = sdt.Parent;
                parent.InsertAfter(element, sdt);
                sdt.Remove();
            }
        }
    }

The issue is that the text is replaced, but the formatting is incorrect and "?" characters appear in the output text. I'm not sure whether this is caused by encoding; I've also tried System.Text.Encoding.UTF8.GetBytes(rtfValue) and System.Text.Encoding.ASCII.GetBytes(rtfValue), but neither helps.

Could someone please tell me what I'm doing wrong?

Thanks in advance.

Mave

Solution

I'm using a regex to clean the string before saving.

    html = Regex.Replace(html, "[\x00-\x08\x0B\x0C\x0E-\x1F\x80-\x9F]", "") ' strips control characters; keeps tab, line feed, carriage return and printable chars

    Dim ms As New MemoryStream(System.Text.Encoding.UTF8.GetBytes(html))
    ' Create alternative format import part.
    Dim formatImportPart As AlternativeFormatImportPart = mainDocPart.AddAlternativeFormatImportPart("application/xhtml+xml", altChunkId)

Related question: Regex to remove all special characters from string?
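
For readers working in C# like the question, the cleaning step above could be sketched roughly as follows. The class name RichTextCleaner and its three methods are hypothetical helpers, not part of the Open XML SDK or of the original answer. Wrapping the fragment in a full XHTML document with a charset declaration and writing a UTF-8 byte order mark are additions made on the assumption that Word may guess the encoding when none is declared; they may reduce the "?" characters but are not guaranteed to fix every case.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper class; uses only the .NET base class library.
public static class RichTextCleaner
{
    // C0 control characters except tab (\x09), LF (\x0A) and CR (\x0D),
    // plus the C1 control range \x80-\x9F.
    private static readonly Regex ControlChars =
        new Regex(@"[\x00-\x08\x0B\x0C\x0E-\x1F\x80-\x9F]");

    // Removes the control characters matched by the pattern above.
    public static string StripControlChars(string html)
    {
        return ControlChars.Replace(html, string.Empty);
    }

    // Assumption: the InfoPath value is an XHTML fragment, so it is wrapped
    // in a complete document that declares UTF-8 explicitly.
    public static string WrapAsXhtmlDocument(string fragment)
    {
        return "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head>" +
               "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />" +
               "</head><body>" + fragment + "</body></html>";
    }

    // Encodes the text as UTF-8 and prepends the byte order mark.
    // The returned stream is positioned at the start, ready to be read.
    public static MemoryStream ToUtf8Stream(string html)
    {
        var encoding = new UTF8Encoding(true);
        byte[] preamble = encoding.GetPreamble();
        byte[] body = encoding.GetBytes(html);

        var ms = new MemoryStream();
        ms.Write(preamble, 0, preamble.Length);
        ms.Write(body, 0, body.Length);
        ms.Position = 0;
        return ms;
    }
}
```

In AddRichText from the question, the stream passed to chunk.FeedData could then be built with RichTextCleaner.ToUtf8Stream(RichTextCleaner.WrapAsXhtmlDocument(RichTextCleaner.StripControlChars(rtfValue))) instead of Encoding.Default.GetBytes(rtfValue).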

UPDATE: After rigorous testing, I've found too many character encoding issues with InfoPath RTF in a docx.

Licensed under: CC-BY-SA with attribution
