Question

I have an RTF file with content like this:

{\object\objemb{\*\objclass Excel.Sheet.12}\objw8415\objh3015{\*\objdata 
01050000
02000000
0f000000...}}}

(The embedded object may be an Excel or a Word document.)

I need to extract the \objdata part into an external file so that it can be edited. Afterwards, the edited file should be converted back into an embedded object in the RTF file.

I have already searched around, and this does not seem to be a trivial problem. Based on this post, with a small modification, I tried to read the objdata and save it to a file, but the result is not a valid Excel file:

if (RtfReader.MoveToNextControlWord(enumerator, "objdata"))
{
    byte[] data = RtfReader.GetNextTextAsByteArray(enumerator);
    using (MemoryStream packageData = new MemoryStream())
    {
        RtfReader.ExtractObjectData(new MemoryStream(data), packageData);
        File.WriteAllBytes(@"c:\temp\some-excel.xls", ReadToEnd(packageData));
    }
}

Does anyone have an idea how to achieve this?

Thanks a lot in advance for any help!

Solution

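In this case, the objdata contains a Compound File (the OLE structured storage format used by .doc and .xls files). You can spot its well-known 'd0cf11e0' header, which reads like "docfile", in the hex dump. The helper further down searches for that header and decodes the hex digits from there up to the closing brace. More on this here: Developing a tool to recognise MS Office file types (.doc, .xls, .mdb, .ppt).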
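
To verify that an extracted file really is a Compound File, one option is to compare its first bytes with the full 8-byte signature, of which 'd0cf11e0' is the first half. This is only a minimal sketch: the class name `CompoundFileSignature` and the method `IsCompoundFile` are made up for illustration and are not part of any library.

```csharp
using System;

public static class CompoundFileSignature
{
    // Full 8-byte signature found at the start of a Compound File.
    private static readonly byte[] Signature =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Returns true if the data starts with the Compound File signature.
    public static bool IsCompoundFile(byte[] data)
    {
        if (data == null)
            throw new ArgumentNullException("data");

        if (data.Length < Signature.Length)
            return false;

        for (int i = 0; i < Signature.Length; i++)
        {
            if (data[i] != Signature[i])
                return false;
        }
        return true;
    }
}
```

It could be called as `CompoundFileSignature.IsCompoundFile(File.ReadAllBytes("mydoc.doc"))` after the extraction step.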

I have written a small example that you can use to extract the data. You can use it like this:

        string ole = "2090_Object_Text_0.ole"; // your file
        string text = File.ReadAllText(ole);
        DocFile.Save(text, "mydoc.doc"); // adapt the extension to the object class (e.g. Word.Document.8 is a .doc)

And the DocFile helper code:

using System;
using System.IO;

public static class DocFile
{
    // magic Doc File header
    // check this for more: http://social.msdn.microsoft.com/Forums/en-US/343d09e3-5fdf-4b4a-9fa6-8ccb37a35930/developing-a-tool-to-recognise-ms-office-file-types-doc-xls-mdb-ppt-
    private const string Header = "d0cf11e0";

    public static void Save(string text, string filePath)
    {
        if (text == null)
            throw new ArgumentNullException("text");

        if (filePath == null)
            throw new ArgumentNullException("filePath");

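        // The hex dump may use either letter case, so compare ordinally and ignore case.
        int start = text.IndexOf(Header, StringComparison.OrdinalIgnoreCase);
        if (start < 0)
            throw new ArgumentException("Text does not contain a doc file.", "text");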

        int end = text.IndexOf('}', start);
        if (end < 0)
        {
            end = text.Length;
        }

        using (MemoryStream bytes = new MemoryStream())
        {
            bool highByte = true;
            byte b = 0;
            for (int i = start; i < end; i++)
            {
                char c = text[i];
                if (char.IsWhiteSpace(c))
                    continue;

                if (highByte)
                {
                    b = (byte)(16 * GetHexValue(c));
                }
                else
                {
                    b |= GetHexValue(c);
                    bytes.WriteByte(b);
                }
                highByte = !highByte;
            }
            File.WriteAllBytes(filePath, bytes.ToArray());
        }
    }

    private static byte GetHexValue(char c)
    {
        if (c >= '0' && c <= '9')
            return (byte)(c - '0');

        if (c >= 'a' && c <= 'f')
            return (byte)(10 + (c - 'a'));

        if (c >= 'A' && c <= 'F')
            return (byte)(10 + (c - 'A'));

        throw new ArgumentException("Not a hexadecimal digit: '" + c + "'.", "c");
    }
}
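
For the way back into the RTF file, the edited file has to be turned into hex text again. The following is a rough sketch of that encoding step only. The class name `DocFileHex` and the `bytesPerLine` parameter are assumptions for illustration. Note also that the bytes in front of the 'd0cf11e0' header appear to be an OLE 1.0 object header that includes a length field for the embedded data, so if the edited file has a different size, that field most likely has to be updated as well. This sketch does not attempt that.

```csharp
using System;
using System.Text;

public static class DocFileHex
{
    // Encodes binary data as lowercase hex text, starting a new line
    // after every bytesPerLine bytes (hypothetical layout parameter).
    public static string ToHex(byte[] data, int bytesPerLine)
    {
        if (data == null)
            throw new ArgumentNullException("data");

        if (bytesPerLine <= 0)
            throw new ArgumentOutOfRangeException("bytesPerLine");

        StringBuilder sb = new StringBuilder(data.Length * 2);
        for (int i = 0; i < data.Length; i++)
        {
            // Insert a line break between chunks, but not before the first one.
            if (i > 0 && i % bytesPerLine == 0)
                sb.Append("\r\n");

            sb.Append(data[i].ToString("x2"));
        }
        return sb.ToString();
    }
}
```

A call such as `DocFileHex.ToHex(File.ReadAllBytes("mydoc.doc"), 64)` would give the text that replaces the span `DocFile.Save` decoded, that is, from the header up to the closing brace.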
Licensed under: CC-BY-SA with attribution