How to read text from a Microsoft Word document into memory sentence by sentence?

StackOverflow https://stackoverflow.com/questions/22280976

  •  11-06-2023

Question

I am writing a Microsoft Word add-in in C#. I currently read text from the Word document through the Selection function. The document is very large, so I don't want to read all of its text into memory at once; I would like to read it sentence by sentence instead. I know there is a Range function, but it can split a word in the middle.

Solution

This code allows you to read each paragraph from a Word document.

I made a few adaptations to the code provided here.

There's also this SO question that uses an adaptation from the mantascode link.

I don't really know if this will help you, as Word.Documents.Open() already loads the entire file into memory (and it is prohibitively slow for large files).

Reading the document once and storing the result in a string seems to be the fastest way to do it.

using System;
using System.Globalization;

public class Program {
    private static void Main(string[] args) {
        var wordDocParagraphReader = new WordDocParagraphReader(@"E:\someDoc.docx");
        Console.WriteLine(wordDocParagraphReader.GetParagraph(0));
        Console.ReadLine();
        wordDocParagraphReader.Docs.Close();
        wordDocParagraphReader.Word.Quit();
    }
}

public class WordDocParagraphReader {
    public int ParagraphsCount { get; private set; }
    public Microsoft.Office.Interop.Word.Document Docs { get; private set; }
    public Microsoft.Office.Interop.Word.Application Word { get; private set; }


    public WordDocParagraphReader(object @path) {
        Word = new Microsoft.Office.Interop.Word.Application();
        object miss = System.Reflection.Missing.Value;
        object readOnly = true;
        Docs = Word.Documents.Open(ref path,
                                   ref miss,
                                   ref readOnly,
                                   ref miss,
                                   ref miss,
                                   ref miss,
                                   ref miss,
                                   ref miss,
                                   ref miss,
                                   ref miss,
                                   ref miss,
                                   ref miss,
                                   ref miss,
                                   ref miss,
                                   ref miss,
                                   ref miss);

        ParagraphsCount = Docs.Paragraphs.Count;
    }

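    // Returns the text of the paragraph at the given zero-based index,
    // or an empty string if the index is out of range.
    public string GetParagraph(int paragraphNumber) {
        // Word collections are 1-based, so valid zero-based indexes are 0..ParagraphsCount - 1.
        if (paragraphNumber >= 0 && paragraphNumber < ParagraphsCount) {
            return Docs.Paragraphs[paragraphNumber + 1].Range.Text.ToString(CultureInfo.InvariantCulture);
        }

        Console.WriteLine(String.Format("Invalid paragraph request {0}\n(the total number of paragraphs in the file is {1})",
                                        paragraphNumber,
                                        ParagraphsCount));
        return string.Empty;
    }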
}

Other tips

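using System.Windows.Forms; // for MessageBox
using word = Microsoft.Office.Interop.Word;

// Placeholder: this creates a new, empty document.
// Create or open your actual document via Word Interop instead.
word.Document worddoc = new word.Document();

// Sentences is 1-based, so loop up to and including Count.
for (int abc = 1; abc <= worddoc.Sentences.Count; abc++)
{
    MessageBox.Show("Sentence value " + worddoc.Sentences[abc].Text);
}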

This code gives you every sentence in the document, one by one.

It works for me; just create or open a Word document using Word Interop.
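Since the question is about an add-in, where the document is usually already open, the same idea can be written as a lazy iterator that hands back one sentence's text at a time instead of collecting them all first. This is only a minimal sketch: the names SentenceReader and ReadSentences are hypothetical, and it assumes the Microsoft.Office.Interop.Word assembly is referenced. It does not change how much of the file Word itself keeps in memory.

```csharp
using System.Collections.Generic;
using Word = Microsoft.Office.Interop.Word;

// Hypothetical helper, not part of Word Interop.
public static class SentenceReader
{
    // Lazily yields the text of each sentence in an already-open document.
    public static IEnumerable<string> ReadSentences(Word.Document document)
    {
        // Document.Sentences exposes one Range per sentence.
        foreach (Word.Range sentence in document.Sentences)
        {
            yield return sentence.Text;
        }
    }
}
```

In a VSTO add-in (an assumption about the project type), the open document could be passed in as Globals.ThisAddIn.Application.ActiveDocument, and the caller can stop iterating early if it only needs the first few sentences.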

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow