I'm trying to build an application that can read PDF files. I am following this guide:

http://www.codeproject.com/Articles/14170/Extract-Text-from-PDF-in-C-100-NET

but I do not understand what it means when it says "file" is the entire URL from your computer. When I try it as described, it says the format is wrong.

String file = "C:/project/test2.pdf";
// create an instance of the pdfparser class
PDFParser pdfParser = new PDFParser();

// extract the text
String result = pdfParser.ExtractText(file);

Error message:

Error 1 No overload for method 'ExtractText' takes 1 arguments


Solution

If you want to extract the PDF text into a string, try PdfTextExtractor.GetTextFromPage from iTextSharp. Sample code:

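// Namespaces needed by this method (iTextSharp 5.x):
using System.IO;
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public string ReadPdfFile(string fileName)
{
    var text = new StringBuilder();

    // Return an empty string if the file does not exist.
    if (File.Exists(fileName))
    {
        var pdfReader = new PdfReader(fileName);

        // iTextSharp page numbers start at 1.
        for (int page = 1; page <= pdfReader.NumberOfPages; page++)
        {
            var strategy = new SimpleTextExtractionStrategy();
            string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);

            // Round-trip through the system default code page into UTF-8.
            currentText = Encoding.UTF8.GetString(Encoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText)));
            text.Append(currentText);
        }
        pdfReader.Close();
    }
    return text.ToString();
}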

Other tips

I think ExtractText takes two arguments: the first is the source PDF file and the second is the destination text file.

So try the following and your error should be resolved:

pdfParser.ExtractText(file,Path.GetFileNameWithoutExtension(file)+".txt");
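Note that Path.GetFileNameWithoutExtension drops the directory part, so the call above would write the text file relative to the current working directory rather than next to the PDF. If you want the output beside the source file, a minimal sketch using Path.ChangeExtension might look like this (the class name OutputPathSketch and the helper TextPathFor are made up for illustration, not part of the CodeProject parser):

```csharp
using System;
using System.IO;

public static class OutputPathSketch
{
    // Hypothetical helper: same directory and file name as the PDF,
    // with the extension swapped to .txt.
    public static string TextPathFor(string pdfPath)
    {
        return Path.ChangeExtension(pdfPath, ".txt");
    }

    public static void Main()
    {
        string file = "C:/project/test2.pdf";

        // File name only, directory dropped: prints "test2.txt"
        Console.WriteLine(Path.GetFileNameWithoutExtension(file) + ".txt");

        // Full path kept: prints "C:/project/test2.txt"
        Console.WriteLine(TextPathFor(file));
    }
}
```

You could then pass TextPathFor(file) as the second argument instead.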

First of all, you should specify the path correctly. You can download the test project from the CodeProject link you posted.

Then use it like this:

string sourceFile = "C:\\Folder\\File.pdf";
string outputFile = "C:\\Folder\\File2.txt";

PDFParser pdfParser = new PDFParser();
pdfParser.ExtractText(sourceFile, outputFile);

UPD: You are using it incorrectly, which is why you get the error "cannot implicitly convert bool to string":

string result = pdfParser.ExtractText(sourceFile, outputFile);

The right way is:

pdfParser.ExtractText(sourceFile, outputFile);
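
If you still want the extracted text in a string, as in the question, one option is to let the parser write to a temporary file and read that file back. The sketch below is only an illustration under assumptions: it takes the extractor as a delegate shaped like the two-argument, bool-returning ExtractText described above, and it assumes a return value of true means success. The names ExtractToStringSketch and ExtractToString are hypothetical, and the fake extractor in Main exists only so the sketch runs without the parser.

```csharp
using System;
using System.IO;

public static class ExtractToStringSketch
{
    // 'extract' stands in for any method shaped like ExtractText(source, output):
    // it writes the text to 'output' and (assumption) returns true on success.
    public static string ExtractToString(Func<string, string, bool> extract, string sourceFile)
    {
        string outputFile = Path.GetTempFileName();
        try
        {
            if (!extract(sourceFile, outputFile))
            {
                throw new InvalidOperationException("Text extraction failed for " + sourceFile);
            }
            return File.ReadAllText(outputFile);
        }
        finally
        {
            // Remove the temporary file whether or not extraction succeeded.
            File.Delete(outputFile);
        }
    }

    public static void Main()
    {
        // Fake extractor, used only to make the sketch runnable on its own.
        Func<string, string, bool> fake = (src, dst) =>
        {
            File.WriteAllText(dst, "text from " + src);
            return true;
        };
        Console.WriteLine(ExtractToString(fake, "C:\\Folder\\File.pdf"));
    }
}
```

With the real parser, and assuming its ExtractText has that signature, the call would be something like string result = ExtractToStringSketch.ExtractToString(pdfParser.ExtractText, sourceFile);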
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow