I am writing a program that uses OCR (tessnet2) to scan an image file and extract certain information. This was easy before I found out that I was going to be scanning attachments of PDFs from an Exchange server.

The first problem I am working on is how to convert my PDFs to BMP files. From what I can tell so far of TessNet2, it can only read in image files - specifically BMP. So I am now tasked with converting a PDF of indeterminate size (2 - 15 pages) to BMP image. After that is done I can easily scan each image using the code I have built already with TessNet2.

I have seen things using Ghostscript to do this task - i'm just wondering if there was another free solution or if one of you fine humans could give me a crash course on how to do this using Ghostscript.

有帮助吗?

解决方案 2

Found a CodeProject article on converting PDFs to Images:

http://www.codeproject.com/Articles/57100/Simple-and-Free-PDF-to-Image-Conversion

其他提示

You can use ImageMagick too. And it's totally free! No trial or payment.

Just download the ImageMagick .exe from here. Install it and download the NuGet file in here.

There is the code! Hope I helped! (even though the question was made 6 years ago...)

Procedure:

     using ImageMagick;
     public void PDFToBMP(string output)
     {
        MagickReadSettings settings = new MagickReadSettings();
        // Settings the density to 500 dpi will create an image with a better quality
        settings.Density = new Density(500);

        string[] files= GetFiles();
        foreach (string file in files)
        {
            string fichwithout = Path.GetFileNameWithoutExtension(file);
            string path = Path.Combine(output, fichwithout);
            using (MagickImageCollection images = new MagickImageCollection())
            {
                images.Read(fich);
                foreach (MagickImage image in images)
                {
                    settings.Height = image.Height;
                    settings.Width = image.Width;
                    image.Format = MagickFormat.Bmp; //if you want to do other formats of image, just change the extension here! 
                    image.Write(path + ".bmp"); //and here!
                }
            }
        }
    }

Function GetFiles():

    public string[] GetFiles()
    {
        if (!Directory.Exists(@"your\path"))
        {
            Directory.CreateDirectory(@"your\path");
        }

        DirectoryInfo dirInfo = new DirectoryInfo(@"your\path");
        FileInfo[] fileInfos = dirInfo.GetFiles();
        ArrayList list = new ArrayList();
        foreach (FileInfo info in fileInfos)
        {
            if(info.Name != file)
            {
                // HACK: Just skip the protected samples file...
                if (info.Name.IndexOf("protected") == -1)
                    list.Add(info.FullName);
            }

        }
        return (string[])list.ToArray(typeof(string));
    }
许可以下: CC-BY-SA归因
不隶属于 StackOverflow
scroll top