Question

I want to use pdfminer to extract text from PDF files. I have downloaded pdfminer-20131113 and installed Python in C:\python34. From cmd, I change to the directory that contains pdfminer's setup.py and run the following command:

python setup.py install

But I get the following error:

> D:\pdfminer-20101226>python setup.py install
Traceback (most recent call last):
  File "setup.py", line 3, in <module>
    from pdfminer import __version__
  File "D:\pdfminer-20101226\pdfminer\__init__.py", line 4
    if __name__ == '__main__': print __version__
                                               ^
SyntaxError: invalid syntax

There seems to be an error in pdfminer's setup.py, and I am not sure how to resolve it.

Also, I saw a pdf2txt.py file in the build folder of pdfminer. I tried running it as pdf2txt.py -o output.html pdffilename.pdf (with the full path), but instead of converting the PDF, it just opens the pdf2txt.py file.


Solution

The PDFMiner project homepage states:

Written entirely in Python. (for version 2.4 or newer)

and further down:

Install Python 2.4 or newer. (Python 3 is not supported.)

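so you'll have to install Python 2 to run this project. The SyntaxError in your traceback comes from print __version__, which is a Python 2 print statement; Python 3 only accepts print as a function call.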
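To see the failure in isolation, here is a minimal sketch that feeds the line from the traceback to the interpreter's own parser. The helper name compiles and the rewritten py3_form line are only illustrations, not part of PDFMiner:

```python
import sys

# The line from pdfminer/__init__.py that the traceback points at.
py2_line = "if __name__ == '__main__': print __version__"

# The same line written with print as a function (illustrative rewrite).
py3_form = "if __name__ == '__main__': print(__version__)"


def compiles(source):
    """Return True if the running interpreter can parse the source."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


py2_ok = compiles(py2_line)
py3_ok = compiles(py3_form)

# Report which interpreter ran this and what it accepted.
print("Python %d.%d" % sys.version_info[:2])
print("print statement parses:", py2_ok)
print("print function parses:", py3_ok)
```

On a Python 3 interpreter the first form is rejected and the second is accepted, which is why setup.py fails before it installs anything.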

Alternatively, you could try the Python 3 port, pdfminer3k; it hasn't seen any updates in 20 months, while PDFMiner does have more recent releases, so your mileage may vary.

OTHER TIPS

This should solve your problem on Python 3:

pip install pdfminer.six

pdfminer.six is a fork with Python 2+3 support using six. Last commit was 15 days ago.
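Once pdfminer.six is installed, text extraction can be done from Python directly. This is a minimal sketch, assuming a pdfminer.six release recent enough to ship the pdfminer.high_level module with extract_text; the wrapper name pdf_to_text is hypothetical and older releases may not provide this helper:

```python
def pdf_to_text(pdf_path):
    """Return the text of the PDF at pdf_path as a single string.

    Assumes pdfminer.six is installed and provides
    pdfminer.high_level.extract_text (present in recent releases).
    """
    # Imported inside the function so this file still loads
    # on a machine where pdfminer.six is not installed yet.
    from pdfminer.high_level import extract_text

    return extract_text(pdf_path)
```

For example, print(pdf_to_text("pdffilename.pdf")) would print the extracted text, using the file name from the question as a placeholder.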

Licensed under: CC-BY-SA with attribution