Is there a way to extract text from PostScript (.ps, .eps) files using Java?
19-09-2019
Question
I am looking for a solution similar to PDFBox or Apache Tika for PDFs, but for PS files.
Thanks.
Solution
As James Black says, it's probably best to convert to PDF and use the tools you already know.
However, there is also pstotext, which is available as its own package in, e.g., the Ubuntu universe repository.
Ghostscript itself also comes with both ps2txt and ps2ascii which can also do this.
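Since the question asks about Java, here is a minimal sketch of calling one of these tools from Java with ProcessBuilder. It assumes ps2ascii is installed and on the PATH, and that its output can be read as UTF-8; the class and method names (PsToText, buildCommand, extractText) are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.List;

class PsToText {
    // Builds the command line. Assumes "ps2ascii" is on the PATH;
    // with only an input file given, it writes the text to stdout.
    static List<String> buildCommand(Path psFile) {
        return List.of("ps2ascii", psFile.toString());
    }

    // Runs the tool and returns whatever it printed to stdout.
    static String extractText(Path psFile) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(psFile))
                .redirectError(ProcessBuilder.Redirect.INHERIT) // show tool errors on our stderr
                .start();
        // Read stdout fully before waiting, so a full pipe buffer cannot stall the child.
        byte[] output = process.getInputStream().readAllBytes();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("ps2ascii exited with status " + exitCode);
        }
        // Assumption: the output is ASCII-compatible, so UTF-8 decoding is safe.
        return new String(output, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java PsToText.java <file.ps>");
            return;
        }
        System.out.print(extractText(Path.of(args[0])));
    }
}
```

The same pattern should work for pstotext by swapping the executable name in buildCommand, though its options differ, so check its man page first.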
OTHER TIPS
You could use Ghostscript (http://www.osalt.com/ghostscript) to convert the file to PDF; there are then various libraries for handling PDFs.
The advantage is that your extraction code only ever reads PDFs, so you can support other formats too, as long as you can convert them to PDF.
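A rough sketch of the conversion step from Java, assuming the Ghostscript executable is named gs and is on the PATH (on Windows it is usually gswin64c instead). The class and method names are hypothetical. The options used are the standard Ghostscript ones for a batch conversion with the pdfwrite device.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

class PsToPdf {
    // Builds the Ghostscript command line. Assumes the executable is "gs".
    static List<String> buildCommand(Path psFile, Path pdfFile) {
        return List.of(
                "gs",
                "-q",                 // suppress startup messages
                "-dNOPAUSE",          // do not pause between pages
                "-dBATCH",            // exit after processing the input
                "-dSAFER",            // restrict file access from the PostScript program
                "-sDEVICE=pdfwrite",  // write PDF output
                "-sOutputFile=" + pdfFile,
                psFile.toString());
    }

    // Converts psFile to pdfFile, failing if Ghostscript reports an error.
    static void convert(Path psFile, Path pdfFile) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(psFile, pdfFile))
                .inheritIO() // pass Ghostscript's messages through to our console
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("gs exited with status " + exitCode);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java PsToPdf.java <in.ps> <out.pdf>");
            return;
        }
        convert(Path.of(args[0]), Path.of(args[1]));
    }
}
```

The resulting PDF can then be passed to whichever PDF library you already use, such as PDFBox or Tika, for the actual text extraction.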
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow