Question
I have created a small Java test project locally in the NetBeans IDE (7.4 on Mac OS X) to extract content and metadata from various files.
I've tried extracting from PDF, TXT, and PPT files, and the only metadata I get back is "Content-Type"; the extracted content is empty. I have tried both a plain InputStream and the newer TikaInputStream, with no success so far.
I compiled version 1.4 of Tika and added tika-parsers-1.4.jar and tika-core-1.4.jar to the project.
I hope someone can spot the obvious mistake.
    import static java.lang.System.out;
    import java.io.File;
    import org.apache.tika.io.TikaInputStream;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.AutoDetectParser;
    import org.apache.tika.parser.ParseContext;
    import org.apache.tika.parser.Parser;
    import org.apache.tika.sax.BodyContentHandler;
    import org.xml.sax.ContentHandler;

    public static void TikaExtract(String fileName) throws Exception {
        TikaInputStream tikaStream = TikaInputStream.get(new File(fileName));
        ContentHandler textHandler = new BodyContentHandler();
        Metadata metadata = new Metadata();
        Parser parser = new AutoDetectParser();
        ParseContext context = new ParseContext();
        parser.parse(tikaStream, textHandler, metadata, context);
        // Check if there is anything in tikaStream
        out.println("File Length: " + tikaStream.getLength());
        out.println("Title: " + metadata.get("title"));
        out.println("Content type: " + metadata.get("Content-Type"));
        out.println("Author: " + metadata.get("Author"));
        out.println("content: " + textHandler.toString());
        System.out.println(tikaStream.toString());
        tikaStream.close();
    }
Output from the above code (with data/sample.pdf as input) looks like this:
File Length: 730808
Title: null
Content type: application/pdf
Author: null
content:
TikaInputStream of data/sample.pdf
Solution
I found a working solution, though probably not the ideal one.
Since I am not using Maven, I replaced all the Tika libraries currently in the project (tika-core-1.4.jar and tika-parsers-1.4.jar) with the single tika-server-1.4.jar.
Please feel free to comment.
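A possible explanation, which this thread does not confirm, is that tika-parsers-1.4.jar added by hand does not bring along the third-party libraries its parsers depend on (such as PDFBox and Apache POI), so AutoDetectParser can detect the type but has nothing to parse with. The sketch below is one way to check that assumption using only the JDK. The class name ClasspathCheck and the helper isOnClasspath are hypothetical, and the probed class names are the ones I would expect for Tika 1.4 and its dependencies, so adjust them if your versions differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical diagnostic helper; not part of Tika.
final class ClasspathCheck {

    /** Returns true if the named class can be loaded (without initializing it). */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class is absent, or present but missing one of its own dependencies.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed class names: the Tika PDF parser plus two libraries it and
        // the Office parsers rely on.
        Map<String, String> probes = new LinkedHashMap<>();
        probes.put("Tika PDF parser", "org.apache.tika.parser.pdf.PDFParser");
        probes.put("PDFBox", "org.apache.pdfbox.pdmodel.PDDocument");
        probes.put("Apache POI", "org.apache.poi.poifs.filesystem.POIFSFileSystem");

        for (Map.Entry<String, String> probe : probes.entrySet()) {
            String status = isOnClasspath(probe.getValue()) ? "found" : "MISSING";
            System.out.println(probe.getKey() + ": " + status);
        }
    }
}
```

Run it with the same classpath as the test project. If the Tika parser class is found but PDFBox or POI is reported as MISSING, that would be consistent with the empty content and the lone "Content-Type" entry, and would also explain why swapping in a jar that bundles its dependencies helps.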
Other tips
Using tika-server instead of tika-core solved the problem for me, too. I was able to do this using Maven, via Grape.
That is, simply replacing:
@Grab(group='org.apache.tika', module='tika-core', version='1.4')
with:
@Grab(group='org.apache.tika', module='tika-server', version='1.4')
worked.