Question

In our application we process files using Apache Tika. Some files (e.g. *.mov, *.mp4) cannot be processed by Tika, and in that case it leaves the corresponding *.tmp file in the user's Temp folder. After some research I found that this is a known bug: https://issues.apache.org/jira/browse/TIKA-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

The last comment on that issue describes a workaround, but it does not work for me:

final Tika tika = new Tika();
final TikaInputStream fileStream = TikaInputStream.get(/*some InputStream*/);
try {
    final String extractedString = tika.parseToString(fileStream);
    //do something with the string
} finally {
    CloseUtils.close(fileStream);
}

Using the code above still leaves temp files in the Temp folder. What could be a solution to this?


Solution

The get() method should be called with a File object instead of an InputStream:

final File file = new File("c:/some_file.mov");
final TikaInputStream fileStream = TikaInputStream.get(file);

Tika still cannot process the file, but it does manage to delete the corresponding tmp file.
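If only an InputStream is available, as in the question, one option is to spool it to a temp file you own, hand that file to TikaInputStream.get(File), and delete it yourself afterwards. The following is a minimal sketch of that idea using only the JDK; the names TempSpool, FileAction and withSpooledFile are hypothetical, and the Tika call appears only in a comment because it assumes Tika is on the classpath and that the File-based call behaves as described above.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class TempSpool {

    /** Hypothetical callback that works on the spooled file and returns a result. */
    @FunctionalInterface
    interface FileAction<T> {
        T apply(Path file) throws Exception;
    }

    /**
     * Copies the stream to a temp file we control, runs the action on it,
     * and always deletes the file afterwards, even if the action throws.
     */
    static <T> T withSpooledFile(InputStream in, FileAction<T> action) throws Exception {
        Path tmp = Files.createTempFile("tika-input-", ".tmp");
        try {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            return action.apply(tmp);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws Exception {
        InputStream in = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        // With Tika available, the action could instead be (assumption, not run here):
        //   file -> {
        //       try (TikaInputStream s = TikaInputStream.get(file.toFile())) {
        //           return new Tika().parseToString(s);
        //       }
        //   }
        long size = withSpooledFile(in, Files::size);
        System.out.println("spooled " + size + " bytes");
    }
}
```

The action must close any stream it opens on the file before returning; otherwise the delete in the finally block can fail on Windows.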

Other tips

Another workaround is disabling the org.apache.tika.parser.mp4.MP4Parser. Two solutions are here:
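One way to disable the parser, which may or may not match the solutions referred to above, is a custom tika-config.xml. This is a sketch that assumes a Tika version whose configuration format supports the parser-exclude element; the file name and location are up to you.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <!-- Use the default parser set, minus the MP4 parser -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.mp4.MP4Parser"/>
    </parser>
  </parsers>
</properties>
```

The configuration can then be loaded with something like new Tika(new TikaConfig(new File("tika-config.xml"))), where the path is a placeholder. With the parser excluded, *.mp4 and *.mov files are expected to fall through to the fallback parser instead of being handled by MP4Parser.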

License: CC-BY-SA with attribution
Not affiliated with StackOverflow