Question

I read in some articles that running Tika in server mode improves performance. Can someone explain how? Can we implement similar functionality within our Java application for better performance?

Running tika in server mode


Solution

I looked at the code in TikaServer, and only the Parser object seems to be initialized there. The other socket-related code isn't needed here. I tried initializing the Parser only once, but didn't see any improvement (when extracting the content of 100 files).
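A minimal sketch of the "initialize once, reuse" pattern described above. `ExpensiveParser` is a hypothetical stand-in for a real Tika parser (in real code this would be something like Tika's `AutoDetectParser`), and the construction counter exists only to make the difference between the two loops visible.

```java
import java.util.ArrayList;
import java.util.List;

class ParserReuseDemo {
    // Hypothetical stand-in for a costly-to-create parser such as Tika's AutoDetectParser.
    static class ExpensiveParser {
        // Counts created instances so the difference between the two patterns is visible.
        static int constructions = 0;

        ExpensiveParser() {
            constructions++; // imagine loading and configuring many parsers here
        }

        // Hypothetical "extraction": just trims the document text.
        String extract(String document) {
            return document.trim();
        }
    }

    // Creates a new parser for every file: construction cost is paid n times.
    static List<String> extractReinitializing(List<String> files) {
        List<String> out = new ArrayList<>();
        for (String f : files) {
            out.add(new ExpensiveParser().extract(f));
        }
        return out;
    }

    // Creates the parser once and reuses it for every file: construction cost is paid once.
    static List<String> extractReusing(List<String> files) {
        ExpensiveParser parser = new ExpensiveParser();
        List<String> out = new ArrayList<>();
        for (String f : files) {
            out.add(parser.extract(f));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> files = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            files.add(" file " + i + " ");
        }

        ExpensiveParser.constructions = 0;
        extractReinitializing(files);
        System.out.println("per-file init: " + ExpensiveParser.constructions);

        ExpensiveParser.constructions = 0;
        extractReusing(files);
        System.out.println("reused parser: " + ExpensiveParser.constructions);
    }
}
```

Whether this saves measurable time depends entirely on how expensive construction is compared with parsing each file; as found above, for a real parser inside an already-running JVM the difference may be negligible.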

So, as per vadchen's answer:

  1. JVM initialization isn't an issue for an application that is already running; it happens only once anyway.
  2. Tika initializes the Parser object, which doesn't seem to have much impact on performance.

So, within our application, there isn't the performance improvement the article claims.

Other answers

In the example you provided, when Tika is executed as a standalone application using the jar, additional steps are performed before the PDF file is actually processed. You can roughly split them into three:

  1. The JVM is instantiated.
  2. Tika classes are loaded and configured (e.g. parsers).
  3. Only then does Tika process the content.
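To make the per-file cost concrete: driving the standalone jar from Java means launching a new process, and therefore a new JVM, for every file, so steps 1 and 2 are repeated each time. A sketch, assuming a local `tika-app.jar` (the path is hypothetical) and its `--text` option; check the command-line options of your Tika version.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

class StandaloneTika {
    // Builds the command line for one file; each invocation starts a fresh JVM.
    static List<String> command(String jarPath, String file) {
        return List.of("java", "-jar", jarPath, "--text", file);
    }

    // Runs the standalone jar for one file and returns its stdout.
    // Steps 1 and 2 (JVM start-up, Tika class loading/configuration) happen on every call.
    static String extract(String jarPath, String file) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command(jarPath, file))
                .redirectError(ProcessBuilder.Redirect.INHERIT) // keep log output out of the text
                .start();
        String out = new String(p.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("tika-app exited with status " + exit);
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical jar path; pass input files as arguments.
        for (String file : args) {
            System.out.println(extract("tika-app.jar", file));
        }
    }
}
```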

In server mode, the first two steps are performed once, at server startup, and the server is then ready to process files as it receives them.
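On the client side, using server mode then amounts to sending each file to the already-running process. A sketch, assuming tika-server is running on its default port 9998 and exposes a `PUT /tika` endpoint that returns plain text when `Accept: text/plain` is set; verify the endpoint against the documentation for your version.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

class TikaServerClient {
    // Assumed default tika-server address; adjust for your deployment.
    static final URI TIKA_URI = URI.create("http://localhost:9998/tika");

    // One HttpClient reused for all requests.
    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    // Builds a PUT request carrying the document body and asking for plain text.
    static HttpRequest request(HttpRequest.BodyPublisher body) {
        return HttpRequest.newBuilder(TIKA_URI)
                .header("Accept", "text/plain")
                .PUT(body)
                .build();
    }

    // Sends one file to the running server: no JVM start-up or parser setup per file.
    static String extract(Path file) throws IOException, InterruptedException {
        HttpRequest req = request(HttpRequest.BodyPublishers.ofFile(file));
        HttpResponse<String> response = CLIENT.send(req, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("tika-server returned HTTP " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        for (String name : args) {
            System.out.println(extract(Path.of(name)));
        }
    }
}
```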

You can do the same in your application if it processes input data and the processing time per input is measurably less than the time needed to instantiate and configure the app.
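A rough cost model for that condition, with illustrative numbers only (not measurements): if start-up (JVM plus Tika configuration) costs S and processing one file costs p, a fresh process per file costs n(S + p) for n files, while one long-running process costs S + np. The saving only matters when S is large relative to p.

```java
class StartupCostModel {
    // Total time when every file gets a fresh process: start-up is paid n times.
    static double perProcessTotal(int n, double startup, double perFile) {
        return n * (startup + perFile);
    }

    // Total time when one long-running process (or server) handles all files.
    static double longRunningTotal(int n, double startup, double perFile) {
        return startup + n * perFile;
    }

    // Fraction of each per-process run that is start-up overhead.
    static double startupShare(double startup, double perFile) {
        return startup / (startup + perFile);
    }

    public static void main(String[] args) {
        // Illustrative numbers in seconds, not measurements.
        int n = 100;
        double startup = 2.0;
        double perFile = 0.5;
        System.out.println(perProcessTotal(n, startup, perFile));  // 250.0
        System.out.println(longRunningTotal(n, startup, perFile)); // 52.0
        System.out.println(startupShare(startup, perFile));
    }
}
```

Inside an application that is already running, S is paid once regardless, which is consistent with the observation above that reusing the Parser alone made no visible difference.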

As for the implementation, you can have a look at the Tika source code.

License: CC-BY-SA with attribution