Best approaches to reduce the number of searches across FileNet object stores when finding a document by its creation time?

https://stackoverflow.com/questions/9952554

28-05-2021

Question

For example, there are 5 object stores. I am thinking of inserting documents into them, but not in sequential order. Initially it might be sequential, but if I could insert documents using some ranking method, it would be easier to know which object store to search when looking for a document. The goal is to reduce the number of object stores that have to be searched, which is only achievable if insertion follows some intelligent algorithm.

One method I found useful is to use the current year MOD N (where N is the number of object stores) to determine where a document goes. Are there better approaches?
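For reference, the year MOD N idea can be sketched as below. This is only an illustration: `store_for_year` and `NUM_STORES` are made-up names, the stores are assumed to be numbered 0 to N-1, and no FileNet API is involved.

```python
from datetime import date

NUM_STORES = 5  # assumption: the five object stores from the example


def store_for_year(created: date, num_stores: int = NUM_STORES) -> int:
    """Return the 0-based index of the object store for a document.

    Every document created in the same year maps to the same store.
    """
    return created.year % num_stores


print(store_for_year(date(2021, 5, 28)))  # 2021 % 5 -> 1
print(store_for_year(date(2020, 1, 1)))   # 2020 % 5 -> 0
```

A lookup by creation date then has to search exactly one store, but all documents from a given year pile up in that single store.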

Solution

If you want fast access, there are a couple of criteria:

The hash function has to be reproducible from the data being queried. This means a lot depends on the queries you expect.
You usually want to distribute your objects as evenly as possible across the stores. If you want to query in parallel, the documents matching a given query should come from different stores so that the reads do not block each other. Hence your hash function should spread similar documents over as many different stores as possible. If you expect documents related to the same query to be from the same year, do not use the year directly.
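A minimal sketch of both criteria, assuming queries always know the full creation date (not just the year): hash the ISO date with a stable digest and take it modulo the number of stores. `store_for_date` is a hypothetical helper, not part of any FileNet API. `hashlib` is used because Python's built-in `hash()` is salted per process for strings and so would not be reproducible.

```python
import hashlib
from collections import Counter
from datetime import date, timedelta


def store_for_date(created: date, num_stores: int = 5) -> int:
    """Map a creation date to a store index, reproducibly across runs."""
    digest = hashlib.sha256(created.isoformat().encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_stores


# Count how the days of one year spread over the stores.
start = date(2021, 1, 1)
counts = Counter(store_for_date(start + timedelta(days=i)) for i in range(365))
print(sorted(counts.items()))
```

Because the store is derived only from the date, a reader who knows the creation date can recompute the store without searching the others, while documents from the same year end up spread across the stores.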

This assumes you want fast queries that can be parallelized. If instead you have a system in which you first have to open a potentially expensive connection to each store, then most documents related to the same query should go in the same store, and you should not follow the advice above.
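For the expensive-connection case, keeping a year's documents together means a date-range query only needs to open the stores for the years it covers. A rough sketch, assuming year-based placement with stores numbered 0 to N-1 (`stores_for_range` is a hypothetical helper):

```python
from datetime import date


def stores_for_range(start: date, end: date, num_stores: int = 5) -> set:
    """Return the store indices a creation-date range query must search,
    assuming documents are placed by `year % num_stores`."""
    if start > end:
        raise ValueError("start must not be after end")
    return {year % num_stores for year in range(start.year, end.year + 1)}


# A query inside one year touches a single store.
print(stores_for_range(date(2021, 3, 1), date(2021, 9, 30)))
# A query spanning 2019 to 2021 touches three stores.
print(sorted(stores_for_range(date(2019, 6, 1), date(2021, 2, 1))))
```

Once a range covers N or more years, every store has to be searched anyway, so this only pays off for queries that are narrow in time.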

OTHER TIPS

Your criterion for "what goes in a FileNet object store?" is basically "what documents logically belong together?".

Licensed under: CC-BY-SA with attribution