Question

I was testing some simple queries (SELECT * FROM [myNodeType]) against large node sets of 100,000 up to more than 1 million nodes. Query performance seems to degrade rather quickly once result sets grow beyond 100k hits.
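For reference, I ran the queries roughly like this (a minimal sketch; the repository URL, credentials, and myNodeType are placeholders):

```java
import javax.jcr.*;
import javax.jcr.query.*;
import org.apache.jackrabbit.commons.JcrUtils;

public class LargeQueryTest {
    public static void main(String[] args) throws RepositoryException {
        // Placeholder repository URL and credentials
        Repository repository = JcrUtils.getRepository("http://localhost:8080/server");
        Session session = repository.login(new SimpleCredentials("admin", "admin".toCharArray()));
        try {
            QueryManager qm = session.getWorkspace().getQueryManager();
            Query query = qm.createQuery("SELECT * FROM [myNodeType]", Query.JCR_SQL2);
            QueryResult result = query.execute();

            // Iterating a very large result set is where the slowdown (and memory use) shows up
            long count = 0;
            for (NodeIterator it = result.getNodes(); it.hasNext(); it.nextNode()) {
                count++;
            }
            System.out.println("hits: " + count);
        } finally {
            session.logout();
        }
    }
}
```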

Has anyone experienced similar issues with Jackrabbit? Am I hitting a design limitation? Is there any way to deal with such large result sets? (Memory usage also seems quite excessive.)


Solution

The total number of nodes is not the problem; we are running repositories of a similar size.

The architecture of your permission model and the queries themselves have a bigger impact.

Jackrabbit evaluates the permissions of every query hit against the session that executes the query. The exception is administrative sessions, which are not permission checked at all. Because of the resource-based access control concept, the complexity of your permission model (e.g. through excessive use of inheritance) and the total number of hits both influence the response time of your query.
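As a rough illustration (the credentials and user names are made up), running the same query with an administrative session and with a regular user session shows the difference: the admin session skips the per-hit permission checks, while the user session filters every hit through the ACLs:

```java
import javax.jcr.*;
import javax.jcr.query.*;
import org.apache.jackrabbit.commons.JcrUtils;

public class PermissionCheckDemo {

    // Runs the same JCR-SQL2 query with the given session and reports
    // how many hits that session is actually allowed to see.
    static long countHits(Session session, String statement) throws RepositoryException {
        QueryManager qm = session.getWorkspace().getQueryManager();
        Query query = qm.createQuery(statement, Query.JCR_SQL2);
        long count = 0;
        for (NodeIterator it = query.execute().getNodes(); it.hasNext(); it.nextNode()) {
            count++; // each hit returned here has already passed the session's access checks
        }
        return count;
    }

    public static void main(String[] args) throws RepositoryException {
        Repository repository = JcrUtils.getRepository("http://localhost:8080/server");
        String stmt = "SELECT * FROM [myNodeType]";

        // Administrative session: hits are returned without per-node permission checks.
        Session adminSession = repository.login(new SimpleCredentials("admin", "admin".toCharArray()));
        // Regular user session: every hit is evaluated against the resource-based ACLs.
        Session userSession = repository.login(new SimpleCredentials("someUser", "secret".toCharArray()));
        try {
            System.out.println("admin hits: " + countHits(adminSession, stmt));
            System.out.println("user hits:  " + countHits(userSession, stmt));
        } finally {
            adminSession.logout();
            userSession.logout();
        }
    }
}
```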

Generally speaking, making the query more specific by introducing additional parameters that limit the size of your result set, using a simpler permission architecture, or bypassing access control completely (which is not recommended for security reasons in many use cases) will reduce the response time of your queries.
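A sketch of the first option, narrowing the query and capping the result size (the path, property name, and limit are illustrative, not taken from your repository):

```java
import javax.jcr.*;
import javax.jcr.query.*;

public class NarrowQueryExample {

    // Assumes an existing, already authenticated JCR session.
    public static NodeIterator findPublished(Session session) throws RepositoryException {
        QueryManager qm = session.getWorkspace().getQueryManager();
        // Additional constraints keep the hit count (and per-hit ACL checks) small.
        Query query = qm.createQuery(
                "SELECT * FROM [myNodeType] AS n "
                + "WHERE ISDESCENDANTNODE(n, '/content/project') "
                + "AND n.[status] = 'published'",
                Query.JCR_SQL2);
        query.setLimit(200); // hard cap on the number of hits returned (JCR 2.0)
        return query.execute().getNodes();
    }
}
```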
