Question

We have a 64-bit Linux machine that makes multiple HTTP connections to other services, and the Drools Guvnor website (a rule engine, if you don't know it) is one of them. In Drools, we create a knowledge base for each rule being fired, and creating a knowledge base makes an HTTP connection to the Guvnor website.

All other threads are blocked and CPU utilization goes up to ~100%, resulting in an OOM. We can change the code to compile the rules only once every 15-20 minutes, but I want to be sure of the problem first, in case someone has already faced it.

I checked "cat /proc/sys/kernel/threads-max" and it shows 27000 threads. Can that be the reason?

I have a couple of questions:

  1. When do we know that we are running over capacity?
  2. How many threads can be spawned internally (any rough estimate or formula relating the different parameters will work)?
  3. Has anyone else seen similar issues with Drools? Concurrent access to Guvnor website is basically causing the issue.

Thanks,


Solution

I checked "cat /proc/sys/kernel/threads-max" and it shows 27000 threads. Can that be the reason?

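That number does look large, but threads-max is the system-wide limit on the number of threads, not a count of threads currently running, so it does not tell us how many threads belong to your Java app. Create a Java thread dump to confirm this. On recent JVMs, the thread dump will also show the CPU time taken by each thread.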
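If taking an external thread dump is inconvenient, roughly the same information can be collected from inside the application. The following is a minimal sketch using the standard `java.lang.management` API; the class name `ThreadCpuReport` and the output format are made up for illustration, and per-thread CPU time is reported as -1 on JVMs that do not support or enable it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists every live JVM thread with its state and CPU time.
class ThreadCpuReport {

    /** Returns one line per live thread: name, state, CPU time in ms (-1 if unavailable). */
    static List<String> snapshot() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // CPU time measurement is optional; check before asking for it.
        boolean cpuAvailable = mx.isThreadCpuTimeSupported() && mx.isThreadCpuTimeEnabled();
        List<String> lines = new ArrayList<>();
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            if (info == null) {
                continue; // the thread ended between the two calls
            }
            long cpuNanos = cpuAvailable ? mx.getThreadCpuTime(info.getThreadId()) : -1;
            long cpuMs = cpuNanos < 0 ? -1 : cpuNanos / 1_000_000;
            lines.add(String.format("%-40s %-15s cpu=%d ms",
                    info.getThreadName(), info.getThreadState(), cpuMs));
        }
        return lines;
    }

    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        System.out.println("live threads: " + mx.getThreadCount()
                + ", peak: " + mx.getPeakThreadCount());
        snapshot().forEach(System.out::println);
    }
}
```

Comparing the live thread count printed here with the system-wide limit shows how much of that limit the Java process is actually using.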

When do we know that we are running over capacity?

You have 100% CPU and an OOM error. You are over capacity :) Jokes aside, you should monitor your HTTP connection queue to determine what is going wrong. Your post says nothing about how you are handling the HTTP connections (presumably through some sort of pooling mechanism backed by a queue?). I've seen containers and programs queue requests without any bound, causing them to crash with a big bang. Plot the following graphs to isolate your problem:

  1. The number of blocked threads over time
  2. Time taken for each thread
  3. Number of threads per thread pool and how they increase / decrease with time (pool size)
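The numbers for these graphs can be sampled directly from a `java.util.concurrent.ThreadPoolExecutor`, and a bounded queue avoids the unlimited queuing described above. This is only a sketch under assumptions: the original post does not say which pool or HTTP client is in use, and the class name `PoolMonitor`, the pool size and the queue capacity are all illustrative.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: a pool with a bounded queue plus a one-line metrics sample.
class PoolMonitor {

    /** Fixed-size pool whose queue cannot grow without bound; overflow is rejected. */
    static ThreadPoolExecutor boundedPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(threads, threads, 30, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy());
    }

    /** One sample of the values worth plotting over time. */
    static String sample(ThreadPoolExecutor pool) {
        return String.format("poolSize=%d active=%d queued=%d completed=%d",
                pool.getPoolSize(), pool.getActiveCount(),
                pool.getQueue().size(), pool.getCompletedTaskCount());
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = boundedPool(2, 3);
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        for (int i = 0; i < 10; i++) {
            try {
                // Each task blocks until released, standing in for a slow HTTP call.
                pool.execute(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                rejected++; // queue is full: shed load instead of piling up memory
            }
        }
        System.out.println(sample(pool) + " rejected=" + rejected);
        release.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(sample(pool));
    }
}
```

Logging `sample(pool)` at a fixed interval gives the time series for the graphs. A steadily growing `queued` value, or a rising rejection count, suggests that requests arrive faster than they are served.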

How many threads can be spawned internally (any rough estimate or formula relating the different parameters will work)?

Only a load test can answer this question. Load your server and determine the number of concurrent users it can support at 60-70% capacity. Note the number of threads spawned internally at that point. That is your peak capacity (leaving room for unexpected traffic).

Has anyone else seen similar issues with Drools? Concurrent access to Guvnor website is basically causing the issue

I can't help there since I've not accessed Drools this way. Sorry.

OTHER TIPS

I am basing my answer on the assumption that you are creating a knowledge base for each request, and that this knowledge base creation includes downloading the latest rule sources from Guvnor. Please correct me if I am mistaken.

I suspect that building and compiling the packages takes time and hogs your system.

Instead of compiling packages on every request, you can download prebuilt packages from Guvnor, and you can also cache these packages locally if your rules do not change much. The only restriction is that you need to use the same version of Drools in both Guvnor and your application.
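The local caching idea might look roughly like the sketch below. It deliberately uses no Drools API: `PackageCache` is a made-up class, and its loader function stands in for whatever code downloads a prebuilt package from Guvnor and turns it into a knowledge base.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical cache: loads each package at most once until it is invalidated.
class PackageCache<V> {
    private final ConcurrentHashMap<String, V> cache = new ConcurrentHashMap<>();
    private final Function<String, V> loader; // assumed to do the expensive download/build

    PackageCache(Function<String, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, invoking the loader only if the key is absent. */
    V get(String packageName) {
        return cache.computeIfAbsent(packageName, loader);
    }

    /** Drops a cached entry, for example after the rules were changed in the repository. */
    void invalidate(String packageName) {
        cache.remove(packageName);
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        PackageCache<String> cache = new PackageCache<>(name -> {
            loads.incrementAndGet(); // stands in for an HTTP download
            return "prebuilt:" + name;
        });
        cache.get("pricing-rules");
        cache.get("pricing-rules"); // served from the cache, no second load
        System.out.println("loads after two gets: " + loads.get());
        cache.invalidate("pricing-rules");
        cache.get("pricing-rules"); // reloaded after invalidation
        System.out.println("loads after invalidate: " + loads.get());
    }
}
```

Because `computeIfAbsent` runs the loader at most once per missing key, concurrent requests for the same package wait for a single download instead of each opening their own HTTP connection. Calling `invalidate` on a timer would give the periodic refresh mentioned in the question.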

Licensed under: CC-BY-SA with attribution