Background
I have implemented the producer-consumer pattern with multiple consumer threads and multiple producers.
The consumers wait on a pulse and compete for each job:
private void DistributedConsume()
{
    while (_continueConsuming)
    {
        try
        {
            // Monitor.Wait must be called while holding the lock on the same
            // object; otherwise it throws SynchronizationLockException.
            lock (_workerLockObject)
            {
                Monitor.Wait(_workerLockObject);
            }

            IModelJob job = _jobProvider.GetNextJob();
            if (job != null)
            {
                // We found and dequeued a job for processing.
                if (job.JobType != JobType.TerminateWorker)
                {
                    job.PrioritizationStatus = PrioritizationStatus.Dequeued;
                    try
                    {
                        job.Execute();
                    }
                    catch (ThreadAbortException)
                    {
                        LoggerFacade.Log(LogCategory.Debug, "WorkerController.DistributedConsume(): Thread {0} has been aborted", Thread.CurrentThread.Name);
                        throw;
                    }
                    catch (Exception ex)
                    {
                        LoggerFacade.Log(LogCategory.Debug, "WorkerController.DistributedConsume(): Unhandled exception on thread {0}: {1}", Thread.CurrentThread.Name, ex);
                    }
                }
                else
                {
                    _continueConsuming = false;
                }
            }
        }
        catch (ThreadAbortException)
        {
            LoggerFacade.Log(LogCategory.Debug, "WorkerController.DistributedConsume(): Thread {0} has been aborted", Thread.CurrentThread.Name);
            throw;
        }
        catch (Exception ex)
        {
            ExceptionHandler.ProcessException(ex, "Unexpected exception occurred attempting to fetch the next job");
        }
    }

    LoggerFacade.Log(LogCategory.OperationalInfo, "WorkerController.DistributedConsume(): Thread {0} is exiting", Thread.CurrentThread.Name);
}
The executing job can consume a large amount of memory, but as you can see the job is declared in an inner scope, so I expect it to go out of scope on each iteration of the loop, at which point it and all the data it references should become eligible for garbage collection.
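To sanity-check that understanding in isolation, here is a minimal standalone probe (the `GcProbe` class and its names are invented for illustration, not part of my service) that uses a `WeakReference` to confirm an unrooted object really is collectable:

```csharp
using System;

class GcProbe
{
    // Allocate a large array and hand back only a weak reference to it,
    // so nothing roots the array once this method returns.
    public static WeakReference Allocate()
    {
        return new WeakReference(new byte[10000000]);
    }

    static void Main()
    {
        WeakReference wr = GcProbe.Allocate();
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        // With no root remaining, the array should have been collected.
        Console.WriteLine(wr.IsAlive); // typically False
    }
}
```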
The Problem
A job executes, consumes a large amount of memory, completes, and the consumer thread loops back to Monitor.Wait and waits for a pulse. By this stage the job is out of scope and no longer referenced. The problem is that if no new job is queued and the thread is not pulsed, memory usage stays very high, no matter how long I wait. Using WinDbg I can also see the job on the heap, along with all the descendant objects it references.
This might look like a memory leak, but WinDbg also verifies that the job object is not rooted. And as soon as another job is submitted and this same consumer thread picks it up, the memory is released and the objects are cleaned up. Yet if a different consumer picks up the new job, the memory does not appear to be released.
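One diagnostic I could run while the service sits idle (a hypothetical standalone sketch, not code from the service) is to force a full blocking collection and compare managed heap sizes before and after dropping the last reference:

```csharp
using System;

class IdleGcProbe
{
    // Force a full, blocking collection and return the managed heap size.
    public static long CollectAndMeasure()
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        return GC.GetTotalMemory(forceFullCollection: true);
    }

    static void Main()
    {
        var buffer = new byte[50000000]; // stand-in for a completed job's data
        long before = GC.GetTotalMemory(false);
        buffer = null;                   // drop the only reference
        long after = IdleGcProbe.CollectAndMeasure();
        // The heap should shrink substantially once the buffer is unrooted.
        Console.WriteLine(after < before);
    }
}
```

If the heap shrinks after a forced collection like this, that would suggest the memory is reclaimable and the GC simply has no reason to run while the thread is parked in Monitor.Wait.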
I can't make sense of this. None of my theories about what might be happening match my understanding of how the GC works.
Theory 1. The job has made it to gen 1 or 2, so it won't be collected unless the system is starved of memory. But then it doesn't make sense that it is collected as soon as the thread wakes and begins executing another job, since memory is far from low at that point.
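Generations can be inspected directly with GC.GetGeneration. One wrinkle possibly relevant here: arrays over roughly 85,000 bytes are allocated on the large object heap, which is reported as gen 2 and only collected during full collections. A small standalone sketch (class name invented) illustrating both points:

```csharp
using System;

class GenDemo
{
    static void Main()
    {
        var small = new byte[1000];    // small object heap: starts in gen 0
        var large = new byte[1000000]; // over ~85 KB: goes straight to the LOH
        Console.WriteLine(GC.GetGeneration(small)); // 0
        Console.WriteLine(GC.GetGeneration(large)); // 2 (LOH is tracked as gen 2)
        GC.Collect();
        Console.WriteLine(GC.GetGeneration(small)); // typically 1 after surviving a collection
    }
}
```

So if each job holds large arrays, much of its memory may live on the LOH from the moment it is allocated, meaning only a full (gen 2) collection can reclaim it.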
Theory 2. I don't have a second theory worth sharing.
When the system is busy with many automated jobs, a consumer never waits for long, so the problem isn't visible. But on quiet days, when users manually submit only a few large jobs, questions are raised about why the service sits idle with such high memory usage. So it is not a critical issue, but it is perplexing.
Any pointers?
Thanks