Question

I am starting to write a worker queue for Node using Node's cluster API and mongoose.

I noticed that many libraries already do this, but they use Redis and forking. Is there a good reason to fork rather than use the cluster API?

Edit: I have now also found https://github.com/xk/node-threads-a-gogo. Too many options!

I would rather not add Redis to the mix since I already use MongoDB. My requirements are also very loose: I would like persistence but could go without it for the first version.

Part two of the question: what are the most stable/used Node.js worker queue libraries out there today?

Solution

I wanted to follow up on this. My solution ended up being a roll-your-own cluster implementation in which some of my cluster workers are dedicated job workers (i.e. they only contain code for working on jobs).

I use agenda for job scheduling.

Cron-type jobs are scheduled by the cluster master. The rest of the jobs (verification emails, etc.) are created as needed by the cluster workers that are not dedicated job workers.
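Here is a minimal sketch of how that split could look with nothing but Node's built-in cluster module. It is not the actual code from my app: the `WORKER_ROLE` environment variable and the `planRoles` / `startCluster` helpers are made-up names for illustration, and the agenda wiring is left out.

```javascript
const cluster = require('cluster');
const os = require('os');

// Hypothetical helper: the first `jobWorkers` slots become dedicated job
// workers, every remaining slot becomes a regular (web) worker.
function planRoles(totalWorkers, jobWorkers) {
  const roles = [];
  for (let i = 0; i < totalWorkers; i++) {
    roles.push(i < jobWorkers ? 'job' : 'web');
  }
  return roles;
}

// Hypothetical entry point. In the master it forks one worker per core and
// tells each worker its role through an environment variable (an assumption
// of this sketch). In a worker it returns that worker's role.
function startCluster(jobWorkers = 1) {
  // `isPrimary` exists on Node 16+, `isMaster` on older versions.
  const isPrimary = cluster.isPrimary ?? cluster.isMaster;
  if (isPrimary) {
    const roles = planRoles(os.cpus().length, jobWorkers);
    roles.forEach((role) => cluster.fork({ WORKER_ROLE: role }));
    // Cron-type jobs would be scheduled here, once, by the master.
    return 'primary';
  }
  // 'job' workers would start processing jobs; 'web' workers would start
  // the app and create jobs as they are needed.
  return process.env.WORKER_ROLE;
}
```

Calling `startCluster()` at the top of the entry script would then let each process branch on the returned role. Note that on a single-core machine with the default of one job worker, this sketch would fork only a job worker, so you would want to pick the total count yourself in that case.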

Before that I was using kue, but I dropped it because the rest of my app uses MongoDB and I didn't like having to run Redis just for job scheduling.

OTHER TIPS

Have you tried https://github.com/rvagg/node-worker-farm? It is very lightweight and doesn't require a separate server.

I personally am partial to cluster-master.

https://github.com/isaacs/cluster-master

The reason I like cluster-master is that it does very little besides add logic for forking your process, let you manage the number of processes you're running, and provide a little logging/recovery to boot! I find that overly bloated process management libraries tend to be unstable and sometimes even slow things down.

This library will be good for you if the following are true:

  • Your module is largely asynchronous
  • You don't have a huge number of different types of events triggering
  • The events that fire each have a small amount of work to do, but you have lots of similar events firing (as in web servers)

The list above also explains when threads-a-gogo may be good for you: in the opposite situation. If there are a few spots in your code where a lot of work has to be done within your event loop, something like threads-a-gogo, which launches a "thread" specifically for that work, is awesome, because you aren't deciding ahead of time how many workers to spawn but spawning them to do work when needed. Note: this can also be bad if a lot of them can spawn at once. If you start launching too many, things can actually bog down, but I digress.
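To make the "spawn a thread only when there is heavy work" idea concrete, here is a rough sketch. It does not use threads-a-gogo itself. It swaps in Node's built-in worker_threads module, which follows the same on-demand pattern. The `fibInThread` helper and the naive Fibonacci workload are just stand-ins for whatever CPU-heavy code you have.

```javascript
const { Worker } = require('worker_threads');

// Source code for the thread body. It is evaluated inside the worker, so it
// must require what it needs itself.
const fibSource = `
  const { parentPort, workerData } = require('worker_threads');
  const fib = (n) => (n < 2 ? n : fib(n - 1) + fib(n - 2));
  parentPort.postMessage(fib(workerData));
`;

// Hypothetical helper: start a thread for one piece of heavy work and
// resolve with its result. The main event loop stays free in the meantime.
function fibInThread(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(fibSource, { eval: true, workerData: n });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      // After a successful message this reject is a no-op, because the
      // promise is already settled.
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}
```

Every call starts a new thread, which is exactly the trade-off described above: convenient when heavy work is rare, but something to cap if many calls can arrive at once.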

To summarize, if your module is already largely asynchronous, what you really want is a worker pool, to minimize the downtime when your process is not listening for events and to maximize the amount of processor you can use. Unless you have a very busy synchronous call, a single Node event loop will have trouble making full use of even a single core of a processor. In that situation you are best off with cluster-master. What I recommend is doing a little benchmarking to see how much of a single core your program can use in the worst-case scenario. Let's say this is 33% of one core. If you have a quad-core machine, you then tell cluster-master to launch 12 workers (4 cores ÷ 0.33 ≈ 12).
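The arithmetic behind that number can be written down as a tiny helper. This is only a sketch of the rule of thumb above: `workerCount` is a made-up name, and the per-worker core fraction is something you have to measure yourself with a benchmark.

```javascript
// Hypothetical helper: how many workers fit on the machine if each worker
// can use at most `coreFraction` of one core (0.33 means 33%).
function workerCount(coreFraction, cores) {
  if (!(coreFraction > 0 && coreFraction <= 1)) {
    throw new RangeError('coreFraction must be in the range (0, 1]');
  }
  // Round down so the workers never need more CPU than the machine has,
  // but always run at least one worker.
  return Math.max(1, Math.floor(cores / coreFraction));
}

// The example from the text: 33% of one core on a quad-core machine.
const size = workerCount(0.33, 4); // 12
```

In practice you would pass `require('os').cpus().length` as `cores` and hand the result to your process manager as the number of workers to launch.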

Hope this helped!

Licensed under: CC-BY-SA with attribution