Question

I am using an AMQP queue with my web crawler: each crawler instance gets a URL to crawl from a message in the queue, then adds the URLs it has found to the queue.

As there will be multiple crawler instances, several of them may find the same URL and add it to the queue.

Is there a built-in way to tell RabbitMQ to drop the message if the URL is already known, or to check whether a message with that URL already exists in the queue?


Solution

No. There is no built-in RabbitMQ mechanism for checking message uniqueness.

AMQP queues, and RabbitMQ queues in particular, are plain FIFO queues.

You will probably have to implement the uniqueness check on the application side.
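A minimal sketch of such an application-side check, assuming each crawler normalizes a URL, hashes it, and publishes it only if that hash has not been recorded yet. The helper names (normalize_url, SeenUrlFilter, enqueue_new_urls) and the normalization rules are hypothetical, and the publish callable stands in for whatever actually sends the message to RabbitMQ. The in-process set is only illustrative: with several crawler instances the "seen" record would have to live in a shared store with an atomic add-if-absent operation.

```python
import hashlib
from typing import Callable, Iterable, List
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Canonicalize a URL so trivially different spellings compare equal.

    Assumption: lowercasing scheme and host and dropping the #fragment is
    what "the same URL" means for this crawler.
    """
    parts = urlsplit(url.strip())
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path or "/", parts.query, ""))


def url_key(url: str) -> str:
    """Fixed-length key for the normalized URL."""
    return hashlib.sha256(normalize_url(url).encode("utf-8")).hexdigest()


class SeenUrlFilter:
    """Hypothetical record of URLs already queued (local to one process).

    Across several crawler instances this must be replaced by a shared
    store offering an atomic "add if absent" operation.
    """

    def __init__(self) -> None:
        self._seen: set = set()

    def add_if_new(self, url: str) -> bool:
        """Return True the first time a URL is seen, False afterwards."""
        key = url_key(url)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


def enqueue_new_urls(urls: Iterable[str], seen: SeenUrlFilter,
                     publish: Callable[[str], None]) -> List[str]:
    """Publish only URLs not seen before; return the ones published."""
    published = []
    for url in urls:
        if seen.add_if_new(url):
            publish(url)
            published.append(url)
    return published
```

For example, passing ["http://Example.com/a#top", "http://example.com/a", "http://example.com/b"] with a list's append method as publish sends only the first and third URLs, because the first two normalize to the same key.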

P.S.:

There is a nifty workaround: declare one queue per unique value, named after that value (or its hash), with x-max-length set to 1, so that only one message per URL can wait in its queue and duplicates are lost while an unprocessed message is still there. But this requires a lot of queues (one per URL hash), so it is not the best solution, especially when it comes to consuming all those messages from thousands of queues with non-obvious names.
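A rough sketch of how the per-URL queue name and arguments might be derived. The "crawl.url." prefix and the function name are hypothetical. x-max-length and x-overflow are RabbitMQ queue arguments: by default the overflow behaviour is "drop-head" (the older message is discarded and the new one kept), whereas "reject-publish" discards the newly published duplicate instead. The broker calls are shown only as comments because they need a running RabbitMQ and the pika client.

```python
import hashlib
from typing import Any, Dict, Tuple


def per_url_queue_spec(url: str,
                       prefix: str = "crawl.url.") -> Tuple[str, Dict[str, Any]]:
    """Return (queue_name, queue_arguments) for the one-queue-per-URL workaround.

    Hashing keeps names short and safe (RabbitMQ queue names are limited
    to 255 bytes); the prefix is a hypothetical naming convention.
    """
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    arguments: Dict[str, Any] = {
        # At most one ready (not yet delivered) message in this queue.
        "x-max-length": 1,
        # Reject the new duplicate rather than dropping the queued message.
        "x-overflow": "reject-publish",
    }
    return prefix + digest, arguments


# With pika and a running broker (not executed here):
#   name, args = per_url_queue_spec(url)
#   channel.queue_declare(queue=name, arguments=args)
#   channel.basic_publish(exchange="", routing_key=name, body=url)
```

The sketch also makes the drawback visible: consumers have to discover and subscribe to every one of these hash-named queues.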

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow