Amazon SQS Dead Letter Queue: Is it really dead letter or poison?

Question 1

Good question.

Based on the definition from the canonical source, which you quoted (citations removed for clarity):

The specific way a Dead Letter Channel works depends on the specific messaging system’s implementation, if it provides one at all. The channel may be called a “dead message queue” or “dead letter queue.” Typically, each machine the messaging system is installed on has its own local Dead Letter Channel so that whatever machine a message dies on, it can be moved from one local queue to another without any networking uncertainties. This also records what machine the message died on. When the messaging system moves the message, it may also record the original channel the message was supposed to be delivered on.

...it's not clear if there's really a difference. I understand what you mean by "poison queue," and your understanding of how SQS works is sound. Semantically, the difference between a DLQ and a PQ -- "undeliverable" in the style of email versus "poison" -- isn't clear to me. Perhaps a PQ is a flavor of a DLQ.

FWIW, ActiveMQ's redelivery policy uses the same definition of DLQ -- a hybrid DLQ / PQ -- as SQS does.

Can SQS behave like a message bus?

SQS can't, but there are similar products that can.

Amazon SNS

SNS (Simple Notification Service) is a generalized publish-subscribe topic system. SNS allows you to create topics, and then register subscribers that receive push notifications. Currently, push notifications can come in the form of HTTP/S, email, SMS, SQS, and mobile device push notifications.
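To make the topic/subscriber idea concrete, here is a minimal in-memory sketch of publish-subscribe fan-out. It is not the SNS API: `Topic`, `subscribe`, and `publish` are hypothetical names chosen for illustration, and real SNS delivers over the network to endpoints rather than calling local functions.

```python
from typing import Callable, List


class Topic:
    """Toy pub-sub topic: every subscriber gets a copy of every message."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        # Register a subscriber; in SNS this would be an HTTP/S, email, SQS, etc. endpoint.
        self._subscribers.append(callback)

    def publish(self, message: str) -> int:
        # Push the message to every subscriber and return how many were notified.
        for callback in self._subscribers:
            callback(message)
        return len(self._subscribers)


# Usage: two subscribers each receive their own copy of the message.
audit_log: List[str] = []
billing_inbox: List[str] = []
orders = Topic("orders")
orders.subscribe(audit_log.append)
orders.subscribe(billing_inbox.append)
notified = orders.publish("order-created")
```

The contrast with a plain queue is that a queue hands each message to one consumer, while a topic copies it to all subscribers.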

SNS has a pretty sane retry policy for HTTP/S, but does not support a DLQ or PQ AFAIK.

IronMQ's Push Queues

IronMQ is another RESTful message queueing service that is a little more fully featured than SQS (true FIFO message ordering, longer delays, and so on, but sadly smaller message sizes). Push queues allow you to set up push "subscribers," which then receive an HTTP POST any time a new message is put onto the queue.

If IronMQ fails to deliver a message -- the HTTP POST times out, or your endpoint returns anything but a 2xx -- then it will retry the delivery. If it runs out of retries, then it will put the message onto an error queue -- a combination DLQ and PQ in this case.
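The retry-then-error-queue behavior described above can be sketched roughly as follows. This is a simulation under stated assumptions, not IronMQ's client or API: `push_with_retries`, the `post` callable (standing in for the HTTP POST and returning a status code), and the default of 3 retries are all hypothetical.

```python
from typing import Callable, List


def push_with_retries(message: str,
                      post: Callable[[str], int],
                      error_queue: List[str],
                      max_retries: int = 3) -> bool:
    """Try to deliver `message`; park it on `error_queue` if every attempt fails."""
    attempts = 1 + max_retries  # the first delivery plus the retries
    for _ in range(attempts):
        try:
            status = post(message)
        except TimeoutError:
            continue  # a timeout counts as a failed delivery
        if 200 <= status < 300:
            return True  # only a 2xx response counts as success
    error_queue.append(message)  # out of retries: move to the error queue
    return False


# Usage: an endpoint that fails twice and then recovers, and one that never recovers.
errors: List[str] = []
calls: List[str] = []


def flaky_endpoint(body: str) -> int:
    calls.append(body)
    return 500 if len(calls) < 3 else 200


delivered = push_with_retries("hello", flaky_endpoint, errors)
gave_up = push_with_retries("doomed", lambda body: 503, errors)
```

Note that the error queue does not record why delivery failed, which is why it ends up acting as both a DLQ and a PQ.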

This is probably as close as you're going to get to a true "ESB" in a managed service.

Of course, then there are true open-source ESBs and SOA frameworks -- MULE, ServiceMix, and so on -- but I don't know nearly enough about what you're trying to do to make any kind of recommendation there. :)

Question 2

I'm not sure that, in most cases, a distinction between DLQ and PQ is necessary. In fact, I find this definition rather arbitrary. For most transactional messaging implementations, if the message isn't successfully consumed off the queue within the specified number of retries, it goes to the DLQ. Having a separate queue for malformed messages means that you now have two places to look for messages that aren't being successfully processed, two exception queues to monitor and account for operationally, and some percentage of messages that could plausibly belong in either queue (batch processing scenarios come to mind).
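As a rough illustration of why the two cases tend to collapse into one queue, here is an in-memory sketch of "retry N times, then dead-letter." It is not SQS code: `drain`, `handler`, and `max_receive_count` are hypothetical names, and the sketch assumes message bodies are unique strings so they can be used as keys for the receive count.

```python
import json
from collections import deque
from typing import Callable, Deque, Dict, List


def drain(queue: Deque[str],
          handler: Callable[[str], None],
          max_receive_count: int = 3) -> List[str]:
    """Process messages until the queue is empty; return the dead-lettered ones."""
    dead_letters: List[str] = []
    receive_counts: Dict[str, int] = {}
    while queue:
        body = queue.popleft()
        receive_counts[body] = receive_counts.get(body, 0) + 1
        try:
            handler(body)
        except Exception:
            # The queue cannot tell *why* the consumer failed, only that it did.
            if receive_counts[body] >= max_receive_count:
                dead_letters.append(body)
            else:
                queue.append(body)  # make the message available again
    return dead_letters


def handler(body: str) -> None:
    record = json.loads(body)  # raises on a malformed ("poison") message
    if record.get("target") == "offline":
        raise ConnectionError("downstream unavailable")  # "undeliverable" case


work = deque(['{"target": "ok"}', 'not json', '{"target": "offline"}'])
dead = drain(work, handler)
```

Both the malformed message and the temporarily undeliverable one land in the same list, because from the queue's point of view they look identical: a message that was received too many times without being acknowledged.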

Question 3

No, it will not behave like an active ESB. Simple Queue Service is simple by definition. There is an "at-least-once" delivery guarantee, but beyond that it makes very few promises.

It's designed only for polling/long polling. You can have multiple queues, each serving a different purpose, but a single queue is very simple and is not intended to serve multiple functions or provide advanced logic. SWF (Amazon Simple Workflow Service) may provide what you want, but chances are you'll need to implement an ESB yourself.

http://aws.amazon.com/swf/