Disclaimer: This post is based on my understanding of RabbitMQ as part of my current discovery process, so take it with a grain of salt; any mistruths or noteworthy additions will be fixed or appended as they come to light.
Out of the box, RabbitMQ offers three types of multi-node configuration: Clustering (with or without high availability), Federation and Shovelling.
Clustering (active/active queues):
RabbitMQ documentation explains it better than I ever could:
All data/state required for the operation of a RabbitMQ broker is replicated across all nodes, for reliability and scaling, with full ACID properties. An exception to this are message queues, which by default reside on the node that created them, though they are visible and reachable from all nodes. 1
High Availability Clustering (active/passive queues):
High availability (HA) clustering is the same as normal clustering, except that queues are mirrored across all the nodes in the cluster. One node is elected as the master; all other nodes are slaves. When the master fails, the oldest remaining slave is promoted to master. Any messages not synchronized between the master and slaves are lost. 2
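As a concrete illustration: in recent RabbitMQ versions (3.0 and later), mirroring is enabled by declaring a policy rather than per-queue arguments. A minimal sketch, assuming the default vhost and a (made-up) convention of prefixing mirrored queue names with `ha.`:

```shell
# Mirror every queue whose name starts with "ha." across all nodes in the cluster
rabbitmqctl set_policy ha-all "^ha\." '{"ha-mode":"all"}'
```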
One of the pitfalls of Clustering: when a node disconnects abruptly from the cluster (e.g. loss of network, server crash), it will not gracefully reconnect; a manual process is required to rejoin the node to the cluster. Clustering also requires a highly available network (i.e. a LAN).
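The rejoin itself is not complicated, but it is manual. A sketch of the commands you would run on the failed node, assuming an existing cluster node named `rabbit@node1` (the node name is illustrative):

```shell
# On the node that dropped out of the cluster:
rabbitmqctl stop_app
rabbitmqctl join_cluster rabbit@node1
rabbitmqctl start_app
```

If the node's local state is stale, a `rabbitmqctl reset` between `stop_app` and `join_cluster` may also be needed.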
Federation allows you to run disparate brokers in a non-clustered fashion, but still propagate messages between exchanges. Federation will copy messages from one exchange to one or many others (entirely configurable). I can see uses for federation in sharded architectures. 3
Shovelling, while similar in concept to federation, differs in that rather than copying a message from one exchange to another, it acts like a pipe and moves the messages instead.
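The copy-versus-move distinction can be seen with a toy sketch. This is plain Python modelling exchanges as lists, not the RabbitMQ API; the message names are made up:

```python
# Conceptual illustration: federation *copies* messages downstream,
# while a shovel *moves* them. Exchanges are modelled as simple lists.
upstream = ["order-1", "order-2"]

# Federation: the upstream exchange keeps its messages; downstream gets copies.
federated_downstream = list(upstream)
assert upstream == ["order-1", "order-2"]  # still present upstream

# Shovel: messages are consumed from the source and republished at the destination.
shovelled_downstream = []
while upstream:
    shovelled_downstream.append(upstream.pop(0))  # moved, not copied

print(upstream)              # []  -- the source has been drained
print(shovelled_downstream)  # ['order-1', 'order-2']
```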
A great comparison of the different configurations can be found here. Since RabbitMQ is a broker and not a bus, it comes with the inherent architecture attributes of a centralized message based router.
One of the primary differences between an NServiceBus/MassTransit implementation using MSMQ and one that uses RabbitMQ is that with MSMQ in a bus architecture, clustering is only required for the distributors in a high availability scenario. It also means subscribers can operate in a truly autonomous fashion, without the message loss during failure that you might experience with a RabbitMQ cluster (unless, of course, you have an unrecoverable HDD failure, in which case messages will be lost). I'll address the second point first.
In a RabbitMQ HA clustered environment, if the master drops from the cluster, we know that any unsynchronized messages will be lost (my understanding is that this holds regardless of where the messages existed; master or slave, unsynchronized == gone). In my testing (granted, not extensive), during a simulated network failure I saw messages being "double processed": after the network split, both nodes assumed the role of master, and when the network was reconnected, the clients on both machines inserted their copy of the same messages into the shared database. This resulted in 1009 messages inserted when only 1000 were sent. I would like to retest this to confirm the behavior. This means it is critical that messages are idempotent when using high availability clustering; an ideal we should strive for, but not always an easy thing to achieve…
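One common way to get idempotency is to track which message IDs have already been processed and skip repeats. A minimal sketch in plain Python, with no broker involved; the `id` field and the in-memory `processed_ids` set are stand-ins for whatever your messages and durable store actually provide:

```python
# Simulate the double-delivery seen after a network partition heals:
# some messages arrive twice, once from each "master".
processed_ids = set()
inserted_rows = []

def handle(message):
    """Insert into the 'database' unless this message ID was already seen."""
    if message["id"] in processed_ids:
        return  # duplicate delivery -- safe to drop
    processed_ids.add(message["id"])
    inserted_rows.append(message["body"])

messages = [{"id": i, "body": f"msg-{i}"} for i in range(1000)]
for m in messages + messages[:9]:  # 9 duplicates, as in the 1009/1000 observation
    handle(m)

print(len(inserted_rows))  # 1000
```

With a deduplication step like this, the 1009-insert outcome above collapses back to 1000, at the cost of maintaining the processed-ID store.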
In the same scenario, with a bus (à la NServiceBus/MSMQ) implementation using a clustered distributor, the combination of local queues, transactional (durable) messaging, and store-and-forward support means the message will only exist in either the distributor's queue or the consumer's local queue (provided you didn't do something stupid like make the consumer read from a remote queue). The same holds for RabbitMQ (again, don't read messages from a remote machine). In the event of a network failure, a consumer can operate without connectivity to the distributor; all work can be processed locally. Both MSMQ and RabbitMQ will allow the consumers to operate autonomously in this configuration.
Where the concepts begin to diverge, however, is in the recovery stage. With MSMQ, the message will be placed in the local outbound queue while waiting for the connection to be remade (in my experience this can take some time; restarting the MSMQ instance can expedite the process). When MSMQ is able to reconnect to any of the recipients of the messages in the outbound queues, the messages will be dispatched automatically*. This means in the event of a catastrophic event (e.g. complete network or power failure) there is no issue of order when machines rediscover one another. RabbitMQ, on the other hand, requires a more manual, thought-out process when rejoining any broker to the cluster.4
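The MSMQ behaviour described above is essentially a store-and-forward outbox: sends always land in a local durable queue first, and a dispatcher drains it whenever the remote endpoint is reachable. A rough sketch of the idea in plain Python; `connected` stands in for real connectivity detection, and the queues are in-memory rather than durable:

```python
from collections import deque

outbound = deque()   # local outbound queue (durable on disk in real MSMQ)
remote_inbox = []    # stand-in for the remote recipient's queue
connected = False    # stand-in for actual connectivity detection

def send(message):
    outbound.append(message)  # always store locally first
    flush()

def flush():
    """Dispatch everything we can; messages simply wait while disconnected."""
    while connected and outbound:
        remote_inbox.append(outbound.popleft())

send("a"); send("b")       # network is down: messages wait locally
assert remote_inbox == []

connected = True           # connection restored
flush()                    # messages dispatched automatically, in order
print(remote_inbox)        # ['a', 'b']
```

Note that the consumer keeps accepting `send` calls throughout the outage; nothing is lost and ordering is preserved, which is the property the bus architecture relies on during recovery.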
Because the distributor is not distributed across the consumers' machines (as you would want in an HA RabbitMQ broker architecture), when part of the distributor cluster goes down, all consumers will pull from the true master; unlike RabbitMQ, where the perceived 'master' is a matter of where you are positioned. As an added advantage, there is no additional configuration required to join a cluster as you scale out consumers (admittedly a trivial task with Rabbit, but a task nonetheless: script it!), as the only part of your architecture that needs to be clustered in a high availability scenario is the distributors. So, in the imaginary super-geeky scenario where you get hammered with load all of a sudden and need to re-purpose every machine in the building to cope with it, you can do so with relative ease… so long as your network policies are already pre-configured for MSDTC.
* If an exception occurs (for example, due to connectivity problems), the message will be "retried" n times and then placed in the error queue. However, messages will not be lost or duplicated.
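That retry-then-error-queue behaviour can be sketched as a simple loop. This is plain Python, not the NServiceBus API; `MAX_RETRIES` and the in-memory `error_queue` are illustrative stand-ins for the configurable retry count and the real error queue:

```python
MAX_RETRIES = 3
error_queue = []

def dispatch(message, send):
    """Try to send; after MAX_RETRIES failures, park the message in the error queue."""
    for attempt in range(MAX_RETRIES):
        try:
            send(message)
            return True               # delivered
        except ConnectionError:
            pass                      # transient failure -- retry
    error_queue.append(message)       # retries exhausted: not lost, just parked
    return False

def always_down(message):
    raise ConnectionError("endpoint unreachable")

dispatch("invoice-42", always_down)
print(error_queue)  # ['invoice-42']
```

The key property matches the footnote: a message that cannot be delivered ends up parked for later inspection rather than being silently dropped or sent twice.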