RabbitMQ performance improvement tips, part 1

Make sure your queues stay short

Many messages in a queue put a heavy load on RAM usage. To free up RAM, RabbitMQ starts flushing (paging out) messages to disk, which deteriorates queueing speed. When there are many messages to page out, the process takes time and blocks the queue from processing messages, so a long backlog can hurt the broker’s performance badly. Many messages in a queue also increase the time the broker needs to restart, because it has to rebuild the queue index.
To force queues to stay short where throughput matters more than anything else, you can set a max-length on the queue or a TTL on messages.
A queue max-length caps the queue’s size, discarding excess messages from the head. A message TTL discards messages older than the TTL; this second approach may not be sufficient to mitigate a spike of messages in your application.
If your application cannot lose messages, you need to monitor queue sizes frequently and react manually to solve the problem.
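As a sketch, both limits can be set per queue through optional arguments at declaration time. The queue name and values below are illustrative, and the commented pika calls need a live broker:

```python
# Optional queue arguments understood by RabbitMQ (values are illustrative):
#   x-max-length  - cap on queue length; overflow is dropped from the head
#   x-message-ttl - per-message time to live, in milliseconds
queue_args = {
    "x-max-length": 100_000,   # keep at most 100k messages
    "x-message-ttl": 60_000,   # drop messages older than 60 seconds
}

# With the pika client this would be applied as (requires a running broker):
# channel.queue_declare(queue="orders", arguments=queue_args)
```

The same limits can also be applied operationally, without touching application code, via a policy (`rabbitmqctl set_policy`).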

Message size matters

If you care about latency and message rates, use smaller messages. Read and write operations take time both for queues in memory and for queues on disk (where they are heavily buffered and asynchronous), and it is easy to imagine that this time depends on the size of the message. To reduce the size, choose an efficient format: prefer JSON over XML, for example, and also consider compressing the message payload. RabbitMQ does not care what is in the payload, so it is possible to use any format that sender and receiver agree on (zip, Protocol Buffers, etc.), reducing the load on the broker and on the network.
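To illustrate payload compression, here is a small sketch using Python's standard zlib on a made-up, repetitive payload (redundant data is where compression shines):

```python
import json
import zlib

# A made-up, repetitive payload: 200 similar events.
events = [{"user_id": i, "event": "page_view", "path": "/home"} for i in range(200)]
raw = json.dumps(events).encode("utf-8")
compressed = zlib.compress(raw)

# The broker never inspects the payload, so you can publish `compressed`
# as the message body and decompress on the consumer side.
restored = json.loads(zlib.decompress(compressed))
```

The trade-off is CPU time on producer and consumer in exchange for less memory, disk, and network load on the broker.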

Experimenting with HiPE

It’s possible to enable HiPE, High Performance Erlang, for RabbitMQ.
With HiPE enabled, RabbitMQ is compiled at startup. Throughput increases by 20-80% according to benchmarks. The only drawback is that the startup time also increases quite a lot, by 1-3 minutes.
HiPE is still marked as experimental in RabbitMQ’s documentation, but I have been using it in production for more than a year and it seems to be very stable. It is also true, however, that the improvement claimed by its creators is limited to certain use cases rather than general; in my experience the improvement is visible but not that high.
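For reference, HiPE compilation is switched on in the classic rabbitmq.config format; this is a config sketch, and note that later RabbitMQ and Erlang/OTP releases dropped HiPE support entirely, so check your version first:

```erlang
%% rabbitmq.config (classic Erlang-term format):
%% compile RabbitMQ with HiPE at start-up
[
  {rabbit, [
    {hipe_compile, true}
  ]}
].
```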

Prefer a direct or fanout exchange over a topic or headers exchange

Considering this post about the different exchange types, it is easy to see that fanout and direct exchanges are simpler to implement than headers and topic exchanges. The difference in throughput is very small, but when possible it is better to use the simpler implementation.
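As a rough intuition, and not RabbitMQ’s actual implementation: direct routing is a single key comparison, while topic routing has to interpret wildcard patterns, where `*` matches exactly one dot-separated word and `#` matches zero or more. A toy Python sketch:

```python
def direct_match(binding_key: str, routing_key: str) -> bool:
    """Direct exchange: routing is one equality check."""
    return binding_key == routing_key

def topic_match(pattern: str, routing_key: str) -> bool:
    """Topic exchange: '*' matches one word, '#' matches zero or more words."""
    def match(p, k):
        if not p:
            return not k
        if p[0] == "#":
            # '#' may absorb any number of remaining words
            return any(match(p[1:], k[i:]) for i in range(len(k) + 1))
        if k and (p[0] == "*" or p[0] == k[0]):
            return match(p[1:], k[1:])
        return False
    return match(pattern.split("."), routing_key.split("."))
```

The wildcard matcher does strictly more work per message, which is the intuition behind the (small) throughput difference.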

Give a price to message reliability

Persistent messages and durable queues are really useful: they let messages survive broker restarts, broker hardware failures, and broker crashes. To ensure that messages and broker definitions survive restarts, they must be on disk. Messages, exchanges, and queues that are not durable and persistent will be lost during a broker restart.
Persistent messages are heavier, as they have to be written to disk. For high performance, transient messages are the best choice, but this means your solution can lose messages in case of failure.
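A minimal sketch of the two knobs involved; the queue name is illustrative and the commented pika calls require a live broker:

```python
# Durability is set on the queue, persistence on each individual message.
declare_args = {"queue": "billing", "durable": True}  # queue definition survives restart
delivery_mode = 2  # AMQP delivery modes: 1 = transient, 2 = persistent

# With the pika client (needs a running broker):
# channel.queue_declare(**declare_args)
# channel.basic_publish(exchange="", routing_key="billing", body=b"...",
#                       properties=pika.BasicProperties(delivery_mode=delivery_mode))
```

Both are needed for a message to survive a restart: a persistent message in a non-durable queue is lost along with the queue itself.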

Explore lazy queues solutions for your architecture

With lazy queues enabled, messages go directly to disk and are loaded only when needed. This normally decreases performance; however, in some situations, for example batch processing, or when the consumer is slower than the producer or is only triggered after some time, lazy queue performance is more predictable and memory consumption has no peaks. Depending on your architecture, lazy queues can therefore be better than any other type.
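Lazy mode can be set per queue at declaration time or, more flexibly for operators, via a policy. A sketch with illustrative names; the pika call needs a live broker:

```python
# Per-queue argument (commented pika call requires a running broker):
lazy_args = {"x-queue-mode": "lazy"}
# channel.queue_declare(queue="bulk-import", arguments=lazy_args)

# Equivalent policy, applied from the command line to all "bulk." queues:
#   rabbitmqctl set_policy lazy-bulk "^bulk\." '{"queue-mode":"lazy"}' --apply-to queues
```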

How Many Queues Are Best For Max RabbitMQ Performance?

As always, the answer to such a question is linked to many other factors.
However, this question is relevant to many implementations using RabbitMQ. You can reach one million messages per second, as Pivotal did in this article, but they did not have a huge number of queues: they used multiple queues to improve parallelism across multiple CPUs under a huge volume of messages. In many real implementations it is possible to have a large number of queues with a reasonable message rate. So what happens to RabbitMQ performance?

From the RabbitMQ blog

RabbitMQ’s queues are fastest when they’re empty. When a queue is empty, and it has consumers ready to receive messages, then as soon as a message is received by the queue, it goes straight out to the consumer. In the case of a persistent message in a durable queue, yes, it will also go to disk, but that’s done in an asynchronous manner and is buffered heavily. The main point is that very little book-keeping needs to be done, very few data structures are modified, and very little additional memory needs allocating.

This short post tells us some important things. RabbitMQ uses memory to hold data structures for each queue (we will come back to this topic), but this footprint is related to the messages stored and not yet delivered, plus some bookkeeping structures used by the broker. This means that as long as the queues are kept almost empty, performance is not significantly affected, however large the number of queues.

Does this mean the number of queues can be infinite?

The answer is no, of course, as with everything in computing. Each queue uses a file descriptor, and this means a huge number of queues uses a huge number of file descriptors.
According to the kernel documentation, /proc/sys/fs/file-max is the maximum, total, global number of file descriptors the kernel will allocate before choking. This is the kernel’s limit, not your current user’s. So you could open 812158 files, provided you are alone on an idle system (single-user mode, no daemons running). We can say that the maximum number of queues manageable by a single Linux machine is below this number, and with the right amount of RAM and disk it should keep good performance.
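On the broker node itself the effective ceiling is usually the per-process limit rather than the kernel’s global one. From Python you can inspect it with the standard `resource` module (Unix only):

```python
import resource

# Per-process file-descriptor limits: soft = enforced now,
# hard = the ceiling the process may raise its soft limit to.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft={soft} hard={hard}")
```

RabbitMQ itself reports the file descriptors available to it in the management UI, which is the number to watch when queue counts grow.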

So how many queues should I have for best performance?

Even if thousands of queues can be managed by RabbitMQ while maintaining good performance, queues are single-threaded, and one queue can handle up to about 50k messages/s. You will achieve better throughput on a multi-core system if you have multiple queues and consumers, and optimal throughput if you have as many queues as cores on the underlying node.
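A common way to exploit this is to shard work across one queue per core, picking the queue with a deterministic hash of the routing key. A sketch; the `work.N` naming scheme is made up:

```python
import os
import zlib

n_queues = os.cpu_count() or 4  # one queue per core as a starting point

def shard_queue(routing_key: str) -> str:
    """Map a routing key deterministically to one of n_queues per-core queues."""
    idx = zlib.crc32(routing_key.encode("utf-8")) % n_queues
    return f"work.{idx}"
```

Using a stable hash (crc32 here rather than Python's randomized `hash()`) keeps all messages for the same key on the same queue, preserving per-key ordering while spreading load across cores.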