Restart Docker container on memory drift

Every developer likes to think their software is immune to memory leaks. This is usually false, and in many cases the leak is not caused by bugs introduced by the developer but by the interpreter or the libraries used. In other cases the memory drift comes from unexpected behaviour: you can try to think of everything, but the world (including the computer world) is never lacking in inventiveness.
I started thinking about this when one of our dockerised microservices, which we considered bulletproof because it had worked well for years, started growing in memory for no apparent reason, reaching 2000% of its normal consumption. We discovered that the library we used to connect to MySQL was the culprit: it started consuming a huge amount of memory when MySQL went down for about 30 seconds after a bad minor update on AWS (RDS). To avoid similar problems we decided to put a memory limit on our Docker containers.

Docker provides a way to limit the memory consumption of a container at run time. More information and options can be found on this page.
It is important to choose a sensible maximum amount of memory at which containers get killed. The Docker daemon is a process itself and sets its own OOM priority so that it is not killed before the individual containers, but this only holds as long as the kernel still has containers to kill; after that, the Docker daemon can be killed like any other process.
We didn’t mount any swap space on disk, so we decided to simply limit the memory to 1024M using the -m / --memory= option. This means the container can consume up to 1GB; beyond that an out-of-memory error is generated.
In order to kill the hungry container we must make sure the OOM killer is enabled for the container. The option is a bit misleading because it is phrased in the negative: the flag is --oom-kill-disable, so we need --oom-kill-disable=false (which is also the default).
OK, now if everything works as expected we end up with the container stopped and the system safe, but someone still has to wake up and restart the container. Here too Docker has us covered: it implements restart policies for containers, and we use one by setting --restart always.

Now it’s time to test: we create a container with a script that simulates abnormal memory consumption.
First of all we create the script crash.sh:

#!/bin/bash
sleep 5
yes | tr \\n x | head -c $((1024*1024*1024)) | pv -L $((1024*1024)) | grep n

This simple script uses grep to consume RAM at a rate of about 1MB per second, up to 1GB: pv throttles the stream and grep buffers the single, never-ending line in memory.
We also need to create the Dockerfile:

FROM ubuntu:14.04
# pv is used by crash.sh to throttle the stream and is not in the base image
RUN apt-get update && apt-get install -y pv
ADD crash.sh /
CMD /bin/bash /crash.sh

With the Dockerfile defined, we can now build our custom image using the docker build command and then run it using the docker run command with the parameters described above.

sudo docker build -t testing_restarts ./
sudo docker run -d --name testing_restarts --restart always -m 1024M --oom-kill-disable=false testing_restarts

Now, using docker stats, you can watch the memory consumption of the testing_restarts container increase until, at some point, the container disappears and starts again. If everything works as expected you can remove the container using

docker rm -f testing_restarts
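
If you want to confirm that the restart policy really kicked in, and that the container didn’t simply survive the test, RestartCount is a standard field exposed by docker inspect; check it before removing the container:

sudo docker inspect --format '{{ .RestartCount }}' testing_restarts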

In our case we wanted this behaviour in our docker-compose.yml, and we added it this way:

version: "2.2"

services:
  container1:
    image: dockerhubUser/image1
    cpuset: "1"
    mem_limit: 1024M
    environment:
      LOGGING_LEVEL: INFO
      ENVIRONMENT: production
    restart: always
    network_mode: "bridge"
  container2:
    image: dockerhubUser/image2
    ports:
      - 2244:2233
    cpuset: "0"
    mem_limit: 1024M
    oom_kill_disable: False
    environment:
      LOGGING_LEVEL: INFO
      ENVIRONMENT: aws-production
    restart: always
    network_mode: "bridge"
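
To apply the configuration and double-check that the limits are in place, a minimal sequence (assuming the docker-compose CLI is installed alongside the file above) is:

sudo docker-compose up -d
sudo docker stats --no-stream   # the MEM USAGE / LIMIT column should show a 1GiB limit for each service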

Backup and Restore RabbitMQ Data & Configurations

Every RabbitMQ node has a data directory that stores all the information that resides on that node.
A data directory contains two types of data: definitions (metadata, schema/topology) and message store data.
Nodes and clusters store information that can be thought of as schema, metadata or topology: users, vhosts, queues, exchanges, bindings and runtime parameters all fall into this category.

Definitions are stored in an internal database and replicated across all cluster nodes: every node in a cluster has its own replica of all definitions. When part of the definitions changes, the update is performed on all nodes in a single transaction. In the context of backups this means that, in practice, definitions can be exported from any cluster node with the same result.
Messages, instead, are kept in a message store. A “message store” can be loosely (and incompletely) defined as an internal store for messages: a single entity that is transparent to the user.

Each node has its own data directory and stores messages for the queues that have their master hosted on that node. Messages can be replicated between nodes using queue mirroring. Messages are stored in subdirectories of the node’s data directory.

Definitions are usually mostly static, while messages flow continuously from publishers to consumers. When performing a backup, the first step is deciding whether to back up only the definitions or the message store as well. Because messages are often short-lived and possibly transient, it is highly recommended to back them up from a stopped node to avoid inconsistencies.
Definitions can only be backed up from a running node.

Using rabbitmqadmin

This type of backup doesn’t include messages, since they are stored in a separate message store. It only backs up RabbitMQ users, vhosts, queues, exchanges and bindings. The backup file is a JSON representation of the RabbitMQ metadata. We will do the backup using the rabbitmqadmin command line tool.

First make sure the rabbitmq_management plugin is enabled on one of your RabbitMQ nodes:

rabbitmq-plugins enable rabbitmq_management

After that you will be able to download rabbitmqadmin on the RabbitMQ node with the following command:

wget https://{rabbitmq-node-hostname}:15672/cli/rabbitmqadmin

Once downloaded, make the file executable and move it to the /usr/local/bin directory on the RabbitMQ node:

chmod +x rabbitmqadmin 
sudo mv rabbitmqadmin /usr/local/bin

To back up the RabbitMQ instance configuration, use the following command:

rabbitmqadmin export <configuration-backup-file-name.json>

To restore the RabbitMQ instance configuration, use the following command:

rabbitmqadmin import <configuration-backup-file-name.json>
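
The same export and import can also be done directly against the management HTTP API; the sketch below assumes the default guest credentials and the management listener on localhost:15672:

# export definitions to a JSON file
curl -u guest:guest -o definitions.json http://localhost:15672/api/definitions
# import them back
curl -u guest:guest -H "Content-Type: application/json" -X POST \
     -d @definitions.json http://localhost:15672/api/definitions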

Using the RabbitMQ data directory

RabbitMQ definitions and messages are stored under the node’s data directory. To get the directory path, run the following command against a running RabbitMQ node:
rabbitmqctl eval 'rabbit_mnesia:dir().'

This directory contains many files:

# ls /var/lib/rabbitmq/mnesia/rabbit@{hostname}
cluster_nodes.config  nodes_running_at_shutdown    rabbit_durable_route.DCD       rabbit_user.DCD             schema.DAT
DECISION_TAB.LOG      rabbit_durable_exchange.DCD  rabbit_runtime_parameters.DCD  rabbit_user_permission.DCD  schema_version
LATEST.LOG            rabbit_durable_exchange.DCL  rabbit_serial                  rabbit_vhost.DCD
msg_stores            rabbit_durable_queue.DCD     rabbit_topic_permission.DCD    rabbit_vhost.DCL

Starting with RabbitMQ 3.7.0, all message data is combined in the msg_stores/vhosts directory and stored in one subdirectory per vhost. Each vhost directory is named after a hash and contains a .vhost file with the vhost name, so a specific vhost’s message set can be backed up separately.
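
To map the hashed directory names back to vhost names (the data directory path is the placeholder used above), you can read the .vhost files:

for d in /var/lib/rabbitmq/mnesia/rabbit@{hostname}/msg_stores/vhosts/*/; do
  echo "$d -> $(cat "$d/.vhost")"
done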

In order to make a backup, copy or archive this folder after RabbitMQ has been stopped:

sudo systemctl stop rabbitmq-server.service

The example below will create an archive:

tar czvf rabbit-backup.tgz /var/lib/rabbitmq/mnesia/rabbit@{hostname}

To restore from a backup, extract the files from the archive back into the node’s data directory, as sketched below.
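
A minimal restore sketch, assuming the same archive and paths used above and a packaged installation where the data directory is owned by the rabbitmq user:

sudo systemctl stop rabbitmq-server.service
sudo tar xzvf rabbit-backup.tgz -C /
sudo chown -R rabbitmq:rabbitmq /var/lib/rabbitmq/mnesia
sudo systemctl start rabbitmq-server.service
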
The internal node database stores the node’s name in certain records. Should the node name change, the database must first be updated to reflect the change, using the following rabbitmqctl command:

rabbitmqctl rename_cluster_node <oldnode> <newnode>

When a new node starts with a backed up directory and a matching node name, it should perform the upgrade steps as needed and proceed to boot.

RabbitMQ performance improvement tips, part 1

Make sure your queues stay short

Many messages in a queue put a heavy load on RAM. In order to free memory, RabbitMQ starts flushing (paging out) messages to disk, and this deteriorates queueing speed: paging out takes time and blocks the queue from processing messages while there are many messages to page out. Long queues therefore affect broker performance negatively, and they also increase the time the broker spends on restart rebuilding the queue index.
To force queues to stay short where throughput matters more than anything else, you can set a max-length on the queue or a TTL on messages.
A queue max-length caps the size of the queue, discarding the exceeding messages from the head; a message TTL discards messages older than the TTL. The second approach may not be enough to mitigate spikes of messages in your application. Both limits can be applied with a policy, as sketched below.
If your application cannot lose messages, you need to monitor queue sizes very frequently and react manually to fix the problem.
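
The sketch below uses made-up policy names and queue name patterns: the first policy caps matching queues at 100000 messages, the second drops messages older than 60 seconds. Keep in mind that only one policy applies to a given queue, so combine the keys in a single policy if you need both limits on the same queues:

rabbitmqctl set_policy length-limit "^events\." '{"max-length":100000}' --apply-to queues
rabbitmqctl set_policy ttl-limit "^metrics\." '{"message-ttl":60000}' --apply-to queues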


Message size matters

If you care about latency and message rates, use smaller messages. Both for queues in memory and for queues on disk, read and write operations take time (for queues on disk they are heavily buffered and asynchronous), and it is easy to imagine that this time depends on the size of the message. To reduce the size, choose an optimal format: prefer JSON over XML, for example, but also consider compressing the message payload. RabbitMQ doesn’t care what is in the payload, so you can use any format that sender and receiver agree on (gzip, Protocol Buffers, etc.), reducing the load on the broker and on the network.
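
As a rough illustration (payload.json stands for a hypothetical message body), you can compare payload sizes before and after compression from the shell; the broker treats the bytes as opaque either way:

wc -c < payload.json          # size of the raw JSON payload in bytes
gzip -c payload.json | wc -c  # size of the same payload compressed with gzip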


Experimenting with HiPE

It’s possible to enable HiPE (High Performance Erlang) for RabbitMQ.
With HiPE enabled, RabbitMQ is compiled at start up. Throughput increases by 20-80% according to benchmarks; the only drawback is that the startup time also increases quite a lot, by 1-3 minutes.
HiPE is still marked as experimental in RabbitMQ’s documentation, but I have been using it in production for more than a year and it seems very stable. It is also true, however, that the claimed improvement applies only to some use cases and not in general: in my experience the improvement is visible but not that high.
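
As a sketch, HiPE compilation is switched on from the broker configuration; the snippet below assumes the classic rabbitmq.config (Erlang terms) format:

%% rabbitmq.config - compile RabbitMQ with HiPE at start up
[
  {rabbit, [
    {hipe_compile, true}
  ]}
].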


Prefer a direct or fanout exchange over a topic or headers exchange

Looking at this post about the different exchange types, it is easy to see that fanout and direct exchanges are simpler for the broker to handle than headers and topic exchanges. The difference in throughput is very small, but when possible it is better to use the simpler type.


Put a price on message reliability

Persistent messages and durable queues are really good: they let messages survive broker restarts, broker hardware failures and broker crashes. To ensure that messages and broker definitions survive restarts, we need to ensure that they are on disk; messages, exchanges and queues that are not durable and persistent will be lost during a broker restart.
Persistent messages are heavier, as they have to be written to disk. For high performance, transient messages are the best choice, but this means that your solution can lose messages in case of failure.
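
As a small sketch of the difference (the queue name and payload are made up), a durable queue can be declared and a persistent message published with rabbitmqadmin; delivery_mode 2 marks the message as persistent:

rabbitmqadmin declare queue name=orders durable=true
rabbitmqadmin publish routing_key=orders payload="hello" properties='{"delivery_mode":2}'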


Explore lazy queues for your architecture

With lazy queues, messages go directly to disk and are loaded into memory only when they are needed. Normally this decreases performance, but in some situations (batch processing, consumers that are slower than producers, or consumers triggered only after some time) lazy queue performance is more predictable and memory consumption has no peaks. Depending on your architecture, lazy queues can therefore be a better fit than any other type. They can be enabled per queue at declaration time or, more conveniently, with a policy, as sketched below.
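
A minimal example, with a made-up policy name and queue pattern, that switches all matching queues to lazy mode:

rabbitmqctl set_policy lazy-batch "^batch\." '{"queue-mode":"lazy"}' --apply-to queues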

How Many Queues Are Best For Max RabbitMQ Performance?

As always, the answer to such a question is linked to many other factors.
However, the question is relevant in many implementations that use RabbitMQ. You can reach one million messages per second, as Pivotal did in this article, but they didn’t have a huge number of queues: they had multiple queues, which improves parallelism across CPUs, and a huge volume of messages. Many real-world deployments instead have a large number of queues and a reasonable message rate. So what happens to RabbitMQ performance there?

From the RabbitMQ blog

RabbitMQ’s queues are fastest when they’re empty. When a queue is empty, and it has consumers ready to receive messages, then as soon as a message is received by the queue, it goes straight out to the consumer. In the case of a persistent message in a durable queue, yes, it will also go to disk, but that’s done in an asynchronous manner and is buffered heavily. The main point is that very little book-keeping needs to be done, very few data structures are modified, and very little additional memory needs allocating.

This short post tells us some important things. RabbitMQ keeps in-memory structures for each queue (we will come back to this topic), but this footprint is related to the messages stored and not yet delivered, plus some data structures used by the broker. This means that as long as queues are kept almost empty, performance is not significantly impacted, however large the number of queues may be.

Does this mean that the number of queues can be infinite?

The answer is of course no, as with everything in computing. Each queue uses file descriptors, which means a huge number of queues uses a huge number of file descriptors.
According to the kernel documentation, /proc/sys/fs/file-max is the maximum, total, global number of file descriptors the kernel will allocate before choking. This is the kernel’s limit, not your current user’s, so you could open 812158 of them provided you’re alone on an idle system (single-user mode, no daemons running). We can say that the maximum number of queues manageable by a single Linux machine is somewhat below this number, and with the right amount of RAM and disk it should keep good performance.
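
To see where your own system stands, you can check the kernel-wide limit and the per-process limit that applies to the RabbitMQ process:

cat /proc/sys/fs/file-max   # kernel-wide limit on open file descriptors
ulimit -n                   # per-process limit for the current shell/user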

So how many queues should I have for best performance?

Even though RabbitMQ can manage thousands of queues while maintaining good performance, each queue is single-threaded and one queue can handle up to about 50k messages/s. You will achieve better throughput on a multi-core system if you have multiple queues and consumers, and optimal throughput if you have as many queues as there are cores on the underlying node.
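
As a rough sanity check (not a hard rule), you can compare the number of cores on the node with the number of queues currently declared; the -q flag suppresses rabbitmqctl’s informational output:

nproc
rabbitmqctl -q list_queues name | wc -l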