As part of making our RabbitMQ cluster more resilient, we added the x-delivery-limit argument to our services’ quorum queues so that poisonous messages could not stay in those queues forever.

The idea is that if a message cannot be deserialized into our custom object types (or fails for another reason, such as an NPE), the consumer retries it a finite number of times, in our case 10, and after that the unconsumable message is routed to the service’s DLQ.
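
To make the setup concrete, here is a minimal sketch of the queue arguments involved. The class name, method name, and exchange name are hypothetical and only for illustration; the argument keys (x-queue-type, x-delivery-limit, x-dead-letter-exchange) are the standard RabbitMQ ones.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: builds the arguments for a service's quorum queue. */
final class ServiceQueueArguments {

    private ServiceQueueArguments() {}

    /**
     * @param deadLetterExchange exchange that receives messages exceeding the limit
     * @param deliveryLimit      delivery limit for the queue (10 in our case)
     */
    static Map<String, Object> build(String deadLetterExchange, int deliveryLimit) {
        if (deliveryLimit < 1) {
            throw new IllegalArgumentException("deliveryLimit must be at least 1");
        }
        Map<String, Object> arguments = new HashMap<>();
        arguments.put("x-queue-type", "quorum");                    // x-delivery-limit applies to quorum queues
        arguments.put("x-delivery-limit", deliveryLimit);           // cap on delivery attempts
        arguments.put("x-dead-letter-exchange", deadLetterExchange); // where over-limit messages are routed
        return arguments;
    }
}
```

The resulting map can be passed as the last parameter of the Java client's `channel.queueDeclare(queueName, true, false, false, arguments)`.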

Once the message arrives in the DLQ, it is consumed as a plain Message, its metadata is logged for debugging purposes, and then the message is simply discarded.

One possibility we didn’t think of was that the consumer might not even be able to receive the plain Message, which would make it a poisonous message in the Dead Letter Queue (DLQ) itself.

That is exactly what happened in one case. One of the services published a message with a payload slightly larger than 64 MB, while our consumers were using the default maximum inbound message size of the rabbitmq-java-client library (pulled in by Spring AMQP), which is 64 MB.

Default maximum consumer message size in the RabbitMQ Client ConnectionFactory.java class
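
One way to catch this earlier is to check the payload size on the publishing side. The sketch below is only an illustration: the class and method names are hypothetical, and it assumes consumers keep the 64 MiB default. It also conservatively treats a body of exactly the limit as too large, since the exact boundary check may differ between client versions.

```java
/** Hypothetical publisher-side guard against bodies the consumers cannot receive. */
final class PayloadSizeGuard {

    /** Assumed consumer-side limit: the client's 64 MiB default. */
    static final long MAX_BODY_BYTES = 64L * 1024 * 1024;

    private PayloadSizeGuard() {}

    /** Returns true only if the body is strictly smaller than the assumed limit. */
    static boolean fitsConsumerLimit(long bodySizeBytes) {
        return bodySizeBytes >= 0 && bodySizeBytes < MAX_BODY_BYTES;
    }

    /** Fails fast before publishing instead of poisoning a queue later. */
    static void requireConsumable(long bodySizeBytes) {
        if (!fitsConsumerLimit(bodySizeBytes)) {
            throw new IllegalArgumentException(
                "Message body of " + bodySizeBytes + " bytes exceeds the consumer limit of "
                    + MAX_BODY_BYTES + " bytes");
        }
    }
}
```

A publisher would call `PayloadSizeGuard.requireConsumable(body.length)` before publishing. If larger messages are legitimate, recent client versions also let you raise the limit through `ConnectionFactory#setMaxInboundMessageBodySize`; check that your version has it.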

When the DLQ consumer tried to consume the message, it threw the following error:

Stack trace with error

Although we had added x-delivery-limit to all service queues, we forgot to add it to the DLQ as well. So when the message could not be delivered, it was returned to the broker and retried again and again, in an infinite loop.

But that was not the only problem, nor the worst one. When a channel is faced with a message this big, the current client implementation shuts the channel down and stops the connection from receiving work:

Method handleFailure in RabbitMQ Client AMQConnection.java class

As described in the docs.

shutdown method on RabbitMQ AMQConnection.java class

So, besides being poisonous, the message was also killing the connection, which affected all the other channels and the consumers of other queues.

This connection shutdown caused multiple messages to be delivered more than once: the connection was killed before their acknowledgments reached the broker, so the broker delivered all of those messages again.

That is why it is also important to set x-delivery-limit on the DLQ, so that it can also drop messages whose body is larger than the consumers allow (or that fail for other unforeseen reasons).
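
As a rough sketch of the fix, the DLQ can be declared with its own delivery limit and no dead-letter exchange. The class and method names are hypothetical, and the sketch assumes the DLQ is itself a quorum queue, because x-delivery-limit only applies to quorum queues. With no dead-letter exchange configured, a message that exceeds the limit is dropped by the broker instead of being redelivered forever.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: builds the arguments for a service's dead letter queue. */
final class DeadLetterQueueArguments {

    private DeadLetterQueueArguments() {}

    static Map<String, Object> build(int deliveryLimit) {
        if (deliveryLimit < 1) {
            throw new IllegalArgumentException("deliveryLimit must be at least 1");
        }
        Map<String, Object> arguments = new HashMap<>();
        arguments.put("x-queue-type", "quorum");          // assumption: the DLQ is a quorum queue
        arguments.put("x-delivery-limit", deliveryLimit); // the argument we had forgotten on the DLQ
        // Deliberately no x-dead-letter-exchange: over-limit messages are dropped.
        return arguments;
    }
}
```

As with the service queues, the map would be passed as the arguments parameter of `channel.queueDeclare(...)` when declaring the DLQ.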

Happy coding!