Published events are pending in the stream
Symptom
You publish events, but some of them are not received by the subscriber and stay pending in the stream.
Cause
When the NATS EventingBackend has more than one replica, and the Clustering property on the NATS Server is enabled, one replica is elected as a leader on the stream and consumer levels (see NATS Documentation).
When the leader is elected, all the messages are replicated across the replicas.
Sometimes a replica can go out of sync with the other replicas. As a result, messages on some consumers can stop being acknowledged and start piling up in the stream.
Remedy
To fix the "broken" consumers with pending messages, trigger a leader reelection. You can do this either on the consumers that have pending messages, or if that fails, on the stream level.
You need the latest version of NATS CLI installed on your machine.
Consumer leader reelection
First, find out which consumer(s) have pending messages. You can find the broken consumer either with the NATS CLI command or with a Grafana dashboard.
Option 1: Find the broken consumers with NATS CLI
- Port forward to a NATS replica:

      kubectl port-forward -n kyma-system eventing-nats-0 4222

- Run this shell script:

      for consumer in $(nats consumer list -n sap) # sap is the stream name
      do
        nats consumer info sap "$consumer" -j | jq -c '{name: .name, pending: .num_pending, leader: .cluster.leader}'
      done

  You get an output like the following:

      {"name":"ebcabfe5c902612f0ba3ebde7653f30b","pending":25,"leader":"eventing-nats-1"}
      {"name":"c74c20756af53b592f87edebff67bdf8","pending":0,"leader":"eventing-nats-0"}

- Check the output to see which consumer has pending messages and which replica is the leader. In this example, the consumer ebcabfe5c902612f0ba3ebde7653f30b has 25 pending messages and has eventing-nats-1 as its leader. The other consumer has no pending messages and is successfully processing events.
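
When the stream has many consumers, it can help to show only the ones that actually have pending messages. The following is a minimal sketch, not part of the NATS CLI: `filter_pending` is a hypothetical helper name, and it assumes the input is the compact JSON produced by the script above (one object per line with a numeric `pending` field).

```shell
# Hypothetical helper: reads compact JSON objects (one per line) on stdin
# and prints only those whose "pending" field is greater than 0.
filter_pending() {
  jq -c 'select(.pending > 0)'
}

# Usage, assuming the port-forward from the first step is still running:
#   for consumer in $(nats consumer list -n sap); do
#     nats consumer info sap "$consumer" -j \
#       | jq -c '{name: .name, pending: .num_pending, leader: .cluster.leader}'
#   done | filter_pending
```

With the example output shown above, only the line for ebcabfe5c902612f0ba3ebde7653f30b would remain.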
Option 2: Find the broken consumers using Grafana dashboard
- Access and Expose Grafana.
- Find the NATS JetStream Dashboard and check the pending messages:
- Find the consumer with pending messages and encode it as an md5 hash:

      echo -n "tunas-testing/test-noapp3/kyma.noapp.order.created.v1" | md5

  This shell command results in ebcabfe5c902612f0ba3ebde7653f30b.

- Port forward to a NATS replica:

      kubectl port-forward -n kyma-system eventing-nats-0 4222

- Get information about the consumer:

      nats consumer info sap ebcabfe5c902612f0ba3ebde7653f30b

- In the output, find the consumer's leader. In the following example, the leader is the eventing-nats-1 replica:

      Information for Consumer sap > ebcabfe5c902612f0ba3ebde7653f30b created 2022-10-24T15:49:43+02:00

      Configuration:

          Name: ebcabfe5c902612f0ba3ebde7653f30b
          Description: tunas-testing/test-noapp3/kyma.noapp.order.created.v1
      ...

      Cluster Information:

          Name: eventing-nats
          Leader: eventing-nats-1 # that's what we need
          Replica: eventing-nats-0, current, seen 0.96s ago
          Replica: eventing-nats-2, current, seen 0.96s ago
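
The `md5` command used in the hashing step is available on macOS and BSD systems; many Linux distributions ship `md5sum` instead. The following sketch wraps both behind one function. `md5_hex` is a hypothetical helper name introduced here for illustration, not something provided by Kyma or NATS.

```shell
# Hypothetical helper: prints the MD5 hex digest of its first argument.
md5_hex() {
  if command -v md5sum >/dev/null 2>&1; then
    # GNU md5sum prints "<digest>  -" for stdin; keep only the digest.
    printf '%s' "$1" | md5sum | cut -d' ' -f1
  else
    # macOS/BSD md5 prints only the digest when it reads from stdin.
    printf '%s' "$1" | md5
  fi
}

# Usage with the consumer description from the example above:
#   md5_hex "tunas-testing/test-noapp3/kyma.noapp.order.created.v1"
```

`printf '%s'` is used instead of `echo -n` because it never appends a trailing newline, which would change the digest.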
Trigger the consumer leader reelection
Knowing the name of the broken consumer and its leader, you can trigger the reelection:
- Port forward to the leader replica:

      kubectl port-forward -n kyma-system eventing-nats-1 4222

- Trigger the leader reelection for that broken consumer:

      nats consumer cluster step-down sap ebcabfe5c902612f0ba3ebde7653f30b

  After execution, you see a message like the following:

      New leader elected "eventing-nats-2"

      Information for Consumer sap > ebcabfe5c902612f0ba3ebde7653f30b created 2022-10-24T15:49:43+02:00

- Check the consumer and confirm that the pending messages started to be dispatched.
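
One way to confirm that messages are being dispatched again is to read the pending count twice and compare the two readings. The sketch below assumes the same `nats consumer info ... -j` output and `.num_pending` field used earlier in this guide; `pending_count` and `drain_status` are hypothetical helper names, and the 30-second wait in the usage lines is an arbitrary choice.

```shell
# Hypothetical helper: prints the current number of pending messages
# for consumer $2 on stream $1 (requires the nats CLI and jq).
pending_count() {
  nats consumer info "$1" "$2" -j | jq '.num_pending'
}

# Hypothetical helper: compares two readings of the pending count.
# Prints "draining" if the second reading is lower, otherwise "stuck".
drain_status() {
  if [ "$2" -lt "$1" ]; then
    echo draining
  else
    echo stuck
  fi
}

# Usage, with a port-forward to a NATS replica running:
#   before=$(pending_count sap ebcabfe5c902612f0ba3ebde7653f30b)
#   sleep 30
#   after=$(pending_count sap ebcabfe5c902612f0ba3ebde7653f30b)
#   drain_status "$before" "$after"
```

Treat "stuck" as a hint rather than proof: if new events keep arriving faster than they are delivered, the count can stay flat or grow even though the consumer is working.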
Stream leader reelection
Sometimes triggering the leader reelection on the broken consumers doesn't work. In that case, you must restart the NATS Pods to trigger leader reelection on the stream level.
- Run the NATS command:

      nats stream cluster step-down sap

- Check that your result looks like the following example:

      11:08:22 Requesting leader step down of "eventing-nats-1" in a 3 peer RAFT group
      11:08:23 New leader elected "eventing-nats-0"

      Information for Stream sap created 2022-10-24 15:47:19

      Subjects: kyma.>
      Replicas: 3
      Storage: File
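
To check in a script that leadership really moved to a different replica, you can extract the new leader's name from the step-down output. This is only a sketch that parses the message format shown in the example above; `elected_leader` is a hypothetical helper name, and a future CLI version may word the message differently.

```shell
# Hypothetical helper: reads step-down output on stdin and prints the
# replica name from a line such as:
#   11:08:23 New leader elected "eventing-nats-0"
elected_leader() {
  sed -n 's/.*New leader elected "\([^"]*\)".*/\1/p'
}

# Usage. The source does not state whether the CLI writes these lines to
# stdout or stderr, so both are merged here as a precaution:
#   nats stream cluster step-down sap 2>&1 | elected_leader
```

If the function prints nothing, no "New leader elected" line was found, and the output should be inspected by hand.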