We have a service consuming from Kafka that stopped consuming without failing.
All we saw in the logs was the following:
KafkaJS crashed because the coordinator was loading. It crashed multiple times and restarted the consumer every time.
Even after the restarts, it was unable to connect to the partition leader and logged the error below.
KafkaJSProtocolError: This server is not the leader for that topic-partition
    at createErrorFromCode (/usr/src/node_modules/kafkajs/src/protocol/error.js:537:10)
    at Object.parse (/usr/src/node_modules/kafkajs/src/protocol/requests/listOffsets/v2/response.js:43:11)
    at Connection.send (/usr/src/node_modules/kafkajs/src/network/connection.js:311:35)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at async Broker.listOffsets (/usr/src/node_modules/kafkajs/src/broker/index.js:413:20)
    at async /usr/src/node_modules/kafkajs/src/cluster/index.js:419:43
    at async Promise.all (index 0)
    at async Cluster.fetchTopicsOffset (/usr/src/node_modules/kafkajs/src/cluster/index.js:431:23)
    at async /usr/src/node_modules/kafkajs/src/admin/index.js:193:22
When the Kafka consumer came back, we saw the error below: "error":"This is not the correct coordinator for this group"
Checking the Kafka logs for the same period, we saw that the consumer group was being disconnected from the Kafka coordinator.
Once the consumer was disconnected, the service retried connecting and kept failing with the retriable "Kafkajs Crashed" error. After we bounced the service, everything started working correctly.
Initial investigation indicated that KafkaJS's cluster metadata (which records which broker leads which topic-partitions) might be stale, so it errors every single time we try to retrieve topic offsets. The fact that it worked after a restart points in that direction as well.
As a workaround (assuming the issue can't be found or fixed in KafkaJS), we could terminate the Kafka input source if the code that fetches the topic offsets for metrics keeps getting the "server is not the leader for that topic-partition" error.
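The workaround could be sketched roughly as follows. This is only an illustration under assumptions: `createLeaderErrorGuard`, `fetchOffsets`, `onGiveUp`, and `maxConsecutiveFailures` are hypothetical names, not part of KafkaJS, and the sketch assumes the thrown error carries `type: 'NOT_LEADER_FOR_PARTITION'`, which is the KafkaJS protocol error type for the message quoted above.

```javascript
// Hypothetical helper (not a KafkaJS API): wraps an offset-fetching function
// and signals when the "not the leader" error occurs too many times in a row.
function createLeaderErrorGuard({ fetchOffsets, onGiveUp, maxConsecutiveFailures = 5 }) {
  let consecutiveFailures = 0;

  return async function guardedFetch() {
    try {
      const offsets = await fetchOffsets();
      consecutiveFailures = 0; // any success ends the failure streak
      return offsets;
    } catch (err) {
      // Assumption: the error exposes a `type` field, as KafkaJSProtocolError does.
      if (err && err.type === 'NOT_LEADER_FOR_PARTITION') {
        consecutiveFailures += 1;
        // Fire exactly once per streak, when the threshold is first reached.
        if (consecutiveFailures === maxConsecutiveFailures) {
          onGiveUp(err);
        }
        return null; // no offsets available for this metrics tick
      }
      throw err; // unrelated errors are left to the caller
    }
  };
}
```

In the service, `fetchOffsets` would presumably be something like `() => admin.fetchTopicOffsets(topic)`, and `onGiveUp` would stop the Kafka input source so that the process gets restarted. The threshold of 5 is arbitrary and would need tuning against how long a normal leader election takes.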
The crash that occurred while the coordinator was loading was logged as: "msg":["Kafkajs Crashed"],"err":[{"error":{"name":"KafkaJSProtocolError","retriable":true,"type":"GROUP_LOAD_IN_PROGRESS","code":14