T1816
Yesterday evening, we got a new Kafka lag.

Detect if one of the `sentry_post_process_forwarder_*` containers dies. If so, check its log for the error message and, if applicable, automatically apply the procedure described in the NOG: https://agora.nasqueron.org/Operations_grimoire/Sentry#Kafka

That would mean:

# Notify the operation start
# Stop both containers, as we can't update the offset as long as a client is connected to that consumer group
# Reset the Kafka offset
# Start both containers
# Notify the operation is done

If it's another error, as we'd have a script watching it anyway, it would be nice if it created a task on DevCentral with the log tail.

Notification can be done with notification-push.

Docker Tide can be used to detect the need for healing.
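The five steps above could be scripted roughly as follows. This is a minimal sketch under assumptions, not the NOG procedure itself: the second container name, the Kafka container `sentry_kafka`, the bootstrap server `kafka:9092`, the consumer group `post-process-forwarder`, the topic `events`, and calling notification-push with a single message argument are all placeholders to check against the NOG and the actual deployment. `DOCKER` and `NOTIFY` are hypothetical overrides, so the sequence can be dry-run with stubs.

```shell
# Sketch of the NOG healing sequence (hypothetical helper, not the NOG script).
# ASSUMPTIONS to check against the NOG and the deployment:
#   - the second container name, the Kafka container, bootstrap server,
#     consumer group and topic below are placeholders;
#   - notification-push is called with a single message argument.
# DOCKER and NOTIFY can be overridden, e.g. with stubs for a dry run.
DOCKER=${DOCKER:-docker}
NOTIFY=${NOTIFY:-notification-push}
FORWARDERS="sentry_post_process_forwarder_errors sentry_post_process_forwarder_transactions"
KAFKA_CONTAINER=${KAFKA_CONTAINER:-sentry_kafka}
KAFKA_BOOTSTRAP=${KAFKA_BOOTSTRAP:-kafka:9092}
KAFKA_GROUP=${KAFKA_GROUP:-post-process-forwarder}
KAFKA_TOPIC=${KAFKA_TOPIC:-events}

notify() {
    # A failed notification is reported but does not block the healing
    "$NOTIFY" "$1" || echo "notification failed: $1" >&2
}

heal_kafka_offset() {
    notify "Sentry: resetting Kafka offset of $KAFKA_GROUP"   # 1. start
    # 2. Stop both forwarders: Kafka refuses to reset the offsets of a
    #    consumer group that still has connected members.
    #    $FORWARDERS is left unquoted on purpose to split the two names.
    "$DOCKER" stop $FORWARDERS || return 1
    # 3. Move the group offset to the latest message of the topic
    "$DOCKER" exec "$KAFKA_CONTAINER" kafka-consumer-groups \
        --bootstrap-server "$KAFKA_BOOTSTRAP" \
        --group "$KAFKA_GROUP" --topic "$KAFKA_TOPIC" \
        --reset-offsets --to-latest --execute || return 1
    # 4. Start both forwarders again
    "$DOCKER" start $FORWARDERS || return 1
    notify "Sentry: Kafka offset of $KAFKA_GROUP reset"         # 5. done
}
```

A non-zero status means a step failed before the "done" notification, so the caller could fall back to opening a DevCentral task instead.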
**How to detect the issue**

The forwarder writes critical errors to stderr, which `docker logs` replays on its own stderr, so first redirect stderr to stdout. We're only interested in the last line before the crash; earlier matches would be previous occurrences:

`docker logs sentry_post_process_forwarder_errors --tail=1 2>&1 | grep -qF arroyo.errors.OffsetOutOfRange`
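The check above could be wrapped into a small dispatcher deciding between the NOG procedure and a DevCentral report. A minimal sketch: the helper names are hypothetical, and polling `docker inspect -f '{{.State.Running}}'` is only an assumed stand-in for Docker Tide, whose interface isn't described here. Creating the task itself is left out.

```shell
# Marker printed by the forwarder when it crashes on an out-of-range offset
OFFSET_MARKER=arroyo.errors.OffsetOutOfRange

# Hypothetical helper: succeeds if the container is not running
# (also when it does not exist, as inspect then prints nothing).
container_is_dead() {
    [ "$("${DOCKER:-docker}" inspect -f '{{.State.Running}}' "$1" 2>/dev/null)" != "true" ]
}

# Hypothetical helper: reads a log tail on stdin and prints the action.
# Only the last line matters: earlier matches are previous occurrences.
#   reset  -> apply the NOG offset reset procedure
#   report -> another error, open a DevCentral task with the log tail
classify_crash() {
    if tail -n 1 | grep -qF "$OFFSET_MARKER"; then
        echo reset
    else
        echo report
    fi
}
```

For example, `docker logs sentry_post_process_forwarder_errors --tail=20 2>&1 | classify_crash`: fetching a few lines keeps a tail to attach to the task, while the decision still only looks at the last one.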
**Java Kafka API reference**

The procedure uses the CLI client, and the comments show we can check the LAG column before and after proceeding.

If we prefer a Java command to automate and check the operation, we can use:

- [[ https://kafka.apache.org/34/javadoc/org/apache/kafka/clients/admin/KafkaAdminClient.html#describeConsumerGroups(java.util.Collection,org.apache.kafka.clients.admin.DescribeConsumerGroupsOptions) | KafkaAdminClient.describeConsumerGroups ]]
- [[ https://kafka.apache.org/34/javadoc/org/apache/kafka/clients/admin/Admin.html#alterConsumerGroupOffsets(java.lang.String,java.util.Map) | KafkaAdminClient.alterConsumerGroupOffsets ]]
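For the CLI route, the LAG column check before and after the reset could be scripted by summing that column from `kafka-consumer-groups --describe`. A minimal sketch: the helper name is hypothetical, the column is located by its `LAG` header rather than by position, and partitions without a committed offset (shown as `-`) are skipped.

```shell
# Hypothetical helper: sums the LAG column of
# `kafka-consumer-groups --describe` output read on stdin.
# The column is located by its header; "-" (no committed offset) is skipped.
total_lag() {
    awk '
        col == 0 {                      # still looking for the header line
            for (i = 1; i <= NF; i++) if ($i == "LAG") col = i
            next
        }
        $col ~ /^[0-9]+$/ { sum += $col }
        END { print sum + 0 }
    '
}
```

For example, `docker exec sentry_kafka kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group post-process-forwarder | total_lag`, where the container, server and group names are placeholders for the ones given in the NOG.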