Tasks about ELK (Elasticsearch, Logstash, Kibana), StatsD, Grafana.
It's all about monitoring, metrics and graphs.
You want data? Let's collect it, transform it, store it, graph it.
At a minimum, having somewhere (a reports repository?) to write those report queries would already be useful, so we don't lose them.
This deployment created the incident:
If one of the two topics lags, we'd have:
$ notification-push --project Nasqueron --group ops --service monitoring --type autoheal.kafka_offset.start --text "Containers sentry_post_process_forwarder_ have an issue. Identified as Kafka offset issue. Starting automatic healing procedure."
$ notification-push --project Nasqueron --group ops --service monitoring --type autoheal.kafka_offset.done --text "Containers sentry_post_process_forwarder_ automatic healing done. Containers should be alive."
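A minimal sketch of how those two notifications could bracket a healing step. The function names notify_autoheal and autoheal and the NOTIFY variable are hypothetical, and the healing command itself is not described here, so it is passed in as an argument. Sending the done notification only when the healing command succeeds is an assumption, not something stated above.

```shell
#!/bin/sh
# Hypothetical wrapper: NOTIFY defaults to the notification-push command
# shown above; override it (e.g. for a dry run) by setting NOTIFY.
NOTIFY="${NOTIFY:-notification-push}"

# Send one autoheal notification.
# $1 is the phase (start or done), $2 is the notification text.
notify_autoheal() {
    $NOTIFY --project Nasqueron --group ops --service monitoring \
        --type "autoheal.kafka_offset.$1" --text "$2"
}

# Run a healing command ("$@") between the start and done notifications.
# Assumption: the done notification is skipped if the healing command fails.
autoheal() {
    notify_autoheal start "Containers sentry_post_process_forwarder_ have an issue. Identified as Kafka offset issue. Starting automatic healing procedure."
    "$@" || return 1
    notify_autoheal done "Containers sentry_post_process_forwarder_ automatic healing done. Containers should be alive."
}
```

It would be called as autoheal followed by whatever command performs the healing; that command is left to the caller.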
We don't use uptime anymore.