Tasks about ELK (Elasticsearch, Logstash, Kibana), StatsD, Grafana.
It's all about monitoring, metrics and graphs.
You want data? Let's collect it, transform it, store it, graph it.
Both naemon-core and naemon-livestatus deployed successfully on WindRiver. Checks run as expected.
At a minimum, having somewhere (a reports repository?) to write those report queries would already be useful, so we don't lose them.
This deployment created the incident:
If one of the two topics lags, we'd have:
$ notification-push --project Nasqueron --group ops --service monitoring --type autoheal.kafka_offset.start --text "Containers sentry_post_process_forwarder_ have an issue. Identified as Kafka offset issue. Starting automatic healing procedure."
$ notification-push --project Nasqueron --group ops --service monitoring --type autoheal.kafka_offset.done --text "Containers sentry_post_process_forwarder_ automatic healing done. Containers should be alive."
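As a rough sketch of how the two notifications above could bracket a healing step, the wrapper below sends the start message, runs whatever healing command it is given, and sends the done message only if that command succeeds. The names `NOTIFY_BIN`, `notify` and `heal_with_notifications` are hypothetical and exist only for this example; the healing command itself is not described here, so it is left to the caller. Only the `notification-push` flags shown above are used.

```shell
#!/bin/sh
# Sketch only: wrap a healing command with start/done notifications.
# NOTIFY_BIN, notify and heal_with_notifications are hypothetical names.
NOTIFY_BIN="${NOTIFY_BIN:-notification-push}"

notify() {
    # $1 = notification type, $2 = message text
    "$NOTIFY_BIN" --project Nasqueron --group ops --service monitoring \
        --type "$1" --text "$2"
}

heal_with_notifications() {
    # "$@" = the healing command to run (assumption: supplied by the caller).
    notify autoheal.kafka_offset.start \
        "Containers sentry_post_process_forwarder_ have an issue. Identified as Kafka offset issue. Starting automatic healing procedure."

    # Stop here, without a "done" notification, if the healing command fails.
    "$@" || return $?

    notify autoheal.kafka_offset.done \
        "Containers sentry_post_process_forwarder_ automatic healing done. Containers should be alive."
}
```

It would be called as `heal_with_notifications some-healing-command args...`, where `some-healing-command` stands for whatever actually fixes the offsets. Skipping the done message on failure is a design choice of this sketch, so that a failed healing attempt is not reported as completed.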
We don't use uptime anymore.