
Collect metrics from RabbitMQ
Open, Normal, Public

Description

Plan:

  • Prometheus: Ensure Prometheus is deployed (T1623)
  • Docker: in rOPS pillar/paas/docker.sls, enable port 15692 in rabbitmq_ports (currently commented out) -> D3374
  • RabbitMQ: ensure the cluster has a name, regardless of whether it runs 1 or 3 nodes
  • RabbitMQ: enable the rabbitmq_prometheus plugin -> D3373 (see the sketch after this list)
  • Prometheus: configure RabbitMQ scraping -> D3376; we still need to configure a 15 s scrape interval
  • Grafana: create dashboards - see https://grafana.com/orgs/rabbitmq
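For illustration, enabling the plugin and naming the cluster by hand could look as follows. This is only a sketch run inside the RabbitMQ container: in practice the changes land through D3373 and the Salt states, and the cluster name below is a placeholder.

docker-002 :: white-rabbit
$ rabbitmq-plugins enable rabbitmq_prometheus
$ rabbitmqctl set_cluster_name rabbit@white-rabbit.nasqueron.org
$ rabbitmq-diagnostics -q cluster_status | grep "Cluster name"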

Reference: https://www.rabbitmq.com/prometheus.html

Event Timeline

Pending the container redeployment for D3374, we can reach the metrics endpoint enabled by D3373 with socat:

docker-002
$ socat TCP4-LISTEN:15692,fork,reuseaddr,bind=172.27.27.5 TCP4:172.17.0.3:15692 &
[1] 2641449
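
With the forward in place, the endpoint can be spot-checked from the host. A quick sketch, using the addresses from the socat command above:

docker-002
$ curl -s http://172.27.27.5:15692/metrics | head -n 5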
dereckson triaged this task as Normal priority. Nov 10 2024, 10:50

The Grafana dashboard was full of N/A values.

Redeployed the container to apply the D3374 change with deploy-container rabbitmq.

After a while, curl http://172.27.27.5:15692/metrics returns 2050 lines in OpenMetrics format.

Grafana now shows data for a pretty quiet RabbitMQ installation.

Checking the RabbitMQ Monitoring with Prometheus guide:

  • we're OK for cluster name
  • to get sensible values for rate() in Grafana, we need Prometheus to scrape RabbitMQ every 15 s; according to the Prometheus configuration documentation, scrape_interval can be set at the job level

Plan:

  • add the scrape interval value in pillar/observability/prometheus.sls
  • add the parameter to the SCRAPE_CONFIG_OPTIONS_PASSTHROUGH list in _modules/prometheus.py (or to SCRAPE_CONFIG_OPTIONS_RENAME if a name other than scrape_interval is used)
  • we should then be able to test the configuration with salt-call --local saltutil.sync_all && salt-call --local prometheus.get_scrape_configs (see below)
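
For reference, the test sequence from the last item, run in masterless mode on the Prometheus minion (a sketch; the exact output depends on the local pillar):

$ salt-call --local saltutil.sync_all
$ salt-call --local prometheus.get_scrape_configs

Once deployed, the loaded scrape configuration can also be checked against the running Prometheus through its HTTP API, e.g. curl -s http://localhost:9090/api/v1/status/config, assuming Prometheus listens on its default port 9090.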

docker-002 :: white-rabbit
$ rabbitmq-diagnostics -q cluster_status

Basics
      
Cluster name: rabbit@white-rabbit.nasqueron.org
Total CPU cores available cluster-wide: 8

[…]