
Collect metrics from RabbitMQ
Open, Normal, Public

Description

Plan:

  • Prometheus: Ensure Prometheus is deployed (T1623)
  • Docker: in rOPS pillar/paas/docker.sls, enable port 15692 in rabbitmq_ports (currently commented out) -> D3374
  • RabbitMQ: ensure the cluster has a name, regardless of whether it runs 1 or 3 nodes
  • RabbitMQ: enable the rabbitmq_prometheus plugin -> D3373 (see the sketch after this list)
  • Prometheus: configure RabbitMQ scraping -> D3376; we still need to configure a 15 s scrape interval
  • Grafana: create dashboards - see https://grafana.com/orgs/rabbitmq
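For illustration, enabling the plugin and naming the cluster by hand could look as follows. This is only a sketch run inside the RabbitMQ container: in practice the changes land through D3373 and the Salt states, and the cluster name below is a placeholder.

docker-002 :: white-rabbit
$ rabbitmq-plugins enable rabbitmq_prometheus
$ rabbitmqctl set_cluster_name rabbit@white-rabbit.nasqueron.org
$ rabbitmq-diagnostics -q cluster_status | grep "Cluster name"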

Reference: https://www.rabbitmq.com/prometheus.html

Event Timeline

Pending the container redeployment for D3374, we can reach the metrics endpoint enabled by D3373 with socat:

docker-002
$ socat TCP4-LISTEN:15692,fork,reuseaddr,bind=172.27.27.5 TCP4:172.17.0.3:15692 &
[1] 2641449
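
With the forward in place, the endpoint can be spot-checked from the host. A quick sketch, using the addresses from the socat command above:

docker-002
$ curl -s http://172.27.27.5:15692/metrics | head -n 5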
dereckson triaged this task as Normal priority. Nov 10 2024, 10:50

The Grafana dashboard was full of N/A values.

Redeployed the container to apply the D3374 change with deploy-container rabbitmq.

After a while, curl http://172.27.27.5:15692/metrics returns 2050 lines in OpenMetrics format.

Grafana now shows data for a pretty quiet RabbitMQ installation.

Checking the RabbitMQ Monitoring with Prometheus guide:

  • we're OK for cluster name
  • to get sensible values for rate() in Grafana, we need Prometheus to scrape RabbitMQ every 15 s; according to the Prometheus configuration documentation, scrape_interval can be set at the job level

Plan:

  • add the scrape interval value in pillar/observability/prometheus.sls
  • add the parameter to the SCRAPE_CONFIG_OPTIONS_PASSTHROUGH list in _modules/prometheus.py (or to SCRAPE_CONFIG_OPTIONS_RENAME if a name other than scrape_interval is used)
  • we should then be able to test the configuration with salt-call --local saltutil.sync_all && salt-call --local prometheus.get_scrape_configs (see below)
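
For reference, the test sequence from the last item, run in masterless mode on the Prometheus minion (a sketch; the exact output depends on the local pillar):

$ salt-call --local saltutil.sync_all
$ salt-call --local prometheus.get_scrape_configs

Once deployed, the loaded scrape configuration can also be checked against the running Prometheus through its HTTP API, e.g. curl -s http://localhost:9090/api/v1/status/config, assuming Prometheus listens on its default port 9090.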

docker-002 :: white-rabbit
$ rabbitmq-diagnostics -q cluster_status

Basics
      
Cluster name: rabbit@white-rabbit.nasqueron.org
Total CPU cores available cluster-wide: 8

[…]