Prometheus is available, regardless of the initial goal to offer a service mesh on Kubernetes.
Aug 4 2024
We already have statistics from Prometheus Node Exporter.
Documentation added in https://agora.nasqueron.org/Operations_grimoire/Grafana and links to other dashboards added to relevant places.
Deployed D3248 to docker-002.
Just a small note: this product is becoming more and more open core, so we're less in favour of this one specifically.
Aug 3 2024
From router-001, the network looks good:
Stopped salt and node-exporter on router-001, as they are not currently needed, to see if that helps.
Could be at hypervisor level. SSH failed until 13:22, when it worked immediately.
As of 13:18 UTC, SSH access works.
Also, at the same time, DevCentral is slow for arc diff or to publish this task. This delay is similar to what we see when DNS resolution times out.
$ salt-minion --versions
Salt Version:
    Salt: 3007.1
We can actually provide P352 as a hotfix.
patch is available on Eglide as part of build-essential, so presumed OK for Debian
certbot against Python 3.11 should be checked on dwellers and docker-002
I've applied P352 to replace egrep with grep -E on dwellers and docker-002.
I wanted to apply P354 to fix the Salt SELinux issue with patch -p1 < ~/egrep.patch on docker-002.
Jul 31 2024
Already reported upstream: https://github.com/saltstack/salt/issues/65608
$ cd /opt/salt/nasqueron-operations
$ salt dwellers state.apply roles/webserver-core/nginx/config
[…]
----------
[3/295]
          ID: selinux_context_nginx_logs
    Function: selinux.fcontext_policy_present
        Name: /var/log/www
      Result: False
     Comment: An exception occurred in this state: Traceback (most recent call last):
                File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/state.py", line 2428, in call
                  ret = self.states[cdata["full"]](
                File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 160, in __call__
                  ret = self.loader.run(run_func, *args, **kwargs)
                File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 1269, in run
                  return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
                File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 1284, in _run_as
                  return _func_or_method(*args, **kwargs)
                File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 1317, in wrapper
                  return f(*args, **kwargs)
                File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/states/selinux.py", line 326, in fcontext_policy_present
                  current_state = __salt__["selinux.fcontext_get_policy"](
                File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 160, in __call__
                  ret = self.loader.run(run_func, *args, **kwargs)
                File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 1269, in run
                  return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
                File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 1284, in _run_as
                  return _func_or_method(*args, **kwargs)
                File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/modules/selinux.py", line 507, in fcontext_get_policy
                  "filespec": parts.group(1).strip(),
              AttributeError: 'NoneType' object has no attribute 'group'
     Started: 16:25:51.413301
    Duration: 391.186 ms
     Changes:
----------
          ID: selinux_context_nginx_logs_applied
    Function: selinux.fcontext_policy_applied
        Name: /var/log/www
      Result: True
     Comment: SElinux policies are already applied for filespec "/var/log/www"
     Started: 16:25:51.804764
    Duration: 6.322 ms
     Changes:
----------
[…]
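The traceback above ends with parts.group(1) being called on None, which is what happens when a regular expression does not match the line it was given. The following is a minimal sketch of that failure mode with a defensive guard. It is not Salt's actual code: the pattern, the function name parse_fcontext_line and the sample lines are hypothetical and only illustrate the idea.

```python
import re

# Hypothetical pattern for illustration only; Salt's real regex differs.
# Expected shape: <filespec> <file type> <user:role:type:level>
LINE_RE = re.compile(r"^(\S+)\s+(.+?)\s+(\S+:\S+:\S+:\S+)\s*$")


def parse_fcontext_line(line):
    """Return a dict for a matching line, or None when the line does not match."""
    parts = LINE_RE.match(line)
    if parts is None:
        # Without this guard, parts.group(1) raises
        # AttributeError: 'NoneType' object has no attribute 'group'
        return None
    return {
        "filespec": parts.group(1).strip(),
        "filetype": parts.group(2).strip(),
        "context": parts.group(3).strip(),
    }


# Hypothetical sample lines, not real output from the affected servers.
good = "/var/log/www(/.*)?    all files    system_u:object_r:httpd_log_t:s0"
bad = "no context here"

print(parse_fcontext_line(good))
print(parse_fcontext_line(bad))
```

Returning None for an unmatched line lets the caller decide whether to treat it as "no policy found" or as an error, instead of crashing the whole state run.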
On 2024-07-31 at 12:00, the devcentral.nasqueron.org certificate expired.
Issue can be reproduced on Dwellers:
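An expiry like this one can be caught ahead of time by comparing the certificate's notAfter date with the current time. Below is a minimal sketch, assuming the date is available in the string format returned by Python's ssl.SSLSocket.getpeercert(). The function names, the 14-day threshold and the sample date string are assumptions for illustration, not something currently deployed.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after, now=None):
    """Days left before a certificate expires (negative once expired).

    not_after uses the getpeercert() format, e.g. 'Jul 31 12:00:00 2024 GMT'.
    """
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (expires - now).total_seconds() / 86400


def needs_renewal(not_after, threshold_days=14, now=None):
    """True when fewer than threshold_days remain (threshold is an assumed value)."""
    return days_until_expiry(not_after, now) < threshold_days


# Hypothetical sample: the time zone of the real expiry is not stated, GMT is assumed.
sample = "Jul 31 12:00:00 2024 GMT"
one_week_before = datetime(2024, 7, 24, 12, 0, tzinfo=timezone.utc)
print(needs_renewal(sample, now=one_week_before))
```

Passing now explicitly keeps the check testable; in a real monitoring probe it would default to the current time.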
Jul 30 2024
Jul 29 2024
Jul 27 2024
Increasing priority, as FreeBSD 13.2 has now been EOL for a month (since 2024-06-30).
Jul 26 2024
Deployed on bare metal, not in Docker.
T651 has a Grafana ready if we wish to retest this on Dwellers, green light.
Deployed at https://grafana.nasqueron.org/
Jul 25 2024
rOPS1e9a54c10365 has worked like a charm on WindRiver to generate grafana.nasqueron.org through DNS.
DNS: grafana. CNAME www-dev.nasqueron.org
Deployment can use sqlite3 as long as it remains performant
as we want our monitoring tools to be resilient.
Probably a good fit for roles/core/monitoring when grains["os_family"] == "RedHat". Eglide has "Debian" for that grain, but not sure if we have enough RAM there.
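The grain condition above can be sketched as a small predicate. In Salt this test would more likely live in Jinja inside the state files. The Python below only illustrates the logic, and the function name in_monitoring_scope and the sample grains dictionaries are hypothetical.

```python
def in_monitoring_scope(grains):
    """True when the minion reports a RHEL-like os_family grain."""
    return grains.get("os_family") == "RedHat"


# Hypothetical grains for illustration; only Eglide's "Debian" value comes from the note above.
minions = {
    "example-rhel-like": {"os_family": "RedHat"},
    "eglide": {"os_family": "Debian"},
}

for name, grains in minions.items():
    print(name, in_monitoring_scope(grains))
```

Using grains.get() rather than grains["os_family"] means a minion with a missing grain is simply left out of scope instead of raising KeyError.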
Just for reference, this was a test deployment. This is not currently installed on Dwellers, and needs to be in Salt as part of T650.
This task was created in 2016 to publish metrics from PCP (Performance Co-Pilot) on RHEL-like servers, especially our Docker engines.
RabbitMQ exporters have been added to NetBox under the tag observability -> https://netbox.nasqueron.org/ipam/services/?tag=observability 🔒
Next: memcached