Aug 4 2024

dereckson moved T1897: Provision development sites on WindRiver from Backlog to To watch on the Alkane board.
Aug 4 2024, 16:51 · Alkane, Servers
dereckson moved T1898: Decommission Ysul from Backlog to To watch on the Alkane board.
Aug 4 2024, 16:50 · Dæghrefn, Alkane, discussion, Servers
dereckson closed T1623: Deploy Prometheus to gain observability as Resolved.
Aug 4 2024, 16:34 · Monitoring and reporting, Operations sprints (Consolidate them all), Servers
dereckson closed T1623: Deploy Prometheus to gain observability, a subtask of T1621: Prepare a more flexible containers platform, as Resolved.
Aug 4 2024, 16:34 · Operations sprints (Consolidate them all), Servers
dereckson closed T1623: Deploy Prometheus to gain observability, a subtask of T1633: Collect metrics from RabbitMQ, as Resolved.
Aug 4 2024, 16:34 · Operations sprints (Consolidate them all), Servers
dereckson claimed T1623: Deploy Prometheus to gain observability.

Prometheus is available, regardless of the initial goal to offer a service mesh on Kubernetes.

Aug 4 2024, 16:34 · Monitoring and reporting, Operations sprints (Consolidate them all), Servers
dereckson added a project to T650: Deploy PCP on Docker engines: Monitoring and reporting.
Aug 4 2024, 16:32 · Monitoring and reporting, Servers
dereckson closed T650: Deploy PCP on Docker engines as Wontfix.

We already have statistics from Prometheus Node Exporter.

Aug 4 2024, 16:32 · Monitoring and reporting, Servers
dereckson closed T651: Deploy Grafana as Resolved.
Aug 4 2024, 16:30 · Monitoring and reporting, Operations sprints (Ignite Alkane Propulsion), Servers
dereckson closed T651: Deploy Grafana, a subtask of T650: Deploy PCP on Docker engines, as Resolved.
Aug 4 2024, 16:30 · Monitoring and reporting, Servers
dereckson added a comment to T651: Deploy Grafana.

Documentation added in https://agora.nasqueron.org/Operations_grimoire/Grafana and links to other dashboards added to relevant places.

Aug 4 2024, 16:30 · Monitoring and reporting, Operations sprints (Ignite Alkane Propulsion), Servers
dereckson moved T1505: Automate Let's Encrypt TLS certificates management for every server from Backlog to Pending review on the Servers board.
Aug 4 2024, 14:00 · Servers
dereckson added a revision to T1505: Automate Let's Encrypt TLS certificates management for every server: D3406: Don't deploy duplicate renewal service on Debian and RedHat.
Aug 4 2024, 13:59 · Servers
dereckson added a comment to T1505: Automate Let's Encrypt TLS certificates management for every server.

Deployed D3248 to docker-002.

Aug 4 2024, 13:52 · Servers
dereckson added a revision to T1505: Automate Let's Encrypt TLS certificates management for every server: D3404: Read services variable from map.
Aug 4 2024, 11:47 · Servers
dereckson placed T1693: Evaluate Sensu for monitoring up for grabs.

Just a small note: this product is becoming more and more open core, so we're less in favour of this one specifically.

Aug 4 2024, 09:54 · Servers, Monitoring and reporting, Product evaluation
dereckson moved T1998: Resolve conflict between core and shellserver roles for Vault in Salt configuration from Working on to Pending review on the Servers board.
Aug 4 2024, 09:53 · Vault, Servers, Salt, Eglide
dereckson added a revision to T1998: Resolve conflict between core and shellserver roles for Vault in Salt configuration: D3401: Resolve conflict for Salt Vault configuration.
Aug 4 2024, 09:50 · Vault, Servers, Salt, Eglide
dereckson moved T1998: Resolve conflict between core and shellserver roles for Vault in Salt configuration from Backlog to Working on on the Servers board.
Aug 4 2024, 09:29 · Vault, Servers, Salt, Eglide
dereckson claimed T1998: Resolve conflict between core and shellserver roles for Vault in Salt configuration.
Aug 4 2024, 09:29 · Vault, Servers, Salt, Eglide

Aug 3 2024

dereckson added a revision to T1475: Provision a mail server: D3400: Cleanup LXC-specific mailserver units.
Aug 3 2024, 21:17 · Mail, Restricted Project, Servers
dereckson moved T1693: Evaluate Sensu for monitoring from In progress to Backlog on the User-Dereckson board.
Aug 3 2024, 19:50 · Servers, Monitoring and reporting, Product evaluation
dereckson closed T1989: Merge Nasqueron infrastructure reference into ops grimoire as Resolved.
Aug 3 2024, 19:49 · Mail, Nasqueron Operations Squad, Servers, documentation
dereckson added a comment to T1989: Merge Nasqueron infrastructure reference into ops grimoire.

https://agora.nasqueron.org/Nasqueron_infrastructure_reference -> https://agora.nasqueron.org/Operations_grimoire
https://agora.nasqueron.org/Nasqueron_infrastructure_reference/Mail -> https://agora.nasqueron.org/Operations_grimoire/Mail

Aug 3 2024, 19:49 · Mail, Nasqueron Operations Squad, Servers, documentation
dereckson claimed T1989: Merge Nasqueron infrastructure reference into ops grimoire.
Aug 3 2024, 19:45 · Mail, Nasqueron Operations Squad, Servers, documentation
dereckson moved T1991: Context has again been lost on /var/log/www from Backlog to To check again on the upstream board.
Aug 3 2024, 16:32 · upstream, Regression, Servers, Salt
dereckson moved T1991: Context has again been lost on /var/log/www from Working on to Pending review on the Servers board.
Aug 3 2024, 16:32 · upstream, Regression, Servers, Salt
dereckson moved T1992: Install patch on redhat family as part of core from Working on to Pending review on the Servers board.
Aug 3 2024, 16:32 · Servers
dereckson moved T1994: Upgrade Salt repository on Debian from Backlog to Pending review on the Servers board.
Aug 3 2024, 16:32 · Eglide, Servers, Salt
dereckson moved T1998: Resolve conflict between core and shellserver roles for Vault in Salt configuration from Backlog to Bug and issues on the Salt board.
Aug 3 2024, 16:26 · Vault, Servers, Salt, Eglide
dereckson moved T1998: Resolve conflict between core and shellserver roles for Vault in Salt configuration from Backlog to Server config on the Eglide board.
Aug 3 2024, 16:26 · Vault, Servers, Salt, Eglide
dereckson added projects to T1998: Resolve conflict between core and shellserver roles for Vault in Salt configuration: Servers, Vault.
Aug 3 2024, 16:24 · Vault, Servers, Salt, Eglide
dereckson added a revision to T1994: Upgrade Salt repository on Debian: D3393: Update Salt repository on Debian.
Aug 3 2024, 15:44 · Eglide, Servers, Salt
dereckson renamed T1994: Upgrade Salt repository on Debian from Upgrade Salt on Eglide to Upgrade Salt repository on Debian.
Aug 3 2024, 15:17 · Eglide, Servers, Salt
dereckson added a comment to T1996: Servers on hyper-001 have network issues.

From router-001 network looks good:

Aug 3 2024, 13:59 · security, Servers
dereckson added a comment to T1996: Servers on hyper-001 have network issues.

Stopped salt and node-exporter on router-001, which aren't currently needed, to see if that helps.

Aug 3 2024, 13:58 · security, Servers
dereckson renamed T1996: Servers on hyper-001 have network issues from Server outage: complector to Servers on hyper-001 have network issues.
Aug 3 2024, 13:23 · security, Servers
dereckson shifted T1996: Servers on hyper-001 have network issues from the S1 Nasqueron space to the Restricted Space space.
Aug 3 2024, 13:23 · security, Servers
dereckson lowered the priority of T1996: Servers on hyper-001 have network issues from Unbreak Now! to High.

Could be at the hypervisor level. SSH failed until 13:22, when it started working immediately.

Aug 3 2024, 13:23 · security, Servers
dereckson added a comment to T1996: Servers on hyper-001 have network issues.

As of 13:18 UTC, SSH access works.

Aug 3 2024, 13:19 · security, Servers
dereckson added a comment to T1996: Servers on hyper-001 have network issues.

Also, at the same time, DevCentral is slow for arc diff or to publish this task. The delay is similar to what happens when DNS resolution times out.

Aug 3 2024, 13:14 · security, Servers
dereckson triaged T1996: Servers on hyper-001 have network issues as Unbreak Now! priority.
Aug 3 2024, 13:13 · security, Servers
dereckson added a revision to T1991: Context has again been lost on /var/log/www: D3392: Avoid egrep in Salt code base.
Aug 3 2024, 13:09 · upstream, Regression, Servers, Salt
dereckson added a comment to T1994: Upgrade Salt repository on Debian.
Eglide
$ salt-minion --versions
Salt Version:
          Salt: 3007.1
Aug 3 2024, 13:01 · Eglide, Servers, Salt
dereckson added a subtask for T1950: Deploy PHP 8.3: T1995: PHP 8.2 and PHP 8.3 seems both to be installed on Eglide.
Aug 3 2024, 12:59 · Servers, PHP 8.x support
dereckson edited projects for T1994: Upgrade Salt repository on Debian, added: Servers, Eglide; removed Nasqueron Operations Squad, discussion.
Aug 3 2024, 12:56 · Eglide, Servers, Salt
dereckson added a revision to T1982: Upgrade from Python 3.9 to Python 3.11+: D3391: Bump Sphinx package name to use Python3.11.
Aug 3 2024, 12:18 · Servers
dereckson claimed T1991: Context has again been lost on /var/log/www.

We can actually provide P352 as a hotfix.

Aug 3 2024, 12:13 · upstream, Regression, Servers, Salt
dereckson added a revision to T1992: Install patch on redhat family as part of core: D3390: Install patch on RHEL-family servers.
Aug 3 2024, 12:08 · Servers
dereckson added a comment to T1992: Install patch on redhat family as part of core.

patch is available on Eglide as part of build-essential, so presumed OK for Debian

Aug 3 2024, 12:06 · Servers
dereckson added a parent task for T1992: Install patch on redhat family as part of core: T1991: Context has again been lost on /var/log/www.
Aug 3 2024, 12:04 · Servers
dereckson added a subtask for T1991: Context has again been lost on /var/log/www: T1992: Install patch on redhat family as part of core.
Aug 3 2024, 12:04 · upstream, Regression, Servers, Salt
dereckson added a comment to T1982: Upgrade from Python 3.9 to Python 3.11+.

certbot against Python 3.11 should be checked on dwellers and docker-002

Aug 3 2024, 11:45 · Servers
dereckson moved T1992: Install patch on redhat family as part of core from Backlog to Working on on the Servers board.
Aug 3 2024, 11:30 · Servers
dereckson lowered the priority of T1991: Context has again been lost on /var/log/www from High to Normal.
Aug 3 2024, 10:11 · upstream, Regression, Servers, Salt
dereckson added a comment to T1991: Context has again been lost on /var/log/www.

I've applied P352 to replace egrep with grep -E on dwellers and docker-002.

Aug 3 2024, 10:11 · upstream, Regression, Servers, Salt
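The content of P352 is not shown in this feed. As a rough illustration only of the kind of substitution described, here is a minimal Python sketch that rewrites a standalone `egrep` command word to `grep -E` in a command string. The function name `replace_egrep` and the sample commands are hypothetical and are not taken from the Salt code base.

```python
import re

# Match "egrep" only as a standalone command word, so that names such as
# "myegrep" or "egrep-wrapper" are left untouched.
EGREP_WORD = re.compile(r"(?<![\w-])egrep(?![\w-])")


def replace_egrep(command):
    """Return the command with each standalone egrep replaced by grep -E."""
    return EGREP_WORD.sub("grep -E", command)


if __name__ == "__main__":
    # Hypothetical command, for illustration only.
    print(replace_egrep("semanage fcontext -l | egrep '/var/log/www'"))
```

The lookbehind and lookahead keep the replacement from touching longer identifiers, at the cost of not handling quoting or shell aliases.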
dereckson added a comment to T1992: Install patch on redhat family as part of core.

I wanted to apply P354 to fix the Salt SELinux issue with patch -p1 < ~/egrep.patch on docker-002.

Aug 3 2024, 10:09 · Servers
dereckson updated the task description for T1992: Install patch on redhat family as part of core.
Aug 3 2024, 10:08 · Servers
dereckson triaged T1992: Install patch on redhat family as part of core as Normal priority.
Aug 3 2024, 10:08 · Servers

Jul 31 2024

dereckson moved T1991: Context has again been lost on /var/log/www from Backlog to Bug and issues on the Salt board.
Jul 31 2024, 16:30 · upstream, Regression, Servers, Salt
dereckson added a project to T1991: Context has again been lost on /var/log/www: upstream.
Jul 31 2024, 16:30 · upstream, Regression, Servers, Salt
dereckson added a comment to T1991: Context has again been lost on /var/log/www.

Already reported upstream: https://github.com/saltstack/salt/issues/65608

Jul 31 2024, 16:30 · upstream, Regression, Servers, Salt
dereckson added a comment to T1991: Context has again been lost on /var/log/www.
Complector
$ cd /opt/salt/nasqueron-operations
$ salt dwellers state.apply roles/webserver-core/nginx/config
[…]
----------
          ID: selinux_context_nginx_logs
    Function: selinux.fcontext_policy_present
        Name: /var/log/www
      Result: False
     Comment: An exception occurred in this state: Traceback (most recent call last):
                File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/state.py", line 2428, in call
                  ret = self.states[cdata["full"]](
                File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 160, in __call__
                  ret = self.loader.run(run_func, *args, **kwargs)
                File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 1269, in run
                  return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
                File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 1284, in _run_as
                  return _func_or_method(*args, **kwargs)
                File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 1317, in wrapper
                  return f(*args, **kwargs)
                File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/states/selinux.py", line 326, in fcontext_policy_present
                  current_state = __salt__["selinux.fcontext_get_policy"](
                File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 160, in __call__
                  ret = self.loader.run(run_func, *args, **kwargs)
                File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 1269, in run
                  return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
                File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 1284, in _run_as
                  return _func_or_method(*args, **kwargs)
                File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/modules/selinux.py", line 507, in fcontext_get_policy
                  "filespec": parts.group(1).strip(),
              AttributeError: 'NoneType' object has no attribute 'group'
     Started: 16:25:51.413301
    Duration: 391.186 ms
     Changes:
----------
          ID: selinux_context_nginx_logs_applied
    Function: selinux.fcontext_policy_applied
        Name: /var/log/www
      Result: True
     Comment: SElinux policies are already applied for filespec "/var/log/www"
     Started: 16:25:51.804764
    Duration: 6.322 ms
     Changes:
----------
[…]
Jul 31 2024, 16:27 · upstream, Regression, Servers, Salt
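The AttributeError in the traceback above is what Python raises when a regular expression match returns None and `.group()` is then called on the result. The sketch below shows the general defensive pattern, assuming a simplified, hypothetical line format; it is not Salt's actual parsing code, and the pattern, function names and sample lines are illustrative only.

```python
import re

# Hypothetical pattern for one policy line of the form:
#   <filespec>  <file type>  <user>:<role>:<type>:<level>
FCONTEXT_LINE = re.compile(
    r"^(?P<filespec>\S+)\s+(?P<filetype>.+?)\s+"
    r"(?P<user>[^:\s]+):(?P<role>[^:\s]+):(?P<type>[^:\s]+):(?P<level>\S+)\s*$"
)


def parse_fcontext_line(line):
    """Return the parsed fields as a dict, or None if the line does not match."""
    match = FCONTEXT_LINE.match(line.strip())
    if match is None:
        # Without this check, match.group(...) would raise
        # AttributeError: 'NoneType' object has no attribute 'group'.
        return None
    return match.groupdict()


def first_policy(output):
    """Return the first line of the output that parses as a policy entry."""
    for line in output.splitlines():
        parsed = parse_fcontext_line(line)
        if parsed is not None:
            return parsed
    return None
```

With this shape, an unexpected extra line in the command output (a warning, for example) is skipped instead of crashing the caller.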
DorianWinty added a comment to T1505: Automate Let's Encrypt TLS certificates management for every server.

The devcentral.nasqueron.org certificate expired on 31/07/2024 at 12:00.

Jul 31 2024, 16:24 · Servers
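An expiry like this one can be caught ahead of time by comparing a certificate's notAfter date with the current time. The following is a minimal sketch using only the Python standard library; the function name `days_until_expiry` is hypothetical, and the sample date assumes the reported 12:00 expiry was in UTC, which the comment does not state.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after, now):
    """Return the number of days (possibly negative) before a certificate expires.

    not_after uses the notAfter format found in certificates,
    e.g. "Jul 31 12:00:00 2024 GMT"; now must be a timezone-aware datetime.
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


if __name__ == "__main__":
    # Assumed expiry timestamp, for illustration only.
    remaining = days_until_expiry(
        "Jul 31 12:00:00 2024 GMT", datetime.now(timezone.utc)
    )
    print(f"{remaining:.1f} days remaining")
```

A negative result means the certificate has already expired; a monitoring check would typically alert well before the value reaches zero.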
dereckson added a comment to T1991: Context has again been lost on /var/log/www.

The issue can be reproduced on Dwellers:

Jul 31 2024, 16:23 · upstream, Regression, Servers, Salt
dereckson added a project to T1991: Context has again been lost on /var/log/www: Regression.
Jul 31 2024, 16:22 · upstream, Regression, Servers, Salt
dereckson triaged T1991: Context has again been lost on /var/log/www as High priority.
Jul 31 2024, 16:21 · upstream, Regression, Servers, Salt

Jul 30 2024

DorianWinty added a subtask for T1930: Postfix Provisioning: T1990: Export metrics for Postfix.
Jul 30 2024, 20:59 · Mail, Restricted Project, Servers

Jul 29 2024

dereckson moved T1989: Merge Nasqueron infrastructure reference into ops grimoire from Backlog - On hold pending T1475 to Checks after T1475 on the Mail board.
Jul 29 2024, 18:58 · Mail, Nasqueron Operations Squad, Servers, documentation
dereckson triaged T1989: Merge Nasqueron infrastructure reference into ops grimoire as Normal priority.
Jul 29 2024, 18:58 · Mail, Nasqueron Operations Squad, Servers, documentation

Jul 27 2024

dereckson added a revision to T1762: Deploy NetBox: D3385: Switch from fixes to flags in node pillar.
Jul 27 2024, 23:19 · Restricted Project, Servers, Drake network
dereckson raised the priority of T1939: Implement blue/green deployment or immutable artefacts for router-001 from Low to Normal.

Increasing priority as FreeBSD 13.2 has now been EOL for one month (since 2024-06-30).

Jul 27 2024, 20:38 · Servers, Drake network
dereckson moved T1757: docker-001 routing for drake doesn't work on boot from IntraNought to IntraNought / GRE tunnels on the Drake network board.
Jul 27 2024, 20:33 · Operations sprints (Ignite Alkane Propulsion), Salt, Drake network, Servers, Nasqueron Docker deployment squad
dereckson added a revision to T1762: Deploy NetBox: D3383: Drop network:ipv6_native from node pillar.
Jul 27 2024, 19:24 · Restricted Project, Servers, Drake network
dereckson added a revision to T1623: Deploy Prometheus to gain observability: D3381: Configure Docker metrics service in firewalld.
Jul 27 2024, 17:07 · Monitoring and reporting, Operations sprints (Consolidate them all), Servers
dereckson added a revision to T651: Deploy Grafana: D3380: Set correct Grafana URL.
Jul 27 2024, 13:57 · Monitoring and reporting, Operations sprints (Ignite Alkane Propulsion), Servers
dereckson added a revision to T651: Deploy Grafana: D3379: Move Grafana plugins directory to default location.
Jul 27 2024, 13:51 · Monitoring and reporting, Operations sprints (Ignite Alkane Propulsion), Servers
dereckson added a comment to T1633: Collect metrics from RabbitMQ.

Dashboard: https://grafana.nasqueron.org/d/Kn5xm-gZk/rabbitmq-overview?orgId=1&refresh=15s

Jul 27 2024, 12:14 · Operations sprints (Consolidate them all), Servers

Jul 26 2024

dereckson removed a project from T651: Deploy Grafana: Nasqueron Docker deployment squad.

Not deployed to Docker but bare-metal.

Jul 26 2024, 23:09 · Monitoring and reporting, Operations sprints (Ignite Alkane Propulsion), Servers
dereckson moved T651: Deploy Grafana from Backlog - Monitoring / misc to Working on on the Operations sprints (Ignite Alkane Propulsion) board.
Jul 26 2024, 23:09 · Monitoring and reporting, Operations sprints (Ignite Alkane Propulsion), Servers
dereckson moved T651: Deploy Grafana from Backlog to Working on on the Servers board.
Jul 26 2024, 23:09 · Monitoring and reporting, Operations sprints (Ignite Alkane Propulsion), Servers
dereckson added a comment to T650: Deploy PCP on Docker engines.

T651 has a Grafana instance ready; if we wish to retest this on Dwellers, green light.

Jul 26 2024, 19:39 · Monitoring and reporting, Servers
dereckson added a comment to T651: Deploy Grafana.

Deployed at https://grafana.nasqueron.org/

Jul 26 2024, 19:38 · Monitoring and reporting, Operations sprints (Ignite Alkane Propulsion), Servers
dereckson added a revision to T651: Deploy Grafana: D3377: Deploy Grafana.
Jul 26 2024, 19:38 · Monitoring and reporting, Operations sprints (Ignite Alkane Propulsion), Servers
dereckson added a revision to T1633: Collect metrics from RabbitMQ: D3376: Scrape RabbitMQ metrics into Prometheus.
Jul 26 2024, 19:34 · Operations sprints (Consolidate them all), Servers

Jul 25 2024

dereckson added a comment to T1505: Automate Let's Encrypt TLS certificates management for every server.

rOPS1e9a54c10365 worked like a charm on WindRiver to generate the grafana.nasqueron.org certificate through DNS.

Jul 25 2024, 20:43 · Servers
dereckson triaged T1505: Automate Let's Encrypt TLS certificates management for every server as Normal priority.
Jul 25 2024, 20:42 · Servers
dereckson added a comment to T651: Deploy Grafana.

DNS: grafana. CNAME www-dev.nasqueron.org

Jul 25 2024, 18:49 · Monitoring and reporting, Operations sprints (Ignite Alkane Propulsion), Servers
dereckson added a comment to T651: Deploy Grafana.

The deployment can use sqlite3 as long as it remains performant,
as we want our monitoring tools to be resilient.

Jul 25 2024, 18:49 · Monitoring and reporting, Operations sprints (Ignite Alkane Propulsion), Servers
dereckson added a comment to T650: Deploy PCP on Docker engines.

This would probably fit well as part of roles/core/monitoring when grains["os_family"] == "RedHat". Eglide has "Debian" for that grain, but I'm not sure we have enough RAM there.

Jul 25 2024, 18:17 · Monitoring and reporting, Servers
dereckson added a comment to T652: Install PCP on Dwellers.

Just for reference, this was a test deployment. This is not currently installed on Dwellers, and needs to be in Salt as part of T650.

Jul 25 2024, 18:13 · Servers
dereckson renamed T650: Deploy PCP on Docker engines from Give access to Dwellers key statistics to Deploy PCP on Docker engines.
Jul 25 2024, 18:12 · Monitoring and reporting, Servers
dereckson raised the priority of T651: Deploy Grafana from Low to Normal.
Jul 25 2024, 18:12 · Monitoring and reporting, Operations sprints (Ignite Alkane Propulsion), Servers
dereckson claimed T651: Deploy Grafana.

This task was created in 2016 to publish metrics from PCP (Performance Co-Pilot) on RHEL-like servers, especially our Docker engines.

Jul 25 2024, 18:12 · Monitoring and reporting, Operations sprints (Ignite Alkane Propulsion), Servers
dereckson added a comment to T1623: Deploy Prometheus to gain observability.

RabbitMQ exporters have been added to NetBox under the tag observability -> https://netbox.nasqueron.org/ipam/services/?tag=observability 🔒

Jul 25 2024, 18:04 · Monitoring and reporting, Operations sprints (Consolidate them all), Servers
dereckson added a subtask for T1623: Deploy Prometheus to gain observability: T1987: Dovecot Metrics.
Jul 25 2024, 18:03 · Monitoring and reporting, Operations sprints (Consolidate them all), Servers
dereckson added a subtask for T1931: Dovecot Provisioning: T1987: Dovecot Metrics.
Jul 25 2024, 18:03 · Mail, Restricted Project, Servers
dereckson added a comment to T1932: ViMbAdmin Provisioning.

Next: memcached

Jul 25 2024, 17:39 · Mail, Restricted Project, Servers
dereckson added a comment to T1633: Collect metrics from RabbitMQ.

Pending container redeployment with D3374, we can reach metrics set in D3373 with socat:

Jul 25 2024, 17:36 · Operations sprints (Consolidate them all), Servers

Jul 24 2024

dereckson added a subtask for T1950: Deploy PHP 8.3: Unknown Object (Maniphest Task).
Jul 24 2024, 19:36 · Servers, PHP 8.x support