Historically, we've had the following problems with Dwellers and Equatower:
- lack of server monitoring, e.g. disk space issues for Mastodon
- lack of container monitoring, e.g. crashes of Etherpad
- lack of app monitoring, e.g. HTTP 50x errors for Mastodon
- no consistent web UI: Shipyard at first, then nothing
- lack of log collection
- manual maintenance of numerical network ports (first in a wiki table, then in an rOPS file)
- fragile links, e.g. the MySQL link between acquisitariat and other containers broke many times
- no clear procedure to recreate a container
- nginx front-end configuration was hard to maintain on Dwellers (better on Equatower thanks to rOPS standardisation and the services > containers concept)
- no DNS for containers (we had it with ulubis.drake, but not anymore)
- back-end services are exposed on public IPs
- no clear way for non-container services to communicate with container services, e.g. devcentral.ulubis.drake:22 vs devcentral.nasqueron.org:5022
- each container is deployed on a single host, either Dwellers or Equatower, fixed in the configuration file
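To make the server monitoring gap concrete, here is a minimal sketch of the kind of disk-space check that was missing when Mastodon ran into disk space issues. It is only an illustration, not the planned tooling: the function name `disk_usage_alert` and the 90% threshold are hypothetical choices, and it relies only on Python's standard `shutil.disk_usage`.

```python
import shutil


def disk_usage_alert(path: str = "/", max_used_ratio: float = 0.9) -> tuple[bool, float]:
    """Return (alert, used_ratio) for the filesystem holding `path`.

    `max_used_ratio` is a hypothetical threshold: 0.9 means "alert above 90% used".
    """
    usage = shutil.disk_usage(path)  # named tuple (total, used, free), in bytes
    if usage.total == 0:
        # Pseudo-filesystems can report a zero size; nothing meaningful to check.
        return False, 0.0
    used_ratio = usage.used / usage.total
    return used_ratio > max_used_ratio, used_ratio


if __name__ == "__main__":
    alert, ratio = disk_usage_alert("/")
    status = "ALERT" if alert else "OK"
    print(f"{status}: / is {ratio:.0%} full")
```

A script like this still has to be scheduled, and its result collected and alerted on somewhere, which is exactly the plumbing the strategies below aim to standardise.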
Some things have improved, like moving all the container logic into rOPS, but all those concerns remain.
Three complementary strategies seem to address all those issues:
- create a service mesh, with Kuma as the best candidate, to proxy traffic at both TCP and HTTP levels and implement features like metrics for monitoring, logging and tracing in a standard way
- use Kubernetes to simplify the network: get rid of the numerical port issues, get DNS, and expose services on stable private IPs
- install proper tools like an Elastic stack and Prometheus to collect logs and metrics. The good news is that Kuma, Kubernetes and CoreDNS are all ready to integrate with them; Kuma even provides Grafana dashboards.
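To illustrate how Kubernetes DNS would replace the numerical port table, the sketch below builds the in-cluster name Kubernetes gives a Service, following the standard `<service>.<namespace>.svc.<cluster-domain>` pattern with the default `cluster.local` domain. The migration table is hypothetical: the `nasqueron` namespace, and the assumption that the public port 5022 maps back to SSH on port 22, are illustrative guesses based on the devcentral example above, not an actual plan.

```python
def service_fqdn(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Build the in-cluster DNS name of a Kubernetes Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


# Hypothetical migration table: legacy public endpoint -> (service, namespace, standard port).
# With a Service per back-end, each one can listen on its usual port on a stable
# private IP, so no more 5022-style remapping is needed.
LEGACY_ENDPOINTS = {
    "devcentral.nasqueron.org:5022": ("devcentral", "nasqueron", 22),
}


def migrated_address(legacy: str) -> str:
    """Return the in-cluster host:port replacing a legacy public endpoint."""
    service, namespace, port = LEGACY_ENDPOINTS[legacy]
    return f"{service_fqdn(service, namespace)}:{port}"


if __name__ == "__main__":
    for legacy in LEGACY_ENDPOINTS:
        print(f"{legacy} -> {migrated_address(legacy)}")
```

Inside the same namespace, Kubernetes DNS search paths also let pods use the short name (e.g. `devcentral`), which is what makes the configuration independent of which host actually runs the container.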