Docker streams events through docker system events or the SDKs.
The plan is to listen to those events and offer a reactor.
Concept
A configuration file describes the events we're interested in and what to do with them.
A daemon listens to Docker events, compares each received event against our configuration, and runs whatever is needed.
At first, actions would be simple commands to run.
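The core of that daemon can be sketched in a few lines. This is a minimal sketch, not a final design: the function names (matches, react) and the assumption that event keys are already normalized to the lowercase names used in the configuration (type, action, actor.name) are hypothetical; the Docker SDK for Python actually delivers events with capitalized keys (Type, Action, Actor).

```python
def matches(criteria, event):
    """Return True when every criterion, possibly nested
    (e.g. actor: name: ...), appears in the event with the same value."""
    for key, expected in criteria.items():
        value = event.get(key)
        if isinstance(expected, dict):
            if not isinstance(value, dict) or not matches(expected, value):
                return False
        elif value != expected:
            return False
    return True


def react(config, event):
    """Yield the run command of every rule matching the event.
    A rule without an 'event' key is a catch-all."""
    for name, rule in config.items():
        if matches(rule.get("event", {}), event):
            yield rule["run"]


# With a live engine, the daemon loop would look like
# (requires the docker SDK, not run here):
#
#     import docker
#     for event in docker.from_env().events(decode=True):
#         for command in react(config, event):
#             ...  # run the command
```

The matching logic itself needs no Docker connection, so it can be unit-tested with plain dictionaries.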
Prior art
Not a lot of software does this.
- https://tech.instacart.com/introducing-ahab-docker-event-handling-9ee0d30452df - latest commits unreleased, not actively maintained
- https://github.com/diefans/docker-events - its goal was to update SkyDNS records when a container goes live
Since getting the event loop is as simple as getting a Docker client and calling events() on it, the real code will be the matching against our configuration file.
As such, prior art isn't satisfactory.
Use cases
Use case example: metrics
Prometheus can get very nice time series for container starts and command executions (so we can notice the healthchecks configured to run every second). See T1623 and T1392.
Use case example: propagate events
We can use notification-push from T771 to get notifications about container lifecycle events (start/stop/die) for production Docker engines like docker-002.
Salt has its own reactor; we can also pass Docker events to it, to run states when an event occurs.
For example, we can react when a Let's Encrypt certificate container stops, to check whether we need to restore the nginx configuration for a new site once its certificate is first issued.
```
events_reactor:

  Fire notification on container lifecycle event:
    event:
      type: container
    run: ["notification-push",
          "--project", "Nasqueron",
          "--service", "Docker",
          "--type", "container.%%action%%",
          "--group", "ops",
          "--text", "Docker container %%actor.name%%: %%action%%",
          "--link", "%%host%%"]

  Propagate all events to Salt:
    run: ["sudo", "salt-call", "event.send", "docker", "%%payload%%"]
```
For those two use cases:
- The reactor should pass our run key to subprocess.run after replacing each %%key%% placeholder with the corresponding value from the event.
- If no event matching rules are specified, the rule is a catch-all.
Use case example: hotfix
This can also provide hotfixes for operational issues, like "Sentry Relay has a weird issue resolving DNS for sentry_web, but not for sentry_kafka.", by automatically updating the Relay configuration file with the IP of Sentry Web:
```
events_reactor:

  Update Sentry Relay configuration:
    event:
      type: container
      action: start
      actor:
        name: sentry_web
    run: ["docker-paas-propagate-sentry-relay", "%%ip%%"]
```
Getting metadata like the IP address seems more straightforward if we have a Docker library available than shelling out to docker inspect.
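For illustration, extracting the IP could look like the sketch below. With the Docker SDK for Python, container.attrs mirrors the docker inspect output; the helper name container_ip and the network-selection behavior (first network with an address when none is named) are assumptions, not a settled design.

```python
def container_ip(attrs, network=None):
    """Pull the IP address from inspect-style metadata: for a specific
    network if one is named, else the first network with an address."""
    networks = attrs.get("NetworkSettings", {}).get("Networks", {})
    if network is not None:
        return networks.get(network, {}).get("IPAddress")
    for settings in networks.values():
        if settings.get("IPAddress"):
            return settings["IPAddress"]
    return None


# With a live engine (not run here):
#
#     import docker
#     attrs = docker.from_env().containers.get("sentry_web").attrs
#     ip = container_ip(attrs)
```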