
[Policy] Maintenance windows for Ysul
Closed, Resolved (Public)

Description

Some of us use tmux + irssi/weechat on Ysul, and there are IRC bots.

These applications are expected to run 24/7 and can't be replicated across several machines without a disconnect/reconnect.

How to plan reboots?

To allow updates with minimal disruption, we could provide deployment windows every N weeks (2 to 52). Such windows would be in four parts:

  • Apply the tasks that had to wait until just before the reboot
  • Reboot the server
  • Apply the tasks that had to wait until after the reboot
  • Validate state

If the state can't be validated, several reboots could be needed in the worst case, even though I can't really see a scenario where 2 successive reboots would be required.
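The "Validate state" step could be partly automated. A minimal sketch, assuming validation means checking that the expected long-running processes are back after the reboot; the function name missing_services and the process names passed to it are hypothetical:

```shell
# Hypothetical check: print the name of every expected process that is
# not currently running. Returns 1 if at least one is missing, 0 otherwise.
missing_services() {
    status=0
    for name in "$@"; do
        # pgrep -x only matches processes whose name is exactly "$name".
        if ! pgrep -x "$name" >/dev/null 2>&1; then
            echo "$name"
            status=1
        fi
    done
    return $status
}
```

For example, running missing_services irssi weechat after the reboot would list whichever of the two clients did not come back. This only covers process presence, not whether a client actually reconnected to its IRC network.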

A typical window would last 1 hour, with a best effort to limit the service interruption to 5-10 minutes.

To reboot or not to reboot?

Frequent reboots are not a vital requirement: except when there is a problem with the kernel, we can (and we do) hot-patch the userland without a reboot to apply security updates.
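To tell whether a kernel update is waiting for a reboot, the installed and running kernel versions can be compared. A minimal sketch, assuming Ysul runs FreeBSD (the task suggests it but does not say so); the function name kernel_reboot_needed is hypothetical:

```shell
# Hypothetical helper: succeed (exit 0) when the installed kernel version
# differs from the running one, meaning a reboot is needed to pick it up.
# Usage: kernel_reboot_needed INSTALLED_VERSION RUNNING_VERSION
kernel_reboot_needed() {
    installed="$1"
    running="$2"
    [ "$installed" != "$running" ]
}
```

On FreeBSD, the two values can be obtained with freebsd-version -k (installed kernel) and freebsd-version -r (running kernel), for instance: kernel_reboot_needed "$(freebsd-version -k)" "$(freebsd-version -r)" && echo "reboot pending".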

Yet server lifecycle management in 2016 tends to deprecate the 500-day uptime paradigm that was once the golden rule for FreeBSD machines.

Event Timeline

dereckson updated the task description.
dereckson added a subscriber: amj.

What to document?

We should encourage each user to add a line to their crontab to relaunch their services. For example, to start tmux and an IRC client:

@reboot tmux -u new -d irssi
@reboot tmux -u new -d weechat
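The @reboot lines above start an unnamed session each time they run. A slightly more defensive variant, as a sketch only: it names the session and skips creation if it already exists, so it can also be run by hand after a maintenance window. The function name start_irc_session and the session name are hypothetical:

```shell
# Hypothetical helper: start a detached tmux session running an IRC client,
# unless a session with that name already exists.
# Usage: start_irc_session SESSION_NAME CLIENT
start_irc_session() {
    session="$1"
    client="$2"
    # tmux has-session exits non-zero when the session does not exist.
    if tmux has-session -t "$session" 2>/dev/null; then
        echo "session $session already running"
        return 0
    fi
    # -u forces UTF-8, -d keeps the session detached, -s names it.
    tmux -u new-session -d -s "$session" "$client"
}
```

Assuming the function is saved in a file such as $HOME/bin/irc-session.sh (a made-up path), the crontab entry could become: @reboot . "$HOME/bin/irc-session.sh" && start_irc_session irc irssi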

Our Mumble users also have to be taken into consideration.

@amj doesn't have any preference for the windows, but hasn't automated startup yet.

I provided @xcombelle with instructions for a @reboot tmux -u new -d irssi cron entry, and @amj with the equivalent for weechat.

dereckson claimed this task.