
Attach OVH VIP to the CARP MASTER MAC
Needs Review, Public

Authored by yousra on Tue, Mar 31, 14:17.

Details

Summary

Add a Python script to automatically manage OVH failover IP (VIP) assignment based on CARP state changes.
The script runs on router nodes and ensures the VIP is attached to the MAC of the ACTIVE node only.

We use an absolute shebang (#!/usr/local/bin/python3) here because devd runs with a limited environment whose PATH does not include /usr/local/bin. This is an exception to the usual #!/usr/bin/env python3 pattern, which cannot work in this context.
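To illustrate the point, here is a minimal sketch using shutil.which, which performs the same kind of PATH lookup that env(1) relies on. The PATH value below is an assumption chosen for illustration, not read from a real devd process, and interpreter_visible is a hypothetical helper name.

```python
import shutil

# Assumed value for illustration: a restricted PATH without /usr/local/bin.
ASSUMED_DEVD_PATH = "/sbin:/bin:/usr/sbin:/usr/bin"


def interpreter_visible(command, search_path):
    """Return True if `command` is found in `search_path`, as a PATH lookup would."""
    return shutil.which(command, path=search_path) is not None


if __name__ == "__main__":
    for label, path in [
        ("assumed devd PATH", ASSUMED_DEVD_PATH),
        ("with /usr/local/bin", ASSUMED_DEVD_PATH + ":/usr/local/bin"),
    ]:
        visible = interpreter_visible("python3", path)
        print(f"{label}: python3 visible = {visible}")
```

On a host where python3 lives only in /usr/local/bin, the first lookup fails and the second succeeds, which is the situation the absolute shebang works around.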

Ref T2276

Test Plan
  • Triggered failover between routers
  • Confirmed that the VIP is correctly assigned to the CARP MASTER MAC in OVH
  • Confirmed that it is removed from the previous MAC
  • Verified that everything works as expected

Diff Detail

Repository
rOPS Nasqueron Operations
Lint
Lint Skipped
Unit
No Test Coverage
Branch
arcpatch-D4033
Build Status
Buildable 6637
Build 6921: arc lint + arc unit

Event Timeline

yousra requested review of this revision. Tue, Mar 31, 14:17
yousra created this revision.
This revision is now accepted and ready to land. Tue, Mar 31, 14:18

Fix the path of the script: /usr/local/scripts/carp/carp-ovh.py

dereckson requested changes to this revision. Tue, Mar 31, 21:16

If we're going to flood /var/log/messages with CARP debug information, perhaps we should create a separate log topic, but that can be another change. I've created T2292.

roles/router/carp/files/carp-ovh-switch.sh
1 ↗(On Diff #10542)

This file can be skipped as we can call directly the Python script:

import sys


#   -------------------------------------------------------------
#   Application entry point
#   - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -


def run(interface, state):
    # here the logic part (lines 149-188)
    pass


if __name__ == "__main__":
    argc = len(sys.argv)

    if argc < 3:
        print(f"Usage: {sys.argv[0]} <subsystem> <state>", file=sys.stderr)
        sys.exit(1)

    # devd passes the subsystem as <vhid>@<interface>, e.g. 2@vmx1
    subsystem = sys.argv[1]

    try:
        interface = subsystem.split("@", 1)[1]
    except IndexError:
        print("Subsystem doesn't contain @", file=sys.stderr)
        sys.exit(2)

    run(interface, sys.argv[2])

roles/router/carp/files/carp-ovh.py
1 ↗(On Diff #10542)

For executables, best practice is to avoid hardcoding the interpreter path.

25 ↗(On Diff #10542)

Follow https://github.com/nasqueron/snippets/blob/main/python/command.py

We need to organize the script in three parts:

(1) the config
(2) the functions and helper functions applying the changes
(3) the entry point with if __name__ == "__main__"

168 ↗(On Diff #10542)

At the last iteration of the loop, the delay is 16 seconds.

By then, we'll have waited 31 seconds plus the API response time.

If we abort here, we break the router, as the MAC address doesn't go anywhere.

We need to retry *a lot* more times.

Don't break until 1 month has passed.
We already break with the exception if there is any issue with the OVH request.

We also need to find a way to warn about failures when we reach 128 seconds.
That's normally T771's job.
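The figures above can be checked with a short sketch. It assumes the delay starts at 1 second and doubles at each iteration, which is consistent with a last delay of 16 seconds and a total of 31 seconds; backoff_delays is a hypothetical name, not a function from the diff.

```python
def backoff_delays(first_delay=1, factor=2, attempts=5):
    """Return the successive delays of an exponential backoff.

    Assumption: the delay starts at 1 second and doubles each time.
    """
    return [first_delay * factor**i for i in range(attempts)]


delays = backoff_delays()
print(delays)       # [1, 2, 4, 8, 16]
print(sum(delays))  # 31

# With the same doubling, the eighth delay is the first to reach 128 seconds.
print(backoff_delays(attempts=8)[-1])  # 128
```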

roles/router/carp/init.sls
44

There are folders for executables: /usr/local/bin or /usr/local/libexec

Executables don't have an extension.

45
This revision now requires changes to proceed. Tue, Mar 31, 21:17

Many changes:

  • Improved script structure by separating configuration, helper functions, main function and application entry point.
  • Improved retry logic: a while True loop with exponential backoff replaces the loop that stopped early. Previously, the script could attempt to add the VIP while the deletion from the previous MAC was still in progress. The add operation could then be rejected, leaving the VIP temporarily not attached to any MAC. The script now waits and retries until the deletion is confirmed before attempting the addition.
  • Added a warning when we reach 128 seconds and the VIP has still not been removed or added.
  • Set the script permissions in the form used for Salt (644, with no leading 0 and not quoted as a string; the '0644' form is only for Ansible).
  • Executable files do not require extensions on Unix systems, as execution is determined by the shebang.
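The retry behaviour described in this list can be sketched as follows. This is not the code from the diff: delete_vip, is_vip_detached, add_vip and warn are hypothetical callables standing in for the OVH API calls and the logging, and capping the delay at 128 seconds is an assumption.

```python
import time

WARN_AFTER_SECONDS = 128  # threshold mentioned in the review
MAX_DELAY_SECONDS = 128   # assumption: cap the delay instead of doubling forever


def wait_until(condition, warn, sleep=time.sleep):
    """Poll `condition` with exponential backoff until it returns True.

    Never gives up; calls `warn` before each wait once the delay has
    reached WARN_AFTER_SECONDS.
    """
    delay = 1
    while True:
        if condition():
            return
        if delay >= WARN_AFTER_SECONDS:
            warn(f"still waiting, next retry in {delay} seconds")
        sleep(delay)
        delay = min(delay * 2, MAX_DELAY_SECONDS)


def move_vip(delete_vip, is_vip_detached, add_vip, warn, sleep=time.sleep):
    """Detach the VIP, wait until the deletion is confirmed, then attach it."""
    delete_vip()
    wait_until(is_vip_detached, warn, sleep)
    add_vip()
```

Passing sleep as a parameter keeps the loop testable without real waiting. Errors raised by the hypothetical API callables are deliberately not caught, so a failed OVH request still aborts the run.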

When testing the script manually, it’s useful to see the logs directly in the terminal while still sending them to the system logs, for example: sudo python3 carp-ovh-failover 2@vmx1 MASTER
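One way to get both outputs is to attach two handlers to the same logger. The sketch below uses only the standard logging module and may differ from what the diff does. The logger name, the build_logger helper and the syslog socket path are assumptions (/var/run/log is the usual FreeBSD location).

```python
import logging
import logging.handlers
import os
import sys

# Assumption: FreeBSD's syslog socket; on Linux it is usually /dev/log.
SYSLOG_SOCKET = "/var/run/log"


def build_logger(name="carp-ovh-failover", syslog_socket=SYSLOG_SOCKET):
    """Return a logger writing to stderr and, when available, to syslog."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers if called twice

    # Terminal output, handy when running the script by hand
    logger.addHandler(logging.StreamHandler(sys.stderr))

    # System logs, skipped when the socket is absent or refuses connections
    if os.path.exists(syslog_socket):
        try:
            handler = logging.handlers.SysLogHandler(address=syslog_socket)
        except OSError:
            handler = None
        if handler is not None:
            logger.addHandler(handler)

    return logger


if __name__ == "__main__":
    build_logger().info("CARP state change received")
```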

dereckson requested changes to this revision. Sat, Apr 18, 16:35

March 31 review comments seem to still need action - see https://devcentral.nasqueron.org/D4033#63080

roles/router/carp/files/carp-ovh-failover
2

#!/usr/bin/env python3

This revision now requires changes to proceed. Sat, Apr 18, 16:35

#!/usr/local/bin/python3 uses a fixed Python path, while #!/usr/bin/env python3 finds Python dynamically from the system’s PATH,
making it more portable (Python can be installed elsewhere depending on the OS).

  • use config dictionary directly instead of intermediate variables
  • rename script to .py in repository
  • add the path of the secret to config.yaml

The CARP OVH failover script remains in the libexec folder, as it is an internal executable triggered by devd (UNIX conventions).

The YAML configuration file has been moved to /usr/local/etc/carp/config.yaml, as YAML files should be stored
in etc rather than libexec according to UNIX conventions.

To do that, we use map.jinja to retrieve the directory paths (dirs).

Actually, this CARP configuration will only work on FreeBSD machines, because devd only exists on FreeBSD. It is therefore not necessary to use dirs from map.jinja:
we can write the directory paths for the carp-ovh-failover script and config.yaml manually.

Use an absolute shebang (/usr/local/bin/python3) because devd runs with a limited environment and does not include /usr/local/bin in its PATH.
This is an exception to the usual #!/usr/bin/env python3 pattern, which cannot work in this context.

yousra edited the test plan for this revision.