Attach OVH VIP to the CARP MASTER MAC
Needs Review
Public

Authored by yousra on Tue, Mar 31, 14:17.
Tags
None
Referenced Files
F25169796: D4033.diff
Thu, Apr 2, 23:43

Details

Summary

Add a Python script to automatically manage OVH failover IP (VIP) assignment based on CARP state changes.
The script runs on router nodes and ensures the VIP is attached to the MAC of the MASTER node only.

Ref T2276

Test Plan
  • Triggered failover between routers
  • Confirmed that the VIP is correctly assigned to the MASTER MAC in OVH
  • Confirmed that it is removed from the previous MAC
  • Verified that everything works as expected

Diff Detail

Repository
rOPS Nasqueron Operations
Lint
Lint Skipped
Unit
No Test Coverage
Branch
script-ovh-carp
Build Status
Buildable 6556
Build 6840: arc lint + arc unit

Event Timeline

yousra requested review of this revision. Tue, Mar 31, 14:17
yousra created this revision.
This revision is now accepted and ready to land. Tue, Mar 31, 14:18

Fix the path of the script: /usr/local/scripts/carp/carp-ovh.py

dereckson requested changes to this revision. Tue, Mar 31, 21:16

If we're going to flood /var/log/messages with CARP debug information, perhaps we should create a separate log topic, but that can be another change. I've created T2292.

roles/router/carp/files/carp-ovh-switch.sh
1 ↗(On Diff #10542)

This file can be skipped, as we can call the Python script directly:

import sys  # belongs at the top of the script, with the other imports

#   -------------------------------------------------------------
#   Application entry point
#   - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -


def run(interface, state):
    # here the logic part (lines 149-188)
    pass


if __name__ == "__main__":
    argc = len(sys.argv)

    if argc < 3:
        print(f"Usage: {sys.argv[0]} <subsystem> <state>", file=sys.stderr)
        sys.exit(1)

    # The first argument is the subsystem; the interface name follows the "@"
    subsystem = sys.argv[1]

    try:
        interface = subsystem.split("@", 1)[1]
    except IndexError:
        print("Subsystem doesn't contain @", file=sys.stderr)
        sys.exit(2)

    run(interface, sys.argv[2])

roles/router/carp/files/carp-ovh.py
1 ↗(On Diff #10542)

For executables, best practice is to avoid hardcoding the interpreter path (for example, use #!/usr/bin/env python3 rather than a fixed path to the interpreter).

25 ↗(On Diff #10542)

Follow https://github.com/nasqueron/snippets/blob/main/python/command.py

We need to organize the script in three parts:

(1) the config
(2) the functions and helper functions applying the changes
(3) the entry point with if __name__ == "__main__"

168 ↗(On Diff #10542)

At the last iteration of the loop, the delay is 16 seconds.

By then we'll have waited 31 seconds (1 + 2 + 4 + 8 + 16) plus the API response time.

If we abort here, we break the router, as the VIP doesn't end up attached to any MAC address.

We need to retry *a lot* more times.

Don't break before 1 month has elapsed.
We already break out through the exception if there is any issue with the OVH request.

We also need to find a way to warn about failures once we reach 128 seconds.
That's normally T771's job.
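
As an illustration of the retry policy requested above, here is a minimal sketch, not the code under review. It assumes the delay doubles from 1 second, stays at 128 seconds once it gets there, triggers a single warning at that point, and only gives up after about a month. The names backoff_delays, wait_until, WARN_DELAY and MAX_WAIT are hypothetical, and check and warn stand in for the real OVH query and for whatever alerting T771 ends up providing.

```python
import itertools
import time

WARN_DELAY = 128           # seconds; assumed threshold taken from the comment above
MAX_WAIT = 30 * 24 * 3600  # roughly one month, in seconds


def backoff_delays(first=1, cap=WARN_DELAY):
    """Yield first, 2*first, 4*first, ... and then keep yielding cap forever."""
    delay = first
    while True:
        yield delay
        delay = min(delay * 2, cap)


def wait_until(check, warn, sleep=time.sleep):
    """Poll check() until it returns True and return the seconds slept.

    check and warn are hypothetical callables: check() would ask the OVH API
    whether the change is confirmed, warn(message) would raise an alert.
    """
    waited = 0
    warned = False
    for delay in backoff_delays():
        if check():
            return waited
        if waited >= MAX_WAIT:
            raise TimeoutError(f"gave up after {waited} seconds")
        if delay >= WARN_DELAY and not warned:
            warn(f"still not confirmed after {waited} seconds")
            warned = True
        sleep(delay)
        waited += delay


# The first five delays add up to the 31 seconds mentioned above.
print(list(itertools.islice(backoff_delays(), 9)))
# [1, 2, 4, 8, 16, 32, 64, 128, 128]
```

Capping the delay instead of letting it keep doubling means the script notices a completed operation within about two minutes, however long the outage lasted.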

roles/router/carp/init.sls
44

There are folders for executables: /usr/local/bin or /usr/local/libexec

Executables don't have an extension

45
This revision now requires changes to proceed. Tue, Mar 31, 21:17

Many changes:

  • Improved script structure by separating configuration, helper functions, main function and application entry point.
  • Improved retry logic: a while True loop with exponential backoff replaces the loop that stopped early. The script now keeps retrying until the deletion is actually confirmed. Previously, it could attempt to add the VIP while the deletion from the previous MAC was still in progress; the add operation could then be rejected, leaving the VIP temporarily not attached to any MAC. The script now waits until the deletion has fully completed before attempting the addition.
  • Added a warning when we reach 128 seconds and the VIP has still not been removed or added.
  • Set correct permissions for the script: Salt takes the mode as 644, not as a string with a leading 0 (like '0644'), which is only needed on Ansible.
  • Executable files do not require extensions on Unix systems, as execution is determined by the shebang.
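
The ordering described in the retry bullet above (confirm the deletion first, only then add) can be sketched as follows. This is an assumption-laden illustration, not the script from this revision: FakeClient is an in-memory stand-in for the OVH API, and mac_for, detach, attach, move_vip and poll_until are hypothetical names. The addresses are documentation values.

```python
class FakeClient:
    """In-memory stand-in for the OVH API (hypothetical interface)."""

    def __init__(self, assignments, polls_before_detach=3):
        self.assignments = dict(assignments)  # vip -> mac
        self.pending = {}                     # vip -> polls left before removal
        self.polls_before_detach = polls_before_detach

    def mac_for(self, vip):
        # A pending deletion only completes after a few polls, to mimic
        # an asynchronous operation on the provider side.
        if vip in self.pending:
            self.pending[vip] -= 1
            if self.pending[vip] <= 0:
                del self.pending[vip]
                self.assignments.pop(vip, None)
        return self.assignments.get(vip)

    def detach(self, vip, mac):
        self.pending[vip] = self.polls_before_detach

    def attach(self, vip, mac):
        if vip in self.assignments:
            raise RuntimeError("VIP is still attached to another MAC")
        self.assignments[vip] = mac


def poll_until(check):
    while not check():
        pass  # a real script would sleep with exponential backoff here


def move_vip(client, vip, master_mac, wait=poll_until):
    """Attach vip to master_mac, waiting for the old attachment to disappear."""
    current = client.mac_for(vip)
    if current == master_mac:
        return "unchanged"
    if current is not None:
        client.detach(vip, current)
        wait(lambda: client.mac_for(vip) is None)   # deletion confirmed
    client.attach(vip, master_mac)
    wait(lambda: client.mac_for(vip) == master_mac)  # addition confirmed
    return "moved"


VIP = "203.0.113.10"
client = FakeClient({VIP: "02:00:00:00:00:01"})
print(move_vip(client, VIP, "02:00:00:00:00:02"))  # moved
print(move_vip(client, VIP, "02:00:00:00:00:02"))  # unchanged
```

With this fake, calling attach right after detach raises an error, which is the rejected add operation the bullet describes.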