
Automate CARP VIP MAC reassignment using devd and OVH API
Open, NormalPublic

Description

Implement automatic reassignment of the public VIP to the physical MAC of the router currently in CARP MASTER state.

This is done by reacting to CARP state changes with devd: when a CARP state change event occurs, devd triggers a script that calls the OVH API to update the physical MAC associated with the VIP.

Why? This avoids situations where the backup router receives traffic for the VIP while the other node is the actual CARP master.


Steps before writing the script:

  • 1. Observe the CARP state change through kernel logs.

Command: tail -f /var/log/messages

Mar 22 14:49:21 router-002 kernel: carp: 2@vmx1: BACKUP -> MASTER (master timed out)

  • 2. Test a rule to understand how devd works

In the file /usr/local/etc/devd/carp.conf:

notify 0 {
    match "system" "IFNET";
    match "subsystem" "vmx1";
    action "logger CARP state change detected";
};

Then: sudo service devd restart
Then: simulate a failover of the MASTER

Mar 22 15:11:37 router-002 yousra[8464]: CARP state change detected
Mar 22 15:11:37 router-002 kernel: carp: 2@vmx1: BACKUP -> MASTER (master timed out)

The FreeBSD kernel already writes a log when a CARP state change occurs, for example, when switching from BACKUP to MASTER, but this message is purely informational and doesn't trigger any automatic action. Using devd, we can detect this event and execute a custom action, such as running a script that automatically updates the MAC address associated with the VIP at OVH so that traffic always arrives on the MASTER router.
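For debugging or for tests, the kernel message format shown above can also be parsed directly. The sketch below is only an illustration of that format, not a replacement for the devd hook; the function name and the regular expression are assumptions derived from the single log line quoted in this task.

```python
import re

# Pattern derived from the kernel message quoted above, e.g.
# "carp: 2@vmx1: BACKUP -> MASTER (master timed out)"
CARP_LOG_RE = re.compile(
    r"carp: (?P<vhid>\d+)@(?P<iface>[0-9a-z.]+): "
    r"(?P<old>[A-Z]+) -> (?P<new>[A-Z]+) \((?P<reason>[^)]*)\)"
)


def parse_carp_log(line):
    """Return a dict describing the transition, or None for unrelated lines."""
    match = CARP_LOG_RE.search(line)
    if match is None:
        return None
    event = match.groupdict()
    event["vhid"] = int(event["vhid"])
    return event


line = "Mar 22 14:49:21 router-002 kernel: carp: 2@vmx1: BACKUP -> MASTER (master timed out)"
print(parse_carp_log(line))
```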


Steps to prepare the script carp-ovh-failover.py:

  • 1. Write the secretsmith YAML configuration /usr/local/etc/secretsmith.yaml
  • 2. Verify the connection to Vault with a test script
  • 3. Allow the router role to access ops/secrets/network/router/vault, which is useful for Salt (D4029).
  • 4. Verify with a script that we can access the OVH credentials

Steps to design the script carp-ovh-failover.py:

  • 1. Deploy secretsmith.yaml via Salt on role/router (D4031).
  • 2. The Vault AppRole credentials should also be rotated (D4026).
  • 3. Write the script (first without Salt), and test how it works.
  • 4. Deploy /usr/local/etc/devd/carp.conf via Salt (D4032).
  • 5. Deploy the CARP OVH failover Python script via Salt (D4033).
  • 6. Add some useful debugging scripts via Salt (D4034).

Event Timeline

yousra triaged this task as Normal priority. Tue, Mar 17, 13:39
yousra created this task.

devd

If we trigger the script through devd, we can provide a .conf configuration file in /usr/local/etc/devd.

Using action "logger CARP is now active/backup" is probably a good idea to check that we trigger on the right event.

OVH API

A Python API client library exists: https://github.com/ovh/python-ovh
Secrets can be stored in Vault.

It's unclear if we should create a full Python package with abstraction, and then a concrete implementation for OVH, or if we should write a small script doing this task.

It's especially unclear if we need to do that in Python as:

My take on this is we can write it in Python, ideally with separation of concerns between the operation and the OVH API execution, and we'll see if we evolve this into a full application when we need more API calls or more ISP.
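To make the separation of concerns more concrete, here is a minimal sketch. Every name in it (FailoverBackend, InMemoryBackend, move_vip) is hypothetical and not taken from an existing package. The operation only talks to a small interface, so an OVH-backed class, or another ISP later, could be swapped in without touching the failover logic. The addresses are documentation placeholders.

```python
from __future__ import annotations

from typing import Protocol


class FailoverBackend(Protocol):
    """Hypothetical interface: what the failover operation needs from an ISP."""

    def ips_for_mac(self, mac: str) -> list[str]: ...
    def detach(self, ip: str, mac: str) -> None: ...
    def attach(self, ip: str, mac: str) -> None: ...


class InMemoryBackend:
    """Stand-in for tests; an OVH-backed class would expose the same methods."""

    def __init__(self, mapping: dict[str, list[str]]):
        self.mapping = {mac: list(ips) for mac, ips in mapping.items()}

    def ips_for_mac(self, mac: str) -> list[str]:
        return list(self.mapping.get(mac, []))

    def detach(self, ip: str, mac: str) -> None:
        self.mapping[mac].remove(ip)

    def attach(self, ip: str, mac: str) -> None:
        self.mapping.setdefault(mac, []).append(ip)


def move_vip(backend: FailoverBackend, vip: str, old_mac: str, new_mac: str) -> bool:
    """Move vip to new_mac; return False when it is already attached there."""
    if vip in backend.ips_for_mac(new_mac):
        return False
    if vip in backend.ips_for_mac(old_mac):
        backend.detach(vip, old_mac)
    backend.attach(vip, new_mac)
    return True


backend = InMemoryBackend({"00:00:5e:00:53:01": ["192.0.2.10"], "00:00:5e:00:53:02": []})
print(move_vip(backend, "192.0.2.10", "00:00:5e:00:53:01", "00:00:5e:00:53:02"))
```

A real backend would also have to wait for the ISP to confirm each change, which this in-memory version does not model.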

Vault

The OVH API credentials are published at the apps/network/carp-hyper-001-switch path,
under the application_key, application_secret and consumer_key keys.

D4016 offers a policy to access that path.

ops/secrets/network/router/vault will store the credentials allowing the router script to query apps/network/carp-hyper-001-switch.

To clarify which path contains what:

Mounting point | Full path                          | Scope              | Description
ops            | ops/secrets/network/router/vault   | Salt               | Credentials for the router script to access Vault
apps           | apps/network/carp-hyper-001-switch | CARP switch script | Credentials to query OVH API
yousra updated the task description. (Show Details)

A dedicated devd file was placed in /usr/local/etc/devd because this directory is usually used for custom configurations added by administrators, while /etc/devd contains the default system rules from FreeBSD. It makes the setup cleaner, avoids mixing custom logic with system configuration, and makes future maintenance or upgrades easier.

notify 0 {
    match "system" "IFNET";
    match "subsystem" "vmx1";
    action "logger CARP state change detected";
};

We also need to add a type so you know what event you caught, like LINK_UP or LINK_DOWN:

/usr/local/etc/devd/carp.conf
notify 0 {
    match "system" "IFNET";
    match "subsystem" "vmx1";
    match "type" "LINK_UP";
    action "logger CARP primary, need to register this MAC address to our ISP";
};

For Salt, it will need to be a template, as the interface won't always be vmx1 (even if we don't change the order, the driver name will change when we migrate away from VMware).
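As an illustration of the templating idea only, the sketch below renders the generic rule with the interface name as a parameter. It uses Python's string.Template rather than Salt's own templating, and the names DEVD_RULE and render_devd_rule are hypothetical.

```python
from string import Template

# Hypothetical template: only the interface name varies between routers.
DEVD_RULE = Template(
    'notify 0 {\n'
    '    match "system" "IFNET";\n'
    '    match "subsystem" "$interface";\n'
    '    action "logger CARP state change detected";\n'
    '};\n'
)


def render_devd_rule(interface):
    """Return the devd rule text for the given interface name."""
    return DEVD_RULE.substitute(interface=interface)


print(render_devd_rule("vmx1"))
```

One thing to keep in mind with any template engine: devd itself expands variables such as $subsystem in actions, so those must be escaped or kept out of the templated parts.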

@dereckson I first tried to redefine the devd rule by matching specific IFNET event types such as LINK_UP, LINK_DOWN, UP and DOWN, but none of them were triggered during CARP state changes in my tests.

So I temporarily switched to a generic IFNET rule and added logging to inspect the event details:

notify 0 {
    match "system" "IFNET";
    match "subsystem" "vmx1";
    action "logger TYPE=$type SUBSYSTEM=$subsystem";
};

When BACKUP → MASTER, I observed that the event received by devd was ADDR_ADD on vmx1.

Mar 22 16:57:22 router-002 yousra[9909]: TYPE=ADDR_ADD SUBSYSTEM=vmx1

When MASTER → BACKUP, I observed that the event received by devd was ADDR_DEL on vmx1.

Mar 22 17:03:33 router-002 yousra[9967]: TYPE=ADDR_DEL SUBSYSTEM=vmx1

According to the examples section of carp(4) (man carp), the name has changed.

(At worst, there is also /usr/src/sys/netinet/ip_carp.c to check, but here the man page contains the devd hook names.)

For all CARP external documentation, I think I've found the threshold where information is outdated in that man page:

carp(4) states:

In FreeBSD 10.0, carp was significantly
rewritten, and is no longer a pseudo-interface.

FreeBSD 10 was released in January 2014. Older documentation needs to be checked against newer versions.

notify 0 {
    match "system" "CARP";
    match "subsystem" "[0-9]+@[0-9a-z.]+";
    match "type" "(MASTER|BACKUP)";
    action "/usr/local/scripts/carp-test.sh";
};

The script:

#!/bin/sh

STATE=$(ifconfig vmx1 | grep carp)

if echo "$STATE" | grep -q MASTER; then
    logger "HELLO I am the MASTER"
else
    logger "HELLO I AM the BACKUP"
fi

Mar 22 20:14:23 router-002 yousra[11076]: HELLO I AM the BACKUP
Mar 22 20:14:23 router-002 kernel: carp: 2@vmx1: MASTER -> BACKUP (more frequent advertisement received)
Mar 22 20:14:23 router-002 kernel: in_scrubprefix: err=65, prefix delete failed
Mar 22 20:14:37 router-002 kernel: carp: 2@vmx1: BACKUP -> MASTER (master timed out)
Mar 22 20:14:37 router-002 yousra[11086]: HELLO I am the MASTER

The direct CARP devd hook works correctly in practice. I added a script that checks whether the router is currently MASTER or BACKUP, so we can safely trigger the OVH update only when the node becomes MASTER, and avoid applying the MAC reassignment from the backup node.
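If the state check moves into the Python script, it could be sketched as below. The format of the carp line in the ifconfig output is an assumption, the sample text is illustrative rather than captured from a router, and carp_states is a hypothetical helper. Unlike a plain grep for MASTER, it keeps the state per vhid, which matters if an interface ever carries more than one.

```python
import re

# Assumed shape of the carp line in FreeBSD ifconfig output:
#     carp: MASTER vhid 2 advbase 1 advskew 0
CARP_LINE_RE = re.compile(r"carp: (?P<state>[A-Z]+) vhid (?P<vhid>\d+)")


def carp_states(ifconfig_output):
    """Map each vhid found in the ifconfig output to its CARP state."""
    return {
        int(match["vhid"]): match["state"]
        for match in CARP_LINE_RE.finditer(ifconfig_output)
    }


# Illustrative output, not captured from a real router.
SAMPLE = """vmx1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
\tinet 192.0.2.2 netmask 0xffffff00 broadcast 192.0.2.255
\tinet 192.0.2.1 netmask 0xffffff00 broadcast 192.0.2.255 vhid 2
\tcarp: MASTER vhid 2 advbase 1 advskew 0
"""

print(carp_states(SAMPLE))
```

On the router, the input would come from something like subprocess.run(["ifconfig", "vmx1"], capture_output=True, text=True).stdout.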

You can directly use variables in the action to pass the interface and state, with $subsystem and $type.

The file /usr/local/etc/devd/carp.conf:

notify 0 {
    match "system" "CARP";
    match "subsystem" "[0-9]+@[0-9a-z.]+";
    match "type" "(MASTER|BACKUP)";
    action "/usr/local/scripts/carp-test.sh $subsystem $type";
};

The script:

#!/bin/sh

SUBSYSTEM="$1" # first argument
STATE="$2" # second argument

IFACE=$(echo "$SUBSYSTEM" | cut -d@ -f2)

logger "CARP event: interface=$IFACE state=$STATE"

Mar 22 21:25:50 router-002 yousra[11556]: CARP event: interface=vmx1 state=MASTER
Mar 22 21:25:50 router-002 kernel: carp: 2@vmx1: BACKUP -> MASTER (master timed out)
Mar 22 21:27:55 router-002 yousra[11564]: CARP event: interface=vmx1 state=BACKUP
Mar 22 21:27:55 router-002 kernel: carp: 2@vmx1: MASTER -> BACKUP (more frequent advertisement received)

Ah, that's now what we need, nice for the script!
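The same argument handling could look like this on the Python side. This is a sketch with hypothetical function names, assuming the script receives $subsystem and $type as its two arguments, as in the devd rule above.

```python
import re

# Same shape as the devd "subsystem" match: <vhid>@<interface>
SUBSYSTEM_RE = re.compile(r"(?P<vhid>[0-9]+)@(?P<iface>[0-9a-z.]+)")


def parse_event(subsystem, state):
    """Split devd's $subsystem and validate $type; raise ValueError otherwise."""
    match = SUBSYSTEM_RE.fullmatch(subsystem)
    if match is None:
        raise ValueError(f"unexpected CARP subsystem: {subsystem!r}")
    if state not in ("MASTER", "BACKUP"):
        raise ValueError(f"unexpected CARP state: {state!r}")
    return int(match["vhid"]), match["iface"], state


def should_update_isp(state):
    """Only the node entering MASTER is allowed to move the VIP."""
    return state == "MASTER"


vhid, iface, state = parse_event("2@vmx1", "MASTER")
print(f"CARP event: vhid={vhid} interface={iface} state={state}")
print("update ISP:", should_update_isp(state))
```

In the real script the two values would be read from sys.argv[1] and sys.argv[2].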


The script to test the connection to Vault, using a YAML configuration file that tells the secretsmith client how to connect to Vault:

import secretsmith

VAULT_CONFIG_PATH = "/usr/local/etc/secretsmith.yaml"

vault_client = secretsmith.login(config_path=VAULT_CONFIG_PATH)

print("OK connected to Vault")

The result:

[yousra@router-002 /usr/local/scripts]$ python3 /usr/local/scripts/test_connection_vault.py 
OK connected to Vault

Vault will return a client token after authentication. This token can be accessed in the script using vault_client.token.


The script to test whether we can access the OVH credentials (application_key, application_secret, consumer_key):

import secretsmith
from secretsmith.vault import secrets

VAULT_CONFIG_PATH = "/usr/local/etc/secretsmith.yaml"

vault_client = secretsmith.login(config_path=VAULT_CONFIG_PATH)

print("OK connected")

print("token :", vault_client.token)

secret = secrets.read_secret(vault_client, "apps", "network/carp-hyper-001-switch")

print("OVH credentials :", secret)

In the end, the script was able to access the OVH credentials successfully.

[yousra@router-002 /usr/local/scripts]$ python3 /usr/local/scripts/test_access_secrets_ovh.py 
OK connected
token : xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
OVH credentials : {'application_key': 'xxxxx', 'application_secret': 'xxxxx', 'consumer_key': 'xxxxx'}
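From there, the secret can be turned into the keyword arguments that the python-ovh client expects. The sketch below assumes the secret is a plain dict shaped like the output above; the helper name and the "ovh-eu" endpoint are assumptions, and the right endpoint for our account still has to be confirmed.

```python
REQUIRED_KEYS = ("application_key", "application_secret", "consumer_key")


def ovh_client_kwargs(secret, endpoint="ovh-eu"):
    """Validate the Vault secret and build keyword arguments for ovh.Client."""
    missing = [key for key in REQUIRED_KEYS if not secret.get(key)]
    if missing:
        raise KeyError(f"missing OVH credentials in Vault secret: {', '.join(missing)}")
    kwargs = {key: secret[key] for key in REQUIRED_KEYS}
    kwargs["endpoint"] = endpoint  # assumption: European OVH API endpoint
    return kwargs


# Placeholder values, same shape as the secret printed above.
secret = {"application_key": "xxxxx", "application_secret": "xxxxx", "consumer_key": "xxxxx"}
print(sorted(ovh_client_kwargs(secret)))
```

With python-ovh installed, the client would then be created with ovh.Client(**ovh_client_kwargs(secret)). Failing early on a missing key gives a clearer log message than an authentication error from the API.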

When router-003 (the MASTER) becomes unavailable:

Apr  1 13:59:33 router-002 kernel: carp: 2@vmx1: BACKUP -> MASTER (master timed out)
Apr  1 13:59:34 router-002 carp-ovh[97816]: Detected MAC on vmx1: 00:50:56:09:3c:f2
Apr  1 13:59:34 router-002 carp-ovh[97820]: Checking current state...
Apr  1 13:59:34 router-002 carp-ovh[97824]: Checking IPs for MAC 00:50:56:09:3c:f2
Apr  1 13:59:34 router-002 carp-ovh[97828]: OVH returned: ['178.32.70.110']
Apr  1 13:59:34 router-002 carp-ovh[97832]: Checking IPs for MAC 00:50:56:09:98:fc
Apr  1 13:59:34 router-002 carp-ovh[97836]: OVH returned: ['51.68.252.230', '178.32.70.111']
Apr  1 13:59:34 router-002 carp-ovh[97840]: Deleting VIP from 00:50:56:09:98:fc
Apr  1 13:59:35 router-002 carp-ovh[97844]: DELETE request accepted
Apr  1 13:59:35 router-002 carp-ovh[97848]: Waiting deletion... (sleep 1s)
Apr  1 13:59:36 router-002 carp-ovh[97852]: Checking IPs for MAC 00:50:56:09:98:fc
Apr  1 13:59:36 router-002 carp-ovh[97856]: OVH returned: ['51.68.252.230', '178.32.70.111']
Apr  1 13:59:36 router-002 carp-ovh[97860]: Waiting deletion... (sleep 2s)
Apr  1 13:59:38 router-002 carp-ovh[97864]: Checking IPs for MAC 00:50:56:09:98:fc
Apr  1 13:59:38 router-002 carp-ovh[97868]: OVH returned: ['51.68.252.230', '178.32.70.111']
Apr  1 13:59:38 router-002 carp-ovh[97872]: Waiting deletion... (sleep 4s)
Apr  1 13:59:42 router-002 carp-ovh[97876]: Checking IPs for MAC 00:50:56:09:98:fc
Apr  1 13:59:42 router-002 carp-ovh[97880]: OVH returned: ['178.32.70.111', '51.68.252.230']
Apr  1 13:59:42 router-002 carp-ovh[97884]: Waiting deletion... (sleep 8s)
Apr  1 13:59:50 router-002 carp-ovh[97888]: Checking IPs for MAC 00:50:56:09:98:fc
Apr  1 13:59:50 router-002 carp-ovh[97892]: OVH returned: ['178.32.70.111', '51.68.252.230']
Apr  1 13:59:50 router-002 carp-ovh[97896]: Waiting deletion... (sleep 16s)
Apr  1 14:00:06 router-002 carp-ovh[97944]: Checking IPs for MAC 00:50:56:09:98:fc
Apr  1 14:00:07 router-002 carp-ovh[97948]: OVH returned: ['178.32.70.111']
Apr  1 14:00:07 router-002 carp-ovh[97952]: Deletion confirmed: VIP removed from 00:50:56:09:98:fc
Apr  1 14:00:07 router-002 carp-ovh[97956]: Adding VIP to 00:50:56:09:3c:f2
Apr  1 14:00:08 router-002 carp-ovh[97960]: ADD request accepted
Apr  1 14:00:08 router-002 carp-ovh[97964]: Checking add... (sleep 1s)
Apr  1 14:00:09 router-002 carp-ovh[97968]: Checking IPs for MAC 00:50:56:09:3c:f2
Apr  1 14:00:09 router-002 carp-ovh[97972]: OVH returned: ['178.32.70.110']
Apr  1 14:00:09 router-002 carp-ovh[97976]: Checking add... (sleep 2s)
Apr  1 14:00:11 router-002 carp-ovh[97980]: Checking IPs for MAC 00:50:56:09:3c:f2
Apr  1 14:00:11 router-002 carp-ovh[97984]: OVH returned: ['178.32.70.110']
Apr  1 14:00:11 router-002 carp-ovh[97988]: Checking add... (sleep 4s)
Apr  1 14:00:15 router-002 carp-ovh[97992]: Checking IPs for MAC 00:50:56:09:3c:f2
Apr  1 14:00:15 router-002 carp-ovh[97996]: OVH returned: ['178.32.70.110']
Apr  1 14:00:15 router-002 carp-ovh[98000]: Checking add... (sleep 8s)
Apr  1 14:00:23 router-002 carp-ovh[98004]: Checking IPs for MAC 00:50:56:09:3c:f2
Apr  1 14:00:24 router-002 carp-ovh[98008]: OVH returned: ['178.32.70.110']
Apr  1 14:00:24 router-002 carp-ovh[98012]: Checking add... (sleep 16s)
Apr  1 14:00:40 router-002 carp-ovh[98016]: Checking IPs for MAC 00:50:56:09:3c:f2
Apr  1 14:00:40 router-002 carp-ovh[98020]: OVH returned: ['51.68.252.230', '178.32.70.110']
Apr  1 14:00:40 router-002 carp-ovh[98024]: Addition confirmed: VIP attached to 00:50:56:09:3c:f2
Apr  1 14:00:40 router-002 carp-ovh[98028]: Script finished successfully

Router-002 takes over as MASTER.
The script detects the state change and checks the current VIP assignment on OVH.

If the VIP is still associated with the MAC address of router-003, it is removed from OVH.
The script then waits until the deletion is confirmed via the API.

Once the VIP is no longer present on router-003, it is assigned to the MAC address of router-002.

This ensures that the OVH failover IP always follows the active (MASTER) router.
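The waiting pattern visible in the log above (sleep 1s, 2s, 4s, 8s, 16s, with a check after each sleep) can be sketched as a small polling helper. This is an illustration under that assumption, not the deployed script; wait_until and its parameters are hypothetical names. The sleep function is injectable so the logic can be tested without waiting.

```python
import time


def wait_until(check, delays=(1, 2, 4, 8, 16), sleep=time.sleep):
    """Sleep for each delay in turn, then poll check(); stop at the first success.

    Returns True if check() succeeded, False if every attempt failed.
    """
    for delay in delays:
        sleep(delay)
        if check():
            return True
    return False


# Demonstration with a fake check that succeeds on the fifth poll,
# and a fake sleep that records the delays instead of waiting.
answers = iter([False, False, False, False, True])
slept = []
confirmed = wait_until(lambda: next(answers), sleep=slept.append)
print(confirmed, slept)
```

With the default delays, the helper gives up after 31 seconds in total per operation, so a full delete-then-add cycle is bounded at roughly one minute of waiting plus the API call time.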

Apr  1 14:01:39 router-002 kernel: carp: 2@vmx1: BACKUP -> MASTER (master timed out)
Apr  1 14:01:39 router-002 carp-ovh[98049]: Detected MAC on vmx1: 00:50:56:09:3c:f2
Apr  1 14:01:39 router-002 carp-ovh[98053]: Checking current state...
Apr  1 14:01:39 router-002 carp-ovh[98057]: Checking IPs for MAC 00:50:56:09:3c:f2
Apr  1 14:01:39 router-002 carp-ovh[98061]: OVH returned: ['178.32.70.110', '51.68.252.230']
Apr  1 14:01:39 router-002 carp-ovh[98065]: VIP is already on correct MAC -> nothing to do

When router-002 becomes MASTER and the VIP is already correctly assigned:

The script detects that the VIP is already associated with the MAC address of router-002 on OVH.
In this case, no action is required, and the script exits without performing any modification.

This avoids unnecessary API calls and ensures stability when the state is already correct.

Apr  1 14:03:22 router-002 kernel: carp: 2@vmx1: MASTER -> BACKUP (more frequent advertisement received)
Apr  1 14:03:22 router-002 kernel: in_scrubprefix: err=65, prefix delete failed
Apr  1 14:03:22 router-002 carp-ovh[98074]: Not MASTER -> exit

When router-002 becomes BACKUP again because router-003 has rejoined the network (higher priority and preemption enabled), the script exits immediately and does not perform any OVH change. This is intentional: only the node entering the MASTER state is allowed to move the VIP. Avoiding actions on BACKUP transitions prevents both routers from trying to update the OVH mapping at the same time and reduces the risk of conflicting operations during CARP failover.