Failover scenario 1: Temporary failure of the primary unit
In this scenario, the primary unit (P1) fails because of a software failure or a recoverable hardware failure (in this example, the P1 power cable is unplugged). HA logging and alert email are configured for the HA group.
When the secondary unit (S2) detects that P1 has failed, S2 becomes the new primary unit and continues processing email.
Here is what happens during this process:
1. The FortiMail HA group is operating normally.
2. The power is accidentally disconnected from P1.
3. S2’s primary heartbeat test detects that P1 has failed.
How soon this happens depends on the HA daemon configuration of S2.
4. The effective HA operating mode of S2 changes to master.
5. S2 sends an alert email similar to the following, indicating that S2 has determined that P1 has failed and that S2 is switching its effective HA operating mode to master.
This is the HA machine at 172.16.5.11.

The following event has occurred
‘MASTER heartbeat disappeared’
The state changed from ‘SLAVE’ to ‘MASTER’
6. S2 records event log messages (among others) indicating that S2 has determined that P1 has failed and that S2 is switching its effective HA operating mode to master.
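The sequence above can be sketched as a small state model. This is only an illustration, not FortiMail's implementation: the HaUnit class, the on_heartbeat_result function, and the max_missed threshold are hypothetical names and values, since the actual detection time depends on the HA daemon configuration of S2.

```python
from dataclasses import dataclass, field


@dataclass
class HaUnit:
    """Hypothetical model of one unit in an active-passive HA group."""
    name: str
    configured_mode: str          # "master" or "slave"
    effective_mode: str           # mode the unit is currently operating in
    missed_heartbeats: int = 0
    events: list = field(default_factory=list)


def on_heartbeat_result(unit, peer_alive, max_missed=3):
    """Update a slave unit after one heartbeat check of its peer.

    max_missed is an assumed threshold, standing in for whatever the
    HA daemon configuration of the unit actually specifies.
    """
    if unit.effective_mode != "slave":
        return unit.effective_mode        # only a slave takes over
    if peer_alive:
        unit.missed_heartbeats = 0        # peer is healthy, reset the counter
        return unit.effective_mode
    unit.missed_heartbeats += 1
    if unit.missed_heartbeats >= max_missed:
        # Steps 4 to 6: switch the effective mode and record the event.
        unit.effective_mode = "master"
        unit.events.append(
            "MASTER heartbeat disappeared: state changed from SLAVE to MASTER"
        )
    return unit.effective_mode


# S2 sees one good heartbeat, then P1 loses power.
s2 = HaUnit(name="S2", configured_mode="slave", effective_mode="slave")
for alive in [True, False, False, False]:
    on_heartbeat_result(s2, alive)
print(s2.effective_mode, s2.events)
```

Note that only the effective mode changes in this sketch; the configured mode of S2 remains slave, which mirrors the distinction the steps above draw between configured and effective HA operating mode.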
Recovering from temporary failure of the primary unit
After P1 recovers from the failure, what happens next to the HA group depends on P1’s On failure setting under System > High Availability > Configuration.
switch off
P1 will not process email or join the HA group until you manually select the effective HA operating mode (see “click HERE to restart the HA system” and “click HERE to restore configured operating mode”).
wait for recovery then restore original role
On recovery, P1’s effective HA operating mode resumes its configured master role. This also means that S2 needs to give back the master role to P1. This behavior may be useful if the cause of failure is temporary and rare, but may cause problems if the cause of failure is permanent or persistent.
In this case, S2 sends another alert email similar to the following:
This is the HA machine at 172.16.5.11.

The following event has occurred
‘SLAVE asks us to switch roles (recovery after a restart)’
The state changed from ‘MASTER’ to ‘SLAVE’
After recovery, P1 also sends out an alert email similar to the following:

This is the HA machine at 172.16.5.10.

The following critical event was detected
The system was shutdown!
wait for recovery then restore slave role
On recovery, P1’s effective HA operating mode becomes slave, and S2 continues to assume the master role. P1 then synchronizes the content of its MTA queue directories with the current master unit, S2. S2 can then deliver email that existed in P1’s MTA queue directory at the time of the failover. For information on manually restoring the FortiMail unit to its configured HA operating mode, see “click HERE to restore configured operating mode”.
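The three On failure outcomes described above can be summarized as a lookup. This is a minimal sketch, assuming P1 is configured as master and S2 as slave, as in this scenario; the function name modes_after_recovery is hypothetical, and "off" is used here only as a placeholder for "not processing email or joined to the HA group until an operating mode is selected manually".

```python
def modes_after_recovery(on_failure):
    """Return (P1 effective mode, S2 effective mode) once P1 recovers.

    Assumes P1 is the configured master and S2 took over as master
    during the failure. "off" is a placeholder label for a unit that
    waits for manual intervention.
    """
    outcomes = {
        # P1 stays out of the group until an administrator intervenes.
        "switch off": ("off", "master"),
        # P1 resumes its configured master role; S2 gives the role back.
        "wait for recovery then restore original role": ("master", "slave"),
        # P1 rejoins as slave; S2 keeps the master role.
        "wait for recovery then restore slave role": ("slave", "master"),
    }
    if on_failure not in outcomes:
        raise ValueError(f"unknown On failure setting: {on_failure!r}")
    return outcomes[on_failure]


for setting in (
    "switch off",
    "wait for recovery then restore original role",
    "wait for recovery then restore slave role",
):
    p1_mode, s2_mode = modes_after_recovery(setting)
    print(f"{setting}: P1={p1_mode}, S2={s2_mode}")
```

In every case exactly one unit ends up as master, so email processing continues regardless of which setting is chosen.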