Configuring graceful restart for dynamic routing failover

When an HA failover occurs, neighbor routers will detect that the cluster has failed and remove it from the network until the routing topology stabilizes. During this time, the routers may stop sending IP packets to the cluster, and communication sessions that would normally be processed by the cluster may time out or be dropped. Also, the new primary unit will not receive routing updates and so will not be able to build and maintain its routing database.

You can configure graceful restart (also called nonstop forwarding, or NSF) as described in RFC 3623 (Graceful OSPF Restart) to solve the problem of dynamic routing failover. If graceful restart is enabled on neighbor routers, they keep sending packets to the cluster following the HA failover instead of removing it from the network. The neighboring routers assume that the cluster is experiencing a graceful restart.

After the failover, the new primary unit can continue to process communication sessions using the synchronized routing data received from the failed primary unit before the failover. This gives the new primary unit time to update its routing table after the failover.

You can use the following commands to enable graceful restart or NSF on Cisco routers:

router ospf 1

log-adjacency-changes

nsf ietf helper strict-lsa-checking
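To check that the helper setting took effect on the Cisco neighbor, the OSPF process summary can be inspected. This is a sketch only: it assumes the OSPF process ID 1 used above, and the exact wording of the output varies by IOS release, so look for a line reporting that IETF NSF helper support is enabled rather than matching a specific string.

```
! Run from privileged EXEC mode on the Cisco neighbor.
! Process ID 1 matches the "router ospf 1" example above.
show ip ospf 1
```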

If the cluster is running BGP, use the following command to enable graceful restart for BGP:

config router bgp

set graceful-restart enable

end

You can also add BGP neighbors and configure the cluster unit to notify these neighbors that it supports graceful restart:

config router bgp

config neighbor

edit <neighbor_address_Ipv4>

set capability-graceful-restart enable

next

end

end
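Advertising the capability only helps if the BGP neighbor also has graceful restart enabled. For a Cisco IOS neighbor, a minimal sketch might look like the following. The AS number 65001 is a placeholder, not a value from this document, and timer options and defaults differ between IOS releases, so confirm the syntax against the documentation for your version.

```
! 65001 is a hypothetical local AS number; substitute your own.
router bgp 65001
 bgp graceful-restart
```

The graceful restart capability is exchanged when a BGP session is established, so an existing session typically has to be reset before the change takes effect.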

If the cluster is running OSPF, use the following command to enable graceful restart for OSPF:

config router ospf

set restart-mode graceful-restart

end

To make sure the new primary unit keeps its synchronized routing data long enough to acquire new routing data, you should also increase the HA route time to live, route wait, and route hold values to 60 seconds using the following CLI command:

config system ha

set route-ttl 60

set route-wait 60

set route-hold 60

end
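After applying these settings, it can be useful to confirm them on the cluster before testing a failover. The commands below are a suggested check, not a required procedure: the first displays the HA configuration (where the three route values should appear as 60), and the others list the routing table and the OSPF and BGP neighbor state. Output format depends on the FortiOS version.

```
show system ha
get router info routing-table all
get router info ospf neighbor
get router info bgp summary
```

Running the routing commands again on the new primary unit shortly after a test failover gives a rough indication of whether the synchronized routes were retained while the neighbors re-established their adjacencies.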