Fortigate HA message "HA master heartbeat interface <intf_name> lost neighbor information"
The following HA log messages may be recorded by an operating cluster:
2009-02-16 11:06:34 device_id=FG2001111111 log_id=0105035001 type=event subtype=ha pri=critical vd=root msg="HA slave heartbeat interface internal lost neighbor information"
2009-02-16 11:06:40 device_id=FG2001111111 log_id=0105035001 type=event subtype=ha pri=notice vd=root msg="Virtual cluster 1 of group 0 detected new joined HA member"
2009-02-16 11:06:40 device_id=FG2001111111 log_id=0105035001 type=event subtype=ha pri=notice vd=root msg="HA master heartbeat interface internal get peer information"
These log messages indicate that the cluster units could not reach each other over the HA heartbeat link for a period equal to hb-interval x hb-lost-threshold. The hb-interval is set in units of 100 milliseconds, so with the default values (hb-interval 2 and hb-lost-threshold 6) this period is 1.2 seconds.
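The arithmetic behind this detection period can be sketched in a few lines of Python. This is only an illustration, not part of FortiOS: the function name failure_detection_ms is hypothetical, and the sketch assumes hb-interval is expressed in units of 100 milliseconds.

```python
# Minimal sketch of the heartbeat loss detection arithmetic.
# Assumption: hb-interval is configured in units of 100 ms.
# failure_detection_ms is a hypothetical helper, not a FortiOS API.

HB_INTERVAL_UNIT_MS = 100


def failure_detection_ms(hb_interval: int, hb_lost_threshold: int) -> int:
    """Return the time in milliseconds before a silent peer is declared lost."""
    if hb_interval < 1 or hb_lost_threshold < 1:
        raise ValueError("hb_interval and hb_lost_threshold must be positive")
    return hb_interval * HB_INTERVAL_UNIT_MS * hb_lost_threshold


if __name__ == "__main__":
    # Default values: hb-interval 2, hb-lost-threshold 6.
    print(failure_detection_ms(2, 6) / 1000, "seconds")
    # A larger pair of values, such as hb-interval 4 and hb-lost-threshold 12.
    print(failure_detection_ms(4, 12) / 1000, "seconds")
```

Working in whole milliseconds keeps the calculation in integers; the division by 1000 is only for display.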
To diagnose this problem
1. Check all heartbeat interface connections including cables and switches to make sure they are connected and operating normally.
2. Use the following commands to display the status of the heartbeat interfaces.
get hardware nic <heartbeat_interface_name>
diagnose hardware deviceinfo nic <heartbeat_interface_name>
The output shows the interface status and link status, and also indicates whether a large number of errors have been detected.
3. If the log message only appears during peak traffic times, increase the tolerance for missed HA heartbeat packets by using the following commands to increase the lost heartbeat threshold and heartbeat interval:
config system ha
set hb-lost-threshold 12
set hb-interval 4
end
These settings quadruple the loss detection interval, from 1.2 seconds to 4.8 seconds. You can use higher values as well.
This condition can also occur if the cluster units are located in different buildings or even different geographical locations, a configuration called a distributed cluster. Because of the separation, heartbeat packets may take a relatively long time to travel between cluster units. You can support a distributed cluster by increasing the heartbeat interval so that the cluster expects extra time between heartbeat packets.
4. Optionally disable session-pickup to reduce the processing load on the heartbeat interfaces.
5. Instead of disabling session-pickup, you can enable session-pickup-delay to reduce the number of sessions that are synchronized. With this option enabled, only sessions that have been active for more than 30 seconds are synchronized.
It may also be useful to monitor the cluster units for high CPU usage and low memory. You can configure event logging to record CPU and memory usage, and you can enable the CPU over usage and memory low SNMP events.
Once this monitoring is in place, try to determine whether a recent change in the network or an increase in traffic could be the cause. Check whether the problem happens frequently and, if so, what the pattern is.
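One way to look for a pattern is to export the HA event log and count the lost-neighbor messages per hour. The following Python sketch assumes log lines in the format shown at the top of this section (a date, a time, then key=value fields with a quoted msg field). The function names parse_line and lost_heartbeats_per_hour are hypothetical, and how the log lines are exported from the unit is left to you.

```python
import re
from collections import Counter
from datetime import datetime

# Matches key=value pairs where the value is either quoted or a bare token.
FIELD_RE = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def parse_line(line):
    """Split one log line into a dict of fields plus a parsed timestamp."""
    date_part, time_part, rest = line.strip().split(" ", 2)
    fields = {
        "timestamp": datetime.strptime(
            f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S"
        )
    }
    for key, quoted, bare in FIELD_RE.findall(rest):
        fields[key] = quoted if quoted else bare
    return fields


def lost_heartbeats_per_hour(lines):
    """Count 'lost neighbor information' HA events, grouped by hour."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines in the export
        fields = parse_line(line)
        if fields.get("subtype") == "ha" and (
            "lost neighbor information" in fields.get("msg", "")
        ):
            counts[fields["timestamp"].strftime("%Y-%m-%d %H:00")] += 1
    return counts


# Sample input: the three messages quoted earlier in this section.
SAMPLE = [
    '2009-02-16 11:06:34 device_id=FG2001111111 log_id=0105035001 type=event subtype=ha pri=critical vd=root msg="HA slave heartbeat interface internal lost neighbor information"',
    '2009-02-16 11:06:40 device_id=FG2001111111 log_id=0105035001 type=event subtype=ha pri=notice vd=root msg="Virtual cluster 1 of group 0 detected new joined HA member"',
    '2009-02-16 11:06:40 device_id=FG2001111111 log_id=0105035001 type=event subtype=ha pri=notice vd=root msg="HA master heartbeat interface internal get peer information"',
]

if __name__ == "__main__":
    for hour, count in sorted(lost_heartbeats_per_hour(SAMPLE).items()):
        print(hour, count)
```

If the counts cluster around the same hours each day, compare those hours against traffic peaks before changing the heartbeat settings.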
To monitor the CPU of the cluster units and troubleshoot further, use the following commands:
get system performance status
get sys performance top 2
diagnose sys top 2
Repeating these commands at frequent intervals shows the activity of the CPU and the number of sessions.
Search the Fortinet Knowledge Base for articles about monitoring CPU and memory usage.
If the problem persists, gather the following information (a console connection might be necessary if connectivity is lost) and provide it to Technical Support when opening a ticket:
Debug log from the web‑based manager: System > Config > Advanced > Download Debug Log
CLI command output:
diag sys top 2 (keep it running for 20 seconds)
get sys perf status (repeat this command multiple times to get good samples)
get sys ha status
diagnose sys ha status
diagnose sys ha dump-by {all options}
diagnose netlink device list
diagnose hardware deviceinfo nic <heartbeat-interface-name>
execute log filter category 1
execute log display