How to troubleshoot high CPU usage
FortiOS has many features. If many of them are used at the same time, they can quickly use up all the CPU resources. When this happens, you will experience connection-related problems because the FortiGate unit tries to manage its workload by refusing new connections or by using even more aggressive methods.
Some examples of CPU-intensive features are high-level VPN encryption, having all traffic undergo all possible scanning, logging all traffic and packets, and dashboard widgets that frequently update their data.
1. Determine how high the CPU usage is currently.
There are two main ways to do this. The easiest is to go to System > Dashboard > Status and look at the System Resources widget. This is a dial gauge that displays CPU use as a percentage. If it is at the red line, you should take action. The other method is to use the Dashboard CLI widget to enter diagnose sys top (diag sys top for short).
Sample output:
Run Time: 11 days, 23 hours and 36 minutes
0U, 0S, 98I; 1977T, 758F, 180KF
newcli 286 R 0.1 0.8
ipsengine 78 S < 0.0 3.1
ipsengine 64 S < 0.0 3.0
ipsengine 77 S < 0.0 3.0
ipsengine 68 S < 0.0 2.9
ipsengine 66 S < 0.0 2.9
ipsengine 79 S < 0.0 2.9
scanunitd 133 S < 0.0 1.8
pyfcgid 267 S 0.0 1.8
pyfcgid 269 S 0.0 1.7
pyfcgid 268 S 0.0 1.6
httpsd 139 S 0.0 1.6
pyfcgid 266 S 0.0 1.5
scanunitd 131 S < 0.0 1.4
scanunitd 132 S < 0.0 1.4
proxyworker 90 S 0.0 1.3
cmdbsvr 43 S 0.0 1.1
proxyworker 91 S 0.0 1.1
miglogd 55 S 0.0 1.1
httpsd 135 S 0.0 1.0
 
Where the codes displayed on the second output line mean the following:
U is the percentage of CPU used by user space applications. In the example, 0U means user space applications are using 0% of the CPU.
S is the percentage of CPU used by system processes (or kernel processes). In the example, 0S means system processes are using 0% of the CPU.
I is the percentage of CPU that is idle. In the example, 98I means the CPU is 98% idle.
T is the total FortiOS system memory in MB. In the example, 1977T means there are 1977 MB of system memory.
F is the free memory in MB. In the example, 758F means there are 758 MB of free memory.
KF is the total shared memory pages used. In the example, 180KF means the system is using 180 shared memory pages.
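If you capture the output as text, the summary line can be split into named values with a short script. The following is a minimal Python sketch, not a Fortinet tool; the function name parse_summary and the dictionary keys are illustrative assumptions, and the field meanings follow the list above.

```python
import re

# Meaning of each suffix on the summary line, as described in the list above.
# The key names on the right are illustrative, not FortiOS identifiers.
FIELDS = {
    "U": "cpu_user_pct",
    "S": "cpu_system_pct",
    "I": "cpu_idle_pct",
    "T": "mem_total_mb",
    "F": "mem_free_mb",
    "KF": "shared_pages",
}


def parse_summary(line):
    """Turn a line such as '0U, 0S, 98I; 1977T, 758F, 180KF' into a dict."""
    result = {}
    # Each field is a run of digits followed by its letter code.
    for value, code in re.findall(r"(\d+)([A-Z]+)", line):
        if code in FIELDS:
            result[FIELDS[code]] = int(value)
    return result


summary = parse_summary("0U, 0S, 98I; 1977T, 758F, 180KF")
cpu_busy_pct = 100 - summary["cpu_idle_pct"]
print(summary)
print("CPU busy:", cpu_busy_pct, "%")
```

Subtracting the idle value from 100 gives a single "busy" figure that is easy to compare against a threshold.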
 
Each additional line of the command output displays information for each of the processes running on the FortiGate unit. For example, the third line of the output is:
newcli 286 R 0.1 0.8
Where:
newcli is the process name. Other process names can include ipsengine, sshd, cmdbsvr, httpsd, scanunitd, and miglogd.
286 is the process ID. The process ID can be any number.
R is the current state of the process. The process state can be:
R running
S sleep
Z zombie
D disk sleep.
0.1 is the percentage of CPU that the process is using. CPU usage can range from 0.0 for a process that is sleeping to higher values for a process that is taking a lot of CPU time.
0.8 is the percentage of memory that the process is using. Memory usage can range from 0.1 to 5.5 and higher.
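The per-process fields described above can be read from captured output in the same way. This is a minimal Python sketch under the assumption that each process line has the layout shown in the sample output: name, process ID, state, an optional marker such as "<", then the CPU and memory columns. The names ProcessLine and parse_process_line are hypothetical.

```python
from typing import NamedTuple, Optional

# Process states listed above.
STATES = {"R": "running", "S": "sleep", "Z": "zombie", "D": "disk sleep"}


class ProcessLine(NamedTuple):
    name: str
    pid: int
    state: str
    cpu: float
    mem: float


def parse_process_line(line: str) -> Optional[ProcessLine]:
    """Parse one process line; return None for lines that are not processes."""
    parts = line.split()
    # The header and summary lines do not have a state code in column three.
    if len(parts) < 5 or parts[2] not in STATES:
        return None
    try:
        # CPU and memory are always the last two columns, so an extra
        # marker such as "<" between the state and the CPU value is skipped.
        return ProcessLine(
            name=parts[0],
            pid=int(parts[1]),
            state=STATES[parts[2]],
            cpu=float(parts[-2]),
            mem=float(parts[-1]),
        )
    except ValueError:
        return None


print(parse_process_line("newcli 286 R 0.1 0.8"))
print(parse_process_line("ipsengine 78 S < 0.0 3.1"))
print(parse_process_line("0U, 0S, 98I; 1977T, 758F, 180KF"))
```

Reading the CPU and memory values from the end of the line keeps the parser tolerant of the optional marker column.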
Enter the following single-key commands when diagnose sys top is running:
Press q to quit and return to the normal CLI prompt.
Press p to sort the processes by the amount of CPU that the processes are using.
Press m to sort the processes by the amount of memory that the processes are using.
2. Determine what features are using most of the CPU resources.
The CLI command get system performance top displays, as a table, the processes that currently use the most CPU resources. The column of interest is the second from the right, which shows CPU usage as a percentage. If the top few entries are using most of the CPU, note which processes they are and investigate those features to try to reduce their CPU load. Some examples of processes you will see include:
ipsengine - the IPS engine that scans traffic for intrusions
scanunitd - antivirus scanner
httpsd - secure HTTP
iked - internet key exchange (IKE) in use with IPsec VPN tunnels
newcli - active whenever you are accessing the CLI
sshd - the SSH daemon, active when there are secure shell (SSH) connections
cmdbsvr - the command database server application
Go to the features that are at the top of the list and look for evidence of them overusing the CPU. Generally the monitor for a feature is a good place to start.
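Because a feature such as IPS can run as several processes, it can help to add up CPU usage per process name before deciding where to look. The following Python sketch does that for captured output. It assumes the same column layout as the sample output in step 1, and the CPU values in the sample text are invented for illustration only; they are not measurements from a real unit. The function name cpu_by_process_name is hypothetical.

```python
from collections import defaultdict

STATE_CODES = ("R", "S", "Z", "D")


def cpu_by_process_name(top_output):
    """Sum the CPU column per process name, highest total first."""
    totals = defaultdict(float)
    for line in top_output.splitlines():
        parts = line.split()
        # Skip anything that is not a process line.
        if len(parts) < 5 or parts[2] not in STATE_CODES:
            continue
        try:
            # CPU is the second column from the right.
            totals[parts[0]] += float(parts[-2])
        except ValueError:
            continue
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Invented values for illustration only.
sample = """\
ipsengine 78 S < 40.5 3.1
ipsengine 64 S < 38.5 3.0
scanunitd 133 S < 9.0 1.8
httpsd 139 S 0.5 1.6
newcli 286 R 0.1 0.8
"""

for name, cpu in cpu_by_process_name(sample):
    print(f"{name:12s} {cpu:5.1f}")
```

With these invented numbers, ipsengine comes out on top, which would point to the IPS configuration as the first feature to investigate.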
3. Check for unnecessary CPU “wasters”.
These are some best practices that will reduce your CPU usage, even if you are not experiencing high CPU usage. If you require a feature that this section tells you to turn off, ignore that recommendation.
Use hardware acceleration wherever possible to offload tasks from the CPU. Offloading tasks such as encryption frees up the CPU for other tasks.
Avoid the use of GUI widgets that require computing cycles, such as the Top Sessions widget. These widgets are constantly polling the system for their information, which uses CPU and other resources.
Schedule antivirus, IPS, and firmware updates during off-peak hours. Usually these do not consume CPU resources, but they can disrupt normal operation.
Check the log levels and which events are being logged. The log level is the severity of the messages that are recorded. Consider going up one level to reduce the amount of logging. Also, if there are events you do not need to monitor, remove them from the list.
Log to FortiCloud instead of memory or disk. Logging to memory quickly uses up resources. Logging to local disk will impact overall performance and reduce the lifetime of the unit. Fortinet recommends logging to FortiCloud, which does not use much CPU.
If the disk is almost full, transfer the logs or data off the disk to free up space. When a disk is almost full it consumes a lot of resources to find the free space and organize the files.
If you have packet logging enabled, consider disabling it. When it’s enabled it records every packet that comes through that policy.
Halt all sniffers and traces.
Ensure you are not scanning traffic twice. If traffic enters the FortiGate unit on one interface, goes out another, and then comes back in again, that traffic does not need to be rescanned. Doing so is a waste of resources. However, ensure that the traffic truly is being scanned once.
Reduce the session timers to close unused sessions faster. To do this, enter the following commands and values in the CLI. These values are lower than the defaults. Note that the system adds 10 seconds to tcp-timewait-timer by default.
config system global
set tcp-halfclose-timer 30
set tcp-halfopen-timer 30
set tcp-timewait-timer 0
set udp-idle-timer 60
end
Enable only features that you need under System > Config > Features.
4. When CPU usage is under control, use SNMP to monitor CPU usage. Alternatively, use logging to record CPU and memory usage every 5 minutes.
Once things are back to normal, you should set up a warning system to alert you to future CPU overuse. A common method is SNMP. SNMP monitors many values on the FortiGate unit and allows you to set high-water marks that will generate events. You run an application on your computer to watch for and record these events. Go to System > Config > SNMP to enable and configure an SNMP community. If this method is too complicated, you can use the System Resources widget to record CPU usage. However, this method will not alert you to problems; it will only record them as they happen.
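The idea of a high-water mark can be sketched independently of any particular SNMP tool. The following Python example is an illustration only: it assumes you already have a list of CPU percentages collected at a regular interval (for example, every 5 minutes), and the 90% threshold and the requirement of three consecutive samples are arbitrary assumptions, not Fortinet recommendations. The function name high_water_alerts is hypothetical.

```python
def high_water_alerts(samples, threshold_pct=90, consecutive=3):
    """Return the sample indexes at which an alert would be raised.

    An alert is raised once when `consecutive` samples in a row are at or
    above `threshold_pct`. It can be raised again only after a sample has
    fallen below the threshold.
    """
    alerts = []
    run = 0  # number of samples in a row at or above the threshold
    for index, cpu_pct in enumerate(samples):
        if cpu_pct >= threshold_pct:
            run += 1
            if run == consecutive:
                alerts.append(index)
        else:
            run = 0
    return alerts


# Invented CPU percentages for illustration only.
samples = [35, 92, 95, 97, 99, 40, 91, 93, 60]
print(high_water_alerts(samples))
```

Requiring several consecutive high samples avoids raising an alert for a single short spike, at the cost of reacting a few intervals later.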