NP6 and NP6lite acceleration
NP6 and NP6lite network processors provide fastpath acceleration by offloading communication sessions from the FortiGate CPU. When an interface connected to an NP6 processor receives the first packet of a new session, the session is forwarded to the FortiGate CPU, where it is matched with a security policy, just as with any other FortiGate interface. If the session is accepted by a security policy and can be offloaded, its session key is copied to the NP6 processor that received the packet. All remaining packets in the session are intercepted by the NP6 processor and fast-pathed out of the FortiGate unit to their destination without ever passing through the FortiGate CPU. The result is enhanced network performance provided by the NP6 processor, plus the network processing load is removed from the CPU. In addition, the NP6 processor can handle some CPU-intensive tasks, such as IPsec VPN encryption/decryption.
NP6lite processors have the same architecture and function in the same way as NP6 processors. All of the descriptions of NP6 processors in this document also apply to NP6lite processors except where noted.
Session keys (and IPsec SA keys) are stored in the memory of the NP6 processor that is connected to the interface that received the packet that started the session. All sessions are fast-pathed and accelerated, even if they exit the FortiGate unit through an interface connected to another NP6. There is no dependence on getting the right pair of interfaces since the offloading is done by the receiving NP6.
The key to making this possible is an Integrated Switch Fabric (ISF) that connects the NP6s and the FortiGate unit interfaces together. Many FortiGate units with NP6 processors also have an ISF. The ISF allows any port connectivity. All ports and NP6s can communicate with each other over the ISF. There are no special ingress and egress fast path requirements as long as traffic enters and exits on interfaces connected to the same ISF.
Some FortiGate units, such as the FortiGate-1000D, include multiple NP6 processors that are not connected by an ISF. Because there is no ISF, fast path acceleration is supported only between interfaces connected to the same NP6 processor. Since an ISF introduces some latency, models without one provide lower-latency network acceleration between network interfaces connected to the same NP6 processor.
Each NP6 has a maximum throughput of 40 Gbps using 4 x 10 Gbps XAUI or Quad Serial Gigabit Media Independent Interface (QSGMII) interfaces or 3 x 10 Gbps and 16 x 1 Gbps XAUI or QSGMII interfaces.
There are at least two limitations to keep in mind:
- The capacity of each NP6 processor. An individual NP6 processor can support between 10 and 16 million sessions, limited by the amount of memory the processor has. Once an NP6 processor reaches its session limit, sessions over the limit are sent to the CPU. You can avoid this problem by distributing incoming sessions as evenly as possible among the NP6 processors. To do this, you need to be aware of which interfaces connect to which NP6 processors and distribute incoming traffic accordingly.
- The NP6 processors in some FortiGate units use NP Direct technology, which removes the ISF. The result is very low latency but no inter-processor connectivity, so you must make sure that traffic to be offloaded enters and exits the FortiGate through interfaces connected to the same NP6 processor.
NP6 session fast path requirements
NP6 processors can offload the following traffic and services:
- IPv4 and IPv6 traffic and NAT64 and NAT46 traffic (as well as IPv4 and IPv6 versions of the following traffic types where appropriate).
- Link aggregation (LAG) (IEEE 802.3ad) traffic and traffic from static redundant interfaces (see Increasing NP6 offloading capacity using link aggregation groups (LAGs)).
- TCP, UDP, ICMP, SCTP, and RDP traffic.
- IPsec VPN traffic, and offloading of IPsec encryption/decryption (including SHA2-256 and SHA2-512)
- NP6 processor IPsec engines support null, DES, 3DES, AES128, AES192, and AES256 encryption algorithms
- NP6 processor IPsec engines support null, MD5, SHA1, SHA256, SHA384, and SHA512 authentication algorithms
- IPsec traffic that passes through a FortiGate without being decrypted.
- Anomaly-based intrusion prevention, checksum offload and packet defragmentation.
- IPIP tunneling (also called IP in IP tunneling), SIT tunneling, and IPv6 tunneling sessions.
- Multicast traffic (including Multicast over IPsec).
- CAPWAP and wireless bridge traffic tunnel encapsulation to enable line rate wireless forwarding from FortiAP devices (not supported by the NP6lite).
- Traffic shaping and priority queuing for both shared and per IP traffic shaping.
- Syn proxying (not supported by the NP6lite).
- DNS session helper (not supported by the NP6lite).
- Inter-VDOM link traffic.
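For IPsec VPN traffic, offloading is controlled per tunnel in the phase 1 configuration. A minimal sketch of enabling it, assuming a tunnel named to-branch (the tunnel name and proposal are examples only):

config vpn ipsec phase1-interface
edit to-branch
set npu-offload enable
set proposal aes256-sha256
next
end

Offloading is enabled by default; disabling npu-offload forces the tunnel's encryption/decryption onto the CPU, which can be useful when debugging.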
Sessions that are offloaded must be fast path ready. For a session to be fast path ready it must meet the following criteria:
- Layer 2 type/length must be 0x0800 for IPv4 or 0x86dd for IPv6 (IEEE 802.1q VLAN specification is supported).
- Layer 3 protocol can be IPv4 or IPv6.
- Layer 4 protocol can be UDP, TCP, ICMP, or SCTP.
- In most cases, Layer 3 / Layer 4 header or content modification sessions that require a session helper can be offloaded.
- Local host traffic (originated by the FortiGate unit) can be offloaded.
- If the FortiGate supports NTurbo, sessions can be offloaded if they are accepted by firewall policies that include IPS, Application Control, CASI, flow-based antivirus, or flow-based web filtering.
Offloading Application layer content modification is not supported. This means that sessions are not offloaded if they are accepted by firewall policies that include proxy-based virus scanning, proxy-based web filtering, DNS filtering, DLP, Anti-Spam, VoIP, ICAP, Web Application Firewall, or Proxy options.
If you disable anomaly checks by Intrusion Prevention (IPS), you can still enable hardware accelerated anomaly checks using the fp-anomaly field of the config system interface CLI command. See Configuring individual NP6 processors.
If a session is not fast path ready, the FortiGate unit will not send the session key or IPsec SA key to the NP6 processor. Without the session key, all session key lookups by a network processor for incoming packets of that session fail, causing all session packets to be sent to the FortiGate unit’s main processing resources and processed at normal speeds.
If a session is fast path ready, the FortiGate unit will send the session key or IPsec SA key to the network processor. Session key or IPsec SA key lookups then succeed for subsequent packets from the known session or IPsec SA.
Packet fast path requirements
Packets within the session must then also meet packet requirements.
- Incoming packets must not be fragmented.
- Outgoing packets must not require fragmentation to a size less than 385 bytes. Because of this requirement, the configured MTU (Maximum Transmission Unit) for a network processor’s network interfaces must also meet or exceed the network processors’ supported minimum MTU of 385 bytes.
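Because of the 385-byte minimum, interface MTUs are normally left at their defaults. If you do override the MTU on an NP6-connected interface, keep it at or above the minimum; a sketch using port1 as an example:

config system interface
edit port1
set mtu-override enable
set mtu 1500
next
end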
Mixing fast path and non-fast path traffic
If packet requirements are not met, an individual packet will be processed by the FortiGate CPU regardless of whether other packets in the session are offloaded to the NP6.
Also, in some cases, a protocol’s session(s) may receive a mixture of offloaded and non-offloaded processing. For example, VoIP control packets may not be offloaded but VoIP data packets (voice packets) may be offloaded.
NP6lite processors
The NP6lite works the same way as the NP6. Being a lighter version, the NP6lite has a lower capacity than the NP6: its maximum throughput is 10 Gbps using 2x QSGMII and 2x Reduced Gigabit Media-Independent Interface (RGMII) interfaces.
Also, the NP6lite does not offload the following types of sessions:
- CAPWAP
- Syn proxy
- DNS session helper
NP6 and NP6Lite processors and sFlow and NetFlow
NP6 and NP6Lite offloading is supported when you configure NetFlow for interfaces connected to NP6 or NP6Lite processors. Offloading of other sessions is not affected by configuring NetFlow.
Configuring sFlow on any interface disables all NP6 and NP6Lite offloading for all traffic on that interface.
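If you want to keep offloading on an interface, you can confirm that its sFlow sampler is disabled (port1 is an example):

config system interface
edit port1
set sflow-sampler disable
next
end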
NP6 processors and traffic shaping
NP6-offloaded sessions support most types of traffic shaping. However, inbandwidth and outbandwidth traffic shaping, set using the following command, is not supported:
config system interface
edit port1
set outbandwidth <value>
set inbandwidth <value>
end
Configuring inbandwidth traffic shaping has no effect on offloaded traffic. Configuring outbandwidth traffic shaping imposes more throttling than configured, potentially reducing throughput more than expected.
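If you need to rate-limit offloaded traffic, a policy-based traffic shaper is one alternative sketch (the shaper name and rate are examples; check the bandwidth-unit setting in your firmware for the rate units):

config firewall shaper traffic-shaper
edit limit-200M
set maximum-bandwidth 200000
next
end

The shaper is then applied through firewall or shaping policies rather than on the interface itself.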
NP Direct
On FortiGates with more than one NP6 processor, removing the Internal Switch Fabric (ISF) in the NP Direct architecture provides direct access to the NP6 processors for the lowest-latency forwarding. Because the NP6 processors are not connected, care must be taken with network design to make sure that all traffic to be offloaded enters and exits the FortiGate through interfaces connected to the same NP6 processor. As well, Link Aggregation (LAG) interfaces should only include interfaces that are all connected to the same NP6 processor.
Example NP direct hardware with more than one NP6 processor includes:
- Ports 25 to 32 of the FortiGate-3700D in low latency mode.
- FortiGate-2000E
- FortiGate-2500E
Viewing your FortiGate NP6 processor configuration
Use either of the following commands to view the NP6 processor hardware configuration of your FortiGate unit:
get hardware npu np6 port-list
diagnose npu np6 port-list
If your FortiGate has NP6lite processors, you can use either of the following commands:
get hardware npu np6lite port-list
diagnose npu np6lite port-list
For example, for the FortiGate-5001D the output would be:
get hardware npu np6 port-list
Chip   XAUI Ports   Max   Cross-chip
                    Speed offloading
------ ---- ------- ----- ----------
np6_0  0    port3   10G   Yes
       1
       2    base1   1G    Yes
       3
       0-3  port1   40G   Yes
       0-3  fabric1 40G   Yes
       0-3  fabric3 40G   Yes
       0-3  fabric5 40G   Yes
------ ---- ------- ----- ----------
np6_1  0
       1    port4   10G   Yes
       2
       3    base2   1G    Yes
       0-3  port2   40G   Yes
       0-3  fabric2 40G   Yes
       0-3  fabric4 40G   Yes
------ ---- ------- ----- ----------
For more example output for different FortiGate models, see FortiGate NP6 architectures and FortiGate NP6lite architectures.
You can also use the following command to view the features enabled or disabled on the NP6 processors in your FortiGate unit:
diagnose npu np6 npu-feature
                    np_0      np_1
------------------- --------- ---------
Fastpath            Enabled   Enabled
HPE-type-shaping    Disabled  Disabled
Standalone          No        No
IPv4 firewall       Yes       Yes
IPv6 firewall       Yes       Yes
IPv4 IPSec          Yes       Yes
IPv6 IPSec          Yes       Yes
IPv4 tunnel         Yes       Yes
IPv6 tunnel         Yes       Yes
GRE tunnel          No        No
GRE passthrough     Yes       Yes
IPv4 Multicast      Yes       Yes
IPv6 Multicast      Yes       Yes
CAPWAP              Yes       Yes
RDP Offload         Yes       Yes
The following command is available to view the features enabled or disabled on the NP6Lite processors in your FortiGate unit:
diagnose npu np6lite npu-feature
                    np_0      np_1
------------------- --------- ---------
Fastpath            Enabled   Enabled
IPv4 firewall       Yes       Yes
IPv6 firewall       Yes       Yes
IPv4 IPSec          Yes       Yes
IPv6 IPSec          Yes       Yes
IPv4 tunnel         Yes       Yes
IPv6 tunnel         Yes       Yes
GRE tunnel          No        No
IPv4 Multicast      Yes       Yes
IPv6 Multicast      Yes       Yes
Disabling NP6 and NP6lite hardware acceleration (fastpath)
You can use the following command to disable NP6 offloading for all traffic. This option disables NP6 offloading for all traffic for all NP6 and NP6lite processors.
config system npu
set fastpath disable
end
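Fastpath is enabled by default. To re-enable it after testing:

config system npu
set fastpath enable
end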
Optimizing NP6 performance by distributing traffic to XAUI links
On most FortiGate units with NP6 processors, the FortiGate interfaces are switch ports that connect to the NP6 processors with XAUI links. Packets pass from the interfaces to the NP6 processor over the XAUI links. Each NP6 processor has a 40 Gigabit bandwidth capacity. The four XAUI links each have a 10 Gigabit capacity for a total of 40 Gigabits.
On many FortiGate units with NP6 processors, the NP6 processors and the XAUI links are over-subscribed. Since the NP6 processors are connected by an Integrated Switch Fabric, you do not have control over how traffic is distributed to them; in fact, the ISF distributes traffic evenly.
However, you can control how traffic is distributed to the XAUI links and you can optimize performance by distributing traffic evenly among the XAUI links. For example, if you have a very high amount of traffic passing between two networks, you can connect each network to interfaces connected to different XAUI links to distribute the traffic for each network to a different XAUI link.
For example, on a FortiGate-3200D (see FortiGate-3200D fast path architecture), there are 48 10-Gigabit interfaces that send and receive traffic for two NP6 processors over a total of eight 10-Gigabit XAUI links. Each XAUI link gets traffic from six 10-Gigabit FortiGate interfaces. The amount of traffic that the FortiGate-3200D can offload is limited by the number of NP6 processors and the number of XAUI links. You can optimize the amount of traffic that the FortiGate-3200D can process by distributing it evenly among the XAUI links and the NP6 processors.
You can see the Ethernet interface, XAUI link, and NP6 configuration by entering the get hardware npu np6 port-list command. For the FortiGate-3200D the output is:
get hardware npu np6 port-list
Chip   XAUI Ports   Max   Cross-chip
                    Speed offloading
------ ---- ------- ----- ----------
np6_0  0    port1   10G   Yes
       0    port5   10G   Yes
       0    port10  10G   Yes
       0    port13  10G   Yes
       0    port17  10G   Yes
       0    port22  10G   Yes
       1    port2   10G   Yes
       1    port6   10G   Yes
       1    port9   10G   Yes
       1    port14  10G   Yes
       1    port18  10G   Yes
       1    port21  10G   Yes
       2    port3   10G   Yes
       2    port7   10G   Yes
       2    port12  10G   Yes
       2    port15  10G   Yes
       2    port19  10G   Yes
       2    port24  10G   Yes
       3    port4   10G   Yes
       3    port8   10G   Yes
       3    port11  10G   Yes
       3    port16  10G   Yes
       3    port20  10G   Yes
       3    port23  10G   Yes
------ ---- ------- ----- ----------
np6_1  0    port26  10G   Yes
       0    port29  10G   Yes
       0    port33  10G   Yes
       0    port37  10G   Yes
       0    port41  10G   Yes
       0    port45  10G   Yes
       1    port25  10G   Yes
       1    port30  10G   Yes
       1    port34  10G   Yes
       1    port38  10G   Yes
       1    port42  10G   Yes
       1    port46  10G   Yes
       2    port28  10G   Yes
       2    port31  10G   Yes
       2    port35  10G   Yes
       2    port39  10G   Yes
       2    port43  10G   Yes
       2    port47  10G   Yes
       3    port27  10G   Yes
       3    port32  10G   Yes
       3    port36  10G   Yes
       3    port40  10G   Yes
       3    port44  10G   Yes
       3    port48  10G   Yes
------ ---- ------- ----- ----------
In this command output you can see that each NP6 has four XAUI links (0 to 3) and that each XAUI link is connected to six 10-Gigabit Ethernet interfaces. To optimize throughput you should keep the amount of traffic processed by each XAUI link under 10 Gbps. For example, if you want to offload traffic from four 10-Gigabit networks, you can connect these networks to Ethernet interfaces 1, 2, 3, and 4. This distributes the traffic from each 10-Gigabit network to a different XAUI link. Similarly, if you wanted to offload traffic from four more 10-Gigabit networks, you could connect them to Ethernet ports 26, 25, 28, and 27. As a result, each 10-Gigabit network would be connected to a different XAUI link.
Enabling bandwidth control between the ISF and NP6 XAUI ports
In some cases, the Internal Switch Fabric (ISF) buffer size may be larger than the buffer size of an NP6 XAUI port that receives traffic from the ISF. If this happens, burst traffic from the ISF may exceed the capacity of an XAUI port and sessions may be dropped.
You can use the following command to configure bandwidth control between the ISF and XAUI ports. Enabling bandwidth control can smooth burst traffic and keep the XAUI ports from getting overwhelmed and dropping sessions.
Use the following command to enable bandwidth control:
config system npu
set sw-np-bandwidth {0G | 2G | 4G | 5G | 6G}
end
The default setting is 0G, which means no bandwidth control. The other options limit the bandwidth to 2 Gbps, 4 Gbps, and so on.
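For example, to limit burst traffic from the ISF to each XAUI port to 4 Gbps:

config system npu
set sw-np-bandwidth 4G
end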
Increasing NP6 offloading capacity using link aggregation groups (LAGs)
NP6 processors can offload sessions received by interfaces in link aggregation groups (LAGs) (IEEE 802.3ad). An 802.3ad LAG, managed by the Link Aggregation Control Protocol (LACP), combines more than one physical interface into a group that functions like a single interface with a higher capacity than any single physical interface. For example, you could use a LAG if you want to offload sessions on a 30 Gbps link by adding three 10-Gbps interfaces to the same LAG.
All offloaded traffic types are supported by LAGs, including IPsec VPN traffic. Just like with normal interfaces, traffic accepted by a LAG is offloaded by the NP6 processor connected to the interfaces in the LAG that receive the traffic to be offloaded. If all interfaces in a LAG are connected to the same NP6 processor, traffic received by that LAG is offloaded by that NP6 processor. The amount of traffic that can be offloaded is limited by the capacity of the NP6 processor.
If a FortiGate has two or more NP6 processors connected by an integrated switch fabric (ISF), you can use LAGs to increase offloading by sharing the traffic load across multiple NP6 processors. You do this by adding physical interfaces connected to different NP6 processors to the same LAG.
Adding a second NP6 processor to a LAG effectively doubles the offloading capacity of the LAG, and adding a third increases it further. However, the increase may not scale linearly: capacity may not actually be doubled by adding a second NP6 or tripled by adding a third, because traffic and load conditions and other factors can limit the actual offloading result.
The increase in offloading capacity offered by LAGs and multiple NP6s relies on the integrated switch fabric (ISF), which allows multiple NP6 processors to share session information. Most FortiGate units with multiple NP6 processors also have an ISF. However, FortiGate models such as the 1000D, 2000E, and 2500E do not have an ISF. On these models, if you attempt to add interfaces connected to different NP6 processors to a LAG, the system displays an error message.
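A sketch of creating such a LAG (the interface and member names are examples; on a model with an ISF the members can connect to different NP6 processors):

config system interface
edit np6-lag
set vdom root
set type aggregate
set member port1 port2 port3
set lacp-mode active
next
end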
There are also a few limitations to LAG NP6 offloading support for IPsec VPN:
- IPsec VPN anti-replay protection cannot be used if IPsec is configured on a LAG that has interfaces connected to multiple NP6 processors.
- Because the encrypted traffic for one IPsec VPN tunnel has the same 5-tuple, the traffic from one tunnel can only be balanced to one interface in a LAG. This limits the maximum throughput for one IPsec VPN tunnel in an NP6 LAG group to 10 Gbps.
NP6 processors and redundant interfaces
NP6 processors can offload sessions received by interfaces that are part of a redundant interface. You can combine two or more physical interfaces into a redundant interface to provide link redundancy. Redundant interfaces ensure connectivity if one physical interface, or the equipment on that interface, fails. In a redundant interface, traffic travels only over one interface at a time. This differs from an aggregated interface where traffic travels over all interfaces for distribution of increased bandwidth.
All offloaded traffic types are supported by redundant interfaces, including IPsec VPN traffic. Just like with normal interfaces, traffic accepted by a redundant interface is offloaded by the NP6 processor connected to the interfaces in the redundant interface that receive the traffic to be offloaded. If all interfaces in a redundant interface are connected to the same NP6 processor, traffic received by that redundant interface is offloaded by that NP6 processor. The amount of traffic that can be offloaded is limited by the capacity of the NP6 processor.
If a FortiGate has two or more NP6 processors connected by an integrated switch fabric (ISF), you can create redundant interfaces that include physical interfaces connected to different NP6 processors. However, with a redundant interface, only one of the physical interfaces is processing traffic at any given time. So you cannot use redundant interfaces to increase performance in the same way as you can with aggregate interfaces.
The ability to add redundant interfaces connected to multiple NP6s relies on the integrated switch fabric (ISF), which allows multiple NP6 processors to share session information. Most FortiGate units with multiple NP6 processors also have an ISF. However, FortiGate models such as the 1000D, 2000E, and 2500E do not have an ISF. On these models, if you attempt to add interfaces connected to different NP6 processors to a redundant interface, the system displays an error message.
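A sketch of creating a redundant interface (the names are examples); only one member forwards traffic at a time:

config system interface
edit red-wan
set vdom root
set type redundant
set member port1 port2
next
end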
Configuring inter-VDOM link acceleration with NP6 processors
FortiGate units with NP6 processors include inter-VDOM links that can be used to accelerate inter-VDOM link traffic.
For a FortiGate unit with two NP6 processors there are two accelerated inter-VDOM links, each with two interfaces:
- npu0_vlink, with interfaces npu0_vlink0 and npu0_vlink1
- npu1_vlink, with interfaces npu1_vlink0 and npu1_vlink1
These interfaces are visible from the GUI and CLI. For a FortiGate unit with NP6 interfaces, enter the following CLI command to display the NP6-accelerated inter-VDOM links:
get system interface
...
== [ npu0_vlink0 ]
name: npu0_vlink0 mode: static ip: 0.0.0.0 0.0.0.0 status: down netbios-forward: disable type: physical sflow-sampler: disable explicit-web-proxy: disable explicit-ftp-proxy: disable mtu-override: disable wccp: disable drop-overlapped-fragment: disable drop-fragment: disable
== [ npu0_vlink1 ]
name: npu0_vlink1 mode: static ip: 0.0.0.0 0.0.0.0 status: down netbios-forward: disable type: physical sflow-sampler: disable explicit-web-proxy: disable explicit-ftp-proxy: disable mtu-override: disable wccp: disable drop-overlapped-fragment: disable drop-fragment: disable
== [ npu1_vlink0 ]
name: npu1_vlink0 mode: static ip: 0.0.0.0 0.0.0.0 status: down netbios-forward: disable type: physical sflow-sampler: disable explicit-web-proxy: disable explicit-ftp-proxy: disable mtu-override: disable wccp: disable drop-overlapped-fragment: disable drop-fragment: disable
== [ npu1_vlink1 ]
name: npu1_vlink1 mode: static ip: 0.0.0.0 0.0.0.0 status: down netbios-forward: disable type: physical sflow-sampler: disable explicit-web-proxy: disable explicit-ftp-proxy: disable mtu-override: disable wccp: disable drop-overlapped-fragment: disable drop-fragment: disable
...
By default the interfaces in each inter-VDOM link are assigned to the root VDOM. To use these interfaces to accelerate inter-VDOM link traffic, assign each interface in the pair to the VDOMs that you want to offload traffic between. For example, if you have added a VDOM named New-VDOM to a FortiGate unit with NP6 processors, you can go to System > Network > Interfaces, edit the npu0_vlink1 interface, and set the Virtual Domain to New-VDOM. This results in an accelerated inter-VDOM link between root and New-VDOM. You can also do this from the CLI:
config system interface
edit npu0_vlink1
set vdom New-VDOM
end
Using VLANs to add more accelerated inter-VDOM links
You can add VLAN interfaces to the accelerated inter-VDOM links to create inter-VDOM links between more VDOMs. For the links to work, the VLAN interfaces must be added to the same inter-VDOM link, must be on the same subnet, and must have the same VLAN ID.
For example, to accelerate inter-VDOM link traffic between VDOMs named Marketing and Engineering using VLANs with VLAN ID 100 go to System > Network > Interfaces and select Create New to create the VLAN interface associated with the Marketing VDOM:
Name: Marketing-link
Type: VLAN
Interface: npu0_vlink0
VLAN ID: 100
Virtual Domain: Marketing
IP/Network Mask: 172.20.120.12/24
Create the VLAN interface associated with the Engineering VDOM:
Name: Engineering-link
Type: VLAN
Interface: npu0_vlink1
VLAN ID: 100
Virtual Domain: Engineering
IP/Network Mask: 172.20.120.22/24
Or do the same from the CLI:
config system interface
edit Marketing-link
set vdom Marketing
set ip 172.20.120.12/24
set interface npu0_vlink0
set vlanid 100
next
edit Engineering-link
set vdom Engineering
set ip 172.20.120.22/24
set interface npu0_vlink1
set vlanid 100
end
Confirm that the traffic is accelerated
Use the following CLI commands to obtain the interface indexes and then correlate them with the session entries. In the following example, traffic was flowing between the new accelerated inter-VDOM links and physical ports port1 and port2, which are also attached to the NP6 processor.
diagnose ip address list
IP=172.31.17.76->172.31.17.76/255.255.252.0 index=5 devname=port1
IP=10.74.1.76->10.74.1.76/255.255.252.0 index=6 devname=port2
IP=172.20.120.12->172.20.120.12/255.255.255.0 index=55 devname=IVL-VLAN1_ROOT
IP=172.20.120.22->172.20.120.22/255.255.255.0 index=56 devname=IVL-VLAN1_VDOM1
diagnose sys session list
session info: proto=1 proto_state=00 duration=282 expire=24 timeout=0 flags=00000000 sockflag=00000000 sockport=0 av_idx=0 use=3
origin-shaper=
reply-shaper=
per_ip_shaper=
ha_id=0 policy_dir=0 tunnel=/
state=may_dirty npu
statistic(bytes/packets/allow_err): org=180/3/1 reply=120/2/1 tuples=2
orgin->sink: org pre->post, reply pre->post dev=55->5/5->55 gwy=172.31.19.254/172.20.120.22
hook=post dir=org act=snat 10.74.2.87:768->10.2.2.2:8(172.31.17.76:62464)
hook=pre dir=reply act=dnat 10.2.2.2:62464->172.31.17.76:0(10.74.2.87:768)
misc=0 policy_id=4 id_policy_id=0 auth_info=0 chk_client_info=0 vd=0
serial=0000004e tos=ff/ff ips_view=0 app_list=0 app=0
dd_type=0 dd_mode=0
per_ip_bandwidth meter: addr=10.74.2.87, bps=880
npu_state=00000000
npu info: flag=0x81/0x81, offload=8/8, ips_offload=0/0, epid=160/218, ipid=218/160, vlan=32769/0
session info: proto=1 proto_state=00 duration=124 expire=20 timeout=0 flags=00000000 sockflag=00000000 sockport=0 av_idx=0 use=3
origin-shaper=
reply-shaper=
per_ip_shaper=
ha_id=0 policy_dir=0 tunnel=/
state=may_dirty npu
statistic(bytes/packets/allow_err): org=180/3/1 reply=120/2/1 tuples=2
orgin->sink: org pre->post, reply pre->post dev=6->56/56->6 gwy=172.20.120.12/10.74.2.87
hook=pre dir=org act=noop 10.74.2.87:768->10.2.2.2:8(0.0.0.0:0)
hook=post dir=reply act=noop 10.2.2.2:768->10.74.2.87:0(0.0.0.0:0)
misc=0 policy_id=3 id_policy_id=0 auth_info=0 chk_client_info=0 vd=1
serial=0000004d tos=ff/ff ips_view=0 app_list=0 app=0
dd_type=0 dd_mode=0
per_ip_bandwidth meter: addr=10.74.2.87, bps=880
npu_state=00000000
npu info: flag=0x81/0x81, offload=8/8, ips_offload=0/0, epid=219/161, ipid=161/219, vlan=0/32769
total session 2
Disabling offloading IPsec Diffie-Hellman key exchange
You can use the following command to disable ASIC offloading of IPsec Diffie-Hellman key exchange for IPsec ESP traffic. By default, hardware offloading is used. For debugging or other purposes, you may want this function to be processed by software instead.
Use the following command to disable using ASIC offloading for IPsec Diffie-Hellman key exchange:
config system global
set ipsec-asic-offload disable
end
Access control lists (ACLs)
Access Control Lists (ACLs) use NP6 offloading to drop IPv4 or IPv6 packets at the physical network interface, before the packets are analyzed by the CPU. On a busy appliance this can significantly improve performance. This feature is available only on FortiGates with NP6-accelerated interfaces; it is not supported by FortiGates with NP6lite processors. ACL checking is one of the first operations performed on a packet, and the checking is done by the NP6 processor. The result is very efficient protection that does not use CPU or memory resources.
Use the following command to configure IPv4 ACL lists:
config firewall acl
edit 0
set status enable
set interface <interface-name>
set srcaddr <firewall-address>
set dstaddr <firewall-address>
set service <firewall-service>
end
Use the following command to configure IPv6 ACL lists:
config firewall acl6
edit 0
set status enable
set interface <interface-name>
set srcaddr <firewall-address6>
set dstaddr <firewall-address6>
set service <firewall-service>
end
Where:
<interface-name>
is the interface on which to apply the ACL. There is a hardware limitation to take into account: the ACL is a Layer 2 function offloaded to the ISF hardware, so no CPU resources are used to process the ACL. It is handled by the internal switch chip, which can perform hardware acceleration, increasing the performance of the FortiGate. The ACL function is only supported on switch-fabric-driven interfaces.
<firewall-address> <firewall-address6>
can be any of the address types used by the FortiGate, including address ranges. Traffic is blocked based on the combination of the source and destination addresses, not on either one alone; both must match for the traffic to be denied. To block all of the traffic from a specific source address, set the destination address to ALL.
Because the blocking takes place at the interface based on the information in the packet header and before any processing such as NAT can take place, a slightly different approach may be required. For instance, if you are trying to protect a VIP which has an external address of x.x.x.x and is forwarded to an internal address of y.y.y.y, the destination address that should be used is x.x.x.x, because that is the address that will be in the packet's header when it hits the incoming interface.
<firewall-service>
is the firewall service to block. Use ALL to block all services.
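Putting these together, a sketch of an ACL entry that drops all traffic arriving on port1 from a hypothetical blocked-net address object (the interface and address object names are examples; the address object must already exist):

config firewall acl
edit 0
set status enable
set interface port1
set srcaddr blocked-net
set dstaddr all
set service ALL
next
end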
Configuring individual NP6 processors
You can use the config system np6 command to configure a wide range of settings for each of the NP6 processors in your FortiGate unit, including enabling session accounting and adjusting session timeouts. You can also set anomaly checking for IPv4 and IPv6 traffic.
You can also enable and adjust the Host Protection Engine (HPE), which protects networks from DoS attacks by categorizing incoming packets based on packet rate and processing cost and applying packet shaping to packets that could cause a DoS attack.
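A sketch of enabling the HPE shaper on one NP6 processor with a TCP SYN rate limit (the threshold shown is an example only; the full syntax follows below):

config system np6
edit np6_0
config hpe
set tcpsyn-max 600000
set enable-shaper enable
end
next
end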
The settings that you configure for an NP6 processor with the config system np6 command apply to traffic processed by all interfaces connected to that NP6 processor. This includes the physical interfaces connected to the NP6 processor as well as all subinterfaces, VLAN interfaces, IPsec interfaces, LAGs, and so on associated with those physical interfaces.
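For example, to always enable per-session accounting for all traffic offloaded by the first NP6 processor:

config system np6
edit np6_0
set per-session-accounting all-enable
next
end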
Some of the options for this command apply anomaly checking for NP6 sessions in the same way as the corresponding command applies anomaly checking for NP4 sessions.
config system np6
edit <np6-processor-name>
set low-latency-mode {disable | enable}
set per-session-accounting {all-enable | disable | enable-by-log}
set garbage-session-collector {disable | enable}
set session-collector-interval <range>
set session-timeout-interval <range>
set session-timeout-random-range <range>
set session-timeout-fixed {disable | enable}
config hpe
set tcpsyn-max <packets-per-second>
set tcp-max <packets-per-second>
set udp-max <packets-per-second>
set icmp-max <packets-per-second>
set sctp-max <packets-per-second>
set esp-max <packets-per-second>
set ip-frag-max <packets-per-second>
set ip-others-max <packets-per-second>
set arp-max <packets-per-second>
set l2-others-max <packets-per-second>
set enable-shaper {disable | enable}
config fp-anomaly
set tcp-syn-fin {allow | drop | trap-to-host}
set tcp_fin_noack {allow | drop | trap-to-host}
set tcp_fin_only {allow | drop | trap-to-host}
set tcp_no_flag {allow | drop | trap-to-host}
set tcp_syn_data {allow | drop | trap-to-host}
set tcp-winnuke {allow | drop | trap-to-host}
set tcp-land {allow | drop | trap-to-host}
set udp-land {allow | drop | trap-to-host}
set icmp-land {allow | drop | trap-to-host}
set icmp-frag {allow | drop | trap-to-host}
set ipv4-land {allow | drop | trap-to-host}
set ipv4-proto-err {allow | drop | trap-to-host}
set ipv4-unknopt {allow | drop | trap-to-host}
set ipv4-optrr {allow | drop | trap-to-host}
set ipv4-optssrr {allow | drop | trap-to-host}
set ipv4-optlsrr {allow | drop | trap-to-host}
set ipv4-optstream {allow | drop | trap-to-host}
set ipv4-optsecurity {allow | drop | trap-to-host}
set ipv4-opttimestamp {allow | drop | trap-to-host}
set ipv4-csum-err {drop | trap-to-host}
set tcp-csum-err {drop | trap-to-host}
set udp-csum-err {drop | trap-to-host}
set icmp-csum-err {drop | trap-to-host}
set ipv6-land {allow | drop | trap-to-host}
set ipv6-proto-err {allow | drop | trap-to-host}
set ipv6-unknopt {allow | drop | trap-to-host}
set ipv6-saddr-err {allow | drop | trap-to-host}
set ipv6-daddr-err {allow | drop | trap-to-host}
set ipv6-optralert {allow | drop | trap-to-host}
set ipv6-optjumbo {allow | drop | trap-to-host}
set ipv6-opttunnel {allow | drop | trap-to-host}
set ipv6-opthomeaddr {allow | drop | trap-to-host}
set ipv6-optnsap {allow | drop | trap-to-host}
set ipv6-optendpid {allow | drop | trap-to-host}
set ipv6-optinvld {allow | drop | trap-to-host}
end
Command syntax
Command | Description | Default |
---|---|---|
low-latency-mode | Enable low-latency mode. In low-latency mode the integrated switch fabric is bypassed, so packets must enter and exit using the same NP6 processor. This option is only available for NP6 processors that can operate in low-latency mode, currently only np6_0 and np6_1 on the FortiGate-3700D and 3700DX. | disable |
per-session-accounting | Disable NP6 per-session accounting or enable it and control how it works. If set to enable-by-log (the default), NP6 per-session accounting is only enabled if firewall policies accepting offloaded traffic have traffic logging enabled. If set to all-enable, NP6 per-session accounting is always enabled for all traffic offloaded by the NP6 processor. Enabling per-session accounting can affect performance. | enable-by-log |
garbage-session-collector | Enable deleting expired or garbage sessions. | disable |
session-collector-interval | Set the expired or garbage session collector time interval in seconds. The range is 1 to 100 seconds. | 64 |
session-timeout-interval | Set the timeout for checking for and removing inactive NP6 sessions. The range is 0 to 1000 seconds. | 40 |
session-timeout-random-range | Set the random timeout for checking for and removing inactive NP6 sessions. The range is 0 to 1000 seconds. | 8 |
session-timeout-fixed | Enable to force checking for and removing inactive NP6 sessions at the session-timeout-interval time interval. Set to disable (the default) to check for and remove inactive NP6 sessions at random time intervals. | disable |
config hpe | |
hpe | Use the following options to apply HPE DDoS protection at the NP6 processor by limiting the number of packets per second received for various packet types by each NP6 processor. This rate limiting is applied very efficiently because it is done in hardware by the NP6 processor. | |
enable-shaper | Enable or disable HPE DDoS protection. | disable |
tcpsyn-max | Limit the maximum number of TCP SYN packets received per second. The range is 10,000 to 4,000,000,000 pps. | 5000000 |
tcp-max | Limit the maximum number of TCP packets received per second. The range is 10,000 to 4,000,000,000 pps. | 5000000 |
udp-max | Limit the maximum number of UDP packets received per second. The range is 10,000 to 4,000,000,000 pps. | 5000000 |
icmp-max | Limit the maximum number of ICMP packets received per second. The range is 10,000 to 4,000,000,000 pps. | 100000 |
sctp-max | Limit the maximum number of SCTP packets received per second. The range is 10,000 to 4,000,000,000 pps. | 100000 |
esp-max | Limit the maximum number of ESP packets received per second. The range is 10,000 to 4,000,000,000 pps. | 100000 |
ip-frag-max | Limit the maximum number of fragmented IP packets received per second. The range is 10,000 to 4,000,000,000 pps. | 100000 |
ip-others-max | Limit the maximum number of other types of IP packets received per second. The range is 10,000 to 4,000,000,000 pps. | 100000 |
arp-max | Limit the maximum number of ARP packets received per second. The range is 10,000 to 4,000,000,000 pps. | 100000 |
l2-others-max | Limit the maximum number of other layer-2 packets received per second. The range is 10,000 to 4,000,000,000 pps. | 100000 |
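For example, a minimal sketch of enabling the HPE shaper on one NP6 processor and lowering the TCP SYN limit. The processor name np6_0 and the 600000 pps value are illustrative; adjust them to suit your normal traffic levels:
config system np6
edit np6_0
config hpe
set enable-shaper enable
set tcpsyn-max 600000
end
end
Because the limits apply per NP6 processor, repeat the configuration for each NP6 processor whose interfaces face the untrusted network.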
config fp-anomaly | |
fp-anomaly | Configure how the NP6 processor handles traffic anomalies. In most cases you can configure the NP6 processor to allow or drop the packets associated with an attack, or to forward them to FortiOS (called trap-to-host). Selecting trap-to-host turns off NP6 anomaly protection for that anomaly. If you require anomaly protection but don't want to use the NP6 processor, you can select trap-to-host and enable anomaly protection with a DoS policy. | |
tcp-syn-fin | Detects TCP SYN flood SYN/FIN flag set anomalies. | allow |
tcp_fin_noack | Detects TCP SYN flood with FIN flag set without ACK setting anomalies. | trap-to-host |
tcp_fin_only | Detects TCP SYN flood with only FIN flag set anomalies. | trap-to-host |
tcp_no_flag | Detects TCP SYN flood with no flag set anomalies. | allow |
tcp_syn_data | Detects TCP SYN flood packets with data anomalies. | allow |
tcp-winnuke | Detects TCP WinNuke anomalies. | trap-to-host |
tcp-land | Detects TCP land anomalies. | trap-to-host |
udp-land | Detects UDP land anomalies. | trap-to-host |
icmp-land | Detects ICMP land anomalies. | trap-to-host |
icmp-frag | Detects layer 3 fragmented packets that could be part of layer 4 ICMP anomalies. | allow |
ipv4-land | Detects IPv4 land anomalies. | trap-to-host |
ipv4-proto-err | Detects invalid layer 4 protocol anomalies. For information about the error codes that are produced by setting this option to drop, see NP6 anomaly error codes. | trap-to-host |
ipv4-unknopt | Detects unknown option anomalies. | trap-to-host |
ipv4-optrr | Detects IPv4 with record route option anomalies. | trap-to-host |
ipv4-optssrr | Detects IPv4 with strict source record route option anomalies. | trap-to-host |
ipv4-optlsrr | Detects IPv4 with loose source record route option anomalies. | trap-to-host |
ipv4-optstream | Detects stream option anomalies. | trap-to-host |
ipv4-optsecurity | Detects security option anomalies. | trap-to-host |
ipv4-opttimestamp | Detects timestamp option anomalies. | trap-to-host |
ipv4-csum-err | Detects IPv4 checksum errors. | drop |
tcp-csum-err | Detects TCP checksum errors. | drop |
udp-csum-err | Detects UDP checksum errors. | drop |
icmp-csum-err | Detects ICMP checksum errors. | drop |
ipv6-land | Detects IPv6 land anomalies. | trap-to-host |
ipv6-unknopt | Detects unknown option anomalies. | trap-to-host |
ipv6-saddr-err | Detects source address as multicast anomalies. | trap-to-host |
ipv6-daddr-err | Detects destination address as unspecified or loopback address anomalies. | trap-to-host |
ipv6-optralert | Detects router alert option anomalies. | trap-to-host |
ipv6-optjumbo | Detects jumbo options anomalies. | trap-to-host |
ipv6-opttunnel | Detects tunnel encapsulation limit option anomalies. | trap-to-host |
ipv6-opthomeaddr | Detects home address option anomalies. | trap-to-host |
ipv6-optnsap | Detects network service access point address option anomalies. | trap-to-host |
ipv6-optendpid | Detects end point identification anomalies. | trap-to-host |
ipv6-optinvld | Detects invalid option anomalies. | trap-to-host |
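For example, a sketch that changes two anomaly checks from trap-to-host to drop, so that WinNuke and TCP land attack packets are discarded in NP6 hardware instead of being sent to the CPU. The processor name np6_0 and the choice of anomalies are illustrative:
config system np6
edit np6_0
config fp-anomaly
set tcp-winnuke drop
set tcp-land drop
end
end
Dropping in hardware keeps attack traffic off the CPU, but you lose the visibility that FortiOS DoS policies provide for those anomalies.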
Enabling per-session accounting for offloaded NP6 and NP6lite sessions
Per-session accounting is a logging feature that allows the FortiGate to report correct byte and packet counts per session for sessions offloaded to an NP6 or NP6lite processor. This information appears in traffic log messages as well as in FortiView. For example, the Sessions dashboard widget tracks SPU and nTurbo sessions: Current sessions shows the total number of sessions, SPU shows the percentage of these sessions that are SPU sessions, and nTurbo shows the percentage that are nTurbo sessions.
You configure per-session accounting for each NP6 processor. For example, use the following command to enable per-session accounting for NP6_0 and NP6_1:
config system np6
edit np6_0
set per-session-accounting enable-by-log
next
edit np6_1
set per-session-accounting enable-by-log
end
If your FortiGate has NP6lite processors, you can use the following command to enable per-session accounting for all of the NP6lite processors in the FortiGate unit:
config system npu
set per-session-accounting enable-by-log
end
The enable-by-log option enables per-session accounting for offloaded sessions with traffic logging enabled, and all-enable enables per-session accounting for all offloaded sessions. By default, per-session-accounting is set to enable-by-log, which turns on per-session accounting when you enable traffic logging in a policy.
Per-session accounting can affect offloading performance, so you should only enable it if you need the accounting information.
Enabling per-session accounting does not provide traffic flow data for sFlow or NetFlow.
Configuring NP6 session timeouts
For NP6 traffic, FortiOS refreshes an NP6 session's lifetime when it receives a session update message from the NP6 processor. To avoid session update message congestion, these NP6 session checks are performed all at once after a random time interval and all of the update messages are sent from the NP6 processor to FortiOS at once. This can result in fewer messages being sent because they are only sent at random time intervals instead of every time a session times out.
If your NP6 processor is processing a lot of short-lived sessions, it is recommended that you keep the default of random checking every 8 seconds to avoid very bursty session updates. If the time between session updates is very long and many sessions have expired between updates, a large number of updates must be performed all at once.
You can use the following command to set the random time range.
config system np6
edit <np6-processor-name>
set session-timeout-fixed disable
set session-timeout-random-range 8
end
This is the default configuration. The random timeout range is 1 to 1000 seconds and the default is 8, so by default NP6 sessions are checked at random time intervals of between 1 and 8 seconds. This means sessions can be inactive for up to 8 seconds before they are removed from the FortiOS session table.
If you want to reduce the amount of checking, you can increase session-timeout-random-range. This could result in inactive sessions being kept in the session table longer, but if most of your NP6 sessions are relatively long-lived this shouldn't be a problem.
You can also change this session checking to a fixed time interval and set a fixed timeout:
config system np6
edit <np6-processor-name>
set session-timeout-fixed enable
set session-timeout-interval 40
end
The fixed timeout default is 40 seconds and the range is 1 to 1000 seconds. Using a fixed interval further reduces the amount of checking that occurs.
You can select random or fixed updates and adjust the time intervals to minimize the refreshing that occurs while still making sure inactive sessions are deleted regularly. For example, if an NP6 processor is processing sessions with long lifetimes you can reduce checking by setting a relatively long fixed timeout.
Configure the number of IPsec engines NP6 processors use
NP6 processors use multiple IPsec engines to accelerate IPsec encryption and decryption. In some cases out of order ESP packets can cause problems if multiple IPsec engines are running. To resolve this problem you can configure all of the NP6 processors to use fewer IPsec engines.
Use the following command to change the number of IPsec engines used for decryption (ipsec-dec-subengine-mask) and encryption (ipsec-enc-subengine-mask). These settings are applied to all of the NP6 processors in the FortiGate unit.
config system npu
set ipsec-dec-subengine-mask <engine-mask>
set ipsec-enc-subengine-mask <engine-mask>
end
<engine-mask> is a hexadecimal number in the range 0x01 to 0xff where each bit represents one IPsec engine. The default for both options is 0xff, which means all IPsec engines are used. Enter a lower <engine-mask> to use fewer engines. You can configure different engine masks for encryption and decryption.
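For example, a sketch that limits both decryption and encryption to four IPsec engines by setting each mask to 0x0f (binary 00001111, one bit per engine). The 0x0f value is illustrative; any mask between 0x01 and 0xff is valid:
config system npu
set ipsec-dec-subengine-mask 0x0f
set ipsec-enc-subengine-mask 0x0f
end
Fewer engines reduces the chance of out-of-order ESP packet problems at the cost of some IPsec throughput.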
Stripping clear text padding and IPsec session ESP padding
In some situations, clear text packets or ESP packets in IPsec sessions may have large amounts of layer 2 padding that the NP6 IPsec engine cannot process, causing the session to be blocked.
If you notice dropped IPsec sessions, you can try using the following CLI options to cause the NP6 processor to strip clear text padding and ESP padding before sending the packets to the IPsec engine. With the padding stripped, the session can be processed normally by the IPsec engine.
Use the following command to strip both ESP padding and clear text padding:
config system npu
set strip-esp-padding enable
set strip-clear-text-padding enable
end
Stripping clear text and ESP padding are both disabled by default.
Disable NP6 CAPWAP offloading
By default and where possible, managed FortiAP and FortiLink CAPWAP sessions are offloaded to NP6 processors. You can use the following command to disable CAPWAP session offloading:
config system npu
set capwap-offload disable
end
Optionally disable NP6 offloading of traffic passing between 10Gbps and 1Gbps interfaces
Due to NP6 internal packet buffer limitations, some offloaded packets received at a 10Gbps interface and destined for a 1Gbps interface can be dropped, reducing performance for TCP and IP tunnel traffic. If you experience this performance reduction, you can use the following command to disable offloading sessions passing from 10Gbps interfaces to 1Gbps interfaces:
config system npu
set host-shortcut-mode host-shortcut
end
Select host-shortcut to stop offloading TCP and IP tunnel packets passing from 10Gbps interfaces to 1Gbps interfaces. TCP and IP tunnel packets passing from 1Gbps interfaces to 10Gbps interfaces are still offloaded as normal.
If host-shortcut-mode is set to the default bi-directional setting, packets in both directions are offloaded.
This option is only available if your FortiGate has 10G and 1G interfaces accelerated by NP6 processors.
Offloading RDP traffic
FortiOS supports NP6 offloading of Reliable Data Protocol (RDP) traffic. RDP is a network transport protocol that optimizes remote loading, debugging, and bulk transfer of images and data. RDP traffic uses IP protocol number 27 and is defined in RFC 908 and updated by RFC 1151. If your network is processing a lot of RDP traffic, offloading it can improve overall network performance.
You can use the following command to enable or disable NP6 RDP offloading. RDP offloading is enabled by default.
config system npu
set rdp-offload {disable | enable}
end
NP6 session drift
In some cases, sessions processed by NP6 processors may fail to be deleted leading to a large number of idle sessions. This is called session drift. You can use SNMP to be alerted when the number of idle sessions becomes high. SNMP also allows you to see which NP6 processor has the abnormal number of idle sessions and you can use a diagnose command to delete them.
The following MIB fields allow you to use SNMP to monitor session table information for NP6 processors including drift for each NP6 processor:
FORTINET-FORTIGATE-MIB::fgNPUNumber.0 = INTEGER: 2
FORTINET-FORTIGATE-MIB::fgNPUName.0 = STRING: NP6
FORTINET-FORTIGATE-MIB::fgNPUDrvDriftSum.0 = INTEGER: 0
FORTINET-FORTIGATE-MIB::fgNPUIndex.0 = INTEGER: 0
FORTINET-FORTIGATE-MIB::fgNPUIndex.1 = INTEGER: 1
FORTINET-FORTIGATE-MIB::fgNPUSessionTblSize.0 = Gauge32: 33554432
FORTINET-FORTIGATE-MIB::fgNPUSessionTblSize.1 = Gauge32: 33554432
FORTINET-FORTIGATE-MIB::fgNPUSessionCount.0 = Gauge32: 0
FORTINET-FORTIGATE-MIB::fgNPUSessionCount.1 = Gauge32: 0
FORTINET-FORTIGATE-MIB::fgNPUDrvDrift.0 = INTEGER: 0
FORTINET-FORTIGATE-MIB::fgNPUDrvDrift.1 = INTEGER: 0
You can also use the following diagnose command to determine if drift is occurring:
diagnose npu np6 sse-drift-summary
 NPU   drv-drift
-----  ---------
np6_0      0
np6_1      0
-----  ---------
 Sum       0
-----  ---------
The command output shows a drift summary for all the NP6 processors in the system, and shows the total drift. Normally the sum is 0. The previous command output, from a FortiGate-1500D, shows that the 1500D's two NP6 processors are not experiencing any drift.
If the sum is not zero, then extra idle sessions may be accumulating. You can use the following command to delete those sessions:
diagnose npu np6 sse-purge-drift <np6_id> [<time>]
Where <np6_id> is the number of the NP6 processor for which to delete idle sessions (starting with np6_0, which has an np6_id of 0), and <time> is the age in seconds of the idle sessions to be deleted. All idle sessions of this age and older are deleted. The default time is 300 seconds.
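For example, assuming drift has been detected on the first NP6 processor, the following illustrative command deletes sessions on np6_0 that have been idle for 600 seconds or longer (both values are examples):
diagnose npu np6 sse-purge-drift 0 600
After purging, you can re-run diagnose npu np6 sse-drift-summary to confirm that the drift sum has returned to 0.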
The diagnose npu np6 sse-stats <np6_id> command output also includes a drv-drift field that shows the total drift for one NP6 processor.
Optimizing FortiGate-3960E and 3980E IPsec VPN performance
You can use the following command to configure outbound hashing to improve IPsec VPN performance for the FortiGate-3960E and 3980E. If you change these settings, reboot your device to make sure they take effect.
config system np6
edit np6_0
set ipsec-outbound-hash {disable | enable}
set ipsec-ob-hash-function {switch-group-hash | global-hash | global-hash-weighted | round-robin-switch-group | round-robin-global}
end
Where ipsec-outbound-hash is disabled by default. If you enable it, you can set ipsec-ob-hash-function as follows:
- switch-group-hash (the default) distributes outbound IPsec Security Association (SA) traffic to the NP6 processors connected to the same switch as the interfaces that received the incoming traffic. Keeping all traffic on one switch and the NP6 processors connected to it improves performance.
- global-hash distributes outbound IPsec SA traffic among all NP6 processors.
- global-hash-weighted distributes outbound IPsec SA traffic among all NP6 processors, with more sessions going to the NP6s connected to switch 0. This option is only recommended for the FortiGate-3980E because it weights switch 0 higher, and on the FortiGate-3980E switch 0 has more NP6 processors connected to it. On the FortiGate-3960E both switches have the same number of NP6s, so for best performance neither switch should have a higher weight.
- round-robin-switch-group distributes outbound IPsec SA traffic round-robin among the NP6 processors connected to the same switch.
- round-robin-global distributes outbound IPsec SA traffic round-robin among all NP6 processors.
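For example, a sketch that enables outbound hashing on np6_0 and distributes outbound IPsec SA traffic among all NP6 processors. The processor name and the choice of global-hash are illustrative; whether this hash function suits your device depends on its switch layout and traffic mix:
config system np6
edit np6_0
set ipsec-outbound-hash enable
set ipsec-ob-hash-function global-hash
end
Remember to reboot the device after changing these settings so they take effect.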
FortiGate-3960E and 3980E support for high throughput traffic streams
FortiGate devices with multiple NP6 processors support high throughput by distributing sessions to multiple NP6 processors. However, default ISF hash-based load balancing has some limitations for single traffic streams or flows that use more than 10Gbps of bandwidth. Normally, the ISF sends all of the packets in a single traffic stream over the same 10Gbps interface to an NP6 processor. If a single traffic stream is larger than 10Gbps, packets are also sent to other 10Gbps interfaces that may be connected to the same NP6 or to other NP6s. Because the ISF uses hash-based load balancing, this can lead to packets being processed out of order and other potential drawbacks.
You can configure the FortiGate-3960E and 3980E to support single traffic flows that are larger than 10Gbps. To enable this feature, you can assign interfaces to round robin groups using the following configuration. If you assign an interface to a Round Robin group, the ISF uses round-robin load balancing to distribute incoming traffic from one stream to multiple NP6 processors. Round-robin load balancing prevents the potential problems associated with hash-based load balancing of packets from a single stream.
config system npu
config port-npu-map
edit <interface>
set npu-group-index <npu-group>
end
end
<interface> is the name of an interface that receives or sends large traffic streams. <npu-group> is the number of an NPU group. To enable round-robin load balancing, select a round-robin NPU group. Use ? to see the list of NPU groups; the output shows which groups support round-robin load balancing. For example, the following output shows that NPU group 30 supports round-robin load balancing to NP6 processors 0 to 7.
set npu-group-index ?
index: npu group
0  : NP#0-7
2  : NP#0
3  : NP#1
4  : NP#2
5  : NP#3
6  : NP#4
7  : NP#5
8  : NP#6
9  : NP#7
10 : NP#0-1
11 : NP#2-3
12 : NP#4-5
13 : NP#6-7
14 : NP#0-3
15 : NP#4-7
30 : NP#0-7 - Round Robin
For example, use the following command to assign port1, port2, port17 and port18 to NPU group 30.
config system npu
config port-npu-map
edit port1
set npu-group-index 30
next
edit port2
set npu-group-index 30
next
edit port17
set npu-group-index 30
next
edit port18
set npu-group-index 30
next
end
end
Recalculating packet checksums if the iph.reserved bit is set to 0
NP6 and NP6lite processors clear the iph.flags.reserved bit. By default the packet is changed but the checksum is not recalculated, so the packet checksum becomes incorrect and these packets may be dropped by the network stack. You can enable the following option to cause the system to recalculate the checksum. Enabling this option may cause a minor performance reduction. The option is disabled by default.
To enable checksum recalculation for packets with the iph.flags.reserved bit cleared:
config system npu
set iph-rsvd-re-cksum enable
end
Improving LAG performance on some FortiGate models
Some FortiGate models support one of the following commands, which might improve link aggregation (LAG) performance by reducing the number of dropped packets that can occur with some LAG configurations.
Depending on the hardware architecture, on some models the option is available under config system npu:
config system npu
set lag-sw-out-trunk {disable | enable}
end
On other models, the following option is available under config system np6:
config system np6
edit np6_0
set lag-npu {disable | enable}
end
If you notice that NP6-accelerated LAG interface performance is lower than expected, or if you notice excessive dropped packets for sessions over LAG interfaces, check whether your FortiGate has one of these options in the CLI and, if it is available, try enabling it to see if performance improves.
If the option is available for your FortiGate under config system np6, you should enable it for every NP6 processor that is connected to a LAG interface.
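For example, assuming a LAG whose member interfaces are connected to np6_0 and np6_1 (an illustrative layout), a sketch that enables the option on both processors:
config system np6
edit np6_0
set lag-npu enable
next
edit np6_1
set lag-npu enable
end
Use a command such as diagnose npu np6 port-list to confirm which NP6 processor each LAG member interface is connected to before deciding which processors need the option.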
NP6 IPsec engine status monitoring
Use the following command to configure NP6 IPsec engine status monitoring.
config monitoring np6-ipsec-engine
set status enable
set interval 5
set threshold 10 10 8 8 6 6 4 4
end
NP6 IPsec engine status monitoring writes a system event log message if the IPsec engines in an NP6 processor become locked after receiving malformed packets.
If an IPsec engine becomes locked, that particular engine can no longer process IPsec traffic, reducing the capacity of the NP6 processor. The only way to recover from a locked IPsec engine is to restart the FortiGate device. If you notice an IPsec performance reduction over time on your NP6 accelerated FortiGate device, you could enable NP6 IPsec engine monitoring and check log messages to determine if your NP6 IPsec engines are becoming locked.
To configure IPsec engine status monitoring, set status to enable and then configure the following options:
- interval sets the IPsec engine status check time interval in seconds (range 1 to 60 seconds, default = 1).
- threshold <np6_0-threshold> <np6_1-threshold>...<np6_7-threshold> sets the engine status check thresholds. An NP6 processor has eight IPsec engines and you can set a threshold for each engine. NP6 IPsec engine status monitoring regularly checks the status of all eight engines in all NP6 processors in the FortiGate device. Each threshold can be an integer between 1 and 255 and represents the number of times the NP6 IPsec engine status check detects that the engine is busy before generating a log message.
The default thresholds are 15 15 12 12 8 8 5 5. Any IPsec engine exceeding its threshold triggers the event log message. The default interval and thresholds have been set to work for most network topologies based on a balance of timely reporting a lock-up and accuracy and on how NP6 processors distribute sessions to their IPsec engines. The default settings mean:
- If engine 1 or 2 are busy for 15 checks (15 seconds) trigger an event log message.
- If engine 3 or 4 are busy for 12 checks (12 seconds) trigger an event log message.
- And so on.
NP6 IPsec engine monitoring writes three levels of log messages:
- Information if an IPsec engine is found to be busy.
- Warning if an IPsec engine exceeds a threshold.
- Critical if a lockup is detected, meaning an IPsec engine continues to exceed its threshold.
The log messages include the NP6 processor and engine affected.