NP6 and NP6lite acceleration

NP6 and NP6lite network processors provide fastpath acceleration by offloading communication sessions from the FortiGate CPU. When an interface connected to an NP6 processor receives the first packet of a new session, the session is forwarded to the FortiGate CPU, just like a session arriving on any other FortiGate interface, where it is matched with a security policy. If the session is accepted by a security policy and can be offloaded, its session key is copied to the NP6 processor that received the packet. All remaining packets in the session are intercepted by the NP6 processor and fast-pathed out of the FortiGate unit to their destination without ever passing through the FortiGate CPU. The result is enhanced network performance from the NP6 processor, while the network processing load is removed from the CPU. In addition, the NP6 processor can handle some CPU-intensive tasks, such as IPsec VPN encryption and decryption.

Note: NP6lite processors have the same architecture and function in the same way as NP6 processors. All of the descriptions of NP6 processors in this document can be applied to NP6lite processors except where noted.

Session keys (and IPsec SA keys) are stored in the memory of the NP6 processor that is connected to the interface that received the packet that started the session. All sessions are fast-pathed and accelerated, even if they exit the FortiGate unit through an interface connected to another NP6. There is no dependence on getting the right pair of interfaces since the offloading is done by the receiving NP6.

The key to making this possible is an Integrated Switch Fabric (ISF) that connects the NP6s and the FortiGate unit interfaces together. Many FortiGate units with NP6 processors also have an ISF. The ISF allows any-port-to-any-port connectivity: all ports and NP6s can communicate with each other over the ISF. There are no special ingress and egress fast path requirements as long as traffic enters and exits the FortiGate on interfaces connected to the same ISF.

Some FortiGate units, such as the FortiGate-1000D, include multiple NP6 processors that are not connected by an ISF. Because there is no ISF, fast path acceleration is supported only between interfaces connected to the same NP6 processor. Since an ISF introduces some latency, models without one provide low-latency network acceleration between network interfaces connected to the same NP6 processor.

Each NP6 has a maximum throughput of 40 Gbps using 4 x 10 Gbps XAUI or Quad Serial Gigabit Media Independent Interface (QSGMII) interfaces or 3 x 10 Gbps and 16 x 1 Gbps XAUI or QSGMII interfaces.

There are at least two limitations to keep in mind:

  • The capacity of each NP6 processor. An individual NP6 processor can support between 10 and 16 million sessions. This number is limited by the amount of memory the processor has. Once an NP6 processor hits its session limit, sessions over the limit are sent to the CPU. You can avoid this problem by distributing incoming sessions as evenly as possible among the NP6 processors. To do this, you need to be aware of which interfaces connect to which NP6 processors and distribute incoming traffic accordingly.
  • The NP6 processors in some FortiGate units employ NP Direct technology that removes the ISF. The result is very low latency but no inter-processor connectivity, so you must make sure that traffic to be offloaded enters and exits the FortiGate through interfaces connected to the same NP processor.

NP6 session fast path requirements

NP6 processors can offload the following traffic and services:

  • IPv4 and IPv6 traffic and NAT64 and NAT46 traffic (as well as IPv4 and IPv6 versions of the following traffic types where appropriate).
  • Link aggregation (LAG) (IEEE 802.3ad) traffic and traffic from static redundant interfaces (see Increasing NP6 offloading capacity using link aggregation groups (LAGs)).
  • TCP, UDP, ICMP, SCTP, and RDP traffic.
  • IPsec VPN traffic, and offloading of IPsec encryption/decryption (including SHA2-256 and SHA2-512).
  • NP6 processor IPsec engines support null, DES, 3DES, AES128, AES192, and AES256 encryption algorithms.
  • NP6 processor IPsec engines support null, MD5, SHA1, SHA256, SHA384, and SHA512 authentication algorithms.
  • IPsec traffic that passes through a FortiGate without being decrypted.
  • Anomaly-based intrusion prevention, checksum offload and packet defragmentation.
  • IPIP tunneling (also called IP in IP tunneling), SIT tunneling, and IPv6 tunneling sessions.
  • Multicast traffic (including Multicast over IPsec).
  • CAPWAP and wireless bridge traffic tunnel encapsulation to enable line rate wireless forwarding from FortiAP devices (not supported by the NP6lite).
  • Traffic shaping and priority queuing for both shared and per IP traffic shaping.
  • Syn proxying (not supported by the NP6lite).
  • DNS session helper (not supported by the NP6lite).
  • Inter-VDOM link traffic.

Sessions that are offloaded must be fast path ready. For a session to be fast path ready it must meet the following criteria:

  • Layer 2 type/length must be 0x0800 for IPv4 or 0x86dd for IPv6 (IEEE 802.1q VLAN specification is supported).
  • Layer 3 protocol can be IPv4 or IPv6.
  • Layer 4 protocol can be UDP, TCP, ICMP, or SCTP.
  • In most cases, sessions that require Layer 3 / Layer 4 header or content modification by a session helper can be offloaded.
  • Local host traffic (originated by the FortiGate unit) can be offloaded.
  • If the FortiGate supports NTurbo, sessions can be offloaded if they are accepted by firewall policies that include IPS, Application Control, CASI, flow-based antivirus, or flow-based web filtering.

Offloading Application layer content modification is not supported. This means that sessions are not offloaded if they are accepted by firewall policies that include proxy-based virus scanning, proxy-based web filtering, DNS filtering, DLP, Anti-Spam, VoIP, ICAP, Web Application Firewall, or Proxy options.

Note: If you disable anomaly checks by Intrusion Prevention (IPS), you can still enable hardware accelerated anomaly checks using the fp-anomaly field of the config system interface CLI command. See Configuring individual NP6 processors.

If a session is not fast path ready, the FortiGate unit will not send the session key or IPsec SA key to the NP6 processor. Without the session key, all session key lookups by a network processor for incoming packets of that session fail, causing all session packets to be sent to the FortiGate unit’s main processing resources and processed at normal speeds.

If a session is fast path ready, the FortiGate unit will send the session key or IPsec SA key to the network processor. Session key or IPsec SA key lookups then succeed for subsequent packets from the known session or IPsec SA.

Packet fast path requirements

Packets within an offloaded session must also meet the following packet requirements:

  • Incoming packets must not be fragmented.
  • Outgoing packets must not require fragmentation to a size less than 385 bytes. Because of this requirement, the configured MTU (Maximum Transmission Unit) for a network processor’s network interfaces must also meet or exceed the network processors’ supported minimum MTU of 385 bytes.
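
If you need to confirm or adjust an interface MTU to satisfy this requirement, the MTU can be set per interface from the CLI. The following is a minimal sketch, assuming a hypothetical interface named port1 and a standard 1500-byte MTU (well above the 385-byte minimum):

config system interface
    edit port1
        set mtu-override enable
        set mtu 1500
    end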

Mixing fast path and non-fast path traffic

If packet requirements are not met, an individual packet will be processed by the FortiGate CPU regardless of whether other packets in the session are offloaded to the NP6.

Also, in some cases, a protocol’s session(s) may receive a mixture of offloaded and non-offloaded processing. For example, VoIP control packets may not be offloaded but VoIP data packets (voice packets) may be offloaded.

NP6Lite processors

The NP6lite works the same way as the NP6. Being a lighter version, the NP6lite has a lower capacity than the NP6. The NP6lite maximum throughput is 10 Gbps using 2 x QSGMII and 2 x Reduced Gigabit Media Independent Interface (RGMII) interfaces.

Also, the NP6lite does not offload the following types of sessions:

  • CAPWAP
  • Syn proxy
  • DNS session helper

NP6 and NP6Lite processors and sFlow and NetFlow

NP6 and NP6Lite offloading is supported when you configure NetFlow for interfaces connected to NP6 or NP6Lite processors. Offloading of other sessions is not affected by configuring NetFlow.

Configuring sFlow on any interface disables all NP6 and NP6Lite offloading for all traffic on that interface.
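
For reference, sFlow sampling is enabled per interface with the sflow-sampler setting. In the following sketch (the interface name is hypothetical), enabling sFlow on port1 would disable NP6 offloading for all traffic on port1:

config system interface
    edit port1
        set sflow-sampler enable
    end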

NP6 processors and traffic shaping

NP6-offloaded sessions support most types of traffic shaping. However, in bandwidth and out bandwidth traffic shaping, set using the following command, is not supported:

config system interface

edit port1

set outbandwidth <value>

set inbandwidth <value>

end

Configuring in bandwidth traffic shaping has no effect. Configuring out bandwidth traffic shaping restricts traffic more than the configured value, potentially reducing throughput more than expected.

NP Direct

On FortiGates with more than one NP6 processor, removing the Internal Switch Fabric (ISF) for the NP Direct architecture provides direct access to the NP6 processors for the lowest latency forwarding. Because the NP6 processors are not connected, take care with network design to make sure that all traffic to be offloaded enters and exits the FortiGate through interfaces connected to the same NP6 processor. As well, Link Aggregation (LAG) interfaces should only include interfaces that are all connected to the same NP6 processor.

Examples of NP Direct hardware with more than one NP6 processor include:

  • Ports 25 to 32 of the FortiGate-3700D in low latency mode.
  • FortiGate-2000E
  • FortiGate-2500E

Viewing your FortiGate NP6 processor configuration

Use either of the following commands to view the NP6 processor hardware configuration of your FortiGate unit:

get hardware npu np6 port-list

diagnose npu np6 port-list

If your FortiGate has NP6lite processors, you can use either of the following commands:

get hardware npu np6lite port-list

diagnose npu np6lite port-list

For example, for the FortiGate-5001D the output would be:

get hardware npu np6 port-list
Chip   XAUI Ports            Max   Cross-chip 
                             Speed offloading 
------ ---- -------          ----- ---------- 
np6_0  0    port3            10G   Yes        
       1    
       2    base1            1G    Yes        
       3    
       0-3  port1            40G   Yes        
       0-3  fabric1          40G   Yes        
       0-3  fabric3          40G   Yes        
       0-3  fabric5          40G   Yes        
------ ---- -------          ----- ---------- 
np6_1  0    
       1    port4            10G   Yes        
       2    
       3    base2            1G    Yes        
       0-3  port2            40G   Yes        
       0-3  fabric2          40G   Yes        
       0-3  fabric4          40G   Yes        
------ ---- -------          ----- ---------- 

For more example output for different FortiGate models, see FortiGate NP6 architectures and FortiGate NP6lite architectures.

You can also use the following command to view the features enabled or disabled on the NP6 processors in your FortiGate unit:

diagnose npu np6 npu-feature 
                    np_0      np_1      
------------------- --------- --------- 
Fastpath            Enabled   Enabled   
HPE-type-shaping    Disabled  Disabled  
Standalone          No        No        
IPv4 firewall       Yes       Yes       
IPv6 firewall       Yes       Yes       
IPv4 IPSec          Yes       Yes       
IPv6 IPSec          Yes       Yes       
IPv4 tunnel         Yes       Yes       
IPv6 tunnel         Yes       Yes       
GRE tunnel          No        No        
GRE passthrough     Yes       Yes       
IPv4 Multicast      Yes       Yes       
IPv6 Multicast      Yes       Yes       
CAPWAP              Yes       Yes       
RDP Offload         Yes       Yes       

The following command is available to view the features enabled or disabled on the NP6Lite processors in your FortiGate unit:

diagnose npu np6lite npu-feature
                    np_0      np_1    
------------------- --------- --------- 
Fastpath            Enabled   Enabled   
IPv4 firewall       Yes       Yes       
IPv6 firewall       Yes       Yes       
IPv4 IPSec          Yes       Yes       
IPv6 IPSec          Yes       Yes       
IPv4 tunnel         Yes       Yes       
IPv6 tunnel         Yes       Yes       
GRE tunnel          No        No        
IPv4 Multicast      Yes       Yes       
IPv6 Multicast      Yes       Yes       

Disabling NP6 and NP6lite hardware acceleration (fastpath)

You can use the following command to disable NP6 offloading for all traffic. This option disables NP6 offloading for all traffic for all NP6 and NP6lite processors.

config system npu

set fastpath disable

end

Optimizing NP6 performance by distributing traffic to XAUI links

On most FortiGate units with NP6 processors, the FortiGate interfaces are switch ports that connect to the NP6 processors with XAUI links. Packets pass from the interfaces to the NP6 processor over the XAUI links. Each NP6 processor has a 40 Gigabit bandwidth capacity. The four XAUI links each have a 10 Gigabit capacity for a total of 40 Gigabits.

On many FortiGate units with NP6 processors, the NP6 processors and the XAUI links are over-subscribed. Since the NP6 processors are connected by an Integrated Switch Fabric, you do not have control over how traffic is distributed to them. In fact, traffic is distributed evenly by the ISF.

However, you can control how traffic is distributed to the XAUI links and you can optimize performance by distributing traffic evenly among the XAUI links. For example, if you have a very high amount of traffic passing between two networks, you can connect each network to interfaces connected to different XAUI links to distribute the traffic for each network to a different XAUI link.

For example, on a FortiGate-3200D (see FortiGate-3200D fast path architecture), there are 48 10-Gigabit interfaces that send and receive traffic for two NP6 processors over a total of eight 10-Gigabit XAUI links. Each XAUI link gets traffic from six 10-Gigabit FortiGate interfaces. The amount of traffic that the FortiGate-3200D can offload is limited by the number of NP6 processors and the number of XAUI links. You can optimize the amount of traffic that the FortiGate-3200D can process by distributing it evenly among the XAUI links and the NP6 processors.

You can see the Ethernet interface, XAUI link, and NP6 configuration by entering the get hardware npu np6 port-list command. For the FortiGate-3200D the output is:

get hardware npu np6 port-list 
Chip   XAUI Ports   Max   Cross-chip 
                    Speed offloading 
------ ---- ------- ----- ---------- 
np6_0  0    port1   10G   Yes        
       0    port5   10G   Yes        
       0    port10  10G   Yes        
       0    port13  10G   Yes        
       0    port17  10G   Yes        
       0    port22  10G   Yes        
       1    port2   10G   Yes        
       1    port6   10G   Yes        
       1    port9   10G   Yes        
       1    port14  10G   Yes        
       1    port18  10G   Yes        
       1    port21  10G   Yes        
       2    port3   10G   Yes        
       2    port7   10G   Yes        
       2    port12  10G   Yes        
       2    port15  10G   Yes        
       2    port19  10G   Yes        
       2    port24  10G   Yes        
       3    port4   10G   Yes        
       3    port8   10G   Yes        
       3    port11  10G   Yes        
       3    port16  10G   Yes        
       3    port20  10G   Yes        
       3    port23  10G   Yes        
------ ---- ------- ----- ---------- 
np6_1  0    port26  10G   Yes        
       0    port29  10G   Yes        
       0    port33  10G   Yes        
       0    port37  10G   Yes        
       0    port41  10G   Yes        
       0    port45  10G   Yes        
       1    port25  10G   Yes        
       1    port30  10G   Yes        
       1    port34  10G   Yes        
       1    port38  10G   Yes        
       1    port42  10G   Yes        
       1    port46  10G   Yes        
       2    port28  10G   Yes        
       2    port31  10G   Yes        
       2    port35  10G   Yes        
       2    port39  10G   Yes        
       2    port43  10G   Yes        
       2    port47  10G   Yes        
       3    port27  10G   Yes        
       3    port32  10G   Yes        
       3    port36  10G   Yes        
       3    port40  10G   Yes        
       3    port44  10G   Yes        
       3    port48  10G   Yes        
------ ---- ------- ----- ----------

In this command output you can see that each NP6 has four XAUI links (0 to 3) and that each XAUI link is connected to six 10-Gigabit Ethernet interfaces. To optimize throughput you should keep the amount of traffic being processed by each XAUI link under 10 Gbps. For example, if you want to offload traffic from four 10-Gigabit networks you can connect these networks to Ethernet interfaces 1, 2, 3, and 4. This distributes the traffic from each 10-Gigabit network to a different XAUI link. Also, if you wanted to offload traffic from four more 10-Gigabit networks you could connect them to Ethernet ports 26, 25, 28, and 27. As a result each 10-Gigabit network would be connected to a different XAUI link.

Enabling bandwidth control between the ISF and NP6 XAUI ports

In some cases, the Internal Switch Fabric (ISF) buffer size may be larger than the buffer size of an NP6 XAUI port that receives traffic from the ISF. If this happens, burst traffic from the ISF may exceed the capacity of an XAUI port and sessions may be dropped.

You can use the following command to configure bandwidth control between the ISF and XAUI ports. Enabling bandwidth control can smooth burst traffic and keep the XAUI ports from getting overwhelmed and dropping sessions.

Use the following command to enable bandwidth control:

config system npu

set sw-np-bandwidth {0G | 2G | 4G | 5G | 6G}

end

The default setting is 0G which means no bandwidth control. The other options limit the bandwidth to 2Gbps, 4Gbps and so on.

Increasing NP6 offloading capacity using link aggregation groups (LAGs)

NP6 processors can offload sessions received by interfaces in link aggregation groups (LAGs) (IEEE 802.3ad). An 802.3ad LAG, managed by the Link Aggregation Control Protocol (LACP), combines more than one physical interface into a group that functions like a single interface with a higher capacity than a single physical interface. For example, you could use a LAG if you want to offload sessions on a 30 Gbps link by adding three 10-Gbps interfaces to the same LAG.

All offloaded traffic types are supported by LAGs, including IPsec VPN traffic. Just like with normal interfaces, traffic accepted by a LAG is offloaded by the NP6 processor connected to the interfaces in the LAG that receive the traffic to be offloaded. If all interfaces in a LAG are connected to the same NP6 processor, traffic received by that LAG is offloaded by that NP6 processor. The amount of traffic that can be offloaded is limited by the capacity of the NP6 processor.

If a FortiGate has two or more NP6 processors connected by an integrated switch fabric (ISF), you can use LAGs to increase offloading by sharing the traffic load across multiple NP6 processors. You do this by adding physical interfaces connected to different NP6 processors to the same LAG.
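
For example, using the FortiGate-3200D port-to-NP6 mapping shown later in this document, the following sketch creates a LAG whose members are connected to different NP6 processors (port1 on np6_0 and port26 on np6_1); the LAG name is hypothetical:

config system interface
    edit np6-lag
        set type aggregate
        set member port1 port26
    end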

Adding a second NP6 processor to a LAG effectively doubles the offloading capacity of the LAG, and adding a third increases it further. However, the actual increase may not be exactly double with a second NP6 or triple with a third; traffic and load conditions and other factors may limit the actual offloading result.

The increase in offloading capacity offered by LAGs and multiple NP6s is supported by the integrated switch fabric (ISF) that allows multiple NP6 processors to share session information. Most FortiGate units with multiple NP6 processors also have an ISF. However, FortiGate models such as the 1000D, 2000E, and 2500E do not have an ISF. If you attempt to add interfaces connected to different NP6 processors to a LAG on these models, the system displays an error message.

There are also a few limitations to LAG NP6 offloading support for IPsec VPN:

  • IPsec VPN anti-replay protection cannot be used if IPsec is configured on a LAG that has interfaces connected to multiple NP6 processors.
  • Because the encrypted traffic for one IPsec VPN tunnel has the same 5-tuple, the traffic from one tunnel can only be balanced to one interface in a LAG. This limits the maximum throughput for one IPsec VPN tunnel in an NP6 LAG group to 10 Gbps.

NP6 processors and redundant interfaces

NP6 processors can offload sessions received by interfaces that are part of a redundant interface. You can combine two or more physical interfaces into a redundant interface to provide link redundancy. Redundant interfaces ensure connectivity if one physical interface, or the equipment on that interface, fails. In a redundant interface, traffic travels only over one interface at a time. This differs from an aggregated interface where traffic travels over all interfaces for distribution of increased bandwidth.

All offloaded traffic types are supported by redundant interfaces, including IPsec VPN traffic. Just like with normal interfaces, traffic accepted by a redundant interface is offloaded by the NP6 processor connected to the interfaces in the redundant interface that receive the traffic to be offloaded. If all interfaces in a redundant interface are connected to the same NP6 processor, traffic received by that redundant interface is offloaded by that NP6 processor. The amount of traffic that can be offloaded is limited by the capacity of the NP6 processor.

If a FortiGate has two or more NP6 processors connected by an integrated switch fabric (ISF), you can create redundant interfaces that include physical interfaces connected to different NP6 processors. However, with a redundant interface, only one of the physical interfaces is processing traffic at any given time. So you cannot use redundant interfaces to increase performance in the same way as you can with aggregate interfaces.
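
A corresponding sketch for a redundant interface, again using the FortiGate-3200D port numbering (port2 on np6_0 and port25 on np6_1); the interface name is hypothetical:

config system interface
    edit np6-redundant
        set type redundant
        set member port2 port25
    end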

The ability to add redundant interfaces connected to multiple NP6s is supported by the integrated switch fabric (ISF) that allows multiple NP6 processors to share session information. Most FortiGate units with multiple NP6 processors also have an ISF. However, FortiGate models such as the 1000D, 2000E, and 2500E do not have an ISF. If you attempt to add interfaces connected to different NP6 processors to a redundant interface on these models, the system displays an error message.

Configuring inter-VDOM link acceleration with NP6 processors

FortiGate units with NP6 processors include inter-VDOM links that can be used to accelerate inter-VDOM link traffic.

For a FortiGate unit with two NP6 processors there are two accelerated inter-VDOM links, each with two interfaces:

  • npu0_vlink:
    npu0_vlink0
    npu0_vlink1
  • npu1_vlink:
    npu1_vlink0
    npu1_vlink1

These interfaces are visible from the GUI and CLI. For a FortiGate unit with NP6 interfaces, enter the following CLI command to display the NP6-accelerated inter-VDOM links:

get system interface

...

== [ npu0_vlink0 ]

name: npu0_vlink0 mode: static ip: 0.0.0.0 0.0.0.0 status: down netbios-forward: disable type: physical sflow-sampler: disable explicit-web-proxy: disable explicit-ftp-proxy: disable mtu-override: disable wccp: disable drop-overlapped-fragment: disable drop-fragment: disable

 

== [ npu0_vlink1 ]

name: npu0_vlink1 mode: static ip: 0.0.0.0 0.0.0.0 status: down netbios-forward: disable type: physical sflow-sampler: disable explicit-web-proxy: disable explicit-ftp-proxy: disable mtu-override: disable wccp: disable drop-overlapped-fragment: disable drop-fragment: disable

 

== [ npu1_vlink0 ]

name: npu1_vlink0 mode: static ip: 0.0.0.0 0.0.0.0 status: down netbios-forward: disable type: physical sflow-sampler: disable explicit-web-proxy: disable explicit-ftp-proxy: disable mtu-override: disable wccp: disable drop-overlapped-fragment: disable drop-fragment: disable

 

== [ npu1_vlink1 ]

name: npu1_vlink1 mode: static ip: 0.0.0.0 0.0.0.0 status: down netbios-forward: disable type: physical sflow-sampler: disable explicit-web-proxy: disable explicit-ftp-proxy: disable mtu-override: disable wccp: disable drop-overlapped-fragment: disable drop-fragment: disable

...

By default the interfaces in each inter-VDOM link are assigned to the root VDOM. To use these interfaces to accelerate inter-VDOM link traffic, assign each interface in the pair to the VDOMs that you want to offload traffic between. For example, if you have added a VDOM named New-VDOM to a FortiGate unit with NP6 processors, you can go to System > Network > Interfaces, edit the npu0_vlink1 interface, and set the Virtual Domain to New-VDOM. This results in an accelerated inter-VDOM link between root and New-VDOM. You can also do this from the CLI:

config system interface

edit npu0_vlink1

set vdom New-VDOM

end

Using VLANs to add more accelerated inter-VDOM links

You can add VLAN interfaces to the accelerated inter-VDOM links to create inter-VDOM links between more VDOMs. For the links to work, the VLAN interfaces must be added to the same inter-VDOM link, must be on the same subnet, and must have the same VLAN ID.

For example, to accelerate inter-VDOM link traffic between VDOMs named Marketing and Engineering using VLANs with VLAN ID 100 go to System > Network > Interfaces and select Create New to create the VLAN interface associated with the Marketing VDOM:

Name Marketing-link
Type VLAN
Interface npu0_vlink0
VLAN ID 100
Virtual Domain Marketing
IP/Network Mask 172.20.120.12/24

Create the inter-VDOM link associated with Engineering VDOM:

Name Engineering-link
Type VLAN
Interface npu0_vlink1
VLAN ID 100
Virtual Domain Engineering
IP/Network Mask 172.20.120.22/24

Or do the same from the CLI:

config system interface

edit Marketing-link

set vdom Marketing

set ip 172.20.120.12/24

set interface npu0_vlink0

set vlanid 100

next

edit Engineering-link

set vdom Engineering

set ip 172.20.120.22/24

set interface npu0_vlink1

set vlanid 100

end

Confirm that the traffic is accelerated

Use the following CLI commands to obtain the interface indexes and then correlate them with the session entries. In the following example, traffic was flowing between the new accelerated inter-VDOM links and physical ports port1 and port2, which are also attached to the NP6 processor.

diagnose ip address list

IP=172.31.17.76->172.31.17.76/255.255.252.0 index=5 devname=port1

IP=10.74.1.76->10.74.1.76/255.255.252.0 index=6 devname=port2

IP=172.20.120.12->172.20.120.12/255.255.255.0 index=55 devname=IVL-VLAN1_ROOT

IP=172.20.120.22->172.20.120.22/255.255.255.0 index=56 devname=IVL-VLAN1_VDOM1

 

diagnose sys session list

session info: proto=1 proto_state=00 duration=282 expire=24 timeout=0 flags=00000000 sockflag=00000000 sockport=0 av_idx=0 use=3

origin-shaper=

reply-shaper=

per_ip_shaper=

ha_id=0 policy_dir=0 tunnel=/

state=may_dirty npu

statistic(bytes/packets/allow_err): org=180/3/1 reply=120/2/1 tuples=2

orgin->sink: org pre->post, reply pre->post dev=55->5/5->55 gwy=172.31.19.254/172.20.120.22

hook=post dir=org act=snat 10.74.2.87:768->10.2.2.2:8(172.31.17.76:62464)

hook=pre dir=reply act=dnat 10.2.2.2:62464->172.31.17.76:0(10.74.2.87:768)

misc=0 policy_id=4 id_policy_id=0 auth_info=0 chk_client_info=0 vd=0

serial=0000004e tos=ff/ff ips_view=0 app_list=0 app=0

dd_type=0 dd_mode=0

per_ip_bandwidth meter: addr=10.74.2.87, bps=880

npu_state=00000000

npu info: flag=0x81/0x81, offload=8/8, ips_offload=0/0, epid=160/218, ipid=218/160, vlan=32769/0

 

 

session info: proto=1 proto_state=00 duration=124 expire=20 timeout=0 flags=00000000 sockflag=00000000 sockport=0 av_idx=0 use=3

origin-shaper=

reply-shaper=

per_ip_shaper=

ha_id=0 policy_dir=0 tunnel=/

state=may_dirty npu

statistic(bytes/packets/allow_err): org=180/3/1 reply=120/2/1 tuples=2

orgin->sink: org pre->post, reply pre->post dev=6->56/56->6 gwy=172.20.120.12/10.74.2.87

hook=pre dir=org act=noop 10.74.2.87:768->10.2.2.2:8(0.0.0.0:0)

hook=post dir=reply act=noop 10.2.2.2:768->10.74.2.87:0(0.0.0.0:0)

misc=0 policy_id=3 id_policy_id=0 auth_info=0 chk_client_info=0 vd=1

serial=0000004d tos=ff/ff ips_view=0 app_list=0 app=0

dd_type=0 dd_mode=0

per_ip_bandwidth meter: addr=10.74.2.87, bps=880

npu_state=00000000

npu info: flag=0x81/0x81, offload=8/8, ips_offload=0/0, epid=219/161, ipid=161/219, vlan=0/32769

total session 2

Disabling offloading IPsec Diffie-Hellman key exchange

You can use the following command to disable using ASIC offloading to accelerate IPsec Diffie-Hellman key exchange for IPsec ESP traffic. By default hardware offloading is used. For debugging purposes or other reasons you may want this function to be processed by software.

Use the following command to disable using ASIC offloading for IPsec Diffie-Hellman key exchange:

config system global

set ipsec-asic-offload disable

end

Access control lists (ACLs)

Access Control Lists (ACLs) use NP6 offloading to drop IPv4 or IPv6 packets at the physical network interface before the packets are analyzed by the CPU. On a busy appliance, this can significantly improve performance. This feature is available on FortiGates with NP6 processors and is not supported by FortiGates with NP6lite processors.

The ACL feature is available only on FortiGates with NP6-accelerated interfaces. ACL checking is one of the first things that happens to the packet and checking is done by the NP6 processor. The result is very efficient protection that does not use CPU or memory resources.

Use the following command to configure IPv4 ACL lists:

config firewall acl

edit 0

set status enable

set interface <interface-name>

set srcaddr <firewall-address>

set dstaddr <firewall-address>

set service <firewall-service>

end

Use the following command to configure IPv6 ACL lists:

config firewall acl6

edit 0

set status enable

set interface <interface-name>

set srcaddr <firewall-address6>

set dstaddr <firewall-address6>

set service <firewall-service>

end

Where:

<interface-name> is the interface on which to apply the ACL. There is a hardware limitation that needs to be taken into account. The ACL is a Layer 2 function and is offloaded to the ISF hardware, so no CPU resources are used in processing the ACL. It is handled by the internal switch chip, which can do hardware acceleration, increasing the performance of the FortiGate. The ACL function is only supported on switch-fabric-driven interfaces.

<firewall-address> and <firewall-address6> can be any of the address types used by the FortiGate, including address ranges. Traffic is blocked based on the combination of the source and destination addresses, not on either address alone; both must match for the traffic to be denied. To block all of the traffic from a specific source address, set the destination address to ALL.

Because the blocking takes place at the interface based on the information in the packet header and before any processing such as NAT can take place, a slightly different approach may be required. For instance, if you are trying to protect a VIP which has an external address of x.x.x.x and is forwarded to an internal address of y.y.y.y, the destination address that should be used is x.x.x.x, because that is the address that will be in the packet's header when it hits the incoming interface.

<firewall-service> is the firewall service to block. Use ALL to block all services.
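
For example, the following sketch drops all traffic arriving on one interface from a single source address. The interface name and the blocked-host address object are hypothetical, and the address object is assumed to already exist; the all address object and ALL service are built-in defaults:

config firewall acl
    edit 1
        set status enable
        set interface port1
        set srcaddr blocked-host
        set dstaddr all
        set service ALL
    end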

Configuring individual NP6 processors

You can use the config system np6 command to configure a wide range of settings for each of the NP6 processors in your FortiGate unit, including enabling session accounting and adjusting session timeouts. You can also set anomaly checking for IPv4 and IPv6 traffic.

You can also enable and adjust Host Protection Engine (HPE) to protect networks from DoS attacks by categorizing incoming packets based on packet rate and processing cost and applying packet shaping to packets that can cause DoS attacks.

The settings that you configure for an NP6 processor with the config system np6 command apply to traffic processed by all interfaces connected to that NP6 processor. This includes the physical interfaces connected to the NP6 processor as well as all subinterfaces, VLAN interfaces, IPsec interfaces, LAGs and so on associated with the physical interfaces connected to the NP6 processor.

Note: Some of the options for this command apply anomaly checking for NP6 sessions in the same way as the equivalent NP4 command applies anomaly checking for NP4 sessions.

config system np6

edit <np6-processor-name>

set low-latency-mode {disable | enable}

set per-session-accounting {all-enable | disable | enable-by-log}

set garbage-session-collector {disable | enable}

set session-collector-interval <range>

set session-timeout-interval <range>

set session-timeout-random-range <range>

set session-timeout-fixed {disable | enable}

config hpe

set tcpsyn-max <packets-per-second>

set tcp-max <packets-per-second>

set udp-max <packets-per-second>

set icmp-max <packets-per-second>

set sctp-max <packets-per-second>

set esp-max <packets-per-second>

set ip-frag-max <packets-per-second>

set ip-others-max <packets-per-second>

set arp-max <packets-per-second>

set l2-others-max <packets-per-second>

set enable-shaper {disable | enable}

config fp-anomaly

set tcp-syn-fin {allow | drop | trap-to-host}

set tcp_fin_noack {allow | drop | trap-to-host}

set tcp_fin_only {allow | drop | trap-to-host}

set tcp_no_flag {allow | drop | trap-to-host}

set tcp_syn_data {allow | drop | trap-to-host}

set tcp-winnuke {allow | drop | trap-to-host}

set tcp-land {allow | drop | trap-to-host}

set udp-land {allow | drop | trap-to-host}

set icmp-land {allow | drop | trap-to-host}

set icmp-frag {allow | drop | trap-to-host}

set ipv4-land {allow | drop | trap-to-host}

set ipv4-proto-err {allow | drop | trap-to-host}

set ipv4-unknopt {allow | drop | trap-to-host}

set ipv4-optrr {allow | drop | trap-to-host}

set ipv4-optssrr {allow | drop | trap-to-host}

set ipv4-optlsrr {allow | drop | trap-to-host}

set ipv4-optstream {allow | drop | trap-to-host}

set ipv4-optsecurity {allow | drop | trap-to-host}

set ipv4-opttimestamp {allow | drop | trap-to-host}

set ipv4-csum-err {drop | trap-to-host}

set tcp-csum-err {drop | trap-to-host}

set udp-csum-err {drop | trap-to-host}

set icmp-csum-err {drop | trap-to-host}

set ipv6-land {allow | drop | trap-to-host}

set ipv6-proto-err {allow | drop | trap-to-host}

set ipv6-unknopt {allow | drop | trap-to-host}

set ipv6-saddr-err {allow | drop | trap-to-host}

set ipv6-daddr-err {allow | drop | trap-to-host}

set ipv6-optralert {allow | drop | trap-to-host}

set ipv6-optjumbo {allow | drop | trap-to-host}

set ipv6-opttunnel {allow | drop | trap-to-host}

set ipv6-opthomeaddr {allow | drop | trap-to-host}

set ipv6-optnsap {allow | drop | trap-to-host}

set ipv6-optendpid {allow | drop | trap-to-host}

set ipv6-optinvld {allow | drop | trap-to-host}

end

Command syntax
Command Description Default
low-latency-mode {disable | enable} Enable low-latency mode. In low-latency mode the integrated switch fabric is bypassed. Low-latency mode requires that packets enter and exit using the same NP6 processor. This option is only available for NP6 processors that can operate in low-latency mode, currently only np6_0 and np6_1 on the FortiGate-3700D and FortiGate-3700DX. disable
per-session-accounting {all-enable | disable | enable-by-log} Disable NP6 per-session accounting or enable it and control how it works. If set to enable-by-log (the default), NP6 per-session accounting is only enabled if firewall policies accepting offloaded traffic have traffic logging enabled. If set to all-enable, NP6 per-session accounting is always enabled for all traffic offloaded by the NP6 processor. Enabling per-session accounting can affect performance. enable-by-log
garbage-session-collector {disable | enable} Enable deleting expired or garbage sessions. disable
session-collector-interval <range> Set the expired or garbage session collector time interval in seconds. The range is 1 to 100 seconds. 64
session-timeout-interval <range> Set the timeout for checking for and removing inactive NP6 sessions. The range is 0 to 1000 seconds. 40
session-timeout-random-range <range> Set the random timeout for checking and removing inactive NP6 sessions. The range is 0 to 1000 seconds. 8
session-timeout-fixed {disable | enable} Enable to force checking for and removing inactive NP6 sessions at the session-timeout-interval time interval. Set to disable (the default) to check for and remove inactive NP6 sessions at random time intervals. disable
config hpe
hpe Use the following options to use HPE to apply DDoS protection at the NP6 processor by limiting the number of packets per second received for various packet types by each NP6 processor. This rate limiting is applied very efficiently because it is done in hardware by the NP6 processor.
enable-shaper {disable | enable} Enable or disable HPE DDoS protection. disable
tcpsyn-max Limit the maximum number of TCP SYN packets received per second. The range is 10,000 to 4,000,000,000 pps. The default limits the number of packets per second to 5,000,000 pps. 5000000
tcp-max Limit the maximum number of TCP packets received per second. The range is 10,000 to 4,000,000,000 pps. The default limits the number of packets per second to 5,000,000 pps. 5000000
udp-max Limit the maximum number of UDP packets received per second. The range is 10,000 to 4,000,000,000 pps. The default limits the number of packets per second to 5,000,000 pps. 5000000
icmp-max Limit the maximum number of ICMP packets received. The range is 10,000 to 4,000,000,000 pps. The default is 100,000 pps. 100000
sctp-max Limit the maximum number of SCTP packets received. The range is 10,000 to 4,000,000,000 pps. The default is 100,000 pps. 100000
esp-max Limit the maximum number of ESP packets received. The range is 10,000 to 4,000,000,000 pps. The default is 100,000 pps. 100000
ip-frag-max Limit the maximum number of fragmented IP packets received. The range is 10,000 to 4,000,000,000 pps. The default is 100,000 pps. 100000
ip-others-max Limit the maximum number of other types of IP packets received. The range is 10,000 to 4,000,000,000 pps. The default is 100,000 pps. 100000
arp-max Limit the maximum number of ARP packets received. The range is 10,000 to 4,000,000,000 pps. The default is 100,000 pps. 100000
l2-others-max Limit the maximum number of other layer-2 packets received. The range is 10,000 to 4,000,000,000 pps. The default is 100,000 pps. 100000
config fp-anomaly
fp-anomaly Configure how the NP6 processor does traffic anomaly protection. In most cases you can configure the NP6 processor to allow or drop the packets associated with an attack or forward the packets that are associated with the attack to FortiOS (called trap-to-host). Selecting trap-to-host turns off NP6 anomaly protection for that anomaly. If you require anomaly protection but don't want to use the NP6 processor, you can select trap-to-host and enable anomaly protection with a DoS policy.
tcp-syn-fin {allow | drop | trap-to-host} Detects TCP SYN flood SYN/FIN flag set anomalies. allow
tcp_fin_noack {allow | drop | trap-to-host} Detects TCP SYN flood with FIN flag set without ACK setting anomalies. trap-to-host
tcp_fin_only {allow | drop | trap-to-host} Detects TCP SYN flood with only FIN flag set anomalies. trap-to-host
tcp_no_flag {allow | drop | trap-to-host} Detects TCP SYN flood with no flag set anomalies. allow
tcp_syn_data {allow | drop | trap-to-host} Detects TCP SYN flood packets with data anomalies. allow
tcp-winnuke {allow | drop | trap-to-host} Detects TCP WinNuke anomalies. trap-to-host
tcp-land {allow | drop | trap-to-host} Detects TCP land anomalies. trap-to-host
udp-land {allow | drop | trap-to-host} Detects UDP land anomalies. trap-to-host
icmp-land {allow | drop | trap-to-host} Detects ICMP land anomalies. trap-to-host
icmp-frag {allow | drop | trap-to-host} Detects Layer 3 fragmented packets that could be part of layer 4 ICMP anomalies. allow
ipv4-land {allow | drop | trap-to-host} Detects IPv4 land anomalies. trap-to-host
ipv4-proto-err {allow | drop | trap-to-host} Detects invalid layer 4 protocol anomalies. For information about the error codes that are produced by setting this option to drop, see NP6 anomaly error codes. trap-to-host
ipv4-unknopt {allow | drop | trap-to-host} Detects unknown option anomalies. trap-to-host
ipv4-optrr {allow | drop | trap-to-host} Detects IPv4 with record route option anomalies. trap-to-host
ipv4-optssrr {allow | drop | trap-to-host} Detects IPv4 with strict source record route option anomalies. trap-to-host
ipv4-optlsrr {allow | drop | trap-to-host} Detects IPv4 with loose source record route option anomalies. trap-to-host
ipv4-optstream {allow | drop | trap-to-host} Detects stream option anomalies. trap-to-host
ipv4-optsecurity {allow | drop | trap-to-host} Detects security option anomalies. trap-to-host
ipv4-opttimestamp {allow | drop | trap-to-host} Detects timestamp option anomalies. trap-to-host
ipv4-csum-err {drop | trap-to-host} Detects IPv4 checksum errors. drop
tcp-csum-err {drop | trap-to-host} Detects TCP checksum errors. drop
udp-csum-err {drop | trap-to-host} Detects UDP checksum errors. drop
icmp-csum-err {drop | trap-to-host} Detects ICMP checksum errors. drop
ipv6-land {allow | drop | trap-to-host} Detects IPv6 land anomalies trap-to-host
ipv6-unknopt {allow | drop | trap-to-host} Detects unknown option anomalies. trap-to-host
ipv6-saddr-err {allow | drop | trap-to-host} Detects source address as multicast anomalies. trap-to-host
ipv6-daddr-err {allow | drop | trap-to-host} Detects destination address as unspecified or loopback address anomalies. trap-to-host
ipv6-optralert {allow | drop | trap-to-host} Detects router alert option anomalies. trap-to-host
ipv6-optjumbo {allow | drop | trap-to-host} Detects jumbo options anomalies. trap-to-host
ipv6-opttunnel {allow | drop | trap-to-host} Detects tunnel encapsulation limit option anomalies. trap-to-host
ipv6-opthomeaddr {allow | drop | trap-to-host} Detects home address option anomalies. trap-to-host
ipv6-optnsap {allow | drop | trap-to-host} Detects network service access point address option anomalies. trap-to-host
ipv6-optendpid {allow | drop | trap-to-host} Detects end point identification anomalies. trap-to-host
ipv6-optinvld {allow | drop | trap-to-host} Detects invalid option anomalies. trap-to-host

Enabling per-session accounting for offloaded NP6 and NP6lite sessions

Per-session accounting is a logging feature that allows the FortiGate to report the correct bytes/pkt numbers per session for sessions offloaded to an NP6 or NP6lite processor. This information appears in traffic log messages as well as in FortiView. The following example shows the Sessions dashboard widget tracking SPU and nTurbo sessions. Current sessions shows the total number of sessions, SPU shows the percentage of these sessions that are SPU sessions and Nturbo shows the percentage that are nTurbo sessions.

You configure per-session accounting for each NP6 processor. For example, use the following command to enable per-session accounting for NP6_0 and NP6_1:

config system np6

edit np6_0

set per-session-accounting enable-by-log

next

edit np6_1

set per-session-accounting enable-by-log

end

If your FortiGate has NP6lite processors, you can use the following command to enable per-session accounting for all of the NP6lite processors in the FortiGate unit:

config system npu

set per-session-accounting enable-by-log

end

The enable-by-log option enables per-session accounting only for offloaded sessions with traffic logging enabled, and all-enable enables per-session accounting for all offloaded sessions.

By default, per-session-accounting is set to enable-by-log, which results in per-session accounting being turned on when you enable traffic logging in a policy, as shown in the sketch below.
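
Traffic logging is enabled per firewall policy. As a minimal sketch, assuming a hypothetical policy with ID 1, the following enables logging of all sessions accepted by that policy, which in turn enables per-session accounting for its offloaded sessions when per-session-accounting is set to enable-by-log:

config firewall policy
    edit 1
        set logtraffic all
    end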

Per-session accounting can affect offloading performance, so you should only enable it if you need the accounting information.

Enabling per-session accounting does not provide traffic flow data for sFlow or NetFlow.

Configuring NP6 session timeouts

For NP6 traffic, FortiOS refreshes an NP6 session's lifetime when it receives a session update message from the NP6 processor. To avoid session update message congestion, these NP6 session checks are performed all at once after a random time interval and all of the update messages are sent from the NP6 processor to FortiOS at once. This can result in fewer messages being sent because they are only sent at random time intervals instead of every time a session times out.

If your NP6 processor is processing a lot of short-lived sessions, it is recommended that you use the default setting of random checking every 8 seconds to avoid very bursty session updates. If the time between session updates is very long and many sessions have expired between updates, a large number of updates will need to be done all at once.

You can use the following command to set the random time range.

config system np6

edit <np6-processor-name>

set session-timeout-fixed disable

set session-timeout-random-range 8

end

This is the default configuration. The random timeout range is 1 to 1000 seconds and the default is 8, so by default NP6 sessions are checked at random time intervals of between 1 and 8 seconds. Sessions can therefore be inactive for up to 8 seconds before they are removed from the FortiOS session table.

If you want to reduce the amount of checking you can increase the session-timeout-random-range. This could result in inactive sessions being kept in the session table longer. But if most of your NP6 sessions are relatively long this shouldn't be a problem.
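
For example, the following sketch increases the random range for one NP6 processor; the value of 30 is illustrative only:

config system np6
    edit np6_0
        set session-timeout-fixed disable
        set session-timeout-random-range 30
    end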

You can also change this session checking to a fixed time interval and set a fixed timeout:

config system np6

edit <np6-processor-name>

set session-timeout-fixed enable

set session-timeout-interval 40

end

The fixed timeout default is every 40 seconds and the range is 1 to 1000 seconds. Using a fixed interval further reduces the amount of checking that occurs.

You can select random or fixed updates and adjust the time intervals to minimize the refreshing that occurs while still making sure inactive sessions are deleted regularly. For example, if an NP6 processor is processing sessions with long lifetimes you can reduce checking by setting a relatively long fixed timeout.

Configure the number of IPsec engines NP6 processors use

NP6 processors use multiple IPsec engines to accelerate IPsec encryption and decryption. In some cases out of order ESP packets can cause problems if multiple IPsec engines are running. To resolve this problem you can configure all of the NP6 processors to use fewer IPsec engines.

Use the following command to change the number of IPsec engines used for decryption (ipsec-dec-subengine-mask) and encryption (ipsec-enc-subengine-mask). These settings are applied to all of the NP6 processors in the FortiGate unit.

config system npu

set ipsec-dec-subengine-mask <engine-mask>

set ipsec-enc-subengine-mask <engine-mask>

end

<engine-mask> is a hexadecimal number in the range 0x01 to 0xff where each bit represents one IPsec engine. The default <engine-mask> for both options is 0xff which means all IPsec engines are used. Add a lower <engine-mask> to use fewer engines. You can configure different engine masks for encryption and decryption.
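
For example, because each bit in the mask represents one IPsec engine, a mask of 0x0f (binary 00001111) enables only four of the eight engines. The following sketch limits decryption to four engines while leaving encryption on all eight:

config system npu
    set ipsec-dec-subengine-mask 0x0f
    set ipsec-enc-subengine-mask 0xff
end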

Stripping clear text padding and IPsec session ESP padding

In some situations, clear text or ESP packets in IPsec sessions may have large amounts of layer 2 padding; the NP6 IPsec engine may not be able to process these packets and the session may be blocked.

If you notice dropped IPsec sessions, you could try using the following CLI options to cause the NP6 processor to strip clear text padding and ESP padding before sending the packets to the IPsec engine. With the padding stripped, the session can be processed normally by the IPsec engine.

Use the following command to strip ESP padding:

config system npu

set strip-esp-padding enable

set strip-clear-text-padding enable

end

Stripping clear text and ESP padding are both disabled by default.

Disable NP6 CAPWAP offloading

By default and where possible, managed FortiAP and FortiLink CAPWAP sessions are offloaded to NP6 processors. You can use the following command to disable CAPWAP session offloading:

config system npu

set capwap-offload disable

end

Optionally disable NP6 offloading of traffic passing between 10Gbps and 1Gbps interfaces

Due to NP6 internal packet buffer limitations, some offloaded packets received at a 10Gbps interface and destined for a 1Gbps interface can be dropped, reducing performance for TCP and IP tunnel traffic. If you experience this performance reduction, you can use the following command to disable offloading sessions passing from 10Gbps interfaces to 1Gbps interfaces:

config system npu

set host-shortcut-mode host-shortcut

end

Select host-shortcut to stop offloading TCP and IP tunnel packets passing from 10Gbps interfaces to 1Gbps interfaces. TCP and IP tunnel packets passing from 1Gbps interfaces to 10Gbps interfaces are still offloaded as normal.

If host-shortcut is set to the default bi-directional setting, packets in both directions are offloaded.

This option is only available if your FortiGate has 10G and 1G interfaces accelerated by NP6 processors.

Offloading RDP traffic

FortiOS supports NP6 offloading of Reliable Data Protocol (RDP) traffic. RDP is a network transport protocol that optimizes remote loading, debugging, and bulk transfer of images and data. RDP traffic uses Assigned Internet Protocol number 27 and is defined in RFC 908 and updated in RFC 1151. If your network is processing a lot of RDP traffic, offloading it can improve overall network performance.

You can use the following command to enable or disable NP6 RDP offloading. RDP offloading is enabled by default.

config system npu

set rdp-offload {disable | enable}

end

NP6 session drift

In some cases, sessions processed by NP6 processors may fail to be deleted leading to a large number of idle sessions. This is called session drift. You can use SNMP to be alerted when the number of idle sessions becomes high. SNMP also allows you to see which NP6 processor has the abnormal number of idle sessions and you can use a diagnose command to delete them.

The following MIB fields allow you to use SNMP to monitor session table information for NP6 processors including drift for each NP6 processor:

FORTINET-FORTIGATE-MIB::fgNPUNumber.0 = INTEGER: 2

FORTINET-FORTIGATE-MIB::fgNPUName.0 = STRING: NP6

FORTINET-FORTIGATE-MIB::fgNPUDrvDriftSum.0 = INTEGER: 0

FORTINET-FORTIGATE-MIB::fgNPUIndex.0 = INTEGER: 0

FORTINET-FORTIGATE-MIB::fgNPUIndex.1 = INTEGER: 1

FORTINET-FORTIGATE-MIB::fgNPUSessionTblSize.0 = Gauge32: 33554432

FORTINET-FORTIGATE-MIB::fgNPUSessionTblSize.1 = Gauge32: 33554432

FORTINET-FORTIGATE-MIB::fgNPUSessionCount.0 = Gauge32: 0

FORTINET-FORTIGATE-MIB::fgNPUSessionCount.1 = Gauge32: 0

FORTINET-FORTIGATE-MIB::fgNPUDrvDrift.0 = INTEGER: 0

FORTINET-FORTIGATE-MIB::fgNPUDrvDrift.1 = INTEGER: 0


You can also use the following diagnose command to determine if drift is occurring:

diagnose npu np6 sse-drift-summary 
NPU   drv-drift 
----- --------- 
np6_0 0         
np6_1 0         
----- --------- 
Sum   0       
----- --------- 

The command output shows a drift summary for all the NP6 processors in the system, and shows the total drift. Normally the sum is 0. The previous command output, from a FortiGate-1500D, shows that the 1500D's two NP6 processors are not experiencing any drift.

If the sum is not zero, then extra idle sessions may be accumulating. You can use the following command to delete those sessions:

diagnose npu np6 sse-purge-drift <np6_id> [<time>]

Where <np6_id> is the number of the NP6 processor for which to delete idle sessions (starting with np6_0, which has an np6_id of 0). <time> is the age in seconds of the idle sessions to be deleted. All idle sessions this age and older are deleted. The default time is 300 seconds.
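
For example, the following command would delete idle sessions that are 600 seconds old or older from the second NP6 processor (np6_1); the time value is illustrative:

diagnose npu np6 sse-purge-drift 1 600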

The diagnose npu np6 sse-stats <np6_id> command output also includes a drv-drift field that shows the total drift for one NP6 processor.

Optimizing FortiGate-3960E and 3980E IPsec VPN performance

You can use the following command to configure outbound hashing to improve IPsec VPN performance for the FortiGate-3960E and 3980E. If you change these settings, you should reboot your device to make sure they take effect.

config system np6

edit np6_0

set ipsec-outbound-hash {disable | enable}

set ipsec-ob-hash-function {switch-group-hash | global-hash | global-hash-weighted | round-robin-switch-group | round-robin-global}

end

Where:

ipsec-outbound-hash is disabled by default. If you enable it you can set ipsec-ob-hash-function as follows:

switch-group-hash (the default) distributes outbound IPsec Security Association (SA) traffic to NP6 processors connected to the same switch as the interfaces that received the incoming traffic. This option keeps all traffic on one switch and the NP6 processors connected to that switch, improving performance.

global-hash distributes outbound IPsec SA traffic among all NP6 processors.

global-hash-weighted distributes outbound IPsec SA traffic from switch 1 among all NP6 processors, with more sessions going to the NP6s connected to switch 0. This option is only recommended for the FortiGate-3980E because it is designed to weight switch 0 higher and send more sessions to switch 0, which on the FortiGate-3980E has more NP6 processors connected to it. On the FortiGate-3960E both switches have the same number of NP6s, so for best performance one switch shouldn't have a higher weight.

round-robin-switch-group distributes outbound IPsec SA traffic in round-robin order among the NP6 processors connected to the same switch.

round-robin-global distributes outbound IPsec SA traffic in round-robin order among all NP6 processors.

FortiGate-3960E and 3980E support for high throughput traffic streams

FortiGate devices with multiple NP6 processors support high throughput by distributing sessions to multiple NP6 processors. However, default ISF hash-based load balancing has some limitations for single traffic streams or flows that use more than 10 Gbps of bandwidth. Normally, the ISF sends all of the packets in a single traffic stream over the same 10 Gbps interface to an NP6 processor. If a single traffic stream is larger than 10 Gbps, packets are also sent to 10 Gbps interfaces that may be connected to the same NP6 or to other NP6s. Because the ISF uses hash-based load balancing, this can lead to packets being processed out of order and other potential drawbacks.

You can configure the FortiGate-3960E and 3980E to support single traffic flows that are larger than 10Gbps. To enable this feature, you can assign interfaces to round robin groups using the following configuration. If you assign an interface to a Round Robin group, the ISF uses round-robin load balancing to distribute incoming traffic from one stream to multiple NP6 processors. Round-robin load balancing prevents the potential problems associated with hash-based load balancing of packets from a single stream.

config system npu

config port-npu-map

edit <interface>

set npu-group-index <npu-group>

end

end

<interface> is the name of an interface that receives or sends large traffic streams.

<npu-group> is the number of an NPU group. To enable round-robin load balancing, select a round-robin NPU group. Use ? to see the list of NPU groups. The output shows which groups support round-robin load balancing. For example, the following output shows that NPU group 30 supports round-robin load balancing to NP6 processors 0 to 7.

set npu-group-index ?
index: npu group
0 : NP#0-7
2 : NP#0
3 : NP#1
4 : NP#2
5 : NP#3
6 : NP#4
7 : NP#5
8 : NP#6
9 : NP#7
10 : NP#0-1
11 : NP#2-3
12 : NP#4-5
13 : NP#6-7
14 : NP#0-3
15 : NP#4-7
30 : NP#0-7 - Round Robin

For example, use the following command to assign port1, port2, port17 and port18 to NPU group 30.

config system npu

config port-npu-map

edit port1

set npu-group-index 30

next

edit port2

set npu-group-index 30

next

edit port17

set npu-group-index 30

next

edit port18

set npu-group-index 30

next

end

end

Recalculating packet checksums if the iph.reserved bit is set to 0

NP6 and NP6lite processors clear the iph.flags.reserved bit. This results in the packet checksum becoming incorrect, because by default the packet is changed but the checksum is not recalculated. Since the checksum is incorrect, these packets may be dropped by the network stack. You can enable this option to cause the system to recalculate the checksum. Enabling this option may cause a minor performance reduction. This option is disabled by default.

To enable checksum recalculation for these packets:

config system npu

set iph-rsvd-re-cksum enable

end

Improving LAG performance on some FortiGate models

Some FortiGate models support one of the following commands that might improve link aggregation (LAG) performance by reducing the number of dropped packets that can occur with some LAG configurations.

Depending on the hardware architecture, on some models the command is available under config system npu:

config system npu

set lag-sw-out-trunk {disable | enable}

end

And on others the following option is available under config system np6:

config system np6

edit np6_0

set lag-npu {disable | enable}

end

If you notice that NP6-accelerated LAG interface performance is lower than expected, or if you notice excessive dropped packets for sessions over LAG interfaces, check whether your FortiGate has one of these options in the CLI and, if it is available, try enabling it to see if performance improves.

If the option is available for your FortiGate under config system np6, you should enable it for every NP6 processor that is connected to a LAG interface.
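
For example, if both np6_0 and np6_1 are connected to LAG interfaces, the following sketch enables the option for both processors:

config system np6
    edit np6_0
        set lag-npu enable
    next
    edit np6_1
        set lag-npu enable
    end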

NP6 IPsec engine status monitoring

Use the following command to configure NP6 IPsec engine status monitoring.

config monitoring np6-ipsec-engine

set status enable

set interval 5

set threshold 10 10 8 8 6 6 4 4

end

NP6 IPsec engine status monitoring writes a system event log message if the IPsec engines in an NP6 processor become locked after receiving malformed packets.

If an IPsec engine becomes locked, that particular engine can no longer process IPsec traffic, reducing the capacity of the NP6 processor. The only way to recover from a locked IPsec engine is to restart the FortiGate device. If you notice an IPsec performance reduction over time on your NP6 accelerated FortiGate device, you could enable NP6 IPsec engine monitoring and check log messages to determine if your NP6 IPsec engines are becoming locked.

To configure IPsec engine status monitoring you set status to enable and then configure the following options:

interval

Set the IPsec engine status check time interval in seconds (range 1 to 60 seconds, default = 1).

threshold <np6_0-threshold> <np6_1-threshold>...<np6_7-threshold>

Set engine status check thresholds. An NP6 processor has eight IPsec engines and you can set a threshold for each engine. NP6 IPsec engine status monitoring regularly checks the status of all eight engines in all NP6 processors in the FortiGate device.

Each threshold can be an integer between 1 and 255 and represents the number of times the NP6 IPsec engine status check detects that the NP6 processor is busy before generating a log message.

The default thresholds are 15 15 12 12 8 8 5 5. Any IPsec engine exceeding its threshold triggers the event log message. The default interval and thresholds have been set to work for most network topologies, based on a balance between timely lock-up reporting and accuracy, and on how NP6 processors distribute sessions to their IPsec engines. The default settings mean:

  • If engine 1 or 2 is busy for 15 checks (15 seconds at the default interval), an event log message is triggered.
  • If engine 3 or 4 is busy for 12 checks (12 seconds at the default interval), an event log message is triggered.
  • And so on.

NP6 IPsec engine monitoring writes three levels of log messages:

  • Information if an IPsec engine is found to be busy.
  • Warning if an IPsec engine exceeds a threshold.
  • Critical if a lockup is detected, meaning an IPsec engine continues to exceed its threshold.

The log messages include the NP6 processor and engine affected.