DPDK Bridges¶
Bridges must be specially configured to utilize DPDK-backed physical and virtual ports.
Quick Example¶
This example demonstrates how to add a bridge that will take advantage of DPDK:
$ ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
This assumes Open vSwitch has been built with DPDK support. Refer to Open vSwitch with DPDK for more information.
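Once the bridge exists, DPDK-backed ports can be attached to it. As a minimal sketch, assuming a NIC already bound to DPDK at the illustrative PCI address 0000:01:00.0 and a hypothetical port name dpdk-p0:
$ ovs-vsctl add-port br0 dpdk-p0 -- \
      set Interface dpdk-p0 type=dpdk options:dpdk-devargs=0000:01:00.0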
Extended & Custom Statistics¶
The DPDK Extended Statistics API allows PMDs to expose a unique set of statistics. The Extended Statistics are implemented and supported only for DPDK physical and vHost ports. Custom statistics are a dynamic set of counters which can vary depending on the driver. Those statistics are implemented for DPDK physical ports and contain all “dropped”, “error” and “management” counters from XSTATS. A list of all XSTATS counters can be found here.
Note
vHost ports only support RX packet size-based counters. TX packet size counters are not available.
To enable statistics, you have to enable OpenFlow 1.4 support for OVS. To configure a bridge, br0, to support OpenFlow version 1.4, run:
$ ovs-vsctl set bridge br0 datapath_type=netdev \
protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14
Once configured, check the OVSDB protocols column in the bridge table to ensure OpenFlow 1.4 support is enabled:
$ ovsdb-client dump Bridge protocols
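The output should list OpenFlow14 in the protocols column; as a rough illustration (the exact formatting depends on the ovsdb-client version):
Bridge table
protocols
---------
["OpenFlow10", "OpenFlow11", "OpenFlow12", "OpenFlow13", "OpenFlow14"]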
You can also query the port statistics by explicitly specifying the -O OpenFlow14 option:
$ ovs-ofctl -O OpenFlow14 dump-ports br0
There are also custom statistics that OVS accumulates itself, and these stats have an ovs_ prefix. These custom stats are shown along with other stats using the following command:
$ ovs-vsctl get Interface <iface> statistics
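As a hypothetical excerpt of what this might return on a DPDK physical port (the exact counter set varies by driver, and the port name dpdk-p0 is illustrative):
$ ovs-vsctl get Interface dpdk-p0 statistics
{..., ovs_rx_qos_drops=0, ovs_tx_failure_drops=0, ovs_tx_mtu_exceeded_drops=0, ovs_tx_qos_drops=0, ...}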
EMC Insertion Probability¶
By default, 1 in every 100 flows is inserted into the Exact Match Cache (EMC). It is possible to change this insertion probability by setting the emc-insert-inv-prob option:
$ ovs-vsctl --no-wait set Open_vSwitch . other_config:emc-insert-inv-prob=N
where:
N: A positive integer representing the inverse probability of insertion, i.e. on average 1 in every N packets with a unique flow will generate an EMC insertion.
If N is set to 1, an insertion will be performed for every flow. If set to 0, no insertions will be performed and the EMC will effectively be disabled.
With the default N of 100, a higher number of megaflow hits will occur initially, as can be observed with the PMD stats:
$ ovs-appctl dpif-netdev/pmd-stats-show
For certain traffic profiles with many parallel flows, it’s recommended to set N to 0 to achieve higher forwarding performance.
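For example, to disable EMC insertions entirely for such a workload:
$ ovs-vsctl --no-wait set Open_vSwitch . other_config:emc-insert-inv-prob=0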
It is also possible to enable/disable the EMC on a per-port basis using:
$ ovs-vsctl set interface <iface> other_config:emc-enable={true,false}
Note
This could be useful for cases where a different number of flows is expected on different ports. For example, if one of the VMs encapsulates traffic using additional headers, it will receive a large number of flows, but only a few flows will come out of this VM. In this scenario, it’s much faster to use the EMC instead of the classifier for traffic from the VM, but it’s better to disable the EMC for the traffic which flows to the VM.
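As a sketch of the scenario in the note above, assuming a hypothetical vhost-user port vhost-user-1 carrying traffic from the VM and a physical port dpdk-p0 carrying traffic towards it:
$ ovs-vsctl set interface vhost-user-1 other_config:emc-enable=true
$ ovs-vsctl set interface dpdk-p0 other_config:emc-enable=false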
For more information on the EMC, refer to Open vSwitch with DPDK.
SMC Cache¶
The signature match cache (SMC) is a new cache level after the EMC. The difference between the SMC and the EMC is that the SMC only stores the signature of a flow, making it much more memory efficient. With the same memory space, the EMC can store 8k flows while the SMC can store 1M flows. When the traffic flow count is much larger than the EMC size, it is generally beneficial to turn off the EMC and turn on the SMC. It is currently turned off by default.
To turn on SMC:
$ ovs-vsctl --no-wait set Open_vSwitch . other_config:smc-enable=true
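Since the SMC is most useful when the flow count overwhelms the EMC, a typical companion step is to disable EMC insertions at the same time, using the emc-insert-inv-prob option described above:
$ ovs-vsctl --no-wait set Open_vSwitch . other_config:emc-insert-inv-prob=0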
Datapath Classifier Performance¶
The datapath classifier (dpcls) performs wildcard rule matching, a compute intensive process of matching a packet miniflow to a rule miniflow. The code that does this compute work impacts datapath performance, and optimizing it can provide higher switching performance.
Modern CPUs provide extensive SIMD instructions which can be used to get higher performance. The CPU OVS is being deployed on must be capable of running these SIMD instructions in order to take advantage of the performance benefits. In OVS v2.14, runtime CPU detection was introduced to identify whether these CPU ISA additions are available, and to allow the user to enable them.
OVS provides multiple implementations of dpcls. The following command enables the user to check which implementations are available in a running instance:
$ ovs-appctl dpif-netdev/subtable-lookup-prio-get
Available lookup functions (priority : name)
0 : autovalidator
1 : generic
0 : avx512_gather
To set the priority of a lookup function, run the prio-set command:
$ ovs-appctl dpif-netdev/subtable-lookup-prio-set avx512_gather 5
Lookup priority change affected 1 dpcls ports and 1 subtables.
The highest priority lookup function is used for classification, and the output above indicates that one subtable of one DPCLS port has changed its lookup function due to the command being run. To verify the prioritization, re-run the get command and note the updated priority of the avx512_gather function:
$ ovs-appctl dpif-netdev/subtable-lookup-prio-get
Available lookup functions (priority : name)
0 : autovalidator
1 : generic
5 : avx512_gather
If two lookup functions have the same priority, the first one in the list is chosen, and the second occurrence of that priority is not used. Put in logical terms, a subtable is chosen if its priority is greater than the previous best candidate.
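To illustrate the tie-breaking rule with the output above, assigning generic the same priority as avx512_gather would leave the function listed first at that priority in use:
$ ovs-appctl dpif-netdev/subtable-lookup-prio-set generic 5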
Optimizing Specific Subtable Search¶
During packet classification, the datapath can use specialized lookup tables to optimize the search. However, not all situations are optimized. If you see a message like the following one in the OVS logs, it means that there is no specialized implementation available for the current network traffic. In this case, OVS will continue to process the traffic normally using a more generic lookup table.
“Using non-specialized AVX512 lookup for subtable (4,1) and possibly others.”
(Note that the numbers 4 and 1 will likely be different in your logs.)
Additional specialized lookups can be added to OVS if the user provides that log message, along with the command output shown below, to the OVS mailing list. Note that the numbers in the log message (“subtable (X,Y)”) need to match the numbers in the provided command output (“dp-extra-info:miniflow_bits(X,Y)”).
$ ovs-appctl dpctl/dump-flows -m
This results in output like the following:
ufid:82770b5d-ca38-44ff-8283-74ba36bd1ca5, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(pcap0),packet_type(ns=0,id=0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=ff:ff:ff:ff:ff:ff/00:00:00:00:00:00),eth_type(0x8100),vlan(vid=1,pcp=0),encap(eth_type(0x0800),ipv4(src=127.0.0.1/0.0.0.0,dst=127.0.0.1/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=53/0,dst=53/0)), packets:77072681, bytes:3545343326, used:0.000s, dp:ovs, actions:vhostuserclient0, dp-extra-info:miniflow_bits(4,1)
Please send an email to the OVS mailing list, ovs-dev@openvswitch.org, with the log message and the output containing the “dp-extra-info:miniflow_bits(4,1)” values.
CPU ISA Testing and Validation¶
As multiple versions of DPCLS can co-exist, each with different CPU ISA
optimizations, it is important to validate that they all give the exact same
results. To easily test all DPCLS implementations, an autovalidator
implementation of the DPCLS exists. This implementation runs all other
available DPCLS implementations, and verifies that the results are identical.
Running the OVS unit tests with the autovalidator enabled ensures all implementations provide the same results. Note that the performance of the autovalidator is lower than all other implementations, as it tests the scalar implementation against itself, and against all other enabled DPCLS implementations.
To adjust the DPCLS autovalidator priority, use this command:
$ ovs-appctl dpif-netdev/subtable-lookup-prio-set autovalidator 7
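Re-running the get command should then show the autovalidator with the highest priority:
$ ovs-appctl dpif-netdev/subtable-lookup-prio-get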
Running Unit Tests with Autovalidator¶
To run the OVS unit test suite with the DPCLS autovalidator as the default implementation, it is required to recompile OVS. During the recompilation, the default priority of the autovalidator implementation is set to the maximum priority, ensuring every test is run with every lookup implementation:
$ ./configure --enable-autovalidator
Compile OVS in debug mode to have ovs_assert statements error out if there is a mismatch in the DPCLS lookup implementations.
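As a minimal sketch of such a build, assuming standard autotools debug flags (the exact CFLAGS are illustrative):
$ ./configure --enable-autovalidator CFLAGS="-g -O0"
$ make && make check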
Datapath Interface Performance¶
The datapath interface (DPIF), or dp_netdev_input(), is responsible for taking packets through the major components of the userspace datapath, such as miniflow_extract, the EMC, SMC and DPCLS lookups, as well as a lot of the performance stats associated with the datapath.
Just like with the SIMD DPCLS feature above, SIMD can be applied to the DPIF to improve performance.
OVS provides multiple implementations of the DPIF. The available implementations can be listed with the following command:
$ ovs-appctl dpif-netdev/dpif-impl-get
Available DPIF implementations:
dpif_scalar (pmds: none)
dpif_avx512 (pmds: 1,2,6,7)
By default, dpif_scalar is used. The DPIF implementation can be selected by name:
$ ovs-appctl dpif-netdev/dpif-impl-set dpif_avx512
DPIF implementation set to dpif_avx512.
$ ovs-appctl dpif-netdev/dpif-impl-set dpif_scalar
DPIF implementation set to dpif_scalar.
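To confirm which implementation is active after switching, re-run the get command:
$ ovs-appctl dpif-netdev/dpif-impl-get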
Running Unit Tests with AVX512 DPIF¶
Since the AVX512 DPIF is disabled by default, a compile time option is available in order to test it with the OVS unit test suite. When building on a CPU that supports AVX512, use the following configure option:
$ ./configure --enable-dpif-default-avx512
The following line should be seen in the configure output when the above option is used:
checking whether DPIF AVX512 is default implementation... yes
Miniflow Extract¶
Miniflow extract (MFEX) parses the raw packet and extracts the important header information into a compressed miniflow. The miniflow is composed of bits and blocks: the bits signify which blocks are set or have values, while the blocks hold the metadata, IP, UDP, VLAN, etc. These values are used by the datapath for switching decisions later. The optimized miniflow extract implementations are traffic-specific to speed up the lookup, whereas the scalar implementation works for all traffic patterns.
Most modern CPUs have SIMD capabilities. These SIMD instructions are able to process a vector rather than act on one variable. OVS provides multiple implementations of miniflow extract. This allows the user to take advantage of SIMD instructions like AVX512 to gain additional performance.
A list of implementations can be obtained with the following command, which also shows whether the CPU supports each implementation:
$ ovs-appctl dpif-netdev/miniflow-parser-get
Available Optimized Miniflow Extracts:
autovalidator (available: True, pmds: none)
scalar (available: True, pmds: 1,15)
study (available: True, pmds: none)
An implementation can be selected manually with the following command:
$ ovs-appctl dpif-netdev/miniflow-parser-set [-pmd core_id] [name] [study_cnt]
The above command has two optional parameters: core_id and study_cnt. The core_id parameter pins the chosen miniflow extract function to the PMD thread running on that core. The study_cnt parameter, which is specific to study and ignored by other implementations, specifies how many packets must be parsed before the best implementation is chosen.
The user can also select the study implementation, which studies the traffic for a specific number of packets by applying all available implementations of miniflow extract, and then chooses the one with the most optimal result for that traffic pattern. The user can optionally provide a packet count ([study_cnt]), which is the minimum number of packets that OVS must study before choosing an optimal implementation. If no packet count is provided, the default value of 128 is used. Also, as there is no synchronization point between threads, one PMD thread might still be running a previous round and can decide based on earlier data.
The packet count is a global value, and parallel study executions with differing packet counts will use the most recent count value provided by the user.
Study can be selected with a packet count using the following command:
$ ovs-appctl dpif-netdev/miniflow-parser-set study 1024
Study can be selected with a packet count and explicit PMD selection using the following command:
$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 study 1024
In the above command, the first parameter is the core id of the PMD thread; this can also be used to explicitly set the miniflow extraction function on different PMD threads.
Scalar can be selected on core 3 with the following command; note that a study count should not be provided for any implementation other than study:
$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 scalar
Miniflow Extract Validation¶
As multiple versions of miniflow extract can co-exist, each with different CPU ISA optimizations, it is important to validate that they all give the exact same results. To easily test all miniflow implementations, an autovalidator implementation of the miniflow extract exists. This implementation runs all other available miniflow extract implementations, and verifies that the results are identical.
Running the OVS unit tests with the autovalidator enabled ensures all implementations provide the same results.
To set the miniflow autovalidator, use this command:
$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
A compile time option is available in order to test it with the OVS unit test suite. Use the following configure option:
$ ./configure --enable-mfex-default-autovalidator
Unit Test Miniflow Extract¶
The unit tests can also be used to test the workflow mentioned above by running the “OVS-DPDK - MFEX Autovalidator” test case in tests/system-dpdk.at:
$ make check-dpdk TESTSUITEFLAGS='-k MFEX'
The unit test uses multiple traffic types to test the correctness of the implementations.
The MFEX commands can also be tested for negative and positive cases to verify that the MFEX set command does not allow incorrect parameters. A user can directly run the “OVS-DPDK - MFEX Configuration” test case in tests/system-dpdk.at:
$ make check-dpdk TESTSUITEFLAGS='-k MFEX'
Running Fuzzy Test with Autovalidator¶
Fuzzy tests can also be done on miniflow extract with the help of the autovalidator and Scapy. The steps below describe how to reproduce the setup, with the IP fields being fuzzed to generate packets.
Scapy is used to create fuzzy IP packets and save them into a PCAP:
from scapy.all import Ether, IP, TCP, fuzz, wrpcap
# Build a packet with fuzzed fields and write it out for OVS to replay.
pkt = fuzz(Ether()/IP()/TCP())
wrpcap("fuzzy.pcap", pkt)
Set the miniflow extract to autovalidator using
$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
OVS is configured to receive the generated packets:
$ ovs-vsctl add-port br0 pcap0 -- \
      set Interface pcap0 type=dpdk \
      options:dpdk-devargs="net_pcap0,rx_pcap=fuzzy.pcap"
With this workflow, the autovalidator will ensure that all MFEX implementations classify each packet in exactly the same way. If an optimized MFEX implementation causes a different miniflow to be generated, the autovalidator has ovs_assert and logging statements that will report the issue.
Unit Fuzzy Test with Autovalidator¶
The unit tests can also be used to test the workflow mentioned above by running the “OVS-DPDK - MFEX Autovalidator Fuzzy” test case in tests/system-dpdk.at:
$ make check-dpdk TESTSUITEFLAGS='-k MFEX'