TRex

1. Introduction

1.1. A word on traffic generators

Traditionally, routers have been tested using commercial traffic generators, while performance typically has been measured using packets per second (PPS) metrics. As router functionality and services have become more complex, stateful traffic generators have become necessary to provide more realistic traffic scenarios.

Advantages of realistic traffic generators:

Accurate performance metrics.

Discovering bottlenecks in realistic traffic scenarios.

1.1.1. Current Challenges:

Cost: Commercial stateful traffic generators are very expensive.

Scale: Bandwidth does not scale up well with feature complexity.

Standardization: Lack of standardization of traffic patterns and methodologies.

Flexibility: Commercial tools are not sufficiently agile when flexibility and customization are needed.

1.1.2. Implications

High capital expenditure (capex) spent by different teams.

Testing in low scale and extrapolation became a common practice. This is non-ideal and fails to indicate bottlenecks that appear in real-world scenarios.

Teams use different benchmark methodologies, so results are not standardized.

Delays in development and testing due to dependence on testing tool features.

Resource and effort investment in developing different ad hoc tools and test methodologies.

1.2. Overview of TRex

TRex addresses the problems associated with commercial stateful traffic generators, through an innovative and extendable software implementation, and by leveraging standard and open software and x86/UCS hardware.

Stateful traffic generator based on pre-processing and smart replay of real traffic templates.

Generates and amplifies both client- and server-side traffic.

Customized functionality can be added.

Scales to 200Gb/sec for one UCS (using Intel 40Gb/sec NICs).

Low cost.

Self-contained package that can be easily installed and deployed.

Virtual interface support enables TRex to be used in a fully virtual environment without physical NICs. Example use cases:

Amazon AWS

Cisco LaaS

TRex on your laptop

Table 1. TRex Hardware

Cisco UCS Platform

Intel NIC

1.3. Purpose of this guide

This guide explains the use of TRex internals and the use of TRex together with Cisco ASR1000 Series routers. The examples illustrate novel traffic generation techniques made possible by TRex.

2. Download and installation

2.1. Hardware recommendations

TRex operates in a Linux application environment, interacting with Linux kernel modules.
TRex currently works on x86 architecture and can operate well on Cisco UCS hardware. The following platforms have been tested and are recommended for operating TRex.

A high-end UCS platform is not required for operating TRex in its current version, but may be required for future versions.

3. First time Running

3.1. Configuring for loopback

Before connecting TRex to your DUT, it is strongly advised to verify that TRex and the NICs work correctly in loopback.

For best performance, loopback the interfaces on the same NUMA (controlled by the same physical processor). If you are unable to check this, proceed without this step.

If you are using a 10Gb/sec NIC based on the Intel 520-D2, and you loopback ports on the same NIC using SFP+, the device might not sync, causing a failure to link up.
Many types of SFP+ (Intel/Cisco/SR/LR) have been verified to work.
If you encounter link issues, try looping back interfaces on different NICs, or use a Cisco twinax copper cable.

- port_limit : 2
  version    : 2
  # List of interfaces. Change according to your setup. Use ./dpdk_setup_ports.py -s to see available options.
  interfaces : ["03:00.0", "03:00.1"]
  port_info  :  # Port IPs. Change according to your needs. In case of loopback, you can leave as is.
      - ip         : 1.1.1.1
        default_gw : 2.2.2.2
      - ip         : 2.2.2.2
        default_gw : 1.1.1.1

Edit this line to match the interfaces you are using.
All NICs must have the same type - do not mix different NIC types in one config file. For more info, see trex-201.

3.2. Script for creating config file

A script is available to automate the process of tailoring the basic configuration file to your needs. The script gets you started, and you can then edit the resulting configuration file directly for advanced options. For details, see YAML Configuration File.

There are two ways to run the script:

Interactive mode: Script prompts you for parameters.

Command line mode: Provide all parameters using command line options.

3.2.1. Interactive mode

The script provides a list of available interfaces with interface-related information. Follow the instructions to create a basic config file.

[bash]>sudo ./dpdk_setup_ports.py -i

3.2.2. Command line mode

Run the following command to display a list of all interfaces and interface-related information:

[bash]>sudo ./dpdk_setup_ports.py -t

In case of Loopback and/or only L1-L2 Switches on the way, IPs and destination MACs are not required. The script assumes the following interface connections: 0↔1, 2↔3 etc.

Destination MACs to be used for each interface. Use this option for MAC-based configuration instead of IP-based. Do not use this option together with --ip and --def-gw.

--dest-macs 11:11:11:11:11:11 22:22:22:22:22:22

--ip

List of IPs to use for each interface. If --ip and --dest-macs are not specified, the script assumes loopback connections (0↔1, 2↔3 etc.).

--ip 1.2.3.4 5.6.7.8

--def-gw

List of default gateways to use for each interface. When using the --ip option, also use the --def-gw option.

--def-gw 3.4.5.6 7.8.9.10

--ci

Cores include: Whitelist of cores to use. Include enough cores for each NUMA.

--ci 0 2 4 5 6

--ce

Cores exclude: Blacklist of cores to exclude. When excluding cores, ensure that enough remain for each NUMA.

--ce 10 11 12

--no-ht

No HyperThreading: Use only one thread of each core specified in the configuration file.

--prefix

(Advanced option) Prefix to be used in TRex configuration in case of parallel instances.

--prefix first_instance

--zmq-pub-port

(Advanced option) ZMQ Publisher port to be used in TRex configuration in case of parallel instances.

--zmq-pub-port 4000

--zmq-rpc-port

(Advanced option) ZMQ RPC port to be used in the TRex configuration in case of parallel instances.

--zmq-rpc-port

--ignore-numa

(Advanced option) Ignore NUMAs for configuration creation. This option may reduce performance. Use only if necessary - for example, in case of a pair of interfaces at different NUMAs.

3.3. TRex on ESXi

General recommendation: For best performance, run TRex on "bare metal" hardware, without any type of VM. Bandwidth on a VM may be limited, and IPv6 may not be fully supported.

In special cases, it may be reasonable or advantageous to run TRex on VM:

If you already have a VM installed, and do not require high performance.

Virtual NICs can be used to bridge between TRex and NICs not supported by TRex.

3.3.1. Configuring ESXi for running TRex

Click the host machine, then select Configuration → Networking.

One of the NICs must be connected to the main vSwitch network for an "outside" connection for the TRex client and ssh:

Other NICs that are used for TRex traffic must be in a separate vSwitch:

Right-click the guest machine → Edit settings → Ensure the NICs are set to their networks:

Before version 2.10, the following command did not function correctly:

sudo ./t-rex-64 -f cap2/dns.yaml --lm 1 --lo -l 1000 -d 100

The vSwitch did not route packets correctly. This issue was resolved in version 2.10 when TRex started to support ARP.

3.3.2. Configuring Pass-through

Pass-through enables direct use of host machine NICs from within the VM. Pass-through access is generally limited only by the NIC/hardware itself, but there may be occasional spikes in latency (~10ms). Passthrough settings cannot be saved to OVA.

Average CPU utilization of transmitter threads. For best results, it should be lower than 80%.

Gb/sec generated per core of DP. Higher is better.

Total Tx must be the same as Rx at the end of the run.

Total Rx must be the same as Tx at the end of the run.

Expected number of packets per second (calculated without latency packets).

Expected number of connections per second (calculated without latency packets).

Expected number of bits per second (calculated without latency packets).

Number of TRex active "flows". May differ from the number of router flows, due to aging. Usually the number of active TRex flows is much lower than the router's, because the router ages flows more slowly.

Total number of TRex flows opened since startup (including active ones, and ones already closed).

Drop rate.

Rx and latency thread CPU utilization.

Tx_ok on port 0 should equal Rx_ok on port 1, and vice versa.

3.5.2. Additional information about statistics in output

socket

Same as the active flows.

Socket/Clients

Average of active flows per client, calculated as active_flows/#clients.

Socket-util

Estimate of the number of L4 ports (sockets) used per client IP, calculated as (active_flows / #clients) * 100 / 64K. For example, 100,000 active flows across 255 clients is about 392 flows per client, or roughly 0.6% utilization. Utilization of more than 50% means that TRex is generating too many flows per single client; add more clients in the generator configuration.

Max window

Maximum latency within a time window of 500 msec. Several values are shown per port.
The earliest value is on the left, and the latest value (last 500 msec) on the right. This helps identify spikes of high latency that clear after some time. Maximum latency is the total maximum over the entire test duration. To best understand this, run TRex with the latency option (-l) and watch the results with this section in mind.

Platform_factor

In some cases, users duplicate traffic using a splitter/switch. In this scenario, it is useful for all numbers displayed by TRex to be multiplied by this factor, so that TRex counters will match the DUT counters.

If you do not see Rx packets, review the MAC address configuration.

4. Basic usage

4.1. DNS basic example

The following is a simple example helpful for understanding how TRex works. The example uses the TRex simulator.
The simulator can be run on any Linux machine, including the TRex machine itself.
TRex simulates clients and servers and generates traffic based on the provided pcap files.

Clients/Servers

The following is an example YAML-format traffic configuration file (cap2/dns_test.yaml), with explanatory notes.

Global statistics for the given templates. cps = connections per second; tps = templates per second. They can differ when plugins are used and one template includes more than one flow, for example the RTP flow in the SFR profile (avl/delay_10_rtp_160k_full.pcap).

Generator output.

[bash]>wireshark my.erf

gives

TRex generated output file

As the output file shows…

TRex generates a new flow every 1 sec.

Client IP values are taken from the client IP pool.

Server IP values are taken from the server IP pool.

IPG (inter packet gap) values are taken from the configuration file (10 msec).

In basic usage, TRex does not wait for an initiator packet to be received. The response packet is triggered based only on timeout (IPG in this example).
In advanced scenarios (for example, NAT), the first packet of the flow can be processed by TRex software, and the response packet is initiated only when a packet is received.
Consequently, it is necessary to process the template pcap file offline and ensure that there is enough round-trip delay (RTT) between client and server packets.
One approach is to record the flow with a Pagent that creates RTT (10 msec RTT in the example), recording the traffic at some distance from both the client and server (not close to either side).
This ensures enough delay that packets from each side arrive at the DUT in order. TRex-dev will work on an offline tool to make this even simpler.
Another approach is to set the ipg field in the YAML file to a sufficiently high value (greater than 10 msec).
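The ipg and rtt values live in the traffic profile YAML. A minimal sketch of a dns-style profile follows; the field values are illustrative, and times are in usec:

```yaml
- duration : 10.0
  generator :
          distribution  : "seq"
          clients_start : "16.0.0.1"
          clients_end   : "16.0.0.255"
          servers_start : "48.0.0.1"
          servers_end   : "48.0.0.255"
  cap_info :
     - name : cap2/dns.pcap
       cps  : 1.0          # one new flow per second
       ipg  : 10000        # inter-packet gap, usec (10 msec)
       rtt  : 10000        # round-trip time hint, usec
       w    : 1
```

Raising ipg here is the second approach described above: it stretches the gap between the client packet and the server response without re-recording the pcap.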

Converting the simulator text results in a table similar to the following:

The output above illustrates two HTTP flows and ten DNS flows in 1 second, as expected.

4.7. SFR traffic YAML

SFR traffic includes a combination of traffic templates. This traffic mix in the example below was defined by SFR France.
This SFR traffic profile is used as our traffic profile for our ASR1k/ISR-G2 benchmark. It is also possible to use EMIX instead of IMIX traffic.

The traffic was recorded from a Spirent C100 with a Pagent that introduces a 10 msec delay on the client and server sides.

4.9. Traffic profiles provided with the TRex package

sfr traffic profile captured from Avalanche - Spirent, without bundling support, with RTT=10msec (a delay machine). Can be used with --ipv6 and --learn-mode.

avl/sfr_delay_10_1g.yaml

head-end sfr traffic profile captured from Avalanche - Spirent, with bundling support, with RTT=10msec (a delay machine). Normalized to 1Gb/sec for m=1.

avl/sfr_branch_profile_delay_10.yaml

branch sfr profile captured from Avalanche - Spirent, with bundling support, with RTT=10msec. Normalized to 1Gb/sec for m=1.

cap2/imix_fast_1g.yaml

imix profile with 1600 flows normalized to 1Gb/sec.

cap2/imix_fast_1g_100k_flows.yaml

imix profile with 100k flows normalized to 1Gb/sec.

cap2/imix_64.yaml

64byte UDP packets profile

4.10. Mimicking stateless traffic under stateful mode

TRex also supports true stateless traffic generation.
If you are looking for stateless traffic, please visit the following link: TRex Stateless Support

With this feature you can "repeat" flows and create stateless, IXIA-like streams.
After injecting the number of flows defined by limit, TRex repeats the same flows. If all templates have a limit, the CPS will drop to zero after some time, as there are no new flows after the first iteration.

IMIX support:

Example:

[bash]>sudo ./t-rex-64 -f cap2/imix_64.yaml -d 1000 -m 40000 -c 4 -p

The -p option is used here to send the client-side packets from both interfaces.
(Normally they are sent from client ports only.)
With this option, the port is selected by the client IP.
All the packets of a flow are sent from the same interface. This may create an issue with routing, as the client’s IP will be sent from the server interface. A PBR router configuration solves this issue, but cannot be used in all cases. So use the -p option carefully.

Repeats the flows in a loop, generating 1000 flows of this type. In this example, udp_64B includes only one packet.

The cap file "cap2/udp_64B.pcap" includes only one packet of 64B. This configuration file creates 1000 flows that are repeated as follows:
f1, f2, f3 … f1000, f1, f2 …
Because each flow contains a single packet, PPS == CPS; in this case, PPS = 1000 per second for -m=1.
It is possible to mix stateless templates and stateful templates.
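In the traffic profile YAML, the per-template limit field is what enables this repeat behavior; a sketch in the style of cap2/imix_64.yaml (values illustrative):

```yaml
  cap_info :
     - name  : cap2/udp_64B.pcap
       cps   : 1000.0      # connections per second
       ipg   : 10000
       rtt   : 10000
       w     : 1
       limit : 1000        # after 1000 flows, repeat the same flows in a loop
```

Templates with limit and templates without it can coexist in the same profile, mixing stateless-style and stateful traffic.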

4.11. Clients/Servers IP allocation scheme

Currently, there is one global IP pool for clients and servers, serving all templates. All templates allocate IPs from this global pool.
Each TRex client/server "dual-port" (pair of ports, such as port 0 for client, port 1 for server) has its own generator offset, taken from the config file. The offset is called dual_port_mask.
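The generator pool and its dual-port offset are set in the traffic profile YAML; a sketch (ranges and mask are illustrative):

```yaml
  generator :
          distribution   : "seq"
          clients_start  : "16.0.0.1"
          clients_end    : "16.0.0.255"
          servers_start  : "48.0.0.1"
          servers_end    : "48.0.0.255"
          dual_port_mask : "1.0.0.0"   # offset added to the IPs of each dual-port pair
```

With this mask, the second dual-port generates clients starting at 17.0.0.1, keeping the pools of different port pairs disjoint.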

When using the -p option, TRex does not comply with the static route rules. Server-side traffic may be sent from the client side (port 0) and vice-versa.
If you use the -p option, you must configure policy based routing to pass all traffic from router port 1 to router port 2, and vice versa.

The VLAN feature does not comply with static route rules. If you use it, you also need policy based routing
rules to pass packets from VLAN0 to VLAN1 and vice versa.

Limitation: When using templates with plugins (bundles), the number of servers must be higher than the number of clients.

4.11.1. More Details about IP allocations

Each time a new flow is created, TRex allocates new Client IP/port and Server IP. This 3-tuple should be distinct among active flows.

Currently, only sequential distribution is supported in IP allocation. This means the IP address is increased by one for each flow.

For example, if we have a pool of two IP addresses: 16.0.0.1 and 16.0.0.2, the allocation of client src/port pairs will be

4.12. Measuring Jitter/Latency

To measure jitter/latency using independent flows (SCTP or ICMP), use -l [Hz] where Hz defines the number of packets to send from each port per second.
This option measures latency and jitter. We can define the type of traffic used for the latency measurement using the --l-pkt-mode option.

Option ID / Type

0: Default; SCTP packets.

1: ICMP echo request packets from both sides.

2: ICMP requests from one side, matching ICMP responses from the other side.
This is particularly useful if your DUT drops traffic from outside, and you need to open a pinhole to let the outside traffic in (for example, when testing a firewall).

3: ICMP request packets with a constant 0 sequence number from both sides.

5. Advanced features

5.1. VLAN (dot1q) support

To add a VLAN tag to all traffic generated by TRex, add a “vlan” keyword in each port section in the platform config file, as described in the YAML Configuration File section.

You can specify a different VLAN tag for each port, or use VLAN only on some ports.

One useful application of this can be in a lab setup where you have one TRex and many DUTs, and you want to test a different DUT on each run, without changing cable connections. You can put each DUT on a VLAN of its own, and use different TRex platform configuration files with different VLANs on each run.
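A sketch of the per-port "vlan" keyword in the platform config file; the tags and IPs here are illustrative:

```yaml
  port_info :
      - ip         : 1.1.1.1
        default_gw : 2.2.2.2
        vlan       : 100        # all traffic from port 0 tagged with VLAN 100
      - ip         : 2.2.2.2
        default_gw : 1.1.1.1
        vlan       : 200        # all traffic from port 1 tagged with VLAN 200
```

Keeping one platform config file per DUT, each with its own tags, lets you switch DUTs between runs without recabling.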

If you want simple VLAN support, this is probably not the feature to use. This feature is used for load balancing. To configure VLAN support, see the “vlan” field in the YAML Configuration File section.

In this case, traffic on vlan0 is sent as before, while for traffic on vlan1, the order is reversed (client traffic sent on port1 and server traffic on port0).
TRex divides the flows evenly between the vlans. This results in an equal amount of traffic on each port.

The IPv6 address is formed by placing what would typically be the IPv4
address into the least significant 32 bits and copying the value provided
in the src_ipv6/dst_ipv6 keywords into the most significant 96 bits.
If src_ipv6 and dst_ipv6 are not specified, the default
is to form IPv4-compatible addresses (most significant 96 bits are zero).

Send 80% of the traffic to the upper cluster and 20% to the lower cluster. Specify the DUT to which the packet will be sent by MAC address or IP. (The following example uses the MAC address. The instructions after the example indicate how to change to IP-based.)

The above configuration divides the generator range of 255 clients to two clusters. The range
of IPs in all groups in the client configuration file must cover the entire range of client IPs
from the traffic profile file.

MAC addresses will be allocated incrementally, with a wrap around after “count” addresses.

In this case, TRex attempts to resolve the following addresses using ARP:

1.1.1.1, 1.1.1.2, 1.1.1.3, 1.1.1.4 (and the range 2.2.2.1-2.2.2.4)

If not all IPs are resolved, TRex exits with an error message.

src_ip is used to send gratuitous ARP, and for filling relevant fields in ARP request. If no src_ip is given, TRex looks for the source IP in the relevant port section in the platform configuration file (/etc/trex_cfg.yaml). If none is found, TRex exits with an error message.

If a client config file is given, TRex ignores the dest_mac and default_gw parameters from the platform configuration file.
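Putting the pieces above together, a client config file of the kind being described might look like the following sketch; the MACs, VLAN tags, ranges, and count are illustrative:

```yaml
vlan: true          # use the per-group VLAN fields below
groups:
-  ip_start  : 16.0.0.1
   ip_end    : 16.0.0.4
   initiator :
       vlan    : 100
       dst_mac : "00:00:00:01:00:00"
   responder :
       vlan    : 200
       dst_mac : "00:00:00:02:00:00"
   count     : 4    # MACs allocated incrementally, wrapping after "count" addresses
```

With such a file loaded, the dest_mac and default_gw values from the platform configuration file are ignored, as noted above.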

Now, streams will look like:

Initiator side (packets with source in 16.x.x.x net):

16.0.0.1 → 48.x.x.x - dst_mac: MAC of 1.1.1.1 vlan: 100

16.0.0.2 → 48.x.x.x - dst_mac: MAC of 1.1.1.2 vlan: 100

16.0.0.3 → 48.x.x.x - dst_mac: MAC of 1.1.1.3 vlan: 100

16.0.0.4 → 48.x.x.x - dst_mac: MAC of 1.1.1.4 vlan: 100

16.0.0.5 → 48.x.x.x - dst_mac: MAC of 1.1.1.1 vlan: 100

16.0.0.6 → 48.x.x.x - dst_mac: MAC of 1.1.1.2 vlan: 100

Responder side (packets with source in 48.x.x.x net):

48.x.x.x → 16.0.0.1 - dst_mac: MAC of 2.2.2.1 , vlan:200

48.x.x.x → 16.0.0.2 - dst_mac: MAC of 2.2.2.2 , vlan:200

It is important to understand that the IP to MAC coupling (with either MAC-based or IP-based configuration) is done at the beginning and never changes. For example, in a MAC-based configuration:

Consequently, you can predict exactly which packet (and how many packets) will go to each DUT.

Usage:

[bash]>sudo ./t-rex-64 -f cap2/dns.yaml --client_cfg my_cfg.yaml

5.5.1. Latency with Cluster mode

The latency stream's client IP is taken from the first IP in the default client pool; each dual port has one client IP. In a cluster configuration this is a limitation, because the topology can have many paths.

For example, in this case 16.0.0.1→48.0.0.1 ICMP will be the flow for latency. The Cluster configuration of this flow will be taken from cluster file ( VLAN=100, dst_mac : "00:00:00:01:00:00" )

5.5.2. Clustering example

In this example, we have one DUT with four 10Gb/sec interfaces and one TRex with two 40Gb/sec interfaces, and we want to spread the traffic from the 2 TRex interfaces across the 4 DUT interfaces.
The following figure shows the topology.

Cluster example

For this topology, the following traffic and cluster configuration files were created.

All the IPs from the client pools and the default pool should be mapped in this file. The cluster file may cover a wider range of IPs.

The following diagram shows how a new flow's src_ip/dest_ip/next-hop/vlan is chosen using the cluster file.

Cluster example

The latency stream checks only the 12.1.1.1/4050 path (one DUT interface). There is no way to check latency on all the ports in the current version.

The DUT should have a static route to move packets from client to server and vice versa, as the traffic is not in the same subnet as the ports.

An example of one flow generation:

Next-hop resolution: TRex resolves all the next-hop options, e.g. 11.10.0.1/4050, 11.11.0.1/4051.

Choose a template by CPS, with 50% probability for each. Take template #1.

SRC_IP=12.1.1.2, DEST_IP=13.1.1.2

Allocate src_port for 12.1.1.2 ⇒ src_port=1025 for the first flow of client 12.1.1.2.

Associate the next hop from the cluster pool. In this case, 12.1.1.2 has the following information:
5.1 Client side: VLAN=4050 and MAC of 11.10.0.1 (Initiator)
5.2 Server side: VLAN=4054 and MAC of 10.10.0.1 (Responder)

5.6. NAT support

TRex can learn dynamic NAT/PAT translation. To enable this feature, use the --learn-mode <mode>
switch on the command line. To learn the NAT translation, TRex must embed information describing which flow a packet belongs to in the first packet of each flow. TRex can do this using one of several methods, depending on the chosen <mode>.

Mode 1:

--learn-mode 1
TCP flow: Flow information is embedded in the ACK of the first TCP SYN.
UDP flow: Flow information is embedded in the IP identification field of the first packet in the flow.
This mode was developed for testing NAT with firewalls (which usually do not work with mode 2). In this mode, TRex also learns and compensates for TCP sequence number randomization that might be done by the DUT. TRex can learn and compensate for seq num randomization in both directions of the connection.

Mode 2:

--learn-mode 2
Flow information is added in a special IPv4 option header (8 bytes long, option id 0x10). This option header is added only to the first packet in the flow. This mode does not work with DUTs that drop packets with IP options (for example, the Cisco ASA firewall).

Mode 3:

--learn-mode 3
Similar to mode 1, but TRex does not learn the seq num randomization in the server→client direction.
This mode can provide better connections-per-second performance than mode 1. But for all existing firewalls, the mode 1 cps rate is adequate.

Number of connections for which TRex had to send the next packet in the flow but had not yet learned the NAT translation. Should be 0. A value other than 0 is usually seen if the DUT drops the flow (probably because it cannot handle the number of connections).

Number of flows for which the flow had already aged out by the time TRex received the translation info. A value other than 0 is rare. Can occur only when there is very high latency in the DUT input/output queue.

Number of flows for which TRex sent the first packet before learning the NAT translation. The value depends on the connection per second rate and round trip time.

Total number of translations over the lifetime of the TRex instance. May differ from the total number of flows if a template is uni-directional (and consequently does not need translation).

Out of the timed-out flows, the number that were timed out while waiting to learn the TCP seq num randomization of the server→client from the SYN+ACK packet. Seen only in --learn-mode 1.

Out of the active NAT sessions, the number that are waiting to learn the client→server translation from the SYN packet. (Others are waiting for SYN+ACK from server.) Seen only in --learn-mode 1.

Configuration for Cisco ASR1000 Series:

This feature was tested with the following configuration and the
sfr_delay_10_1g_no_bundling.yaml
traffic profile. The client address range is 16.0.0.1 to 16.0.0.255

The IPv6-IPv6 NAT feature does not exist on routers, so this feature can work only with IPv4.

Does not support NAT64.

Bundling/plugin is not fully supported. Consequently, sfr_delay_10.yaml does not work. Use sfr_delay_10_no_bundling.yaml instead.

--learn-verify is a TRex debug mechanism for testing the TRex learn mechanism.

Run it when the DUT is configured without NAT. It verifies that inside_ip == outside_ip and inside_port == outside_port.

5.7. Flow order/latency verification

In normal mode (without the feature enabled), received traffic is not checked by software. Hardware (Intel NIC) testing for dropped packets occurs at the end of the test. The only exception is the Latency/Jitter packets. This is one reason that with TRex, you cannot check features that terminate traffic (for example TCP Proxy).

To enable this feature, add
--rx-check <sample>
to the command line options, where <sample> is the sample rate. The number of flows sent to software for verification is 1/sample_rate. For 40Gb/sec traffic you can use a sample rate of 1/128. Watch the Rx CPU% utilization.

This feature changes the TTL of the sampled flows to 255 and expects to receive packets with TTL 254 or 255 (one routing hop). If you have more than one hop in your setup, use --hops to change it to a higher value. More than one hop is possible when there are several routers between the TRex client side and the TRex server side.

This feature ensures that:

Packets get out of DUT in order (from each flow perspective).

There are no packet drops (no need to wait for the end of the test). Without this flag, you must wait for the end of the test in order to identify packet drops, because there is always a difference between TX and Rx, due to RTT.

6.2. YAML Configuration File (parameter of --cfg option)

Masked interfaces, to ensure that TRex does not try to use the management ports as traffic ports.

Changing the zmq/telnet TCP port.

You specify which config file to use by adding --cfg <file name> to the command line arguments.
If --cfg is not given, the default /etc/trex_cfg.yaml is used.
Configuration file examples can be found in the $TREX_ROOT/scripts/cfg folder.

Number of ports. Should be equal to the number of interfaces listed in 3. - mandatory

Must be set to 2. - mandatory

List of interfaces to use. Run sudo ./dpdk_setup_ports.py --show to see the list you can choose from. - mandatory. In some cases one PCI address can expose more than one port (for example, with the MLX4 driver); for this you can use the format dd:dd.d/d, for example 03:00.0/1, meaning the second port of that device. The order of the list is important: the first entry will be virtual port 0.

Enable the ZMQ publisher for stats data, default is true.

ZMQ port number. The default value is good. If running two TRex instances on the same machine, each should be given a distinct number. Otherwise, this line can be removed.

If running two TRex instances on the same machine, each should be given a distinct name. Otherwise, this line can be removed. (Passed to DPDK as the --file-prefix argument.)

Limit the amount of packet memory used. (Passed to dpdk as -m arg)

Number of threads (cores) TRex will use per interface pair. (Can be overridden by the -c command line option.)

The bandwidth of each interface in Gb/sec. In this example we have 10Gb/sec interfaces. For a VM, put 1. Used to tune the amount of memory allocated by TRex.

TRex needs to know the destination MAC address to use on each port. You can specify this in one of two ways:
Specify dest_mac directly.
Specify default_gw (since version 2.10). In this case (only if no dest_mac is given), TRex issues an ARP request to this IP and uses
the result as the destination MAC. If no dest_mac is given and no ARP response is received, TRex exits.

Source MAC to use when sending packets from this interface. If not given (since version 2.10), the MAC address of the port is used.

If given (since version 2.10), TRex issues gratuitous ARP for the ip + src MAC pair on the appropriate port. In stateful mode,
gratuitous ARP for each ip is sent every 120 seconds (can be changed using the --arp-refresh-period argument).

If given (since version 2.18), all traffic on the port will be sent with this VLAN tag.

Old MAC address format. New format is supported since version v2.09.

Stateless ZMQ RPC port number. The default value is good. If running two TRex instances on the same machine, each should be given a distinct number. Otherwise, this line can be removed.

If you use a version earlier than 2.10, or choose to omit the “ip”
and use a MAC-based configuration, be aware that TRex will not send any
gratuitous ARP and will not answer ARP requests. In this case, you must configure static
ARP entries pointing to the TRex port on your DUT. For an example config, you can look
here.

To find out which interfaces (NIC ports) can be used, perform the following:

6.2.2. Memory section configuration

The memory section is optional. It is used when there is a need to tune the amount of memory used by the TRex packet manager.
Default values (from the TRex source code) are usually good for most users. Unless you have some unusual needs, you can
eliminate this section.

Numbers of memory buffers allocated for packets in transit, per port pair. Numbers are specified per packet size.

Numbers of memory buffers allocated for holding the part of the packet which remains unchanged per template.
Increase these numbers only if you have a very large number of templates.

Number of TRex flow objects allocated (for best performance, they are allocated upfront rather than dynamically).
If you expect more concurrent flows than the default (1048576), enlarge this.

Number of objects TRex allocates for holding NAT “in transit” connections. In stateful mode, TRex learns NAT
translation by looking at the address changes done by the DUT to the first packet of each flow. So, this is the
number of flows for which TRex sent the first flow packet but did not yet learn the translation. Again, the default
here (10240) should be good. Increase only if you use NAT and see issues.
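For reference, a memory section overriding a couple of the defaults described above might look like the following sketch (values illustrative):

```yaml
  memory :
      mbuf_64  : 16380       # buffers for 64B packets in transit, per port pair
      dp_flows : 4048576     # pre-allocated flow objects; enlarge for more concurrent flows
```

Only the keys you want to override need to appear; anything omitted keeps its compiled-in default.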

6.2.3. Platform section configuration

The platform section is optional. It is used to tune performance and allocate the cores to the right NUMA.
A configuration file now has the following structure to support multiple instances:

This gave best results: with ~98 Gb/s TX BW and c=7, CPU utilization became ~21%! (40% with c=4)
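A platform section pinning the TRex threads to cores on the interfaces' NUMA might look like the following sketch (core IDs are illustrative and must match your machine's topology):

```yaml
  platform :
      master_thread_id  : 0      # management thread
      latency_thread_id : 5      # Rx/latency thread
      dual_if :
          - socket  : 0          # NUMA node of this interface pair
            threads : [1, 2, 3, 4]
```

Keeping threads and the socket value on the same NUMA as the NICs is what delivers results like the c=7 figure quoted above.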

6.2.4. Timer Wheel section configuration

The flow scheduler uses a timer wheel to schedule flows. To tune it for a large number of flows, it is possible to change the default values.
This is an advanced configuration; do not use it unless you know what you are doing. It can be configured in the trex_cfg file and in the TRex traffic profile.

tw:
    buckets: 1024
    levels: 3
    bucket_time_usec: 20.0

buckets: the number of buckets in each level. A higher number improves performance, but reduces the maximum number of levels.

levels: the number of levels.

bucket_time_usec: bucket time in usec. A higher number creates more bursts.

6.3. Command line options

--active-flows

An experimental switch to scale the number of active flows up or down.
It is not accurate, due to the quantization of the flow scheduler, and in some cases does not work.
Example: --active-flows 500000 will set the ballpark of active flows to ~0.5M.

--allow-coredump

Allow creation of core dump.

--arp-refresh-period <num>

Period in seconds between sending of gratuitous ARP for our addresses. A value of 0 means never send.

-c <num>

Number of hardware threads to use per interface pair. Use at least 4 for TRex at 40Gb/sec.
TRex uses 2 threads for its own needs; the rest of the threads can be used for traffic. The maximum number here is the number of free threads
divided by the number of interface pairs.
For virtual NICs on a VM, we always use one thread per interface pair.

Same as -p, but changes the src/dst IP according to the port. Using this, you will get all the packets of the
same flow from the same port, and with the same src/dst IP.
It does not work well with NBAR, as NBAR expects all client IPs to be sent from the same direction.

Provide the number of hops in the setup (default is one hop). Relevant only if the Rx check is enabled.
Look here for details.

--iom <mode>

I/O mode. Possible values: 0 (silent), 1 (normal), 2 (short).

--ipv6

Convert templates to IPv6 mode.

-k <num>

Run “warm up” traffic for num seconds before starting the test. This is needed if TRex is connected to a switch running
spanning tree. You want the switch to see traffic from all relevant source MAC addresses before starting to send real
data. The traffic sent is the same as that used for the latency test (-l option).
Current limitation (holds for TRex version 1.82): does not work properly on a VM.

-l <rate>

In parallel to the test, run latency check, sending packets at rate/sec from each interface.

Used for testing the NAT learning mechanism. Does the learning as if the DUT were doing NAT, but verifies that packets
are not actually changed.

--limit-ports <port num>

Limit the number of ports used. Overrides the “port_limit” from config file.

--lm <hex bit mask>

Mask specifying which ports will send traffic. For example, 0x1 - Only port 0 will send. 0x4 - only port 2 will send.
This can be used to verify port connectivity. You can send packets from one port, and look at counters on the DUT.

--lo

Latency only - Send only latency packets. Do not send packets from the templates/pcap files.

-m <num>

Rate multiplier. TRex will multiply the CPS rate of each template by num.

--nc

If set, TRex will terminate exactly at the end of the specified duration.
This provides faster, more accurate TRex termination.
By default (without this option), TRex waits for all flows to terminate gracefully. In the case of a very long flow, termination might take a long time.

--no-flow-control-change

Since version 2.21.
Prevents TRex from changing flow control. By default (without this option), TRex disables flow control at startup for all cards, except for the Intel XL710 40G card.

--no-hw-flow-stat

Relevant only for the Intel X710 in stateless mode. Do not use HW counters for flow stats.
Enabling this supports a lower traffic rate, but also reports RX byte count statistics.

--no-key

Daemon mode, don’t get input from keyboard.

--no-watchdog

Disable watchdog.

-p

Send all packets of the same flow from the same direction. For each flow, TRex randomly chooses between the client port and the
server port, and sends all the packets from that port. src/dst IPs keep their values as if packets were sent from two ports.
Meaning, on the same port we get packets from client to server and from server to client.
If you are using this with a router, you cannot rely on routing rules to pass traffic to TRex; you must configure policy-based
routes to pass all traffic from one DUT port to the other.

-pm <num>

Platform factor. If the setup includes a splitter, you can multiply all statistic numbers displayed by TRex by this factor, so that they match the DUT counters.

-pubd

Disable ZMQ monitor’s publishers.

--rx-check <sample rate>

Enable Rx check module. Using this, each thread randomly samples 1/sample_rate of the flows and checks packet order, latency, and additional statistics for the sampled flows.
Note: This feature works on the RX thread.

--software

Since version 2.21.
Do not configure any hardware rules. In this mode, all RX packets are processed by software. No HW assist for dropping (while counting) packets is used.
This mode is good for enabling features like per-stream statistics and
latency for packet types not supported by HW flow director rules (for example, QinQ).
You can also use this mode for running TRex on interfaces which present themselves as supported by TRex, but in reality support fewer hardware capabilities;
for example, NICs supported by the DPDK e1000_igb driver, but with different HW capabilities than the i350.
The drawback is that, because software has to handle all received packets, the total rate of RX streams is significantly lower.
Currently, this mode is also limited to using only one TX core (and one RX core, as usual).

-v <verbosity level>

Show debug info. A value of 1 shows debug info on startup. A value of 3 shows debug info during the run in some cases. Might slow down operation.
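Combining several of the options above, a typical stateful invocation could be composed as in the sketch below. The -f profile path and the -d duration flag are assumptions from standard TRex usage and are not documented in this section; run the printed command with sudo on the TRex server.

```shell
# Compose a TRex command line from the options documented above (dry run).
#   -c 4     four hardware threads per interface pair
#   -m 10    multiply each template's CPS rate by 10
#   -l 1000  send 1000 latency packets/sec from each interface
#   --nc     terminate exactly at the end of the duration
CMD="./t-rex-64 -f cap2/dns.yaml -d 120 -c 4 -m 10 -l 1000 --nc"
echo "$CMD"
```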

7.2.3. Upgrade

Download NVMUpdatePackage.zip from the Intel site here.
It includes the utility nvmupdate64e.

Run this:

[bash]>sudo ./nvmupdate64e

You might need to power cycle and run this command a few times to get the latest firmware.

7.2.4. QSFP+ support for XL710

See the QSFP+ support and firmware requirement sections for the XL710.

7.3. TRex with ASA 5585

When running TRex against an ASA 5585, note the following:

The ASA can’t forward IPv4 options, so --learn-mode 1 (or 3) must be used in the case of NAT. In this mode, bidirectional UDP flows are not supported.
--learn-mode 1 supports TCP sequence number randomization on both sides of the connection (client to server and server to client). For this to work, TRex must learn
the translation of packets from both sides, so this mode reduces the number of connections per second TRex can generate (the number is still high enough to test
any existing firewall). If you need a higher CPS rate, you can use --learn-mode 3, which handles sequence number randomization on the client→server side only.

7.5.1. Enable forwarding

To make this permanent, add the following line to the file /etc/sysctl.conf:

net.ipv4.ip_forward=1

7.5.2. Add static routes

The example is for the default TRex networks, 48.0.0.0 and 16.0.0.0.

Routing all traffic from 48.0.0.0 to the gateway 10.0.0.100

[cli]>route add -net 48.0.0.0 netmask 255.255.0.0 gw 10.0.0.100

Routing all traffic from 16.0.0.0 to the gateway 172.168.0.100

[cli]>route add -net 16.0.0.0 netmask 255.255.0.0 gw 172.168.0.100

If you use stateless mode, and decide to add route only in one direction, remember to disable reverse path check.
For example, to disable on all interfaces:

for i in /proc/sys/net/ipv4/conf/*/rp_filter; do
  echo 0 > $i
done

Alternatively, you can edit /etc/network/interfaces and add something like this for both ports connected to TRex.
This takes effect only after restarting networking (rebooting the machine is an alternative).
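As a sketch, such an interfaces entry might look like the following (the interface name is hypothetical; the relevant part is the post-up hook that clears rp_filter for the port, a common Debian/Ubuntu idiom):

```
iface eth2 inet manual
    post-up echo 0 > /proc/sys/net/ipv4/conf/eth2/rp_filter
```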

7.5.3. Add static ARP entries

7.6. Configure Linux to use VF on Intel X710 and 82599 NICs

TRex supports paravirtualized interfaces such as VMXNET3/virtio/E1000; however, when connected to a vSwitch, the vSwitch limits the performance. VPP or OVS-DPDK can improve the performance, but require more software resources to handle the rate.
SR-IOV can accelerate the performance and reduce CPU resource usage as well as latency by utilizing NIC hardware switch capability (the switching is done by hardware).
TRex version 2.15 now includes SR-IOV support for XL710 and X710.
The following diagram compares between vSwitch and SR-IOV.

One use case which shows the performance gain that can be achieved by using SR-IOV is when a user wants to create a pool of TRex VMs that tests a pool of virtual DUTs (e.g. ASAv, CSR, etc.).
When using the newly supported SR-IOV, compute, storage and networking resources can be controlled dynamically (e.g. by using OpenStack).

The above diagram is an example of one server with two NICs. TRex VMs can be allocated on one NIC, while the DUTs can be allocated on the other.

Following are some links we used and lessons we learned while putting up an environment for testing TRex with VF interfaces (using SR-IOV).
This is by no means a full tutorial of VF usage, and different Linux distributions might need slightly different handling.

7.6.2. Linux configuration

First, verify BIOS support for the feature (make sure VT-d is enabled).
Second, make sure you have the correct kernel options (see the links above).
In our regression setup with SR-IOV (Cisco UCS, Intel CPU, host OS: CentOS), we have the following configs:

7.6.3. x710 specific instructions

For the x710 (i40e driver), we needed to download the latest kernel driver. On all distributions we were using, the existing driver was not new enough.
To make the system use your newly compiled driver with the correct parameters:
Copy the .ko file to /lib/modules/<your kernel version, as seen by uname -r>/kernel/drivers/net/ethernet/intel/i40e/i40e.ko

7.6.4. 82599 specific instructions

In order to make VF interfaces work correctly, we had to increase the MTU on the related PF interfaces.
For example, if you run with max_vfs=1,1 (one VF per PF), you will have something like this:

Using the following command, running on an x710 card with the VF driver, we can see that TRex can reach 30Gb/sec using only one core. We can also see that the average latency is around 20 usec, which is pretty much the same value we get on loopback ports with the x710 physical function without VF.

7.6.5. Performance

7.7. Mellanox ConnectX-4/5 support

Mellanox ConnectX-4/5 adapter family supports 100/56/40/25/10 Gb/s Ethernet speeds.
Its DPDK support is a bit different from Intel DPDK support, more information can be found here.
Intel NICs do not require additional kernel drivers (except for igb_uio, which is already supported in most distributions). The ConnectX-4 works on top of the InfiniBand API (verbs) and requires special kernel modules/user space libraries.
This means that it is required to install the OFED package in order to work with this NIC.
Installing the full OFED package is the simplest way to make it work (installing only part of the package might work too, but didn’t work for us).
The advantage of this model is that you can control it using standard Linux tools (ethtool and ifconfig will work).
The disadvantage is the OFED dependency.

7.7.1. Installation

7.7.2. Install Linux

We tested the following distro with TRex and OFED. Others might work too.

7.7.4. TRex specific implementation details

TRex uses the flow director filter to steer specific packets to specific queues.
To support this, we change the IPv4.TOS/IPv6.TC LSB to 1 for packets we want to handle in software (other packets will be dropped). So latency packets will have this bit turned on (this is true for all NIC types, not only ConnectX-4).
This means that if the DUT for some reason clears this bit (changes the TOS LSB to 0, e.g. from 0x3 to 0x2), some TRex features (latency measurement, for example) will not work properly.

7.7.5. Which NIC to buy?

A NIC with two ports works better from a performance perspective, so it is better to have the MCX456A-ECAT (dual 100Gb ports) and not the MCX455A-ECAT (single 100Gb port).

7.7.7. Performance Cycles/Packet ConnectX-4 vs Intel XL710

For TRex version v2.11, these are the comparison results between XL710 and ConnectX-4 for various scenarios.

Stateless MPPS/Core [Preliminary]

Stateless Gb/Core [Preliminary]

Comments

MLX5 can reach 50MPPS, while the XL710 is limited to 35MPPS (with a potential future fix, it will be 65MPPS).

With IMIX, you should reach ~90Gb/sec (not 100Gb/sec) with one port (total bandwidth for both ports is 100Gb/sec).

For Stateless/Stateful 256B profiles, ConnectX-4 uses half the CPU cycles per packet. ConnectX-4 probably handles chained mbufs (scatter gather) in a better way.

In the average stateful scenario, ConnectX-4 is the same as XL710.

For Stateless 64B/IMIX profiles, ConnectX-4 uses 50-90% more CPU cycles per packet (actually even more, because of the TRex scheduler overhead); this means that in the worst case scenario, you will need 2x the CPU for the same total MPPS.

There is a task to automate the production of these reports.

7.7.8. Troubleshooting

Before running TRex, make sure the commands ibv_devinfo and ibdev2netdev show the NICs.

ifconfig should work too; you need to be able to ping from those ports.

Run the TRex server with -v 7, for example: sudo ./t-rex-64 -i -v 7

In case link_layer is not set to Ethernet, you should run this command

7.8. Cisco VIC support

Supported from TRex version v2.12.

Since version 2.21, all VIC card types supported by DPDK are supported by TRex, using the “--software” command line argument.
Notice that when using “--software”, no HW assist is used, so the supported packet rate is much lower.
Since we do not have all the cards in our lab, we could not test all of them. We would be glad for feedback on this (good or bad).

If not using “--software”, the following limitations apply:

Only 1300 series Cisco adapter supported.

Must have VIC firmware version 2.0(13) for UCS C-series servers. This will be GA in February 2017.

Must have VIC firmware version 3.1(2) for blade servers (which supports more filtering capabilities).

The feature can be enabled via Cisco CIMC or UCSM with the advanced filters radio button. When enabled, these additional flow director modes are available:
RTE_ETH_FLOW_NONFRAG_IPV4_OTHER
RTE_ETH_FLOW_NONFRAG_IPV4_SCTP
RTE_ETH_FLOW_NONFRAG_IPV6_UDP
RTE_ETH_FLOW_NONFRAG_IPV6_TCP
RTE_ETH_FLOW_NONFRAG_IPV6_SCTP
RTE_ETH_FLOW_NONFRAG_IPV6_OTHER

7.8.1. vNIC Configuration Parameters

Number of Queues

The maximum number of receive queues (RQs), work queues (WQs) and completion queues (CQs) are configurable on a per vNIC basis through the Cisco UCS Manager (CIMC or UCSM).
These values should be configured as follows:

The number of WQs should be greater than or equal to the number of threads (-c value) plus 1.

The number of RQs should be greater than 5.

The number of CQs should be set to WQs + RQs.

Unless there is a lack of resources due to creating many vNICs, it is recommended that the WQ and RQ sizes be set to the maximum.

7.8.2. Limitations/Issues

VLAN 0 Priority Tagging
If a vNIC is configured in TRUNK mode by the UCS manager, the adapter will priority tag egress packets according to 802.1Q if they were not already VLAN tagged by software. If the adapter is connected to a properly configured switch, there will be no unexpected behavior.
In test setups where an Ethernet port of a Cisco adapter in TRUNK mode is connected point-to-point to another adapter port, or connected through a router instead of a switch, all ingress packets will be VLAN tagged. TRex can work with this; see the upstream VIC note.
Upstream, the VIC always tags packets with an 802.1p header. Downstream, it is possible to remove the tag (not yet supported by TRex).

7.9. More active flows

From version v2.13 there is a new stateful scheduler that works better in the case of many concurrent/active flows.
In the case of EMIX, 70% better performance was observed.
In this tutorial there are 14 DP cores and up to 8M flows.
There is a special config file to enlarge the number of flows. This tutorial presents the difference in performance between the old scheduler and the new one.