Confusion surrounding range commands #295

Open
vaastav opened this issue Nov 14, 2024 · 29 comments

Comments
@vaastav

vaastav commented Nov 14, 2024

Hi, I am trying to use the range commands to control the number of unique flows for my workload.
This is the Lua script I have right now:

package.path = package.path ..";?.lua;test/?.lua;app/?.lua;../?.lua"

require "Pktgen";
pkt_size = 64;
local dstip = "10.10.0.100";
local srcip = "10.10.0.101";
local netmask = "/24";
sendport = 0;
recvport = 0;


printf("\nStarting Experiment!!!\n");
print("Pkt_size is", pkt_size, "\n");
pktgen.set("all", "size", pkt_size)
--pktgen.start("0");
pktgen.set("all", "sport", 0x5678);
pktgen.set("all", "dport", 0x1234);
pktgen.set_ipaddr("all", "src", srcip..netmask);
pktgen.range.dst_ip("0", "start", "10.12.0.1");
pktgen.range.dst_ip("0", "inc", "0.0.0.1");
pktgen.range.dst_ip("0", "min", "10.12.0.1");
pktgen.range.dst_ip("0", "max", "10.12.0.64");

pktgen.set_range("0", "on");
pktgen.start("0");
pktgen.delay(10000);
pktgen.stop("0");
pktgen.quit();

My confusion is that when I execute this workload, the source and destination ports do not remain static. The source IP stays static, but the destination IP address seems to exceed the configured range.
Have I configured the range commands incorrectly? Any help would be appreciated.

@KeithWiles
Collaborator

The script is mixing single-mode calls and range calls. The page range command will let you see the range configuration. The destination IP address increment defaults to 0.0.0.1 and the source IP address increment defaults to 0.0.0.0; these need to be set up correctly for your use case. Here is an updated script; please let me know if this does not work, as I most likely did not capture your exact use case.

package.path = package.path ..";?.lua;test/?.lua;app/?.lua;../?.lua"

require "Pktgen";
pkt_size = 64;

local dstip     = "10.12.0.1";
local min_dstip = "10.12.0.1";
local max_dstip = "10.12.0.64";
local inc_dstip = "0.0.0.1"

local srcip     = "10.12.0.101";
local min_srcip = "10.12.0.101";
local max_srcip = "10.12.0.101";
local inc_srcip = "0.0.0.0"

local dst_port     = 5678
local inc_dst_port = 0;
local min_dst_port = dst_port
local max_dst_port = dst_port;

local src_port     = 1234
local inc_src_port = 0;
local min_src_port = src_port
local max_src_port = src_port;

local netmask = "/24";
sendport = 0;
recvport = 0;


printf("\nStarting Experiment!!!\n");
print("Pkt_size is", pkt_size, "\n");

pktgen.set("all", "size", pkt_size)

pktgen.page("range")

pktgen.range.dst_port("all", "start", dst_port);
pktgen.range.dst_port("all", "inc", inc_dst_port);
pktgen.range.dst_port("all", "min", min_dst_port);
pktgen.range.dst_port("all", "max", max_dst_port);

pktgen.range.src_port("all", "start", src_port);
pktgen.range.src_port("all", "inc", inc_src_port);
pktgen.range.src_port("all", "min", min_src_port);
pktgen.range.src_port("all", "max", max_src_port);

pktgen.range.dst_ip("all", "start", dstip);
pktgen.range.dst_ip("all", "inc", inc_dstip);
pktgen.range.dst_ip("all", "min", min_dstip);
pktgen.range.dst_ip("all", "max", max_dstip);

pktgen.range.src_ip("all", "start", srcip);
pktgen.range.src_ip("all", "inc", inc_srcip);
pktgen.range.src_ip("all", "min", srcip);
pktgen.range.src_ip("all", "max", srcip);

pktgen.set_range("0", "on");

@vaastav
Author

vaastav commented Nov 14, 2024

Hi, thanks for the quick response.
This is the output I see when I execute the script:
[screenshot of the pktgen range page output]

Couple of issues:

  1. I would like the range command to cycle through the values and keep generating packets (like pktgen.start does) until a stop is called. At the moment, the script finishes quickly, after sending 64 packets I think.
  2. The IP address shown still doesn't seem to match the IP that was set.

Ideally, I want to be able to execute a workload at line rate for a fixed period of time. Normally, I do this using

pktgen.start("0")
pktgen.delay(60000) -- Sleep for 60 seconds so that we generate packets at line rate for 1 minute.
pktgen.stop("0")

But I want the additional restriction that the flow information of each packet cycles between pre-determined values. For example,
Packet 1: 10.12.0.101:5678 -> 10.12.0.1:1234
...
Packet 64: 10.12.0.101:5678 -> 10.12.0.64:1234
Packet 65: 10.12.0.101:5678 -> 10.12.0.1:1234
...
Packet 10million: 10.12.0.101:5678 -> 10.12.0.64:1234

Is it possible to get this type of workload with the range commands or should I try a different type of workload?

@KeithWiles
Collaborator

KeithWiles commented Nov 14, 2024

Please update pktgen to the latest version, 24.10.3, and you may have to update DPDK to the current version as well. DPDK has changed defines and APIs, which makes it difficult to maintain backward compatibility, so I only support the latest Pktgen and DPDK.

Give that a try and see if it works, as the screen above does not look correct for your configuration.

When I have more time today, I will read the rest of the post above.

@KeithWiles
Collaborator

When using the range command it does cycle through the packets until you stop sending traffic, as long as 'send forever' is set as the TX packet count. Make sure you did not set the TX count to a finite value.

The only way to get something different than what the range mode supports is to use a PCAP file with the traffic you need.
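
For the timed, continuously cycling run described above, a minimal Lua sketch could look like the following (assuming the "count" key of pktgen.set() mirrors the CLI set 0 count behaviour, where 0 means send forever):

-- Sketch: cycle the range flows at line rate for 60 seconds, then stop.
pktgen.set("0", "count", 0);     -- assumption: 0 = send forever, like the CLI "set 0 count 0"
pktgen.set_range("0", "on");     -- enable range mode on port 0
pktgen.start("0");
pktgen.delay(60000);             -- let traffic run for 60 seconds
pktgen.stop("0");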

@vaastav
Author

vaastav commented Nov 14, 2024

That doesn't seem to be the case when I run the script you sent; it immediately exits.
I haven't tried it with pktgen 24.10 as I can't upgrade the DPDK version at the moment.

@KeithWiles
Collaborator

In the script I did not start pktgen sending packets, unless you added that code back; after loading the script you can do the start 0 from the command line. If you did add that fragment of code and this is the latest Pktgen and DPDK, it is a bug. Also, I only enabled range mode on port 0, not on any of the other ports.

@davideandres95

davideandres95 commented Jan 2, 2025

Hi,
I am also trying to generate multiple parallel flows using the range commands. However, I am unable to generate a predictable number of flows using the script @KeithWiles provided.
Setting

local dstip     = "10.12.0.1";
local min_dstip = "10.12.0.1";
local max_dstip = "10.12.0.255";
local inc_dstip = "0.0.0.1";

only results in 222 different flows (expected 256), no matter how long I let it run. To count the number of flows, I am using external HW.
I am running Pktgen 24.10.3 powered by DPDK 24.11.0-rc3.

I can only suspect that the burst size has something to do with it, but I am unsure how to fix this.

@KeithWiles
Collaborator

If you increase the number of flows by a few, do you get a different count?
Can you determine which flows are missing?

@davideandres95

When running again configured for 256 flows, I get 224 this time; 32 are missing. The list is as follows:

10.12.0.1
10.12.0.2
10.12.0.3
10.12.0.4
10.12.0.5
10.12.0.6
10.12.0.7
10.12.0.8
10.12.0.9
10.12.0.10
10.12.0.11
10.12.0.12
10.12.0.13
10.12.0.14
10.12.0.15
10.12.0.16
10.12.0.17
10.12.0.18
10.12.0.19
10.12.0.20
10.12.0.21
10.12.0.22
10.12.0.23
10.12.0.24
10.12.0.25
10.12.0.26
10.12.0.27
10.12.0.28
10.12.0.29
10.12.0.30
10.12.0.31
10.12.0.32
10.12.0.33
10.12.0.34
--missing--
10.12.0.36
10.12.0.37
10.12.0.38
10.12.0.39
10.12.0.40
10.12.0.41
10.12.0.42
10.12.0.43
10.12.0.44
10.12.0.45
10.12.0.46
10.12.0.47
10.12.0.48
10.12.0.49
10.12.0.50
10.12.0.51
10.12.0.52
10.12.0.53
10.12.0.54
10.12.0.55
10.12.0.56
10.12.0.57
10.12.0.58
10.12.0.59
10.12.0.60
10.12.0.61
10.12.0.62
10.12.0.63
10.12.0.64
10.12.0.65
10.12.0.66
10.12.0.67
--missing x 30--
10.12.0.97
10.12.0.98
10.12.0.99
10.12.0.100
10.12.0.101
10.12.0.102
10.12.0.103
10.12.0.104
10.12.0.105
10.12.0.106
10.12.0.107
10.12.0.108
10.12.0.109
10.12.0.110
10.12.0.111
10.12.0.112
10.12.0.113
10.12.0.114
10.12.0.115
10.12.0.116
10.12.0.117
10.12.0.118
10.12.0.119
10.12.0.120
10.12.0.121
10.12.0.122
10.12.0.123
10.12.0.124
10.12.0.125
10.12.0.126
10.12.0.127
10.12.0.128
--missing--
10.12.0.130
10.12.0.131
10.12.0.132
10.12.0.133
10.12.0.134
10.12.0.135
10.12.0.136
10.12.0.137
10.12.0.138
10.12.0.139
10.12.0.140
10.12.0.141
10.12.0.142
10.12.0.143
10.12.0.144
10.12.0.145
10.12.0.146
10.12.0.147
10.12.0.148
10.12.0.149
10.12.0.150
10.12.0.151
10.12.0.152
10.12.0.153
10.12.0.154
10.12.0.155
10.12.0.156
10.12.0.157
10.12.0.158
10.12.0.159
10.12.0.160
10.12.0.161
10.12.0.162
10.12.0.163
10.12.0.164
10.12.0.165
10.12.0.166
10.12.0.167
10.12.0.168
10.12.0.169
10.12.0.170
10.12.0.171
10.12.0.172
10.12.0.173
10.12.0.174
10.12.0.175
10.12.0.176
10.12.0.177
10.12.0.178
10.12.0.179
10.12.0.180
10.12.0.181
10.12.0.182
10.12.0.183
10.12.0.184
10.12.0.185
10.12.0.186
10.12.0.187
10.12.0.188
10.12.0.189
10.12.0.190
10.12.0.191
10.12.0.192
10.12.0.193
10.12.0.194
10.12.0.195
10.12.0.196
10.12.0.197
10.12.0.198
10.12.0.199
10.12.0.200
10.12.0.201
10.12.0.202
10.12.0.203
10.12.0.204
10.12.0.205
10.12.0.206
10.12.0.207
10.12.0.208
10.12.0.209
10.12.0.210
10.12.0.211
10.12.0.212
10.12.0.213
10.12.0.214
10.12.0.215
10.12.0.216
10.12.0.217
10.12.0.218
10.12.0.219
10.12.0.220
10.12.0.221
10.12.0.222
10.12.0.223
10.12.0.224
10.12.0.225
10.12.0.226
10.12.0.227
10.12.0.228
10.12.0.229
10.12.0.230
10.12.0.231
10.12.0.232
10.12.0.233
10.12.0.234
10.12.0.235
10.12.0.236
10.12.0.237
10.12.0.238
10.12.0.239
10.12.0.240
10.12.0.241
10.12.0.242
10.12.0.243
10.12.0.244
10.12.0.245
10.12.0.246
10.12.0.247
10.12.0.248
10.12.0.249
10.12.0.250
10.12.0.251
10.12.0.252
10.12.0.253
10.12.0.254
10.12.0.255

Increasing the number of flows using

dstip = "10.12.0.1";
min_dstip = "10.12.0.1";
max_dstip = "10.12.1.4";
inc_dstip = "0.0.0.1";

results in 216 flows. Please let me know if you also want the list.

@KeithWiles
Collaborator

I copied the above list into VIM and removed the missing lines, which gives me 225 lines in the file. If I calculate the number of flows it should be 260 flows total, so it is missing 36 flows: 0.35, 0.68-0.96, 0.129 and 1.0-1.4. Check my math here.

You can try setting the burst size to 1 and see if that gives a different answer or different missing flows.
We also need to verify the receiving machine is not dropping packets due to missed RX packets.
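
From the Lua side, setting the burst size to 1 would be something like the line below (a sketch only; the exact key accepted by pktgen.set() is an assumption here, newer builds expose "txburst" on the CLI while older ones used "burst"):

pktgen.set("0", "txburst", 1);   -- assumed key name; older Pktgen releases may expect "burst" instead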

@davideandres95

Sorry for the confusion, but the list is for the previous config

local dstip     = "10.12.0.1";
local min_dstip = "10.12.0.1";
local max_dstip = "10.12.0.255";
local inc_dstip = "0.0.0.1";

So only 31 are missing (0.35, 0.68-0.96, 0.129).

Some additional information:

  1. I am receiving all packets that are being sent, so no packets are being dropped.
  2. The tx traffic passes through an Intel Tofino with a P4 program that counts any flow with 3+ packets and sends a digest with the flow information (this is the source of the list).
  3. The HW register in the Tofino where packets are counted matches the number of entries in the list.
  4. The runtime does not affect the result, though a possible explanation could be that fewer than 3 packets are sent for the missing flows.
  5. Setting the burst size to 1 results in even fewer flows (111).

@KeithWiles
Collaborator

With that few flows being generated, can you use something like Wireshark to capture, say, 1000 packets and see if we can count the flows within this small number of packets? I know it is some work, but if the switch is not counting flows with fewer than 3 packets, I worry pktgen is sending the packets and the count is throwing us off.

The code is straightforward as it just increments the IP address. The only other place packets can be dropped is during transmit, when the TX queue in the NIC hardware becomes full. If I remember correctly, I attempt to resend the part of the burst not sent the first time and continue attempting to send until all packets in the burst are sent.

Maybe I have something wrong when the TX burst send returns with less than a full burst sent, which would cause some flows to be missed. The problem there is that if you send for a long time it should eventually resend those flows. The way pktgen (and DPDK) work, packets are consumed from the pre-built pktmbuf mempool, but packets can also be placed in the per-lcore cache and resent, so the same packets get pulled from the cache over and over without ever getting to the other flows. The standard TX mempool has the cache turned on, and this could be causing the problem.

Sorry I had to write this quickly before a meeting. I hope you get the idea.

@davideandres95

Thank you for your support and explanations, @KeithWiles.
At the moment I don't have physical access to the testbed, and I am using a single physical port, so I have trouble using both DPDK to send and tcpdump to capture. I have tried capturing with Pktgen, but it only captures a few packets:

Pktgen:/> disable 0 capture
Used 42696, count 664

    Dumping ~0.04MB of captured data to disk: 0%...
>>> Hit packet length zero at 46 of 664

I had sent 1000 packets.

However, I don't believe that the switch is misleading us. I have verified that it reliably counts flows when sending 1000 flows sequentially using Scapy. I have also improved its monitoring to register any flow, even if it has only one or two packets. The results are the same as before.

I am going to start looking at the source code, but I have little experience with DPDK. I will share any findings.

@KeithWiles
Collaborator

Looking at the code for capture, it should not find a zero-length packet. If you want, please change lines 180-184 in the file pktgen/app/pktgen-capture.c to the following:

            for (uint32_t i = 0; i < cap->nb_pkts; i++) {
                if (hdr->pkt_len == 0) {
                    printf("\n>>> Hit packet length zero at %'u of %'u skipping\n", i, cap->nb_pkts);
                    continue;
                }

This allows the dump to continue by skipping the zero-length packet. I have not tried this change.

@davideandres95

The suggested patch does not solve the capturing problem:

Used 45288, count 710                                                                                                                                                                         
                                                                                                                                                                                              
    Dumping ~0.04MB of captured data to disk: 0%...                                            
>>> Hit packet length zero at 42 of 710 skipping                                               
                                                                                               
>>> Hit packet length zero at 43 of 710 skipping                                               
                                                                                               
>>> Hit packet length zero at 44 of 710 skipping                                               
                                                                                               
>>> Hit packet length zero at 45 of 710 skipping

[ ... ]

>>> Hit packet length zero at 707 of 710 skipping                                              
                                                                                               
>>> Hit packet length zero at 708 of 710 skipping                                              
                                                                                                                                                                                              
>>> Hit packet length zero at 709 of 710 skipping

The stats show:

/ <Main Page> Ports 0-0 of 1  Copyright(c) <2010-2024>, Intel Corporation
Port:Flags          :   0:P------       Range                                                                                                                                                 
Link State          :          <UP-100000-FD>        ---Total Rate---                                                                                                                         
Pkts/s Rx           :                       0                       0                          
       Tx           :                       0                       0                                                                                                                         
MBits/s Rx/Tx       :                     0/0                     0/0                          
Total Rx Pkts       :                    1000                     904                                                                                                                         
      Tx Pkts       :                    1000                     904                                                                                                                         
      Rx/Tx MBs     :                     0/0                                                  
Pkts/s Rx Max       :                     904                                                  
       Tx Max       :                     904                                                  
Errors Rx/Tx        :                     0/0                                                  
Broadcast           :                       0                                                  
Multicast           :                       0                                                  
Sizes 64            :                    1000                                                
      65-127        :                       0                                                  
      128-255       :                       0                                                  
      256-511       :                       0                                                  
      512-1023      :                       0                                                                                                                                                 
      1024-1518     :                       0                                                                                                                                                 
Runts/Jumbos        :                     0/0                                                                                                                                                 
ARP/ICMP Pkts       :                     0/0                                                  
Tx Count/% Rate     :            1000 /0.001%

If it helps, the NIC is a Mellanox ConnectX-5.

@KeithWiles
Collaborator

I do not have a ConnectX-5 to test this problem, but I was able to set up a test sending range packets to Wireshark. I sent 1000 packets and only saw 965 packets in Wireshark :-(

Wireshark is not able to keep up with the transmit rate and is dropping packets. I changed the TX count to 512 and it was able to capture all 512 packets.

I then converted the capture to a text file and removed duplicate IP addresses. The total number of unique addresses in my test is 254, so it sent 254 different flows.

192.168.1.001
192.168.1.002
192.168.1.003
192.168.1.004
192.168.1.005
192.168.1.006
192.168.1.007
192.168.1.008
192.168.1.009
192.168.1.010
192.168.1.011
192.168.1.012
192.168.1.013
192.168.1.014
192.168.1.015
192.168.1.016
192.168.1.017
192.168.1.018
192.168.1.019
192.168.1.020
192.168.1.021
192.168.1.022
192.168.1.023
192.168.1.024
192.168.1.025
192.168.1.026
192.168.1.027
192.168.1.028
192.168.1.029
192.168.1.030
192.168.1.031
192.168.1.032
192.168.1.033
192.168.1.034
192.168.1.035
192.168.1.036
192.168.1.037
192.168.1.038
192.168.1.039
192.168.1.040
192.168.1.041
192.168.1.042
192.168.1.043
192.168.1.044
192.168.1.045
192.168.1.046
192.168.1.047
192.168.1.048
192.168.1.049
192.168.1.050
192.168.1.051
192.168.1.052
192.168.1.053
192.168.1.054
192.168.1.055
192.168.1.056
192.168.1.057
192.168.1.058
192.168.1.059
192.168.1.060
192.168.1.061
192.168.1.062
192.168.1.063
192.168.1.064
192.168.1.065
192.168.1.066
192.168.1.067
192.168.1.068
192.168.1.069
192.168.1.070
192.168.1.071
192.168.1.072
192.168.1.073
192.168.1.074
192.168.1.075
192.168.1.076
192.168.1.077
192.168.1.078
192.168.1.079
192.168.1.080
192.168.1.081
192.168.1.082
192.168.1.083
192.168.1.084
192.168.1.085
192.168.1.086
192.168.1.087
192.168.1.088
192.168.1.089
192.168.1.090
192.168.1.091
192.168.1.092
192.168.1.093
192.168.1.094
192.168.1.095
192.168.1.096
192.168.1.097
192.168.1.098
192.168.1.099
192.168.1.100
192.168.1.101
192.168.1.102
192.168.1.103
192.168.1.104
192.168.1.105
192.168.1.106
192.168.1.107
192.168.1.108
192.168.1.109
192.168.1.110
192.168.1.111
192.168.1.112
192.168.1.113
192.168.1.114
192.168.1.115
192.168.1.116
192.168.1.117
192.168.1.118
192.168.1.119
192.168.1.120
192.168.1.121
192.168.1.122
192.168.1.123
192.168.1.124
192.168.1.125
192.168.1.126
192.168.1.127
192.168.1.128
192.168.1.129
192.168.1.130
192.168.1.131
192.168.1.132
192.168.1.133
192.168.1.134
192.168.1.135
192.168.1.136
192.168.1.137
192.168.1.138
192.168.1.139
192.168.1.140
192.168.1.141
192.168.1.142
192.168.1.143
192.168.1.144
192.168.1.145
192.168.1.146
192.168.1.147
192.168.1.148
192.168.1.149
192.168.1.150
192.168.1.151
192.168.1.152
192.168.1.153
192.168.1.154
192.168.1.155
192.168.1.156
192.168.1.157
192.168.1.158
192.168.1.159
192.168.1.160
192.168.1.161
192.168.1.162
192.168.1.163
192.168.1.164
192.168.1.165
192.168.1.166
192.168.1.167
192.168.1.168
192.168.1.169
192.168.1.170
192.168.1.171
192.168.1.172
192.168.1.173
192.168.1.174
192.168.1.175
192.168.1.176
192.168.1.177
192.168.1.178
192.168.1.179
192.168.1.180
192.168.1.181
192.168.1.182
192.168.1.183
192.168.1.184
192.168.1.185
192.168.1.186
192.168.1.187
192.168.1.188
192.168.1.189
192.168.1.190
192.168.1.191
192.168.1.192
192.168.1.193
192.168.1.194
192.168.1.195
192.168.1.196
192.168.1.197
192.168.1.198
192.168.1.199
192.168.1.200
192.168.1.201
192.168.1.202
192.168.1.203
192.168.1.204
192.168.1.205
192.168.1.206
192.168.1.207
192.168.1.208
192.168.1.209
192.168.1.210
192.168.1.211
192.168.1.212
192.168.1.213
192.168.1.214
192.168.1.215
192.168.1.216
192.168.1.217
192.168.1.218
192.168.1.219
192.168.1.220
192.168.1.221
192.168.1.222
192.168.1.223
192.168.1.224
192.168.1.225
192.168.1.226
192.168.1.227
192.168.1.228
192.168.1.229
192.168.1.230
192.168.1.231
192.168.1.232
192.168.1.233
192.168.1.234
192.168.1.235
192.168.1.236
192.168.1.237
192.168.1.238
192.168.1.239
192.168.1.240
192.168.1.241
192.168.1.242
192.168.1.243
192.168.1.244
192.168.1.245
192.168.1.246
192.168.1.247
192.168.1.248
192.168.1.249
192.168.1.250
192.168.1.251
192.168.1.252
192.168.1.253
192.168.1.254

I did disable incrementing the src/dst port IDs, as Wireshark was trying to decode the packets based on the port number.

######################### Port  0 ##################################
#
# Port:  0, Burst (Rx/Tx): 64/ 32, Rate:100%, Flags:00001000, TX Count:Forever
#           Sequence count:0, Prime:1 VLAN ID:0001, Link: <UP-40000-FD>
#
# Set up the primary port information:
set 0 count 512
set 0 size 64
set 0 rate 100
set 0 rxburst 64
set 0 txburst 32
set 0 sport 1234
set 0 dport 5678
set 0 prime 1
set 0 type ipv4
set 0 proto tcp
set 0 dst ip 192.168.1.1
set 0 src ip 192.168.0.1/24
set 0 tcp flags ack
set 0 tcp seq 74616
set 0 tcp ack 74640
set 0 dst mac 3c:fd:fe:e4:38:41
set 0 src mac 3c:fd:fe:e4:38:40
set 0 vlan 1

set 0 pattern abc

set 0 jitter 50
disable 0 mpls
range 0 mpls entry 0x0
disable 0 qinq
set 0 qinqids 0 0
disable 0 gre
disable 0 gre_eth
disable 0 vxlan
set 0 vxlan 0x0 0 0
#
# Port flag values:
disable 0 icmp
disable 0 pcap
disable 0 range
disable 0 latency
disable 0 process
disable 0 capture
disable 0 vlan
#
# Range packet information:
range 0 src mac start 3c:fd:fe:e4:38:40
range 0 src mac min 00:00:00:00:00:00
range 0 src mac max 00:00:00:00:00:00
range 0 src mac inc 00:00:00:00:00:00

range 0 dst mac start 3c:fd:fe:e4:38:41
range 0 dst mac min 00:00:00:00:00:00
range 0 dst mac max 00:00:00:00:00:00
range 0 dst mac inc 00:00:00:00:00:00

range 0 src ip start 192.168.0.1
range 0 src ip min 192.168.0.1
range 0 src ip max 192.168.0.254
range 0 src ip inc 0.0.0.0

range 0 dst ip start 192.168.1.1
range 0 dst ip min 192.168.1.1
range 0 dst ip max 192.168.1.254
range 0 dst ip inc 0.0.0.1

range 0 proto tcp

range 0 src port start 1234
range 0 src port min 0
range 0 src port max 65535
range 0 src port inc 0

range 0 dst port start 5678
range 0 dst port min 0
range 0 dst port max 65535
range 0 dst port inc 0

range 0 tcp flags ack

range 0 tcp seq start 74616
range 0 tcp seq min 0
range 0 tcp seq max 536870911
range 0 tcp seq inc 0

range 0 tcp ack start 74640
range 0 tcp ack min 0
range 0 tcp ack max 536870911
range 0 tcp ack inc 0

range 0 ttl start 64
range 0 ttl min 0
range 0 ttl max 255
range 0 ttl inc 0

range 0 vlan start 1
range 0 vlan min 1
range 0 vlan max 4095
range 0 vlan inc 0

range 0 cos start 0
range 0 cos min 0
range 0 cos max 7
range 0 cos inc 0

range 0 tos start 0
range 0 tos min 0
range 0 tos max 255
range 0 tos inc 0
range 0 gre key 0

range 0 size start 64
range 0 size min 64
range 0 size max 1518
range 0 size inc 0

#
# Set up the sequence data for the port.
set 0 seq_cnt 0

@davideandres95

I am relieved to see that it works on other systems. Thank you for testing it. I will try the same configuration you used and report back my observations.

Do you think that the NIC config or hugepages setup could be causing this misbehavior? Sorry, my systems knowledge is limited.

To start Pktgen I use the following command:

sudo -E ./builddir/app/pktgen -l 2,3-8 -n 1 --proc-type auto --log-level 7 --file-prefix pg_tcp -a 81:00.0 --huge-dir /mnt/huge -- -v -T -P -m [3-5:6-8].0 -f themes/black-yellow.theme -G

And I have configured 1G-size hugepages on /mnt/huge.

I am trying to start the testpmd application to help me debug with the info from the show port stats all command, but I am unable to start it correctly:

~/Pktgen-DPDK$ sudo dpdk-testpmd -l 0,9-10 -n 1 --proc-type=auto --log-level=7 --file-prefix=pg_monitor -a 81:00.0 --huge-dir=/mnt/huge 
EAL: Detected CPU lcores: 48
EAL: Detected NUMA nodes: 2
EAL: Auto-detected process type: PRIMARY
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/pg_monitor/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: 114 hugepages of size 2097152 reserved, but no mounted hugetlbfs found for that size
EAL: VFIO support initialized
mlx5_net: No available register for sampler.
testpmd: create a new mbuf pool <mb_pool_0>: n=163456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc

Warning! port-topology=paired and odd forward ports number, the last port will pair with itself.

Configuring Port 0 (socket 0)
Port 0: 08:C0:EB:48:75:06
Checking link statuses...
Done
No commandline core given, start packet forwarding
io packet forwarding - ports=1 - cores=1 - streams=1 - NUMA support enabled, MP allocation mode: native
Logical Core 9 (socket 0) forwards packets on 1 streams:
  RX P=0/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00

  io packet forwarding packets/burst=32
  nb forwarding cores=1 - nb forwarding ports=1
  port 0: RX queue number: 1 Tx queue number: 1
    Rx offloads=0x0 Tx offloads=0x10000
    RX queue: 0
      RX desc=256 - RX free threshold=64
      RX threshold registers: pthresh=0 hthresh=0  wthresh=0
      RX Offloads=0x0
    TX queue: 0
      TX desc=256 - TX free threshold=0
      TX threshold registers: pthresh=0 hthresh=0  wthresh=0
      TX offloads=0x10000 - TX RS bit threshold=0
Press enter to exit

Are there any debug modes or traces that I could enable to help me debug the Pktgen behavior?

Ultimately, I would like to use Pktgen as a traffic generator for my research on efficient controllers for stateful NFs running on P4-programmable HW. Therefore, I want to generate high-speed traffic with many parallel flows.

@KeithWiles
Collaborator

KeithWiles commented Jan 3, 2025

It looks like the pktgen configuration is correct. The only question I have is that your NIC is on PCI address 81:00.0; normally, PCI addresses of 80:00.0 and above mean the device is attached to PCI bus 1, while addresses of 7f:00.0 and below are on PCI bus 0. In a dual-socket (CPU = socket) system this normally means that PCI bus is attached to Socket 1 (numbered 0, 1). You can run dpdk/usertools/cpu_layout.py to see your core-to-socket layout.

On my machine I use af:00.0 and af:00.1, which are attached to PCI bus 1, and bus 1 is attached to Socket 1. This means all of the cores I use for that port must come from socket 1. You can install hwloc (sudo apt install hwloc) and then run lstopo, which creates a graphic window showing the system configuration.

If you use cores from socket 0 you will have a bottleneck across the QPI bus between the sockets. If you only have a single CPU then it does not matter.

For test-pmd I use Pktgen, so I am not an expert on using test-pmd, but you can add --interactive on the command line to get a CLI prompt: sudo dpdk-testpmd -l 0,9-10 -n 1 --proc-type=auto --log-level=7 --file-prefix=pg_monitor -a 81:00.0 --huge-dir=/mnt/huge -- --interactive. Note the -- is two dashes; arguments before the -- are for DPDK and arguments after it are for test-pmd, if that was not obvious. Using ? at the CLI prompt will list the commands you can use, but you will have to consult the docs at dpdk.org.

@davideandres95

Hi Keith,
this system is somewhat unusual: it has two NUMA nodes but a single socket. As far as I know, the reason is that it has two memory management modules, each serving the set of memory banks to the left and right of the CPU. However, the system currently has only two 64GB DIMMs, both connected to NUMA node #0. I am aware that a more balanced memory distribution would improve performance, but I don't think this can explain the behavior I am seeing.

$ lstopo
Machine (125GB total) + Package L#0                                                            
  Group0 L#0          
    NUMANode L#0 (P#0 125GB)  
    L3 L#0 (16MB)                                                                              
      L2 L#0 (512KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
        PU L#0 (P#0)          
        PU L#1 (P#24)        
      L2 L#1 (512KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1                              
        PU L#2 (P#1)  
        PU L#3 (P#25)     
      L2 L#2 (512KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2                              
        PU L#4 (P#2)      
        PU L#5 (P#26) 
    L3 L#1 (16MB)                                                                              
      L2 L#3 (512KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3
        PU L#6 (P#3)         
        PU L#7 (P#27)         
      L2 L#4 (512KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4                              
        PU L#8 (P#4)         
        PU L#9 (P#28)         
      L2 L#5 (512KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5                              
        PU L#10 (P#5)         
        PU L#11 (P#29)
    L3 L#2 (16MB)                                                                              
      L2 L#6 (512KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6
        PU L#12 (P#6) 
        PU L#13 (P#30)
      L2 L#7 (512KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7
        PU L#14 (P#7)         
        PU L#15 (P#31)  
      L2 L#8 (512KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8    
        PU L#16 (P#8)   
        PU L#17 (P#32)
    L3 L#3 (16MB)                                                                              
      L2 L#9 (512KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9
        PU L#18 (P#9)                   
        PU L#19 (P#33)
      L2 L#10 (512KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10
        PU L#20 (P#10)
        PU L#21 (P#34)
      L2 L#11 (512KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11
        PU L#22 (P#11)
        PU L#23 (P#35)
    HostBridge                                                                                                           
      PCIBridge
        PCI 81:00.0 (Ethernet)
          Net "enp129s0f0np0"
          OpenFabrics "mlx5_0"
        PCI 81:00.1 (Ethernet)
          Net "enp129s0f1np1"
          OpenFabrics "mlx5_1"
      PCIBridge
        PCI 84:00.0 (SATA)
      PCIBridge
        PCI 85:00.0 (SATA)
    HostBridge
      PCIBridge
        PCI c4:00.0 (Ethernet)
          Net "enp196s0f0np0"
          OpenFabrics "mlx5_2"
        PCI c4:00.1 (Ethernet)
          Net "enp196s0f1np1"
          OpenFabrics "mlx5_3"
      PCIBridge
        PCI c1:00.0 (Ethernet)
          Net "eno1"
        PCI c1:00.1 (Ethernet)
          Net "eno2"
      PCIBridge
        PCIBridge
          PCI c3:00.0 (VGA)
  Group0 L#1
    L3 L#4 (16MB)
      L2 L#12 (512KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12
        PU L#24 (P#12)
        PU L#25 (P#36)
      L2 L#13 (512KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13
        PU L#26 (P#13)
        PU L#27 (P#37)
      L2 L#14 (512KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14
        PU L#28 (P#14)
        PU L#29 (P#38)
    L3 L#5 (16MB)
      L2 L#15 (512KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15
        PU L#30 (P#15)
        PU L#31 (P#39)
      L2 L#16 (512KB) + L1d L#16 (32KB) + L1i L#16 (32KB) + Core L#16
        PU L#32 (P#16)
        PU L#33 (P#40)
      L2 L#17 (512KB) + L1d L#17 (32KB) + L1i L#17 (32KB) + Core L#17
        PU L#34 (P#17)
        PU L#35 (P#41)
    L3 L#6 (16MB)
      L2 L#18 (512KB) + L1d L#18 (32KB) + L1i L#18 (32KB) + Core L#18
        PU L#36 (P#18)
        PU L#37 (P#42)
      L2 L#19 (512KB) + L1d L#19 (32KB) + L1i L#19 (32KB) + Core L#19
        PU L#38 (P#19)
        PU L#39 (P#43)
      L2 L#20 (512KB) + L1d L#20 (32KB) + L1i L#20 (32KB) + Core L#20
        PU L#40 (P#20)
        PU L#41 (P#44)
    L3 L#7 (16MB)
      L2 L#21 (512KB) + L1d L#21 (32KB) + L1i L#21 (32KB) + Core L#21
        PU L#42 (P#21)
        PU L#43 (P#45)
      L2 L#22 (512KB) + L1d L#22 (32KB) + L1i L#22 (32KB) + Core L#22
        PU L#44 (P#22)
        PU L#45 (P#46)
      L2 L#23 (512KB) + L1d L#23 (32KB) + L1i L#23 (32KB) + Core L#23
        PU L#46 (P#23)
        PU L#47 (P#47)
    HostBridge
      PCIBridge
        PCI 02:00.0 (Ethernet)
          Net "enp2s0f0"
        PCI 02:00.1 (Ethernet)
          Net "enp2s0f1"
      PCIBridge
        PCI 01:00.0 (RAID)
          Block(Disk) "sda"

Regarding the last configuration you provided (I adapted the MAC addresses), I have tested it without success. It seems that the rate has an effect on the number of flows: the best results are achieved with a rate of 0.0001%, while a 100% rate results in a single flow.

Also, I have not been able to debug the problem using testpmd; the statistics are empty:

sudo dpdk-testpmd -l 0,9-10 -n 1 --proc-type=auto --log-level=7 --file-prefix=pg_monitor -a 81:00.0 --huge-dir=/mnt/huge -- --nb-cores=2 --interactive
EAL: Detected CPU lcores: 48                                                                  
EAL: Detected NUMA nodes: 2                                                                   
EAL: Auto-detected process type: PRIMARY                                                      
EAL: Detected static linkage of DPDK                                                          
EAL: Multi-process socket /var/run/dpdk/pg_monitor/mp_socket                                  
EAL: Selected IOVA mode 'VA'                                                                  
EAL: VFIO support initialized                                                                 
mlx5_net: No available register for sampler.                                                  
Interactive-mode selected                                                                     
testpmd: create a new mbuf pool <mb_pool_0>: n=163456, size=2176, socket=0                                                                                                                    
testpmd: preferred mempool ops selected: ring_mp_mc                                           
                                                                                              
Warning! port-topology=paired and odd forward ports number, the last port will pair with itself.
                                                                                              
Configuring Port 0 (socket 0)                                                                 
Port 0: 08:C0:EB:48:75:06                                                                     
Checking link statuses...                                                                     
Done                                                                                          
testpmd> show port stats all                                                                  
                                                                                              
  ######################## NIC statistics for port 0  ########################                                                                                                                
  RX-packets: 0          RX-missed: 0          RX-bytes:  0                                   
  RX-errors: 0                                                                                
  RX-nombuf:  0                                                                               
  TX-packets: 0          TX-errors: 0          TX-bytes:  0                                   
                                                                                              
  Throughput (since last show)                                                                
  Rx-pps:            0          Rx-bps:            0                                          
  Tx-pps:            0          Tx-bps:            0                                          
  ############################################################################                                                                                                                

In light of the differences when using different rates, could it be that this NIC requires a specific configuration to behave adequately?
Is there any additional information or experiment I can run to help us diagnose the root cause?
Thank you in advance.

@KeithWiles
Collaborator

The memory configuration can cause performance issues, but I think this is not the problem here.

I have never used an MLX NIC and I do not know how to diagnose the problem further. If you can try on a non-MLX device, like the XL710 i40e device I use here, then maybe I could help, but at this time I cannot.

Sorry.

@davideandres95

Some additional information:

  1. I have fixed the memory configuration. That is, I have installed 4 memory DIMMs for balanced, optimal performance and switched to a single-NUMA configuration where all cores are local to the NIC in use. As expected, this didn't solve the issue.
  2. When running the last config you provided, the packet capture works straight away. But I realized it was missing the enable 0 range command. Once this is set, the packet capture fails as before.
  3. I also believe that the number of flows it generates is related to the capturing issue, as the number of flows I observe matches the number of packets captured before hitting the zero-length packet.

Will keep working on it...

@KeithWiles
Collaborator

KeithWiles commented Jan 8, 2025

Some NICs detect suspicious frames and drop them before sending. This could be related to the problem you are seeing here. Check that the NIC is not dropping frames it finds suspicious; I know the Intel NICs will do this, and they have some configuration options to turn it off.

@davideandres95

The hypothesis that the NIC is dropping frames is gaining traction.
I have enabled SR-IOV and configured two VFs: one to send with Pktgen and another to capture with tcpdump. The switch in the loop rewrites the destination MAC address so that the correct VF receives the packets. When I generate traffic in the normal (non-range) packet mode, tcpdump captures all the packets. However, with enable 0 range, tcpdump does not capture any packets. Yet the packets are still sent, as shown by the switch port statistics.

@KeithWiles
Collaborator

Great, we are making progress. Intel calls these packets malicious, and it appears this NIC has a similar feature. You will have to see if that feature can be turned off, or adjust your range options so the NIC will send the packets. I have seen a source MAC address not belonging to the NIC be detected as malicious and dropped.

@davideandres95

Hi @KeithWiles,
I am looking again at the config you last used to test Pktgen by sending 512 packets, and I noticed that in the port flags section there is disable 0 range. As far as I understand, this would make Pktgen send identical packets, but you received many flows.

On my side, I have moved back to using the Lua script you provided at the beginning of the thread, since the pkt script causes my NIC not to receive any packets. I might be able to disable this behavior in the future, though.

Can you clarify if you agree with the following? Pktgen iterates over the range parameters in fixed batches until the requested number of packets has been transmitted. For some reason, my NIC is dropping/rejecting some of the generated mbufs, so they are absent from the transmission, and additional packets from the other flows are sent until 512 packets are successfully transmitted.

@KeithWiles
Collaborator

Sorry, I did not enable range mode on port zero before saving the configuration; it just means you have to enable range mode on any ports you want before sending traffic. If you want it in the config, change the disable to enable and restart pktgen.

As for the last question, I agree with your reasoning.

@davideandres95

Hi @KeithWiles,

I have just noticed that there is another issue (#288) open at the moment related to the mlx5 driver. Maybe it makes sense to create a specific issue to verify the behavior on this NIC. If you agree, I am happy to open it and document all my observations.

As for my progress, I have written a simple DPDK app to send traffic while cycling through 100 TCP port numbers, and it seems to work well. All 100 configured flows are sent correctly. So I am starting to believe that my current NIC configuration is not too unusual.

However, my skills are limited, and I haven't succeeded in configuring my IDE's debugger (CLion) to work with Pktgen because of the VT100 app. Is there a way to run Pktgen without the VT100 CLI so that I can debug it?

I have uploaded my 200-line DPDK app that generates flows. Would you mind taking a look to check whether I am doing something that Pktgen is currently not doing? Sorry, I have tried inspecting Pktgen myself, but the abstractions make it pretty tough for my skill level.

I would like to find the root cause of the behavior we have seen and use Pktgen in the end to inject traffic for my research.

@KeithWiles
Collaborator

Please do create a new issue if you want.

The simple application you created is incrementing the port number, but normally the destination MAC will cause the NIC to drop the TX packet if it does not match the NIC's MAC address. If you configure Pktgen to only increment the TCP port, does that work for you?

When using a debugger, you can issue the 'off' command at the Pktgen CLI to turn off the VT100 screen and 'on' to enable it again. The only other way is to use the '-G' option, but off/on should work. You can add 'off' to the command file you load if you want the screen disabled at startup. Another option I use with GDB is to attach to the running program: in one xterm I run pktgen and in another xterm I run gdb, attaching it to the pktgen process.
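
If the run is driven from a Lua script, the same toggle should be available as a script call (a sketch, assuming the screen() binding is present in your build):

pktgen.screen("off");   -- assumed Lua equivalent of the CLI "off" command; pktgen.screen("on") restores the display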

Looking at the simple application's main.c at line 83, I would not free the non-transmitted packets; instead I would create a loop around the rte_eth_tx_burst() call and attempt to send all of the packets by retransmitting the ones that were not sent in the previous call. This can cause a lockup if for some reason the TX queue remains full forever, but the change makes sure all of the flows are sent and you do not have missing ones when the TX ring becomes full.

        uint16_t to_send = BURST_SIZE;
        uint16_t sent;

        do {
            /* Try to send what is left; rte_eth_tx_burst() may send fewer than requested. */
            sent = rte_eth_tx_burst(pid, qid, pkts, to_send);
            to_send -= sent;
            pkts += sent;   /* advance past the packets that were actually sent */
        } while (to_send > 0);

Please try only incrementing the TCP ports with Pktgen and see if that works, IMO it should work.
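
For example, a configuration along these lines, reusing only the range calls already shown earlier in this thread, would hold the IPs static and cycle just the destination TCP port (the port numbers here are arbitrary):

-- Sketch: fixed src/dst IP, destination TCP port cycling over 100 values.
pktgen.range.dst_ip("0", "start", "10.12.0.1");
pktgen.range.dst_ip("0", "min", "10.12.0.1");
pktgen.range.dst_ip("0", "max", "10.12.0.1");
pktgen.range.dst_ip("0", "inc", "0.0.0.0");    -- no IP increment

pktgen.range.dst_port("0", "start", 1);
pktgen.range.dst_port("0", "min", 1);
pktgen.range.dst_port("0", "max", 100);
pktgen.range.dst_port("0", "inc", 1);          -- cycle through 100 destination ports

pktgen.set_range("0", "on");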

Thanks

@davideandres95

Hi @KeithWiles, thanks to your tip I am now able to attach GDB to Pktgen. I hope this helps me further debug #301.
