Confusion surrounding range commands #295
Hi, I am trying to use the range commands to control the number of unique flows for my workload.
This is the Lua script I have right now:
My confusion is that when I execute this workload, the source port and destination port do not remain static. The source IP remains static, but the destination IP address also seems to exceed the range.
Have I configured the range commands incorrectly? Any help would be appreciated.
Comments
The script is mixing single-mode calls and range calls. |
Please update Pktgen to the latest version, 24.10.3, and you may have to update DPDK to the current version as well. DPDK has changed defines and APIs, which makes it difficult to maintain backward compatibility, so I only support the latest Pktgen and DPDK. Give that a try and see if it works, as the above screen does not look correct for your configuration. When I have more time today, I will read the rest of the post above. |
When using the range command, it does cycle through the packets until you stop sending traffic, as long as "send forever" is set as the TX packet count. Make sure you did not set the TX count to some value. The only way to get something different than what range mode supports is to use a PCAP file with the traffic you need.
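As a rough illustration, a minimal Lua sketch of that setup follows. The helper names are assumptions based on the sample scripts shipped with recent Pktgen releases; check pktgen.help() or the scripts/ directory for your version, since the Lua API has changed over time:

```lua
-- Assumed Pktgen Lua helper names; verify against your version's sample scripts.
pktgen.set("0", "count", 0);                      -- 0 = send forever, so the range keeps cycling
pktgen.range.dst_ip("0", "start", "10.12.0.1");   -- the range discussed in this thread
pktgen.range.dst_ip("0", "min",   "10.12.0.1");
pktgen.range.dst_ip("0", "max",   "10.12.0.255");
pktgen.range.dst_ip("0", "inc",   "0.0.0.1");
pktgen.set_range("0", "on");                      -- enable range mode on port 0
pktgen.start("0");                                -- later: pktgen.stop("0")
```
|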
That doesn't seem to be the case when I run the script you sent. It immediately exits. |
In the script I did not start pktgen sending packets, unless you added that code back. After loading the script, you can issue the start command to begin sending. |
Hi,

```lua
local dstip     = "10.12.0.1";
local min_dstip = "10.12.0.1";
local max_dstip = "10.12.0.255";
local inc_dstip = "0.0.0.1";
```

only results in 222 different flows (expected 256), no matter how long I let it run. To count the number of flows, I am using external HW. My only suspicion is that the burst size has something to do with it, but I am unsure how to fix this. |
If you increase the number of flows by a few, do you get a different count? |
When running again configured for 256 flows, I get 224 this time; 32 are missing. The list is as follows:
Increasing the number of flows using

```lua
dstip     = "10.12.0.1";
min_dstip = "10.12.0.1";
max_dstip = "10.12.1.4";
inc_dstip = "0.0.0.1";
```

results in 216 flows. Please let me know if you also want the list. |
I copied the above list into Vim and removed the missing lines, which gives me 225 lines in the file. By my calculation the range covers 260 flows total, so it is missing 36 flows: 0.35, 0.68-0.96, 0.129, and 1.0-1.4. Check my math here. You can try setting the burst size to 1 and see if that gives a different answer or different missing flows.
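In Lua that experiment is a one-liner, assuming the pktgen.set helper used in the shipped sample scripts:

```lua
pktgen.set("0", "burst", 1);   -- send one packet per TX burst while testing
```
|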
Sorry for the confusion, but the list is for the previous config:

```lua
local dstip     = "10.12.0.1";
local min_dstip = "10.12.0.1";
local max_dstip = "10.12.0.255";
local inc_dstip = "0.0.0.1";
```

So only 31 are missing (0.35, 0.68-0.96, 0.129). Some additional information:
|
With that few flows being generated, can you use something like Wireshark to capture, say, 1000 packets and see if we can count the flows within this small number of packets? I know it is some work, but if the switch is not counting flows with fewer than 3 packets, I worry pktgen is sending the packets and the count is throwing us off. The code is straightforward, as it just increments the IP address.

The only other place packets can be dropped is on transmit, when the TX queue in the NIC hardware becomes full. If I remember correctly, I attempt to resend the part of the burst not sent the first time and continue attempting until all packets in the burst are sent. Maybe I have something wrong when the TX burst send returns with less than a full burst sent, which would cause some flows to go missing. The problem with that theory is that if you send for a long time, it should eventually resend those flows.

The way pktgen (and DPDK) works is that packets are consumed from the pre-built pktmbuf mempool, but packets can also be placed in the per-lcore cache and resent over and over without ever getting to the other flows. The standard TX mempool has the cache turned on, and this could be causing the problem. Sorry, I had to write this quickly before a meeting; I hope you get the idea. |
Thank you for your support and explanations, @KeithWiles.
I had sent 1000 packets. However, I don't believe that the switch is misleading us: I have verified that it reliably counts flows when sending 1000 flows sequentially using Scapy. I have also improved its monitoring to register any flow, even one with only one or two packets. The results are the same as before. I am going to start looking at the source code, but I have little experience with DPDK. I will share any findings. |
Looking at the capture code, it should not encounter a zero-length packet. If you want, please change lines 180-184 of the file pktgen/app/pktgen-capture.c to the following:

```c
for (uint32_t i = 0; i < cap->nb_pkts; i++) {
    /* Skip zero-length packets instead of aborting the dump. */
    if (hdr->pkt_len == 0) {
        printf("\n>>> Hit packet length zero at %'u of %'u skipping\n", i, cap->nb_pkts);
        continue;
    }
    /* ... rest of the per-packet dump loop unchanged ... */
```

This allows the dump to continue, skipping the zero-length packets. I have not tried this change. |
The suggested patch does not solve the capturing problem:
The stats show:
If it helps somehow, the NIC is a Mellanox ConnectX-5. |
I do not have a ConnectX-5 to test this problem. I was able to set up a test sending range packets to Wireshark. I sent 1000 packets and only saw 965 packets in Wireshark :-( Wireshark is not able to keep up with the transmit rate and is dropping packets. I changed the TX count to 512 and it was able to capture all 512 packets. I then converted the capture to a text file and removed duplicate IP addresses. The total number of unique addresses in my test is 254, and it sent 254 different flows.
I did disable incrementing the src/dst port IDs, as Wireshark was trying to interpret the packets based on the port ID.
|
I am relieved to see that it works on other systems. Thank you for testing it. I will try the same configuration you used and report back my observations. Do you think that the NIC config or hugepages setup could be causing this misbehavior? Sorry, my systems knowledge is limited. To start Pktgen I use the following command:
And I have configured 1G-size hugepages on /mnt/huge. I am trying to start the testpmd application to help me debug with the info from the
Is there any debug mode or traces that I could enable to help me debug the Pktgen behavior? Ultimately, I would like to use Pktgen as a traffic generator for my research on efficient controllers for stateful NFs running on P4-programmable HW. Therefore, I want to generate high-speed traffic with many parallel flows. |
It looks like the pktgen configuration is correct. The only question I have is that your NIC is on PCI address 81:00.0. Normally, a PCI address of 80:00.0 or above means the device is attached to PCI bus 1, while addresses 7f:00.0 and below are on PCI bus 0. In a dual-socket (CPU = socket) system this normally means that PCI bus 1 is attached to socket 1 (numbered 0, 1). On my machine I use af:00.0 and af:00.1, which are attached to PCI bus 1, and bus 1 is attached to socket 1. This means all of the cores I use for that port must come from socket 1. You can install a topology tool to see how the PCI buses map to the sockets. If you use cores from socket 0, you will have a bottleneck across the QPI bus between the sockets; if you only have a single CPU, then it does not matter. As for test-pmd, I use Pktgen, so I am not an expert on it, but there are options you can add. |
Hi Keith,
Regarding the last configuration you provided (I adapted the MAC addresses), I have tested it without success. It seems that the rate has an effect on the number of flows: the best results are achieved with a rate of 0.0001%, while a 100% rate results in a single flow. Also, I have not been able to debug the problem using testpmd; the statistics are empty:
In light of the differences when using different rates, could it be that this NIC requires a specific configuration to behave adequately? |
The memory configuration can cause performance issues, but I think this is not the problem. I have never used an MLX NIC and I do not know how to diagnose the problem further. If you can try a non-MLX device, like the XL710 i40e device I use here, then maybe I could help, but at this time I cannot. Sorry. |
Some additional information:
Will keep working on it... |
Some NICs detect suspicious frames and drop them before sending. This could be related to the problem you are seeing here. Check to make sure the NIC is not dropping frames it finds suspicious; I know the Intel NICs will do this, and they have some configuration options to help turn it off. |
The hypothesis that the NIC is dropping frames is gaining traction. |
Great, we are making progress. Intel calls these packets malicious, and it appears this NIC has a similar feature. You will have to see whether that feature can be turned off, or you will have to adjust your range options to make sure the NIC sends the packets. I have seen that a source MAC address not belonging to the NIC will be detected as malicious and dropped. |
Hi @KeithWiles, on my side I have moved back to using the Lua script you provided at the beginning of the thread, since the pkt script causes my NIC not to receive any packets. I might be able to disable this behavior in the future, though. Can you clarify whether you agree with me on the following? Pktgen iterates over the range parameters in fixed batches until the requested number of packets is successfully transmitted. For some reason, my NIC is dropping/rejecting some of the generated mbufs, causing them to be absent from the transmission; therefore, additional packets from the other flows are sent until 512 packets are successfully transmitted. |
Sorry, I did not enable range mode on port zero before saving the configuration; it just means you have to enable range mode on any ports you want before sending traffic. If you want it in the config, change the disable to enable and restart pktgen. As for the last question, I agree with your reasoning. |
Hi @KeithWiles, I have just noticed that there is another issue (#288) open at the moment related to the mlx5 driver. Maybe it makes sense to create a specific issue to verify the behavior on this NIC; if you agree, I am happy to open it and document all my observations. As for my progress, I have written a simple DPDK app that sends traffic while cycling through 100 TCP port numbers, and it seems to work well: all 100 configured flows are sent correctly. So I am starting to believe that my current NIC configuration is not too weird. However, my skills are limited, and I haven't succeeded in configuring my IDE's debugger (CLion) to work with Pktgen because of the VT100 app. Is there a way to run Pktgen without the CLI app that would allow me to debug it? I have uploaded my 200-line DPDK app that generates flows. Would you mind taking a look to check whether I am doing something that Pktgen is currently not doing? Sorry, I have tried inspecting Pktgen myself, but the abstractions make it pretty tough for my skill level. I would like to find the root cause of the behavior we have seen and, in the end, use Pktgen to inject traffic for my research. |
Please do create a new issue if you want. The simple application you created is incrementing the port number, but normally the destination MAC will cause the NIC to drop the TX packet if it does not match the NIC's MAC address. If you configure Pktgen to only increment the TCP port, does that work for you?

When using a debugger, you can issue the 'off' command at the Pktgen CLI to turn off the VT100 screens; use 'on' to enable them again. The only other way is the '-G' option, but off/on should work. You can add 'off' to the command file you load if you want the screen disabled at startup. Another option I use with GDB is to attach to a running program: in one xterm I run pktgen, and in another xterm I run gdb and attach it to the pktgen process.

Looking at the simple application's main.c at line 83, I would not free the non-transmitted packets. Instead, I would create a loop around the rte_eth_tx_burst() call and attempt to send all of the packets by retransmitting the ones that were not sent in the previous call. This can cause a lockup if for some reason the TX ring remains full forever, but it makes sure all of the flows are sent and none go missing when the TX ring becomes full:

```c
uint16_t to_send = BURST_SIZE;

do {
    /* Try to enqueue the remaining packets of the burst. */
    sent = rte_eth_tx_burst(pid, qid, pkts, to_send);

    /* Skip past the packets the NIC accepted and retry the rest. */
    to_send -= sent;
    pkts    += sent;
} while (to_send > 0);
```

Please try only incrementing the TCP ports with Pktgen and see if that works; IMO it should work. Thanks
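A rough Lua sketch of that port-only range setup; the pktgen.range.* helper names are assumptions based on recent Pktgen sample scripts, so verify them with pktgen.help() for your version:

```lua
-- Assumed helper names; zero increments pin the other fields in place.
pktgen.range.dst_ip("0", "start", "10.12.0.1");
pktgen.range.dst_ip("0", "inc",   "0.0.0.0");   -- dst IP stays fixed
pktgen.range.dst_port("0", "start", 1024);
pktgen.range.dst_port("0", "min",   1024);
pktgen.range.dst_port("0", "max",   1123);      -- 100 ports -> 100 flows
pktgen.range.dst_port("0", "inc",   1);
pktgen.set_range("0", "on");
```
|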
Hi @KeithWiles, thanks to your tip I am now able to attach with GDB to Pktgen. I hope this helps me to further debug #301 |