Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENA/XDP metadata support #265

Open
mejedi opened this issue Mar 10, 2023 · 7 comments
Open

ENA/XDP metadata support #265

mejedi opened this issue Mar 10, 2023 · 7 comments

Comments

@mejedi
Copy link

mejedi commented Mar 10, 2023

xdp_buff has a metadata area, which an XDP eBPF program can use to stash some info (using bpf_xdp_adjust_meta and direct packet access to write data). This information is transferred to sk_buff, and could be accessed by a TC eBPF program. This mechanism enables communication between XDP and TC programs.

Unfortunately, ENA driver doesn't support attaching and transferring metadata.

bpf_xdp_adjust_meta fails with ENOTSUPP when invoked in ENA XDP path.

EMnify would love to see this feature implemented.

@davidarinzon
Copy link
Contributor

Hi @mejedi

Thank you for raising this request, we will look into it.
Would be happy to hear more about your use-cases / challenges and how you're planning to use this feature going forward (on top of what's written).
You're also welcome to contact me at darinzon@amazon.com.

@mejedi
Copy link
Author

mejedi commented Mar 13, 2023

Hi @davidarinzon,

As of today, we had a single use of metadata. Our routing logic is implemented in XDP. We rely on the kernel's ARP handling/neighbour cache. When the destination MAC address is in the cache, we are able to fully handle a packet in XDP (bpf_fib_lookup to obtain MAC addresses).

However, when the destination MAC is not yet known, we have to trigger an ARP probe to learn it eventually. It is not possible from XDP, therefore we XDP_PASS to the host stack. We were passing the egress interface index, selected according to the app logic, via metadata (there are multiple interfaces in use). TC program extracted the egress interface index, and did bpf_redirect_neigh.

It worked beautifully in testing, but failed with the real ENA. Quite unexpectedly, metadata support turned out to be optional :(

A workaround would revolve around passing the data inline; the tricky bit is picking an encoding that doesn't legitimately occur on the wire. One option would be picking a distinct ethertype, but it doesn't quite work. Another option is using a funny destination MAC address. Overall, it looks needlessly complicated and inelegant though.

man bpf-helpers provides further insights on the use-cases:

The use of xdp_md->data_meta is optional and programs are not required to use it. The rationale is that when the packet is processed with XDP (e.g. as DoS filter), it is possible to push further meta data along with it before passing to the stack, and to give the guarantee that an ingress eBPF program attached as a TC classifier on the same device can pick this up for further post-processing. Since TC works with socket buffers, it remains possible to set from XDP the mark or priority pointers, or other pointers for the socket buffer. Having this scratch space generic and programmable allows for more flexibility as the user is free to store whatever meta data they need.

@davidarinzon
Copy link
Contributor

Hi @mejedi
Thank you for sharing your use-case in detail.
We will look into this support and understand the gaps.

@zeffron
Copy link

zeffron commented Sep 13, 2023

We had a patch locally to enable metadata support. It was as easy setting the second parameter in this line to true. The ENA driver already has sufficient space overhead for XDP metadata.

@mejedi
Copy link
Author

mejedi commented Sep 13, 2023

@zeffron

We had a patch locally to enable metadata support. It was as easy setting the second parameter in this line to true. The ENA driver already has sufficient space overhead for XDP metadata.

Are you sure? It looks like metadata start should be copied to skb to take advantage of it on TC side, but apparently it is not.

@zeffron
Copy link

zeffron commented Sep 18, 2023

Ah, we only use XDP, and we made our patch a while ago, so you're probably correct that more needs to be done to copy the metadata to the SKB.

@derlaft
Copy link

derlaft commented Jul 12, 2024

We had a patch locally to enable metadata support. It was as easy setting the second parameter in this line to true. The ENA driver already has sufficient space overhead for XDP metadata.

The link is now pointing to the wrong place. Here's probably the line that was linked back then:

rx_ring->ena_bufs[0].len, false);

It looks like metadata start should be copied to skb to take advantage of it on TC side, but apparently it is not.

Nevertheless, even without copying to tc being able to use ctx->data_meta for communicating between tail-called xdp programs is already a big improvement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants