rec: dedup records #14617
Conversation
Pull Request Test Coverage Report for Build 12705938523
Warning: This coverage report may be inaccurate. This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls
The logic looks good to me. I guess we could try a few more heuristics before actually serializing the content, like checking if we have several records with the same (qtype, qname), but it might not be worth it. Having real-world numbers would indeed be useful.
Force-pushed 863fd47 to 8089aeb
Rebased to fix conflict.
Some observations: software tested in default config.
Force-pushed 1de6a90 to 5a98b0f
Speedtest results:
The measured slowdown is about 2.5x and is uniform over the various test case sizes. So the dedupping takes time, as expected, but for the already pretty extreme case of 256 records, the absolute time is small compared to the expected network latency. For the 4096-record case, the time spent comes closer to the expected network latency.
Can we rule out that deduping is an attack vector?
Not completely; in the extreme case, spending even a few CPU milliseconds on a single auth result is quite a lot.
Force-pushed 080d63b to 97795db
Are you considering an on/off switch for it?
Yes, that would be one of the options. An alternative would be to not do the dedupping on large answers, as we already refuse to cache them anyway.
I played a bit with a pre-scan on qtype and name only, but saw no speedup.
Force-pushed 274b82b to e26c334
…ows for an unordered_set as well.
Force-pushed e26c334 to a5962f6
The general logic and the places where it is applied look good to me.
@@ -1527,6 +1512,9 @@ void startDoResolve(void* arg) // NOLINT(readability-function-cognitive-complexity)
}

if (!ret.empty()) {
#ifdef notyet
This might warrant a comment, at the very least :)
pdns/speedtest.cc (Outdated)
std::vector<DNSRecord> vec;
vec.reserve(d_howmany);
std::string name("some.name.in.some.domain");
auto count = d_howmany;
if (d_withdup) {
  count--;
}
for (size_t i = 0; i < count; i++) {
  auto content = DNSRecordContent::make(QType::TXT, QClass::IN, "\"a text " + std::to_string(i) + "\"");
  DNSRecord rec(name, content, QType::TXT);
  if (i == 0 && d_withdup) {
    vec.emplace_back(rec);
  }
  vec.emplace_back(std::move(rec));
}
Would it make sense to do this in the constructor instead, to get the initial steps out of the timed computation?
I initially did not do that as operator()() must be const, and dedup modifies the vector. A middle ground would be to copy the vec constructed in the ct and operate on that one.
Oh, right! The middle ground looks good indeed.
The new version looks good; note that I still have a few comments that have not been addressed, as far as I can tell.
Hmm, I could have sworn I did the other ones as well, will fix.
Force-pushed 9046cd0 to bcf9b97
Force-pushed bcf9b97 to 0941cf9
Short description
This deduplicates records in two places:
When testing this, I encountered a few cases where auths sent duplicate records.
Nothing actually breaks if we don't dedup, afaik. So it is questionable if we want this part.
But on the client side, this fixes #14120 and maybe other cases I do not know.
The big question is if we want this. Dedupping is fundamentally not cheap, although I tried to optimize the dedup code.
Originally I played with the idea to change the data structure building the reply vector to avoid duplicates, but that requires changes in many places. Still the idea is not completely off the table.
On the receiving side the dedup internals could also be weaved into the sanitize code, at the cost of increasing complexity. That would avoid the separate dedup() call.
This will remain a draft until I have done some speed measurements and pondered the alternative approaches some more. This PR is mainly to share thoughts.
The test changes are needed as a few of them use duplicate records.
Update: I removed the general dedup calls and I'm now only using dedup where it makes sense to solve a specific case of potential duplicate records. The general-purpose (quite performant) dedup code remains and is used, in one case replacing existing ad-hoc dedup code.
Checklist
I have: