[v2][adjuster] Enhance Span Hash Adjuster For Spans That Have Already Been Hashed #6393
Comments
@suryaaprakassh What guidance do you need?
@yurishkuro Has anyone started this? If not, I’d like to try.
@yurishkuro Regarding the issue, I understand your first point: we want to store the hash on the span as a new field so that we only compute hashes for spans that do not have one yet. I have doubts about your second point. Can you provide a sample test case showing what would go wrong if the final code did not include it? I think that if a new attribute is added to the hash in the future, it is not a big issue because we just have to change the input to the hash function. That is, when adding a new field to the span, we change the hashing code at the same time, and the issue is resolved.
What if we forget to do that? The intent of item (2) is to ensure that some tests will fail if we forget to add a new field to the hash function. A solution that does not provide this guarantee is unsafe and can lead to future bugs.
@yurishkuro
Instead of adding unit tests, I have a simpler solution that could prevent this issue. Keep an array that stores which fields are used for hashing and in what order. Let's say a span has a set of fields, and out of these [traceID, spanID, operationName] are the ones used for hashing. We can define an array that lists the fields to be used, as an Index/Value table. If we want to add a new field, we just add it to the array, and the hashing function will pick it up automatically.
@yurishkuro I understood the concern. The problem is that we cannot use reflection for every span in a deployment, as it consumes resources. So the solution I can think of is a combination of my proposed solution and yours: keep the Index/Value ordering array, and during testing check whether the hash of the span that comes out of the adjuster is the same or not. This reduces the headache of adding custom tests: the user only has to change the ordering array, and everything else is done automatically. Although this might not be efficient during the testing phase, I don't think that matters.
It's OK to use reflection in the tests. The "array" is irrelevant since it can be constructed via reflection from the actual types.
Hey, I spent some time understanding this issue and found it interesting.
Is this issue still being worked on, or can I work on it? @chahatsagarmain @yurishkuro
Originally posted by @yurishkuro in #6391 (comment)
Some storage backends (Cassandra, in particular) perform similar deduping by computing a hash before the span is saved and using it as part of the partition key (it creates tombstones if an identical span is saved two or more times, but no duplicates on read). So we could make this hashing process part of the ingestion pipeline (e.g. in sanitizers) and simply store the hash as an attribute on the span. Then this adjuster would be "lazy": it would only recompute the hash if it doesn't already exist in the storage.
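A minimal sketch of the "lazy" behaviour described above, assuming the hash is stored as a string attribute. The attribute key `span.hash`, the `Span` type, and the helpers `computeHash`, `ensureHash`, and `dedupe` are all hypothetical names chosen for illustration; the thread does not specify them, and a real hash would cover more fields than the three used here.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strconv"
)

// hashKey is a hypothetical attribute name (an assumption for this sketch).
const hashKey = "span.hash"

// Span is a simplified stand-in for the real span model.
type Span struct {
	TraceID, SpanID, OperationName string
	Attributes                     map[string]string
}

// computeHash hashes a few identifying fields. The stored hash attribute
// itself is deliberately not part of the input.
func computeHash(s *Span) string {
	h := fnv.New64a()
	for _, f := range []string{s.TraceID, s.SpanID, s.OperationName} {
		h.Write([]byte(f))
		h.Write([]byte{0}) // separator between fields
	}
	return strconv.FormatUint(h.Sum64(), 16)
}

// ensureHash returns the span's hash and reports whether it had to be
// computed. A hash already present (e.g. written on the ingestion path)
// is reused as-is.
func ensureHash(s *Span) (hash string, computed bool) {
	if v, ok := s.Attributes[hashKey]; ok {
		return v, false
	}
	if s.Attributes == nil {
		s.Attributes = map[string]string{}
	}
	v := computeHash(s)
	s.Attributes[hashKey] = v
	return v, true
}

// dedupe keeps the first span seen for each hash value.
func dedupe(spans []*Span) []*Span {
	seen := map[string]bool{}
	var out []*Span
	for _, s := range spans {
		h, _ := ensureHash(s)
		if seen[h] {
			continue
		}
		seen[h] = true
		out = append(out, s)
	}
	return out
}

func main() {
	a := &Span{TraceID: "t1", SpanID: "s1", OperationName: "GET /"}
	b := &Span{TraceID: "t1", SpanID: "s1", OperationName: "GET /"}
	_, computed := ensureHash(a) // computed on first call
	fmt.Println(computed, len(dedupe([]*Span{a, b})))
}
```

With this shape, spans that were hashed at write time skip the hashing cost on the read path, and only spans written without a hash pay for it.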
If we do this on the write path, we would want it to be as efficient as possible, so we would need to implement manual hashing by iterating through the attributes (pre-sorting them to avoid order dependencies) and by manually going through all fields of the Span / SpanEvent / SpanLink. The reason I was reluctant to do that in the past was to avoid unintended bugs if the data model changed, such as a new field being added that we'd forget to add to the hash function. To protect against that, we could probably use some fuzzing tests that set / unset each field individually and make sure the hash code changes as a result.
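The manual-hashing idea with pre-sorted attributes could look roughly like the sketch below. The `Span` type, its fields, and the `writeString` / `hashSpan` helpers are hypothetical and much smaller than the real Span / SpanEvent / SpanLink model; FNV-1a is used only because it is in the standard library, not because the thread prescribes it.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash"
	"hash/fnv"
	"sort"
)

// Span is a hypothetical, simplified span (an assumption for this sketch).
type Span struct {
	TraceID, SpanID, Name string
	StartNanos            uint64
	Attributes            map[string]string
}

// writeString writes a length-prefixed string, so that ("ab", "c") and
// ("a", "bc") feed different bytes into the hash.
func writeString(h hash.Hash64, s string) {
	var n [8]byte
	binary.LittleEndian.PutUint64(n[:], uint64(len(s)))
	h.Write(n[:])
	h.Write([]byte(s))
}

// hashSpan visits every field by hand, without reflection.
func hashSpan(s *Span) uint64 {
	h := fnv.New64a()
	writeString(h, s.TraceID)
	writeString(h, s.SpanID)
	writeString(h, s.Name)

	var b [8]byte
	binary.LittleEndian.PutUint64(b[:], s.StartNanos)
	h.Write(b[:])

	// Sort attribute keys so the result does not depend on iteration order.
	keys := make([]string, 0, len(s.Attributes))
	for k := range s.Attributes {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		writeString(h, k)
		writeString(h, s.Attributes[k])
	}
	return h.Sum64()
}

func main() {
	a := &Span{TraceID: "t1", SpanID: "s1", Name: "op", Attributes: map[string]string{}}
	a.Attributes["http.method"] = "GET"
	a.Attributes["peer"] = "db"

	b := &Span{TraceID: "t1", SpanID: "s1", Name: "op", Attributes: map[string]string{}}
	b.Attributes["peer"] = "db" // same attributes, inserted in a different order
	b.Attributes["http.method"] = "GET"

	fmt.Println(hashSpan(a) == hashSpan(b)) // true
}
```

Because every field is listed by hand, this kind of function is exactly what a set / unset-each-field test is meant to guard: a field added to the struct but not to `hashSpan` would otherwise go unnoticed.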