fix: move aside TSM file on errBlockRead #25839
base: master-1.x
@davidby-influx This is probably an uncommon occurrence, but what if the first One solution would be to have Another solution would be to change
The second solution is conceptually cleaner, but we added the list limit because this error list was crashing the system with an OOM when a bad file was queried again and again. Searching for existing errors that match the new error probably calls for a different data structure for storing errors, to reduce search costs.
The error list was originally limited to avoid running out of memory, because a bad block caused thousands of copies of the same error. I suggest we store each unique error (as determined by its We will still limit map size, but this will cause far fewer overflows of the error limit, because the symptom we have seen is many, many multiples of the same error. This still runs the risk of discarding an
Alternatively, we could remove the limit on error storage, on the theory that the number of unique errors is unlikely to cause an out-of-memory situation. Then we would have a small risk of an OOM, but reduce the risk of missing an
I'd still be tempted to key off the type of error instead of the value. I think it would also be good to keep the errors in order, if possible. The ordering of the errors might yield clues to what went wrong.
Types are problematic. A lot of errors come up as type The point on ordering is important, though.
@gwossum - take a look at my latest revision. We now store unique errors in order of appearance up to a limit. Should reduce storage in the case we first saw, and retains the ordering. |
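The idea of storing unique errors in order of appearance up to a limit could be sketched roughly as follows. This is not the code from the revision: the `uniqueErrors` type, its methods, the sentinel errors, and the choice to deduplicate on the error message are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors used only by this example.
var (
	errBadBlock  = errors.New("bad block")
	errShortRead = errors.New("short read")
	errTimeout   = errors.New("timeout")
)

// uniqueErrors is a hypothetical container: a slice keeps first-seen order,
// and a set makes the duplicate check cheap.
type uniqueErrors struct {
	limit int
	seen  map[string]struct{}
	errs  []error
}

func newUniqueErrors(limit int) *uniqueErrors {
	return &uniqueErrors{limit: limit, seen: make(map[string]struct{})}
}

// Add stores err if its message is new and the limit has not been reached.
// It reports whether err was stored. Keying on the message is an assumption.
func (u *uniqueErrors) Add(err error) bool {
	if err == nil {
		return false
	}
	key := err.Error()
	if _, dup := u.seen[key]; dup {
		return false // repeated error: costs no extra memory
	}
	if len(u.errs) >= u.limit {
		return false // limit reached: a new distinct error is dropped
	}
	u.seen[key] = struct{}{}
	u.errs = append(u.errs, err)
	return true
}

// Errors returns a copy of the stored errors in order of first appearance.
func (u *uniqueErrors) Errors() []error {
	return append([]error(nil), u.errs...)
}

func main() {
	u := newUniqueErrors(2)
	for _, err := range []error{errBadBlock, errBadBlock, errShortRead, errBadBlock, errTimeout} {
		u.Add(err)
	}
	for i, err := range u.Errors() {
		fmt.Println(i, err)
	}
}
```

In this sketch, thousands of repeats of one error occupy a single slot, while a distinct error that arrives after the limit is reached is still discarded.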
The error type check for errBlockRead was incorrect,
and bad TSM files were not being moved aside when
that error was encountered. Use errors.Join,
errors.Is, and errors.As to correctly unwrap multiple
errors.
Closes #25838
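As a rough illustration of the approach the commit message describes, the sketch below shows why a direct type check can miss an error inside a joined error, while `errors.As` and `errors.Is` find it. The `errBlockRead` struct here is a hypothetical stand-in (the real definition in the codebase may differ), and `errChecksum`, `badFile`, `exampleErr`, and the file name are invented for the example. It requires Go 1.20 or later for `errors.Join`.

```go
package main

import (
	"errors"
	"fmt"
)

// errBlockRead is a hypothetical stand-in for the error named in the PR.
type errBlockRead struct {
	file string
	err  error
}

func (e *errBlockRead) Error() string {
	return fmt.Sprintf("block read error on %s: %v", e.file, e.err)
}

// Unwrap exposes the underlying cause to errors.Is and errors.As.
func (e *errBlockRead) Unwrap() error { return e.err }

// errChecksum is a made-up sentinel cause.
var errChecksum = errors.New("checksum mismatch")

// badFile returns the file of the first errBlockRead found anywhere in
// err's tree, including inside errors combined with errors.Join.
func badFile(err error) (string, bool) {
	var br *errBlockRead
	if errors.As(err, &br) {
		return br.file, true
	}
	return "", false
}

// exampleErr joins an unrelated error with a block read error.
func exampleErr() error {
	return errors.Join(
		errors.New("query cancelled"),
		&errBlockRead{file: "000001.tsm", err: errChecksum},
	)
}

func main() {
	err := exampleErr()

	// A direct type assertion only inspects the outermost error value.
	_, direct := err.(*errBlockRead)
	fmt.Println("type assertion matched:", direct)

	// errors.As walks the whole tree and finds the wrapped value.
	file, ok := badFile(err)
	fmt.Println("errors.As matched:", ok, file)

	// errors.Is finds the sentinel cause nested inside the block read error.
	fmt.Println("errors.Is matched:", errors.Is(err, errChecksum))
}
```

Once the offending file is identified this way, the caller can decide to move it aside; that step is left out here because it depends on code not shown in this conversation.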