-
Notifications
You must be signed in to change notification settings - Fork 981
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DRILL-7223: Create an option to control timeout for REFRESH METADATA #1776
base: master
Are you sure you want to change the base?
Conversation
@@ -354,6 +354,10 @@ private ExecConstants() { | |||
"enables statistics usage for varchar and decimal data types. Default is unset, i.e. empty string. " + | |||
"Allowed values: 'true', 'false', '' (empty string)."), "true", "false", ""); | |||
|
|||
public static final String PARQUET_REFRESH_TIMEOUT = "store.parquet.refresh_timeout_per_runnable_in_msec"; | |||
public static final LongValidator PARQUET_REFRESH_TIMEOUT_VALIDATOR = new LongValidator(PARQUET_REFRESH_TIMEOUT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this PositiveLongValidator
since timeout of 0 does not make sense. Maximum value can be Integer.MAX_VALUE
@@ -354,6 +354,10 @@ private ExecConstants() { | |||
"enables statistics usage for varchar and decimal data types. Default is unset, i.e. empty string. " + | |||
"Allowed values: 'true', 'false', '' (empty string)."), "true", "false", ""); | |||
|
|||
public static final String PARQUET_REFRESH_TIMEOUT = "store.parquet.refresh_timeout_per_runnable_in_msec"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should avoid the word 'refresh' here and other places for the timeout since this parameter is intended for any timed runnable task, not just the ones initiated by the REFRESH command. For instance, in normal query planning without using metadata cache, the FooterGatherer also creates multiple TimedCallable
threads to read parquet footers directly.
* @throws IOException All exceptions are coerced to IOException since this was build for storage system tasks initially. | ||
*/ | ||
public static <V> List<V> run(final String activity, final Logger logger, final List<TimedCallable<V>> tasks, int parallelism, long timeout) throws IOException { | ||
TIMEOUT_PER_RUNNABLE_IN_MSECS = timeout; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By convention we treat these as static final constants (although I see that the final
is missing). Also assigning to this static variable is not thread safe. Actually, now that this is a config option, why do we need a static constant ? Why not just pass it as a parameter wherever needed including passing it to the constructor of classes that are extending the abstract class and then we can set the timeout as a local class variable ?
@@ -50,6 +50,7 @@ | |||
private boolean enableTimeReadCounter = false; | |||
private boolean autoCorrectCorruptedDates = true; | |||
private boolean enableStringsSignedMinMax = false; | |||
private long timeoutPerRunnableInMsec = 15_000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the 'owner' of the default should be ExecConstants, we should avoid this default value.
@@ -629,6 +629,7 @@ drill.exec.options: { | |||
store.parquet.reader.columnreader.async: false, | |||
store.parquet.reader.int96_as_timestamp: false, | |||
store.parquet.reader.strings_signed_min_max: "", | |||
store.parquet.refresh_timeout_per_runnable_in_msec: 15000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See prior comment about omitting 'refresh' . Also I think that we should not put this in the 'parquet' namespace even though currently it is mainly used for Parquet metadata read. The reason is the runnable timeout could in theory be used for other formats also. How about store.timeout_per_runnable_in_msec
?
The new option is passed to the TimedCallable code via the existing ParquetReaderConfig parameter, by adding a Long field there - timeoutPerRunnableInMsec .
The timeout (TIMEOUT_PER_RUNNABLE_IN_MSECS) is used there for two purposes:
(1) Issue a warning when exceeded.
(2) Time-out (i.e. throw TimeoutException ?) the running thread.
Seems that the second one does not work; maybe the last change (DRILL-6281) is the reason.