HDFS table lock not working: broken downloads #325
We rely on … The lock is present while data is being inserted:
SQL queries for that table wait for the lock to be released during insertion.
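A rough sketch of the blocking behaviour described above, assuming plain shared/exclusive semantics: an insertion holds an exclusive lock and a query needs a shared one, so queries wait until the insertion releases it. `TableLock` and its methods are hypothetical names used for illustration only. This is not Hive's lock manager or its API.

```python
import threading


class TableLock:
    """Hypothetical shared/exclusive table lock (illustration only, not Hive's).

    An insert holds the exclusive lock; queries take the shared lock and
    block while an insert is in progress.
    """

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_shared(self, timeout=None) -> bool:
        with self._cond:
            # A query waits while an insert holds the exclusive lock.
            ok = self._cond.wait_for(lambda: not self._writer, timeout=timeout)
            if ok:
                self._readers += 1
            return ok

    def release_shared(self) -> None:
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def acquire_exclusive(self, timeout=None) -> bool:
        with self._cond:
            # An insert waits until no query and no other insert holds the lock.
            ok = self._cond.wait_for(
                lambda: not self._writer and self._readers == 0, timeout=timeout
            )
            if ok:
                self._writer = True
            return ok

    def release_exclusive(self) -> None:
        with self._cond:
            self._writer = False
            self._cond.notify_all()


if __name__ == "__main__":
    lock = TableLock()
    lock.acquire_exclusive()                  # insertion in progress
    print(lock.acquire_shared(timeout=0.1))   # False: the query is still waiting
    lock.release_exclusive()                  # insertion finished
    print(lock.acquire_shared(timeout=0.1))   # True: the query can proceed
```

The guarantee only holds for readers that actually ask for the shared lock; a reader that skips `acquire_shared` is never blocked.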
Thanks @muttcg. Do we know if the error is always related to the …
@timrobertson100
Thanks. Found this in Slack from 2 months ago, which confirms the problem is not limited to multimedia:
I have just seen this when trying a clustering run. The clustering run is slightly different from a download in that it runs a Spark SQL job sourced from the Hive metastore. It could be that Spark SQL doesn't take locks (or perhaps our environment is not configured for it to) in the same way as the Oozie-launched MR jobs.
The nightly table build job appears to still be in the create-avro stage.
Edited after the table build completed to add: the create-avro stage launches 2 child jobs, and it may be noteworthy that the first (…
Also likely relevant: the file that is missing in my query was actually created nearly an hour before the error, and before I submitted my clustering job, but was presumably sitting in a job tmp directory and moved into place as the MR job completed (…
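To illustrate the kind of race described above, here is a simplified local-filesystem model. It assumes a reader that takes no lock (as may be the case for the Spark SQL job) lists the table's files during planning and opens them later, while a job commits by deleting the table's old files and moving its staged output from a tmp directory into place. `commit_overwrite`, `unlocked_query`, `demo`, the `_tmp` directory and the `part-*` file names are all hypothetical; real HDFS, Hive and MR commit behaviour is more involved.

```python
import os
import shutil
import tempfile
from typing import Callable, Optional


def commit_overwrite(table_dir: str, staging_dir: str) -> None:
    """Hypothetical overwrite commit: delete the table's current files, then
    move each staged file from the tmp directory into the table directory."""
    for name in os.listdir(table_dir):
        os.remove(os.path.join(table_dir, name))
    for name in os.listdir(staging_dir):
        shutil.move(os.path.join(staging_dir, name), os.path.join(table_dir, name))


def unlocked_query(table_dir: str,
                   between_list_and_read: Optional[Callable[[], None]] = None) -> int:
    """Reader that takes no lock: it lists files first and opens them later."""
    files = sorted(os.listdir(table_dir))   # "planning": snapshot of file names
    if between_list_and_read is not None:
        between_list_and_read()             # a concurrent commit lands in this window
    total = 0
    for name in files:
        with open(os.path.join(table_dir, name)) as f:  # file may no longer exist
            total += len(f.read())
    return total


def demo() -> Optional[str]:
    """Run the race once; return the name of the file the query could not open."""
    root = tempfile.mkdtemp()
    table = os.path.join(root, "table")
    staging = os.path.join(root, "_tmp")    # hypothetical job tmp directory
    os.makedirs(table)
    os.makedirs(staging)
    with open(os.path.join(table, "part-00000"), "w") as f:
        f.write("old")
    with open(os.path.join(staging, "part-00001"), "w") as f:
        f.write("new")
    try:
        unlocked_query(table, lambda: commit_overwrite(table, staging))
        return None
    except FileNotFoundError as e:
        return os.path.basename(e.filename)
    finally:
        shutil.rmtree(root)


if __name__ == "__main__":
    print("query failed on missing file:", demo())
```

In this model, holding a shared table lock across both the listing and the reading would close the window, which the Oozie-launched MR jobs appear to get and the Spark SQL job may not.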
Some downloads can fail around 06:00Z when the HDFS table build completes.
There are a few of these from recent weeks, but it's not necessarily a new problem.