-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix race condition when restarting continuous jobs #1849
Conversation
// We don't need to cancel existing runs if the job is continuous and unpaused. | ||
// the /jobs/run-now API will automatically cancel any existing runs before starting a new one. | ||
// | ||
// /jobs/run-now will not cancel existing runs if the job is continuous and paused. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean "if the job is NOT continuous"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, you can configure jobs to have continuous.pause_status = "PAUSED" in which case it behaves like a non-continuous job.
CLI: * Added JSON input validation for CLI commands ([#1771](#1771)). Bundles: * Support Git worktrees for `sync` ([#1831](#1831)). * Add `bundle summary` to display URLs for deployed resources ([#1731](#1731)). * Added a warning when incorrect permissions used for `/Workspace/Shared` bundle root ([#1821](#1821)). * Show actionable errors for collaborative deployment scenarios ([#1386](#1386)). * Fix path to repository-wide exclude file ([#1837](#1837)). * Fixed typo in converting cluster permissions ([#1826](#1826)). * Ignore metastore permission error during template generation ([#1819](#1819)). * Handle normalization of `dyn.KindTime` into an any type ([#1836](#1836)). * Added support for pip options in environment dependencies ([#1842](#1842)). * Fix race condition when restarting continuous jobs ([#1849](#1849)). * Fix pipeline in default-python template not working for certain workspaces ([#1854](#1854)). * Add "output" flag to the bundle sync command ([#1853](#1853)). Internal: * Move utility functions dealing with IAM to libs/iamutil ([#1820](#1820)). * Remove unused `IS_OWNER` constant ([#1823](#1823)). * Assert SDK version is consistent in the CLI generation process ([#1814](#1814)). * Fixed unmarshalling json input into `interface{}` type ([#1832](#1832)). * Fix `TestAccFsMkdirWhenFileExistsAtPath` in isolated Azure environments ([#1833](#1833)). * Add behavioral tests for examples from the YAML spec ([#1835](#1835)). * Remove Terraform conversion function that's no longer used ([#1840](#1840)). * Encode assumptions about the dashboards API in a test ([#1839](#1839)). * Add script to make testing of code on branches easier ([#1844](#1844)). API Changes: * Added `databricks disable-legacy-dbfs` command group. OpenAPI commit cf9c61453990df0f9453670f2fe68e1b128647a2 (2024-10-14) Dependency updates: * Upgrade TF provider to 1.54.0 ([#1852](#1852)). * Bump github.com/databricks/databricks-sdk-go from 0.48.0 to 0.49.0 ([#1843](#1843)).
CLI: * Added JSON input validation for CLI commands ([#1771](#1771)). * Support Git worktrees for `sync` ([#1831](#1831)). Bundles: * Add `bundle summary` to display URLs for deployed resources ([#1731](#1731)). * Added a warning when incorrect permissions used for `/Workspace/Shared` bundle root ([#1821](#1821)). * Show actionable errors for collaborative deployment scenarios ([#1386](#1386)). * Fix path to repository-wide exclude file ([#1837](#1837)). * Fixed typo in converting cluster permissions ([#1826](#1826)). * Ignore metastore permission error during template generation ([#1819](#1819)). * Handle normalization of `dyn.KindTime` into an any type ([#1836](#1836)). * Added support for pip options in environment dependencies ([#1842](#1842)). * Fix race condition when restarting continuous jobs ([#1849](#1849)). * Fix pipeline in default-python template not working for certain workspaces ([#1854](#1854)). * Add "output" flag to the bundle sync command ([#1853](#1853)). Internal: * Move utility functions dealing with IAM to libs/iamutil ([#1820](#1820)). * Remove unused `IS_OWNER` constant ([#1823](#1823)). * Assert SDK version is consistent in the CLI generation process ([#1814](#1814)). * Fixed unmarshalling json input into `interface{}` type ([#1832](#1832)). * Fix `TestAccFsMkdirWhenFileExistsAtPath` in isolated Azure environments ([#1833](#1833)). * Add behavioral tests for examples from the YAML spec ([#1835](#1835)). * Remove Terraform conversion function that's no longer used ([#1840](#1840)). * Encode assumptions about the dashboards API in a test ([#1839](#1839)). * Add script to make testing of code on branches easier ([#1844](#1844)). API Changes: * Added `databricks disable-legacy-dbfs` command group. OpenAPI commit cf9c61453990df0f9453670f2fe68e1b128647a2 (2024-10-14) Dependency updates: * Upgrade TF provider to 1.54.0 ([#1852](#1852)). * Bump github.com/databricks/databricks-sdk-go from 0.48.0 to 0.49.0 ([#1843](#1843)).
## Changes This GET API call is unnecessary and serves no purpose. Let's remove it. Noticed this when I was adding a unit test for the pipeline runner here: #1849 ## Tests Manually. ### Case 1: The pipeline does not exist Before: ``` ➜ my_project git:(master) ✗ databricks bundle run my_project_pipeline -p dogfood Error: User shreyas.goenka@databricks.com does not have View permissions on pipeline 9941901a-e48b-4d04-b6ba-e0072ad126bg. ``` After: ``` ➜ my_project git:(master) ✗ cli bundle run my_project_pipeline -p dogfood Error: User shreyas.goenka@databricks.com does not have Run permissions on pipeline 9941901a-e48b-4d04-b6ba-e0072ad126bg. ``` ### Case 2: Pipeline exists Before: ``` ➜ my_project git:(master) ✗ databricks bundle run my_project_pipeline -p dogfood --restart Update URL: https://e2-dogfood.staging.cloud.databricks.com/#joblist/pipelines/9941901a-e48b-4d04-b6ba-e0072ad126bf/updates/0f988d62-9ec7-49f1-b429-5572ece3a9aa 2024-11-18T15:30:36.054Z update_progress INFO "Update 0f988d is WAITING_FOR_RESOURCES." ``` After: ``` ➜ my_project git:(master) ✗ cli bundle run my_project_pipeline -p dogfood --restart Update URL: https://e2-dogfood.staging.cloud.databricks.com/#joblist/pipelines/9941901a-e48b-4d04-b6ba-e0072ad126bf/updates/87b43350-6186-4a9b-9d0e-38da2ecf33ae 2024-11-18T15:28:27.144Z update_progress INFO "Update 87b433 is WAITING_FOR_RESOURCES." ```
Changes
We don't need to cancel existing runs when the job is continuous and unpaused. The
/jobs/run-now
command will cancel the existing run and trigger a new one automatically.Cancelling the job manually can cause a race condition where both the manual trigger from the CLI and the continuous trigger from the job configuration happens at the same time. This PR prevents that from happening.
Tests
Unit tests and manually