Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race condition when restarting continuous jobs #1849

Merged
merged 10 commits into from
Oct 22, 2024
Merged

Conversation

shreyas-goenka
Copy link
Contributor

@shreyas-goenka shreyas-goenka commented Oct 21, 2024

Changes

We don't need to cancel existing runs when the job is continuous and unpaused. The /jobs/run-now command will cancel the existing run and trigger a new one automatically.

Cancelling the job manually can cause a race condition where both the manual trigger from the CLI and the continuous trigger from the job configuration happens at the same time. This PR prevents that from happening.

Tests

Unit tests and manually

@shreyas-goenka shreyas-goenka marked this pull request as ready for review October 22, 2024 13:58
bundle/run/job.go Show resolved Hide resolved
bundle/run/job_test.go Outdated Show resolved Hide resolved
bundle/run/job.go Show resolved Hide resolved
@shreyas-goenka shreyas-goenka added this pull request to the merge queue Oct 22, 2024
Merged via the queue into main with commit 3bab21e Oct 22, 2024
5 checks passed
@shreyas-goenka shreyas-goenka deleted the fix-race branch October 22, 2024 15:04
// We don't need to cancel existing runs if the job is continuous and unpaused.
// the /jobs/run-now API will automatically cancel any existing runs before starting a new one.
//
// /jobs/run-now will not cancel existing runs if the job is continuous and paused.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean "if the job is NOT continuous"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, you can configure jobs to have continuous.pause_status = "PAUSED" in which case it behaves like a non-continuous job.

andrewnester added a commit that referenced this pull request Oct 23, 2024
CLI:
 * Added JSON input validation for CLI commands ([#1771](#1771)).

Bundles:
 * Support Git worktrees for `sync` ([#1831](#1831)).
 * Add `bundle summary` to display URLs for deployed resources ([#1731](#1731)).
 * Added a warning when incorrect permissions used for `/Workspace/Shared` bundle root ([#1821](#1821)).
 * Show actionable errors for collaborative deployment scenarios ([#1386](#1386)).
 * Fix path to repository-wide exclude file ([#1837](#1837)).
 * Fixed typo in converting cluster permissions ([#1826](#1826)).
 * Ignore metastore permission error during template generation ([#1819](#1819)).
 * Handle normalization of `dyn.KindTime` into an any type ([#1836](#1836)).
 * Added support for pip options in environment dependencies ([#1842](#1842)).
 * Fix race condition when restarting continuous jobs ([#1849](#1849)).
 * Fix pipeline in default-python template not working for certain workspaces ([#1854](#1854)).
 * Add "output" flag to the bundle sync command ([#1853](#1853)).

Internal:
 * Move utility functions dealing with IAM to libs/iamutil ([#1820](#1820)).
 * Remove unused `IS_OWNER` constant ([#1823](#1823)).
 * Assert SDK version is consistent in the CLI generation process ([#1814](#1814)).
 * Fixed unmarshalling json input into `interface{}` type ([#1832](#1832)).
 * Fix `TestAccFsMkdirWhenFileExistsAtPath` in isolated Azure environments ([#1833](#1833)).
 * Add behavioral tests for examples from the YAML spec ([#1835](#1835)).
 * Remove Terraform conversion function that's no longer used ([#1840](#1840)).
 * Encode assumptions about the dashboards API in a test ([#1839](#1839)).
 * Add script to make testing of code on branches easier ([#1844](#1844)).

API Changes:
 * Added `databricks disable-legacy-dbfs` command group.

OpenAPI commit cf9c61453990df0f9453670f2fe68e1b128647a2 (2024-10-14)
Dependency updates:
 * Upgrade TF provider to 1.54.0 ([#1852](#1852)).
 * Bump github.com/databricks/databricks-sdk-go from 0.48.0 to 0.49.0 ([#1843](#1843)).
github-merge-queue bot pushed a commit that referenced this pull request Oct 23, 2024
CLI:
* Added JSON input validation for CLI commands
([#1771](#1771)).
* Support Git worktrees for `sync`
([#1831](#1831)).

Bundles:
* Add `bundle summary` to display URLs for deployed resources
([#1731](#1731)).
* Added a warning when incorrect permissions used for
`/Workspace/Shared` bundle root
([#1821](#1821)).
* Show actionable errors for collaborative deployment scenarios
([#1386](#1386)).
* Fix path to repository-wide exclude file
([#1837](#1837)).
* Fixed typo in converting cluster permissions
([#1826](#1826)).
* Ignore metastore permission error during template generation
([#1819](#1819)).
* Handle normalization of `dyn.KindTime` into an any type
([#1836](#1836)).
* Added support for pip options in environment dependencies
([#1842](#1842)).
* Fix race condition when restarting continuous jobs
([#1849](#1849)).
* Fix pipeline in default-python template not working for certain
workspaces ([#1854](#1854)).
* Add "output" flag to the bundle sync command
([#1853](#1853)).

Internal:
* Move utility functions dealing with IAM to libs/iamutil
([#1820](#1820)).
* Remove unused `IS_OWNER` constant
([#1823](#1823)).
* Assert SDK version is consistent in the CLI generation process
([#1814](#1814)).
* Fixed unmarshalling json input into `interface{}` type
([#1832](#1832)).
* Fix `TestAccFsMkdirWhenFileExistsAtPath` in isolated Azure
environments ([#1833](#1833)).
* Add behavioral tests for examples from the YAML spec
([#1835](#1835)).
* Remove Terraform conversion function that's no longer used
([#1840](#1840)).
* Encode assumptions about the dashboards API in a test
([#1839](#1839)).
* Add script to make testing of code on branches easier
([#1844](#1844)).

API Changes:
 * Added `databricks disable-legacy-dbfs` command group.

OpenAPI commit cf9c61453990df0f9453670f2fe68e1b128647a2 (2024-10-14)
Dependency updates:
* Upgrade TF provider to 1.54.0
([#1852](#1852)).
* Bump github.com/databricks/databricks-sdk-go from 0.48.0 to 0.49.0
([#1843](#1843)).
github-merge-queue bot pushed a commit that referenced this pull request Dec 31, 2024
## Changes
This GET API call is unnecessary and serves no purpose. Let's remove it.
Noticed this when I was adding a unit test for the pipeline runner here:
#1849

## Tests
Manually. 

### Case 1: The pipeline does not exist

Before:
```
➜  my_project git:(master) ✗ databricks bundle run my_project_pipeline -p dogfood
Error: User shreyas.goenka@databricks.com does not have View permissions on pipeline 9941901a-e48b-4d04-b6ba-e0072ad126bg.
```

After:
```
➜  my_project git:(master) ✗ cli bundle run my_project_pipeline -p dogfood
Error: User shreyas.goenka@databricks.com does not have Run permissions on pipeline 9941901a-e48b-4d04-b6ba-e0072ad126bg.
```

### Case 2: Pipeline exists

Before:
```
➜  my_project git:(master) ✗ databricks bundle run my_project_pipeline -p dogfood --restart
Update URL: https://e2-dogfood.staging.cloud.databricks.com/#joblist/pipelines/9941901a-e48b-4d04-b6ba-e0072ad126bf/updates/0f988d62-9ec7-49f1-b429-5572ece3a9aa

2024-11-18T15:30:36.054Z update_progress INFO "Update 0f988d is WAITING_FOR_RESOURCES."
```

After:
```
➜  my_project git:(master) ✗ cli bundle run my_project_pipeline -p dogfood --restart
Update URL: https://e2-dogfood.staging.cloud.databricks.com/#joblist/pipelines/9941901a-e48b-4d04-b6ba-e0072ad126bf/updates/87b43350-6186-4a9b-9d0e-38da2ecf33ae

2024-11-18T15:28:27.144Z update_progress INFO "Update 87b433 is WAITING_FOR_RESOURCES."
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants