-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
List jobs per job type #576
base: main
Are you sure you want to change the base?
Conversation
458d480
to
eba403f
Compare
@@ -103,20 +105,57 @@ def list_jobs( | |||
) | |||
|
|||
|
|||
@router.get("/{job_id}") | |||
def get_job(service: JobServiceDep, job_id: UUID) -> Job: | |||
@router.get("/{job_spec}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what will a spec be in this case? we could consider a more descriptive variable name here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i.e. are we changing this from a UUID?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are. It can be a uuid (returning a single job) or a job type name (inference
or evaluate
, returning a list of jobs). It follows the approach in the create_job service for consistency. I'd rather use a param (actually inheritance via anyOf plus a discriminator).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to have one route return one type of object for clarity and consistency. If we have a spec that returns multiple objects, we should return a dictionary of elements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a couple of constraints: make it backwards compatible, and follow a similar interface to the existing calls.
Regarding the dictionary, the GET /jobs
call already returns a list, so I'd see it natural to return a list as well.
Regarding the routes, I can use a more specific route like GET /jobs/per_type/{job_type}
to avoid overloading the /jobs/{job_id}
route. Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can actually use /jobs/{uuid}
and /jobs/inference/
/ /jobs/evaluate/
with trailing slashes :-/ It's a bit messy too but I kinda like it better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agreed with @aittalam on using /jobs/{job_type}/
to keep it consistent with the existing POST methods. Let me know if this would be ok for you.
ray_job = _get_ray_job(job_id) | ||
|
||
# Combine both types of response. | ||
x = ray_job.model_dump() # JobSubmissionResponse |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we're looking to understand the type here, we can annotate it:
x: JobSubmissionResponse = ray_job.model_dump()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, I'm not sure I wrote that, but type annotations are fine for me. IIRC this would need to be taken from ray (but we do have those dependencies and can import the types, right?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I wrote that
I mean I just copied the existing approach of merging the Ray info together with our own.
|
||
# Merge Ray jobs into the repositories jobs | ||
for job in jobs.items: | ||
found_job = next( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand this piece of logic: why are we looking to combine Ray job and lumigator IDs here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was done in a separate PR already, I'm just following it. The Ray info is bundled with the Lumigator job info so the UI has all the info in one place. Please check #514
op.add_column("jobs", sa.Column("job_type", sa.String(), nullable=True)) | ||
|
||
|
||
def downgrade() -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we adding and removing the same column here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly (which may not be doing!) adding the column is part of the upgrade procedure into this db revision from the previous, and removing the column is part of the downgrade procedure from this db revision into the previous. But maybe I'm not getting it right :-/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I confirm! This is automatic code generated by alembic to upgrade / downgrade the database to a new / older version.
As Dimitri's PR has been merged, I think you don't need to have this anymore as you are not actively modifying the DB (IIRC)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I stand corrected, the previous PR was not dealing with thejob_type
field. You'll need to run alembic again then and generate a more recent migration if you have not done it already
238cbb1
to
0197eac
Compare
revision: str = "ef5ee5662ce3" # pragma: allowlist secret | ||
down_revision: str | None = "e9679cbc3c36" # pragma: allowlist secret | ||
branch_labels: str | Sequence[str] | None = None | ||
depends_on: str | Sequence[str] | None = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make sure when we merge this it doesn't conflict with other alembic changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmmmm, true :-/ I'll regenerate the alembic part just before merging just in case.
@@ -111,15 +111,15 @@ def list_jobs( | |||
# Merge Ray jobs into the repositories jobs | |||
for job in jobs.items: | |||
found_job = next( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest splitting this code into two lines and commenting for legibility, and in splitting it up, looks like this can go into a dict?
return_val: type = (job for job in filter(lambda x: x.submission_id == str(job.id), ray_jobs)), None
found_job: [type here] = next(return_val)
suggested refactor:
ray_jobs = {}
for job in jobs.items():
submission_id = job[1].submission_id
if submission_id:
ray_jobs[submission_id] = job[1]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handling a set/dict to keep track of the ids definitely helps finding out ray/job relations.
Ok, refactoring 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did something similar:
# Get all jobs Ray knows about on a dict
ray_jobs = {ray_job.submission_id: ray_job for ray_job in _get_all_ray_jobs()}
results = list[Job]()
# Merge Ray jobs into the repositories jobs
job: JobResponse
for job in jobs.items:
job_id = str(job.id)
if job_id in ray_jobs:
# Combine both types of response.
"""Attempts to retrieve merged job data from the Lumigator repository and Ray | ||
for a valid UUID. | ||
|
||
The result is a merged representation which forms an augmented view of a 'job'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like the comment string is incorrectly formatted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what sense? I don't get the issue :(
x = ray_job.model_dump() # JobSubmissionResponse | ||
y = job.model_dump() # JobResponse | ||
merged = {**x, **y} | ||
ray_info = ray_job.dict() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would recommend creating or using an exiting PyDantic class for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do. It's just an older version that Ray uses, afaict :-/ I tried using model_dump, but if I got it right it's their Pydantic version not implementing it.
def _get_all_ray_jobs() -> list[RayJobDetails]: |
from ray.job_submission import JobDetails as RayJobDetails |
https://github.com/ray-project/ray/blob/916f534e571278b26733812b24a7b3dee08f24e4/python/ray/dashboard/modules/job/pydantic_models.py#L38
@@ -194,7 +240,7 @@ def get_job_result_download( | |||
return service.get_job_result_download(job_id) | |||
|
|||
|
|||
def _get_all_ray_jobs() -> list[JobSubmissionResponse]: | |||
def _get_all_ray_jobs() -> list[RayJobDetails]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we changing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the info we get is modelled by the Ray pydantic models. Then it is merged with our own job info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The merged information is not modelled as JointJobInfo(RayJobDetails, JobResponse)
because this would require the lumigator schemas to import the ray
package, which is too heavy for the end user as there is no separate api or model package for Ray.
@@ -24,6 +24,9 @@ | |||
from backend.services.datasets import DatasetService | |||
from backend.settings import settings | |||
|
|||
DEFAULT_SKIP = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's injecting per method call to follow the rest of the repo
self, skip: int = 0, limit: int = 100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but I'd prefer to have defaults as consts and not magic numbers, to keep consistency over the code.
What's changing
A GET request can be issued to the
/jobs/{JOB_TYPE}/
endpoint, where currentlyJOB_TYPE
is eitherinference
orevaluate
, to select all jobs of the desired type. A GET request can still be issued to/jobs/{UUID}/
whereUUID
is a v4 UUID to retrieve the details of a single job.Closes #380
How to test it
As tested in the integration tests, two jobs, one inference and one evaluation, can be started, and only one would be selected with the new API.
Additional notes for reviewers
N/A
I already...
/docs
)