
[WIP] Update in background job #49

Merged 5 commits into natcap:master on Dec 20, 2024

Conversation

@ebrelsford commented Dec 19, 2024

This is a work-in-progress PR; please do not merge.

The intention here is to use background jobs to update the dataset metadata needed for zip file expansion and map previews. Currently we run a separate script to do this, which can lead to confusion or errors.

To do:

  • update zip file sources
  • update map preview metadata
  • remove sync script

@ebrelsford (Author) commented:
@phargogh before I wrap this up I thought I'd show you what I have so far. It feels like it will be a bit more convenient than what we have right now.

Comment on lines +3 to +4
echo "Starting background jobs worker"
ckan jobs worker &
@ebrelsford (Author):
Start the jobs worker in the background.

@phargogh (Member):
Would this need to be nohup'd to keep it running past the end of the shell process, while still running in the background (thanks to &)?

@ebrelsford (Author):
Ah yeah, that could be necessary. This works locally, but I can see how it might not when deployed. I can try it on staging first.

Comment on lines +265 to +267
```python
def after_dataset_update(self, context, package):
    resources = [res.as_dict(core_columns_only=False) for res in context['package'].resources]
    toolkit.enqueue_job(update_dataset, [context['user'], package, resources])
```
@ebrelsford (Author):
Hook that is fired on dataset update via the interface or API.
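
For context, here is a minimal sketch of how this hook fits into a CKAN plugin. `toolkit.enqueue_job` and `IPackageController` are CKAN's standard APIs; the `NatcapPlugin` class name and the empty `update_dataset` body are illustrative, not the actual code from this PR.

```python
import ckan.plugins as plugins
import ckan.plugins.toolkit as toolkit


def update_dataset(user, package, resources):
    # Runs later in the `ckan jobs worker` process, so slow metadata
    # work does not block the web request that saved the dataset.
    pass


class NatcapPlugin(plugins.SingletonPlugin):
    plugins.implements(plugins.IPackageController, inherit=True)

    def after_dataset_update(self, context, package):
        # Serialize resources now: job arguments are pickled for the
        # worker, so plain dicts travel better than ORM objects.
        resources = [res.as_dict(core_columns_only=False)
                     for res in context['package'].resources]
        toolkit.enqueue_job(update_dataset,
                            [context['user'], package, resources])
```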

Comment on lines +107 to +109
```python
ctx = {'user': user}
updates = {'id': dataset['id'], 'extras': extras}
toolkit.get_action('package_patch')(ctx, updates)
```
@ebrelsford (Author):
We patch the dataset with our updated extras to avoid overwriting other fields.
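
A minimal sketch of the distinction, assuming a `user` name and a `dataset` dict with an `id` (the helper name `save_extras` is hypothetical): `package_patch` merges only the supplied keys, whereas `package_update` treats the payload as the complete dataset and would drop any field not resent.

```python
import ckan.plugins.toolkit as toolkit


def save_extras(user, dataset, extras):
    ctx = {'user': user}

    # package_patch merges the given keys into the existing dataset,
    # so title, notes, resources, etc. are left untouched.
    toolkit.get_action('package_patch')(ctx, {
        'id': dataset['id'],
        'extras': extras,
    })
    # By contrast, package_update would interpret omitted fields as
    # deletions and clear them.
```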

Comment on lines 82 to 84
```python
def update_mappreview(dataset, metadata, extras):
    # TODO
    return extras
```
@ebrelsford (Author):
Will port over from sync-datasets.py.

@ebrelsford (Author) commented:
@phargogh I believe this is ready now! I tried this on staging without a nohup and it appears to work fine.

Steps to test:

  1. Deploy with these changes.
  2. Edit a dataset and save it.
  3. In the background, the worker should at least update an extras field called `natcap_last_updated`, which you should see when you edit the dataset again.

So now adding/editing via the interface or API should have the same result as running `sync-datasets`.
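
Based on those test steps, the enqueued job presumably ends with something like the sketch below: stamp `natcap_last_updated` into the dataset's extras and save via `package_patch`. The helper name and the timestamp format are assumptions, not the PR's actual code.

```python
import datetime

import ckan.plugins.toolkit as toolkit


def stamp_last_updated(user, package):
    # Extras arrive as a list of {'key': ..., 'value': ...} dicts.
    extras = {e['key']: e['value'] for e in package.get('extras', [])}
    extras['natcap_last_updated'] = datetime.datetime.utcnow().isoformat()

    toolkit.get_action('package_patch')(
        {'user': user},
        {
            'id': package['id'],
            'extras': [{'key': k, 'value': v} for k, v in extras.items()],
        },
    )
```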

@phargogh (Member) left a comment:
Awesome, thanks @ebrelsford!

@phargogh merged commit de6e983 into natcap:master on Dec 20, 2024
1 check failed