Commit Predictions File with Git Hash Link, Remove Artifact Upload (#1350)

This PR refactors the GitHub Actions workflow for the paper ranking
script to ensure the predictions file is committed directly to the
repository. The following changes were made:

1. Removed Artifact Upload Step:
- The actions/upload-artifact step previously stored the files generated
during the workflow as artifacts, which were then available for download
in the GitHub Actions interface under the "Artifacts" section of that
workflow run.
- The commit and push step added below makes this step unnecessary, so
it was removed to avoid redundant storage.

2. Added Commit and Push Step:
- Introduced a new step to commit and push the predictions file directly
to the `exports/analyses/paper_ranking/` directory in the repository.
- This ensures that the predictions file is saved in the repository for
future use if necessary.
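
The hash-pinned link that the workflow posts in the issue comment could be built roughly as follows. This is a minimal JavaScript sketch, not the workflow's actual code: `buildPermalink`, the repository name `example-org/example-repo`, and the commit hash are hypothetical placeholders. Pinning to a commit hash rather than a branch name should keep the link pointing at the exact version of the file produced by a given run, even if later runs overwrite `predictions.tsv`.

```javascript
// Hypothetical helper (not part of the workflow): build a link to a file
// as it existed at one specific commit of a GitHub repository.
function buildPermalink(repository, commitHash, filePath) {
  // `git rev-parse HEAD` prints a hexadecimal object name; reject anything
  // else (for example a branch name) so the link stays pinned to a commit.
  if (!/^[0-9a-f]{7,64}$/.test(commitHash)) {
    throw new Error(`not a commit hash: ${commitHash}`);
  }
  return `https://github.com/${repository}/blob/${commitHash}/${filePath}`;
}

// Placeholder values for illustration only.
const link = buildPermalink(
  'example-org/example-repo',
  '0123456789abcdef0123456789abcdef01234567',
  'exports/analyses/paper_ranking/predictions.tsv'
);
console.log(link);
```

In the workflow itself, the repository comes from `${{ github.repository }}` and the hash from the `COMMIT_HASH` environment variable.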

Testing in a forked repository confirmed that the predictions file was
saved to the correct directory once the workflow run completed.

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Benjamin M. Gyori <ben.gyori@gmail.com>
3 people authored Jan 20, 2025
1 parent 3bd2d2b commit bfee1d4
Showing 4 changed files with 23 additions and 12,617 deletions.
29 changes: 22 additions & 7 deletions .github/workflows/paper_ranking.yml
@@ -5,6 +5,10 @@ on:
     - cron: '0 0 1 * *' # runs on the first day of every month
   workflow_dispatch:

+permissions:
+  contents: write
+  issues: write
+
 jobs:
   paper-ranking:
     runs-on: ubuntu-latest
@@ -43,11 +47,20 @@ jobs:
           # TODO update to using python -m
           python src/bioregistry/analysis/paper_ranking.py --start-date ${{ env.START_DATE }} --end-date ${{ env.END_DATE }}
-      - name: Upload Full List as Artifact
-        uses: actions/upload-artifact@v3
-        with:
-          name: full-predictions-list-${{ env.START_DATE }}-to-${{ env.END_DATE }}
-          path: exports/analyses/paper_ranking/predictions_${{ env.START_DATE }}_to_${{ env.END_DATE }}.tsv
+      - name: Configure Git
+        run: |
+          git config user.name "github-actions[bot]"
+          git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
+      - name: Commit and Push Changes
+        run: |
+          git add exports/analyses/paper_ranking/predictions.tsv
+          git commit -m "Update predictions file with papers between ${{ env.START_DATE }} and ${{ env.END_DATE }}"
+          git push
+      - name: Find Commit Hash
+        id: get-commit-hash
+        run: echo "COMMIT_HASH=$(git rev-parse HEAD)" >> $GITHUB_ENV

       - name: Find Existing Issue
         id: find-issue
@@ -74,15 +87,17 @@ jobs:
             const issueNumber = ${{ steps.find-issue.outputs.result }};
             const startDate = process.env.START_DATE;
             const endDate = process.env.END_DATE;
-            const content = fs.readFileSync(`exports/analyses/paper_ranking/predictions_${startDate}_to_${endDate}.tsv`, 'utf8');
+            const commitHash = process.env.COMMIT_HASH;
+            const rankingFileLink = `https://github.com/${{ github.repository }}/blob/${commitHash}/exports/analyses/paper_ranking/predictions.tsv`;
+            const content = fs.readFileSync(`exports/analyses/paper_ranking/predictions.tsv`, 'utf8');
             const lines = content.split('\n').slice(1, 21);
             const rows = lines.map(line => {
               const [pubmed, title] = line.split('\t');
               const link = `https://bioregistry.io/pubmed:${pubmed}`;
               return `| [${pubmed}](${link}) | ${title} |`;
             });
             const tableHeader = '| PubMed ID | Title |\n| --- | --- |\n';
-            const commentBody = `This issue contains monthly updates to an automatically ranked list of PubMed papers as candidates for curation in the Bioregistry. Papers may be relevant in at least three ways: \n(1) as a new prefix for a resource that can be added to the Bioregistry,\n(2) as a provider for an existing prefix, or\n(3) as a new publication for an existing prefix already in the Bioregistry.\n\nThese curations can happen in separate issues and pull requests. The full list of ranked papers can be found [here](https://github.com/${{ github.repository }}/blob/main/exports/analyses/paper_ranking/predictions_${startDate}_to_${endDate}.tsv). If you review any of these papers for relevance, you should edit the curated papers file [here](https://github.com/${{ github.repository }}/blob/main/src/bioregistry/data/curated_papers.tsv); these curations are taken into account when retraining the ranking model.\n\n**New entries for ${startDate} to ${endDate}:**\n\n${tableHeader}${rows.join('\n')}`;
+            const commentBody = `This issue contains monthly updates to an automatically ranked list of PubMed papers as candidates for curation in the Bioregistry. Papers may be relevant in at least three ways: \n(1) as a new prefix for a resource that can be added to the Bioregistry,\n(2) as a provider for an existing prefix, or\n(3) as a new publication for an existing prefix already in the Bioregistry.\n\nThese curations can happen in separate issues and pull requests. The full list of ranked papers can be found [here](${rankingFileLink}). If you review any of these papers for relevance, you should edit the curated papers file [here](https://github.com/${{ github.repository }}/blob/main/src/bioregistry/data/curated_papers.tsv); these curations are taken into account when retraining the ranking model.\n\n**New entries for ${startDate} to ${endDate}:**\n\n${tableHeader}${rows.join('\n')}`;
             if (issueNumber) {
               await github.rest.issues.createComment({
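
The table-building logic in the script above can be tried outside the workflow with a small standalone sketch. `buildTable` and the sample rows are hypothetical, and the sketch assumes, as the destructuring in the script implies, that the first two TSV columns are the PubMed ID and the title. The blank-line filter is an addition that the workflow script does not have: without it, a file with fewer than 20 data rows and a trailing newline would yield an empty table row.

```javascript
// Hypothetical helper mirroring the workflow's table-building logic.
function buildTable(tsvContent, maxRows = 20) {
  const rows = tsvContent
    .split('\n')
    .slice(1, maxRows + 1)                // drop the header, keep the top rows
    .filter(line => line.trim() !== '')   // assumption: skip blank lines
    .map(line => {
      const [pubmed, title] = line.split('\t');
      const link = `https://bioregistry.io/pubmed:${pubmed}`;
      return `| [${pubmed}](${link}) | ${title} |`;
    });
  const tableHeader = '| PubMed ID | Title |\n| --- | --- |\n';
  return tableHeader + rows.join('\n');
}

// Placeholder data for illustration only; these are not real predictions.
const sampleTsv = [
  'pubmed\ttitle',
  '11111111\tExample title A',
  '22222222\tExample title B',
  '',
].join('\n');

console.log(buildTable(sampleTsv));
```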