Commit Predictions File with Git Hash Link, Remove Artifact Upload (#1350)

This PR refactors the GitHub Actions workflow for the paper ranking script so that the predictions file is committed directly to the repository. The following changes were made:

1. Removed Artifact Upload Step:
   - The actions/upload-artifact step was previously used to store files generated during the workflow. These artifacts were available for download in the GitHub Actions interface under the "Artifacts" section for that workflow run.
   - The commit and push step added below makes this step unnecessary, so it was removed to avoid redundant storage.
2. Added Commit and Push Step:
   - Introduced a new step to commit and push the predictions file directly to the `exports/analyses/paper_ranking/` directory in the repository.
   - This ensures that the predictions file is saved in the repository for future use if necessary.

Testing via a forked repository showed that the predictions file was saved to the correct directory once the workflow run was complete.

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Benjamin M. Gyori <ben.gyori@gmail.com>
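
As a minimal sketch of the linking approach named in the title, the JavaScript below builds a commit-pinned URL to the predictions file and a Markdown table from its first rows. The helper names (`buildPermalink`, `buildTable`), the repository slug, and the sample TSV row are hypothetical and chosen for illustration; only the URL shape and the table header mirror the workflow's script.

```javascript
// Hypothetical helpers; the names are illustrative and not part of the workflow.

// Build a link to a file as it existed at a specific commit, so the link
// keeps showing the same contents after later commits change the file.
function buildPermalink(repository, commitHash, filePath) {
  return `https://github.com/${repository}/blob/${commitHash}/${filePath}`;
}

// Turn the first `limit` data rows of a TSV (header row skipped) into a
// Markdown table. Assumes the first two columns are PubMed ID and title.
function buildTable(tsv, limit = 20) {
  const rows = tsv
    .split('\n')
    .slice(1) // drop the header row
    .filter(line => line.trim() !== '') // skip blank lines, e.g. a trailing newline
    .slice(0, limit)
    .map(line => {
      const [pubmed, title] = line.split('\t');
      return `| [${pubmed}](https://bioregistry.io/pubmed:${pubmed}) | ${title} |`;
    });
  return ['| PubMed ID | Title |', '| --- | --- |', ...rows].join('\n');
}

// Example values below are placeholders, not real data.
const link = buildPermalink(
  'example-org/example-repo',
  'bfee1d4',
  'exports/analyses/paper_ranking/predictions.tsv'
);
console.log(link);
console.log(buildTable('pubmed\ttitle\n12345\tAn example title\n'));
```

Pinning the link to a commit hash rather than to `main` keeps each issue comment pointing at the predictions from its own run; since the file name no longer carries the date range, later runs presumably overwrite the same path. The blank-line filter is an addition in this sketch and is not present in the workflow's script.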
biopragmatics · Jan 20, 2025 · bfee1d4
1 parent 3bd2d2b
commit bfee1d4
Showing 4 changed files with 23 additions and 12,617 deletions.
diff --git a/.github/workflows/paper_ranking.yml b/.github/workflows/paper_ranking.yml
@@ -5,6 +5,10 @@ on:
     - cron: '0 0 1 * *' # runs on the first day of every month
   workflow_dispatch:
 
+permissions:
+  contents: write
+  issues: write
+
 jobs:
   paper-ranking:
     runs-on: ubuntu-latest
@@ -43,11 +47,20 @@ jobs:
         # TODO update to using python -m
         python src/bioregistry/analysis/paper_ranking.py --start-date ${{ env.START_DATE }} --end-date ${{ env.END_DATE }}
 
-    - name: Upload Full List as Artifact
-      uses: actions/upload-artifact@v3
-      with:
-        name: full-predictions-list-${{ env.START_DATE }}-to-${{ env.END_DATE }}
-        path: exports/analyses/paper_ranking/predictions_${{ env.START_DATE }}_to_${{ env.END_DATE }}.tsv
+    - name: Configure Git
+      run: |
+        git config user.name "github-actions[bot]"
+        git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
+
+    - name: Commit and Push Changes
+      run: |
+        git add exports/analyses/paper_ranking/predictions.tsv
+        git commit -m "Update predictions file with papers between ${{ env.START_DATE }} and ${{ env.END_DATE }}"
+        git push
+
+    - name: Find Commit Hash
+      id: get-commit-hash
+      run: echo "COMMIT_HASH=$(git rev-parse HEAD)" >> $GITHUB_ENV
 
     - name: Find Existing Issue
       id: find-issue
@@ -74,15 +87,17 @@ jobs:
           const issueNumber = ${{ steps.find-issue.outputs.result }};
           const startDate = process.env.START_DATE;
           const endDate = process.env.END_DATE;
-          const content = fs.readFileSync(`exports/analyses/paper_ranking/predictions_${startDate}_to_${endDate}.tsv`, 'utf8');
+          const commitHash = process.env.COMMIT_HASH;
+          const rankingFileLink = `https://github.com/${{ github.repository }}/blob/${commitHash}/exports/analyses/paper_ranking/predictions.tsv`;
+          const content = fs.readFileSync(`exports/analyses/paper_ranking/predictions.tsv`, 'utf8');
           const lines = content.split('\n').slice(1, 21);
           const rows = lines.map(line => {
             const [pubmed, title] = line.split('\t');
             const link = `https://bioregistry.io/pubmed:${pubmed}`;
             return `| [${pubmed}](${link}) | ${title} |`;
           });
           const tableHeader = '| PubMed ID | Title |\n| --- | --- |\n';
-          const commentBody = `This issue contains monthly updates to an automatically ranked list of PubMed papers as candidates for curation in the Bioregistry. Papers may be relevant in at least three ways: \n(1) as a new prefix for a resource that can be added to the Bioregistry,\n(2) as a provider for an existing prefix, or\n(3) as a new publication for an existing prefix already in the Bioregistry.\n\nThese curations can happen in separate issues and pull requests. The full list of ranked papers can be found [here](https://github.com/${{ github.repository }}/blob/main/exports/analyses/paper_ranking/predictions_${startDate}_to_${endDate}.tsv). If you review any of these papers for relevance, you should edit the curated papers file [here](https://github.com/${{ github.repository }}/blob/main/src/bioregistry/data/curated_papers.tsv); these curations are taken into account when retraining the ranking model.\n\n**New entries for ${startDate} to ${endDate}:**\n\n${tableHeader}${rows.join('\n')}`;
+          const commentBody = `This issue contains monthly updates to an automatically ranked list of PubMed papers as candidates for curation in the Bioregistry. Papers may be relevant in at least three ways: \n(1) as a new prefix for a resource that can be added to the Bioregistry,\n(2) as a provider for an existing prefix, or\n(3) as a new publication for an existing prefix already in the Bioregistry.\n\nThese curations can happen in separate issues and pull requests. The full list of ranked papers can be found [here](${rankingFileLink}). If you review any of these papers for relevance, you should edit the curated papers file [here](https://github.com/${{ github.repository }}/blob/main/src/bioregistry/data/curated_papers.tsv); these curations are taken into account when retraining the ranking model.\n\n**New entries for ${startDate} to ${endDate}:**\n\n${tableHeader}${rows.join('\n')}`;
 
           if (issueNumber) {
             await github.rest.issues.createComment({