Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use new nf-test features #6286

Merged
merged 61 commits into from
Nov 18, 2024
Merged

Use new nf-test features #6286

merged 61 commits into from
Nov 18, 2024

Conversation

edmundmiller
Copy link
Contributor

  • ci: Attempt to split everything out
  • ci: Add changed since, sharding, and ci
  • ci: Add filter to try to get jobs split up
  • ci: Switch to only-changed
  • ci: See if follow-dependencies works without "related-tests"

@edmundmiller
Copy link
Contributor Author

edmundmiller commented Aug 26, 2024

We do have adamrtalbot/detect-nf-test-changes as a fallback option?

That's what we're currently using, so I don't really see it as a fallback 😆 It works fine, I think the sharding can work better and waste less resources spinning up a runner for each test instead of spinning up a consistent amount per PR.

That would close #391 as well, and might encourage people to make smaller PRs if we're only giving them 9 CI runners per PR anyway.

@edmundmiller
Copy link
Contributor Author

Current things blocking this is nf-core/tools#3140 or we can give up and keep the paths-filter step for that

@edmundmiller edmundmiller force-pushed the fresh-ci branch 3 times, most recently from 9cd5419 to 8748f04 Compare September 18, 2024 21:01
@edmundmiller edmundmiller marked this pull request as ready for review September 18, 2024 21:03
@edmundmiller edmundmiller linked an issue Sep 18, 2024 that may be closed by this pull request
@edmundmiller
Copy link
Contributor Author

Removed the need to use nf-core lint with pre-commit, as that wasn't truely blocking this from getting merged.

We can make a follow up issue to run every linting step through pre-commit.

@maxulysse
Copy link
Member

Looking good, there's still some leftover comments in the nf-test yml, but I like it

@edmundmiller
Copy link
Contributor Author

edmundmiller commented Sep 24, 2024

@maxulysse I think I've address the comments that can be addressed.

@adamrtalbot I think this goes slightly against #4545 (comment) but I think it's worth splitting up since we should be running pytest-workflow less and less, and the sharding is a completely different flow. Though it will probably Start up the pytest-workflow workflow every PR...

Copy link
Contributor

@adamrtalbot adamrtalbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern is getting stuck in a loop where there is too much data per runner and having no way of resolving it.

We over shard now, but at least it's a 1:1 relationship which makes it easy to diagnose an issue. We need to be confident we don't end up in a death loop in the middle of a hackathon.

In addition, we're over-provisioning 3 machines when 1 would suffice.

Still, this has fewer moving parts and is easier to replicate on a local machine, which makes it better under certain situations.

- uses: mirpedrol/paths-filter@main
id: filter
with:
filters: "tests/config/pytest_modules.yml"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This relies on the pytest_modules.yml tags, is that what you want?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah, probably just want it to run on any module that's changed. We could include both the nf-test and pytest_modules ymls though I think.

matrix:
filter: [process, workflow]
profile: [conda, docker_self_hosted, singularity]
shard: [1, 2, 3]
Copy link
Contributor

@adamrtalbot adamrtalbot Sep 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 is a very small number for sharding.

If we edit UNTAR, we will run hundreds of tests packed onto the same machines. Remember, if too many tests are packed onto 1 machine there is no way to fix it other than editing this file which essentially blocks the person who opened the PR.

Conversely, if we run 1 nf-test we will add 2 machines that do nothing but install nf-test and close down.

This might be the most practical way of doing this, so I'm not 100% against it but what's the rationale for the number 3 and can we document it somewhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to set it dynamically based on number of tests with an estimate of how many we can pack onto a machine? e.g. n_tests %% 5?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, by making it dyanmic we're back to the current problem of having a dynamic workflow ID which causes issues with GH branch protection.

So let's go back to my initial point. If you modify a major process like UNTAR, will this still succeed or will it run out of storage? Remember every nf-test invocation downloads all the data it needs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to set it dynamically based on number of tests with an estimate of how many we can pack onto a machine? e.g. n_tests %% 5?

That would be the dream. But yeah I also wanted to avoid the dynamic number of jobs.

3 was just a random number that I picked. Trying to walk the line between runners sitting there, and not enough jobs.

Hot take: What if we just did a --dryRun at the beginning of the shard and if there's no jobs, exit before we set anything else up.

strategy:
fail-fast: false
matrix:
filter: [process, workflow]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
filter: [process, workflow]
filter: [process, workflow]

Function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have any in this repo?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep: https://github.com/nf-core/modules/blob/master/subworkflows/nf-core/utils_nfcore_pipeline/main.nf

Also, I'd like to encourage people to put this stuff in modules or a plugin so we don't fix it to the template.

Comment on lines +327 to +343
confirm-pass:
runs-on: ubuntu-latest
needs: [pytest-changes, pytest]
if: always()
steps:
- name: All tests ok
if: ${{ success() || !contains(needs.*.result, 'failure') }}
run: exit 0
- name: One or more tests failed
if: ${{ contains(needs.*.result, 'failure') }}
run: exit 1

- name: debug-print
if: always()
run: |
echo "toJSON(needs) = ${{ toJSON(needs) }}"
echo "toJSON(needs.*.result) = ${{ toJSON(needs.*.result) }}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to add one of these to every dynamic matrix, so it must be added to all the linting workflow as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I left it out orginally because I'm hoping to get everything to run in pre-commit and we can do away with the filters and make the branch protections happy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no way of forcing someone to use a pre-commit and I'm wary of forcing people to install stuff on their laptops. In general, it's a good idea to verify code at the source of truth (github), not in isolation. It's much easier to prevent bad code being added then remove it later.

Perhaps we could run pre-commit before merging? Never really looked into it.

Copy link
Contributor Author

@edmundmiller edmundmiller Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/nf-core/modules/blob/3c464e75051db485c1b37ab9f1ea2182fb3d3533/.github/workflows/test.yml#L29C2-L39C36
We already are ✨

I was talking about pre-commit the tool, not a pre-commit git hook.

Agreed on not expecting people to jump through a ton of hoops to get their local dev environments set up. (But I think this might be over since gitpod switched to self-hosted, but they also adopt devcontainers, which if we can get those setup well, anyone with vscode and docker should be able to start hacking quickly.)

@adamrtalbot
Copy link
Contributor

Also - the GHA says it uses actions/setup-java v4 but so does master and they're in conflict 😬

You should add v4.4.0 for that action as a comment since you are referencing an explicit commit.

@adamrtalbot
Copy link
Contributor

Stress test: #6716

Copy link
Contributor

@adamrtalbot adamrtalbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh just spotted. Currently the tests aren't doing anything (this is every test!):

🚀 nf-test 0.9.0
https://www.nf-test.com/
(c) 2021 - 2024 Lukas Forer and Sebastian Schoenherr

Load .nf-test/plugins/nft-bam/0.3.0/nft-bam-0.3.0.jar
Load .nf-test/plugins/nft-vcf/1.0.7/nft-vcf-1.0.7.jar
Nothing to do.

.github/actions/nf-test-action/action.yml Outdated Show resolved Hide resolved
.github/workflows/lint.yml Outdated Show resolved Hide resolved
Co-authored-by: mashehu <mashehu@users.noreply.github.com>
nextflow secrets set SENTIEON_AUTH_DATA $(python3 tests/modules/nf-core/sentieon/license_message.py encrypt --key "$SENTIEON_ENCRYPTION_KEY" --message "$SENTIEON_LICENSE_MESSAGE")

# TODO If conda-fail.yml exists and matrix = conda skip
# Really we need in-verse tag selection https://github.com/askimed/nf-test/issues/260
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while we wait for that, maybe we can add it here: adamrtalbot/detect-nf-test-changes#12

.github/workflows/nf-test.yml Outdated Show resolved Hide resolved
.github/workflows/pytest-workflow.yml Show resolved Hide resolved
.pre-commit-config.yaml Outdated Show resolved Hide resolved
apptainer.runOptions = '--no-mount tmp --writable-tmpfs --nv'
singularity.runOptions = '--no-mount tmp --writable-tmpfs --nv'
use_gpu = true
}
docker_self_hosted {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the other gpu profile is outdated, needs platform argument

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So cut the other one?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well doesn't matter which one to keep, but make sure it has all the settings from the one in master

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's two on master though 😆

gpu {
docker.runOptions = '-u $(id -u):$(id -g) --gpus device=0'
apptainer.runOptions = '--no-mount tmp --writable-tmpfs --nv'
singularity.runOptions = '--no-mount tmp --writable-tmpfs --nv'
use_gpu = true
}
docker_self_hosted {
docker.enabled = true
docker.fixOwnership = true
docker.runOptions = '--platform=linux/amd64'
}
gpu {
docker.runOptions = '-u $(id -u):$(id -g) --gpus all'
apptainer.runOptions = '--nv'
singularity.runOptions = '--nv'
use_gpu = true
}

Co-authored-by: mashehu <mashehu@users.noreply.github.com>
@edmundmiller
Copy link
Contributor Author

Going to merge since the broken tests aren't in the scope, and hopefully I catch any buygs before Europe wakes up 🤞🏻

@edmundmiller edmundmiller merged commit 3a89745 into master Nov 18, 2024
21 of 35 checks passed
@edmundmiller edmundmiller deleted the fresh-ci branch November 18, 2024 23:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Caused by: java.nio.channels.OverlappingFileLockException
5 participants