Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert ci-shark-ai.yml to use pkgci_shark_ai.yml so that we only build packages once #780

Merged
merged 29 commits into from
Jan 10, 2025

Conversation

renxida
Copy link
Contributor

@renxida renxida commented Jan 8, 2025

This builds on #625, #589 to make progress on issue #584.

This adds a pkgci.yml to run multiple package-based CI tasks after building package using Scott's changes in #667. This gives us the following benefits:

  • Integration test workflows are faster because they now use dev packages, without needing to build them from source or use editable installs. Also, if more integration tests are added, they can reuse the built packages.
  • Users and developers can access the same dev packages to reproduce CI results
  • Only one runner needs the build requirements (potentially including clang, ninja, CMake, Rust, etc.), other runners only need Python.

This also switches to using uv to create venvs, which is faster.

This PR brings shortfin CPU LLM CI time to roughly half an hour on the mi250 runner to a few seconds of package build (fast due to caching) and around 5 minutes of testing.

@renxida renxida changed the title [skip ci] initial commit using scotts code Convert ci-shark-ai.yml to pkgci_shark_ai.yml Jan 8, 2025
@renxida
Copy link
Contributor Author

renxida commented Jan 8, 2025

TODO: add tracing related script changes from #625

@renxida
Copy link
Contributor Author

renxida commented Jan 8, 2025

TODO: add venv creation script improvements scott made in iree

@renxida renxida requested a review from ScottTodd January 9, 2025 17:07
@renxida renxida marked this pull request as ready for review January 9, 2025 17:07
Copy link
Member

@ScottTodd ScottTodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few rough edges here that we can probably ignore for a bit, since the workflow latency savings are well worth it.

.github/workflows/pkgci_shark_ai.yml Outdated Show resolved Hide resolved
.github/workflows/pkgci_shark_ai.yml Outdated Show resolved Hide resolved
build_tools/pkgci/setup_venv.py Outdated Show resolved Hide resolved
pytorch-cpu-requirements.txt Outdated Show resolved Hide resolved
@ScottTodd
Copy link
Member

I made a few small edits to the PR description. You could also pull some of the text (with updated metrics) from #589 and #625.

@renxida renxida requested a review from ScottTodd January 9, 2025 19:49
.github/workflows/pkgci_shark_ai.yml Outdated Show resolved Hide resolved
build_tools/pkgci/setup_venv.py Outdated Show resolved Hide resolved
@renxida
Copy link
Contributor Author

renxida commented Jan 9, 2025

UV issue solved! Weirdly, even after adding a cleaning step, checkout still takes a long time at git clean -ffdx

There's an option here to turn it off but i'm not sure it's a good idea to do that.

What do you think @ScottTodd ? Merge as is or keep trying to make checkout faster?

Incidentally, I tried running it on the standard Ubuntu runners and checkout is blazing fast : /

@ScottTodd
Copy link
Member

We'd need to debug those runner machines themselves to see why the cleanup step part of the checkout action is taking so long. We can keep investigation separate from this PR. Could also use mi300 instead of mi250 machines...

Comment on lines 97 to 99
- name: Clean up repo to make next checkout faster
run: |
git clean -ffdx
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. Going to remove this and collect this into a new issue and ask Sai for help.

.github/workflows/pkgci_shark_ai.yml Outdated Show resolved Hide resolved
.github/workflows/pkgci_shark_ai.yml Outdated Show resolved Hide resolved
@renxida renxida requested a review from ScottTodd January 10, 2025 00:38
@ScottTodd
Copy link
Member

I made a few small edits to the PR description. You could also pull some of the text (with updated metrics) from #589 and #625.

Bump ^ can you add more information to the PR description?

@renxida renxida changed the title Convert ci-shark-ai.yml to pkgci_shark_ai.yml Convert ci-shark-ai.yml to use pkgci_shark_ai.yml so that we only build packages once Jan 10, 2025
@renxida renxida merged commit 8de90a5 into nod-ai:main Jan 10, 2025
32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants