Create pkgci.yml and pkgci_build_packages.yml. #589
base: main
Nice! Just two minor things.
elif [[ "${ARCH}" == "aarch64" ]]; then
  # Latest version of ccache is not released for arm64, build it
  git clone --depth 1 --branch "v${CCACHE_VERSION}" https://github.com/ccache/ccache.git
  mkdir -p ccache/build && cd "$_"
  cmake -G "Ninja" -DCMAKE_BUILD_TYPE=Release ..
  ninja
  cp ccache /usr/bin/
Do we need `aarch64` support? We probably can just install ccache for `x86_64`.
We might at some point. I've written this code a few times before, most recently at https://github.com/iree-org/base-docker-images/blob/main/build_tools/install_ccache.sh, and supporting both architectures isn't much extra code. Better to write cross-platform/architecture code whenever possible instead of artificially limiting ourselves.
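To illustrate how little extra code the second architecture needs, here is a minimal sketch of the dispatch. The function name `ccache_install_strategy` and the `prebuilt`/`source` labels are hypothetical and not taken from the linked script; the only thing drawn from this PR is that `aarch64` is built from source because no release binary is available for it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): decide how ccache would be installed
# for a given architecture. Defaults to the architecture reported by uname.
ccache_install_strategy() {
  local arch="${1:-$(uname -m)}"
  case "${arch}" in
    x86_64|amd64)
      # Assumption: a prebuilt release archive exists for this architecture.
      echo "prebuilt"
      ;;
    aarch64|arm64)
      # Per the diff above: no release binary, so build with CMake + Ninja.
      echo "source"
      ;;
    *)
      echo "unsupported"
      return 1
      ;;
  esac
}

ccache_install_strategy x86_64
ccache_install_strategy aarch64
```

Keeping the decision in one `case` statement means a third architecture is a one-branch change rather than a new script.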
I plan on updating https://github.com/nod-ai/base-docker-images/blob/main/dockerfiles/manylinux_x86_64.Dockerfile to more closely match https://github.com/iree-org/base-docker-images/blob/main/dockerfiles/manylinux_x86_64.Dockerfile, as part of upgrading from manylinux2014 to manylinux_2_28:
shark-ai/shortfin/build_tools/build_linux_package.sh
Lines 39 to 43 in 779adc3
# TODO(#130): Update to manylinux_2_28, upstream or a fork
#   * upstream uses a version of gcc that has build warnings/errors
#   * https://github.com/nod-ai/base-docker-images is a bit out of date but can include a recent clang
# MANYLINUX_DOCKER_IMAGE="${MANYLINUX_DOCKER_IMAGE:-quay.io/pypa/manylinux_2_28_${ARCH}:latest}"
MANYLINUX_DOCKER_IMAGE="${MANYLINUX_DOCKER_IMAGE:-quay.io/pypa/manylinux2014_${ARCH}:latest}"
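The default-with-override pattern in that snippet can be sketched as a small function, which would make the eventual switch from manylinux2014 to manylinux_2_28 a one-argument change. The function name `manylinux_image` and its parameters are hypothetical; only the `quay.io/pypa/...` image names and the `MANYLINUX_DOCKER_IMAGE` override come from the script.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the script): compose the Docker image name.
#   $1: architecture, e.g. x86_64 or aarch64 (default: x86_64)
#   $2: manylinux policy, e.g. manylinux2014 or manylinux_2_28
# An already-set MANYLINUX_DOCKER_IMAGE always wins over the computed default.
manylinux_image() {
  local arch="${1:-x86_64}"
  local policy="${2:-manylinux2014}"
  echo "${MANYLINUX_DOCKER_IMAGE:-quay.io/pypa/${policy}_${arch}:latest}"
}

manylinux_image x86_64
manylinux_image aarch64 manylinux_2_28
```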
Co-authored-by: Marius Brehler <marius.brehler@gmail.com>
Next steps with this:
Picking this back up now.
Extremely excited for every bit of CI time reduction.
@@ -75,6 +89,23 @@ function run_in_docker() {
  echo "Using python versions: ${PYTHON_VERSIONS}"
  local orig_path="${PATH}"

  # Configure caching.
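The rest of this hunk is not shown here, so the following is only a sketch of what such a caching step could look like, not the code in the PR. It assumes caching is opt-in through a `CACHE_DIR` variable (as in the test command used elsewhere in this thread), that the cache lives in a hypothetical `ccache` subdirectory, and that a 2G limit matches the "2.0 GB" ceiling in the reported stats. `CCACHE_DIR` and `CCACHE_MAXSIZE` are standard ccache environment variables, and CMake 3.17+ reads `CMAKE_<LANG>_COMPILER_LAUNCHER` from the environment on first configure.

```shell
#!/usr/bin/env bash
# Sketch only: enable ccache when CACHE_DIR is set, otherwise leave the
# environment untouched so uncached builds behave exactly as before.
configure_ccache_env() {
  if [ -z "${CACHE_DIR:-}" ]; then
    return 0  # Caching not requested.
  fi
  # Strip a trailing slash so "/tmp/shortfin/" and "/tmp/shortfin" agree.
  export CCACHE_DIR="${CACHE_DIR%/}/ccache"   # hypothetical layout
  export CCACHE_MAXSIZE="2G"                  # assumed from the stats output
  # Ask CMake to route C and C++ compiler invocations through ccache.
  export CMAKE_C_COMPILER_LAUNCHER="ccache"
  export CMAKE_CXX_COMPILER_LAUNCHER="ccache"
}

CACHE_DIR="/tmp/shortfin/"
configure_ccache_env
echo "CCACHE_DIR=${CCACHE_DIR}"
```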
YES!
#646)

Splitting this off from #589 to make progress on #584.

Tested with

```
CACHE_DIR=/tmp/shortfin/ sudo -E ./shortfin/build_tools/build_linux_package.sh

+ ccache --show-stats
Cacheable calls:    626 /  636 (98.43%)
  Hits:               2 /  626 ( 0.32%)
    Direct:           2 /    2 (100.0%)
    Preprocessed:     0 /    2 ( 0.00%)
  Misses:           624 /  626 (99.68%)
Uncacheable calls:   10 /  636 ( 1.57%)
Local storage:
  Cache size (GB):  0.1 /  2.0 ( 3.10%)
  Hits:               2 /  626 ( 0.32%)
  Misses:           624 /  626 (99.68%)

+ ccache --show-stats
ccache stats:
Cacheable calls:   1252 / 1272 (98.43%)
  Hits:             550 / 1252 (43.93%)
    Direct:         550 /  550 (100.0%)
    Preprocessed:     0 /  550 ( 0.00%)
  Misses:           702 / 1252 (56.07%)
Uncacheable calls:   20 / 1272 ( 1.57%)
Local storage:
  Cache size (GB):  0.1 /  2.0 ( 4.11%)
  Hits:             550 / 1252 (43.93%)
  Misses:           702 / 1252 (56.07%)

+ ccache --show-stats
Cacheable calls:   1878 / 1908 (98.43%)
  Hits:            1098 / 1878 (58.47%)
    Direct:        1098 / 1098 (100.0%)
    Preprocessed:     0 / 1098 ( 0.00%)
  Misses:           780 / 1878 (41.53%)
Uncacheable calls:   30 / 1908 ( 1.57%)
Local storage:
  Cache size (GB):  0.1 /  2.0 ( 5.12%)
  Hits:            1098 / 1878 (58.47%)
  Misses:           780 / 1878 (41.53%)

CACHE_DIR=/tmp/shortfin/ sudo -E ./shortfin/build_tools/build_linux_package.sh

+ ccache --show-stats
ccache stats:
Cacheable calls:   3756 / 3816 (98.43%)
  Hits:            2820 / 3756 (75.08%)
    Direct:        2820 / 2820 (100.0%)
    Preprocessed:     0 / 2820 ( 0.00%)
  Misses:           936 / 3756 (24.92%)
Uncacheable calls:   60 / 3816 ( 1.57%)
Local storage:
  Cache size (GB):  0.1 /  2.0 ( 5.19%)
  Hits:            2820 / 3756 (75.08%)
  Misses:           936 / 3756 (24.92%)
```

So we have multiple configurations getting built (Python versions, tracing enable/disabled), but we still get a reasonable number of cache hits. Definitely room to improve there, but better than nothing.
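Because the counters above are cumulative, the 75.08% figure blends the cold first invocation with the warm second one. A rough way to isolate the second invocation is to subtract the totals reported at the end of the first (1098 hits of 1878 cacheable calls) from the final totals (2820 of 3756). This sketch assumes the stats were not reset between the two invocations; `hit_rate` is a hypothetical helper, not part of ccache.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print hits/total as a percentage with two decimals,
# rounded to nearest, using integer arithmetic only.
hit_rate() {
  local hits="$1" total="$2"
  # Work in hundredths of a percent; adding total/2 rounds to nearest.
  local scaled=$(( (hits * 10000 + total / 2) / total ))
  printf '%d.%02d\n' $(( scaled / 100 )) $(( scaled % 100 ))
}

# Cumulative rate after both invocations, matching the last stats block.
hit_rate 2820 3756                               # prints 75.08

# Second (warm) invocation only: difference of the cumulative counters.
hit_rate $(( 2820 - 1098 )) $(( 3756 - 1878 ))   # prints 91.69
```

Under that assumption the warm rebuild hits the cache for roughly 92% of cacheable calls, which is a more useful number for estimating repeat CI runs than the blended figure.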
Progress on #584. ~~Depends on #666 (the first commit).~~

This refactors the `build_packages.yml` workflow so it can be used via `workflow_call` as part of a "pkgci" setup, as an alternative to creating a new `pkgci_build_packages.yml` workflow as originally proposed in #589. This lets us reuse the same workflow for building stable, nightly, and dev packages, all across the same matrix of Python versions and operating systems. Package builds take about 2 minutes (wall time) across the full matrix, so we might as well build them all, instead of artificially constraining ourselves to a subset like only Linux on Python 3.11.

Triggers for the workflow are now this:

Trigger | Scenario | Build type(s)
-- | -- | --
`schedule` | Nightly pre-release build | `rc`
`workflow_dispatch` | Workflow testing, manual releasing | `rc` default, `stable` and `dev` possible
`workflow_call` | Pull request or push "pkgci" dev builds | `dev` default, `stable` and `rc` possible

With this workflow behavior:

Build type | Version suffix | Cache enabled? | Tracing enabled? | Pushes to release?
-- | -- | -- | -- | --
`stable` | None | No | Yes | No
`rc` | `rcYYYYMMDD` | No | Yes | Yes
`dev` | `.dev0+${{ github.sha }}` | Yes | No | No

Tested over at https://github.com/ScottTodd/shark-ai/actions/workflows/build_packages.yml. Example run: https://github.com/ScottTodd/shark-ai/actions/runs/12245900071 (warm cache)
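The two tables can be read as a pair of lookups: trigger to default build type, then build type to version suffix. The sketch below encodes them in shell purely for illustration. The real logic lives in the workflow YAML, and the function names, argument order, and the sample date and SHA are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical: default build type for each trigger in the first table.
default_build_type() {
  case "$1" in
    schedule|workflow_dispatch) echo "rc" ;;
    workflow_call) echo "dev" ;;
    *) echo "unknown trigger: $1" >&2; return 1 ;;
  esac
}

# Hypothetical: version suffix for each build type in the second table.
#   $1: build type, $2: date as YYYYMMDD, $3: commit SHA
version_suffix() {
  local build_type="$1" build_date="$2" sha="$3"
  case "${build_type}" in
    stable) echo "" ;;                 # no suffix
    rc)     echo "rc${build_date}" ;;
    dev)    echo ".dev0+${sha}" ;;
    *) echo "unknown build type: ${build_type}" >&2; return 1 ;;
  esac
}

# Example with placeholder values (not from a real run).
build_type="$(default_build_type workflow_call)"
echo "${build_type}$(version_suffix "${build_type}" 20240101 0123abc)"
```

With the placeholder inputs this prints `dev.dev0+0123abc`, showing that only dev builds embed the commit SHA in the version.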
… build packages once (#780)

This builds on #625, #589 to make progress on issue #584.

This adds a pkgci.yml to run multiple package-based CI tasks after building packages using Scott's changes in #667. This gives us the following benefits:

* Integration test workflows are faster because they now use dev packages, without needing to build them from source or use editable installs. Also, if more integration tests are added, they can reuse the built packages.
* Users and developers can access the same dev packages to reproduce CI results.
* Only one runner needs the build requirements (potentially including clang, ninja, CMake, Rust, etc.); other runners only need Python.

This also switches to using uv to create venvs, which is faster.

This PR brings shortfin CPU LLM CI time from roughly half an hour on the mi250 runner down to a few seconds of package build (fast due to caching) and around 5 minutes of testing.

---------

Co-authored-by: Scott Todd <scott.todd0@gmail.com>
Progress on #584.
Summary

* `.github/workflows/pkgci_build_packages.yml` that builds `sharktank`, `shortfin`, and `shark-ai` dev packages and uploads them to GitHub artifacts.
* `.github/workflows/pkgci.yml` that just runs `pkgci_build_packages.yml` for now. Other jobs can be migrated to depend on that job and use/test the packages.

Other details

* `ubuntu-24.04` runners (test logs here).
* `SHORTFIN_ENABLE_TRACING` setting through scripts / Docker. For dev packages we can keep tracing disabled (unless there is a clear reason to add it). If the cache hit rate improves then we might be able to enable tracing for low cost.
* `cache: "pip"` from `build_packages.yml` since it is counterproductive for a job that only installs `packaging`. Multiple workflows seem to be writing to the same cache and I see no way to customize the cache key. That, or the cache is unnecessarily large and we just need to prune it manually.