-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use venv in CI #379
Use venv in CI #379
Conversation
.github/workflows/perf.yaml
Outdated
source ${VENV_DIR}/bin/activate | ||
echo PATH=$PATH >> $GITHUB_ENV |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not keep the venv active across job steps from what I can tell.
See that the logs are not using .turbine-venv
: https://github.com/iree-org/iree-turbine/actions/runs/12714371230/job/35446471647?pr=379#step:5:20
I usually explicitly activate the venv in each step that needs it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder what's going on, /bin/activate
doesn't do anything special besides updating PATH
and unsetting PYTHONHOME
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, how PATH
env should look like
PATH=/home/runner/work/iree-turbine/iree-turbine/.turbine-venv/bin:/opt/hostedtoolcache/Python/3.11.11/x64/bin:/opt/hostedtoolcache/Python/3.11.11/x64:/snap/bin:/home/runner/.local/bin:/opt/pipx_bin:/home/runner/.cargo/bin:/home/runner/.config/composer/vendor/bin:/usr/local/.ghcup/bin:/home/runner/.dotnet/tools:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
And what it's getting instead in next step
PATH=/opt/hostedtoolcache/Python/3.11.11/x64/bin:/opt/hostedtoolcache/Python/3.11.11/x64:/home/runner/work/iree-turbine/iree-turbine/.turbine-venv/bin:/opt/hostedtoolcache/Python/3.11.11/x64/bin:/opt/hostedtoolcache/Python/3.11.11/x64:/snap/bin:/home/runner/.local/bin:/opt/pipx_bin:/home/runner/.cargo/bin:/home/runner/.config/composer/vendor/bin:/usr/local/.ghcup/bin:/home/runner/.dotnet/tools:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
Our path is still here, but system python was prepended to path again
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha, echo "$VENV_DIR/bin" >> $GITHUB_PATH
seems to work and it broke torch as it cannot find GPU anymore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder what's going on,
/bin/activate
doesn't do anything special besides updatingPATH
and unsettingPYTHONHOME
It also sets the VIRTUAL_ENV
environment variable, which some tools look for. It's safer to just source the activate script in each step that needs it IMO.
Haha,
echo "$VENV_DIR/bin" >> $GITHUB_PATH
seems to work and it broke torch as it cannot find GPU anymore
That is overwriting PATH
, not appending to it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is overwriting PATH, not appending to it?
It's appending, according to docs
echo "$HOME/.local/bin" >> "$GITHUB_PATH"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, gpu tests were failing because we were installing cpu torch first and then pip install -r pytorch-rocm-requirements.txt
was doing nothing. Now things seems to work (besides usual flaky attention tests), PTAL.
.github/workflows/ci.yaml
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the jobs in this file run on ephemeral github-hosted machines, using a virtual environment is less important. Could keep the file as-is, with pip caching.
Not that much of a difference in time though:
- Before: https://github.com/iree-org/iree-turbine/actions/runs/12675726967/job/35327371540 (restores a 200MB cache from github then installs packages from it, takes ~20 seconds)
- This PR: https://github.com/iree-org/iree-turbine/actions/runs/12714371228/job/35444294575?pr=379 (downloads packages from pypi and installs them, takes ~20 seconds)
# Note: We install in three steps in order to satisfy requirements | ||
# from non default locations first. Installing the PyTorch CPU | ||
# wheels saves multiple minutes and a lot of bandwidth on runner setup. | ||
pip install --no-compile -r pytorch-rocm-requirements.txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this is pretty slow on the mi250 runner (11m30s to install deps for 5 minutes of running tests): https://github.com/iree-org/iree-turbine/actions/runs/12757197012/job/35556958617?pr=379
We noticed similar setup time issues over at nod-ai/shark-ai#780. One solution there was to use a different runner, but the coverage on multiple accelerator types is useful here. We could also try using a cache and/or https://github.com/astral-sh/uv instead of pip.
I still think it's worth using a venv instead of installing packages on persistent runners at the system level, though I do wish the default install steps for these requirements was faster. What do you think? Is this workflow performance acceptable for developers working in iree-turbine, or should we iterate some more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if setup-python
s cache: 'pip'
can make any difference. But I agree that using venv is better than system packages. @harsh-nod @raikonenfnu FYI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I tried the cache provided by setup-python
on nod-ai/shark-ai#640 and found that it wasn't compatible or at least didn't help.
Check the timestamps in the logs to see what is taking time. It looks like the downloads were fast or already cached, but the install itself was slow. Switching from pip to uv can help with install time (sometimes by a factor of 10-100x)
Mon, 13 Jan 2025 23:06:58 GMT Collecting pytorch-triton-rocm==3.1.0 (from torch>=2.3.0->-r pytorch-rocm-requirements.txt (line 2))
Mon, 13 Jan 2025 23:06:58 GMT Downloading https://download.pytorch.org/whl/pytorch_triton_rocm-3.1.0-cp311-cp311-linux_x86_64.whl (344.9 MB)
Mon, 13 Jan 2025 23:07:01 GMT ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 344.9/344.9 MB 123.0 MB/s eta 0:00:00
Mon, 13 Jan 2025 23:07:01 GMT Collecting sympy==1.13.1 (from torch>=2.3.0->-r pytorch-rocm-requirements.txt (line 2))
...
Mon, 13 Jan 2025 23:07:03 GMT Installing collected packages: mpmath, typing-extensions, sympy, pillow, numpy, networkx, MarkupSafe, fsspec, filelock, pytorch-triton-rocm, jinja2, torch, torchvision, torchaudio
Mon, 13 Jan 2025 23:14:18 GMT Successfully installed MarkupSafe-2.1.5 filelock-3.13.1 fsspec-2024.2.0 jinja2-3.1.3 mpmath-1.3.0 networkx-3.2.1 numpy-1.26.3 pillow-10.2.0 pytorch-triton-rocm-3.1.0 sympy-1.13.1 torch-2.5.1+rocm6.2 torchaudio-2.5.1+rocm6.2 torchvision-0.20.1+rocm6.2 typing-extensions-4.9.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, mi250 runner spent most of the install time in
Tue, 14 Jan 2025 00:43:03 GMT changing mode of /groups/aig_sharks/actions-runner-iree/_work/iree-turbine/iree-turbine/.turbine-venv/bin/proton-viewer to 755
Tue, 14 Jan 2025 00:48:43 GMT changing mode of /groups/aig_sharks/actions-runner-iree/_work/iree-turbine/iree-turbine/.turbine-venv/bin/convert-caffe2-to-onnx to 755
But mi300 didn't
Tue, 14 Jan 2025 00:42:27 GMT changing mode of /home/sai/actions-runner-iree-turbine/_work/iree-turbine/iree-turbine/.turbine-venv/bin/proton-viewer to 755
Tue, 14 Jan 2025 00:43:18 GMT changing mode of /home/sai/actions-runner-iree-turbine/_work/iree-turbine/iree-turbine/.turbine-venv/bin/convert-caffe2-to-onnx to 755
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ScottTodd We decided we can live with that times, let's merge it.
This reverts commit 4ba07eb. Signed-off-by: Ivan Butygin <ivan.butygin@gmail.com>
No description provided.