Concurrent CPU Integration Tests + Reuse Model Artifacts (#655)
# Description

This PR adds two things:

1. Reuse model artifacts when multiple tests in the same module request export/compilation of the same artifacts.
2. Add concurrency tests that send `2`, `4`, and `8` requests at the same time.

# Reusing model artifacts

Currently, our `cpu_llm_server_integration_tests` generate new model artifacts for each test, even when several tests request exactly the same artifacts. This makes the tests take much longer than they should, and makes it harder to add more tests without drastically increasing overall test time.

We add a static `MODEL_DIR_CACHE`, which is a hashmap that stores `{ request.params_hash: temporary_dir }`. If a test requests the same artifacts as a previous test, we reuse the existing artifacts instead of generating new ones.

# Adding concurrency tests

We recently found a concurrency bug in the Shortfin LLM Server: when multiple requests are sent at the same time, we end up with responses that contain incorrect tokens.

This adds basic concurrent integration tests for 2, 4, and 8 requests sent in parallel. They are currently marked xfail, but we will be able to use them to validate our fix once it lands, and to guard against concurrency regressions in the future.

We will also extend the `periodic SGLang Integration tests` to test concurrency on GPU with more complex prompts, but for a PR-triggered test, this should serve as a good guard.
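The two ideas above can be sketched as follows. This is a minimal illustration, not the code from this PR: only the name `MODEL_DIR_CACHE` and the `{ params_hash: temporary_dir }` mapping come from the description. The helpers `params_hash`, `get_model_dir`, `send_concurrently`, and the `build` and `send_request` callables are hypothetical stand-ins for the real export/compile step and the HTTP client.

```python
import hashlib
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Module-level cache: params hash (str) -> artifact directory (Path).
# The name comes from the PR description; the layout is an assumption.
MODEL_DIR_CACHE = {}


def params_hash(params):
    """Hypothetical helper: stable hash of the requested parameters."""
    # sort_keys makes the hash independent of dict insertion order.
    canonical = json.dumps(params, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def get_model_dir(params, build):
    """Return a cached artifact directory, building only on a cache miss.

    `build(params, model_dir)` stands in for export + compilation.
    """
    key = params_hash(params)
    if key not in MODEL_DIR_CACHE:
        model_dir = Path(tempfile.mkdtemp(prefix="model_artifacts_"))
        build(params, model_dir)
        MODEL_DIR_CACHE[key] = model_dir
    return MODEL_DIR_CACHE[key]


def send_concurrently(send_request, prompt, n):
    """Send the same prompt n times in parallel and collect the responses.

    `send_request(prompt)` stands in for one request to the server.
    """
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(send_request, prompt) for _ in range(n)]
        # Results are returned in submission order.
        return [future.result() for future in futures]
```

In a pytest suite, a concurrency test along these lines would typically be parametrized over `[2, 4, 8]` with `pytest.mark.parametrize`, marked with `pytest.mark.xfail` until the bug is fixed, and would assert that every response matches the expected output for the prompt.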
Showing 2 changed files with 169 additions and 71 deletions.