Add Model Zoo testing support (#2990)
attila-dusnoki-htec authored Jul 2, 2024
1 parent 006dec2 commit 497c277
Showing 28 changed files with 2,418 additions and 0 deletions.
4 changes: 4 additions & 0 deletions tools/model_zoo/README.md
@@ -0,0 +1,4 @@
# Model Zoo

- [Test Generator with Datasets](./test_generator/)
- [ONNX Zoo](./onnx_zoo/)
50 changes: 50 additions & 0 deletions tools/model_zoo/onnx_zoo/README.md
@@ -0,0 +1,50 @@
# ONNX Zoo model tester

Helper script that runs [`test_runner.py`](../../test_runner.py) against the [`ONNX Zoo models`](https://onnx.ai/models/) that ship with test data.

## Getting the repository

> [!IMPORTANT]
> Make sure to enable git-lfs.
```bash
git clone https://github.com/onnx/models.git --depth 1
```

## Running the tests

> [!IMPORTANT]
> The argument must point to a folder, not a file.
```bash
# Optional environment variables: VERBOSE=1 traces the script, DEBUG=1 shows git-lfs and tar output
# Defaults: ATOL=0.001 RTOL=0.001 TARGET=gpu
./test_models.sh models/validated
```

You can also pass multiple folders, e.g.:

```bash
./test_models.sh models/validated/text/machine_comprehension/t5/ models/validated/vision/classification/shufflenet/
```
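
`ATOL` and `RTOL` set the absolute and relative tolerances passed to `test_runner.py`. How the runner combines them is not shown here; a common convention (the one used by `numpy.allclose`) accepts a value when `|actual - expected| <= atol + rtol * |expected|`. A minimal sketch under that assumption, where `within_tolerance` is a hypothetical helper and not part of the repository:

```python
def within_tolerance(actual, expected, atol=0.001, rtol=0.001):
    """Return True if every pair satisfies |a - e| <= atol + rtol * |e|.

    Assumption: this mirrors the numpy.allclose convention; the actual
    comparison inside test_runner.py may differ.
    """
    if len(actual) != len(expected):
        return False
    return all(
        abs(a - e) <= atol + rtol * abs(e)
        for a, e in zip(actual, expected)
    )


if __name__ == "__main__":
    # With the default tolerances, an expected value of 1.0 allows a
    # deviation of up to 0.001 + 0.001 * 1.0 = 0.002.
    print(within_tolerance([1.0005], [1.0]))
    print(within_tolerance([1.01], [1.0]))
```

Under this convention the relative term dominates for large outputs and the absolute term dominates near zero, which is why both values are usually tuned together.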

## Results

Results are separated by dtype: `logs/fp32` and `logs/fp16`.

### Helpers

```bash
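# Something went wrong: list logs that contain no PASSED line
grep -HRL PASSED logs
# Runtime error: show matching lines (case-insensitive)
grep -HRi RuntimeError logs/
# Accuracy issue: list logs that contain a FAILED line
grep -HRl FAILED logs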
```
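
The same three checks can be collected into one summary. The following is a sketch only: `summarize_logs` is a hypothetical helper, it assumes the logs end in `.log` (as written by `test_models.sh`), and it relies on the `PASSED`, `FAILED`, and `RuntimeError` markers used by the grep commands above.

```python
from pathlib import Path


def summarize_logs(log_dir):
    """Group *.log files under log_dir by the markers the grep helpers use."""
    summary = {"no_pass": [], "runtime_error": [], "failed": []}
    for path in sorted(Path(log_dir).rglob("*.log")):
        text = path.read_text(errors="replace")
        # Equivalent of `grep -L PASSED`: the file has no PASSED line at all.
        if "PASSED" not in text:
            summary["no_pass"].append(str(path))
        # Equivalent of `grep -i RuntimeError`: case-insensitive match.
        if "runtimeerror" in text.lower():
            summary["runtime_error"].append(str(path))
        # Equivalent of `grep -l FAILED`: the file has at least one FAILED line.
        if "FAILED" in text:
            summary["failed"].append(str(path))
    return summary


if __name__ == "__main__":
    for category, files in summarize_logs("logs").items():
        print(f"{category}: {len(files)} file(s)")
        for name in files:
            print(f"  {name}")
```

A single log can appear in more than one category, for example a run that raises a runtime error and therefore never prints `PASSED`.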

## Cleanup

If the script fails partway through, the following may need manual cleanup:

- Remove the `tmp_model` folder
- Run `git lfs prune` in `models`
118 changes: 118 additions & 0 deletions tools/model_zoo/onnx_zoo/test_models.sh
@@ -0,0 +1,118 @@
#!/bin/bash

#####################################################################################
# The MIT License (MIT)
#
# Copyright (c) 2015-2024 Advanced Micro Devices, Inc. All rights reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
#
#####################################################################################

set -e

WORK_DIR="$(cd -P -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P)"
SCRIPT_PATH="$(dirname "$(dirname "$(dirname "$(readlink -f "$0")")")")/test_runner.py"
TESTER_SCRIPT="${TESTER:-$SCRIPT_PATH}"
ATOL="${ATOL:-0.001}"
RTOL="${RTOL:-0.001}"
TARGET="${TARGET:-gpu}"

if [[ "${DEBUG:-0}" -eq 1 ]]; then
    PIPE=/dev/stdout
else
    PIPE=/dev/null
fi

if [[ "${VERBOSE:-0}" -eq 1 ]]; then
    set -x
fi

# Iterate through input recursively, process any tar.gz file
function iterate() {
    local dir="$1"

    for file in "$dir"/*; do
        if [ -f "$file" ]; then
            if [[ $file = *.tar.gz ]]; then
                process "$file"
            fi
        fi

        if [ -d "$file" ]; then
            iterate "$file"
        fi
    done
}

# Process will download the lfs file, extract model and test data
# Test it with test_runner.py, then cleanup
function process() {
    local file="$1"
    echo "INFO: process $file started"
    # Quote "$file" so paths containing spaces are passed as one argument
    setup "$file"
    test "$file" fp32
    test "$file" fp16
    cleanup "$file"
    echo "INFO: process $file finished"
}

# Download and extract files
function setup() {
    local file="$1"
    echo "INFO: setup $file"
    local_file="$(basename "$file")"
    # git lfs pull has to run inside the repository, so change into the file's folder
    folder="$(cd -P -- "$(dirname -- "$file")" && pwd -P)"
    # The subshell keeps the caller's working directory unchanged
    (cd "$folder" && git lfs pull --include="$local_file" --exclude="") &> "${PIPE}"
    tar xzf "$file" -C "$WORK_DIR/tmp_model" &> "${PIPE}"
}

# Remove tmp files and prune models
function cleanup() {
    local file="$1"
    echo "INFO: cleanup $file"
    # git lfs prune has to run inside the repository, so change into the file's folder
    folder="$(cd -P -- "$(dirname -- "$file")" && pwd -P)"
    (cd "$folder" && git lfs prune) &> "${PIPE}"
    rm -r "$WORK_DIR"/tmp_model/* &> "${PIPE}"
}

# Run test_runner.py and log if something goes wrong
function test() {
    local file="$1"
    echo "INFO: test $file ($2)"
    local_file="$(basename "$file")"
    flag="--atol $ATOL --rtol $RTOL --target $TARGET"
    if [[ "$2" = "fp16" ]]; then
        flag="$flag --fp16"
    fi
    EXIT_CODE=0
    # ${flag} stays unquoted on purpose so it splits into separate arguments
    python3 "$TESTER_SCRIPT" ${flag} "$WORK_DIR"/tmp_model/*/ &> "$WORK_DIR/logs/$2/${local_file//\//_}.log" || EXIT_CODE=$?
    if [[ "${EXIT_CODE:-0}" -ne 0 ]]; then
        echo "WARNING: ${file} failed ($2)"
    fi
}

mkdir -p "$WORK_DIR/logs/fp32/" "$WORK_DIR/logs/fp16/" "$WORK_DIR/tmp_model"
rm -fr "$WORK_DIR"/tmp_model/*

for arg in "$@"; do
    iterate "$(dirname "$(readlink -e "$arg")")/$(basename "$arg")"
done
117 changes: 117 additions & 0 deletions tools/model_zoo/test_generator/README.md
@@ -0,0 +1,117 @@
# Test Generator with Datasets

Helper module to generate real samples from datasets for specific models.

## Prerequisites

```bash
python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
```

To use audio-based datasets, install `libsndfile`:
```bash
apt install libsndfile1
```

## Usage

```bash
usage: generate.py [-h]
                   [--image {all,none,...}]
                   [--text {all,none,...}]
                   [--audio {all,none,...}]
                   [--output-folder-prefix OUTPUT_FOLDER_PREFIX]
                   [--sample-limit SAMPLE_LIMIT]
                   [--decode-limit DECODE_LIMIT]

optional arguments:
  -h, --help            show this help message and exit
  --image {all,none,...}
                        Image models to test with imagenet-2012-val dataset samples
  --text {all,none,...}
                        Text models to test with squad-hf dataset samples
  --audio {all,none,...}
                        Audio models to test with librispeech-asr dataset samples
  --output-folder-prefix OUTPUT_FOLDER_PREFIX
                        Output path will be "<this-prefix>/<dataset-name>/<model-name>"
  --sample-limit SAMPLE_LIMIT
                        Max number of samples generated. Use 0 to ignore it.
  --decode-limit DECODE_LIMIT
                        Max number of sum-samples generated for decoder models. Use 0 to ignore it. (Only for decoder models)
```

> [!NOTE]
> Some models require access permission; log in with `huggingface-cli login`.

To generate everything:
```bash
python generate.py
```

To generate a subset of the supported models, pass each option one of:
- `none` to skip that category
- `all` for every model
- `<name> ...` for a list of supported model names

```bash
python generate.py --image resnet50_v1.5 clip-vit-large-patch14 --text none --audio none
```
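
For illustration, an option like `--image` could be declared with `argparse` as sketched below. This is not the code in `generate.py`: `IMAGE_MODELS`, `build_parser`, and `resolve` are hypothetical names, the model list only contains the two names from the example above, and the default of `all` is an assumption based on `python generate.py` generating everything.

```python
import argparse

# Hypothetical list; the real names are defined by the model classes.
IMAGE_MODELS = ["resnet50_v1.5", "clip-vit-large-patch14"]


def build_parser():
    parser = argparse.ArgumentParser(prog="generate.py")
    parser.add_argument(
        "--image",
        nargs="+",
        choices=["all", "none", *IMAGE_MODELS],
        default=["all"],  # assumption: no flag means every model
        help="Image models to test with imagenet-2012-val dataset samples",
    )
    return parser


def resolve(selection, supported):
    """Expand a parsed selection into concrete model names."""
    if "none" in selection:
        return []
    if "all" in selection:
        return list(supported)
    return list(selection)


if __name__ == "__main__":
    args = build_parser().parse_args(["--image", "resnet50_v1.5"])
    print(resolve(args.image, IMAGE_MODELS))
```

Using `choices` makes `argparse` reject unsupported names before any download starts.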

## Test models

`test_models.sh` runs all downloaded models on the `generated` samples. The results are written to `logs`.

```bash
./test_models.sh generated/
```

> [!NOTE]
> `generated` is the default output folder. If you changed `--output-folder-prefix`, pass that folder instead.

## Adding more models

To add more models, first choose the proper file:
- [image](./sample_generator/model/image.py)
- [text](./sample_generator/model/text.py)
- [audio](./sample_generator/model/audio.py)
- [hybrid](./sample_generator/model/hybrid.py)

For example, a basic model (e.g. ResNet) would be added like this:

```python
class ResNet50_v1_5(OptimumHFModelDownloadMixin,
                    AutoImageProcessorHFMixin, BaseModel):
    @property
    def model_id(self):
        return "microsoft/resnet-50"

    @staticmethod
    def name():
        return "resnet50_v1.5"
```

Define the class with the proper `Mixin`s:
- `OptimumHFModelDownloadMixin`: Downloads the model from Hugging Face and exports it to ONNX with Optimum
- `AutoImageProcessorHFMixin`: Defines the processor from Hugging Face (this depends on the model type)
- `BaseModel`: Default model type; the other choice is `DecoderModel`

Provide the 2 mandatory fields:
- `model_id`: Hugging Face model ID (e.g. `microsoft/resnet-50`)
- `name`: unique name for the model

To add a more complex model (e.g. Decoder), check [text](./sample_generator/model/text.py).

[generate.py](./generate.py) also needs to be updated to include the new model.
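
One way to see why `name` has to be unique is a name-to-class lookup, sketched below. The classes here are simplified stand-ins written for this example (the real `BaseModel` and mixins are in `sample_generator`), and `build_registry` is a hypothetical helper, not a function from `generate.py`.

```python
class BaseModel:
    """Simplified stand-in for the project's BaseModel."""

    @property
    def model_id(self):
        raise NotImplementedError

    @staticmethod
    def name():
        raise NotImplementedError


class ResNet50_v1_5(BaseModel):
    @property
    def model_id(self):
        return "microsoft/resnet-50"

    @staticmethod
    def name():
        return "resnet50_v1.5"


def build_registry(model_classes):
    """Map each model's name() to its class, rejecting duplicate names."""
    registry = {}
    for cls in model_classes:
        key = cls.name()
        if key in registry:
            raise ValueError(f"duplicate model name: {key}")
        registry[key] = cls
    return registry


if __name__ == "__main__":
    registry = build_registry([ResNet50_v1_5])
    model = registry["resnet50_v1.5"]()
    print(model.model_id)
```

Because `name` is a `staticmethod`, the lookup can be built without instantiating (and therefore without downloading) any model.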

## Adding more datasets

The 3 most common use cases are handled:
- `Image`: with [imagenet](./sample_generator/dataset/imagenet.py)
- `Text`: with [squad](./sample_generator/dataset/squad.py)
- `Audio`: with [librispeech](./sample_generator/dataset/librispeech.py)

To add a new use case, e.g. Video, create a new Python file in the `dataset` folder and derive a new class from Base.

[generate.py](./generate.py) also needs to be updated to include the new dataset.
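
A rough sketch of what such a dataset class might look like follows. The interface is assumed for illustration: `BaseDataset`, its `name` and `samples` methods, `VideoDataset`, and `take` are all hypothetical and may not match the actual base class in `sample_generator/dataset`. The `take` helper mirrors the documented `--sample-limit` behavior, where `0` means no limit.

```python
from abc import ABC, abstractmethod
from itertools import islice


class BaseDataset(ABC):
    """Hypothetical stand-in for the dataset base class."""

    @abstractmethod
    def name(self):
        """Dataset name used in the output path."""

    @abstractmethod
    def samples(self):
        """Yield raw samples one at a time."""


class VideoDataset(BaseDataset):
    def __init__(self, clips):
        # In a real dataset this would load or stream the data source.
        self._clips = list(clips)

    def name(self):
        return "video-example"

    def samples(self):
        yield from self._clips


def take(dataset, sample_limit):
    """Return up to sample_limit samples; 0 means no limit."""
    iterator = dataset.samples()
    if sample_limit == 0:
        return list(iterator)
    return list(islice(iterator, sample_limit))


if __name__ == "__main__":
    dataset = VideoDataset(["clip-a", "clip-b", "clip-c"])
    print(dataset.name(), take(dataset, 2))
```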