TorchVision 0.13, including new Multi-weights API, new pre-trained weights, and more

@NicolasHug NicolasHug released this 28 Jun 16:45
da3794e

Highlights

Models

Multi-weight support API

TorchVision v0.13 offers a new Multi-weight support API that lets the existing model builder methods load different pre-trained weights:

from torchvision.models import *

# Old weights with accuracy 76.130%
resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)

# New weights with accuracy 80.858%
resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)

# Best available weights (currently alias for IMAGENET1K_V2)
# Note that these weights may change across versions
resnet50(weights=ResNet50_Weights.DEFAULT)

# Strings are also supported
resnet50(weights="IMAGENET1K_V2")

# No weights - random initialization
resnet50(weights=None)
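
One way to picture why `DEFAULT` and the string form resolve to the same weights is an enum with an alias member. The sketch below is a toy built only on Python's standard `enum` module. `ToyResNet50Weights` and `resolve` are hypothetical names used for illustration, not TorchVision's implementation, and each member stores only the accuracy quoted above.

```python
from enum import Enum
from typing import Optional, Union


class ToyResNet50Weights(Enum):
    # Hypothetical stand-in: each member stores only the accuracy quoted above.
    IMAGENET1K_V1 = 76.130
    IMAGENET1K_V2 = 80.858
    # Same value as IMAGENET1K_V2, so Enum treats DEFAULT as an alias of it.
    DEFAULT = IMAGENET1K_V2


def resolve(
    weights: Union[ToyResNet50Weights, str, None]
) -> Optional[ToyResNet50Weights]:
    """Accept an enum member, a member name as a string, or None."""
    if weights is None or isinstance(weights, ToyResNet50Weights):
        return weights
    # Lookup by name; raises KeyError for an unknown name.
    return ToyResNet50Weights[weights]


print(resolve("DEFAULT") is ToyResNet50Weights.IMAGENET1K_V2)  # True
print(resolve(None))  # None
```

Because the alias points at a concrete member, code that needs reproducible results can pin that member (for example `IMAGENET1K_V2`) instead of `DEFAULT`.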

The new API bundles important details with the weights, such as the preprocessing transforms and metadata like the category labels. Here is how to make the most of it:

from torchvision.io import read_image
from torchvision.models import resnet50, ResNet50_Weights

img = read_image("test/assets/encode_jpeg/grace_hopper_517x606.jpg")

# Step 1: Initialize model with the best available weights
weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights)
model.eval()

# Step 2: Initialize the inference transforms
preprocess = weights.transforms()

# Step 3: Apply inference preprocessing transforms
batch = preprocess(img).unsqueeze(0)

# Step 4: Use the model and print the predicted category
prediction = model(batch).squeeze(0).softmax(0)
class_id = prediction.argmax().item()
score = prediction[class_id].item()
category_name = weights.meta["categories"][class_id]
print(f"{category_name}: {100 * score:.1f}%")

You can read more about the new API in the docs. To provide your feedback, please use this dedicated GitHub issue.

New architectures and model variants

Classification

The Swin Transformer and EfficientNetV2 are two popular classification models which are often used for downstream vision tasks. This release includes 6 pre-trained weights for their classification variants. Here is how to use the new models:

import torch
from torchvision.models import *

image = torch.rand(1, 3, 224, 224)
model = swin_t(weights="DEFAULT").eval()
prediction = model(image)

image = torch.rand(1, 3, 384, 384)
model = efficientnet_v2_s(weights="DEFAULT").eval()
prediction = model(image)

In addition to the above, we also provide new variants for existing architectures such as ShuffleNetV2, ResNeXt and MNASNet. The accuracies of all the new pre-trained models obtained on ImageNet-1K are shown below:

Model Acc@1 Acc@5
swin_t 81.474 95.776
swin_s 83.196 96.36
swin_b 83.582 96.64
efficientnet_v2_s 84.228 96.878
efficientnet_v2_m 85.112 97.156
efficientnet_v2_l 85.808 97.788
resnext101_64x4d 83.246 96.454
resnext101_64x4d (quantized) 82.898 96.326
shufflenet_v2_x1_5 72.996 91.086
shufflenet_v2_x1_5 (quantized) 72.052 90.700
shufflenet_v2_x2_0 76.230 93.006
shufflenet_v2_x2_0 (quantized) 75.354 92.488
mnasnet0_75 71.180 90.496
mnasnet1_3 76.506 93.522

We would like to thank Hu Ye for contributing the Swin Transformer implementation to TorchVision.

[BETA] Object Detection and Instance Segmentation

We have introduced 3 new model variants for RetinaNet, FasterRCNN and MaskRCNN that include several post-paper architectural optimizations and improved training recipes. All models can be used similarly:

import torch
from torchvision.models.detection import *

images = [torch.rand(3, 800, 600)]
model = retinanet_resnet50_fpn_v2(weights="DEFAULT")
# model = fasterrcnn_resnet50_fpn_v2(weights="DEFAULT")
# model = maskrcnn_resnet50_fpn_v2(weights="DEFAULT")
model.eval()
prediction = model(images)
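
In eval mode, the detection models return one dictionary per input image with `boxes`, `labels` and `scores` tensors (Mask R-CNN also adds `masks`). A common next step is to drop low-confidence detections. The sketch below shows one way to do that; the helper `keep_confident`, the 0.5 threshold and the hand-made `example` dictionary are assumptions for illustration, not part of TorchVision.

```python
import torch


def keep_confident(prediction, score_threshold=0.5):
    """Keep only detections whose score is at least score_threshold."""
    keep = prediction["scores"] >= score_threshold
    # Every tensor is indexed along its first (per-detection) dimension.
    return {name: values[keep] for name, values in prediction.items()}


# Hand-made stand-in for one element of the list returned by model(images).
example = {
    "boxes": torch.tensor([[10.0, 20.0, 110.0, 220.0], [5.0, 5.0, 50.0, 60.0]]),
    "labels": torch.tensor([1, 18]),
    "scores": torch.tensor([0.92, 0.31]),
}

filtered = keep_confident(example)
print(filtered["labels"].tolist())  # [1]
```

With a real model, the same helper would be applied to each element of `prediction`.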

Below we present the metrics of the new variants on COCO val2017. In parentheses we denote the improvement over the old variants:

Model Box mAP Mask mAP
retinanet_resnet50_fpn_v2 41.5 (+5.1) -
fasterrcnn_resnet50_fpn_v2 46.7 (+9.7) -
maskrcnn_resnet50_fpn_v2 47.4 (+9.5) 41.8 (+7.2)
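
Since the table lists the new metric and the gain, the metric of each old variant follows by subtraction. The snippet below is a small worked example using only the numbers in the table; the dictionary keys are labels made up for readability.

```python
# (new mAP, improvement over the old variant), copied from the table above.
new_variants = {
    "retinanet_resnet50_fpn_v2 (box)": (41.5, 5.1),
    "fasterrcnn_resnet50_fpn_v2 (box)": (46.7, 9.7),
    "maskrcnn_resnet50_fpn_v2 (box)": (47.4, 9.5),
    "maskrcnn_resnet50_fpn_v2 (mask)": (41.8, 7.2),
}

# Old mAP = new mAP - improvement, rounded to the table's one decimal place.
old_map = {name: round(new - gain, 1) for name, (new, gain) in new_variants.items()}

for name, value in old_map.items():
    print(f"{name}: old mAP {value}")
```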

We would like to thank Ross Girshick, Piotr Dollar, Vaibhav Aggarwal, Francisco Massa and Hu Ye for their past research and contributions to this work.

New pre-trained weights

SWAG weights

The ViT and RegNet model variants offer new pre-trained SWAG (Supervised Weakly from hashtAGs) weights. The most accurate of these, ViT_H_14 fine-tuned end-to-end, achieves 88.6% accuracy on ImageNet-1K. We currently offer two versions of the weights: 1) end-to-end fine-tuned weights on ImageNet-1K (highest accuracy) and 2) frozen trunk weights with a linear classifier fit on ImageNet-1K (great for transfer learning). The detailed accuracies of each model variant are shown below:

Model Weights Acc@1 Acc@5
RegNet_Y_16GF_Weights.IMAGENET1K_SWAG_E2E_V1 86.012 98.054
RegNet_Y_16GF_Weights.IMAGENET1K_SWAG_LINEAR_V1 83.976 97.244
RegNet_Y_32GF_Weights.IMAGENET1K_SWAG_E2E_V1 86.838 98.362
RegNet_Y_32GF_Weights.IMAGENET1K_SWAG_LINEAR_V1 84.622 97.48
RegNet_Y_128GF_Weights.IMAGENET1K_SWAG_E2E_V1 88.228 98.682
RegNet_Y_128GF_Weights.IMAGENET1K_SWAG_LINEAR_V1 86.068 97.844
ViT_B_16_Weights.IMAGENET1K_SWAG_E2E_V1 85.304 97.65
ViT_B_16_Weights.IMAGENET1K_SWAG_LINEAR_V1 81.886 96.18
ViT_L_16_Weights.IMAGENET1K_SWAG_E2E_V1 88.064 98.512
ViT_L_16_Weights.IMAGENET1K_SWAG_LINEAR_V1 85.146 97.422
ViT_H_14_Weights.IMAGENET1K_SWAG_E2E_V1 88.552 98.694
ViT_H_14_Weights.IMAGENET1K_SWAG_LINEAR_V1 85.708 97.73

The weights can be loaded normally as follows:

from torchvision.models import *

model1 = vit_h_14(weights="IMAGENET1K_SWAG_E2E_V1")
model2 = vit_h_14(weights="IMAGENET1K_SWAG_LINEAR_V1")

The SWAG weights are released under the Attribution-NonCommercial 4.0 International license. We would like to thank Laura Gustafson, Mannat Singh and Aaron Adcock for their work and support in making the weights available to TorchVision.

Model Refresh

The release of the Multi-weight support API enabled us to refresh the most popular models and offer more accurate weights. On average, we improved each model by ~3 points. The new recipe was developed on top of ResNet50, and its details were covered in a previous blog post.

Model Old weights (Acc@1) New weights (Acc@1)
efficientnet_b1 78.642 79.838
mobilenet_v2 71.878 72.154
mobilenet_v3_large 74.042 75.274
regnet_y_400mf 74.046 75.804
regnet_y_800mf 76.42 78.828
regnet_y_1_6gf 77.95 80.876
regnet_y_3_2gf 78.948 81.982
regnet_y_8gf 80.032 82.828
regnet_y_16gf 80.424 82.886
regnet_y_32gf 80.878 83.368
regnet_x_400mf 72.834 74.864
regnet_x_800mf 75.212 77.522
regnet_x_1_6gf 77.04 79.668
regnet_x_3_2gf 78.364 81.196
regnet_x_8gf 79.344 81.682
regnet_x_16gf 80.058 82.716
regnet_x_32gf 80.622 83.014
resnet50 76.13 80.858
resnet50 (quantized) 75.92 80.282
resnet101 77.374 81.886
resnet152 78.312 82.284
resnext50_32x4d 77.618 81.198
resnext101_32x8d 79.312 82.834
resnext101_32x8d (quantized) 78.986 82.574
wide_resnet50_2 78.468 81.602
wide_resnet101_2 78.848 82.51

We would like to thank Piotr Dollar, Mannat Singh and Hugo Touvron for their past research and contributions to this work.

Ops and Transforms

New Augmentations, Layers and Losses

This release brings a number of new primitives which can be used to produce SOTA models. Highlights include the AugMix data-augmentation method, the DropBlock layer, the cIoU/dIoU losses and many more. We would like to thank Aditya Oke, Abhijit Deo, Yassine Alouini and Hu Ye for contributing to the project and for helping us keep TorchVision relevant and fresh.
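
As a reminder of what the dIoU loss measures, here is a scalar sketch of the published Distance-IoU formula: one minus the IoU, plus the squared distance between box centers divided by the squared diagonal of the smallest enclosing box. This is plain Python for a single pair of boxes, not TorchVision's tensor implementation, and `diou_loss` is a hypothetical name.

```python
def diou_loss(box_a, box_b):
    """Distance-IoU loss for two (x1, y1, x2, y2) boxes with positive area."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection over union.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union

    # Squared distance between the two box centers.
    center_dist_sq = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 + (
        (ay1 + ay2) / 2 - (by1 + by2) / 2
    ) ** 2

    # Squared diagonal of the smallest box enclosing both inputs.
    diag_sq = (max(ax2, bx2) - min(ax1, bx1)) ** 2 + (
        max(ay2, by2) - min(ay1, by1)
    ) ** 2

    return 1.0 - iou + center_dist_sq / diag_sq


print(diou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # 0.0 for identical boxes
print(round(diou_loss((0, 0, 2, 2), (1, 1, 3, 3)), 4))
```

Unlike a plain IoU loss, the center-distance term keeps the loss informative even when the boxes do not overlap.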

Documentation

We completely revamped our models documentation to make it easier to browse, and added key information such as supported image sizes and the image pre-processing steps of pre-trained weights. We now have a main model page with summary tables of the available weights, and each model has a dedicated page. Each model builder is also documented on its own page, with more details about the available weights, including accuracy, minimal image size, links to training recipes, and other valuable info. For comparison, our previous models docs are here. To provide feedback on the new documentation, please use the dedicated GitHub issue.

Backward-incompatible changes

The new Multi-weight support API replaced the legacy “pretrained” parameter of model builders. Both solutions are currently supported to maintain backwards compatibility but our intention is to remove the deprecated API in 2 versions. Migrating to the new API is very straightforward. The following method calls between the 2 APIs are all equivalent:

from torchvision.models import resnet50, ResNet50_Weights

# Using pretrained weights:
resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)
resnet50(weights="IMAGENET1K_V1")
resnet50(pretrained=True)  # deprecated
resnet50(True)  # deprecated

# Using no weights:
resnet50(weights=None)
resnet50()
resnet50(pretrained=False)  # deprecated
resnet50(False)  # deprecated

Deprecations

[models, models.quantization] Reinstate and deprecate model_urls and quant_model_urls (#5992)
[transforms] Deprecate int as interpolation argument type (#5974)

New Features

[models] New Multi-weight API support (#5618, #5859, #6047, #6026, #5848)
[models] Adding Swin Transformer architecture (#5491)
[models] Adding EfficientNetV2 architecture (#5450)
[models] Adding detection model improved weights: RetinaNet, MaskRCNN, FasterRCNN (#5756, #5773, #5763)
[models] Adding classification model weight: resnext101 64x4d, mnasnet0_75, mnasnet1_3 (#5935, #6019)
[models] Add SWAG model pretrained weights (#5714, #5722, #5732, #5793, #5721)
[ops] Adding IoU loss function variants: DIoU, CIoU (#5786, #5776)
[ops] Adding various ops and test for ops (#6053, #5416, #5792, #5783)
[transforms] Adding AugMix transforms implementation (#5411)
[reference scripts] Support custom weight decay setting in classification reference script (#5671)
[transforms, reference scripts] Improve detection reference script: Scale Jitter, RandomShortestSize, FixedSizeCrop (#5435, #5610, #5607)
[ci] Add M1 support (#6167)
[ci] Add Python-3.10 (build and test) (#5420)

Improvements

[documentation] Complete new revamp of models documentation (#5821, #5876, #5899, #6025, #5885, #5884, #5886, #5891, #6023, #6009, #5852, #5831, #5832, #6003, #6013, #5856, #6004, #6005, #5878, #6012, #5894, #6002, #5854, #5864, #5920, #5869, #5871, #6021, #6006, #6016, #5905, #6028, #5915, #5924, #5977, #5918, #5921, #5934, #5936, #5937, #5933, #5949, #5988, #5962, #5963, #5975, #5900, #5917, #5895, #5901, #6033, #6032, #6030, #5904, #5661, #6035, #6049, #6036, #5908, #5907, #6044, #6039, #5874, #6151)
[documentation] Various documentation improvements (#5695, #5930, #5814, #5799, #5827, #5796, #5923, #5599, #5554, #5995, #5457, #6163, #6031, #6000, #5847, #6024)
[documentation] Add warnings in docs to document Beta APIs (#6115)
[datasets] improve GDrive downloads (#5704, #5645)
[datasets] indicate md5 checksum is not used for security (#5717)
[models] Add shufflenetv2 1.5 and 2.0 weights (#5906)
[models] Reduce unnecessary cuda sync in anchor_utils.py (#5515)
[models] Adding improved MobileNetV2 weights (#5560)
[models] Remove (N, T, H, W, C) => (N, T, C, H, W) from presets (#6058)
[models] add swin_s and swin_b variants and improved swin_t (#6048)
[models] Update ShuffleNetV2 annotations for x1_5 and x2_0 variants (#6022)
[models] Better error message in ViT (#5820)
[models, ops] Add private support for ciou and diou (#5984, #5685, #5690)
[models, reference scripts] Various improvements to detection recipe and models (#5715, #5444)
[transforms, tests] add functional vertical flip tests on segmentation mask (#5860)
[transforms] make _max_value jit-scriptable (#5623)
[transforms] Make ScaleJitter proportional (#5559)
[transforms] add tensor kernels for normalize and erase (#5462)
[transforms] Update transforms following PIL deprecation (#5898)
[transforms, models, datasets…] Replace asserts with exceptions (#5587, #5659)
[utils] add warning if font is not set in draw_bounding_boxes (#5785)
[utils] Throw warning for empty masks or box tensors on draw_segmentation_masks and draw_bounding_boxes (#5857)
[video] Add output_format to video datasets and readers (#6061)
[video, io] Better compatibility with FFMPEG 5.0 (#5644)
[video, io] Allow cuda device to be passed without the index for GPU decoding (#5505)
[reference scripts] Simplify EMA to use Pytorch's update_parameters (#5469)
[reference scripts] Reduce variance of evaluation in reference (#5819)
[reference scripts] Various improvements to RAFT training reference (#5590)
[tests] Speed up Model tests by 20% (#5574)
[tests] Make test suite fail on unexpected test success (#5556)
[tests] Skip big model in test to reduce memory usage in CI (#5903, #5902)
[tests] Improve test of backbone utils (#5552)
[tests] Validate against expected files on videos (#6077)
[ci] Support for CUDA 11.6 (#5803, #5862)
[ci] pre-download model weights in CI docs build (#5625)

Bug Fixes

[transforms] remove option to pass fill as str in transforms (#5632)
[transforms] Better handling for Pad's fill argument (#5596)
[transforms] [FBcode->GH] Fix accimage tests (#5545)
[transforms] Update _pil_constants.py (#6154) (#6156)
[transforms] Fix resize transform when size == small_edge_size and max_size isn't None (#5409)
[transforms] Fixed rotate transform with expand inconsistency (#5677)
[transforms] Fixed upstream issue with padding (#5875)
[transforms] Fix functional.adjust_gamma (#5427)
[models] Respect strict=False when loading detection models (#5841)
[models] Fix resnet norm initialization (#6082) (#6085)
[models] Use frozen BN only if pre-trained for detection models. (#5443)
[models] fix fcos gtarea calculation (#5816)
[models, onnx] Add topk min function for trace and onnx (#5310)
[models, tests] fix mobilenet norm layer test (#5643)
[reference scripts] Fix regression on Detection training script (#5985)
[datasets] do not re-download from GDrive if file is already present (#5805)
[datasets] Fix datasets: kinetics, Flowers102, VOC_2009, INaturalist 2021_train, caltech (#5578, #5775, #5425, #5844, #5789)
[documentation] Fixes device mismatch issue while building docs (#5428)
[documentation] Fix Accuracy meta-data on shufflenetv2 (#5896)
[documentation] fix typo in docstrings of some transforms (#5609)
[video, documentation] Fix append of audio_pts (#5488)
[io, tests] More robust check in tests for 16 bits images (#5652)
[video, io] Fix shape mismatch error in video reader (#5489)
[io] Address nvjpeg leak on CUDA < 11.6 issue (#5713, #5482)
[ci] Fixing issue with setup_env.sh in docker: resolve "unsafe directory" error (#6106) (#6109)
[ci] fix documentation version problems when new release is tagged (#5583)
[ci] Replace jcenter and fix version for android (#6046)
[tests] Add .float() before .mean() on test_backbone_utils.py because .mean() doesn't accept integer dtype (#6090) (#6091)
[tests] Fix keypointrcnn_resnet50_fpn flaky test (#5911)
[tests] Disable test_encode|write_jpeg_reference tests (#5910)
[mobile] Bump up LibTorchvision version number for Podspec to release Cocoapods (#5624)
[feature extraction] Add default tracer args for model feature extraction function (#5637)
[build] Fix libtorchvision.so not able to encode images by adding *_FOUND macros to CMakeLists.txt (#5547)

Code Quality

[dataset, models] Better deprecation message for voc2007 and SqueezeExcitation (#5391)
[datasets, reference scripts] Use Kinetics instead of Kinetics400 in references (#5787) (#5952)
[models] CleanUp DenseNet code (#5966)
[models] Minor Swin Transformer fixes (#6054)
[models, onnx] Use onnx function only in tracing mode (#5468)
[models] Refactor swin transformer so later we can reuse component for 3d version (#6088) (#6100)
[models, tests] Fix minor issues with model tests. (#5576)
[transforms] Remove to_tensor() and ToTensor() usages (#5553)
[transforms] Refactor Augmentation Space calls to speed up. (#5402)
[transforms] Recoded _max_value method using a dictionary (#5566)
[transforms] Replace get_image_size/num_channels with get_dimensions (#5487)
[ops] Replace usages of atomicAdd with gpuAtomicAdd (#5823)
[ops] Fix unused variable warning in ps_roi_align_kernel.cu (#5408)
[ops] Remove custom ops interpolation with antialiasing (#5329)
[ops] Move Permute layer to ops. (#6055)
[ops] Remove assertions for generalized_box_iou (#5691)
[utils] Moving sequence_to_str to torchvision._utils (#5604)
[utils] Clarify TypeError message in make_grid (#5997)
[video, io] replace distutils.spawn with shutil.which per PEP632 in setup script (#5849)
[video, io] Move VideoReader out of init (#5495)
[video, io] Remove unnecessary initialisation in GPUDecoder (#5507)
[video, io] Remove unused member variable and argument in GPUDecoder (#5499)
[video, io] Improve test_video_reader (#5498)
[video, io] Update private attribute name for readability (#5484)
[video, tests] Improve test_videoapi (#5497)
[reference scripts] Minor updates to optical flow ref for consistency (#5654)
[reference scripts] Add barrier() after init_process_group() (#5475)
[ci] Delete stale packaging scripts (#5433)
[ci] remove explicit install of Pillow throughout CI (#5950)
[ci, test] remove unnecessary pytest install (#5739)
[ci, tests] Remove unnecessary PYTORCH_TEST_WITH_SLOW env (#5631)
[ci] Add .git-blame-ignore-revs to ignore specific commits in git blame (#5696)
[ci] Remove CUDA 11.1 support (#5477, #5470, #5451, #5978)
[ci] Minor linting improvement (#5880)
[ci] Remove Bandit and CodeQL jobs (#5734)
[ci] Various type annotation fixes / issues (#5598, #5970, #5943)

Contributors

We're grateful for our community, which helps us improve torchvision by submitting issues and PRs, and providing feedback and suggestions. The following people have contributed patches for this release:

Abhijit Deo, Aditya Oke, Andrey Talman, Anton Thomma, Behrooz, Bruno Korbar, Daniel Angelov, Dbhasin1, Drishti Bhasin, F-G Fernandez, Federico Pozzi, FG Fernandez, Georg Grab, Gouvernathor, Hu Ye, Jeffery (Zeyu) Zhao, Joao Gomes, kaijieshi, Kazuki Adachi, KyleCZH, kylematoba, LEGRAND Matthieu, Lezwon Castelino, Luming Tang, Matti Picus, Nicolas Hug, Nikita, Nikita Shulga, oxabz, Philip Meier, Prabhat Roy, puhuk, Richard Barnes, Sahil Goyal, satojkovic, Shijie, Shubham Bhokare, talregev, tcmyxc, Vasilis Vryniotis, vfdev, WuZhe, XiaobingZhang, Xu Zhao, Yassine Alouini, Yonghye Kwon, YosuaMichael, Yulv-git, Zhiqiang Wang