TorchVision 0.13, including the new Multi-weight support API, new pre-trained weights, and more
Highlights
Models
Multi-weight support API
TorchVision v0.13 offers a new Multi-weight support API for loading different weights with the existing model builder methods:
```python
from torchvision.models import *

# Old weights with accuracy 76.130%
resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)

# New weights with accuracy 80.858%
resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)

# Best available weights (currently alias for IMAGENET1K_V2)
# Note that these weights may change across versions
resnet50(weights=ResNet50_Weights.DEFAULT)

# Strings are also supported
resnet50(weights="IMAGENET1K_V2")

# No weights - random initialization
resnet50(weights=None)
```
The new API bundles important details along with the weights, such as the preprocessing transforms and metadata like the class labels. Here is how to make the most of it:
```python
from torchvision.io import read_image
from torchvision.models import resnet50, ResNet50_Weights

img = read_image("test/assets/encode_jpeg/grace_hopper_517x606.jpg")

# Step 1: Initialize model with the best available weights
weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights)
model.eval()

# Step 2: Initialize the inference transforms
preprocess = weights.transforms()

# Step 3: Apply inference preprocessing transforms
batch = preprocess(img).unsqueeze(0)

# Step 4: Use the model and print the predicted category
prediction = model(batch).squeeze(0).softmax(0)
class_id = prediction.argmax().item()
score = prediction[class_id].item()
category_name = weights.meta["categories"][class_id]
print(f"{category_name}: {100 * score:.1f}%")
```
You can read more about the new API in the docs. To provide your feedback, please use this dedicated GitHub issue.
New architectures and model variants
Classification
The Swin Transformer and EfficientNetV2 are two popular classification models which are often used for downstream vision tasks. This release includes 6 pre-trained weights for their classification variants. Here is how to use the new models:
```python
import torch
from torchvision.models import *

image = torch.rand(1, 3, 224, 224)
model = swin_t(weights="DEFAULT").eval()
prediction = model(image)

image = torch.rand(1, 3, 384, 384)
model = efficientnet_v2_s(weights="DEFAULT").eval()
prediction = model(image)
```
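Instead of hard-coding the input resolutions as above, the four-step pattern from the Multi-weight API section applies to the new models too. A minimal sketch, assuming an image path of your own:

```python
from torchvision.io import read_image
from torchvision.models import swin_t, Swin_T_Weights

weights = Swin_T_Weights.DEFAULT
model = swin_t(weights=weights).eval()

# The bundled transforms resize and normalize the image to whatever the
# weights expect, so no input size needs to be hard-coded.
img = read_image("path/to/image.jpg")  # hypothetical path
preprocess = weights.transforms()
batch = preprocess(img).unsqueeze(0)
prediction = model(batch)
```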
In addition to the above, we also provide new variants for existing architectures such as ShuffleNetV2, ResNeXt and MNASNet. The ImageNet-1K accuracies of all the new pre-trained models are listed below:
Model | Acc@1 | Acc@5 |
---|---|---|
swin_t | 81.474 | 95.776 |
swin_s | 83.196 | 96.36 |
swin_b | 83.582 | 96.64 |
efficientnet_v2_s | 84.228 | 96.878 |
efficientnet_v2_m | 85.112 | 97.156 |
efficientnet_v2_l | 85.808 | 97.788 |
resnext101_64x4d | 83.246 | 96.454 |
resnext101_64x4d (quantized) | 82.898 | 96.326 |
shufflenet_v2_x1_5 | 72.996 | 91.086 |
shufflenet_v2_x1_5 (quantized) | 72.052 | 90.700 |
shufflenet_v2_x2_0 | 76.230 | 93.006 |
shufflenet_v2_x2_0 (quantized) | 75.354 | 92.488 |
mnasnet0_75 | 71.180 | 90.496 |
mnasnet1_3 | 76.506 | 93.522 |
We would like to thank Hu Ye for contributing the Swin Transformer implementation to TorchVision.
[BETA] Object Detection and Instance Segmentation
We have introduced 3 new model variants for RetinaNet, FasterRCNN and MaskRCNN that include several post-paper architectural optimizations and improved training recipes. All models can be used similarly:
```python
import torch
from torchvision.models.detection import *

images = [torch.rand(3, 800, 600)]
model = retinanet_resnet50_fpn_v2(weights="DEFAULT")
# model = fasterrcnn_resnet50_fpn_v2(weights="DEFAULT")
# model = maskrcnn_resnet50_fpn_v2(weights="DEFAULT")
model.eval()

prediction = model(images)
```
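In eval mode, the detection models return one dictionary per input image with boxes, labels and scores entries (Mask R-CNN additionally returns masks). A minimal sketch of consuming the output of the snippet above; the 0.5 threshold is an arbitrary choice:

```python
from torchvision.models.detection import RetinaNet_ResNet50_FPN_V2_Weights

output = prediction[0]  # one dict per input image
keep = output["scores"] > 0.5  # arbitrary confidence threshold
boxes = output["boxes"][keep]  # (N, 4) boxes in (x1, y1, x2, y2) format
weights = RetinaNet_ResNet50_FPN_V2_Weights.DEFAULT
labels = [weights.meta["categories"][i] for i in output["labels"][keep].tolist()]
```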
Below we present the metrics of the new variants on COCO val2017. In parentheses we denote the improvement over the old variants:
Model | Box mAP | Mask mAP |
---|---|---|
retinanet_resnet50_fpn_v2 | 41.5 (+5.1) | - |
fasterrcnn_resnet50_fpn_v2 | 46.7 (+9.7) | - |
maskrcnn_resnet50_fpn_v2 | 47.4 (+9.5) | 41.8 (+7.2) |
We would like to thank Ross Girshick, Piotr Dollar, Vaibhav Aggarwal, Francisco Massa and Hu Ye for their past research and contributions to this work.
New pre-trained weights
SWAG weights
The ViT and RegNet model variants offer new pre-trained SWAG (Supervised Weakly from hashtAGs) weights. The biggest of these models achieves a whopping 88.6% accuracy on ImageNet-1K. We currently offer two versions of the weights: 1) fine-tuned end-to-end weights on ImageNet-1K (highest accuracy) and 2) frozen trunk weights with a linear classifier fit on ImageNet-1K (great for transfer learning). Below we see the detailed accuracies of each model variant:
Model Weights | Acc@1 | Acc@5 |
---|---|---|
RegNet_Y_16GF_Weights.IMAGENET1K_SWAG_E2E_V1 | 86.012 | 98.054 |
RegNet_Y_16GF_Weights.IMAGENET1K_SWAG_LINEAR_V1 | 83.976 | 97.244 |
RegNet_Y_32GF_Weights.IMAGENET1K_SWAG_E2E_V1 | 86.838 | 98.362 |
RegNet_Y_32GF_Weights.IMAGENET1K_SWAG_LINEAR_V1 | 84.622 | 97.48 |
RegNet_Y_128GF_Weights.IMAGENET1K_SWAG_E2E_V1 | 88.228 | 98.682 |
RegNet_Y_128GF_Weights.IMAGENET1K_SWAG_LINEAR_V1 | 86.068 | 97.844 |
ViT_B_16_Weights.IMAGENET1K_SWAG_E2E_V1 | 85.304 | 97.65 |
ViT_B_16_Weights.IMAGENET1K_SWAG_LINEAR_V1 | 81.886 | 96.18 |
ViT_L_16_Weights.IMAGENET1K_SWAG_E2E_V1 | 88.064 | 98.512 |
ViT_L_16_Weights.IMAGENET1K_SWAG_LINEAR_V1 | 85.146 | 97.422 |
ViT_H_14_Weights.IMAGENET1K_SWAG_E2E_V1 | 88.552 | 98.694 |
ViT_H_14_Weights.IMAGENET1K_SWAG_LINEAR_V1 | 85.708 | 97.73 |
The weights can be loaded normally as follows:
```python
from torchvision.models import *

model1 = vit_h_14(weights="IMAGENET1K_SWAG_E2E_V1")
model2 = vit_h_14(weights="IMAGENET1K_SWAG_LINEAR_V1")
```
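Note that the end-to-end SWAG weights were fine-tuned at higher resolutions than the usual 224x224, so it is safest to rely on the bundled presets rather than hard-coding an input size. A minimal sketch:

```python
from torchvision.models import vit_h_14, ViT_H_14_Weights

weights = ViT_H_14_Weights.IMAGENET1K_SWAG_E2E_V1
model = vit_h_14(weights=weights).eval()
# The preset resizes and crops to the resolution the weights were
# fine-tuned at, and applies the matching normalization.
preprocess = weights.transforms()
```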
The SWAG weights are released under the Attribution-NonCommercial 4.0 International license. We would like to thank Laura Gustafson, Mannat Singh and Aaron Adcock for their work and support in making the weights available to TorchVision.
Model Refresh
The release of the Multi-weight support API enabled us to refresh the most popular models and offer more accurate weights. We improved each model by ~3 points on average. The new recipe was learned on top of ResNet50 and its details were covered in a previous blog post.
Model | Old Acc@1 | New Acc@1 |
---|---|---|
efficientnet_b1 | 78.642 | 79.838 |
mobilenet_v2 | 71.878 | 72.154 |
mobilenet_v3_large | 74.042 | 75.274 |
regnet_y_400mf | 74.046 | 75.804 |
regnet_y_800mf | 76.42 | 78.828 |
regnet_y_1_6gf | 77.95 | 80.876 |
regnet_y_3_2gf | 78.948 | 81.982 |
regnet_y_8gf | 80.032 | 82.828 |
regnet_y_16gf | 80.424 | 82.886 |
regnet_y_32gf | 80.878 | 83.368 |
regnet_x_400mf | 72.834 | 74.864 |
regnet_x_800mf | 75.212 | 77.522 |
regnet_x_1_6gf | 77.04 | 79.668 |
regnet_x_3_2gf | 78.364 | 81.196 |
regnet_x_8gf | 79.344 | 81.682 |
regnet_x_16gf | 80.058 | 82.716 |
regnet_x_32gf | 80.622 | 83.014 |
resnet50 | 76.13 | 80.858 |
resnet50 (quantized) | 75.92 | 80.282 |
resnet101 | 77.374 | 81.886 |
resnet152 | 78.312 | 82.284 |
resnext50_32x4d | 77.618 | 81.198 |
resnext101_32x8d | 79.312 | 82.834 |
resnext101_32x8d (quantized) | 78.986 | 82.574 |
wide_resnet50_2 | 78.468 | 81.602 |
wide_resnet101_2 | 78.848 | 82.51 |
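The refreshed weights from the table above are opt-in: requesting IMAGENET1K_V2 (or DEFAULT) returns them, while the legacy V1 weights remain available for reproducibility. A minimal sketch:

```python
from torchvision.models import resnet101, ResNet101_Weights

model_old = resnet101(weights=ResNet101_Weights.IMAGENET1K_V1)  # 77.374 Acc@1
model_new = resnet101(weights=ResNet101_Weights.IMAGENET1K_V2)  # 81.886 Acc@1
```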
We would like to thank Piotr Dollar, Mannat Singh and Hugo Touvron for their past research and contributions to this work.
Ops and Transforms
New Augmentations, Layers and Losses
This release brings a number of new primitives which can be used to produce SOTA models. Some highlights include the addition of the AugMix data-augmentation method, the DropBlock layer, the cIoU/dIoU losses and many more. We would like to thank Aditya Oke, Abhijit Deo, Yassine Alouini and Hu Ye for contributing to the project and for helping us keep TorchVision relevant and fresh.
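Below is a brief sketch of these primitives, assuming their 0.13 locations (AugMix under torchvision.transforms; DropBlock and the IoU loss variants under torchvision.ops):

```python
import torch
from torchvision import ops, transforms

# AugMix operates on PIL images or uint8 tensors
augmix = transforms.AugMix()
augmented = augmix(torch.randint(0, 256, (3, 224, 224), dtype=torch.uint8))

# DropBlock2d zeroes out contiguous regions of conv feature maps
drop_block = ops.DropBlock2d(p=0.1, block_size=3)
features = drop_block(torch.rand(1, 64, 56, 56))

# dIoU/cIoU loss variants for boxes in (x1, y1, x2, y2) format
boxes1 = torch.tensor([[0.0, 0.0, 10.0, 10.0]])
boxes2 = torch.tensor([[1.0, 1.0, 11.0, 11.0]])
diou_loss = ops.distance_box_iou_loss(boxes1, boxes2)
ciou_loss = ops.complete_box_iou_loss(boxes1, boxes2)
```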
Documentation
We completely revamped our models documentation to make it easier to browse, and added key information such as supported image sizes and the image pre-processing steps of pre-trained weights. We now have a main model page with summary tables of the available weights, and each model has a dedicated page. Each model builder is also documented on its own page, with more details about the available weights, including accuracy, minimal image size, a link to the training recipe, and other valuable info. For comparison, our previous models docs are here. To provide feedback on the new documentation, please use the dedicated GitHub issue.
Backward-incompatible changes
The new Multi-weight support API replaced the legacy "pretrained" parameter of the model builders. Both solutions are currently supported to maintain backward compatibility, but we intend to remove the deprecated API in two releases. Migrating to the new API is straightforward. The following calls between the two APIs are all equivalent:
```python
from torchvision.models import resnet50, ResNet50_Weights

# Using pretrained weights:
resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)
resnet50(weights="IMAGENET1K_V1")
resnet50(pretrained=True)  # deprecated
resnet50(True)  # deprecated

# Using no weights:
resnet50(weights=None)
resnet50()
resnet50(pretrained=False)  # deprecated
resnet50(False)  # deprecated
```
Deprecations
[models, models.quantization] Reinstate and deprecate `model_urls` and `quant_model_urls` (#5992)
[transforms] Deprecate int as interpolation argument type (#5974); see the migration sketch below
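For the interpolation deprecation above, the replacement for integer constants is the InterpolationMode enum. A minimal migration sketch:

```python
from torchvision.transforms import InterpolationMode, Resize

# Before (deprecated): Resize(256, interpolation=2)
resize = Resize(256, interpolation=InterpolationMode.BILINEAR)
```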
New Features
[models] New Multi-weight API support (#5618, #5859, #6047, #6026, #5848)
[models] Adding Swin Transformer architecture (#5491)
[models] Adding EfficientNetV2 architecture (#5450)
[models] Adding detection model improved weights: RetinaNet, MaskRCNN, FasterRCNN (#5756, #5773, #5763)
[models] Adding classification model weight: resnext101 64x4d, mnasnet0_75, mnasnet1_3 (#5935, #6019)
[models] Add SWAG model pretrained weights (#5714, #5722, #5732, #5793, #5721)
[ops] Adding IoU loss function variants: DIoU, CIoU (#5786, #5776)
[ops] Adding various ops and test for ops (#6053, #5416, #5792, #5783)
[transforms] Adding AugMix transforms implementation (#5411)
[reference scripts] Support custom weight decay setting in classification reference script (#5671)
[transforms, reference scripts] Improve detection reference script: Scale Jitter, RandomShortestSize, FixedSizeCrop (#5435, #5610, #5607)
[ci] Add M1 support (#6167)
[ci] Add Python-3.10 (build and test) (#5420)
Improvements
[documentation] Complete new revamp of models documentation (#5821, #5876, #5899, #6025, #5885, #5884, #5886, #5891, #6023, #6009, #5852, #5831, #5832, #6003, #6013, #5856, #6004, #6005, #5878, #6012, #5894, #6002, #5854, #5864, #5920, #5869, #5871, #6021, #6006, #6016, #5905, #6028, #5915, #5924, #5977, #5918, #5921, #5934, #5936, #5937, #5933, #5949, #5988, #5962, #5963, #5975, #5900, #5917, #5895, #5901, #6033, #6032, #6030, #5904, #5661, #6035, #6049, #6036, #5908, #5907, #6044, #6039, #5874, #6151)
[documentation] Various documentation improvements (#5695, #5930, #5814, #5799, #5827, #5796, #5923, #5599, #5554, #5995, #5457, #6163, #6031, #6000, #5847, #6024)
[documentation] Add warnings in docs to document Beta APIs (#6115)
[datasets] improve GDrive downloads (#5704, #5645)
[datasets] indicate md5 checksum is not used for security (#5717)
[models] Add shufflenetv2 1.5 and 2.0 weights (#5906)
[models] Reduce unnecessary cuda sync in anchor_utils.py (#5515)
[models] Adding improved MobileNetV2 weights (#5560)
[models] Remove `(N, T, H, W, C) => (N, T, C, H, W)` from presets (#6058)
[models] add swin_s and swin_b variants and improved swin_t (#6048)
[models] Update ShuffleNetV2 annotations for x1_5 and x2_0 variants (#6022)
[models] Better error message in ViT (#5820)
[models, ops] Add private support for ciou and diou (#5984, #5685, #5690)
[models, reference scripts] Various improvements to detection recipe and models (#5715, #5444)
[transforms, tests] add functional vertical flip tests on segmentation mask (#5860)
[transforms] make _max_value jit-scriptable (#5623)
[transforms] Make ScaleJitter proportional (#5559)
[transforms] add tensor kernels for normalize and erase (#5462)
[transforms] Update transforms following PIL deprecation (#5898)
[transforms, models, datasets…] Replace asserts with exceptions (#5587, #5659)
[utils] add warning if font is not set in draw_bounding_boxes (#5785)
[utils] Throw warning for empty masks or box tensors on draw_segmentation_masks and draw_bounding_boxes (#5857)
[video] Add output_format to video datasets and readers (#6061)
[video, io] Better compatibility with FFMPEG 5.0 (#5644)
[video, io] Allow cuda device to be passed without the index for GPU decoding (#5505)
[reference scripts] Simplify EMA to use PyTorch's update_parameters (#5469)
[reference scripts] Reduce variance of evaluation in reference (#5819)
[reference scripts] Various improvements to RAFT training reference (#5590)
[tests] Speed up Model tests by 20% (#5574)
[tests] Make test suite fail on unexpected test success (#5556)
[tests] Skip big model in test to reduce memory usage in CI (#5903, #5902)
[tests] Improve test of backbone utils (#5552)
[tests] Validate against expected files on videos (#6077)
[ci] Support for CUDA 11.6 (#5803, #5862)
[ci] pre-download model weights in CI docs build (#5625)
Bug Fixes
[transforms] remove option to pass fill as str in transforms (#5632)
[transforms] Better handling for Pad's fill argument (#5596)
[transforms] [FBcode->GH] Fix accimage tests (#5545)
[transforms] Update _pil_constants.py (#6154) (#6156)
[transforms] Fix resize transform when size == small_edge_size and max_size isn't None (#5409)
[transforms] Fixed rotate transform with expand inconsistency (#5677)
[transforms] Fixed upstream issue with padding (#5875)
[transforms] Fix functional.adjust_gamma (#5427)
[models] Respect `strict=False` when loading detection models (#5841)
[models] Fix resnet norm initialization (#6082) (#6085)
[models] Use frozen BN only if pre-trained for detection models. (#5443)
[models] fix fcos gtarea calculation (#5816)
[models, onnx] Add topk min function for trace and onnx (#5310)
[models, tests] fix mobilenet norm layer test (#5643)
[reference scripts] Fix regression on Detection training script (#5985)
[datasets] do not re-download from GDrive if file is already present (#5805)
[datasets] Fix datasets: kinetics, Flowers102, VOC_2009, INaturalist 2021_train, caltech (#5578, #5775, #5425, #5844, #5789)
[documentation] Fixes device mismatch issue while building docs (#5428)
[documentation] Fix Accuracy meta-data on shufflenetv2 (#5896)
[documentation] fix typo in docstrings of some transforms (#5609)
[video, documentation] Fix append of audio_pts (#5488)
[io, tests] More robust check in tests for 16 bits images (#5652)
[video, io] Fix shape mismatch error in video reader (#5489)
[io] Address nvjpeg leak on CUDA < 11.6 issue (#5713, #5482)
[ci] Fixing issue with setup_env.sh in docker: resolve "unsafe directory" error (#6106) (#6109)
[ci] fix documentation version problems when new release is tagged (#5583)
[ci] Replace jcenter and fix version for android (#6046)
[tests] Add .float() before .mean() in test_backbone_utils.py because .mean() doesn't accept integer dtype (#6090) (#6091)
[tests] Fix keypointrcnn_resnet50_fpn flaky test (#5911)
[tests] Disable `test_encode|write_jpeg_reference` tests (#5910)
[mobile] Bump up LibTorchvision version number for Podspec to release Cocoapods (#5624)
[feature extraction] Add default tracer args for model feature extraction function (#5637)
[build] Fix libtorchvision.so not able to encode images by adding *_FOUND macros to CMakeLists.txt (#5547)
Code Quality
[datasets, models] Better deprecation message for voc2007 and SqueezeExcitation (#5391)
[datasets, reference scripts] Use Kinetics instead of Kinetics400 in references (#5787) (#5952)
[models] CleanUp DenseNet code (#5966)
[models] Minor Swin Transformer fixes (#6054)
[models, onnx] Use onnx function only in tracing mode (#5468)
[models] Refactor swin transformer so we can later reuse components for the 3D version (#6088) (#6100)
[models, tests] Fix minor issues with model tests. (#5576)
[transforms] Remove `to_tensor()` and `ToTensor()` usages (#5553)
[transforms] Refactor Augmentation Space calls to speed up. (#5402)
[transforms] Recoded _max_value method using a dictionary (#5566)
[transforms] Replace get_image_size/num_channels with get_dimensions (#5487)
[ops] Replace usages of atomicAdd with gpuAtomicAdd (#5823)
[ops] Fix unused variable warning in ps_roi_align_kernel.cu (#5408)
[ops] Remove custom ops interpolation with antialiasing (#5329)
[ops] Move Permute layer to ops. (#6055)
[ops] Remove assertions for generalized_box_iou (#5691)
[utils] Moving `sequence_to_str` to `torchvision._utils` (#5604)
[utils] Clarify TypeError message in make_grid (#5997)
[video, io] replace distutils.spawn with shutil.which per PEP632 in setup script (#5849)
[video, io] Move VideoReader out of init (#5495)
[video, io] Remove unnecessary initialisation in GPUDecoder (#5507)
[video, io] Remove unused member variable and argument in GPUDecoder (#5499)
[video, io] Improve test_video_reader (#5498)
[video, io] Update private attribute name for readability (#5484)
[video, tests] Improve test_videoapi (#5497)
[reference scripts] Minor updates to optical flow ref for consistency (#5654)
[reference scripts] Add barrier() after init_process_group() (#5475)
[ci] Delete stale packaging scripts (#5433)
[ci] remove explicit install of Pillow throughout CI (#5950)
[ci, test] remove unnecessary pytest install (#5739)
[ci, tests] Remove unnecessary PYTORCH_TEST_WITH_SLOW env (#5631)
[ci] Add .git-blame-ignore-revs to ignore specific commits in git blame (#5696)
[ci] Remove CUDA 11.1 support (#5477, #5470, #5451, #5978)
[ci] Minor linting improvement (#5880)
[ci] Remove Bandit and CodeQL jobs (#5734)
[ci] Various type annotation fixes / issues (#5598, #5970, #5943)
Contributors
We're grateful for our community, which helps us improve torchvision by submitting issues and PRs and by providing feedback and suggestions. The following persons have contributed patches for this release:
Abhijit Deo, Aditya Oke, Andrey Talman, Anton Thomma, Behrooz, Bruno Korbar, Daniel Angelov, Dbhasin1, Drishti Bhasin, F-G Fernandez, Federico Pozzi, FG Fernandez, Georg Grab, Gouvernathor, Hu Ye, Jeffery (Zeyu) Zhao, Joao Gomes, kaijieshi, Kazuki Adachi, KyleCZH, kylematoba, LEGRAND Matthieu, Lezwon Castelino, Luming Tang, Matti Picus, Nicolas Hug, Nikita, Nikita Shulga, oxabz, Philip Meier, Prabhat Roy, puhuk, Richard Barnes, Sahil Goyal, satojkovic, Shijie, Shubham Bhokare, talregev, tcmyxc, Vasilis Vryniotis, vfdev, WuZhe, XiaobingZhang, Xu Zhao, Yassine Alouini, Yonghye Kwon, YosuaMichael, Yulv-git, Zhiqiang Wang