This study investigates the efficacy of modern deep learning architectures for fine-grained image classification, focusing on dog breed recognition. Using the Stanford Dogs Dataset, we evaluate the performance of Vision Transformer (ViT), VGG-16, and ResNet-50 models, aiming to surpass the benchmark set by Hsu (2015) with conventional convolutional neural networks (CNNs). The ViT architecture, adapted from transformers originally designed for natural language processing, represents a modern approach to image classification: it treats an image as a sequence of patch tokens rather than a grid of pixels. Our results show substantial accuracy gains over the baseline established by Hsu (2015): VGG-16 achieved 65% test accuracy, ResNet-50 achieved 84%, and, surprisingly, ViT outperformed both at 91%. These findings suggest that transformer architectures can handle smaller-scale datasets with fine-grained categories. The study contributes to the growing body of research on the viability of transformer models across image classification tasks and motivates further exploration as the architecture continues to evolve.
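The patch-tokenization step that distinguishes ViT from the CNNs above can be sketched as follows. This is a minimal NumPy illustration of turning an image into a token sequence (the step before ViT's learned linear projection and position embeddings); it is not the training code used in the study, and the function name and patch size are illustrative assumptions.

```python
import numpy as np

def image_to_patch_tokens(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into a sequence of flattened patch tokens,
    mirroring the ViT patch-embedding step before the linear projection.
    (Hypothetical helper for illustration, not from the repository.)"""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0, "dims must divide evenly"
    # Reshape into a grid of non-overlapping patches...
    patches = image.reshape(h // patch_size, patch_size,
                            w // patch_size, patch_size, c)
    # ...bring the patch-grid axes to the front...
    patches = patches.transpose(0, 2, 1, 3, 4)  # (rows, cols, P, P, C)
    # ...and flatten each patch into a single token vector.
    return patches.reshape(-1, patch_size * patch_size * c)

# A standard 224x224 RGB input yields (224/16)^2 = 196 tokens of dim 16*16*3 = 768.
tokens = image_to_patch_tokens(np.zeros((224, 224, 3)))
print(tokens.shape)  # (196, 768)
```

The transformer then attends over all 196 tokens jointly, which is why ViT has no built-in locality bias and was expected to need large datasets; the 91% result above suggests it can still work well on a smaller fine-grained dataset.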
dlongert/dog_image_classification