Code release for paper "SwinSOD: Salient Object Detection using Swin-Transformer"
- !!News: The SSRN preprint "SwinSOD: Salient Object Detection using Swin-Transformer" has been accepted by *Image and Vision Computing*.
The Transformer architecture has achieved excellent performance across a wide range of computer vision tasks, and Swin Transformer in particular shows strong feature representation capabilities. Building on this, we propose SwinSOD, a fusion model for RGB salient object detection. The model uses Swin Transformer as the encoder to extract hierarchical features, employs a multi-head attention mechanism to bridge the gap between hierarchical features, progressively fuses adjacent-layer features under the guidance of global information, and refines the boundaries of salient objects using feedback information. Specifically, the Swin Transformer encoder extracts multi-level features, whose channels are then recalibrated to optimize intra-layer channel features. The feature fusion module fuses features across layers under the guidance of global information. To sharpen fuzzy boundaries, a second-stage feature fusion performs edge refinement guided by feedback information. The proposed model outperforms state-of-the-art models on five popular SOD datasets, demonstrating its strong performance.
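The two ideas described above, channel recalibration of each encoder level and fusion of adjacent levels under global guidance, can be sketched in PyTorch. This is a minimal illustration, not the paper's exact blocks; all module names, the SE-style recalibration, and the sigmoid guidance map are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelRecalibration(nn.Module):
    """SE-style channel recalibration for one encoder level (sketch;
    the paper's exact recalibration may differ)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # x: (B, C, H, W) -> per-channel weights from global average pooling
        w = self.fc(x.mean(dim=(2, 3)))
        return x * w.unsqueeze(-1).unsqueeze(-1)

class GuidedFusion(nn.Module):
    """Fuse an upsampled deep feature with the adjacent shallow feature,
    modulated by a coarse global guidance map (hypothetical block)."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, shallow, deep, guide):
        # bring the deeper feature and the 1-channel guidance map to the
        # shallow feature's resolution, then gate the shallow feature
        deep = F.interpolate(deep, size=shallow.shape[2:],
                             mode="bilinear", align_corners=False)
        guide = torch.sigmoid(F.interpolate(guide, size=shallow.shape[2:],
                                            mode="bilinear", align_corners=False))
        fused = self.conv(torch.cat([shallow * guide, deep], dim=1))
        return F.relu(fused)
```

In a decoder, blocks like `GuidedFusion` would be applied from the deepest level upward, reusing the coarse saliency prediction as `guide` (and, in the second stage, as the feedback signal).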
- Python 3.6
- PyTorch >= 1.7
- torchvision >= 0.4.2
- PIL
- NumPy
Download the following datasets and unzip them into the `data` folder:
```
data
├── DUTS
│   ├── image/
│   ├── mask/
│   ├── test.txt
│   └── train.txt
├── DUT-OMRON
│   ├── image/
│   ├── mask/
│   └── test.txt
├── ECSSD
│   ├── image/
│   ├── mask/
│   └── test.txt
├── HKU-IS
│   ├── image/
│   ├── mask/
│   └── test.txt
└── PASCAL-S
    ├── image/
    ├── mask/
    └── test.txt
```
- Clone the repository:
```shell
git clone https://github.com/user-wu/SwinSOD.git
cd SwinSOD/src/
```
- Train:
```shell
python train.py
```
- Swin Transformer is used as the backbone of SwinSOD, and DUTS-TR is used to train the model.
- batch=32, lr=0.05, momen=0.9, decay=5e-4, epoch=32
- Warm-up and linear decay strategies are used to adjust the learning rate.
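The warm-up plus linear-decay schedule can be sketched as a plain function. `base_lr=0.05` and `total_epochs=32` come from the settings above; the warm-up length `warmup_epochs=3` is an assumed value, not from the paper.

```python
def get_lr(epoch, base_lr=0.05, total_epochs=32, warmup_epochs=3):
    """Linear warm-up to base_lr, then linear decay toward zero.
    warmup_epochs is an assumption; the repo may use a different value."""
    if epoch < warmup_epochs:
        # ramp up: lr grows linearly over the first warmup_epochs epochs
        return base_lr * (epoch + 1) / warmup_epochs
    # decay linearly from base_lr over the remaining epochs
    remaining = total_epochs - warmup_epochs
    return base_lr * (1 - (epoch - warmup_epochs) / remaining)
```

In training, `get_lr(epoch)` would be written into the optimizer's parameter groups at the start of each epoch.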
- Test with the pretrained model:
```shell
cd SwinSOD/src/
python test.py
```
- After testing, saliency maps for PASCAL-S, ECSSD, HKU-IS, DUT-OMRON, and DUTS-TE will be saved in the `eval/maps/` folder.
- Trained model: model
- Saliency maps for reference: saliency maps
- If you find this work helpful, please cite our paper:
```
@article{wu2024swinsod,
  title={SwinSOD: Salient object detection using swin-transformer},
  author={Wu, Shuang and Zhang, Guangjian and Liu, Xuefeng},
  journal={Image and Vision Computing},
  pages={105039},
  year={2024},
  publisher={Elsevier}
}
```