add papers and new sub-category of dtt
greatzh committed Dec 5, 2023
1 parent 5ae916d commit 67c724b
Showing 3 changed files with 66 additions and 43 deletions.
83 changes: 51 additions & 32 deletions README.md
@@ -40,9 +40,14 @@ description: >-
#### Image Editing

<details open>

<summary>2023 This year</summary>

* [ ] GP-Net: Image Manipulation Detection and Localization via Long-Range Modeling and Transformers (_Appl. Sci. (IF: 2.8, not included in CCFs), MDPI, '23_) **\[[Paper](https://www.mdpi.com/2076-3417/13/21/12053)]**
* [ ] DS-Net: Dual supervision neural network for image manipulation localization _(IET-IPR '23)_ **[[Paper](https://ietresearch.onlinelibrary.wiley.com/doi/abs/10.1049/ipr2.12885)]**
* [ ] Learning to Immunize Images for Tamper Localization and Self Recovery _(TPAMI '23)_ **[[Paper](https://arxiv.org/pdf/2210.15902.pdf)]**
* [ ] Semantic-agnostic progressive subtractive network for image manipulation detection and localization _(Neurocomputing '23)_ **[[Paper](https://doi.org/10.1016/j.neucom.2023.126263)]**
* [ ] Towards Effective Image Manipulation Detection with Proposal Contrastive Learning _(TCSVT '23)_ **[[Paper](https://arxiv.org/pdf/2210.08529.pdf)]** **[[Code](https://github.com/Sandy-Zeng/PCL)]**
* [ ] Effective image tampering localization with multi-scale ConvNeXt feature fusion (_JVCIR '23)_ **\[**[**Paper**](https://arxiv.org/abs/2208.13739)**]** **[[Code](https://github.com/multimediaFor/ConvNeXtFF)]**
* [ ] Evading Detection Actively: Toward Anti-Forensics against Forgery Localization (_arXiv '23_) **\[**[**Paper**](https://arxiv.org/abs/2310.10036)**]** **\[**[**Code**](https://github.com/tansq/SEAR)**]**
* [ ] Multi-scale attention context-aware network for detection and localization of image splicing: Efficient and robust identification network _(Appl. Intell. '23)_ **\[**[**Paper**](https://link.springer.com/article/10.1007/s10489-022-04421-3)**]**
* [ ] [ReLoc: A Restoration-Assisted Framework for Robust Image Tampering Localization](image-forgery/2023/reloc.md) (_TIFS '23_) **\[**[**Paper**](https://arxiv.org/abs/2211.03930)**]** **\[**[**Code**](https://github.com/ZhuangPeiyu/ReLoc)**]**
@@ -73,6 +78,7 @@ description: >-

<summary>2022</summary>

* [ ] DS-UNet: A dual streams UNet for refined image forgery localization _(InfoS '22)_ **[[Paper](https://dl.acm.org/doi/abs/10.1016/j.ins.2022.08.005)]**
* [ ] MSMG-Net: Multi-scale Multi-grained Supervised Metworks for Multi-task Image Manipulation Detection and Localization (_ArXiv '22_) **\[**[**Paper**](https://arxiv.org/abs/2211.03140)**]**
* [ ] Towards JPEG-Resistant Image Forgery Detection and Localization Via Self-Supervised Domain Adaptation (_TPAMI '22_) **\[**[**Paper**](https://ieeexplore.ieee.org/document/9904872)**]**
* [ ] ESRNet: Efficient Search and Recognition Network for Image Manipulation Detection (_TOMCCAP '22_) **\[**[**Paper**](https://doi.org/10.1145/3506853)**]** **\[**[**Tool**](https://github.com/tampered816/rrr)**]**
@@ -228,6 +234,50 @@ _Some of the above papers also contain methods to detect tampered images generat
* [ ] [Shrinking the Semantic Gap: Spatial Pooling of Local Moment Invariants for Copy-Move Forgery Detection](copy-move/word2phrasecmfd.md) _(TIFS '23)_ **\[**[**Paper**](https://arxiv.org/abs/2207.09135)**]** **\[**[**Code**](https://github.com/ChaoWang1016/word2phraseCMFD)**]**
* [ ] Image Copy-Move Forgery Detection via Deep Cross-Scale PatchMatch (_ICME '23_) **\[**[**Paper**](https://arxiv.org/abs/2308.04188)**]**

### Tampered Text Detection

**Tampered text detection** in images (partial list)

- [ ] Towards Robust Tampered Text Detection in Document Image: New dataset and New Solution (_CVPR '23_) **\[**[**Paper**](https://openaccess.thecvf.com/content/CVPR2023/papers/Qu_Towards_Robust_Tampered_Text_Detection_in_Document_Image_New_Dataset_CVPR_2023_paper.pdf)**]** **[[Code](https://github.com/qcf-568/DocTamper)]**
- [ ] Progressive Supervision for Tampering Localization in Document Images (_ICONIP '23_) **[[Paper](https://link.springer.com/chapter/10.1007/978-981-99-8184-7_11)]**
- [ ] SigScatNet: A Siamese + Scattering based Deep Learning Approach for Signature Forgery Detection and Similarity Assessment _(arXiv '23)_ **[[Paper](https://arxiv.org/pdf/2311.05579.pdf)]**
- [ ] Image Generation and Learning Strategy for Deep Document Forgery Detection _(arXiv '23)_ **[[Paper](https://arxiv.org/abs/2311.03650)]**
- [ ] Forgery-free signature verification with stroke-aware cycle-consistent generative adversarial network _(Neurocomputing '22)_ **[[Paper](https://doi.org/10.1016/j.neucom.2022.08.017)]** **[[Code](https://github.com/KAKAFEI123/Stroke-cCycleGAN)]**
- [ ] Document Forgery Detection in the Context of Double JPEG Compression _(ICPR '22)_ **[[Paper](https://link.springer.com/chapter/10.1007/978-3-031-37745-7_5)]**

### Low Level Vision

Related resources:

* [https://github.com/Kobaayyy/Awesome-ICCV2021-Low-Level-Vision](https://github.com/Kobaayyy/Awesome-ICCV2021-Low-Level-Vision)
* [https://github.com/lcybuzz/Low-Level-Vision-Paper-Record](https://github.com/lcybuzz/Low-Level-Vision-Paper-Record)

Low-level tasks include super-resolution, denoising, dehazing, low-light enhancement, and so on, while high-level tasks include classification, detection, and segmentation. The papers listed here, however, are still mainly related to tampering detection.

> Testing the new layout of paper title.
>
> 📖Paper, 👨‍💻Code, 📦Dataset, 🔗Other links, 📜News,
>
> \*Equal contribution. #Corresponding author.
* [ ] (**EVP**) Explicit Visual Prompting for Low-Level Structure Segmentations (_CVPR '23_) [📖](https://arxiv.org/abs/2303.10883), [👨‍💻](https://github.com/NiFangBaAGe/Explicit-Visual-Prompt) (_including defocus blur, shadow, forgery, and camouflaged detection_)

> [Weihuang Liu](https://github.com/nifangbaage)<sup>1</sup>, [Xi Shen](https://xishen0220.github.io/)<sup>2</sup>, [Chi-Man Pun](https://www.cis.um.edu.mo/\~cmpun/)<sup>#,1</sup>, [Xiaodong Cun](https://vinthony.github.io/)<sup>#,2</sup>
>
> <sup>1</sup>University of Macau <sup>2</sup>Tencent AI Lab
* [ ] SYENet: A Simple Yet Effective Network for Multiple Low-Level Vision Tasks with Real-time Performance on Mobile Device (_ICCV '23_) [📖](https://arxiv.org/abs/2308.08137), [👨‍💻](https://github.com/sanechips-multimedia/syenet)

> [Weiran Gou](https://github.com/WeiranGou)<sup>∗1,2</sup>, Ziyao Yi<sup>∗1,2</sup>, Yan Xiang<sup>1,2</sup>, Shaoqing Li<sup>1,2</sup>, Zibin Liu<sup>1,2</sup>, Dehui Kong<sup>1,2</sup>, Ke Xu<sup>#1,2</sup>
>
> <sup>1</sup>State Key Laboratory of Mobile Network and Mobile Multimedia Technology, <sup>2</sup>Sanechips Technology, Chengdu, China
* [ ] Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-level Vision (_arXiv '23_) [📖](https://arxiv.org/abs/2309.14181), [👨‍💻](https://github.com/VQAssessment/Q-Bench)

> [Haoning Wu](https://teowu.github.io/)<sup>1\*</sup>, [Zicheng Zhang](https://github.com/zzc-1998)<sup>2\*</sup>, [Erli Zhang](https://github.com/ZhangErliCarl/)<sup>1\*</sup>, [Chaofeng Chen](https://chaofengc.github.io/)<sup>1</sup>, [Liang Liao](https://liaoliang92.github.io/)<sup>1</sup>, [Annan Wang](https://github.com/AnnanWangDaniel)<sup>1</sup>, [Chunyi Li](https://github.com/lcysyzxdxc)<sup>2</sup>, [Wenxiu Sun](https://wenxiusun.com/)<sup>3</sup>, [Qiong Yan](https://scholar.google.com/citations?user=uT9CtPYAAAAJ\&hl=en)<sup>3</sup>, [Guangtao Zhai](https://ee.sjtu.edu.cn/en/FacultyDetail.aspx?id=24\&infoid=153\&flag=153)<sup>2</sup>, [Weisi Lin](https://personal.ntu.edu.sg/wslin/Home.html)<sup>1#</sup>
>
> <sup>1</sup>Nanyang Technological University, <sup>2</sup>Shanghai Jiaotong University, <sup>3</sup>Sensetime Research
### Image Matching

**Feature matching** and image matching.
@@ -261,37 +311,6 @@ _Some of the above papers also contain methods to detect tampered images generat
* [ ] EfficientViT: Lightweight Multi-Scale Attention for On-Device Semantic Segmentation **\[**[**Paper**](https://arxiv.org/abs/2205.14756)**]** **\[**[**Code**](https://github.com/mit-han-lab/efficientvit)**]**
* [ ] CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation **\[**[**Paper**](https://arxiv.org/abs/2303.11797)**]** **\[**[**Code**](https://github.com/KU-CVLAB/CAT-Seg)**]** **\[**[**Project**](https://ku-cvlab.github.io/CAT-Seg/)**]** **\[**[**Note\_community**](https://blog.csdn.net/P\_LarT/article/details/131083586)**]**

### Useful Links

1. ICCV 2023 Paper List <https://huggingface.co/spaces/ICCV2023/ICCV2023-papers>
21 changes: 10 additions & 11 deletions image-forgery/2023/tbformer.md
@@ -24,43 +24,42 @@ description: 'TBFormer: Two-Branch Transformer for Image Forgery Localization'

![TBFormer network architecture](https://s2.loli.net/2023/03/16/gzTnDclFH6BKsqu.png)

As shown in the figure, the network has an RGB branch and a noise branch. The RGB image $\boldsymbol{I}_{c} \in \mathbb{R}^{H \times W \times 3}$ is passed through BayarConv (Constrained Convolutional Neural Networks: A New Approach Towards General Purpose Image Manipulation Detection) to obtain the noise map $\boldsymbol{I}_{n} \in \mathbb{R}^{H \times W \times 3}$. The RGB image is then divided into 16 × 16 patches $\boldsymbol{X}_{c}=\left\{\boldsymbol{x}_{c}^{(1)}, \boldsymbol{x}_{c}^{(2)}, \cdots, \boldsymbol{x}_{c}^{(N)}\right\}$, where $\boldsymbol{x}_{c}^{(i)} \in \mathbb{R}^{16 \times 16 \times 3}$ and $N=H / 16 \times W / 16$. Through a linear projection, each patch in the sequence is reshaped into a 1-D vector, and these vectors form the patch embedding sequence $\boldsymbol{P}_{c}=\left\{\boldsymbol{p}_{c}^{(1)}, \boldsymbol{p}_{c}^{(2)}, \cdots, \boldsymbol{p}_{c}^{(N)}\right\} \in \mathbb{R}^{N \times L}$. The corresponding positional encodings are, as shown in the figure, added directly to the patch embeddings to form the final input sequence $\boldsymbol{E}_{c}=\left\{\boldsymbol{e}_{c}^{(1)}, \boldsymbol{e}_{c}^{(2)}, \ldots, \boldsymbol{e}_{c}^{(N)}\right\} \in \mathbb{R}^{N \times L}$, where $\boldsymbol{e}_{c}^{(i)}=\boldsymbol{p}_{c}^{(i)}+\text{pos}_{c}^{(i)}$.
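A minimal sketch of this tokenization step (my own illustration, assuming PyTorch, a 512 × 512 input, and a ViT-style embedding width $L=768$; the strided convolution is a common shorthand for "flatten each patch + linear projection" and may differ from the released code; BayarConv is omitted, and the noise branch would use an identical module with its own weights):

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an H x W x 3 image into 16 x 16 patches and project each into an L-dim embedding."""
    def __init__(self, img_size=512, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2              # N = H/16 * W/16
        # A strided convolution is equivalent to "flatten each patch + linear projection".
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)
        # One learnable positional encoding per patch position.
        self.pos = nn.Parameter(torch.zeros(1, self.num_patches, embed_dim))

    def forward(self, x):                    # x: (B, 3, H, W)
        p = self.proj(x)                     # (B, L, H/16, W/16)
        p = p.flatten(2).transpose(1, 2)     # patch embeddings P: (B, N, L)
        return p + self.pos                  # E = P + pos

emb = PatchEmbedding()
E_c = emb(torch.randn(2, 3, 512, 512))       # RGB-branch input sequence
print(E_c.shape)                             # torch.Size([2, 1024, 768])
```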

The input sequence is then fed into a feature extractor made of 12 Transformer layers, each containing a multi-head self-attention module and an MLP module, and the outputs of the 4th, 8th, and 12th layers $\boldsymbol{T}_{c}^{(4)}, \boldsymbol{T}_{c}^{(8)}, \boldsymbol{T}_{c}^{(12)}$ are collected:

$$
\boldsymbol{T}_{c}=\left\{\boldsymbol{T}_{c}^{(4)}, \boldsymbol{T}_{c}^{(8)}, \boldsymbol{T}_{c}^{(12)}\right\}=f_{c}\left(\boldsymbol{E}_{c}\right)
$$

$$
\begin{aligned}
\boldsymbol{M}_{c}^{(i)} & =\text{MSA}_{c}^{(i)}\left(\text{LN}\left(\boldsymbol{T}_{c}^{(i-1)}\right)\right)+\boldsymbol{T}_{c}^{(i-1)} \\
\boldsymbol{T}_{c}^{(i)} & =\text{MLP}_{c}^{(i)}\left(\text{LN}\left(\boldsymbol{M}_{c}^{(i)}\right)\right)+\boldsymbol{M}_{c}^{(i)}
\end{aligned}
$$

$$
\text{SA}_{c}^{(i)}\left(\boldsymbol{T}_{c}^{(i-1)}\right)=\text{softmax}\left(\boldsymbol{Q}_{c}^{(i)}\left(\boldsymbol{K}_{c}^{(i)}\right)^{\mathrm{T}} / \sqrt{L}\right) \boldsymbol{V}_{c}^{(i)}
$$

$$
\boldsymbol{Q}_{c}^{(i)}=\boldsymbol{T}_{c}^{(i-1)} \boldsymbol{W}_{\mathrm{cQ}}^{(i)}, \quad \boldsymbol{K}_{c}^{(i)}=\boldsymbol{T}_{c}^{(i-1)} \boldsymbol{W}_{\mathrm{cK}}^{(i)}, \quad \boldsymbol{V}_{c}^{(i)}=\boldsymbol{T}_{c}^{(i-1)} \boldsymbol{W}_{\mathrm{cV}}^{(i)}
$$

where $\boldsymbol{W}_{\mathrm{cQ}}^{(i)}$, $\boldsymbol{W}_{\mathrm{cK}}^{(i)}$, $\boldsymbol{W}_{\mathrm{cV}}^{(i)}$ are the learnable projection matrices of the $i$-th layer.
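These layer equations (pre-norm MSA and MLP, each with a residual connection) and the collection of the 4th/8th/12th outputs can be sketched as follows. This is a simplification under my own assumptions (embedding width, head count, and `nn.MultiheadAttention`, which scales by the per-head dimension rather than $\sqrt{L}$), not the authors' implementation:

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One Transformer layer: M = MSA(LN(T)) + T, then T' = MLP(LN(M)) + M."""
    def __init__(self, dim=768, heads=12):
        super().__init__()
        self.ln1 = nn.LayerNorm(dim)
        self.msa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ln2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, t):
        x = self.ln1(t)
        m = self.msa(x, x, x, need_weights=False)[0] + t  # M = MSA(LN(T)) + T
        return self.mlp(self.ln2(m)) + m                  # T' = MLP(LN(M)) + M

class FeatureExtractor(nn.Module):
    """Stack of 12 layers; returns the outputs of layers 4, 8 and 12."""
    def __init__(self, depth=12, dim=768, taps=(4, 8, 12)):
        super().__init__()
        self.layers = nn.ModuleList(EncoderLayer(dim) for _ in range(depth))
        self.taps = set(taps)

    def forward(self, e):
        outs = []
        for i, layer in enumerate(self.layers, start=1):
            e = layer(e)
            if i in self.taps:
                outs.append(e)        # T^(4), T^(8), T^(12)
        return outs

feats = FeatureExtractor()(torch.randn(2, 1024, 768))
print([f.shape for f in feats])       # three tensors of shape (2, 1024, 768)
```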

The noise branch uses the same modules but with its own, non-shared weights. The features of the two branches are then fused in the AHFM (attention-aware hierarchical feature) module. Because the two branches' features differ considerably, the authors build a position attention (PA) module inside the attention-aware hierarchical feature module. As shown in the figure below, the feature maps taken from the 4th/8th/12th layers of the feature extractors are first transposed and reshaped into 3-D tensors; the transposed features of the two branches are concatenated along the channel dimension and passed through a convolution, then through three convolutions with different kernels to obtain three new feature maps. A softmax over their product gives the position attention weights, from which the fused feature map is obtained. The fused maps of the 8th and 12th layers are computed in the same way, and the three fused maps are combined by element-wise addition, a 3×3 convolution, batch normalization, and ReLU activation to give the final fused feature map of the encoder stage.

![Position attention module](https://s2.loli.net/2023/03/21/raUZ68HEn7mDxfX.png)

$$
\boldsymbol{A}^{(4)}=\text{softmax}\left(\left(\boldsymbol{T}^{\left(4 \_1\right)}\right)^{\mathrm{T}} \boldsymbol{T}^{\left(4 \_2\right)}\right)
$$

$$
\boldsymbol{Z}^{(4)}=\text{Conv}^{(4)}\left(\alpha^{(4)}\left(\boldsymbol{T}^{\left(4 \_3\right)} \boldsymbol{A}^{(4)}\right)_{\text {reshape }} \oplus \hat{\boldsymbol{T}}^{(4)}\right)
$$

$$
\boldsymbol{Z} = \text{Conv}\left(\boldsymbol{Z}^{(12)} \oplus \boldsymbol{Z}^{(8)} \oplus \boldsymbol{Z}^{(4)}\right)
$$
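A rough sketch of one position-attention fusion step for a single hierarchy level (the 1×1 kernels, the channel reduction for the query/key maps, and the zero-initialized learnable scale $\alpha$ follow the common position-attention design; they are my assumptions, not necessarily TBFormer's exact choices):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PositionAttentionFusion(nn.Module):
    """Fuse one hierarchy level of RGB and noise tokens with position attention."""
    def __init__(self, dim=768, grid=32):
        super().__init__()
        self.grid = grid
        self.merge = nn.Conv2d(2 * dim, dim, kernel_size=1)   # conv after channel concat
        # three convolutions giving T^(i_1), T^(i_2), T^(i_3)
        self.q = nn.Conv2d(dim, dim // 8, kernel_size=1)
        self.k = nn.Conv2d(dim, dim // 8, kernel_size=1)
        self.v = nn.Conv2d(dim, dim, kernel_size=1)
        self.alpha = nn.Parameter(torch.zeros(1))             # learnable scale α
        self.out = nn.Conv2d(dim, dim, kernel_size=3, padding=1)

    def forward(self, t_rgb, t_noise):                        # token sequences: (B, N, L)
        b, n, l = t_rgb.shape
        g = self.grid
        # transpose + reshape tokens back into 2-D feature maps
        f_rgb = t_rgb.transpose(1, 2).reshape(b, l, g, g)
        f_noise = t_noise.transpose(1, 2).reshape(b, l, g, g)
        t_hat = self.merge(torch.cat([f_rgb, f_noise], dim=1))   # concatenated, then conv
        q = self.q(t_hat).flatten(2)                              # (B, C', N)
        k = self.k(t_hat).flatten(2)
        v = self.v(t_hat).flatten(2)                              # (B, C, N)
        attn = F.softmax(q.transpose(1, 2) @ k, dim=-1)           # A: (B, N, N) position attention
        fused = (v @ attn.transpose(1, 2)).reshape(b, l, g, g)    # (T^(i_3) A), reshaped
        return self.out(self.alpha * fused + t_hat)               # Z^(i)

z4 = PositionAttentionFusion()(torch.randn(2, 1024, 768), torch.randn(2, 1024, 768))
print(z4.shape)                                                   # torch.Size([2, 768, 32, 32])
```

The three outputs $\boldsymbol{Z}^{(4)}$, $\boldsymbol{Z}^{(8)}$, $\boldsymbol{Z}^{(12)}$ would then be combined by element-wise addition, a 3×3 convolution, batch normalization, and ReLU, as described above.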

The decoding stage is then treated directly as a semantic segmentation task: two learnable class embeddings (authentic and tampered) are introduced to further learn the representations of authentic and tampered regions. These class embeddings are fed, together with the patch embeddings of the fused features, into the decoder's two Transformer layers to produce the predicted mask. To obtain the patch embeddings of the fused features, the fused feature map is first reshaped, transposed, and then linearly projected; these embeddings and the class embeddings pass through the Transformer layers, and after normalization, upsampling, and related operations the final predicted mask is obtained.
@@ -69,7 +68,7 @@

$$
\boldsymbol{M}=\text{softmax}(\text{Upsample}(\boldsymbol{Y}))
$$

In the formulas above, Z denotes the patch embeddings of the encoder's fused features and S denotes the class embeddings. After proj (a linear projection) and L2 normalization, their scalar products give the final scores Y; upsampling Y and applying softmax yields the predicted mask M.
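A hedged sketch of such a mask decoder (the use of `nn.TransformerEncoderLayer`, the projection sizes, and the bilinear upsampling are my assumptions for illustration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskDecoder(nn.Module):
    """Predict a 2-class (authentic / tampered) mask from the fused encoder features."""
    def __init__(self, dim=768, num_classes=2, patch=16):
        super().__init__()
        self.cls = nn.Parameter(torch.randn(1, num_classes, dim))   # class embeddings S
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, nhead=12, batch_first=True, norm_first=True)
            for _ in range(2)
        )
        self.proj_patch = nn.Linear(dim, dim)     # proj(.) for patch embeddings
        self.proj_cls = nn.Linear(dim, dim)       # proj(.) for class embeddings
        self.patch = patch

    def forward(self, z):                         # z: fused feature map (B, L, H/16, W/16)
        b, l, gh, gw = z.shape
        tokens = z.flatten(2).transpose(1, 2)     # reshape + transpose -> (B, N, L)
        x = torch.cat([self.cls.expand(b, -1, -1), tokens], dim=1)
        for layer in self.layers:                 # two decoder Transformer layers
            x = layer(x)
        s, p = x[:, :self.cls.size(1)], x[:, self.cls.size(1):]
        s = F.normalize(self.proj_cls(s), dim=-1)            # L2-normalized class embeddings
        p = F.normalize(self.proj_patch(p), dim=-1)          # L2-normalized patch embeddings
        y = (p @ s.transpose(1, 2)).transpose(1, 2)          # Y: patch-class similarity (B, 2, N)
        y = y.reshape(b, -1, gh, gw)
        y = F.interpolate(y, scale_factor=self.patch,
                          mode="bilinear", align_corners=False)  # Upsample(Y)
        return y.softmax(dim=1)                               # M = softmax(Upsample(Y))

mask = MaskDecoder()(torch.randn(2, 768, 32, 32))
print(mask.shape)                                             # torch.Size([2, 2, 512, 512])
```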
5 changes: 5 additions & 0 deletions image-harmonization/pih.md
@@ -0,0 +1,5 @@
# PIH

Semi-supervised Parametric Real-world Image Harmonization

The authors use a dual-stream, semi-supervised training strategy to predict a global RGB curve and a local shading map, giving the first image harmonization method that addresses local shading adjustment.
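As a toy illustration of what "global RGB curve + local shading map" means when harmonizing a composited region (purely illustrative; the per-channel gamma stands in for the predicted RGB curve and is not the paper's parameterization):

```python
import torch

def harmonize(composite, mask, gamma, shading):
    """Toy parametric harmonization.
    composite: (B, 3, H, W) image with a pasted foreground, mask: (B, 1, H, W) foreground region,
    gamma: (B, 3, 1, 1) per-channel exponent standing in for the global RGB curve,
    shading: (B, 1, H, W) multiplicative local shading map."""
    fg = composite.clamp(1e-4, 1.0) ** gamma        # global colour (tone-curve) adjustment
    fg = fg * shading                               # local shading adjustment
    return composite * (1 - mask) + fg * mask       # paste the adjusted foreground back

out = harmonize(torch.rand(1, 3, 256, 256), torch.ones(1, 1, 256, 256),
                torch.full((1, 3, 1, 1), 0.8), torch.full((1, 1, 256, 256), 1.1))
print(out.shape)    # torch.Size([1, 3, 256, 256])
```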

1 comment on commit 67c724b


@vercel bot commented on 67c724b, Dec 5, 2023

Successfully deployed to the following URLs:

papers – ./

papers-greatzh.vercel.app
papers-git-main-greatzh.vercel.app
papersofjz.vercel.app
