add papers and new sub-category of dtt
greatzh committed Dec 5, 2023
1 parent 5ae916d commit 67c724b
Showing 3 changed files with 66 additions and 43 deletions.
83 changes: 51 additions & 32 deletions README.md
@@ -40,9 +40,14 @@ description: >-
#### Image Editing

<details open>

<summary>2023 This year</summary>

* [ ] GP-Net: Image Manipulation Detection and Localization via Long-Range Modeling and Transformers (_Appl. Sci. (IF: 2.8, not included in CCFs), MDPI, '23_) **\[[Paper](https://www.mdpi.com/2076-3417/13/21/12053)]**
* [ ] DS-Net: Dual supervision neural network for image manipulation localization _(IET-IPR '23)_ **[[Paper](https://ietresearch.onlinelibrary.wiley.com/doi/abs/10.1049/ipr2.12885)]**
* [ ] Learning to Immunize Images for Tamper Localization and Self Recovery _(TPAMI '23)_ **[[Paper](https://arxiv.org/pdf/2210.15902.pdf)]**
* [ ] Semantic-agnostic progressive subtractive network for image manipulation detection and localization _(Neurocomputing '23)_ **[[Paper](https://doi.org/10.1016/j.neucom.2023.126263)]**
* [ ] Towards Effective Image Manipulation Detection with Proposal Contrastive Learning _(TCSVT '23)_ **[[Paper](https://arxiv.org/pdf/2210.08529.pdf)]** **[[Code](https://github.com/Sandy-Zeng/PCL)]**
* [ ] Effective image tampering localization with multi-scale ConvNeXt feature fusion (_JVCIR '23)_ **\[**[**Paper**](https://arxiv.org/abs/2208.13739)**]** **[[Code](https://github.com/multimediaFor/ConvNeXtFF)]**
* [ ] Evading Detection Actively: Toward Anti-Forensics against Forgery Localization (_arXiv '23_) **\[**[**Paper**](https://arxiv.org/abs/2310.10036)**]** **\[**[**Code**](https://github.com/tansq/SEAR)**]**
* [ ] Multi-scale attention context-aware network for detection and localization of image splicing: Efficient and robust identification network _(Appl. Intell. '23)_ **\[**[**Paper**](https://link.springer.com/article/10.1007/s10489-022-04421-3)**]**
* [ ] [ReLoc: A Restoration-Assisted Framework for Robust Image Tampering Localization](image-forgery/2023/reloc.md) (_TIFS '23_) **\[**[**Paper**](https://arxiv.org/abs/2211.03930)**]** **\[**[**Code**](https://github.com/ZhuangPeiyu/ReLoc)**]**
@@ -73,6 +78,7 @@ description: >-

<summary>2022</summary>

* [ ] DS-UNet: A dual streams UNet for refined image forgery localization _(InfoS '22)_ **[[Paper](https://dl.acm.org/doi/abs/10.1016/j.ins.2022.08.005)]**
* [ ] MSMG-Net: Multi-scale Multi-grained Supervised Metworks for Multi-task Image Manipulation Detection and Localization (_ArXiv '22_) **\[**[**Paper**](https://arxiv.org/abs/2211.03140)**]**
* [ ] Towards JPEG-Resistant Image Forgery Detection and Localization Via Self-Supervised Domain Adaptation (_TPAMI '22_) **\[**[**Paper**](https://ieeexplore.ieee.org/document/9904872)**]**
* [ ] ESRNet: Efficient Search and Recognition Network for Image Manipulation Detection (_TOMCCAP '22_) **\[**[**Paper**](https://doi.org/10.1145/3506853)**]** **\[**[**Tool**](https://github.com/tampered816/rrr)**]**
@@ -228,6 +234,50 @@ _Some of the above papers also contain methods to detect tampered images generat
* [ ] [Shrinking the Semantic Gap: Spatial Pooling of Local Moment Invariants for Copy-Move Forgery Detection](copy-move/word2phrasecmfd.md) _(TIFS '23)_ **\[**[**Paper**](https://arxiv.org/abs/2207.09135)**]** **\[**[**Code**](https://github.com/ChaoWang1016/word2phraseCMFD)**]**
* [ ] Image Copy-Move Forgery Detection via Deep Cross-Scale PatchMatch (_ICME '23_) **\[**[**Paper**](https://arxiv.org/abs/2308.04188)**]**

### Tampered Text Detection

**Tampered text detection** in images (partial list)

- [ ] Towards Robust Tampered Text Detection in Document Image: New dataset and New Solution (_CVPR '23_) **\[**[**Paper**](https://openaccess.thecvf.com/content/CVPR2023/papers/Qu_Towards_Robust_Tampered_Text_Detection_in_Document_Image_New_Dataset_CVPR_2023_paper.pdf)**]** **[[Code](https://github.com/qcf-568/DocTamper)]**
- [ ] Progressive Supervision for Tampering Localization in Document Images (_ICONIP '23_) **[[Paper](https://link.springer.com/chapter/10.1007/978-981-99-8184-7_11)]**
- [ ] SigScatNet: A Siamese + Scattering based Deep Learning Approach for Signature Forgery Detection and Similarity Assessment _(arXiv '23)_ **[[Paper](https://arxiv.org/pdf/2311.05579.pdf)]**
- [ ] Image Generation and Learning Strategy for Deep Document Forgery Detection _(arXiv '23)_ **[[Paper](https://arxiv.org/abs/2311.03650)]**
- [ ] Forgery-free signature verification with stroke-aware cycle-consistent generative adversarial network _(Neurocomputing '22)_ **[[Paper](https://doi.org/10.1016/j.neucom.2022.08.017)]** **[[Code](https://github.com/KAKAFEI123/Stroke-cCycleGAN)]**
- [ ] Document Forgery Detection in the Context of Double JPEG Compression _(ICPR '22)_ **[[Paper](https://link.springer.com/chapter/10.1007/978-3-031-37745-7_5)]**

### Low Level Vision

Related resources:

* [https://github.com/Kobaayyy/Awesome-ICCV2021-Low-Level-Vision](https://github.com/Kobaayyy/Awesome-ICCV2021-Low-Level-Vision)
* [https://github.com/lcybuzz/Low-Level-Vision-Paper-Record](https://github.com/lcybuzz/Low-Level-Vision-Paper-Record)

Low-level tasks include super-resolution, denoising, dehazing, low-light enhancement, and so on, while high-level tasks include classification, detection, and segmentation. The papers listed here, however, are still mainly related to tampering detection.

> Testing the new layout of paper title.
>
> 📖Paper, 👨‍💻Code, 📦Dataset, 🔗Other links, 📜News,
>
> \*Equal contribution. #Corresponding author.
* [ ] (**EVP**) Explicit Visual Prompting for Low-Level Structure Segmentations (_CVPR '23_) [📖](https://arxiv.org/abs/2303.10883), [👨‍💻](https://github.com/NiFangBaAGe/Explicit-Visual-Prompt) (_including defocus blur, shadow, forgery, and camouflaged detection_)

> [Weihuang Liu](https://github.com/nifangbaage)<sup>1</sup>, [Xi Shen](https://xishen0220.github.io/)<sup>2</sup>, [Chi-Man Pun](https://www.cis.um.edu.mo/\~cmpun/)<sup>#,1</sup>, [Xiaodong Cun](https://vinthony.github.io/)<sup>#,2</sup>
>
> <sup>1</sup>University of Macau <sup>2</sup>Tencent AI Lab
* [ ] SYENet: A Simple Yet Effective Network for Multiple Low-Level Vision Tasks with Real-time Performance on Mobile Device (_ICCV '23_) [📖](https://arxiv.org/abs/2308.08137), [👨‍💻](https://github.com/sanechips-multimedia/syenet)

> [Weiran Gou](https://github.com/WeiranGou)<sup>∗1,2</sup>, Ziyao Yi<sup>∗1,2</sup>, Yan Xiang<sup>1,2</sup>, Shaoqing Li<sup>1,2</sup>, Zibin Liu<sup>1,2</sup>, Dehui Kong<sup>1,2</sup>, Ke Xu<sup>#1,2</sup>
>
> <sup>1</sup>State Key Laboratory of Mobile Network and Mobile Multimedia Technology, <sup>2</sup>Sanechips Technology, Chengdu, China
* [ ] Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-level Vision (_arXiv '23_) [📖](https://arxiv.org/abs/2309.14181), [👨‍💻](https://github.com/VQAssessment/Q-Bench)

> [Haoning Wu](https://teowu.github.io/)<sup>1\*</sup>, [Zicheng Zhang](https://github.com/zzc-1998)<sup>2\*</sup>, [Erli Zhang](https://github.com/ZhangErliCarl/)<sup>1\*</sup>, [Chaofeng Chen](https://chaofengc.github.io/)<sup>1</sup>, [Liang Liao](https://liaoliang92.github.io/)<sup>1</sup>, [Annan Wang](https://github.com/AnnanWangDaniel)<sup>1</sup>, [Chunyi Li](https://github.com/lcysyzxdxc)<sup>2</sup>, [Wenxiu Sun](https://wenxiusun.com/)<sup>3</sup>, [Qiong Yan](https://scholar.google.com/citations?user=uT9CtPYAAAAJ\&hl=en)<sup>3</sup>, [Guangtao Zhai](https://ee.sjtu.edu.cn/en/FacultyDetail.aspx?id=24\&infoid=153\&flag=153)<sup>2</sup>, [Weisi Lin](https://personal.ntu.edu.sg/wslin/Home.html)<sup>1#</sup>
>
> <sup>1</sup>Nanyang Technological University, <sup>2</sup>Shanghai Jiaotong University, <sup>3</sup>Sensetime Research
### Image Matching

**Feature matching** and image matching.
@@ -261,37 +311,6 @@ _Some of the above papers also contain methods to detect tampered images generat
* [ ] EfficientViT: Lightweight Multi-Scale Attention for On-Device Semantic Segmentation **\[**[**Paper**](https://arxiv.org/abs/2205.14756)**]** **\[**[**Code**](https://github.com/mit-han-lab/efficientvit)**]**
* [ ] CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation **\[**[**Paper**](https://arxiv.org/abs/2303.11797)**]** **\[**[**Code**](https://github.com/KU-CVLAB/CAT-Seg)**]** **\[**[**Project**](https://ku-cvlab.github.io/CAT-Seg/)**]** **\[**[**Note\_community**](https://blog.csdn.net/P\_LarT/article/details/131083586)**]**

### Useful Links

1. ICCV 2023 Paper List <https://huggingface.co/spaces/ICCV2023/ICCV2023-papers>
21 changes: 10 additions & 11 deletions image-forgery/2023/tbformer.md
@@ -24,43 +24,42 @@ description: 'TBFormer: Two-Branch Transformer for Image Forgery Localization'

![TBFormer network architecture](https://s2.loli.net/2023/03/16/gzTnDclFH6BKsqu.png)

As shown in the figure, the network has an RGB branch and a noise branch. The RGB image $\boldsymbol{I}_{c} \in \mathbb{R}^{H \times W \times 3}$ is passed through BayarConv (Constrained Convolutional Neural Networks: A New Approach Towards General Purpose Image Manipulation Detection) to obtain the noise map $\boldsymbol{I}_{n} \in \mathbb{R}^{H \times W \times 3}$. The RGB image is then divided into 16 × 16 patches $\boldsymbol{X}_{c}=\left\{\boldsymbol{x}_{c}^{(1)}, \boldsymbol{x}_{c}^{(2)}, \cdots, \boldsymbol{x}_{c}^{(N)}\right\}$, where $\boldsymbol{x}_{c}^{(i)} \in \mathbb{R}^{16 \times 16 \times 3}$ and $N=H / 16 \times W / 16$. Through a linear projection, each patch in the sequence is reshaped into a 1-D vector, and these vectors form the patch embedding sequence $\boldsymbol{P}_{c}=\left\{\boldsymbol{p}_{c}^{(1)}, \boldsymbol{p}_{c}^{(2)}, \cdots, \boldsymbol{p}_{c}^{(N)}\right\} \in \mathbb{R}^{N \times L}$. The corresponding positional encodings are, as shown in the figure, added directly to the patch embeddings to form the final input sequence $\boldsymbol{E}_{c}=\left\{\boldsymbol{e}_{c}^{(1)}, \boldsymbol{e}_{c}^{(2)}, \ldots, \boldsymbol{e}_{c}^{(N)}\right\} \in \mathbb{R}^{N \times L}$, where $\boldsymbol{e}_{c}^{(i)}=\boldsymbol{p}_{c}^{(i)}+\text{pos}_{c}^{(i)}$.
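A minimal sketch of this tokenization step (my own illustration, assuming PyTorch, a 512 × 512 input, and a ViT-style embedding width $L=768$; the strided convolution is a common shorthand for "flatten each patch + linear projection" and may differ from the released code; BayarConv is omitted, and the noise branch would use an identical module with its own weights):

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an H x W x 3 image into 16 x 16 patches and project each into an L-dim embedding."""
    def __init__(self, img_size=512, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2              # N = H/16 * W/16
        # A strided convolution is equivalent to "flatten each patch + linear projection".
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)
        # One learnable positional encoding per patch position.
        self.pos = nn.Parameter(torch.zeros(1, self.num_patches, embed_dim))

    def forward(self, x):                    # x: (B, 3, H, W)
        p = self.proj(x)                     # (B, L, H/16, W/16)
        p = p.flatten(2).transpose(1, 2)     # patch embeddings P: (B, N, L)
        return p + self.pos                  # E = P + pos

emb = PatchEmbedding()
E_c = emb(torch.randn(2, 3, 512, 512))       # RGB-branch input sequence
print(E_c.shape)                             # torch.Size([2, 1024, 768])
```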

The input sequence is then fed into a feature extractor made of 12 Transformer layers, each containing a multi-head self-attention module and an MLP module, and the outputs of the 4th, 8th, and 12th layers $\boldsymbol{T}_{c}^{(4)}, \boldsymbol{T}_{c}^{(8)}, \boldsymbol{T}_{c}^{(12)}$ are collected:

$$
\boldsymbol{T}_{c}=\left\{\boldsymbol{T}_{c}^{(4)}, \boldsymbol{T}_{c}^{(8)}, \boldsymbol{T}_{c}^{(12)}\right\}=f_{c}\left(\boldsymbol{E}_{c}\right)
$$

$$
\begin{aligned}
\boldsymbol{M}_{c}^{(i)} & =\text{MSA}_{c}^{(i)}\left(\text{LN}\left(\boldsymbol{T}_{c}^{(i-1)}\right)\right)+\boldsymbol{T}_{c}^{(i-1)} \\
\boldsymbol{T}_{c}^{(i)} & =\text{MLP}_{c}^{(i)}\left(\text{LN}\left(\boldsymbol{M}_{c}^{(i)}\right)\right)+\boldsymbol{M}_{c}^{(i)}
\end{aligned}
$$

$$
\text{SA}_{c}^{(i)}\left(\boldsymbol{T}_{c}^{(i-1)}\right)=\text{softmax}\left(\boldsymbol{Q}_{c}^{(i)}\left(\boldsymbol{K}_{c}^{(i)}\right)^{\mathrm{T}} / \sqrt{L}\right) \boldsymbol{V}_{c}^{(i)}
$$

$$
\boldsymbol{Q}_{c}^{(i)}=\boldsymbol{T}_{c}^{(i-1)} \boldsymbol{W}_{\mathrm{cQ}}^{(i)}, \quad \boldsymbol{K}_{c}^{(i)}=\boldsymbol{T}_{c}^{(i-1)} \boldsymbol{W}_{\mathrm{cK}}^{(i)}, \quad \boldsymbol{V}_{c}^{(i)}=\boldsymbol{T}_{c}^{(i-1)} \boldsymbol{W}_{\mathrm{cV}}^{(i)}
$$

where $\boldsymbol{W}_{\mathrm{cQ}}^{(i)}$, $\boldsymbol{W}_{\mathrm{cK}}^{(i)}$, $\boldsymbol{W}_{\mathrm{cV}}^{(i)}$ are the learnable projection matrices of the $i$-th layer.
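These layer equations (pre-norm MSA and MLP, each with a residual connection) and the collection of the 4th/8th/12th outputs can be sketched as follows. This is a simplification under my own assumptions (embedding width, head count, and `nn.MultiheadAttention`, which scales by the per-head dimension rather than $\sqrt{L}$), not the authors' implementation:

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One Transformer layer: M = MSA(LN(T)) + T, then T' = MLP(LN(M)) + M."""
    def __init__(self, dim=768, heads=12):
        super().__init__()
        self.ln1 = nn.LayerNorm(dim)
        self.msa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ln2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, t):
        x = self.ln1(t)
        m = self.msa(x, x, x, need_weights=False)[0] + t  # M = MSA(LN(T)) + T
        return self.mlp(self.ln2(m)) + m                  # T' = MLP(LN(M)) + M

class FeatureExtractor(nn.Module):
    """Stack of 12 layers; returns the outputs of layers 4, 8 and 12."""
    def __init__(self, depth=12, dim=768, taps=(4, 8, 12)):
        super().__init__()
        self.layers = nn.ModuleList(EncoderLayer(dim) for _ in range(depth))
        self.taps = set(taps)

    def forward(self, e):
        outs = []
        for i, layer in enumerate(self.layers, start=1):
            e = layer(e)
            if i in self.taps:
                outs.append(e)        # T^(4), T^(8), T^(12)
        return outs

feats = FeatureExtractor()(torch.randn(2, 1024, 768))
print([f.shape for f in feats])       # three tensors of shape (2, 1024, 768)
```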

The noise branch uses the same modules but with its own, non-shared weights. The features of the two branches are then fused in the AHFM (attention-aware hierarchical feature) module. Because the two branches' features differ considerably, the authors build a position attention (PA) module inside the attention-aware hierarchical feature module. As shown in the figure below, the feature maps taken from the 4th/8th/12th layers of the feature extractors are first transposed and reshaped into 3-D tensors; the transposed features of the two branches are concatenated along the channel dimension and passed through a convolution, then through three convolutions with different kernels to obtain three new feature maps. A softmax over their product gives the position attention weights, from which the fused feature map is obtained. The fused maps of the 8th and 12th layers are computed in the same way, and the three fused maps are combined by element-wise addition, a 3×3 convolution, batch normalization, and ReLU activation to give the final fused feature map of the encoder stage.

![Position attention module](https://s2.loli.net/2023/03/21/raUZ68HEn7mDxfX.png)

$$
\boldsymbol{A}^{(4)}=\text{softmax}\left(\left(\boldsymbol{T}^{\left(4 \_1\right)}\right)^{\mathrm{T}} \boldsymbol{T}^{\left(4 \_2\right)}\right)
$$

$$
\boldsymbol{Z}^{(4)}=\text{Conv}^{(4)}\left(\alpha^{(4)}\left(\boldsymbol{T}^{\left(4 \_3\right)} \boldsymbol{A}^{(4)}\right)_{\text {reshape }} \oplus \hat{\boldsymbol{T}}^{(4)}\right)
$$

$$
\boldsymbol{Z} = \text{Conv}\left(\boldsymbol{Z}^{(12)} \oplus \boldsymbol{Z}^{(8)} \oplus \boldsymbol{Z}^{(4)}\right)
$$
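A rough sketch of one position-attention fusion step for a single hierarchy level (the 1×1 kernels, the channel reduction for the query/key maps, and the zero-initialized learnable scale $\alpha$ follow the common position-attention design; they are my assumptions, not necessarily TBFormer's exact choices):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PositionAttentionFusion(nn.Module):
    """Fuse one hierarchy level of RGB and noise tokens with position attention."""
    def __init__(self, dim=768, grid=32):
        super().__init__()
        self.grid = grid
        self.merge = nn.Conv2d(2 * dim, dim, kernel_size=1)   # conv after channel concat
        # three convolutions giving T^(i_1), T^(i_2), T^(i_3)
        self.q = nn.Conv2d(dim, dim // 8, kernel_size=1)
        self.k = nn.Conv2d(dim, dim // 8, kernel_size=1)
        self.v = nn.Conv2d(dim, dim, kernel_size=1)
        self.alpha = nn.Parameter(torch.zeros(1))             # learnable scale α
        self.out = nn.Conv2d(dim, dim, kernel_size=3, padding=1)

    def forward(self, t_rgb, t_noise):                        # token sequences: (B, N, L)
        b, n, l = t_rgb.shape
        g = self.grid
        # transpose + reshape tokens back into 2-D feature maps
        f_rgb = t_rgb.transpose(1, 2).reshape(b, l, g, g)
        f_noise = t_noise.transpose(1, 2).reshape(b, l, g, g)
        t_hat = self.merge(torch.cat([f_rgb, f_noise], dim=1))   # concatenated, then conv
        q = self.q(t_hat).flatten(2)                              # (B, C', N)
        k = self.k(t_hat).flatten(2)
        v = self.v(t_hat).flatten(2)                              # (B, C, N)
        attn = F.softmax(q.transpose(1, 2) @ k, dim=-1)           # A: (B, N, N) position attention
        fused = (v @ attn.transpose(1, 2)).reshape(b, l, g, g)    # (T^(i_3) A), reshaped
        return self.out(self.alpha * fused + t_hat)               # Z^(i)

z4 = PositionAttentionFusion()(torch.randn(2, 1024, 768), torch.randn(2, 1024, 768))
print(z4.shape)                                                   # torch.Size([2, 768, 32, 32])
```

The three outputs $\boldsymbol{Z}^{(4)}$, $\boldsymbol{Z}^{(8)}$, $\boldsymbol{Z}^{(12)}$ would then be combined by element-wise addition, a 3×3 convolution, batch normalization, and ReLU, as described above.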

The decoding stage is then treated directly as a semantic segmentation task: two learnable class embeddings (authentic and tampered) are introduced to further learn the representations of authentic and tampered regions. These class embeddings are fed, together with the patch embeddings of the fused features, into the decoder's two Transformer layers to produce the predicted mask. To obtain the patch embeddings of the fused features, the fused feature map is first reshaped, transposed, and then linearly projected; these embeddings and the class embeddings pass through the Transformer layers, and after normalization, upsampling, and related operations the final predicted mask is obtained.
@@ -69,7 +68,7 @@

$$
\boldsymbol{M}=\text{softmax}(\text{Upsample}(\boldsymbol{Y}))
$$

In the formulas above, Z denotes the patch embeddings of the encoder's fused features and S denotes the class embeddings. After proj (a linear projection) and L2 normalization, their scalar products give the final scores Y; upsampling Y and applying softmax yields the predicted mask M.
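A hedged sketch of such a mask decoder (the use of `nn.TransformerEncoderLayer`, the projection sizes, and the bilinear upsampling are my assumptions for illustration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskDecoder(nn.Module):
    """Predict a 2-class (authentic / tampered) mask from the fused encoder features."""
    def __init__(self, dim=768, num_classes=2, patch=16):
        super().__init__()
        self.cls = nn.Parameter(torch.randn(1, num_classes, dim))   # class embeddings S
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, nhead=12, batch_first=True, norm_first=True)
            for _ in range(2)
        )
        self.proj_patch = nn.Linear(dim, dim)     # proj(.) for patch embeddings
        self.proj_cls = nn.Linear(dim, dim)       # proj(.) for class embeddings
        self.patch = patch

    def forward(self, z):                         # z: fused feature map (B, L, H/16, W/16)
        b, l, gh, gw = z.shape
        tokens = z.flatten(2).transpose(1, 2)     # reshape + transpose -> (B, N, L)
        x = torch.cat([self.cls.expand(b, -1, -1), tokens], dim=1)
        for layer in self.layers:                 # two decoder Transformer layers
            x = layer(x)
        s, p = x[:, :self.cls.size(1)], x[:, self.cls.size(1):]
        s = F.normalize(self.proj_cls(s), dim=-1)            # L2-normalized class embeddings
        p = F.normalize(self.proj_patch(p), dim=-1)          # L2-normalized patch embeddings
        y = (p @ s.transpose(1, 2)).transpose(1, 2)          # Y: patch-class similarity (B, 2, N)
        y = y.reshape(b, -1, gh, gw)
        y = F.interpolate(y, scale_factor=self.patch,
                          mode="bilinear", align_corners=False)  # Upsample(Y)
        return y.softmax(dim=1)                               # M = softmax(Upsample(Y))

mask = MaskDecoder()(torch.randn(2, 768, 32, 32))
print(mask.shape)                                             # torch.Size([2, 2, 512, 512])
```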
5 changes: 5 additions & 0 deletions image-harmonization/pih.md
@@ -0,0 +1,5 @@
# PIH

Semi-supervised Parametric Real-world Image Harmonization

The authors use a dual-stream, semi-supervised training strategy to predict a global RGB curve and a local shading map, giving the first image harmonization method that addresses local shading adjustment.
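As a toy illustration of what "global RGB curve + local shading map" means when harmonizing a composited region (purely illustrative; the per-channel gamma stands in for the predicted RGB curve and is not the paper's parameterization):

```python
import torch

def harmonize(composite, mask, gamma, shading):
    """Toy parametric harmonization.
    composite: (B, 3, H, W) image with a pasted foreground, mask: (B, 1, H, W) foreground region,
    gamma: (B, 3, 1, 1) per-channel exponent standing in for the global RGB curve,
    shading: (B, 1, H, W) multiplicative local shading map."""
    fg = composite.clamp(1e-4, 1.0) ** gamma        # global colour (tone-curve) adjustment
    fg = fg * shading                               # local shading adjustment
    return composite * (1 - mask) + fg * mask       # paste the adjusted foreground back

out = harmonize(torch.rand(1, 3, 256, 256), torch.ones(1, 1, 256, 256),
                torch.full((1, 3, 1, 1), 0.8), torch.full((1, 1, 256, 256), 1.1))
print(out.shape)    # torch.Size([1, 3, 256, 256])
```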

1 comment on commit 67c724b


@vercel bot commented on 67c724b, Dec 5, 2023

Successfully deployed to the following URLs:

papers – ./

papers-greatzh.vercel.app
papers-git-main-greatzh.vercel.app
papersofjz.vercel.app
