TensorFlow Implementation of SwinT-ChARM (Zhu et al. ICLR 2022) #151
-
Cool, nice work! I was impressed with the SwinT-ChARM paper (and appreciate their use of "ChARM" as an acronym for our entropy model, which I've now adopted). So it's great to see a TF implementation here. It's clear at this point that the simple "four conv layers" transforms we've been using for analysis and synthesis hold back the RD performance of our models. That's hardly new -- (Cheng 2020) used a better transform and there are probably earlier references from CLIC 2019 -- but the SwinT-ChARM paper seems like a major step forward in terms of RD for a fast model.

Based on (Zhu 2022), I'm now wondering how much the attention window pattern matters in the context of compression. There have been many papers looking at alternatives that evaluate on classification tasks (CSWin, axial attention, MaxViT, etc.), but I don't know whether their positive results transfer to dense prediction tasks in general, and to image/video compression specifically. One interpretation of (Zhu 2022) is that transformers are key for the autoencoder in a compression model based on nonlinear transform coding (NTC), but some other papers use non-transformer architectures with residual connections and "simple" attention blocks and get comparable RD results at lower runtime. For example:
-
Hello @minnend, thanks a lot for sharing your thoughts! You might find this interesting as well: https://github.com/Nikolai10/LIC-TCM. I was also wondering whether you are going to release the benchmarking code used in "Advancing the Rate-Distortion-Computation Frontier for Neural Image Compression". I am quite curious what the overall setup looks like (for PyTorch the go-to option would be the DeepSpeed profiler; is there a TensorFlow equivalent?).
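For reference, one common way to get a rough FLOP count on the TensorFlow side (loosely comparable to the op counts the DeepSpeed profiler reports for PyTorch) is to freeze a concrete function and run it through the TF1-style profiler. A minimal sketch; the `count_flops` helper, the model argument, and the input shape are placeholders, not code from the paper or this repo:

```python
import tensorflow as tf
from tensorflow.python.framework.convert_to_constants import (
    convert_variables_to_constants_v2_as_graph)


def count_flops(model, input_shape=(1, 256, 256, 3)):
    """Rough FLOP count for a Keras model via the TF1-style profiler."""
    spec = tf.TensorSpec(input_shape, tf.float32)
    concrete = tf.function(model).get_concrete_function(spec)
    # Freeze variables into constants so the profiler sees a static graph.
    _, graph_def = convert_variables_to_constants_v2_as_graph(concrete)
    with tf.Graph().as_default() as graph:
        tf.graph_util.import_graph_def(graph_def, name="")
        opts = tf.compat.v1.profiler.ProfileOptionBuilder.float_operation()
        info = tf.compat.v1.profiler.profile(graph=graph, options=opts)
    return info.total_float_ops
```

This only counts ops in the frozen inference graph, so it says nothing about wall-clock encode/decode time, which would need a separate measurement.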
-
Dear TFC-Community,
I have recently been working on a TF implementation of SwinT-ChARM (Zhu et al. ICLR 2022):
https://github.com/Nikolai10/SwinT-ChARM.
SwinT-ChARM can be considered an extension of Minnen et al. (ICIP 2020), in which the conventional convolution-based nonlinear transforms are replaced by Swin-transformer blocks. As a result, SwinT-ChARM achieves a better overall rate-distortion-computation trade-off, making it a promising direction for resource-constrained environments.
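For context, the conventional transforms being replaced look roughly like the short stacks of strided convolutions with GDN nonlinearities used in the TFC example models. A sketch of that baseline (filter counts, layer names, and the default latent depth are illustrative, not copied from any particular model):

```python
import tensorflow as tf
import tensorflow_compression as tfc


class ConvAnalysisTransform(tf.keras.Sequential):
    """Conventional 4-layer strided-conv + GDN analysis transform."""

    def __init__(self, num_filters=192, latent_depth=320):
        super().__init__(name="analysis")
        for i in range(3):
            self.add(tfc.SignalConv2D(
                num_filters, (5, 5), corr=True, strides_down=2,
                padding="same_zeros", use_bias=True,
                activation=tfc.GDN(name=f"gdn_{i}")))
        # Last layer maps to the latent depth; no nonlinearity.
        self.add(tfc.SignalConv2D(
            latent_depth, (5, 5), corr=True, strides_down=2,
            padding="same_zeros", use_bias=True, activation=None))
```

In SwinT-ChARM, each such stride-2 convolution + GDN stage is, roughly speaking, replaced by a downsampling (patch-embedding/merging) layer followed by Swin-transformer blocks, and analogously on the synthesis side, while the channel-wise autoregressive (ChARM) entropy model stays structurally close to Minnen et al.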
In my implementation I have taken some care to closely follow the naming conventions presented by Minnen et al., without changing the overall structure. I have also included a Google Colab demo, which you can find here.
I hope this work will be useful for further exploration and learning.
Kind regards,
Nikolai