TensorFlow Implementation of SwinT-ChARM (Zhu et al. ICLR 2022) #151
-
Cool, nice work! I was impressed with the SwinT-ChARM paper (and appreciate their use of "ChARM" as an acronym for our entropy model, which I've now adopted). So it's great to see a TF implementation here. It's clear at this point that the simple "four conv layers" transforms we've been using for analysis and synthesis hold back the RD performance of our models. That's hardly new -- (Cheng 2020) used a better transform and there are probably earlier references from CLIC 2019 -- but the SwinT-ChARM paper seems like a major step forward in terms of RD for a fast model.

Based on (Zhu 2022), I'm now wondering how much the attention window pattern matters in the context of compression. There have been many papers looking at alternatives that evaluate on classification tasks (CSWin, axial attention, MaxViT, etc.), but I don't know whether their positive results transfer to dense prediction tasks in general, and to image/video compression specifically. One interpretation of (Zhu 2022) is that transformers are key for the autoencoder in a compression model based on nonlinear transform coding (NTC), but some other papers use non-transformer architectures with residual connections and "simple" attention blocks and get comparable RD results at lower runtime. For example:
-
Hello @minnend, thanks a lot for sharing your thoughts! You might find this interesting as well: https://github.com/Nikolai10/LIC-TCM. I was also wondering whether you are going to release the benchmarking code used in "Advancing the Rate-Distortion-Computation Frontier for Neural Image Compression". I am quite curious what the overall setup looks like (for PyTorch the go-to option would be the DeepSpeed profiler; is there a TensorFlow equivalent?).
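For reference, one common way to get a rough FLOP count on the TensorFlow side (loosely comparable to the op counts the DeepSpeed profiler reports for PyTorch) is to freeze a concrete function and run it through the TF1-style profiler. A minimal sketch; the `count_flops` helper, the model argument, and the input shape are placeholders, not code from the paper or this repo:

```python
import tensorflow as tf
from tensorflow.python.framework.convert_to_constants import (
    convert_variables_to_constants_v2_as_graph)


def count_flops(model, input_shape=(1, 256, 256, 3)):
    """Rough FLOP count for a Keras model via the TF1-style profiler."""
    spec = tf.TensorSpec(input_shape, tf.float32)
    concrete = tf.function(model).get_concrete_function(spec)
    # Freeze variables into constants so the profiler sees a static graph.
    _, graph_def = convert_variables_to_constants_v2_as_graph(concrete)
    with tf.Graph().as_default() as graph:
        tf.graph_util.import_graph_def(graph_def, name="")
        opts = tf.compat.v1.profiler.ProfileOptionBuilder.float_operation()
        info = tf.compat.v1.profiler.profile(graph=graph, options=opts)
    return info.total_float_ops
```

This only counts ops in the frozen inference graph, so it says nothing about wall-clock encode/decode time, which would need a separate measurement.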
-
Dear TFC-Community,
I have recently been working on a TF implementation of SwinT-ChARM (Zhu et al. ICLR 2022):
https://github.com/Nikolai10/SwinT-ChARM.
SwinT-ChARM can be considered an extension of Minnen et al. (ICIP 2020), in which the conventional convolution-based nonlinear transforms are replaced by Swin-transformer blocks. As a result, SwinT-ChARM achieves a better overall rate-distortion-computation trade-off, making it a promising direction for resource-constrained environments.
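For context, the conventional transforms being replaced look roughly like the short stacks of strided convolutions with GDN nonlinearities used in the TFC example models. A sketch of that baseline (filter counts, layer names, and the default latent depth are illustrative, not copied from any particular model):

```python
import tensorflow as tf
import tensorflow_compression as tfc


class ConvAnalysisTransform(tf.keras.Sequential):
    """Conventional 4-layer strided-conv + GDN analysis transform."""

    def __init__(self, num_filters=192, latent_depth=320):
        super().__init__(name="analysis")
        for i in range(3):
            self.add(tfc.SignalConv2D(
                num_filters, (5, 5), corr=True, strides_down=2,
                padding="same_zeros", use_bias=True,
                activation=tfc.GDN(name=f"gdn_{i}")))
        # Last layer maps to the latent depth; no nonlinearity.
        self.add(tfc.SignalConv2D(
            latent_depth, (5, 5), corr=True, strides_down=2,
            padding="same_zeros", use_bias=True, activation=None))
```

In SwinT-ChARM, each such stride-2 convolution + GDN stage is, roughly speaking, replaced by a downsampling (patch-embedding/merging) layer followed by Swin-transformer blocks, and analogously on the synthesis side, while the channel-wise autoregressive (ChARM) entropy model stays structurally close to Minnen et al.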
In my implementation I have taken some care to closely follow the naming conventions presented by Minnen et al., without changing the overall structure. I have also included a Google Colab demo, which you can find here.
I hope this work will be useful for further exploration and learning.
Kind regards,
Nikolai