Thank you for your wonderful work! I am really curious about the "clean text" produced by the "Textual Restoration" step. Have you tried to decode it? Or is there no way to decode it into human-readable words, so that it exists only as an implicit feature vector? If so, why do you describe your method as "text restoration"? It looks more like autoregression based on self-attention.
Thanks for your attention.
We have tried decoding the projected words back into the image, and some of the words do reflect the degradation patterns of the restoration tasks we chose for the training phase. However, we have not tried to decode them into explicit text.
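For readers following this thread: one common way to turn a continuous text embedding into explicit words is a nearest-neighbor lookup against a token-embedding table. The snippet below is only a minimal sketch of that idea, not the authors' code. The function names, the toy vocabulary, and the 3-dimensional vectors are all hypothetical, and it assumes the restored embeddings live in the same space as the vocabulary embeddings, which may not hold for this method.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def decode_nearest_tokens(embeddings, vocab_embeddings, vocab):
    """For each embedding, return (closest vocabulary word, its cosine similarity)."""
    decoded = []
    for e in embeddings:
        sims = [cosine(e, v) for v in vocab_embeddings]
        best = max(range(len(vocab)), key=sims.__getitem__)
        decoded.append((vocab[best], sims[best]))
    return decoded

# Hypothetical toy vocabulary and embeddings, for illustration only.
vocab = ["clean", "noisy", "blurry", "rainy"]
vocab_embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.7, 0.7, 0.0],
]
# Stand-ins for "restored" text embeddings.
restored = [[0.9, 0.1, 0.0], [0.1, 0.0, 0.8]]

decoded = decode_nearest_tokens(restored, vocab_embeddings, vocab)
words = [word for word, _ in decoded]
print(words)
```

A low similarity score for the best match would suggest that the embedding does not correspond to any single word, which is one way to check whether the restored representation is interpretable as text at all.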
Thank you for your reply.
I have a few further concerns:
As you describe in your answer, you can get reasonable results by projecting the "clean text embedding" back into image space. So what is the purpose of restoration in textual space? Why not implement this function directly in image space?
The paper lacks an analysis of the function and results of your proposed "img-to-text" and "text-restoration" modules, which are the core claim of this paper. I notice you use a simple LDM loss function during phase 1 training, so it is hard for me to understand these two trained parts thoroughly.
Since you use other SotA restoration models directly, and the improvement in PSNR and SSIM is small in most of your experiments, I think it is more important to provide further theoretical or experimental analysis of the "img-to-text" and "text-restoration" modules. It might be more effective to use image embeddings, which carry more detail and high-level information, rather than converting them into textual space.
Thanks again for your nice work which inspires me a lot!