What about the explicit text? #2

Open
lyf1212 opened this issue Jan 7, 2024 · 2 comments

lyf1212 commented Jan 7, 2024

Thank you for your wonderful work! I am really curious about the "clean text" produced after the "Textual Restoration": have you tried to decode it? Or is there no way to decode it into words a human can understand, only an implicit feature vector? If so, why do you describe your method as "text restoration"? It would then just be a kind of auto-regression based on self-attention.

mrluin (Owner) commented Mar 11, 2024

Thanks for your attention.
We have tried decoding the projected words back into the image, and some of the words indeed reflect the degradation patterns of the restoration tasks we chose in the training phase. However, we have not tried decoding them into explicit text.
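(For reference, decoding such an embedding into explicit text usually means a nearest-token lookup against the text encoder's input-embedding table. The sketch below only illustrates that idea and is not the pipeline of this repository; `restored_embedding` is a placeholder, and the lookup assumes the embedding lies in the same space as CLIP's token embeddings, which may not hold if it comes from the encoder's output space.)

```python
# Illustrative only: map each vector of a restored text embedding to its
# nearest CLIP vocabulary tokens by cosine similarity.
import torch
import torch.nn.functional as F
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_model = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

vocab = text_model.get_input_embeddings().weight        # [V, D] token-embedding table
vocab = F.normalize(vocab, dim=-1)

# Placeholder for the output of the text-restoration module, shaped [num_tokens, D].
restored_embedding = torch.randn(8, vocab.shape[1])
restored_embedding = F.normalize(restored_embedding, dim=-1)

sims = restored_embedding @ vocab.T                      # cosine similarity [num_tokens, V]
top = sims.topk(5, dim=-1).indices                       # 5 nearest tokens per position
for i, ids in enumerate(top):
    words = [tokenizer.decode([j]) for j in ids.tolist()]
    print(f"position {i}: {words}")
```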

lyf1212 (Author) commented Mar 13, 2024

Thank you for your reply.
I have a few further concerns:

  1. As you describe in your answer, you can get reasonable results by projecting the "clean text embedding" back to the image space, so what is the point of performing restoration in the textual space? Why not implement this function directly in the image space?
  2. The paper lacks analysis of the function and effect of the proposed "img-to-text" module and "text-restoration" module, which are its core claims. I notice you use a simple LDM loss during phase-1 training, so it is hard for me to understand these two trained parts thoroughly.
  3. Since you directly utilize other SotA restoration models, and the PSNR and SSIM gains are small in most of your experiments, I think it is more important to provide theoretical or experimental analysis of the "img-to-text" and "text-restoration" modules. It might be more meaningful to use image embeddings, which carry more detail and high-level information, rather than converting them into the textual space.

Thanks again for your nice work, which inspires me a lot!
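For context, the "simple LDM loss" mentioned in point 2 above is the standard noise-prediction MSE of latent diffusion, conditioned on a text embedding. A minimal sketch with placeholder names (`unet`, `c_text`, `alphas_cumprod`), not the actual training code of this repository:

```python
# Minimal sketch of the standard latent-diffusion (LDM) training objective:
# predict the noise added to the clean latents, conditioned on a text embedding.
import torch
import torch.nn.functional as F

def ldm_loss(unet, z0, c_text, alphas_cumprod):
    """z0: clean latents [B, C, H, W]; c_text: conditioning embedding [B, T, D]."""
    B = z0.shape[0]
    t = torch.randint(0, alphas_cumprod.shape[0], (B,), device=z0.device)
    noise = torch.randn_like(z0)
    a = alphas_cumprod[t].view(B, 1, 1, 1)
    z_t = a.sqrt() * z0 + (1.0 - a).sqrt() * noise   # forward diffusion q(z_t | z_0)
    eps_pred = unet(z_t, t, c_text)                  # UNet predicts the added noise
    return F.mse_loss(eps_pred, noise)
```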
