RADIO with lora #83
Comments
The detailed parameters of RADIO-L integrated with LoRA in our experiments are as follows:
Hello, how are you preprocessing the inputs to RADIO-L? Is the data passed as RGB values in the [0, 1] range?
We initialized the image preprocessor in the following manner:
And this is an example of the input image which we fed to the RADIO model.
Hello, yes, the dynamic range of your inputs does look OK. In our LLaVA1.5 experiments we don't do a center crop. Instead, we resize the image so that its longest edge is 768 pixels, keeping the aspect ratio of the input image, and pad the shortest edge to the nearest multiple of 16 pixels. We did not evaluate the model on MMBench; however, our results on TextVQA, VQAv2, GQA, and POPE were very much in favor of RADIO-L (see the README at the root of this repository). We didn't use LoRA; we kept RADIO-L frozen and trained only the projector and the LLM.
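For reference, the resize-and-pad rule described above can be sketched as a small size calculation. This is only an illustration, not the code used in the LLaVA1.5 experiments: the function name `resize_and_pad_dims` is hypothetical, and the rounding mode and the choice to pad up (rather than down) to a multiple of 16 are assumptions, since the comment does not specify them. Where the padding is placed (one side or centered) is also not stated, so it is left out.

```python
def resize_and_pad_dims(width, height, longest_edge=768, multiple=16):
    """Return ((resized_w, resized_h), (padded_w, padded_h)).

    Hypothetical helper: scales the image so its longest edge equals
    `longest_edge` while keeping the aspect ratio, then rounds both edges
    up to the next multiple of `multiple` (assumed padding direction).
    """
    longest = max(width, height)
    # Multiply before dividing so exact ratios stay exact in floating point.
    resized_w = max(1, round(width * longest_edge / longest))
    resized_h = max(1, round(height * longest_edge / longest))

    def round_up(n):
        # Integer ceiling to the next multiple; unchanged if already aligned.
        return -(-n // multiple) * multiple

    return (resized_w, resized_h), (round_up(resized_w), round_up(resized_h))


if __name__ == "__main__":
    # A 1000x700 image is resized to 768x538, then padded to 768x544.
    print(resize_and_pad_dims(1000, 700))
```

With the default longest edge of 768, which is already a multiple of 16, only the shortest edge ever needs padding, which matches the description above.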
I used RADIO-L as the visual encoder for LLaVA and added LoRA to RADIO-L in both the pretraining and finetuning stages. However, I made the following two intriguing observations:
I'm not certain whether the issue is caused by RADIO-L's sensitivity to resolution, or the way RADIO-L is integrated with LoRA. I am looking forward to discussing this in more depth with you.