
Reproducibility Assistance #24

Open
kanlions opened this issue Jan 24, 2025 · 3 comments

Comments

@kanlions

kanlions commented Jan 24, 2025

Thank you for the great research work and for making it available to the research community. I am particularly interested in zero-shot classification and have worked with the demo code in the zero-shot starter notebook. The datasets are CRC100K (colorectal cancer tissue classification), WSSS4LUAD (LUAD tissue classification), and SICAP (Gleason pattern classification). If I understand correctly, the reported numbers are in the ~0.75 range, and my results are far off. Can you please confirm whether the results were obtained with the same prompts that are available in the prompts folder? It would also be very kind if you could tell me whether I am using the correct data: CRC-VAL-HE-7K (with 8 classes), WSSS4LUAD (training 1.14 GB, validation 150 MB from Grand Challenge), and SICAPv2 (18,793 images). If there is any discrepancy, please point me to the correct prompt files and data sources for these three datasets. It is important for my understanding and future use of this model that the correct baseline is established and that these three datasets work consistently with the associated prompts.

Thanks in advance

@fedshyvana
Collaborator

Did you use the starter code or the ensemble example? We ensemble multiple prompts/templates, similar to CLIP (except we also ensemble multiple classnames per class). An example reproducing the CRC-100K results is provided here: https://github.com/mahmoodlab/CONCH/blob/main/notebooks/zeroshot_classification_example_ensemble.ipynb.

In the paper, we report both non-ensembled (i.e. single prompt) and ensembled results.
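For readers trying to reproduce this, here is a minimal sketch of that ensembling scheme: average the text embeddings over all classname × template combinations for each class, then classify images by cosine similarity. It is only an illustration under assumptions, not the repository's code: `model`, `tokenizer`, and the `encode_text` / `encode_image` methods stand in for any CLIP-style model, and the notebook linked above remains the authoritative reference.

```python
import torch
import torch.nn.functional as F

def build_class_embeddings(model, tokenizer, classnames_per_class, templates, device="cuda"):
    """One embedding per class, averaged over all (classname, template) prompts."""
    class_embeddings = []
    with torch.inference_mode():
        for classnames in classnames_per_class:           # e.g. ["tumor", "adenocarcinoma epithelium"]
            prompts = [t.format(c) for c in classnames for t in templates]
            tokens = tokenizer(prompts).to(device)         # assumed: tokenizer returns a token-id tensor
            emb = model.encode_text(tokens)                # assumed: (n_prompts, d) text embeddings
            emb = F.normalize(emb, dim=-1).mean(dim=0)     # ensemble = mean of unit vectors
            class_embeddings.append(F.normalize(emb, dim=-1))
    return torch.stack(class_embeddings)                   # (n_classes, d)

def zeroshot_predict(model, images, class_embeddings):
    """Nearest class embedding by cosine similarity for a batch of preprocessed images."""
    with torch.inference_mode():
        img_emb = F.normalize(model.encode_image(images), dim=-1)
        return (img_emb @ class_embeddings.T).argmax(dim=-1)
```

The single-prompt (non-ensembled) setting corresponds to passing exactly one classname and one template per class, in which case the averaging step reduces to a single normalized embedding.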

@kanlions
Author

Thank you very much for your reply. If I understand correctly, there are two sets of results: one in the main paper, Figure 2(c), which is what I am referring to, and one in Extended Data Figure 2, where results for the three datasets are shown. Since I am struggling with reproducibility, could you please tell me which subsets of images were used for each of the three datasets, and which prompts? For SICAP, only the primary and secondary Gleason scores are available. It would also be very helpful if the relevant prompts from the paper were uploaded to GitHub, since this is a VLM and the text cues may have a significant impact. My first goal is to reproduce your method and establish a baseline, so that in the future, when I run it on new sets of images and cite your paper, I can be sure the correct results are generated and communicated. Again, thanks for your help.

@fedshyvana
Collaborator

Prompts are listed in Supplementary Tables 38–44. The splits are described in the Methods section under "Downstream evaluation datasets" (CRC-100K = validation set, WSSS4LUAD = subset of the training set, SICAP = test set).
