Expand CDA to NCIT Mappings for 12 tissues & Enhance OpUberonMapper #79

Merged
merged 31 commits into from
Feb 28, 2024


@rajdeepmondaldotcom (Collaborator) commented Feb 26, 2024

This introduces a significant expansion of our CDA to NCIT mappings: twelve CSV files of CDA to NCIT mappings, one per tissue, covering bone, brain, breast, cervix, colon, heart, kidney, liver, lung, pancreas, skin, and thyroid. The mappings are in src/oncoexporter/ncit_mapping_files/cda_to_ncit_tissue_wise_mappings.
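For anyone consuming these files programmatically, here is a minimal Python sketch that loads one tissue-wise CSV into a lookup table. The column names `cda_diagnosis`, `ncit_id`, and `ncit_label` and the function `load_tissue_mapping` are hypothetical; check the actual headers in the CSV files before adapting it. The sample row only reuses NCIT:C3749 as illustrative input.

```python
import csv
import io
from typing import Dict, TextIO, Tuple

# Hypothetical column names; the real files in
# cda_to_ncit_tissue_wise_mappings may use different headers.
CDA_COLUMN = "cda_diagnosis"
NCIT_ID_COLUMN = "ncit_id"
NCIT_LABEL_COLUMN = "ncit_label"


def load_tissue_mapping(handle: TextIO) -> Dict[str, Tuple[str, str]]:
    """Read one tissue-wise mapping CSV into {CDA string: (NCIT id, NCIT label)}."""
    mapping: Dict[str, Tuple[str, str]] = {}
    for row in csv.DictReader(handle):
        # Normalise the CDA string so later lookups are case-insensitive.
        key = row[CDA_COLUMN].strip().lower()
        mapping[key] = (row[NCIT_ID_COLUMN].strip(), row[NCIT_LABEL_COLUMN].strip())
    return mapping


# Illustrative in-memory CSV; in practice, pass an open file handle instead.
sample_csv = (
    "cda_diagnosis,ncit_id,ncit_label\n"
    "Alveolar rhabdomyosarcoma,NCIT:C3749,Alveolar Rhabdomyosarcoma\n"
)
mapping = load_tissue_mapping(io.StringIO(sample_csv))
```

Keying on a normalised string keeps lookups robust to capitalisation differences in the CDA source data.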

Furthermore, OpUberonMapper in src/oncoexporter/cda/mapper/op_uberon_mapper.py was updated with new terms and mappings that translate string representations of anatomical locations into their corresponding UBERON terms.
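As an illustration of what string-to-UBERON translation involves, the sketch below normalises a free-text anatomical site and looks it up in a small table. It is not the actual OpUberonMapper interface: `ANATOMY_TO_UBERON`, `normalise`, and `map_site_to_uberon` are hypothetical names, and the five-term table is an illustrative subset whose IDs should be verified against the current UBERON release.

```python
from typing import Dict, Optional

# Hypothetical illustrative subset; the real OpUberonMapper has its own
# term table and interface, which this sketch does not reproduce.
ANATOMY_TO_UBERON: Dict[str, str] = {
    "lung": "UBERON:0002048",
    "brain": "UBERON:0000955",
    "liver": "UBERON:0002107",
    "kidney": "UBERON:0002113",
    "heart": "UBERON:0000948",
}


def normalise(site: str) -> str:
    """Lower-case, drop commas, and collapse whitespace ('Upper lobe,  LUNG' -> 'upper lobe lung')."""
    return " ".join(site.lower().replace(",", " ").split())


def map_site_to_uberon(site: str) -> Optional[str]:
    """Return a UBERON CURIE for a free-text anatomical site, or None if unmapped."""
    text = normalise(site)
    if text in ANATOMY_TO_UBERON:  # exact match first
        return ANATOMY_TO_UBERON[text]
    words = text.split()
    for term, curie in ANATOMY_TO_UBERON.items():
        if term in words:  # fall back to whole-word match (single-word terms only)
            return curie
    return None
```

Returning `None` for unmapped sites, rather than guessing, makes gaps in the term table easy to spot when new tissues are added.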

Please take a look.

Thanks a lot,
Rajdeep

@rajdeepmondaldotcom rajdeepmondaldotcom self-assigned this Feb 26, 2024
@rajdeepmondaldotcom rajdeepmondaldotcom changed the title Expand CDA to NCIT Tissue Mappings for 12 tissues & Enhance OpUberonMapper Expand CDA to NCIT Mappings for 12 tissues & Enhance OpUberonMapper Feb 26, 2024
@justaddcoffee (Member) commented

Great, @rajdeepmondal-el! Maybe we can review tomorrow?

@justaddcoffee (Member) commented

cc: @pnrobinson

@rajdeepmondaldotcom rajdeepmondaldotcom removed their assignment Feb 26, 2024
@justaddcoffee (Member) left a comment

@rajdeepmondal-el, these mappings look very good.

While checking them manually, I spotted one or two that probably need updating; e.g., this one should probably be NCIT:C3749 Alveolar Rhabdomyosarcoma rather than NCIT:C194245 Malignant Neoplasm of Connective and Soft Tissue of Head, Face and Neck.

Maybe @pnrobinson, you and I could manually review some more of these today on the call?

@rajdeepmondaldotcom (Collaborator, Author) commented Feb 27, 2024

Thanks a lot for pointing this out, @justaddcoffee; I agree. I can explain why the result came out this way and suggest some ways to make the predictions more accurate. In this specific example, primary_diagnosis_site contains a lot of spurious information that can confuse the model: it tries to be more context-aware than necessary. I can address this by weighting the primary diagnosis part.

There are also some edge cases, which I will share as well.
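To make the weighting idea concrete, here is a toy Python sketch. It assumes a plain string-similarity scorer (`difflib.SequenceMatcher`) as a stand-in for the model actually used, which this thread does not describe; `best_ncit_match` and `diagnosis_weight` are hypothetical names, and the input strings are made up. The two candidate labels are the NCIT terms discussed above.

```python
from difflib import SequenceMatcher
from typing import Dict


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]; a stand-in for the real model."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_ncit_match(primary_diagnosis: str, site: str,
                    candidates: Dict[str, str], diagnosis_weight: float = 0.8) -> str:
    """Return the candidate NCIT id whose label scores highest, weighting the
    diagnosis text more heavily than the (possibly noisy) site text."""
    def score(label: str) -> float:
        return (diagnosis_weight * similarity(primary_diagnosis, label)
                + (1 - diagnosis_weight) * similarity(site, label))
    return max(candidates, key=lambda ncit_id: score(candidates[ncit_id]))


candidates = {
    "NCIT:C3749": "Alveolar Rhabdomyosarcoma",
    "NCIT:C194245": "Malignant Neoplasm of Connective and Soft Tissue of Head, Face and Neck",
}
# Made-up example inputs, not taken from CDA.
choice = best_ncit_match(
    primary_diagnosis="Alveolar rhabdomyosarcoma",
    site="Connective, subcutaneous and other soft tissues of head, face, and neck",
    candidates=candidates,
)
```

Raising `diagnosis_weight` reduces the influence of a noisy site string on the final choice; 0.8 is an arbitrary starting point, not a tuned value.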

@justaddcoffee (Member) left a comment

handing this off to @ielis

@ielis (Member) left a comment

Hi @rajdeepmondal-el

thanks a lot for your excellent work!

I plugged the mapping files into the library and ran one of the driver scripts as a test, and it looks OK.

It's looking good, please feel free to merge the PR!

@rajdeepmondaldotcom rajdeepmondaldotcom merged commit ab658ed into develop Feb 28, 2024
2 checks passed
@rajdeepmondaldotcom (Collaborator, Author) commented

Thank you very much, @ielis. I have merged it.

@ielis ielis deleted the el-ncit-mappings branch February 28, 2024 14:40