In the fast-moving field of artificial intelligence, multimodal models have become prominent for their ability to interpret complex, heterogeneous data in domains such as healthcare, autonomous navigation, and content recommendation. This project addresses a significant challenge in that area: making these models more explainable. Our hybrid model architecture combines visual features from images, contextual information from text, and patterns from structured data, and shows how multimodal integration can improve prediction accuracy. For explainability, we apply a JointMasking strategy to the tabular and textual inputs, which offers deeper insight into how each data modality influences the model's predictions.
We applied our methodology to the PetFinder dataset, using this multimodal strategy to improve both the predictions and the explanations of pet adoption outcomes.
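To give a feel for masking-based explanation across modalities, here is a minimal sketch. It is not the repository's JointMasking implementation, whose details are not described here. It assumes a generic perturbation scheme: tabular features and text tokens are masked together in every combination, and each input is scored by how much the prediction drops when it is hidden. The names `toy_predict`, `joint_masking_attribution`, and `MASK_TOKEN`, as well as the example pet record, are hypothetical.

```python
from itertools import product

MASK_TOKEN = "[MASK]"  # hypothetical placeholder for a hidden text token


def toy_predict(tabular, tokens):
    """Hypothetical stand-in for a trained multimodal model (returns a score)."""
    score = 0.2
    age = tabular.get("age_months")
    if age is not None and age < 12:
        score += 0.3
    if tabular.get("vaccinated") == 1:
        score += 0.2
    if "friendly" in tokens:
        score += 0.2
    if "playful" in tokens:
        score += 0.1
    return score


def joint_masking_attribution(predict, tabular, tokens):
    """Score each input by mean(prediction | kept) - mean(prediction | masked).

    Masks are enumerated jointly over tabular features and text tokens,
    so the cost is 2**n model calls; only suitable for small inputs.
    """
    tab_keys = list(tabular)
    n_tab = len(tab_keys)
    n = n_tab + len(tokens)
    kept_sum = [0.0] * n
    masked_sum = [0.0] * n

    for keep in product([0, 1], repeat=n):
        # A masked tabular feature becomes None; a masked token becomes MASK_TOKEN.
        masked_tab = {k: (tabular[k] if keep[i] else None)
                      for i, k in enumerate(tab_keys)}
        masked_tok = [t if keep[n_tab + j] else MASK_TOKEN
                      for j, t in enumerate(tokens)]
        p = predict(masked_tab, masked_tok)
        for i, kept in enumerate(keep):
            if kept:
                kept_sum[i] += p
            else:
                masked_sum[i] += p

    half = 2 ** (n - 1)  # each input is kept in exactly half of all masks
    names = [f"tabular:{k}" for k in tab_keys]
    names += [f"text[{j}]:{t}" for j, t in enumerate(tokens)]
    return {name: (kept_sum[i] - masked_sum[i]) / half
            for i, name in enumerate(names)}


# Hypothetical pet record, for illustration only.
example_tabular = {"age_months": 6, "vaccinated": 1, "fee": 0}
example_tokens = "a friendly and playful puppy".split()

for name, value in joint_masking_attribution(
        toy_predict, example_tabular, example_tokens).items():
    print(f"{name:24s} {value:+.3f}")
```

Because the toy model is additive, each attribution recovers the contribution of its input, and inputs the model ignores score zero. With a real model, exhaustive enumeration would normally be replaced by random sampling of masks.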
To get started with this project, follow the steps below:
- Clone the repository
git clone https://github.com/harinkris11/Explainablity-of-Multimodal-Models.git
- Install dependencies. Ensure you have Python 3.7+ installed, then navigate to the project directory and install the required Python packages:
cd Explainablity-of-Multimodal-Models
pip install -r requirements.txt
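If you are unsure whether your interpreter meets the Python 3.7+ requirement stated above, a small check like the following can be run before installing. This is a convenience sketch rather than part of the repository; the names `MIN_VERSION` and `python_is_supported` are hypothetical.

```python
import sys

MIN_VERSION = (3, 7)  # minimum version stated in the install step


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least the minimum."""
    return tuple(version_info[:2]) >= minimum


if not python_is_supported():
    found = ".".join(str(part) for part in sys.version_info[:3])
    raise SystemExit(f"Python 3.7+ is required, found {found}")

print("Python version OK:", sys.version.split()[0])
```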
- Harin Raja Radha Krishan - Email: hradhakrishnan@ucsd.edu
- Sai Kaushik Soma - Email: ssoma@ucsd.edu
- Venkata Harsha Vardhan Gangala - Email: vgangala@ucsd.edu