This repository provides a Python implementation to extract multimodal features from images and texts: either high-level features from pretrained deep learning models (e.g., CNN-extracted embeddings) or low-level ones (e.g., color and shape).
List of publications that used the code from this repository:
- A Study on the Relative Importance of Convolutional Neural Networks in Visually-Aware Recommender Systems (accepted at CVFAD@CVPR2021)
- V-Elliot: Design, Evaluate and Tune Visual Recommender Systems (accepted at RecSys2021)
- Leveraging Content-Style Item Representation for Visual Recommendation (accepted at ECIR2022)
- Reshaping Graph Recommendation with Edge Graph Collaborative Filtering and Customer Reviews (accepted at DL4SR@CIKM2022)
The list will be constantly updated. If any of your works is missing, please contact me (daniele.malitesta@poliba.it)!
... and remember to cite us:
@inproceedings{DBLP:conf/cvpr/DeldjooNMM21,
author = {Yashar Deldjoo and
Tommaso Di Noia and
Daniele Malitesta and
Felice Antonio Merra},
title = {A Study on the Relative Importance of Convolutional Neural Networks
in Visually-Aware Recommender Systems},
booktitle = {{CVPR} Workshops},
pages = {3961--3967},
publisher = {Computer Vision Foundation / {IEEE}},
year = {2021}
}
To begin with, please make sure your system has these installed:
- Python 3.6.8
- CUDA 10.1
- cuDNN 7.6.4
Then, install all required Python dependencies with the command:
pip install -r requirements.txt
Finally, structure the dataset folders as follows:
# EXAMPLE VISUAL DATA
./data
  amazon_baby/
    original/
      images/
        0.jpg
        1.jpg
        ...
  amazon_boys_girls/
    original/
      images/
        0.jpg
        1.jpg
        ...
# EXAMPLE TEXTUAL DATA
./data
  amazon_baby/
    original/
      all_items_descriptions.tsv
  amazon_boys_girls/
    original/
      all_items_descriptions.tsv
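To double-check the layout before running the scripts, a small pathlib check is enough. This is a minimal sketch, not part of the repository: the helper check_dataset is hypothetical, and it assumes images are .jpg files as in the example above.

```python
from pathlib import Path


def check_dataset(root: str, dataset: str) -> dict:
    """Hypothetical helper: report which inputs of the expected layout exist."""
    original = Path(root) / dataset / "original"
    images_dir = original / "images"
    descriptions = original / "all_items_descriptions.tsv"
    # Collect the image files (e.g., 0.jpg, 1.jpg, ...) if the folder exists.
    images = sorted(images_dir.glob("*.jpg")) if images_dir.is_dir() else []
    return {
        "has_images": bool(images),
        "num_images": len(images),
        "has_descriptions": descriptions.is_file(),
    }


if __name__ == "__main__":
    print(check_dataset("./data", "amazon_baby"))
```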
To classify images and extract visual features from them, please run the following script:
python classify_extract_visual.py \
--gpu <gpu-id> \
--dataset <dataset-name> \
--model_name <list-of-cnns> \
--cnn_output_name <list-of-output-names-for-each-cnn> \
--cnn_output_shape <list-of-output-shapes-for-each-cnn> \
--cnn_output_split <whether-to-store-separately-output-features-or-not> \
--category_dim <dimension-for-dimensionality-reduction> \
--print_each <print-status-each>
The input parameters model_name, cnn_output_name, and cnn_output_shape are parallel lists: the i-th entry of each list refers to the same CNN, e.g., model_name[0] --> VGG19, cnn_output_name[0] --> fc2, cnn_output_shape[0] --> (). Setting the output shape to () means no reshape is performed after extraction.
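The correspondence between the three lists can be pictured as a position-wise zip. The sketch below is illustrative only: pair_cnn_settings is a hypothetical helper, and the second entry (ResNet50 / avg_pool) is an example value, not a documented default of the script.

```python
from typing import List, Tuple


def pair_cnn_settings(
    model_name: List[str],
    cnn_output_name: List[str],
    cnn_output_shape: List[Tuple[int, ...]],
) -> List[Tuple[str, str, Tuple[int, ...]]]:
    """Hypothetical helper: align the three parallel lists entry by entry."""
    if not (len(model_name) == len(cnn_output_name) == len(cnn_output_shape)):
        raise ValueError("model_name, cnn_output_name and cnn_output_shape must have the same length")
    # The i-th model is paired with the i-th layer name and the i-th target shape.
    return list(zip(model_name, cnn_output_name, cnn_output_shape))


settings = pair_cnn_settings(["VGG19", "ResNet50"], ["fc2", "avg_pool"], [(), ()])
for model, layer, shape in settings:
    # An empty shape () means the extracted features are kept as they are.
    action = "no reshape" if shape == () else f"reshape to {shape}"
    print(f"{model}: layer {layer}, {action}")
```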
Dimensionality reduction (whose target size is set with --category_dim) is performed with:
- Principal Component Analysis (PCA)
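This README does not say how PCA is applied internally. As a sketch, assuming the features form an (items × features) matrix, reducing them to --category_dim dimensions can be written with NumPy's SVD; scikit-learn's PCA yields the same components up to sign. The function name pca_reduce is hypothetical.

```python
import numpy as np


def pca_reduce(features: np.ndarray, dim: int) -> np.ndarray:
    """Project an (n_items, n_features) matrix onto its top-`dim` principal components."""
    if dim > min(features.shape):
        raise ValueError("dim cannot exceed min(n_items, n_features)")
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dim].T


rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 4096))  # e.g., 200 items with fc2-sized (4096-d) vectors
reduced = pca_reduce(feats, 128)
print(reduced.shape)  # (200, 128)
```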
The script will generate three output files, namely:
- classes_<model_name>.csv, a csv file with the classification outcomes for the input images and the adopted model;
- cnn_features_<model_name>_<output_name>.npy, a npy file with the extracted features for the input images, the adopted model, and extraction layer;
- cnn_features_<model_name>_<output_name>_pca<dim>.npy, a npy file with the extracted features for the input images, the adopted model, extraction layer, and reduction dimension.
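The naming patterns above can be expanded programmatically, e.g., to locate the outputs of a run. This is a hypothetical helper that only mirrors the patterns listed in this README; whether the script preserves the casing of model names (VGG19 vs vgg19) is an assumption.

```python
from typing import List, Optional


def visual_output_files(model_name: str, output_name: str, dim: Optional[int] = None) -> List[str]:
    """Hypothetical helper: expected output file names for one model/layer pair."""
    files = [
        f"classes_{model_name}.csv",
        f"cnn_features_{model_name}_{output_name}.npy",
    ]
    # The PCA-reduced file only exists when a reduction dimension is requested.
    if dim is not None:
        files.append(f"cnn_features_{model_name}_{output_name}_pca{dim}.npy")
    return files


print(visual_output_files("VGG19", "fc2", 128))
```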
N.B. Depending on how you set the argument --cnn_output_split, the script stores either a single numpy array (see above) or one numpy array per extracted visual feature; in the latter case, the arrays are stored in the directory cnn_features_<model_name>_<output_name>/ or cnn_features_<model_name>_<output_name>_pca<dim>/.
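Downstream code can handle both storage modes with one loader. A minimal sketch, assuming that in split mode each file in the directory is named <item_index>.npy (this file naming is not documented here); load_features is a hypothetical name.

```python
from pathlib import Path

import numpy as np


def load_features(path: str) -> np.ndarray:
    """Load features from one .npy file or from a folder of per-item .npy files.

    Assumption: in split mode, files are named <item_index>.npy (0.npy, 1.npy, ...).
    """
    p = Path(path)
    if p.is_dir():
        # Sort numerically so that 10.npy comes after 2.npy.
        files = sorted(p.glob("*.npy"), key=lambda f: int(f.stem))
        return np.stack([np.load(f) for f in files])
    return np.load(p)
```

Usage: load_features("cnn_features_VGG19_fc2.npy") and load_features("cnn_features_VGG19_fc2/") both return an (items × features) array.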
To extract textual features from texts, please run the following script:
python extract_textual.py \
--gpu <gpu-id> \
--dataset <dataset-name> \
--model_name <list-of-textual-encoders> \
--text_output_split <whether-to-store-separately-output-features-or-not> \
--column <column-to-encode> \
--print_each <print-status-each>
Please refer to the SentenceTransformers documentation for the list of available pre-trained models.
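Conceptually, the textual pipeline reads one column (--column) of all_items_descriptions.tsv and encodes each entry with a SentenceTransformers model. The sketch below is not the repository's code: it assumes the tsv has a header row, and the model name all-MiniLM-L6-v2 is only an example. The import of sentence_transformers is deferred so that read_column works without it.

```python
import csv
from typing import List


def read_column(tsv_path: str, column: str) -> List[str]:
    """Return the values of `column` from a tab-separated file (header row assumed)."""
    with open(tsv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        return [row[column] for row in reader]


def encode(texts: List[str], model_name: str = "all-MiniLM-L6-v2"):
    """Encode texts with a SentenceTransformers model (downloaded on first use)."""
    from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

    model = SentenceTransformer(model_name)
    # Returns a NumPy array of shape (len(texts), embedding_dim).
    return model.encode(texts)
```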
The script will generate the following output file:
- text_features_<model_name>.npy, a npy file with the extracted features for the input texts and the adopted model.
N.B. Depending on how you set the argument --text_output_split, the script stores either a single numpy array (see above) or one numpy array per extracted textual feature; in the latter case, the arrays are stored in the directory text_features_<model_name>/.
This section refers to visual diversity (VisDiv), the novel metric proposed in our paper A Study on the Relative Importance of Convolutional Neural Networks in Visually-Aware Recommender Systems.
To calculate the VisDiv, please run the following script:
python evaluate_visual_profile.py \
--dataset <dataset-name> \
--image_feat_extractors <list-of-image-feature-extractors> \
--visual_recommenders <list-of-visual-recommenders> \
--top_k <top-k-to-calculate-visdiv-on> \
--save_plots <whether-to-save-the-output-plots>
The script requires a folder containing the recommendation results, structured as follows:
./results/
  amazon_baby_vgg19/
    VBPR.tsv
    DeepStyle.tsv
    ...
  amazon_boys_girls_resnet50/
    ACF.tsv
    VNPR.tsv
    ...
where each tsv file contains the recommendation lists produced by the best-performing configuration of each visual recommender.
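To inspect these results before computing VisDiv, one can walk the folder and extract each user's top-k list. This sketch rests on two assumptions not stated in this README: the tsv rows are header-less user<TAB>item<TAB>score triples, and the feature extractor is the last underscore-separated token of the folder name. Both helper names are hypothetical.

```python
import csv
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Tuple


def parse_run_dir(name: str) -> Tuple[str, str]:
    """Split e.g. 'amazon_baby_vgg19' into ('amazon_baby', 'vgg19')."""
    dataset, extractor = name.rsplit("_", 1)
    return dataset, extractor


def load_top_k(tsv_path: str, k: int) -> Dict[str, List[str]]:
    """Assumed row format (no header): user<TAB>item<TAB>score."""
    scored: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    with open(tsv_path, newline="") as f:
        for user, item, score in csv.reader(f, delimiter="\t"):
            scored[user].append((float(score), item))
    # Keep the k highest-scoring items per user.
    return {
        u: [i for _, i in sorted(rows, key=lambda r: r[0], reverse=True)[:k]]
        for u, rows in scored.items()
    }


for run in sorted(Path("./results").glob("*/*.tsv")):
    dataset, extractor = parse_run_dir(run.parent.name)
    print(dataset, extractor, run.stem, len(load_top_k(str(run), 10)))
```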
The script will generate the following outputs:
- ./plots/<dataset-name>_<top-k>/<visual-recommender>/<image-feature-extractor>/u_<user-id>.pdf, a set of pdf files, one per user, each showing the t-SNE graphical representation of that user's VisDiv;
- ./plots/<dataset-name>_<top-k>/<visual-recommender>/<image-feature-extractor>/all_users_stats.csv, a csv file storing the VisDiv value of each user;
- ./plots/<dataset-name>_<top-k>/<visual-recommender>/<image-feature-extractor>/final_stats.out, a text file storing the final statistics for the VisDiv metric.
Daniele Malitesta (daniele.malitesta@poliba.it)