This was an interesting competition, and I would like to thank everyone involved in organizing it. This is a simple textbook solution that relies heavily on external TMA data and strong labels.
- Inference
- libvips/pyvips Installation and Getting Started
- UBC-OCEAN - JPEG Dataset Pipeline
- UBC-OCEAN - EDA
- UBC-OCEAN - Dataset
- GitHub Repository
Masks of WSIs are resized to thumbnail size. Tiles are extracted from the WSI thumbnails and their masks with a stride of 384, and they are padded to 512. A MaxViT Tiny FPN model is trained on those padded tiles and masks. Segmentation model outputs are activated with sigmoid, and 3x TTA (horizontal, vertical and diagonal flips) is applied after the activation.
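The tiling step can be sketched roughly as follows. This is a minimal NumPy sketch, not the original pipeline: it assumes that the tile size equals the stride (384), that padding is split evenly around each tile, and that the pad value is 0. The function name `extract_padded_tiles` and its parameters are hypothetical.

```python
import numpy as np


def extract_padded_tiles(image, tile_size=384, stride=384, padded_size=512):
    """Cut `image` into tiles and zero-pad each tile to `padded_size` (assumed layout)."""
    height, width = image.shape[:2]
    border = padded_size - tile_size
    before, after = border // 2, border - border // 2
    tiles = []
    for top in range(0, height, stride):
        for left in range(0, width, stride):
            tile = image[top:top + tile_size, left:left + tile_size]
            # Tiles at the bottom/right edge can be smaller than tile_size,
            # so the missing rows/columns are added to the trailing padding.
            missing_rows = tile_size - tile.shape[0]
            missing_cols = tile_size - tile.shape[1]
            pad_width = [(before, after + missing_rows), (before, after + missing_cols)]
            pad_width += [(0, 0)] * (image.ndim - 2)  # never pad the channel axis
            tiles.append(np.pad(tile, pad_width, mode="constant", constant_values=0))
    return tiles


thumbnail = np.ones((800, 400, 3), dtype=np.uint8)
tiles = extract_padded_tiles(thumbnail)
print(len(tiles), tiles[0].shape)  # 6 (512, 512, 3)
```

The same function works on 2D masks because only the first two axes are padded.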
The final segmentation mask prediction is blocky since the model was trained on tiles and the tile predictions were merged afterwards.
Segmentation mask predictions are cast to 8-bit integers and upsampled to the original WSI size with nearest neighbor interpolation.
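As an illustration of the cast and upsampling step, here is a small NumPy-only sketch. It assumes the probabilities are binarized at 0.5 before the cast (the threshold is not stated in this write-up), and `upsample_nearest` is a hypothetical helper; in practice a library resize with nearest neighbor interpolation would normally be used.

```python
import numpy as np


def upsample_nearest(mask, out_height, out_width):
    """Nearest neighbor upsampling of a 2D mask with plain NumPy indexing."""
    rows = np.arange(out_height) * mask.shape[0] // out_height
    cols = np.arange(out_width) * mask.shape[1] // out_width
    return mask[rows[:, None], cols[None, :]]


probabilities = np.array([[0.9, 0.2], [0.4, 0.7]])
# Assumption: probabilities are binarized at 0.5 before casting to uint8.
binary_mask = (probabilities > 0.5).astype(np.uint8)
full_size_mask = upsample_nearest(binary_mask, 4, 6)
print(full_size_mask)
```

Nearest neighbor interpolation keeps the mask values binary, which is why it fits an integer mask better than a smoothing interpolation.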
- WSIs and their mask predictions are cropped the maximum number of times with a stride of 1024
- Crops are sorted by their mask areas in descending order
- The top 16 crops are taken and the WSI label is assigned to them
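The crop selection above can be sketched like this. It is a minimal sketch under the assumptions that the crop size equals the stride, that only full crops are kept, and that mask area means the number of non-zero mask pixels. `select_top_crops` is a hypothetical name.

```python
import numpy as np


def select_top_crops(image, mask, crop_size=1024, top_k=16):
    """Return the top_k crops with the largest mask area and their (top, left) corners."""
    height, width = mask.shape[:2]
    candidates = []
    for top in range(0, height - crop_size + 1, crop_size):
        for left in range(0, width - crop_size + 1, crop_size):
            mask_crop = mask[top:top + crop_size, left:left + crop_size]
            candidates.append((int(np.count_nonzero(mask_crop)), top, left))
    # Sort by mask area in descending order (ties keep their scan order)
    candidates.sort(key=lambda candidate: candidate[0], reverse=True)
    corners = [(top, left) for _, top, left in candidates[:top_k]]
    crops = [image[top:top + crop_size, left:left + crop_size] for top, left in corners]
    return crops, corners


# Tiny example with 2x2 crops instead of 1024x1024 crops
image = np.arange(4 * 6 * 3).reshape(4, 6, 3)
mask = np.zeros((4, 6), dtype=np.uint8)
mask[0:2, 2:4] = 1  # area 4
mask[2:4, 0:1] = 1  # area 2
crops, corners = select_top_crops(image, mask, crop_size=2, top_k=2)
print(corners)  # [(0, 2), (2, 0)]
```

Every returned crop would then receive the label of the WSI it was cut from.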
Rows and columns with low standard deviation are dropped from TMAs with the function below. The purpose of this preprocessing is to remove white regions and make WSIs and TMAs as similar as possible. Higher threshold values were dropping areas inside the tissue region, so the standard deviation threshold is set to 10.

    def drop_low_std(image, threshold):

        """
        Drop rows and columns that are below the given standard deviation threshold

        Parameters
        ----------
        image: numpy.ndarray of shape (height, width, 3)
            Image array

        threshold: int
            Standard deviation threshold

        Returns
        -------
        cropped_image: numpy.ndarray of shape (cropped_height, cropped_width, 3)
            Cropped image array
        """

        # Standard deviation of every row and every column over its pixels and channels
        vertical_stds = image.std(axis=(1, 2))
        horizontal_stds = image.std(axis=(0, 2))
        # Keep only the rows and columns whose standard deviation exceeds the threshold
        cropped_image = image[vertical_stds > threshold, :, :]
        cropped_image = cropped_image[:, horizontal_stds > threshold, :]

        return cropped_image

Multi-label stratified kfold is used as the cross-validation scheme.
Dataset is split into 5 folds. `label` and `is_tma` columns are used for stratification.
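To show what stratifying on both columns means, here is a simplified pure-Python stand-in. It is not the iterative multi-label stratification algorithm itself: it simply shuffles each (`label`, `is_tma`) group and deals its samples round-robin into the folds. `assign_folds` and the toy rows are hypothetical.

```python
import random
from collections import defaultdict


def assign_folds(rows, n_splits=5, seed=42):
    """Assign a fold to every (label, is_tma) row so each group is spread evenly."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for index, key in enumerate(rows):
        groups[key].append(index)
    folds = [None] * len(rows)
    offset = 0
    for key in sorted(groups):
        indices = groups[key]
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[index] = (position + offset) % n_splits
        # Carry the offset over so small groups do not all land in fold 0
        offset += len(indices)
    return folds


rows = [("HGSC", False)] * 10 + [("EC", True)] * 5
folds = assign_folds(rows)
```

With this toy input every fold receives two HGSC WSIs and one EC TMA, so both the class balance and the WSI/TMA balance are preserved in each validation set.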
EfficientNetV2 small model is used as the backbone with a regular classification head.
CrossEntropyLoss with class weights is used as the loss function. The weight of the i-th class is calculated as n / n_i, where n is the total number of samples and n_i is the number of samples in the i-th class.
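A minimal sketch of the class weight calculation, using a toy label list rather than the real training labels. `compute_class_weights` is a hypothetical helper, and whether the weights were normalized further is not stated here.

```python
from collections import Counter


def compute_class_weights(labels):
    """Weight of each class = total sample count / sample count of that class."""
    counts = Counter(labels)
    n = len(labels)
    return {label: n / count for label, count in counts.items()}


labels = ["HGSC"] * 6 + ["EC"] * 3 + ["Other"] * 1
class_weights = compute_class_weights(labels)
print(class_weights["Other"])  # 10.0
```

Rare classes get the largest weights, so their errors contribute more to the loss. Assuming PyTorch is the framework, these values would be passed as the `weight` tensor of `torch.nn.CrossEntropyLoss`.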
AdamW optimizer is used with 0.0001 learning rate. Cosine annealing scheduler is used with 0.00001 minimum learning rate.
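The shape of this schedule can be illustrated with the standard closed-form cosine annealing formula. The sketch assumes one scheduler step per epoch and a period of 15 epochs; the actual step granularity is not stated in this write-up, and `cosine_annealing_lr` is a hypothetical helper.

```python
import math


def cosine_annealing_lr(step, total_steps, base_lr=1e-4, min_lr=1e-5):
    """Learning rate after `step` steps of a cosine decay from base_lr to min_lr."""
    cosine = (1 + math.cos(math.pi * step / total_steps)) / 2
    return min_lr + (base_lr - min_lr) * cosine


# Assumption: the learning rate is updated once per epoch for 15 epochs
schedule = [cosine_annealing_lr(epoch, 15) for epoch in range(16)]
print(schedule[0], schedule[-1])
```

The learning rate starts at 0.0001, decays slowly at first, fastest in the middle, and flattens out as it approaches 0.00001.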
AMP is also used for faster training and regularization.
Each fold is trained for 15 epochs, and the epoch with the highest balanced accuracy is selected for each fold.
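Balanced accuracy is the mean of the per-class recalls, so it rewards correct predictions on rare classes more than plain accuracy does. The sketch below computes it in pure Python and picks the best epoch from toy predictions; the data and the names `balanced_accuracy` and `epoch_predictions` are made up for illustration.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, prediction in zip(y_true, y_pred):
        total[truth] += 1
        correct[truth] += int(truth == prediction)
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


y_true = ["HGSC", "HGSC", "HGSC", "EC"]
epoch_predictions = [
    ["HGSC", "HGSC", "HGSC", "HGSC"],  # epoch 0: plain accuracy 0.75
    ["HGSC", "HGSC", "EC", "EC"],      # epoch 1: plain accuracy 0.75
]
epoch_scores = [balanced_accuracy(y_true, y_pred) for y_pred in epoch_predictions]
best_epoch = max(range(len(epoch_scores)), key=epoch_scores.__getitem__)
print(epoch_scores, best_epoch)
```

Both toy epochs have the same plain accuracy, but epoch 1 wins on balanced accuracy because it gets the single EC sample right.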
Training transforms are:
- Resize TMAs to size 1024 (WSI crops are already 1024 sized)
- Magnification normalization (resize WSI to 512 and resize it back to 1024 with a random chance)
- Horizontal flip
- Vertical flip
- Random 90-degree rotation
- Shift scale rotate with 45-degree rotations and mild shift/scale augmentation
- Color jitter with strong hue and saturation
- Channel shuffle
- Gaussian blur
- Coarse dropout (cutout)
- ImageNet normalization
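The magnification normalization item in the list above can be illustrated with a NumPy-only sketch. It is a stand-in for a proper interpolated resize: it downsamples by 2 with 2x2 block averaging and upsamples back by pixel repetition, assuming even height and width. The probability value of 0.5 and the name `magnification_normalize` are assumptions.

```python
import numpy as np


def magnification_normalize(image, probability=0.5, rng=None):
    """Randomly halve the resolution of an image and scale it back to its original size."""
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() >= probability:
        return image
    height, width, channels = image.shape
    # Downsample by averaging 2x2 blocks (e.g. 1024 -> 512)
    small = image.reshape(height // 2, 2, width // 2, 2, channels).mean(axis=(1, 3))
    # Upsample back by repeating every pixel twice along both axes (e.g. 512 -> 1024)
    restored = small.repeat(2, axis=0).repeat(2, axis=1)
    return restored.round().astype(image.dtype)


crop = np.random.default_rng(0).integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
blurred = magnification_normalize(crop, probability=1.0)
print(blurred.shape, blurred.dtype)  # (8, 8, 3) uint8
```

The round trip removes fine detail from WSI crops, which brings their effective magnification closer to that of the lower-magnification images.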
5 folds of the EfficientNetV2 small model are used in the inference pipeline. Each model predicts separately and the predictions of the 5 folds are averaged.
3x TTA (horizontal, vertical and diagonal flips) is applied and the predictions are averaged.
16 crops are extracted from each WSI and their predictions are averaged.
The average pooling order for a single image is:
- Predict original and flipped images, activate predictions with softmax and average
- Predict with all folds and average
- Predict all crops and average if WSI
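The pooling order above can be sketched with NumPy as follows. The models are stand-ins: `make_dummy_model` is a hypothetical factory that returns a callable producing logits, and the diagonal flip is assumed to mean flipping both axes at once.

```python
import numpy as np


def softmax(logits):
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exponentials = np.exp(shifted)
    return exponentials / exponentials.sum(axis=-1, keepdims=True)


def predict_image(models, crops):
    """crops: (n_crops, height, width, channels); returns class probabilities."""
    views = [
        crops,                 # original
        crops[:, :, ::-1],     # horizontal flip
        crops[:, ::-1, :],     # vertical flip
        crops[:, ::-1, ::-1],  # diagonal flip (assumed: both axes flipped)
    ]
    fold_probabilities = []
    for model in models:
        # 1. Softmax every view and average over the TTA views
        tta_average = np.mean([softmax(model(view)) for view in views], axis=0)
        fold_probabilities.append(tta_average)
    # 2. Average over folds, 3. average over crops (a TMA is a single crop)
    return np.mean(fold_probabilities, axis=0).mean(axis=0)


def make_dummy_model(bias):
    """Hypothetical stand-in for a trained fold: logits depend on crop brightness."""
    def model(batch):
        brightness = batch.reshape(len(batch), -1).mean(axis=1, keepdims=True)
        return brightness * np.array([[1.0, -1.0, 0.0]]) + bias
    return model


models = [make_dummy_model(np.array([0.1 * fold, 0.0, 0.0])) for fold in range(5)]
crops = np.random.default_rng(0).random((16, 8, 8, 3))
probabilities = predict_image(models, crops)
print(probabilities.shape)  # (3,)
```

Because every step is a plain mean over equally sized groups, the order of the three averages does not change the result; what matters is that softmax is applied before averaging.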
The model had an OOF score of 86.70 (TMA: 84, WSI: 86.59) at that point, but the LB score was 0.47 (private: 0.52, 32nd-42nd place), which was very low.
I noticed some people were getting better LB scores with worse OOF scores, and I was stuck at 0.47 for a while. I worked on the Optiver competition for 2 weeks and then came back. I decided to dedicate my time to finding external data because breaking the entire pipeline and starting from scratch didn't make sense.
The most obvious external data source is the public test set image, which is confidently classified as HGSC. 16 crops are extracted from that image and the HGSC label is assigned to them.
134 ovarian cancer TMAs are downloaded from the Stanford Tissue Microarray Database.
Classes are converted with this mapping:

    CLASS_MAPPING = {
        'fibroma of ovary spindle cell fibroma of ovary': 'Other',
        'carcinoma papillary serous': 'HGSC',
        'carcinoma endometrioid': 'EC',
        'lymphoma precursor B lymphoblastic': 'Other',
        'carcinoma adeno': 'HGSC',
        'carcinoma clear cell': 'CC',
        'carcinoma mucinous': 'MC',
        'carcinoma adeno mucinous': 'MC',
        'seminoma dysgerminoma': 'Other'
    }

The kztymsrjx9 dataset is downloaded as is. The HGSC label is assigned to images in the Serous directory. Images in the Non_Cancerous directory are not used. 398 ovarian cancer TMAs are found in this dataset.
Screenshots of high resolution previews are taken from tissuearray.com. 1221 ovarian cancer TMAs are found there.
Screenshots of high resolution previews are taken from usbiolab.com. 440 ovarian cancer TMAs are found there.
Images are downloaded from proteinatlas.org. 376 ovarian cancer TMAs are found there.
Those were the sources where I found the external data.
Source | Images | Type | HGSC | EC | CC | LGSC | MC | Other |
---|---|---|---|---|---|---|---|---|
UBC Ocean Public Test | 16 | WSI | 16 | 0 | 0 | 0 | 0 | 0 |
Stanford Tissue Microarray Database | 134 | TMA | 37 | 11 | 4 | 0 | 4 | 78 |
kztymsrjx9 | 398 | TMA | 100 | 98 | 100 | 0 | 100 | 0 |
tissuearray.com | 1221 | TMA | 348 | 39 | 24 | 140 | 100 | 570 |
usbiolab.com | 440 | TMA | 124 | 40 | 29 | 89 | 68 | 90 |
proteinatlas.org | 376 | TMA | 25 | 155 | 0 | 63 | 133 | 0 |
The label distribution of the final dataset (including 16 crops per WSI) was:
- HGSC: 4127
- EC: 2252
- CC: 1666
- MC: 1066
- LGSC: 969
- Other: 738
and the image type distribution was:
- WSI (16x 1024 crops): 8224
- TMA: 2594
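The counts reported in this write-up can be cross-checked with a few lines of arithmetic. The numbers below are copied from the table and lists above; only the variable names are made up.

```python
label_counts = {"HGSC": 4127, "EC": 2252, "CC": 1666, "MC": 1066, "LGSC": 969, "Other": 738}
type_counts = {"WSI": 8224, "TMA": 2594}
external_tma_counts = {
    "Stanford Tissue Microarray Database": 134,
    "kztymsrjx9": 398,
    "tissuearray.com": 1221,
    "usbiolab.com": 440,
    "proteinatlas.org": 376,
}

total_images = sum(label_counts.values())
wsi_images = type_counts["WSI"] // 16  # 16 crops per WSI
external_tmas = sum(external_tma_counts.values())

print(total_images)                       # 10818
print(sum(type_counts.values()))          # 10818
print(wsi_images)                         # 514
print(type_counts["TMA"] - external_tmas) # 25
```

Both distributions add up to 10818 samples. The 8224 WSI crops correspond to 514 WSIs (including the single public test image), and the 25 TMAs not covered by the external sources presumably come from the competition training set.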
All the external data are concatenated to each fold's training set. Validation sets are not changed in order to keep the results comparable. The OOF score decreased from 86.70 to 83.85, but the LB score jumped to 0.54. I thought this jump was related to the Other class, but the improvement wasn't good enough. That's when I thought the private test set could have more Other samples, which is typical of Kaggle competitions. The twist of this competition was predicting TMAs and the Other class, so the private test set would likely have more of them. I decided to trust the LB and selected the submission with the highest LB score. That submission scored 0.54 on public and 0.58 on private.