The ETAB benchmark suite encapsulates a diverse set of tasks designed to test the quality of visual representations of echocardiograms with respect to different downstream setups of interest across different datasets. The benchmark tasks fall into four categories: 🔴 cardiac structure identification tasks, where the goal is to automatically identify anatomical regions of interest; 🔵 cardiac function estimation tasks, where the goal is to evaluate cardiac hemodynamics and left ventricle measurements; 🟢 view recognition tasks, where the goal is to automate view annotations for echocardiography clips; and 🟡 clinical prediction tasks, where the goal is to predict clinical outcomes or issue diagnoses based on observed echocardiograms. Combinations of these tasks constitute adaptation benchmarks that can be used to evaluate the transferability of features across views, datasets and annotations. In this Section, we provide an overview of the ETAB benchmark suite and the supported built-in vision models, along with code snippets and demo notebooks illustrating how users can run a benchmark experiment out-of-the-box.
Each benchmark is encoded with a 5-character code that designates the source dataset, the echocardiography view and the downstream task. The structure of the benchmark code follows the layout below:
| Benchmark code (5 characters) |||
|---|---|---|
| Task code (2 characters) | View code (2 characters) | Dataset code (1 character) |
The 1-character dataset code can be interpreted using the following table:
| Dataset | Dataset code |
|---|---|
| EchoNet | E |
| CAMUS | C |
| TMED | T |
| Unity | U |
Similarly, the 2-character view code can be interpreted using the following table:

| View code | Echocardiographic view |
|---|---|
| An | Apical n-chamber |
| PL | Parasternal long axis |
| PS | Parasternal short axis |

For instance, A2 and A4 denote the apical 2-chamber and apical 4-chamber views, respectively.
Currently, ETAB includes 9 core tasks across the 4 task categories. All tasks and their corresponding 2-character codes are summarized in the table below. Task codes marked as under implementation are placeholders for tasks that will be included in the next release.
| Task code | Description | Datasets (Views) |
|---|---|---|
| 🔴 Cardiac Structure Identification Tasks (Category: a) |||
| 0 | Segmenting the left ventricle (LV) | EchoNet (AP4CH), CAMUS (AP2CH and AP4CH) |
| 1 | Segmenting the left atrium (LA) | CAMUS (AP2CH and AP4CH) |
| 2 | Segmenting the myocardial wall (MY) | CAMUS (AP2CH and AP4CH) |
| 🔵 Cardiac Function Estimation Tasks (Category: b) |||
| 0 | Estimating LV ejection fraction | EchoNet (AP4CH), CAMUS (AP2CH and AP4CH) |
| 1 | Classifying end-systole and end-diastole frames | EchoNet (AP4CH), CAMUS (AP2CH and AP4CH) |
| 2 | *(under implementation)* | |
| 3 | *(under implementation)* | |
| 4 | *(under implementation)* | |
| 🟢 View Recognition Tasks (Category: c) |||
| 0 | Classifying apical 2- and 4-chamber views | CAMUS (AP2CH vs. AP4CH) |
| 1 | Classifying parasternal short and long axis views | TMED (PLAX vs. PSAX) |
| 2 | *(under implementation)* | |
| 🟡 Clinical Prediction Tasks (Category: d) |||
| 0 | Diagnosing cardiomyopathy | EchoNet (AP4CH), CAMUS (AP2CH and AP4CH) |
| 1 | Diagnosing aortic stenosis | TMED (PSAX and PLAX) |
benchmark_code = "a0-A4-E"
This code designates the benchmark task of segmenting the LV using apical 4-chamber echoes sampled from the EchoNet dataset.
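As a quick illustration, a benchmark code decomposes into its task, view and dataset components by splitting on the dashes. The helper below is hypothetical and is not part of the ETAB API:

```python
# Hypothetical helper -- not part of the ETAB API -- illustrating how a
# benchmark code decomposes into its task, view, and dataset components.
def parse_benchmark_code(code: str) -> dict:
    task_code, view_code, dataset_code = code.split("-")
    return {"task": task_code, "view": view_code, "dataset": dataset_code}

print(parse_benchmark_code("a0-A4-E"))
# {'task': 'a0', 'view': 'A4', 'dataset': 'E'}
```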
The ETAB library provides a unified API for training a number of baseline models on all the benchmark tasks listed above. Each baseline model comprises a backbone representation and a task-specific head as illustrated below.
The backbone representation is a general-purpose representation of echocardiographic images (or clips) that is independent of the task, whereas the head changes based on the task. The backbone representations supported in ETAB fall into two categories: convolutional neural networks and vision transformers. All supported backbone representations are listed below.
| Available backbones | | Reference |
|---|---|---|
| Convolutional Neural Networks (CNN) | ResNet | |
| | ResNeXt | |
| | DenseNet | |
| | Inception | |
| | MobileNet | |
| | ConvNeXt | |
| Vision Transformers (ViT) | Mix Transformer encoders (MiT) | |
| | Pyramid Vision Transformer (PVT) | |
| | Multi-scale vision Transformer (ResT) | |
| | PoolFormer | |
| | UniFormer | |
| | Dual Attention Vision Transformers (DaViT) | |
The available task-specific heads (for classification, regression and segmentation tasks) are listed below.
| Available task-specific heads | | Reference |
|---|---|---|
| Classification and regression heads (still image) | Standard linear probe | --- |
| Classification and regression heads (video clips) | RNN + Linear output layer | --- |
| | LSTM + Linear output layer | --- |
| Segmentation heads | U-Net | |
| | U-Net++ | |
| | MAnet | |
| | Linknet | |
| | PSPNet | |
| | DeepLabV3 | |
| | SegFormer | |
| | TopFormer | |
To display all available baseline models, print the outputs of the available_backbones() and available_heads() functions in the etab.baselines.models module as follows.
from etab.baselines.models import *
print(available_backbones())
print(available_heads())
Note that some benchmark tasks (e.g., estimation of the LV ejection fraction) are defined with respect to video clips rather than still images, whereas other tasks and datasets are limited to 2D images. In the current release of ETAB, we restrict the backbone representations to frame embeddings: each frame of a clip is embedded independently, and the modeling of temporal correlations between the frame embeddings is deferred to the head through variants of RNNs. By limiting the backbone representations to frame embeddings, we can evaluate the quality of a backbone representation by tuning the attached task-specific heads across all benchmark tasks above to obtain the ETAB score, as we discuss in the next Section.
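As a rough illustration of this design, the minimal PyTorch sketch below (not the actual ETAB implementation) applies a frame-level backbone to each frame of a clip and aggregates the resulting embeddings with an LSTM head:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

# Minimal sketch of the "frame embeddings + recurrent head" design; the actual
# ETAB classes differ. A ResNet-50 backbone embeds each frame independently,
# and an LSTM head models the temporal correlations across frames.
class FrameBackboneWithRNNHead(nn.Module):
    def __init__(self, embed_dim=2048, hidden_dim=256, out_dim=1):
        super().__init__()
        backbone = resnet50(weights=None)
        self.backbone = nn.Sequential(*list(backbone.children())[:-1])  # drop the FC layer
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, out_dim)

    def forward(self, clips):
        # clips: (batch, time, channels, height, width)
        b, t, c, h, w = clips.shape
        frames = clips.reshape(b * t, c, h, w)
        embeddings = self.backbone(frames).flatten(1).reshape(b, t, -1)
        _, (hidden, _) = self.rnn(embeddings)
        return self.out(hidden[-1])  # e.g., a clip-level regression target

model = FrameBackboneWithRNNHead()
dummy_clip = torch.randn(2, 16, 3, 224, 224)  # 2 clips of 16 frames each
print(model(dummy_clip).shape)  # torch.Size([2, 1])
```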
Running a benchmark experiment out-of-the-box (demo notebook)
In what follows, we describe how users can run a benchmark experiment out-of-the-box using the ETAB API. Next, we show how an experiment can be run from the terminal using our built-in scripts.
The first step in running a benchmark experiment is to load the relevant dataset. Consider again the benchmark task "a0-A4-E". This task involves segmenting the LV using apical 4-chamber views from the EchoNet dataset. The dataset can be loaded as follows:
from etab.datasets import ETAB_dataset
echonet = ETAB_dataset(name="echonet",
target="LV_seg",
view="A4",
video=False,
normalize=True,
frame_l=224,
frame_w=224,
clip_l=1)
echonet.load_data(n_clips=7000)
batch_size = 32   # batch size used by the data loaders (also set below with the training parameters)
train_loader, valid_loader, test_loader = training_data_split(echonet.data, train_frac=0.6, val_frac=0.1,
                                                              batch_size=batch_size, return_loaders=True)
We have covered the data loading and processing tools in the previous section. More details can be found in this demo notebook. The next step is to compose a baseline model by creating an instance of the ETABmodel class as follows.
from etab.baselines.models import ETABmodel
model = ETABmodel(task="segmentation",
backbone="ResNet-50",
head="U-Net")
Both model.backbone and model.head are PyTorch model classes, and their hyper-parameters can be altered by modifying their attributes after the model is instantiated. Here, we instantiate a standard segmentation model with a ResNet-50 backbone and a U-Net head, but the user can create alternative models using the options listed in the tables above. Now, to start training the instantiated model on task "a0-A4-E", we need to set the optimizer and training parameters as follows:
batch_size = 32
learning_rate = 0.001
n_epoch = 100
ckpt_dir = "/directory for saving the trained model"
We can then train the model by invoking the .fit method in the ETABmodel class after passing the training and validation loaders along with the optimization and training parameters.
model.fit(train_loader,
valid_loader,
n_epoch=n_epoch,
task_code="EA40",
learning_rate=learning_rate,
ckpt_dir=ckpt_dir)
After the model is trained, we can inspect its predictions on samples from test data as follows:
inputs, ground_truths = next(iter(test_loader))
preds = model.predict(inputs.cuda())
index = 0  # set index to an integer to select a test sample
plot_segment(inputs[index, :, :, :],
preds[index, :, :],
overlay=True, color="r")
To evaluate the performance of the model on the test data, you can use the evaluate_model function in etab.utils.metrics as follows:
from etab.utils.metrics import *
dice_coeff = evaluate_model(model, test_loader, task_code="a0")
By passing the task code to this general-purpose evaluation function, the corresponding evaluation metric for the task is selected automatically. Because this is a segmentation task, the output is the Jaccard index/Dice coefficient; for classification tasks the function computes the AUC-ROC, and for regression tasks the mean squared error.
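For reference, the Dice coefficient for a binary segmentation mask can be computed from the predicted and ground-truth masks as in the standalone sketch below (this illustrates the metric itself, not the internals of evaluate_model):

```python
import torch

def dice_coefficient(pred_mask, true_mask, eps=1e-7):
    """Dice coefficient between two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    pred = (pred_mask > 0.5).float()
    true = (true_mask > 0.5).float()
    intersection = (pred * true).sum()
    return (2.0 * intersection + eps) / (pred.sum() + true.sum() + eps)

# Example with two identical 4x4 masks (Dice = 1.0)
mask = torch.ones(4, 4)
print(dice_coefficient(mask, mask).item())
```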
In the example above, we have trained a model by fully optimizing all its parameters for the task at hand. In many cases, we might be interested in tuning only the task-specific head while keeping the backbone representation frozen. We can do so by calling the freeze_backbone method of ETABmodel after the model instantiation command as follows:
model = ETABmodel(task="segmentation",
backbone="ResNet-50",
head="U-Net")
model.freeze_backbone()
As we will show in the next Section, when computing the ETAB score we are interested in evaluating a pre-trained representation; hence, we freeze the backbone model for all benchmark tasks, tune only the head, and evaluate the performance of the model on test data.
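Conceptually, freezing a backbone amounts to disabling gradient updates for its parameters. In plain PyTorch this would look like the sketch below (an illustration of the idea, not the internals of freeze_backbone):

```python
import torch

# Sketch of what freezing the backbone amounts to in plain PyTorch;
# ETAB's freeze_backbone may be implemented differently.
for param in model.backbone.parameters():
    param.requires_grad = False

# With the backbone frozen, only the head's parameters receive gradient updates,
# e.g., if one were to construct the optimizer manually:
optimizer = torch.optim.Adam(model.head.parameters(), lr=learning_rate)
```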
To run the above experiments yourself, please refer to the following demo notebook.
You can also run any benchmark task directly from the terminal using the following command:
$ python run_benchmark.py --task "a0-A4-E" --backbone "ResNet-50" --head "U-Net" --freeze_backbone False \
--train_frac 0.6 --val_frac 0.1 --lr 0.001 --epochs 100 --batch 32
To run a task adaptation benchmark, where the backbone representation is trained on a source task and then tuned on a target task, you can use the following command:
$ python run_benchmark.py --source_task "a0-A4-E" --target_task "a1-A2-C" --backbone "ResNet-50" --head "U-Net" \
--freeze_backbone False --train_frac 0.6 --val_frac 0.1 --lr 0.001 --epochs 100 --batch 32
In the example above, the experiment will proceed by training a model to segment the LV using AP4CH views in the EchoNet dataset, and then tuning the resulting model to segment the LA using AP2CH views in the CAMUS dataset.
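In terms of the Python API shown earlier, this adaptation workflow corresponds roughly to the sketch below. The loader variables are placeholders for the source- and target-task data loaders, and the exact steps performed by run_benchmark.py may differ:

```python
from etab.baselines.models import ETABmodel

# Rough sketch of the adaptation workflow; run_benchmark.py handles data
# loading, checkpointing, and hyper-parameters internally.
model = ETABmodel(task="segmentation", backbone="ResNet-50", head="U-Net")

# Source task "a0-A4-E": segment the LV on EchoNet AP4CH views.
# (The task_code argument from the earlier example is omitted here for brevity.)
model.fit(source_train_loader, source_valid_loader,
          n_epoch=100, learning_rate=0.001, ckpt_dir="./ckpt/source")

# Target task "a1-A2-C": tune the same model to segment the LA on CAMUS AP2CH views.
model.fit(target_train_loader, target_valid_loader,
          n_epoch=100, learning_rate=0.001, ckpt_dir="./ckpt/target")
```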
Our model API builds on the implementations of the following libraries and resources:
[2] https://github.com/sithu31296/semantic-segmentation