complete config file documentation
themattinthehatt committed Nov 5, 2024
1 parent 8d1d6ae commit 796b3de
Showing 5 changed files with 189 additions and 65 deletions.
3 changes: 3 additions & 0 deletions docs/source/faqs.rst
@@ -63,6 +63,8 @@ FAQs
Note that you can also run the ``ffmpeg`` command directly from the command line.


.. _faq_oom:

.. dropdown:: What if I encounter a CUDA out of memory error?

Model training can be GPU-memory-intensive, particularly when using unsupervised losses, the
@@ -82,6 +84,7 @@ FAQs

See :ref:`The configuration file <config_file>` section for more information about the above parameters.


.. dropdown:: Why does the network produce high confidence values for keypoints even when they are occluded?

Generally, when a keypoint is briefly occluded and its location can be resolved by the network,
209 changes: 164 additions & 45 deletions docs/source/user_guide/config_file.rst
@@ -27,68 +27,140 @@ Data parameters
All of these parameters except ``downsample_factor`` are dataset-specific and will need to be
provided.

* ``data.image_resize_dims.height/width`` (*int*): images (and videos) will be resized to the
specified height and width before being processed by the network.
Supported values are {64, 128, 256, 384, 512}.
The height and width need not be identical.
Some points to keep in mind when selecting these values:
if the resized images are too small, you will lose resolution/details;
if they are too large, the model takes longer to train and might not train as well.

* ``data.data_dir/video_dir`` (*str*): update these to reflect your (absolute) local paths

* ``data.csv_file`` (*str*): location of labels csv file; this should be relative to
``data.data_dir``

* ``data.downsample_factor`` (*int, default: 2*): factor by which to downsample the heatmaps
relative to ``data.image_resize_dims``

* ``data.num_keypoints`` (*int*): the number of body parts.
If using a mirrored setup, this should be the number of body parts summed across all views.
If using a multiview setup, this number should indicate the number of keypoints per view
(must be the same across all views).

* ``data.keypoint_names`` (*list*): keypoint names should reflect the actual names/order in the
csv file.
This field is necessary if, for example, you are running inference on a machine that does not
have the training data saved on it.

* ``data.mirrored_column_matches`` (*list*): see the
:ref:`Multiview PCA documentation <unsup_loss_pcamv>`

* ``data.columns_for_singleview_pca`` (*list*): see the
:ref:`Pose PCA documentation <unsup_loss_pcasv>`
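
To see how these fields fit together, here is a minimal sketch of a ``data`` section (the paths
and keypoint names are illustrative placeholders; the PCA-related fields are omitted):

.. code-block:: yaml

    data:
      # images/videos are resized to 256x256 before entering the network
      image_resize_dims:
        height: 256
        width: 256
      # absolute paths (placeholders)
      data_dir: /home/user/my_project
      video_dir: /home/user/my_project/videos
      # relative to data_dir
      csv_file: CollectedData.csv
      downsample_factor: 2
      num_keypoints: 3
      # names/order must match the csv file (placeholder names)
      keypoint_names:
        - nose
        - ear_left
        - ear_right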


Training parameters
===================

The following parameters relate to model training.
Reasonable defaults are provided, though parameters like the batch sizes
(``train_batch_size``, ``val_batch_size``, ``test_batch_size``)
may need modification depending on the size of the data and the available compute resources.
See the :ref:`FAQs <faq_oom>` for more information on memory management.

* ``training.imgaug`` (*str, default: dlc*): select from one of several predefined image/video
augmentation pipelines:

* default: resizing only
* dlc: imgaug pipeline implemented in the DLC 2.0 package
* dlc-top-down: dlc augmentations plus additional vertical and horizontal flips

* ``training.train_batch_size`` (*int, default: 16*): batch size for labeled data during training

* ``training.val_batch_size`` (*int, default: 32*): batch size for labeled data during validation

* ``training.test_batch_size`` (*int, default: 32*): batch size for labeled data during testing

* ``training.train_prob`` (*float, default: 0.95*): fraction of labeled data used for training

* ``training.val_prob`` (*float, default: 0.05*): fraction of labeled data used for validation;
any remaining frames not assigned to train or validation sets are assigned to the test set

* ``training.train_frames`` (*float or int, default: 1*): this parameter determines how many of the
frames assigned to the training set (using ``train_prob``) are actually used for training.
This option is generally more useful for testing new algorithms than for training production
models.
If the value is a float between 0 and 1, it is interpreted as the fraction of total train frames;
if the value is an integer greater than 1, it is interpreted as the number of total train frames.

.. _config_num_gpus:
* ``training.num_gpus`` (*int, default: 1*): the number of GPUs for
:ref:`multi-GPU training <multi_gpu_training>`

* ``training.num_workers`` (*int, default: 4*): number of CPU workers for data loaders

* ``training.unfreezing_epoch`` (*int, default: 20*): epoch at which backbone network weights begin
updating. A value >0 allows the smaller number of parameters in the heatmap head to adjust to
the backbone outputs first.

* ``training.min_epochs`` / ``training.max_epochs`` (*int, default: 300*): length of training.
An epoch is one full pass through the dataset.
As an example, if you have 400 labeled frames, and ``training.train_batch_size=10``, then your
dataset is divided into 400/10 = 40 batches.
One "batch" in this case is equivalent to one "iteration" for DeepLabCut.
Therefore, 300 epochs, at 40 batches per epoch, is equal to 300*40=12k total batches
(or iterations).

* ``training.log_every_n_steps`` (*int, default: 10*): frequency to log training metrics for
tensorboard (one step is one batch)

* ``training.check_val_every_n_epochs`` (*int, default: 5*): frequency to log validation metrics
for tensorboard

* ``training.ckpt_every_n_epochs`` (*int or null, default: null*): save model weights every n
epochs; must be divisible by ``training.check_val_every_n_epochs`` above.
If null, only the best weights will be saved after training, where "best" is defined as the
weights from the epoch with the lowest validation loss.

* ``training.early_stopping`` (*bool, default: false*): if false, the default is to train for the
max number of epochs and save out the best model according to the validation loss; if true, early
stopping will exit training if the validation loss continues to increase for a given number of
validation checks (see ``training.early_stop_patience`` below).

* ``training.early_stop_patience`` (*int, default: 3*): number of validation checks over which to
assess validation metrics for early stopping; this number, multiplied by
``training.check_val_every_n_epochs``, gives the number of epochs over which the validation loss
must increase before exiting.

* ``training.rng_seed_data_pt`` (*int, default: 0*): rng seed for splitting labeled data into
train/val/test

* ``training.rng_seed_model_pt`` (*int, default: 0*): rng seed for weight initialization of the head

* ``training.lr_scheduler`` (*str, default: multisteplr*): reduce the learning rate by a certain
factor after a given number of epochs (see ``training.lr_scheduler_params.multisteplr`` below)

* ``training.lr_scheduler_params.multisteplr``: ``milestones``: epochs at which to reduce the
learning rate; ``gamma``: factor by which to multiply the learning rate at each milestone

* ``training.uniform_heatmaps_for_nan_keypoints`` (*bool, default: true*): how to treat missing
hand labels; false drops them from the loss, true forces uniform heatmaps for them. True will lead
to better confidence values, while false allows for incompletely labeled data.

* ``training.accumulate_grad_batches`` (*int, default: 1*): (experimental) number of batches to
accumulate gradients for before updating weights. Simulates larger batch sizes with
memory-constrained GPUs.
This parameter is not included in the config by default and should be added manually to the
``training`` section.
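
As a consolidated reference, the ``training`` fields above can be sketched as follows (values
mirror the defaults listed above; the scheduler milestones are illustrative, and
``accumulate_grad_batches`` is commented out because it is not part of the default config):

.. code-block:: yaml

    training:
      imgaug: dlc
      train_batch_size: 16
      val_batch_size: 32
      test_batch_size: 32
      train_prob: 0.95
      val_prob: 0.05
      train_frames: 1
      num_gpus: 1
      num_workers: 4
      unfreezing_epoch: 20
      min_epochs: 300
      max_epochs: 300
      log_every_n_steps: 10
      check_val_every_n_epochs: 5
      ckpt_every_n_epochs: null
      early_stopping: false
      early_stop_patience: 3
      rng_seed_data_pt: 0
      rng_seed_model_pt: 0
      lr_scheduler: multisteplr
      lr_scheduler_params:
        multisteplr:
          milestones: [150, 200, 250]  # illustrative values
          gamma: 0.5                   # illustrative value
      # experimental; add manually if needed
      # accumulate_grad_batches: 2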

Model parameters
================

The following parameters relate to model architecture and unsupervised losses.


* ``model.losses_to_use`` (*list, default: []*): defines the unsupervised losses.
An empty list indicates a fully supervised model.
Each element of the list corresponds to an unsupervised loss.
For example, ``model.losses_to_use=[pca_multiview,temporal]`` will fit both a pca_multiview loss
@@ -98,12 +170,14 @@ depending on the size of the data and the available compute resources.
* pca_singleview: penalize implausible body configurations
* temporal: penalize large temporal jumps

See the :ref:`unsupervised losses<unsupervised_losses>` page for more details on the various
losses and their associated hyperparameters.
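
For example, a semi-supervised model that combines two of the losses above could be configured as
in this sketch:

.. code-block:: yaml

    model:
      # an empty list ([]) would instead give a fully supervised model
      losses_to_use: [pca_multiview, temporal]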

* ``model.backbone`` (*str, default: resnet50_animal_ap10k*): a variety of pretrained backbones are
available:

* resnet50_animal_ap10k: ResNet-50 pretrained on the AP-10k dataset (Yu et al 2021, AP-10k: A Benchmark for Animal Pose Estimation in the Wild)
* resnet18: ResNet-18 pretrained on ImageNet
* resnet34: ResNet-34 pretrained on ImageNet
* resnet50: ResNet-50 pretrained on ImageNet
@@ -120,27 +194,72 @@
* efficientnet_b2: EfficientNet-B2 pretrained on ImageNet
* vit_b_sam: Segment Anything Model (Vision Transformer Base)

Note: the file size for a single ResNet-50 network is approximately 275 MB.


* ``model.model_type`` (*str, default: heatmap*):

* regression: model directly outputs an (x, y) prediction for each keypoint; not recommended
* heatmap: model outputs a 2D heatmap for each keypoint
* heatmap_mhcrnn: the "multi-head convolutional RNN", this model takes a temporal window of
frames as input, and outputs two heatmaps: one "context-aware" and one "static".
The prediction with the highest confidence is automatically chosen.
See the :ref:`Temporal Context Network<mhcrnn>` page for more information.

* ``model.heatmap_loss_type`` (*str, default: mse*): (experimental) loss to compute difference
between ground truth and predicted heatmaps

* ``model.model_name`` (*str, default: test*): directory name for model saving

* ``model.checkpoint`` (*str or null, default: null*): to initialize weights from an existing
checkpoint, update this parameter to the absolute path of a pytorch .ckpt file
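
Taken together, the ``model`` fields above might look like the following sketch (the checkpoint
path in the comment is a placeholder):

.. code-block:: yaml

    model:
      losses_to_use: []
      backbone: resnet50_animal_ap10k
      model_type: heatmap
      heatmap_loss_type: mse
      model_name: test
      # or an absolute path such as /path/to/weights.ckpt (placeholder)
      checkpoint: null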


Video loading parameters
========================

Some parameters relate to video loading, both for semi-supervised models and when predicting new
videos with any of the models.
The parameters may need modification depending on the size of the data and the available compute
resources.
See the :ref:`FAQs <faq_oom>` for more information on memory management.

* ``dali.base.train.sequence_length`` (*int, default: 32*): number of unlabeled frames per batch in
"regression" and "heatmap" models (i.e. "base" models that do not use temporal context frames)
* ``dali.base.predict.sequence_length`` (*int, default: 96*): batch size when predicting on a new
video with a base model
* ``dali.context.train.batch_size`` (*int, default: 16*): number of unlabeled frames per batch in
the heatmap_mhcrnn model (i.e. "context" models that utilize temporal context frames); each frame
in the batch is accompanied by context frames, so the true batch size will be larger than this
number
* ``dali.context.predict.sequence_length`` (*int, default: 96*): batch size when predicting on a
new video with a "context" model
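
These parameters sit in a nested ``dali`` block; a sketch using the defaults above:

.. code-block:: yaml

    dali:
      base:
        train:
          sequence_length: 32  # unlabeled batch size, "base" models
        predict:
          sequence_length: 96  # inference batch size, "base" models
      context:
        train:
          batch_size: 16       # unlabeled batch size, "context" models
        predict:
          sequence_length: 96  # inference batch size, "context" models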

Evaluation
==========

The following parameters are used for general evaluation.

* ``eval.predict_vids_after_training`` (*bool, default: true*): if true, after training (when using
scripts/train_hydra.py) run inference with the best model on all videos located in
``eval.test_videos_directory`` (see below)

* ``eval.test_videos_directory`` (*str, default: null*): absolute path to a video directory
containing videos for prediction; used in scripts/train_hydra.py and scripts/predict_new_vids.py

* ``eval.save_vids_after_training`` (*bool, default: false*): save out an mp4 file with predictions
overlaid after running inference; used in scripts/train_hydra.py and scripts/predict_new_vids.py

* ``eval.colormap`` (*str, default: cool*): colormap options for labeled videos; options include
sequential colormaps (viridis, plasma, magma, inferno, cool, etc) and diverging colormaps (RdBu,
coolwarm, Spectral, etc)

* ``eval.confidence_thresh_for_vid`` (*float, default: 0.9*): predictions with confidence below this
value will not be plotted in the labeled videos

* ``eval.hydra_paths`` (*list, default: []*): absolute paths to hydra output folders for use with
scripts/predict_new_vids.py (see :ref:`inference <inference>` docs) and
scripts/create_fiftyone_dataset.py (see :ref:`FiftyOne <fiftyone>` docs)

* ``eval.fiftyone.dataset_name`` (*str, default: test*): name of the FiftyOne dataset

* ``eval.fiftyone.model_display_names`` (*list, default: [test_model]*): shorthand names for each
of the models specified in ``eval.hydra_paths``
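
The evaluation fields above combine into a single ``eval`` block; a sketch with placeholder paths:

.. code-block:: yaml

    eval:
      predict_vids_after_training: true
      # absolute path to videos for prediction, or null to skip (placeholder)
      test_videos_directory: /home/user/my_project/videos
      save_vids_after_training: false
      colormap: "cool"
      confidence_thresh_for_vid: 0.90
      # absolute paths to hydra output folders (placeholder)
      hydra_paths: ["/home/user/outputs/2024-11-05/13-30-00"]
      fiftyone:
        dataset_name: test
        model_display_names: ["test_model"]
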
2 changes: 2 additions & 0 deletions docs/source/user_guide_advanced/context_frames.rst
@@ -1,3 +1,5 @@
.. _mhcrnn:

########################
Temporal Context Network
########################
2 changes: 1 addition & 1 deletion lightning_pose/utils/scripts.py
@@ -756,5 +756,5 @@ def export_predictions_and_labeled_video(
ys_arr=ys_arr,
mask_array=mask_array,
filename=labeled_mp4_file,
colormap=cfg.eval.get("colormap", "cool")
)
38 changes: 19 additions & 19 deletions scripts/configs/config_default.yaml
@@ -27,9 +27,9 @@ data:

training:
# select from one of several predefined image/video augmentation pipelines
# default: resizing only
# dlc: imgaug pipeline implemented in DLC 2.0 package
# dlc-top-down: dlc augmentations plus vertical and horizontal flips
imgaug: dlc
# batch size of labeled data during training
train_batch_size: 16
@@ -148,36 +148,36 @@ losses:
prob_threshold: 0.05

eval:
# predict? used in scripts/train_hydra.py
predict_vids_after_training: true
# str with an absolute path to a directory containing videos for prediction.
# set to null to skip automatic video prediction from train_hydra.py script
# used in scripts/train_hydra.py and scripts/predict_new_vids.py
test_videos_directory: null
# save labeled .mp4? used in scripts/train_hydra.py and scripts/predict_new_vids.py
save_vids_after_training: false
# matplotlib sequential or diverging colormap name for prediction visualization
# sequential options: viridis, plasma, magma, inferno, cool, etc.
# diverging options: RdBu, coolwarm, Spectral, etc.
colormap: "cool"
# confidence threshold for plotting a vid
confidence_thresh_for_vid: 0.90

# paths to the hydra config files in the output folder, OR absolute paths to such folders.
# used in scripts/predict_new_vids.py and scripts/create_fiftyone_dataset.py
hydra_paths: [" "]

fiftyone:
# will be the name of the dataset (Mongo DB) created by FiftyOne
dataset_name: test
# if you want to manually provide a different model name to be displayed in FiftyOne
model_display_names: ["test_model"]
# whether to launch the app from the script (True), or from ipython (and have finer control over the outputs)
launch_app_from_script: false

remote: true # for LAI, must be False
address: 127.0.0.1 # ip to launch the app on.
port: 5151 # port to launch the app on.

callbacks:
anneal_weight:
attr_name: total_unsupervised_importance
