diff --git a/docs/source/faqs.rst b/docs/source/faqs.rst
index 8d1eeca8..bdf47b36 100644
--- a/docs/source/faqs.rst
+++ b/docs/source/faqs.rst
@@ -63,6 +63,8 @@ FAQs
 
     Note that you can also run the ``ffmpeg`` command directly from the command line.
 
+.. _faq_oom:
+
 .. dropdown:: What if I encounter a CUDA out of memory error?
 
     Model training can be GPU-memory-intensive, particularly when using unsupervised losses, the
@@ -82,6 +84,7 @@ FAQs
     See :ref:`The configuration file ` section for more information about the above
     parameters.
 
+
 .. dropdown:: Why does the network produce high confidence values for keypoints even when they are occluded?
 
     Generally, when a keypoint is briefly occluded and its location can be resolved by the network,
diff --git a/docs/source/user_guide/config_file.rst b/docs/source/user_guide/config_file.rst
index 33f6b9bc..c0e2df17 100644
--- a/docs/source/user_guide/config_file.rst
+++ b/docs/source/user_guide/config_file.rst
@@ -27,44 +27,84 @@ Data parameters
 All of these parameters except ``downsample_factor`` are dataset-specific and will need to be
 provided.
 
-* ``data.image_resize_dims.height/width`` (int): images (and videos) will be resized to the specified
-  height and width before being processed by the network.
+* ``data.image_resize_dims.height/width`` (*int*): images (and videos) will be resized to the
+  specified height and width before being processed by the network.
   Supported values are {64, 128, 256, 384, 512}.
   The height and width need not be identical.
   Some points to keep in mind when selecting these values: if the resized images are too small,
   you will lose resolution/details; if they are too large, the model takes longer to train and
   might not train as well.
 
-* ``data.data_dir/video_dir`` (str): update these to reflect your (absolute) local paths
+* ``data.data_dir/video_dir`` (*str*): update these to reflect your (absolute) local paths
 
-* ``data.csv_file`` (str): location of labels csv file; this should be relative to ``data_dir``
+* ``data.csv_file`` (*str*): location of labels csv file; this should be relative to
+  ``data.data_dir``
 
-* ``data.downsample_factor`` (int, default: 2): factor by which to downsample the heatmaps relative to ``image_resize_dims``
+* ``data.downsample_factor`` (*int, default: 2*): factor by which to downsample the heatmaps
+  relative to ``data.image_resize_dims``
 
-* ``data.num_keypoints`` (int): the number of body parts.
+* ``data.num_keypoints`` (*int*): the number of body parts.
   If using a mirrored setup, this should be the number of body parts summed across all views.
   If using a multiview setup, this number should indicate the number of keyponts per view
   (must be the same across all views).
 
-* ``data.keypoint_names`` (list): keypoint names should reflect the actual names/order in the csv file.
+* ``data.keypoint_names`` (*list*): keypoint names should reflect the actual names/order in the
+  csv file.
   This field is necessary if, for example, you are running inference on a machine that does not
   have the training data saved on it.
 
-* ``data.mirrored_column_matches`` (list): see the :ref:`Multiview PCA documentation `
+* ``data.mirrored_column_matches`` (*list*): see the
+  :ref:`Multiview PCA documentation `
 
-* ``data.columns_for_singleview_pca`` (list): see the :ref:`Pose PCA documentation `
+* ``data.columns_for_singleview_pca`` (*list*): see the
+  :ref:`Pose PCA documentation `
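+
+For reference, the ``data`` section of a config file might then look like the following sketch;
+the paths, number of keypoints, and keypoint names below are placeholders for your own dataset:
+
+.. code-block:: yaml
+
+    data:
+      image_resize_dims:
+        height: 256
+        width: 256
+      data_dir: /absolute/path/to/dataset            # placeholder path
+      video_dir: /absolute/path/to/dataset/videos    # placeholder path
+      csv_file: CollectedData.csv                    # placeholder; relative to data_dir
+      downsample_factor: 2
+      num_keypoints: 3                               # placeholder
+      keypoint_names: [nose, paw_l, paw_r]           # placeholders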
 
 Training parameters
 ===================
 
 The following parameters relate to model training.
-Reasonable defaults are provided, though parameters like the batch sizes may need modification
-depending on the size of the data and the available compute resources.
+Reasonable defaults are provided, though parameters like the batch sizes
+(``train_batch_size``, ``val_batch_size``, ``test_batch_size``)
+may need modification depending on the size of the data and the available compute resources.
+See the :ref:`FAQs <faq_oom>` for more information on memory management.
 
-* ``training.train_batch_size``: batch size for labeled data
+* ``training.imgaug`` (*str, default: dlc*): select from one of several predefined image/video
+  augmentation pipelines:
 
-* ``training.min_epochs`` / ``training.max_epochs``: length of training.
+  * default: resizing only
+  * dlc: imgaug pipeline implemented in DLC 2.0 package
+  * dlc-top-down: dlc augmentations plus additional vertical and horizontal flips
+
+* ``training.train_batch_size`` (*int, default: 16*): batch size for labeled data during training
+
+* ``training.val_batch_size`` (*int, default: 32*): batch size for labeled data during validation
+
+* ``training.test_batch_size`` (*int, default: 32*): batch size for labeled data during testing
+
+* ``training.train_prob`` (*float, default: 0.95*): fraction of labeled data used for training
+
+* ``training.val_prob`` (*float, default: 0.05*): fraction of labeled data used for validation;
+  any remaining frames not assigned to the train or validation sets are assigned to the test set
+
+* ``training.train_frames`` (*float or int, default: 1*): this parameter determines how many of the
+  frames assigned to the training data (using ``train_prob``) are actually used for training.
+  This option is generally more useful for testing new algorithms rather than training production
+  models.
+  If the value is a float between 0 and 1 then it is interpreted as the fraction of total train frames.
+  If the value is an integer greater than 1 then it is interpreted as the number of total train frames.
+
+.. _config_num_gpus:
+* ``training.num_gpus`` (*int, default: 1*): the number of GPUs for
+  :ref:`multi-GPU training `
+
+* ``training.num_workers`` (*int, default: 4*): number of CPU workers for data loaders
+
+* ``training.unfreezing_epoch`` (*int, default: 20*): epoch at which backbone network weights begin
+  updating. A value >0 allows the smaller number of parameters in the heatmap head to adjust to
+  the backbone outputs first.
+
+* ``training.min_epochs`` / ``training.max_epochs`` (*int, default: 300*): length of training.
   An epoch is one full pass through the dataset.
   As an example, if you have 400 labeled frames, and ``training.train_batch_size=10``, then your
   dataset is divided into 400/10 = 40 batches.
@@ -72,23 +112,55 @@ depending on the size of the data and the available compute resources.
   Therefore, 300 epochs, at 40 batches per epoch, is equal to 300*40=12k total batches (or
   iterations).
 
-.. _config_num_gpus:
-* ``training.num_gpus``: the number of GPUs for :ref:``multi-GPU training ``.
+* ``training.log_every_n_steps`` (*int, default: 10*): frequency to log training metrics for
+  tensorboard (one step is one batch)
 
-* ``training.accumulate_grad_batches``: (experimental) number of batches to accumulate gradients
-  for before updating weights. Simulates larger batch sizes with memory-constrained GPUs. This
-  parameter is not included in the config by default and should be added manually to the
+* ``training.check_val_every_n_epochs`` (*int, default: 5*): frequency to log validation metrics
+  for tensorboard
+
+* ``training.ckpt_every_n_epochs`` (*int or null, default: null*): save model weights every n
+  epochs; must be divisible by ``training.check_val_every_n_epochs`` above.
+  If null, only the best weights will be saved after training, where "best" is defined as the
+  weights from the epoch with the lowest validation loss.
+
+* ``training.early_stopping`` (*bool, default: false*): if false, the default is to train for the
+  max number of epochs and save out the best model according to the validation loss; if true, early
+  stopping will exit training if the validation loss does not improve over a given number of
+  validation checks (see ``training.early_stop_patience`` below).
+
+* ``training.early_stop_patience`` (*int, default: 3*): number of validation checks over which to
+  assess validation metrics for early stopping; this number, multiplied by
+  ``training.check_val_every_n_epochs``, gives the number of epochs over which the validation loss
+  must fail to improve before training exits.
+
+* ``training.rng_seed_data_pt`` (*int, default: 0*): rng seed for splitting labeled data into
+  train/val/test
+
+* ``training.rng_seed_model_pt`` (*int, default: 0*): rng seed for weight initialization of the head
+
+* ``training.lr_scheduler`` (*str, default: multisteplr*): reduce the learning rate by a certain
+  factor after a given number of epochs (see ``training.lr_scheduler_params.multisteplr`` below)
+
+* ``training.lr_scheduler_params.multisteplr``: milestones: epochs at which to reduce the learning
+  rate; gamma: factor by which to multiply the learning rate at each milestone
+
+* ``training.uniform_heatmaps_for_nan_keypoints`` (*bool, default: true*): how to treat missing
+  hand labels; false to drop, true to force uniform heatmaps. True will lead to better confidence
+  values, while false allows for incompletely labeled data.
+
+* ``training.accumulate_grad_batches`` (*int, default: 1*): (experimental) number of batches to
+  accumulate gradients for before updating weights. Simulates larger batch sizes with
+  memory-constrained GPUs.
+  This parameter is not included in the config by default and should be added manually to the
   ``training`` section.
 
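+For example, a ``training`` section that keeps the default values described above might contain:
+
+.. code-block:: yaml
+
+    training:
+      imgaug: dlc
+      train_batch_size: 16
+      val_batch_size: 32
+      test_batch_size: 32
+      train_prob: 0.95
+      val_prob: 0.05
+      train_frames: 1
+      num_gpus: 1
+      num_workers: 4
+      unfreezing_epoch: 20
+      min_epochs: 300
+      max_epochs: 300
+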
-* ``model.model_type``:
+Model parameters
+================
+
+The following parameters relate to model architecture and unsupervised losses.
 
-  * regression: model directly outputs an (x, y) prediction for each keypoint; not recommended
-  * heatmap: model outputs a 2D heatmap for each keypoint
-  * heatmap_mhcrnn: the "multi-head convolutional RNN", this model takes a temporal window of
-    frames as input, and outputs two heatmaps: one "context-aware" and one "static".
-    The prediction with the highest confidence is automatically chosen.
-* ``model.losses_to_use``: defines the unsupervised losses.
+* ``model.losses_to_use`` (*list, default: []*): defines the unsupervised losses.
   An empty list indicates a fully supervised model.
   Each element of the list corresponds to an unsupervised loss.
   For example, ``model.losses_to_use=[pca_multiview,temporal]`` will fit both a pca_multiview loss
@@ -98,12 +170,14 @@ depending on the size of the data and the available compute resources.
   * pca_singleview: penalize implausible body configurations
   * temporal: penalize large temporal jumps
 
-* ``model.checkpoint``: to initialize weights from an existing checkpoint, update this parameter
-  to the absolute path of a pytorch .ckpt file
+  See the :ref:`unsupervised losses` page for more details on the various
+  losses and their associated hyperparameters.
 
-* ``model.backbone``: a variety of pretrained backbones are available:
-  * resnet50_animal_ap10k (recommended): ResNet-50 pretrained on the AP-10k dataset (Yu et al 2021, AP-10k: A Benchmark for Animal Pose Estimation in the Wild)
+* ``model.backbone`` (*str, default: resnet50_animal_ap10k*): a variety of pretrained backbones are
+  available:
+
+  * resnet50_animal_ap10k: ResNet-50 pretrained on the AP-10k dataset (Yu et al 2021, AP-10k: A Benchmark for Animal Pose Estimation in the Wild)
   * resnet18: ResNet-18 pretrained on ImageNet
   * resnet34: ResNet-34 pretrained on ImageNet
   * resnet50: ResNet-50 pretrained on ImageNet
@@ -120,27 +194,72 @@ depending on the size of the data and the available compute resources.
   * efficientnet_b2: EfficientNet-B2 pretrained on ImageNet
   * vit_b_sam: Segment Anything Model (Vision Transformer Base)
 
-See the :ref:`Unsupervised losses ` section for more details on the various
-losses and their associated hyperparameters.
+  Note: the file size for a single ResNet-50 network is approximately 275 MB.
+
+
+* ``model.model_type`` (*str, default: heatmap*):
+
+  * regression: model directly outputs an (x, y) prediction for each keypoint; not recommended
+  * heatmap: model outputs a 2D heatmap for each keypoint
+  * heatmap_mhcrnn: the "multi-head convolutional RNN", this model takes a temporal window of
+    frames as input, and outputs two heatmaps: one "context-aware" and one "static".
+    The prediction with the highest confidence is automatically chosen.
+    See the :ref:`Temporal Context Network <mhcrnn>` page for more information.
 
-A note on model checkpointing: by default the "best" model will be saved out according to the
-validation loss.
-If you would like to additionally save out checkpoints after a specified number of epochs, set the
-field ``training.ckpt_every_n_epochs``.
-The file size for a single ResNet-50 network is approximately 275 MB.
+* ``model.heatmap_loss_type`` (*str, default: mse*): (experimental) loss to compute difference
+  between ground truth and predicted heatmaps
 
-You may also utilize early stopping, in which model training exits early if the validation loss
-does not improve after a certain number of epochs, by setting ``training.early_stopping`` to true.
-Model checkpointing is still handled as described above.
+* ``model.model_name`` (*str, default: test*): directory name for model saving
+
+* ``model.checkpoint`` (*str or null, default: null*): to initialize weights from an existing
+  checkpoint, update this parameter to the absolute path of a pytorch .ckpt file
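+
+For orientation, a ``model`` section combining the defaults above might look like the following
+sketch (the losses shown are illustrative; an empty list gives a fully supervised model):
+
+.. code-block:: yaml
+
+    model:
+      model_type: heatmap
+      backbone: resnet50_animal_ap10k
+      losses_to_use: [pca_singleview, temporal]  # illustrative; use [] for fully supervised
+      heatmap_loss_type: mse
+      model_name: test
+      checkpoint: null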
 
 Video loading parameters
 ========================
 
-Some arguments relate to video loading, both for semi-supervised models and when predicting new
-videos with any of the models:
+Some parameters relate to video loading, both for semi-supervised models and when predicting new
+videos with any of the models.
+The parameters may need modification depending on the size of the data and the available compute
+resources.
+See the :ref:`FAQs <faq_oom>` for more information on memory management.
+
+* ``dali.base.train.sequence_length`` (*int, default: 32*): number of unlabeled frames per batch in
+  "regression" and "heatmap" models (i.e. "base" models that do not use temporal context frames)
+* ``dali.base.predict.sequence_length`` (*int, default: 96*): batch size when predicting on a new
+  video with a "base" model
+* ``dali.context.train.batch_size`` (*int, default: 16*): number of unlabeled frames per batch in
+  the heatmap_mhcrnn model (i.e. "context" models that utilize temporal context frames)
+* ``dali.context.predict.sequence_length`` (*int, default: 96*): batch size when predicting on a
+  new video with a "context" model
+
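+For example, a ``dali`` section using the default values above might look like:
+
+.. code-block:: yaml
+
+    dali:
+      base:
+        train:
+          sequence_length: 32   # unlabeled batch size for "base" models
+        predict:
+          sequence_length: 96   # batch size for video inference with "base" models
+      context:
+        train:
+          batch_size: 16        # unlabeled batch size for "context" models
+        predict:
+          sequence_length: 96   # batch size for video inference with "context" models
+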
"base" models that do not use temporal context frames) +* ``dali.base.predict.sequence_length`` (*int, default: 96*): batch size when predicting on a new + video with a base model +* ``dali.context.train.batch_size`` (*int, default: 16*): number of unlabeled frames per batch in + heatmap_mhcrnn model (i.e. "context" models that utilize temporal context frames) +* ``dali.context.predict.sequence_length`` (*int, default: 96*): batch size when predicting on a + new video with a "context" model + +Evaluation +========== + +The following parameters are used for general evaluation. + +* ``eval.predict_vids_after_training`` (*bool, default: true*): if true, after training (when using + scripts/train_hydra.py) run inference with the best model on all videos located in + ``eval.test_videos_directory`` (see below) + +* ``eval.test_videos_directory`` (*str, default: null*): absolute path to a video directory + containing videos for prediction; used in scripts/train_hydra.py and scripts/predict_new_vids.py + +* ``eval.save_vids_after_training`` (*bool, default: false*): save out an mp4 file with predictions + overlaid after running inference; used in scripts/train_hydra.py and scripts/predict_new_vids.py + +* ``eval.colormap`` (*str, default: cool*): colormap options for labeled videos; options include + sequential colormaps (viridis, plasma, magma, inferno, cool, etc) and diverging colormaps (RdBu, + coolwarm, Spectral, etc) + +* ``eval.confidence_thresh_for_vid`` (*float, default: 0.9*): predictions with confidence below this + value will not be plotted in the labeled videos + +* ``eval.hydra_paths`` (*list, default: []*): absolute paths to hydra output folders for use with + scripts/predict_new_vids.py (see :ref:`inference ` docs) and + scripts/create_fiftyone_dataset.py (see :ref:`FiftyOne ` docs) + +* ``eval.fiftyone.dataset_name`` (*str, default: test*): name of the FiftyOne dataset -* ``dali.base.train.sequence_length`` - number of unlabeled frames per batch in ``regression`` and ``heatmap`` models (i.e. "base" models that do not use temporal context frames) -* ``dali.base.predict.sequence_length`` - batch size when predicting on a new video with a "base" model -* ``dali.context.train.batch_size`` - number of unlabeled frames per batch in ``heatmap_mhcrnn`` model (i.e. "context" models that utilize temporal context frames); each frame in this batch will be accompanied by context frames, so the true batch size will actually be larger than this number -* ``dali.context.predict.sequence_length`` - batch size when predicting on a new video with a "context" model +* ``eval.fiftyone.model_display_names`` (*list, default: [test_model]*): shorthand name for each of + the models specified in ``hydra_paths`` diff --git a/docs/source/user_guide_advanced/context_frames.rst b/docs/source/user_guide_advanced/context_frames.rst index 04b0d805..c59db2f7 100644 --- a/docs/source/user_guide_advanced/context_frames.rst +++ b/docs/source/user_guide_advanced/context_frames.rst @@ -1,3 +1,5 @@ +.. 
diff --git a/lightning_pose/utils/scripts.py b/lightning_pose/utils/scripts.py
index 10fe13d3..c2e64909 100644
--- a/lightning_pose/utils/scripts.py
+++ b/lightning_pose/utils/scripts.py
@@ -756,5 +756,5 @@ def export_predictions_and_labeled_video(
         ys_arr=ys_arr,
         mask_array=mask_array,
         filename=labeled_mp4_file,
-        colormap=colormap=cfg.eval.get("colormap", "cool")
+        colormap=cfg.eval.get("colormap", "cool")
     )
diff --git a/scripts/configs/config_default.yaml b/scripts/configs/config_default.yaml
index 67c67f9c..b263dc31 100644
--- a/scripts/configs/config_default.yaml
+++ b/scripts/configs/config_default.yaml
@@ -27,9 +27,9 @@ data:
 
 training:
   # select from one of several predefined image/video augmentation pipelines
-  # default- resizing only
-  # dlc- imgaug pipeline implemented in DLC 2.0 package
-  # dlc-top-down- dlc augmentations plus vertical and horizontal flips
+  # default: resizing only
+  # dlc: imgaug pipeline implemented in DLC 2.0 package
+  # dlc-top-down: dlc augmentations plus vertical and horizontal flips
   imgaug: dlc
   # batch size of labeled data during training
   train_batch_size: 16
@@ -148,36 +148,36 @@ losses:
     prob_threshold: 0.05
 
 eval:
-  # paths to the hydra config files in the output folder, OR absolute paths to such folders.
-  # used in scripts/predict_new_vids.py and scripts/create_fiftyone_dataset.py
-  hydra_paths: [" "]
   # predict? used in scripts/train_hydra.py
   predict_vids_after_training: true
+  # str with an absolute path to a directory containing videos for prediction.
+  # set to null to skip automatic video prediction from train_hydra.py script
+  # used in scripts/train_hydra.py and scripts/predict_new_vids.py
+  test_videos_directory: null
   # save labeled .mp4? used in scripts/train_hydra.py and scripts/predict_new_vids.py
   save_vids_after_training: false
+  # matplotlib sequential or diverging colormap name for prediction visualization
+  # sequential options: viridis, plasma, magma, inferno, cool, etc.
+  # diverging options: RdBu, coolwarm, Spectral, etc.
+  colormap: "cool"
+  # confidence threshold for plotting a vid
+  confidence_thresh_for_vid: 0.90
+
+  # paths to the hydra config files in the output folder, OR absolute paths to such folders.
+  # used in scripts/predict_new_vids.py and scripts/create_fiftyone_dataset.py
+  hydra_paths: [" "]
+
   fiftyone:
-    # will be the name of the dataset (Mongo DB) created by FiftyOne. for video dataset, we will append dataset_name + "_video"
+    # will be the name of the dataset (Mongo DB) created by FiftyOne
     dataset_name: test
     # if you want to manually provide a different model name to be displayed in FiftyOne
     model_display_names: ["test_model"]
     # whether to launch the app from the script (True), or from ipython (and have finer control over the outputs)
     launch_app_from_script: false
-    remote: true # for LAI, must be False
     address: 127.0.0.1 # ip to launch the app on.
     port: 5151 # port to launch the app on.
-  # str with an absolute path to a directory containing videos for prediction.
-  # set to null to skip automatic video prediction from train_hydra.py script
-  # used in scripts/train_hydra.py and scripts/predict_new_vids.py
-  test_videos_directory: null
-  # matplotlib sequential or diverging colormap name for prediction visualization
-  # sequential options: viridis, plasma, magma, inferno, cool, etc.
-  # diverging options: RdBu, coolwarm, Spectral, etc.
-  colormap: "cool"
-  # confidence threshold for plotting a vid
-  confidence_thresh_for_vid: 0.90
-
 callbacks:
   anneal_weight:
     attr_name: total_unsupervised_importance