
Integrate DVC (or MLFlow) to track model experiments #20

Open
aazuspan opened this issue Nov 29, 2023 · 0 comments

@aazuspan (Contributor)

We're training a lot of different models with different datasets, architectures, hyperparameters, etc., and it's tough to track results across so many permutations. #19 attempts to improve reproducibility by storing all relevant parameters in dataset file names, but that approach won't scale to models and predicted outputs with dozens of parameters.

Tools like DVC and MLFlow track experiments by recording inputs (datasets, scripts, parameters, etc.) with their associated outputs (metrics, models, images, etc.). With DVC in particular (I'm not as familiar with MLFlow), you can set up workflows to run entirely through the tool, so that everything from the parameters used to create the training dataset to the final model is automatically linked. However, in order to connect input parameters to output files, DVC requires scripts to produce outputs synchronously, rather than submitting tasks that run asynchronously and are downloaded later, which is how we currently collect our sampling and inference data. We could adapt our workflow to force synchronous execution by waiting for Earth Engine tasks to complete and downloading the outputs programmatically, but that would require a substantial redesign and mean that multiple datasets couldn't easily be collected concurrently.
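For illustration, here's a rough sketch of what forcing synchronous execution might look like (the export parameters and helper name are hypothetical, and this is not something we do today):

```python
import time

import ee


def export_and_wait(image, description, poll_interval=60):
    """Hypothetical helper: block until an Earth Engine export finishes,
    so a DVC stage could treat the result as a synchronous output."""
    task = ee.batch.Export.image.toDrive(image=image, description=description)
    task.start()
    # Poll until Earth Engine reports the task is no longer running.
    while task.active():
        time.sleep(poll_interval)
    if task.status()["state"] != "COMPLETED":
        raise RuntimeError(f"Export {description} failed: {task.status()}")
    # The exported file would still need to be downloaded from Drive before
    # the DVC stage completes, which is the redesign cost described above.
```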

To avoid that limitation and allow asynchronous data creation, my tentative plan is to collect data outside of DVC, using the API developed in #19 to link dataset parameters with the output files. During training, we can manually tell DVC the dataset parameters, and it will log them alongside the model and metrics. This is slightly less reliable, since we're responsible for making sure the training data is consistent with its creation parameters, but it should be more flexible and avoid slowing things down with synchronous execution. Because DVC will track all data and model parameters with the associated model, we should be able to remove the ModelRun class added in #19, which achieves the same goal using model filenames.
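As a rough sketch of what the training side could look like, assuming we use DVCLive for logging (the dataset parameters, training function, and file names below are placeholders, not the real #19 API):

```python
from dvclive import Live

# Placeholder stand-ins for the dataset API from #19 and the training step.
dataset_params = {"n_samples": 10_000, "bands": ["R", "G", "B", "N"], "patch_size": 256}


def train_model(params):
    """Placeholder for the real training loop; returns validation metrics."""
    return {"val_loss": 0.42, "val_accuracy": 0.87}


with Live() as live:
    # Manually record the parameters that produced the (asynchronously
    # collected) training dataset, so DVC links them to this experiment.
    live.log_params(dataset_params)

    metrics = train_model(dataset_params)
    for name, value in metrics.items():
        live.log_metric(name, value)

    # Track the saved model file as an experiment artifact, e.g.:
    # live.log_artifact("model.h5", type="model")
```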

Currently, the training notebook is followed by two other notebooks that download NAIP imagery as TFRecords from a test region and generate a map for qualitative comparison between model runs. To fit this into the DVC workflow, I think we should:

1. Move the 03_export_naip notebook into a Python script, with the understanding that it will be run once to produce a test region that can be used to evaluate every model run.
2. Run inference on that test region automatically as part of the training process, logging the resulting map as an artifact with DVC (see the sketch below), so that each model run includes both quantitative metrics and a qualitative map.
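For example, the training stage could end with something like this (again a sketch; the prediction helper is a placeholder, and logging the map via DVCLive's image logging is an assumption about how we'd wire it up):

```python
import numpy as np
from dvclive import Live


def predict_test_region(model):
    """Placeholder: run inference over the pre-exported NAIP test region
    and return an RGB map as a NumPy array."""
    return np.zeros((256, 256, 3), dtype=np.uint8)


with Live() as live:
    map_array = predict_test_region(model=None)
    # Log the qualitative map alongside the quantitative metrics so every
    # model run carries both.
    live.log_image("test_region_map.png", map_array)
```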

@aazuspan aazuspan added the enhancement New feature or request label Nov 29, 2023
@aazuspan aazuspan self-assigned this Nov 29, 2023
@aazuspan aazuspan mentioned this issue Nov 29, 2023