This discussion is for new improvements, features, and general implementation details.
Integrate the xarray package's Dataset class in place of the amisc.typing.Dataset class. This may provide readability and performance improvements when passing large, labeled input arrays into System.predict, for example. We currently use raw dicts as labeled datasets, where keys map variable names to their values -- which means we have to manually loop over each variable in the dictionary every time we process the data. I believe xarray handles much of this overhead internally (and in a much better way).
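A rough sketch of the difference (the variable names and shapes here are placeholders, not the actual System.predict inputs):

```python
import numpy as np
import xarray as xr

# Current approach: a raw dict mapping variable names to arrays, which forces
# a manual loop over every variable during processing.
raw_inputs = {"x1": np.random.rand(1000), "x2": np.random.rand(1000)}
normalized = {name: (arr - arr.mean()) / arr.std() for name, arr in raw_inputs.items()}

# With xarray, the same labeled data lives in one Dataset, and alignment,
# broadcasting, and reductions are handled internally.
ds = xr.Dataset({name: ("sample", arr) for name, arr in raw_inputs.items()})
normalized_ds = (ds - ds.mean(dim="sample")) / ds.std(dim="sample")
```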
Implement a Latin Hypercube Sampling TrainingData object -- not only because it would be generally useful, but also because SparseGrid is fairly complicated and it would be nice to demo a much simpler implementation of the TrainingData interface (see the sketch below).
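A minimal sketch of what that could look like, assuming a TrainingData-like object with refine/add hooks (the method names here are illustrative only -- the actual abstract interface in amisc may differ):

```python
import numpy as np
from scipy.stats import qmc

class LatinHypercube:
    """Hypothetical Latin Hypercube training data generator."""

    def __init__(self, num_dims: int, seed: int = 0):
        self.sampler = qmc.LatinHypercube(d=num_dims, seed=seed)
        self.xtrain = np.empty((0, num_dims))   # accumulated input samples
        self.ytrain = []                        # accumulated model outputs

    def refine(self, num_new: int) -> np.ndarray:
        """Propose `num_new` new sample locations in [0, 1]^d."""
        new_pts = self.sampler.random(n=num_new)
        self.xtrain = np.vstack([self.xtrain, new_pts])
        return new_pts

    def add(self, outputs) -> None:
        """Store the model outputs corresponding to the last refinement."""
        self.ytrain.extend(outputs)
```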
There is currently no way to normalize latent coefficients for field quantities. You can normalize raw field quantities before compression, but then the surrogate gets trained on whatever latent coefficients result -- these may be wildly out of proportion and hard to train on. So it would be useful to additionally specify norm methods for each latent coefficient after compression.
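For example, a simple per-coefficient z-score after compression would put the latent coefficients on comparable scales (the shapes and scale factors below are made up purely for illustration):

```python
import numpy as np

# Hypothetical latent coefficients from compressing a field quantity (e.g. via SVD):
# 500 samples x 10 coefficients, with scales spanning several orders of magnitude.
scales = np.array([1e3, 1e2, 10, 5, 1, 1, 0.1, 0.1, 0.01, 0.01])
latent = np.random.randn(500, 10) * scales

mu, sigma = latent.mean(axis=0), latent.std(axis=0)
latent_norm = (latent - mu) / sigma          # what the surrogate would train on
latent_back = latent_norm * sigma + mu       # denormalize before decompression
```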
The SparseGrid object chooses to refine latent coefficients in a nearly full tensor-product fashion, which scales very poorly with the number of latent coefficients. Of course, this only applies when you are using a field quantity as an input and training over the latent coefficients. But this could lead to some unreasonably large numbers of model evaluations in a single iteration. For example, refining just 10 latent coefficients from 1 grid point each to 3 grid points each (a single refinement step for SparseGrid!) would require $3^{10} \approx 59{,}000$ model evaluations, which is sure to brick even the most modest of user functions. A nice feature would be slowing down this growth for field quantity inputs with an extra expand_latent_method in SparseGrid.
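A back-of-the-envelope comparison of the current growth versus one possible expand_latent_method behavior (the dimension-at-a-time alternative is just an assumed option for illustration, not an existing feature):

```python
# Refining d latent coefficients from 1 to 3 points each:
d = 10
full_tensor = 3 ** d        # full tensor-product: 59,049 model evaluations in one step
one_dim_at_a_time = 2 * d   # hypothetical slower growth: ~20 new evaluations
print(full_tensor, one_dim_at_a_time)
```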
The ability to update variable PDFs during training, such as to follow an MCMC chain when calibrating unknown parameters. This would allow a form of "local" refinement, since the Leja objective would cluster points in regions of maximum density.
Implement a Gaussian process or neural network Interpolator to demonstrate the extensibility of amisc. It would also be good to have methods like these that can utilize pre-existing training data (Lagrange interpolation requires starting from scratch every time to ensure optimal placement of training data in the tensor-product grid).
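A minimal sketch of a GP-backed version, assuming an Interpolator-like object with refine/predict methods (the actual abstract interface in amisc may differ):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

class GPInterpolator:
    """Hypothetical GP interpolator; unlike Lagrange, it can be refit on
    whatever training data already exists, structured grid or not."""

    def __init__(self):
        self.gp = GaussianProcessRegressor(kernel=RBF(), normalize_y=True)

    def refine(self, xtrain: np.ndarray, ytrain: np.ndarray) -> None:
        """Refit the GP on all accumulated (possibly pre-existing) data."""
        self.gp.fit(xtrain, ytrain)

    def predict(self, x: np.ndarray) -> np.ndarray:
        return self.gp.predict(x)
```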