This discussion is for new improvements, features, and general implementation details.
Integrate the xarray package's Dataset class in place of the amisc.typing.Dataset class. This may provide readability and performance improvements when passing large, labeled input arrays into System.predict, for example. We currently use raw dicts as labeled datasets, where keys map variable names to their values -- which means we have to manually loop over each variable in the dictionary every time we process the data. I believe xarray handles much of this overhead internally (and in a much better way).
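A rough sketch of the difference (the variable names and shapes here are placeholders, not the actual System.predict inputs):

```python
import numpy as np
import xarray as xr

# Current approach: a raw dict mapping variable names to arrays, which forces
# a manual loop over every variable during processing.
raw_inputs = {"x1": np.random.rand(1000), "x2": np.random.rand(1000)}
normalized = {name: (arr - arr.mean()) / arr.std() for name, arr in raw_inputs.items()}

# With xarray, the same labeled data lives in one Dataset, and alignment,
# broadcasting, and reductions are handled internally.
ds = xr.Dataset({name: ("sample", arr) for name, arr in raw_inputs.items()})
normalized_ds = (ds - ds.mean(dim="sample")) / ds.std(dim="sample")
```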
Implement a Latin Hypercube Sampling TrainingData object -- not only because it would be generally useful, but also because SparseGrid is fairly complicated and it would be nice to demo a much simpler implementation of the TrainingData interface (see the sketch below).
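A minimal sketch of what that could look like, assuming a TrainingData-like object with refine/add hooks (the method names here are illustrative only -- the actual abstract interface in amisc may differ):

```python
import numpy as np
from scipy.stats import qmc

class LatinHypercube:
    """Hypothetical Latin Hypercube training data generator."""

    def __init__(self, num_dims: int, seed: int = 0):
        self.sampler = qmc.LatinHypercube(d=num_dims, seed=seed)
        self.xtrain = np.empty((0, num_dims))   # accumulated input samples
        self.ytrain = []                        # accumulated model outputs

    def refine(self, num_new: int) -> np.ndarray:
        """Propose `num_new` new sample locations in [0, 1]^d."""
        new_pts = self.sampler.random(n=num_new)
        self.xtrain = np.vstack([self.xtrain, new_pts])
        return new_pts

    def add(self, outputs) -> None:
        """Store the model outputs corresponding to the last refinement."""
        self.ytrain.extend(outputs)
```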
There is currently no way to normalize latent coefficients for field quantities. You can normalize raw field quantities before compression, but then the surrogate gets trained on whatever latent coefficients result -- these may be wildly out of proportion and hard to train on. So it would be useful to additionally specify norm methods for each latent coefficient after compression.
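For example, a simple per-coefficient z-score after compression would put the latent coefficients on comparable scales (the shapes and scale factors below are made up purely for illustration):

```python
import numpy as np

# Hypothetical latent coefficients from compressing a field quantity (e.g. via SVD):
# 500 samples x 10 coefficients, with scales spanning several orders of magnitude.
scales = np.array([1e3, 1e2, 10, 5, 1, 1, 0.1, 0.1, 0.01, 0.01])
latent = np.random.randn(500, 10) * scales

mu, sigma = latent.mean(axis=0), latent.std(axis=0)
latent_norm = (latent - mu) / sigma          # what the surrogate would train on
latent_back = latent_norm * sigma + mu       # denormalize before decompression
```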
The SparseGrid object chooses to refine latent coefficients in a nearly full tensor-product fashion, which scales very poorly with the number of latent coefficients. Of course, this only applies when you are using a field quantity as an input and training over the latent coefficients. But this could lead to some unreasonably large numbers of model evaluations in a single iteration. For example, refining just 10 latent coefficients from 1 grid point each to 3 grid points each (a single refinement step for SparseGrid!) would require $3^{10} \approx 59{,}000$ model evaluations, which is sure to brick even the most modest of user functions. A nice feature would be slowing down this growth for field quantity inputs with an extra expand_latent_method in SparseGrid.
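A back-of-the-envelope comparison of the current growth versus one possible expand_latent_method behavior (the dimension-at-a-time alternative is just an assumed option for illustration, not an existing feature):

```python
# Refining d latent coefficients from 1 to 3 points each:
d = 10
full_tensor = 3 ** d        # full tensor-product: 59,049 model evaluations in one step
one_dim_at_a_time = 2 * d   # hypothetical slower growth: ~20 new evaluations
print(full_tensor, one_dim_at_a_time)
```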
The ability to update variable PDFs during training, such as to follow an MCMC chain when calibrating unknown parameters. This would allow a form of "local" refinement, since the Leja objective would cluster points in regions of maximum density.
Implement a Gaussian process or neural network Interpolator to demonstrate the extensibility of amisc. It would also be good to have methods like these that can utilize pre-existing training data (Lagrange interpolation requires starting from scratch every time to ensure optimal placement of training data in the tensor-product grid).
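A minimal sketch of a GP-backed version, assuming an Interpolator-like object with refine/predict methods (the actual abstract interface in amisc may differ):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

class GPInterpolator:
    """Hypothetical GP interpolator; unlike Lagrange, it can be refit on
    whatever training data already exists, structured grid or not."""

    def __init__(self):
        self.gp = GaussianProcessRegressor(kernel=RBF(), normalize_y=True)

    def refine(self, xtrain: np.ndarray, ytrain: np.ndarray) -> None:
        """Refit the GP on all accumulated (possibly pre-existing) data."""
        self.gp.fit(xtrain, ytrain)

    def predict(self, x: np.ndarray) -> np.ndarray:
        return self.gp.predict(x)
```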