Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable multiple precisions in CeedVector #948

Draft
wants to merge 12 commits into
base: main
Choose a base branch
from
Draft

Conversation

nbeams
Copy link
Contributor

@nbeams nbeams commented Apr 22, 2022

In this branch, I have attempted to start collecting the result of many discussions, particularly with @jeremylt and @jedbrown, about expanding CeedVectors to be able to contain multiple precisions. This PR is meant as a place for further discussion and refinement, whether or not we end up merging from it.

Summary of changes so far (as of opening of the draft PR, April 22):

  • Vector access functions (Get/Set/Take, as well as Sync) have changed to *Generic at the backend level, that is, the functions that get added to the CeedVector_private struct. They now take void pointers instead of CeedScalar, and an additional parameter for the data type. (These can hopefully be used within other backend functions in the future where we allow mixed precision.) This has been changed in ref, cuda-ref, and hip-ref.
  • At the user level, these functions remain the same, but they now simply call the Generic version with CEED_SCALAR_TYPE as the type parameter in their implementations in interface/ceed-vector.c
  • At the user level, additional versions for specific precisions have been added (named *FP32 and *FP64). These also just call the Generic version "under the hood"

in the hip-ref backend ONLY so far:

  • A first implementation of all the Generic functions, aimed at implementing the previously-discussed multiprecision vector access rules for validity
  • The new CeedScalarArray struct is used to store data

In fd75fdf I added a user-interface-level function for checking the current array status, inspired by the way MFEM returns memory status flags. I was using this in my own local tests. We may or may not want to keep something like this in the future.

Related issue: #778

Copy link
Member

@jedbrown jedbrown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I'm slow on this. It's looking really good. I made a couple comments that I hope will help keep CEED_SCALAR_TYPE out of library code. My intent is to use C11 _Generic to dispatch to the appropriate *FP32 or *FP64 when available, otherwise fall back to a default that the user can select by including ceed-f64.h or ceed-f32.h.

I think we should be consistent, and either call the functions *F32 or name the header ceed-fp32.h; they're currently *FP32 and ceed-f32.h.

Comment on lines 148 to 156
CEED_SCALAR_FP64,
/// Total number of allowed scalar precision types
CEED_NUM_PRECISIONS,
} CeedScalarType;
/// Struct for holding data in multiple precisions for mixed-precision-enabled
/// backends
typedef struct {
void *values[CEED_NUM_PRECISIONS]; // Size equals CEED_NUM_PRECISIONS
} CeedScalarArray;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This automatically gets the right sizes. Can we move CeedScalarArray to ceed-impl.h or backend.h?

/// backends
typedef struct {
void *values[CEED_NUM_PRECISIONS]; // Size equals CEED_NUM_PRECISIONS
} CeedScalarArray;
/// Base scalar type for the library to use: change which header is
/// included to change the precision.
#include "ceed-f64.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would allow a user to include the type-specific header before ceed.h, and thus be able to select the default at file granularity.

Suggested change
#include "ceed-f64.h"
#ifndef CEED_SCALAR_TYPE
# include "ceed-f64.h"
#endif

Do you think we can avoid referencing CEED_SCALAR_TYPE in library code? I think that's the eventual goal.

Comment on lines +299 to +304
int CeedVectorSetArray(CeedVector vec, CeedMemType mem_type,
CeedCopyMode copy_mode,
CeedScalar *array) {
return CeedVectorSetArrayGeneric(vec, mem_type, CEED_SCALAR_TYPE, copy_mode,
(void **) array);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can have ceed.h use a macro to make CeedVectorSetArray map to either CeedVectorSetArrayFP32 or CeedVectorSetArrayFP64 so that we don't need an actual symbol in the library for this, and thus can remove this use of CEED_SCALAR_TYPE.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so I've spent a bit of time looking into this, because I didn't realize removal of CEED_SCALAR_TYPE was a goal.

What would be the benefit of this? Just that the user wouldn't have to re-compile the library in order to reset the default precision?

One thing I am not sure about is how it would (or wouldn't) affect the bindings for other languages. I remember the CeedGetScalarType interface function was added specifically for the Julia bindings, for example. We also use CEED_SCALAR_TYPE in the Python bindings right now.

I went through and made a list of all the functions in headers that take CeedScalar or CeedScalar * in the prototype, which would need to have versions for every allowed precision if CeedScalar is not set at library compile time (as is the case if no CEED_SCALAR_TYPE/no re-compiling). They can be broken into several groups:

  1. Functions that we would likely already be modifying or adding multiple versions of for different precisions:
  • the Get/Set/Take/RestoreArray functions for Vectors, obviously, as well as SetValue, Norm, Scale, and AXPY (which we haven't yet decided what to do with, but are clear candidates for multiple precisions).
  • basis creation: CreateTensorH1, CreateH1, CreateHdiv : makes sense that if we allow different computation and/or input/output precisions for Basis objects (mixed-precision setting), we'd maybe want versions allowing the user to create the object with quadrature/interp/gradient data in the precision that will be used in computation. So that makes sense
  • other basis stuff: GetQRef, GetQWeights, GetInterp, GetInterp1D, GetGrad, GetGrad1D, GetDiv, GaussQuadrature, LobattoQuadrature, QRFactorization, SymmetricSchurDecomposition, SimultaneousDiagonalization -- most of these probably would also make sense to have versions for different precisions in a mixed-precision setting, but we are definitely starting to "explode" the number of added functions at this point (especially if we expand to more precisions in the future...)
  • the definition of QFunctionUser function pointer prototype -- we already need a solution for mixed-precision QFunctions anyway
  1. Miscellaneous other stuff:
  • Two preconditioning Operator functions: MultigridLevelCreateTensorH1 and MultigridLevelCreateH1 -- not sure about those
  • a few functions in ceed/backend.h: BasisGetCollocatedGrad, HouseholderApplyQ, TensorContractApply, MatrixMultiply. These are in backend.h instead of ceed.h like the rest -- this may affect how easy it is to switch versions without re-compiling?
    But at any rate, we'd need "generic" versions of most/all these functions, as well, for the library to use internally, to avoid re-compilation if we change CeedScalar, right? Though for many of the computational routines, we'd still need separate implementations for each precision, I think, unlike the vector access functions...
  • We would also need to change the definitions of some of the objects (private structs), like Basis and QFunctionContext, which include CeedScalar types.

use a macro to make CeedVectorSetArray map to either CeedVectorSetArrayFP32 or CeedVectorSetArrayFP64 so that we don't need an actual symbol in the library for this

Hmm, I guess I had thought we would define different versions of all the above functions from ceed.h in ceed-f32 and ceed-f64 with the corresponding types -- to have a version that would still work with C99, too. Unless we want to require C11? I thought not? I guess a macro would achieve the same thing.

I'm not sure how any of these options with affect the bindings for other languages, though, as mentioned previously.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought CEED_SCALAR_TYPE was a stop-gap until we could get rid of it with decent ergonomics. The foreign language bindings are fine (simpler even) because they'll just call to the correct type-specialized one.

I don't know that CreateH1 and other quadrature/basis stuff need specializations for f32 (they can easily be obtained by casting the f64 specification). Doing the setup in f64 and casting the result when it starts operating on problem-sized data is more accurate at equivalent cost. They would need specializations if we add __float128 support sometime. But not urgent in any case.

I think the internal stuff like tensor contraction needs specializations anyway to actually do the basis apply work in the selected precision.

IMO, we can't require C11 _Generic (not least because we need to support C++ callers). The ceed-f{32,64}.h headers would just define CeedScalar so people can be oblivious to precision. Foreign language callers wouldn't touch it. C11 _Generic or C++ templates would enable callers to do mixed precision stuff without explicitly acknowledging it when calling libceed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm still missing something. What's the point of getting rid of CEED_SCALAR_TYPE (which defines CeedScalar at compile time) unless we want to select the type of CeedScalar without re-compiling the library? In which case, we'd need multiple versions of and/or a generic version of any function with CeedScalar in its prototype whether or not we intend to expose it to users, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, if CeedScalar isn't known at compile time, we couldn't use it internally for computation anywhere in the library...?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I envision that everyone will be content using f64 to compute basis data, except a rare few who want to experiment with __float128. Is this something we could reasonably roll into this PR or should we shoot to merge this with libceed-build-time CeedScalar and make that next step in a future PR?

Copy link
Contributor Author

@nbeams nbeams May 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing to consider with the basis data, I guess, is that we may also want the storage to be CeedScalarArray type, so if, for example, the compute precision doesn't match CeedBasisScalar, we would have a convenient place to store the converted array and re-use every time the basis is applied.

Is this something we could reasonably roll into this PR or should we shoot to merge this with libceed-build-time CeedScalar and make that next step in a future PR?

Not sure. I feel like this will already be a big PR -- I haven't even added/finalized any tests for making sure the multiprecision vectors are working, and once hip-ref is more or less finalized, I will add the same changes for cuda-ref (and applicable ones for ref).

We might also want to add in this PR the extra versions/changes for the other vector functions: SetValue, Norm, Scale, and AXPY...

I should also highlight a few other discussion points in this draft that I haven't pointed out yet; I'll do that now.

There's a merge conflict with main now thanks to the SyncArray update PR, making rebasing less straightforward (when I tried it previously). If we're okay with squash-merging this eventually, it will be fine, but that may also be something to consider...unless I start a "clean" branch with the finalized Vector changes, delete this branch, and we continue the PR from there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another question about the possible CeedBasisScalar, this would be defined at libCEED compile time, right? And could not be changed without re-compiling? And only things that we don't switch to CeedBasisScalar would need to be available in multiple versions in order to select the correct one with ceed-f32.h?

(Also, I agree about the naming being consistent -- from your much earlier review comment. I like FP better, so I'd vote for ceed-fp32.h, but would it be too much of a hassle to change the header names at this point since we've already released versions with the current names?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly right in the first paragraph.

I think our norms and axpy are mostly just used for testing without PETSc or MFEM -- libCEED doesn't intend to be a complete solvers library so any real caller has a library for that stuff. We could just use one precision (CeedBasisScalar or maybe there's a better name for this concept that it's not problem-sized) or we could make it dispatch by having the internal computation always work in CeedBasisScalar, but promote f32 inputs to AXPY and demote outputs of Norm (rather than plumbing both variants into backends).

I don't especially care on the names, just consistency. There may not be many direct includes in the wild. f32 and f64 are the Rust type names, fwiw.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just don't want any unintuitive surprises for users...for example, in the norm implementation for CPU backends (done in the interface), we call GetArrayRead followed by using that data to compute the norm. But in a multiprecision case, what precision should be used to get the array? If the vector only has float data and CeedBasisScalar (or whatever we want to call it) is double, for example, calling GetArrayRead in CeedBasisScalar would do unnecessary conversion (and maybe allocation, too) in order to do the computation in CeedBasisScalar when the expected behavior would be to use float.

So, maybe it would be useful to make the backend versions of Norm, SetValue, Scale, and AXPY be Generic like the Get/Set/Take/Sync functions, and have different interface versions which call the generic function with the correct precision? (And also move the Norm implementation into the ref backend instead of interface-level.)

Then what if the interface level had, as an example for CeedVectorNorm:

  • CeedVectorNormFP32: calls the generic version with CEED_SCALAR_FP32
  • CeedVectorNormFP64: calls the generic version with CEED_SCALAR_FP64
  • CeedVectorNorm : result is always CeedBasisScalar (for the prototype, known at compile time), but computation happens in whatever precision the function deems "most likely to be intended" by the call (if only one precision is currently valid for the vector, use that precision; if two or more precisions are valid, pick one based on some heuristic: most precise? Least, for better performance?)

Of course, I guess that would ruin the plan to have *FP32/*FP64/etc. functions defined as their corresponding "regular" versions in the ceed-fp32 or ceed-fp64 headers as a way to set CeedScalar without recompiling the library, if CeedVectorNorm already had a distinct use beyond being a stand-in for one of the specific precision versions. So maybe we shouldn't do that. But I think the intended behavior of Norm based on whatever data is currently valid in the vector is unclear...

Then there is the issue with AXPY: in the case where the input vectors do not have the same valid precisions, what should we do? Should we specify what precision we want to use, or let the function pick based on some heuristic?
In the same vein, we would also need to decide how we want PointwiseMult to behave in the case of multiple precisions, even thought it doesn't take a scalar in its prototype, with the same questions as AXPY.

For Scale, I'd assume we'd want the behavior to be to scale all currently valid precisions in the vector, so that's easy. It could take the CeedBasisScalar type to avoid needing multiple versions.

(We really probably should have a different name for that type...ideas? CeedBaseScalar? CeedDataScalar?)

Comment on lines +20 to +29
static inline int CeedScalarTypeGetSize_Hip(Ceed ceed, CeedScalarType prec,
size_t *size) {
switch(prec) {
case CEED_SCALAR_FP32:
*size = sizeof(float);
break;
case CEED_SCALAR_FP64:
*size = sizeof(double);
break;
default:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put this in the backend for now while testing with hip-ref, but is this something that could be useful/necessary to have at an interface level for all backends, rather than each ref backend duplicating it?

Comment on lines +1073 to +1089
/**
@brief Check the current status of the CeedVector's data arrays.

This function sets unsigned int parameters indicating for which precisions
the data for mem_type is not NULL, for borrowed and owned arrays, and also
which precisions are currently valid.
The values can be checked bitwise, such that, e.g.,
(valid_status & (1 << CEED_SCALAR_FP32)) will be true if the FP32 array is
currently valid, and false otherwise.

@param vec CeedVector for which to check status
@param mem_type Mem type for which to check status
@param valid_status Status indicator for valid data
@param borrowed_status Status indicator for borrowed arrays
@param owned_status Status indicator for owned arrays
**/
int CeedVectorCheckArrayStatus(CeedVector vec, CeedMemType mem_type,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this function to the interface level, thinking it would be used by tests for multiprecision vector functionality (and potentially, users may want this feature).
It also has a backend implementation, so it's added to the CeedVector_private struct. Since this affects the user interface, I wanted to make sure and point it out to see what people thought.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we keep this, we may also want an interface function for convenient "pretty" printing of status for all vector arrays, or we could implement such a function in the tests directly.

@nbeams nbeams force-pushed the natalie/multiprec-vec branch from 703f404 to 112a04a Compare August 26, 2022 19:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants