MCT 2.10
Same as the version used in E3SM1 and CESM2.
Performance improvements at scale (from Pat Worley):
Reduce complexity of GSMap::active_pes, used in cpl7 init.
Use the swapm variant of MPI_Alltoallv in rearrange_ calls in sMatAvMult_SMPlus_. Fix the swapm parameters to give reasonable performance.
Add 2 more-scalable versions of the peLocs_ algorithm. Use the one that potentially needs more memory.
Speed up the Router::initp_ algorithm.
Other: set arrays in new Avs to zero; don't use MPI_RSEND, to get around bad MPI implementations.
Update threading and vectorization of MatVecMul. Make sure bufs are allocated in rearrange_.
Add more functions to mpi-serial: MPI_Errhandler_set, MPI_Intercomm_merge, MPI_Type_create_hvector, and the MPI_Get_version subroutine.
Move mpi-serial to a git subtree.