Interpolation routines (e.g. `hpface`, `dhpface-`, etc.) constitute the bulk of the computation in `update_gdof` and `update_Ddof`, which can be expensive (although they are less of a bottleneck with the OMP version). These routines could be accelerated significantly (>10x, see below) with relatively straightforward optimizations. This isn't high priority for me right now, but I'm opening this issue to outline potential improvements in case anyone wants to implement them before I get to it. (@ac1512, you might be interested in similar optimizations for your anisotropic refinement work since, IIRC, the projections are rather expensive in that case.)
Here are a few ways in which they could be optimized:
- The loops over the assembled projection matrix are not ordered optimally (the indices need to be swapped).
- Test functions are currently pulled back in the innermost loop, so every test function is pulled back again for each trial function. Because the pulled-back test and trial functions coincide, they could instead be computed once per quadrature point and stored; eliminating the redundant pullbacks alone should give a significant speedup. The best way to do this would be to apply the pullback to all shape functions at once (inside the quadrature point loop) using optimized BLAS3 routines (DGEMM).
- Integration could likely be accelerated by forming a matrix whose rows correspond to pulled-back shape functions and whose columns correspond to quadrature points (so each column holds the values of all pulled-back shape functions at one quadrature point), and then simply multiplying this matrix by its transpose. Calling optimized BLAS3 routines for the assembly instead of explicitly forming the product with hand-written loops (as we are doing now) will likely be much faster.
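To make the last two points concrete, here is a minimal sketch of the restructuring. It is not the actual `hpface` code: the function names, data layout, and random test data are all made up for illustration, and plain Python loops stand in for what would be BLAS calls in the Fortran source (DGEMM for the per-point pullback, and DGEMM or DSYRK for the final product). It also assumes positive quadrature weights so they can be split as a square root across both factors; otherwise one would scale only one copy of the matrix and use a general product.

```python
import math
import random

def matvec(m, v):
    """Apply a small dense map (the 'pullback' at one quadrature point) to a vector."""
    return [sum(m[r][c] * v[c] for c in range(len(v))) for r in range(len(m))]

def assemble_naive(ref, maps, weights):
    """Current pattern: both functions are pulled back inside the innermost loop."""
    n, nq = len(ref), len(weights)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):              # trial function
        for j in range(n):          # test function
            for q in range(nq):     # quadrature point
                u = matvec(maps[q], ref[i][q])
                v = matvec(maps[q], ref[j][q])  # recomputed for every i
                a[i][j] += weights[q] * sum(x * y for x, y in zip(u, v))
    return a

def assemble_gram(ref, maps, weights):
    """Proposed pattern: pull back each function once, then form A = B * B^T."""
    n, nq = len(ref), len(weights)
    b = [[] for _ in range(n)]      # row i: all weighted pulled-back values of function i
    for q in range(nq):
        s = math.sqrt(weights[q])   # assumes weights[q] > 0
        for i in range(n):          # in Fortran: one DGEMM over all shape functions
            b[i].extend(s * x for x in matvec(maps[q], ref[i][q]))
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):              # in Fortran: a single DSYRK/DGEMM call
        for j in range(i, n):
            a[i][j] = a[j][i] = sum(x * y for x, y in zip(b[i], b[j]))
    return a

# Hypothetical data: 4 shape functions, 5 quadrature points, 3 components each.
random.seed(0)
n_dof, n_quad, dim = 4, 5, 3
ref = [[[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_quad)]
       for _ in range(n_dof)]
maps = [[[random.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
        for _ in range(n_quad)]
weights = [random.uniform(0.1, 1.0) for _ in range(n_quad)]

a_naive = assemble_naive(ref, maps, weights)
a_gram = assemble_gram(ref, maps, weights)
max_diff = max(abs(a_naive[i][j] - a_gram[i][j])
               for i in range(n_dof) for j in range(n_dof))
print("max difference between the two assemblies:", max_diff)
```

With n shape functions, the naive version applies the pullback on the order of n² times per quadrature point, while the restructured version applies it n times; the remaining work is a single dense matrix product, which is exactly the shape of problem BLAS3 is tuned for.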
For a p=3 hexahedral mesh, the assembly (integration) in `hpface` takes ~10-40x longer than the matrix inversion, indicating that a >10x improvement can likely be achieved.