Complete rewrite of the legacy Fortran solver #38
I didn't take the time before submitting this PR to go extensively through the discussions in the issue tracker. I just found out about #10 and the fascinating discussion therein. I don't yet have enough perspective to fully grasp all the points that have been raised (in particular those by @rouson), although I agree that both legacy and "modern" versions of the code should be provided. By legacy, I mean pretty much what I wrote in this PR, i.e. a piece of code that could be written by someone (e.g. an undergrad or grad student) with minimal programming experience who has just started to learn Fortran, while "modern" would be the same algorithm leveraging more recent constructs (e.g. array syntax, etc.).

For this particular problem however, I'm not sure how to adapt the on-the-fly computation of the residual (which really is one of the bottlenecks) to the array-syntax version. In such an implementation, the nested `i, j` loops would be replaced by a whole-array update, but evaluating the residual still requires `maxval(abs(phi - phi_prime))`, which (if my understanding is correct) would allocate a temporary array and traverse it once more. I'm not sure how to make this more efficient (in terms of spatial and temporal data locality) other than evaluating the residual only every so many iterations.
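To make it concrete, here is roughly the array-syntax variant I have in mind. This is only a sketch with made-up details (the right-hand side `rho`, the tolerance, the grid size and the boundary conditions are placeholders), not code from this repo:

```fortran
! Sketch only: array-syntax Jacobi iteration for -laplacian(phi) = rho on the
! unit square with homogeneous Dirichlet boundary conditions (assumed setup).
program jacobi_array_syntax
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   integer, parameter :: m = 300
   real(dp), parameter :: tolerance = 1.0e-6_dp
   real(dp) :: phi(m, m), phi_prime(m, m), rho(m, m)
   real(dp) :: dx, residual
   integer  :: iteration

   dx = 1.0_dp / real(m - 1, dp)   ! unit square, so dx follows from m
   phi = 0.0_dp
   phi_prime = 0.0_dp
   rho = 1.0_dp
   residual = 1.0_dp
   iteration = 0

   do while (residual > tolerance)
      ! Jacobi sweep written with array syntax instead of the nested i, j loops.
      phi_prime(2:m-1, 2:m-1) = 0.25_dp * (phi(3:m, 2:m-1) + phi(1:m-2, 2:m-1) &
                              + phi(2:m-1, 3:m) + phi(2:m-1, 1:m-2)            &
                              + dx*dx * rho(2:m-1, 2:m-1))
      ! The line I am unsure about: abs(phi - phi_prime) presumably builds a
      ! temporary array and traverses all the entries a second time.
      residual = maxval(abs(phi - phi_prime))
      phi = phi_prime
      iteration = iteration + 1
   end do

   print *, "Converged in", iteration, "iterations."
end program jacobi_array_syntax
```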
In short, my understanding is that the on-the-fly evaluation of the residual is hard to express with array syntax.

PS: The only option I can think of for ...

PPS: If I use ...
Hej,

I've stumbled on this repo while looking for some benchmarks and I really like the idea. Given that `stdlib` is reaching some level of maturity, I believe it is a good time to resurrect the benchmarks. Let me contribute my two cents for the Poisson benchmark.

Updates:
- `legacy.f90`: Rewrote the reference Jacobi solver using only legacy Fortran constructs. Note that the evaluation of the residual is now done on the fly inside the `i, j` loops (see the sketch after this list) instead of via `maxval(abs(phi - phi_prime))` after the loops, which actually performs an allocation and loops once more over all the entries of `phi - phi_prime`. I've also fixed the definition of `dx` so that, as `m` changes, we actually solve the discretized form of the same continuous problem (i.e. the Poisson equation on a unit square).
- `naive.f90` / `optimized.f90`: I took the liberty of removing both of these implementations. Thanks to the on-the-fly computation of the residual, the solver in `legacy.f90` is drastically faster (even when I kept the original definition of `dx` to have a fair comparison). To give you an idea, if I keep `dx = 0.01_dp` and `m = 300`, here are the timings on my laptop:
  - `legacy.f90`: converged in 426 580 iterations, wall-clock time is 35 seconds.
  - `naive.f90`: converged in 426 326 iterations, wall-clock time is 85 seconds.
  - `optimized.f90`: converged in 426 580 iterations, wall-clock time is 44 seconds.
- `jacobi/`: I moved all the solvers into the `jacobi` folder so that, in the not-so-distant future, we can add other solvers for this benchmark (e.g. Gauss-Seidel, conjugate gradient, etc.).
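For reference, the on-the-fly evaluation of the residual amounts to something along the lines of the sketch below. This is a simplified stand-in, not the exact contents of `legacy.f90` (the right-hand side `rho`, the tolerance and the boundary conditions are placeholders):

```fortran
! Sketch only: loop-based Jacobi iteration where the residual is accumulated
! inside the same i, j loops as the update, so phi - phi_prime is never
! formed as a separate array and no extra pass over the grid is needed.
program jacobi_on_the_fly
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   integer, parameter :: m = 300
   real(dp), parameter :: tolerance = 1.0e-6_dp
   real(dp) :: phi(m, m), phi_prime(m, m), rho(m, m)
   real(dp) :: dx, residual
   integer  :: i, j, iteration

   dx = 1.0_dp / real(m - 1, dp)   ! unit square: dx follows from m
   phi = 0.0_dp
   phi_prime = 0.0_dp
   rho = 1.0_dp
   residual = 1.0_dp
   iteration = 0

   do while (residual > tolerance)
      residual = 0.0_dp
      do j = 2, m - 1
         do i = 2, m - 1
            phi_prime(i, j) = 0.25_dp * (phi(i+1, j) + phi(i-1, j)   &
                            + phi(i, j+1) + phi(i, j-1)              &
                            + dx*dx * rho(i, j))
            ! Residual updated while phi(i, j) and phi_prime(i, j) are still
            ! in cache: no temporary array, no second traversal of the grid.
            residual = max(residual, abs(phi_prime(i, j) - phi(i, j)))
         end do
      end do
      phi = phi_prime
      iteration = iteration + 1
   end do

   print *, "Converged in", iteration, "iterations."
end program jacobi_on_the_fly
```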
If you like what you see, I can take some time over the next few days to port the same changes to the `c` and `python` implementations. I do have a few questions though:

- For `python`, this could include for instance a pure `python` implementation with loops (which should really be avoided), a vectorized implementation using `numpy`, and possibly one implementation using `numba` and/or `cython`?
- What about a `Julia` version of the solver (both the relatively naïve one and another version using one of the Julia packages providing a Jacobi solver)?