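While the implementation of, e.g., ...ByteArray.evalPrimOp is rather direct and elegant at the moment, it yields rather slow code. That is because GHC doesn't do the "obvious" trie-based match optimisation on string literals, which means we end up with a linear chain of comparisons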
if op == "newByteArray#" then ...
else if op == "newPinnedByteArray#" then ...
...
in Core.
I took a profile (on nofib's bernoulli, if that matters) and ...ByteArray.evalPrimOp takes about 10% of time and allocation. Here is an excerpt of the profile
I think a bit of focus on optimising evalPrimOp may well speed up the interpreter by 50%. One way to do so would perhaps be to use a HashMap or Trie to do the lookup.
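A minimal sketch of the map-based lookup suggested above, not the interpreter's actual code. The types `Name`, `Atom` and `PrimOpImpl`, the placeholder primop bodies, and the simplified `evalPrimOp` signature are all assumptions made for illustration. `Data.Map.Strict` from `containers` stands in for the `HashMap` here because it ships with GHC; a `HashMap` from `unordered-containers` would be used the same way.

```haskell
module Main where

import qualified Data.Map.Strict as Map

-- Hypothetical stand-ins for the interpreter's real types.
type Name = String

newtype Atom = IntAtom Int
  deriving (Show, Eq)

type PrimOpImpl = [Atom] -> Either String [Atom]

-- One table built once; each lookup costs O(log n) string comparisons
-- instead of walking a linear chain of (==) tests.
primOpTable :: Map.Map Name PrimOpImpl
primOpTable =
  Map.fromList
    [ ("newByteArray#", newByteArray)
    , ("newPinnedByteArray#", newPinnedByteArray)
    ]
  where
    -- Placeholder bodies: they only echo the requested size.
    newByteArray [IntAtom n] = Right [IntAtom n]
    newByteArray _ = Left "newByteArray#: bad arguments"
    newPinnedByteArray [IntAtom n] = Right [IntAtom n]
    newPinnedByteArray _ = Left "newPinnedByteArray#: bad arguments"

-- Simplified dispatcher: look the name up, then run the implementation.
evalPrimOp :: Name -> [Atom] -> Either String [Atom]
evalPrimOp op args =
  case Map.lookup op primOpTable of
    Just impl -> impl args
    Nothing -> Left ("unsupported primop: " ++ op)

main :: IO ()
main = do
  print (evalPrimOp "newPinnedByteArray#" [IntAtom 16])
  print (evalPrimOp "noSuchOp#" [])
```

The table is a top-level constant, so it is built once and shared across calls; the per-call cost is the lookup alone.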
Why optimise anyway? Because at the moment a single run of NoFib's bernoulli benchmark takes about half an hour when the compiled program takes just 0.1s. That's quite a deal breaker for an exhaustive benchmark run of all 11* benchmarks.
And it also resulted in a great speedup: my reduced benchmark case now takes 35s instead of 55s. Great work!
Another, perhaps superior, suggestion might be to intern all primop names and use an IntMap/Array, but the profile suggests that *.evalPrimOp is no longer a bottleneck, so why bother?
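For reference, the interning idea could look roughly like the sketch below. Everything in it is hypothetical: `PrimOpId`, `intern`, `evalPrimOpById`, the placeholder primop bodies, and the use of plain `Int` arguments are illustration only. The assumption is that names are resolved to integer ids once (e.g. when the program is loaded), so the hot path only does an `IntMap` lookup.

```haskell
module Main where

import qualified Data.IntMap.Strict as IntMap
import qualified Data.Map.Strict as Map

type Name = String

-- Hypothetical interned identifier for a primop.
newtype PrimOpId = PrimOpId Int
  deriving (Show, Eq)

type PrimOpImpl = [Int] -> Either String [Int]

-- Names and implementations listed in the same order, so the list
-- position doubles as the interned id.
primOps :: [(Name, PrimOpImpl)]
primOps =
  [ ("newByteArray#", allocStub)
  , ("newPinnedByteArray#", allocStub)
  ]
  where
    -- Placeholder body: it only echoes the requested size.
    allocStub [n] = Right [n]
    allocStub _ = Left "bad arguments"

-- Done once per name, outside the evaluation loop.
internTable :: Map.Map Name PrimOpId
internTable = Map.fromList (zip (map fst primOps) (map PrimOpId [0 ..]))

intern :: Name -> Maybe PrimOpId
intern name = Map.lookup name internTable

-- Hot path: dispatch on the integer id only, no string comparison.
dispatchTable :: IntMap.IntMap PrimOpImpl
dispatchTable = IntMap.fromList (zip [0 ..] (map snd primOps))

evalPrimOpById :: PrimOpId -> [Int] -> Either String [Int]
evalPrimOpById (PrimOpId i) args =
  case IntMap.lookup i dispatchTable of
    Just impl -> impl args
    Nothing -> Left ("unknown primop id: " ++ show i)

main :: IO ()
main =
  case intern "newPinnedByteArray#" of
    Just opId -> print (evalPrimOpById opId [16])
    Nothing -> putStrLn "not a known primop"
```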