Increase the performance of parsing and evaluation speed #200
Using branch
Here is the flamegraph of the invocation: https://static.domenkozar.com/hnix.svg To reproduce:
Then invoke:
maybe @mrkkrp could chip in here :)
@domenkozar Can you give me a TL;DR? So it got slower? If so, how much slower? There is only one thing that can make it slower, and that's
@mrkkrp Megaparsec improved performance by about 10% in this particular case. We are wondering how to bridge the gap between the original Nix implementation (0.16s) and HNix (1.26s); the majority of the time is spent in Megaparsec, as observed in the flamegraph linked above. If you have any tips on what the underlying causes could be, we'd be very grateful :)
Ah, good! I was afraid that I had screwed up with version 7 :-P
I think my suggestions are already applied in the code... It would be good to make it as fast as C, but I don't think I know how to do that.
I did some profiling and it seems that the majority of allocated memory is never used. Digging deeper, it seemed that most of the wasted space comes from things to do with parse errors, where
It seems the slowdown is currently about 10x even for bigger files (running on 2e4e5e0 with
@mpickering Can you tell me how to reproduce whatever you did to come to this conclusion?
I ran three profiles.
I then looked at the code and found this closure by grepping the dumped STG. The closure had to do with error messages, which led me to my hypothesis.
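A possible way to check this kind of allocation hypothesis without a full heap profile is to measure allocations directly with the weigh package (which megaparsec's own benchmark suite depends on). This is only a hedged sketch, not the profile run from this thread; the grammar and input below are made up for illustration:

```haskell
-- Hedged sketch: count allocations of a trivial megaparsec parse with weigh.
-- The parser and input are illustrative, not taken from hnix.
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char
import Weigh

type Parser = Parsec Void String

-- A deliberately trivial grammar; the point is only the allocation count.
manyAs :: Parser Int
manyAs = length <$> many (char 'a') <* eof

main :: IO ()
main = mainWith $
  func "parse 10k 'a's" (parse manyAs "input") (replicate 10000 'a')
```

Comparing the reported allocation against the size of the result gives a rough idea of how much work is spent on bookkeeping (such as error machinery) rather than the parse itself.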
I’ll get to playing with this, but not in December. I think I have an idea of how to improve things a bit.
I tried to play with this but I think it doesn't really use my local version of megaparsec. I changed:

```nix
megaparsec = import "/home/mark/projects/programs/haskell/megaparsec/default.nix"
  { mkDerivation = self.mkDerivation;
    inherit (super) base bytestring case-insensitive containers criterion deepseq
      hspec hspec-expectations mtl parser-combinators QuickCheck scientific stdenv
      text transformers weigh;
  };
# super.megaparsec_7_0_4;
```

And generated this in my

```nix
{ mkDerivation, base, bytestring, case-insensitive, containers
, criterion, deepseq, hspec, hspec-expectations, mtl
, parser-combinators, QuickCheck, scientific, stdenv, text
, transformers, weigh
}:
mkDerivation {
  pname = "megaparsec";
  version = "7.0.4";
  src = ./.;
  libraryHaskellDepends = [
    base bytestring case-insensitive containers deepseq mtl
    parser-combinators scientific text transformers
  ];
  testHaskellDepends = [
    base bytestring case-insensitive containers hspec
    hspec-expectations mtl parser-combinators QuickCheck scientific
    text transformers
  ];
  benchmarkHaskellDepends = [
    base containers criterion deepseq text weigh
  ];
  enableLibraryProfiling = true;
  enableExecutableProfiling = true;
  doCheck = false;
  homepage = "https://github.com/mrkkrp/megaparsec";
  description = "Monadic parser combinators";
  license = stdenv.lib.licenses.bsd2;
}
```

This is how I tested it:
But no changes in my local version of megaparsec seem to affect how
I'd suggest skipping Nix, as you'll always have to recompile the whole of megaparsec, which will make your feedback loop really slow. I've upstreamed most of hnix's dependencies to Stackage LTS 13, except
OK, I guess I’ll return to this later.
Still, it is not clear to me why my local version of Megaparsec was not used. I tried with both the new cabal commands and the old ones.
I'd use something like
where I bumped the megaparsec version just to be sure I see 7.0.5 in:
Ah, OK, it looks like the optimization I had in mind is just not an optimization :-D
A short update on this one, now that we can actually evaluate derivations. Hnix is about 100x slower than nix at evaluating nixpkgs.hello. Here is a flamechart: https://www.speedscope.app/#profileURL=https://gist.githubusercontent.com/layus/2dfb3f18b12c1daec65a69d34ee29639/raw/10d33407cbd78a4a2ea3ffe89ac5db3b91c31de7/hnix.eventlog.json Parsing does not seem to be our bottleneck (3s), but it will need to be reduced if we want to be competitive with nix on performance.
Renamed the topic to be calmer and more discoverable ⬆️, because people keep showing up for it, and I keep pointing them to this discussion rather than reopening the conversation anew. We probably need to make flame-graph generation a recurring procedure. Current processes:
Other news:
No matter how I try to replicate the case of the #200 (comment) flamegraph, my report is >20x faster at evaluating the mentioned derivation: it takes under 1s, while the one you posted shows 30s. And my timeline shows the project essentially does a one-pass eval. What am I doing wrong 🤣.
Possibly I have a slow machine :-). Or I missed something else, who knows? Thanks for trying. Did you manage to produce your own speedscope? https://mpickering.github.io/posts/2019-11-07-hs-speedscope.html
I am not too confident about what I am doing, but it seems that I recompiled some dependencies with profiling enabled. Does that make any sense?
Try
Yep. You are probably hinting that I need to recompile the deps. I know that. With Cabal I already recompiled most of that stack for profiling yesterday. But Cabal still keeps threatening that it would rebuild a list of additional packages for profiling, yet it does not rebuild them however I try to force it. Since Cabal is a reproducible sort of tool and had already built the stack, and since I could not force a rebuild, I decided that message is some left-over detection bug. So I also used a perf/DWARF debugger to find out more about the state. And
So that is why I tried
Well, the Cabal I used was also
Well, the main difference between your flamegraph and mine seems to be that in mine the GC, profiling overhead, and system interrupts are grouped; that is probably why the graphs look different on the time grid.
Nice! I am no longer the only one to have produced such a graph ;-). Are you working on something like https://arewefastyet.com? :-D
Well, a lot of things could explain the difference in the graphs: an hs-speedscope update, GHC debugging updates, or even a slight difference in the profiling options. PS: Thanks @sorki for spotting the missing
No, I just tried to debug things. I also found a setup-specific bug and situation in the GHC RTS options, but refactoring the RTS options in GHC is a bit too big a yak shave. The debug-symbols report was probably because, by default, Cabal includes only exported symbols in the module.
I also wanted to generate the reversed flamegraph, to point out the functions that are the most time-consuming, but found a better way: Speedscope has a "Sandwich" mode that allows ordering by "Self", which shows the total time spent in each function's own implementation.
So the resulting summary command is:
The
The list of the most time-consuming functions by themselves:
But this list is also not needed; people can do better than I did here.
Overall I am interested in

```haskell
instance Monad m => MonadValue (Judgment s) (InferT s m) where
  defer  = id
  demand = flip ($)
  inform j f = f (pure j)

instance ( MonadAtomicRef m
         , MonadCatch m
         , Typeable m
         , MonadReader (Context m (StdValue m)) m
         , MonadThunkId m
         )
  => MonadValue (StdValue m) m where
  defer = fmap Pure . thunk
  demand (Pure v) f = force v (flip demand f)
  demand (Free v) f = f (Free v)
  inform (Pure t) f = Pure <$> further t f
  inform (Free v) f = Free <$> bindNValue' id (flip inform f) v

instance (MonadThunkId m, MonadAtomicRef m, MonadCatch m)
  => MonadValue (Symbolic m) m where
  defer = fmap ST . thunk
  demand (ST v) f = force v (flip demand f)
  demand (SV v) f = f (SV v)
```

And the use of it in the source code is mostly superfluous. It looks just like an argument-flipping function, and a lot of its use of the
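To make the "argument flipping" observation concrete, here is a reconstructed sketch of the class shape these instances appear to inhabit. The method types are inferred from the instances above rather than taken from hnix's actual source, so treat them as an assumption:

```haskell
{-# LANGUAGE MultiParamTypeClasses #-}

-- Assumed class shape, inferred from the instances above; the authoritative
-- definition lives in hnix and may differ.
class MonadValue v m where
  defer  :: m v -> m v               -- wrap a deferred computation as a value
  demand :: v -> (v -> m r) -> m r   -- force a value, then run a continuation
  inform :: v -> (m v -> m v) -> m v

-- In the Judgment instance, demand = flip ($) really is plain flipped
-- application: demand v f = f v. No forcing happens there at all.
```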
⬆️ Look for
The deal is roughly:
tl;dr: the thunk forcing strictifies the
The longer story:
The proper errors get thrown in the project with:

```haskell
throwError
  :: forall s e m a . (Framed e m, Exception s, MonadThrow m) => s -> m a
throwError err = do
  -- every throw captures the current frame context from the environment
  context <- asks (view hasLens)
  traceM "Throwing error..."
  throwM $ NixException (NixFrame Error (toException err) : context)
```

There is:

```haskell
forceThunk
  :: forall m v a
   . (MonadVar m, MonadThrow m, MonadCatch m, Show (ThunkId m))
  => NThunkF m v
  -> (v -> m a)
  -> m a
forceThunk (Thunk n active ref) k = do
  eres <- readVar ref
  case eres of
    -- already evaluated: just continue with the cached value
    Computed v -> k v
    Deferred action -> do
      nowActive <- atomicModifyVar active (True, )
      if nowActive
        -- the thunk is already being forced: we hit a cycle, and the
        -- error renders the ThunkId to a String via show
        then throwM $ ThunkLoop $ show n
        else do
          v <- catch action $ \(e :: SomeException) -> do
            -- clear the active flag before rethrowing
            _ <- atomicModifyVar active (False, )
            throwM e
          _ <- atomicModifyVar active (False, )
          -- cache the result so later forces take the Computed path
          writeVar ref (Computed v)
          k v
```

```haskell
-- tracing is compiled out: the message argument is simply discarded
traceM :: Monad m => String -> m ()
traceM = const (pure ())
```

So all the exception info gets stored as a String. The flow from top to bottom seems to be:
So, I looked into the
In the end, I talked about properly packing the contexts. If I understand correctly what a context is (and I do not, because there are several different contexts in the project), the evaluation contexts form one single tree, where the top node is
Maybe the context tree expands from the AST, so a position in the context tree gives the AST position internal to it; the abstract syntax tree expands from the source code, so a position in the AST gives the source position internal to it. (And the HNix source code is an automaton not over the source code but over the context, which allows attaching abstract-syntax objects to it; on those, the program's computations grow new context, and so on. The context-abstract-source tree only grows through grafting, as a Free monad, which HNix already uses to express the Nix runtime. Computations in the runtime are what grow on the contexts and grow the context, and because computations are very volatile things, they are fully strictified so that they can be properly garbage-collected on their end. Together they form a 4-layered "sphere" (a multidimensional tree). But because computations cannot be lazy (too much overhead) and need to be strict, people maybe tend to mix everything up and strictify not only the 4th layer but everything, so there is no longer a single source-abstract-context tree to rely on; yet chunks of it still get passed around and copied over in functions, because they are still needed.) But as I said, this is just thinking, and I mix together at least two or three types of contexts. Doing basic polymorphization in the code goes well, so at least it would become easier to switch from Strings to Text where Strings are used.
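The "grows through grafting, as a Free monad" point can be made concrete with a toy sketch. Everything below is hypothetical illustration (the names and tree shape are made up, not hnix's actual types): in Free, monadic bind substitutes a fresh subtree for every leaf, which is exactly a graft.

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- Toy sketch, hypothetical names: "growth through grafting" via Free.
-- hnix's real runtime types are far richer than this.
import Control.Monad.Free (Free (..))

-- One layer of a context tree: a labelled node with children.
data NodeF a = NodeF String [a]
  deriving Functor

type CtxTree = Free NodeF

-- Grafting: (>>=) substitutes a new subtree for every leaf (Pure).
graft :: CtxTree a -> (a -> CtxTree b) -> CtxTree b
graft = (>>=)

-- Example: a tree with two open leaves, each grafted with a new node.
example :: CtxTree ()
example =
  Free (NodeF "root" [Pure "lhs", Pure "rhs"])
    `graft` \leaf -> Free (NodeF leaf [Pure ()])
```

The point of the sketch is that grafting never mutates or copies the existing tree; it only extends it at the leaves, which is the property the comment above argues gets lost once everything is strictified.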
Here's the bar: