10-60X slower compared to java regexes in some cases #12
Comments
I can replicate the problem, and spent the best part of a day trying to fix it, but struggled to make sense of the output of a number of low-level Java profiling tools. I'll dump my state here in case others want to pick it up. Various attempts at profiling (SIGQUIT, hprof, YourKit, /profilez) show that Machine.step and Machine.add are the bottleneck, as is to be expected.
No particular statement seems to be to blame, but the profilers all show the effects of skew caused by unwinding the stack only at GC safepoints, which typically means loop back-edges. I was unable to build the "honest-profiler" for OpenJDK (https://github.com/RichardWarburton/honest-profiler). The program allocates very little memory (80 MB in 10 seconds), so I don't believe GC is a factor. It's almost pure computation on an already-allocated data structure.
and the Go code was:
I'm wondering the same thing and was surprised: the Java regex is significantly faster, about 5x. I have 3 patterns that are compiled and then cached, and I then search for a match on all of them: --re2j before caching: --java before caching: --re2j with cache:
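For readers who want to reproduce the "compile once, cache, then search all patterns" setup described above, here is a minimal sketch. It is not the commenter's code: the class name `PatternCache` and the methods `get` and `countMatching` are hypothetical, and it uses `java.util.regex` so that it runs without extra dependencies. As far as I know, re2j's `com.google.re2j.Pattern` exposes the same `compile`/`matcher`/`find` calls, so swapping the `Pattern` import should be enough to try the same thing with re2j.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

// Hypothetical helper: caches compiled patterns keyed by their source string.
final class PatternCache {
    private static final Map<String, Pattern> CACHE = new ConcurrentHashMap<>();

    // Compiles the regex on first use and returns the cached instance afterwards.
    static Pattern get(String regex) {
        return CACHE.computeIfAbsent(regex, Pattern::compile);
    }

    // Returns how many of the given regexes find a match somewhere in the input.
    static int countMatching(List<String> regexes, String input) {
        int hits = 0;
        for (String regex : regexes) {
            if (get(regex).matcher(input).find()) {
                hits++;
            }
        }
        return hits;
    }
}
```

With a cache like this, the compile cost is paid once per distinct pattern, so any remaining gap between the two libraries comes from matching rather than compilation.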
Use this: http://www.brendangregg.com/perf.html#FlameGraphs
Then you will have a safepoint-free performance profile. You will probably also want to set
Using AsyncGetCallTrace (Honest Profiler, etc.) is a nice idea, but it's still more intrusive than perf.
I was looking for a faster regex package, but I can confirm this one is much slower than plain
Also, it is not fully compatible with Java's character classes like
@dagnelies I made a few modifications that brought quite a big performance boost here
Original report: https://groups.google.com/forum/#!topic/re2j-discuss/8c3L06m6wbY
Pattern is: ".d|e|cart|jinjian|kk."
String is: "aajinjianaksdjflaajinjianaksdjflaajinjianaksdjfl"
Java takes: 330ms
re2j takes: 4257ms
Thanks for your time to take a look at it.
This came up also on prometheus/jmx_exporter#23 @bbaja42
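The report gives the pattern, the input string, and the two timings, but not the harness that produced them. The following is a rough sketch of how such a measurement could be set up, not the reporter's actual code. The class name `RegexTiming`, the iteration count, and the choice of `find()` in a loop are all assumptions. It uses `java.util.regex` so it runs as-is; to time re2j, replace both imports with `com.google.re2j.Matcher` and `com.google.re2j.Pattern`, which I believe offer the same `compile`/`matcher`/`find` calls (re2j must be on the classpath).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexTiming {
    // Pattern and input copied verbatim from the report.
    static final String PATTERN = ".d|e|cart|jinjian|kk.";
    static final String INPUT = "aajinjianaksdjflaajinjianaksdjflaajinjianaksdjfl";

    // Counts the non-overlapping matches found by repeated find() calls.
    static int countMatches(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Runs countMatches `iterations` times and returns the elapsed wall time in ms.
    static long timeMillis(Pattern p, String input, int iterations) {
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += countMatches(p, input);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // Use the accumulated count so the JIT cannot discard the loop body.
        if (sink < 0) {
            throw new IllegalStateException("unreachable");
        }
        return elapsedMs;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile(PATTERN);
        int iterations = 200_000; // assumed; the report does not state a count
        timeMillis(p, INPUT, iterations); // warm-up pass so the JIT has compiled the hot path
        System.out.println("elapsed ms: " + timeMillis(p, INPUT, iterations));
    }
}
```

A hand-rolled loop like this is only good for spotting order-of-magnitude gaps such as the one reported; a harness such as JMH would give more trustworthy numbers.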