Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[meta] kv and grok filters timeout handling performance regression #11302

Open
colinsurprenant opened this issue Nov 12, 2019 · 2 comments
Open

Comments

@colinsurprenant
Copy link
Contributor

We have recently discovered a substantial performance regression introduced by #10976 related to the use of the JRuby Timeout.timeout which surfaces in both the kv and grok filters when enabling timeouts (which are enabled by default). Some information on this regression is available in logstash-plugins/logstash-filter-grok#152.

The performance regression affects the kv filter starting at v4.4.0 and the grok filter starting at v4.1.0. The performance regression is compounded when using multiple fallback patterns in grok in particular.

The best temporary solution is to disable the timeout handling by setting timeout_millis => 0 in both kv and grok. Downgrading to a previous version is not recommended because these were producing random crashes in logstash. We understand that this workaround is not ideal as it can also result in pathological patterns to execute for an unbounded period. Please read our guide about grok patterns https://www.elastic.co/blog/do-you-grok-grok which provides tips on how to avoid such slow or pathological patterns.

We are actively into solving this issue, we will report back with progress.

So far we believe the problem is with the JRuby Timeout.timeout implementation. @jsvd found the following:

I believe I found a way to replicate the slowdown. This is due to jruby using an [executor sized by number of processors]?(https://github.com/jruby/jruby/blob/master/core/src/main/java/org/jruby/ext/timeout/Timeout.java#L64) to kill threads (or just yield) when the thread finished before the timeout.

require 'timeout'

NUM_THREADS = java.lang.Runtime.getRuntime.availableProcessors() * 5

threads = NUM_THREADS.times.map do |tnum|
  Thread.new do
    loop do
      Timeout.timeout(0.1) { 1 } rescue Timeout::Error
    end
  end
end

threads.each(&:join)
jstack 58640 | grep Timeout.yieldWithTimeout | wc -l
40

Example thread stacktrace:

"Ruby-0-Thread-34: script.rb:1" #56 daemon prio=5 os_prio=31 cpu=1262.47ms elapsed=27.15s tid=0x00007fd108819000 nid=0x13103 waiting on condition  [0x000070001312f000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@11.0.1/Native Method)
	- parking to wait for  <0x0000000700b72e88> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(java.base@11.0.1/LockSupport.java:194)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base@11.0.1/AbstractQueuedSynchronizer.java:885)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.base@11.0.1/AbstractQueuedSynchronizer.java:917)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@11.0.1/AbstractQueuedSynchronizer.java:1240)
	at java.util.concurrent.locks.ReentrantLock.lock(java.base@11.0.1/ReentrantLock.java:267)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.remove(java.base@11.0.1/ScheduledThreadPoolExecutor.java:1046)
	at java.util.concurrent.ThreadPoolExecutor.remove(java.base@11.0.1/ThreadPoolExecutor.java:1751)
	at org.jruby.ext.timeout.Timeout.killTimeoutThread(Timeout.java:192)
	at org.jruby.ext.timeout.Timeout.yieldWithTimeout(Timeout.java:129)
	at org.jruby.ext.timeout.Timeout.timeout(Timeout.java:99)
	at org.jruby.ext.timeout.Timeout.timeout(Timeout.java:75)
	at org.jruby.ext.timeout.Timeout.timeout(Timeout.java:70)
	at org.jruby.ext.timeout.Timeout$INVOKER$s$timeout.call(Timeout$INVOKER$s$timeout.gen)
	at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:182)
	at org.jruby.runtime.callsite.CachingCallSite.callIter(CachingCallSite.java:191)
	at script.invokeOther3:timeout(script.rb:8)
	at script.RUBY$block$\=script\,rb$2(script.rb:8)
@colinsurprenant
Copy link
Contributor Author

In progress PR for grouped timeouts logstash-plugins/logstash-filter-grok#153 which should help mitigate the problem.

@colinsurprenant
Copy link
Contributor Author

not that v4.2.0 of the grok filter was published with the new timeout_scope option.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant