Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The HikariPool intermittently drops to zero, because of single thread connection-adder it's causing HikariCP to fail in creating new connections #2274

Open
001mertiya opened this issue Dec 17, 2024 · 1 comment

Comments

@001mertiya
Copy link

001mertiya commented Dec 17, 2024

Hi @brettwooldridge,

We are encountering issues where the total number of connections gradually decreases from the maximum pool size to zero. This situation is exacerbated by a single-threaded connection adder, which is causing HikariCP to fail in renewing or creating new connections.

2024-11-19 10:53:55.674 ERROR [MGQL Discovery Service,d27a993960a2d692,d27a993960a2d692] 1 --- [   scheduling-1] o.h.engine.jdbc.spi.SqlExceptionHelper   : wm-jdbc-pool-1 - Connection is not available, request timed out after 60000ms.
2024-11-19 10:53:55.677 ERROR [MGQL Discovery Service,d27a993960a2d692,d27a993960a2d692] 1 --- [   scheduling-1] c.w.s.k.kstar.services.DBMonitorService  : DBMonitor failed

org.springframework.transaction.CannotCreateTransactionException: Could not open JPA EntityManager for transaction; nested exception is org.hibernate.exception.JDBCConnectionException: Unable to acquire JDBC Connection
	at org.springframework.orm.jpa.JpaTransactionManager.doBegin(JpaTransactionManager.java:467) ~[spring-orm-5.3.31.jar!/:5.3.31]
	at org.springframework.transaction.support.AbstractPlatformTransactionManager.startTransaction(AbstractPlatformTransactionManager.java:400) ~[spring-tx-5.3.31.jar!/:5.3.31]
	at org.springframework.transaction.support.AbstractPlatformTransactionManager.getTransaction(AbstractPlatformTransactionManager.java:373) ~[spring-tx-5.3.31.jar!/:5.3.31]
	at org.springframework.transaction.interceptor.TransactionAspectSupport.createTransactionIfNecessary(TransactionAspectSupport.java:595) ~[spring-tx-5.3.31.jar!/:5.3.31]
	at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:382) ~[spring-tx-5.3.31.jar!/:5.3.31]
	at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:119) ~[spring-tx-5.3.31.jar!/:5.3.31]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.dao.support.PersistenceExceptionTranslationInterceptor.invoke(PersistenceExceptionTranslationInterceptor.java:137) ~[spring-tx-5.3.31.jar!/:5.3.31]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.data.jpa.repository.support.CrudMethodMetadataPostProcessor$CrudMethodMetadataPopulatingMethodInterceptor.invoke(CrudMethodMetadataPostProcessor.java:174) ~[spring-data-jpa-2.7.18.jar!/:2.7.18]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:241) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at jdk.proxy2/jdk.proxy2.$Proxy301.count(Unknown Source) ~[na:na]
at com.walmart.store.kgraph.kstar.services.DBMonitorService.check(DBMonitorService.java:52) ~[mgql-registry-core-0.524.0.jar!/:0.524.0]
	at jdk.internal.reflect.GeneratedMethodAccessor85.invoke(Unknown Source) ~[na:na]
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:na]
	at java.base/java.lang.reflect.Method.invoke(Method.java:568) ~[na:na]
	at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:798) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:762) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:89) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.cloud.sleuth.instrument.scheduling.TraceSchedulingAspect.traceBackgroundThread(TraceSchedulingAspect.java:70) ~[spring-cloud-sleuth-instrumentation-3.1.10.jar!/:3.1.10]
	at jdk.internal.reflect.GeneratedMethodAccessor88.invoke(Unknown Source) ~[na:na]
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:na]
	at java.base/java.lang.reflect.Method.invoke(Method.java:568) ~[na:na]
	at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:634) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:624) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:72) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:762) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:762) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:707) ~[spring-aop-5.3.31.jar!/:5.3.31]
	at com.walmart.store.kgraph.kstar.services.DBMonitorService$$EnhancerBySpringCGLIB$$154d6a6b.check(<generated>) ~[mgql-registry-core-0.524.0.jar!/:0.524.0]
	at jdk.internal.reflect.GeneratedMethodAccessor85.invoke(Unknown Source) ~[na:na]
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:na]
	at java.base/java.lang.reflect.Method.invoke(Method.java:568) ~[na:na]
	at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:84) ~[spring-context-5.3.31.jar!/:5.3.31]
	at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54) ~[spring-context-5.3.31.jar!/:5.3.31]
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[na:na]
	at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ~[na:na]
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[na:na]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[na:na]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[na:na]
	at java.base/java.lang.Thread.run(Thread.java:833) ~[na:na]
Caused by: org.hibernate.exception.JDBCConnectionException: Unable to acquire JDBC Connection
	at org.hibernate.exception.internal.SQLExceptionTypeDelegate.convert(SQLExceptionTypeDelegate.java:48) ~[hibernate-core-5.6.15.Final.jar!/:5.6.15.Final]
	at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:37) ~[hibernate-core-5.6.15.Final.jar!/:5.6.15.Final]
	at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:113) ~[hibernate-core-5.6.15.Final.jar!/:5.6.15.Final]
	at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:99) ~[hibernate-core-5.6.15.Final.jar!/:5.6.15.Final]
	at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.acquireConnectionIfNeeded(LogicalConnectionManagedImpl.java:111) ~[hibernate-core-5.6.15.Final.jar!/:5.6.15.Final]
	at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.getPhysicalConnection(LogicalConnectionManagedImpl.java:138) ~[hibernate-core-5.6.15.Final.jar!/:5.6.15.Final]
	at org.hibernate.internal.SessionImpl.connection(SessionImpl.java:516) ~[hibernate-core-5.6.15.Final.jar!/:5.6.15.Final]
	at org.springframework.orm.jpa.vendor.HibernateJpaDialect.beginTransaction(HibernateJpaDialect.java:152) ~[spring-orm-5.3.31.jar!/:5.3.31]
	at org.springframework.orm.jpa.JpaTransactionManager.doBegin(JpaTransactionManager.java:421) ~[spring-orm-5.3.31.jar!/:5.3.31]
	... 49 common frames omitted
Caused by: java.sql.SQLTransientConnectionException: wm-jdbc-pool-1 - Connection is not available, request timed out after 60000ms.
	at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:696) ~[HikariCP-4.0.3.jar!/:na]
	at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:197) ~[HikariCP-4.0.3.jar!/:na]
	at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:162) ~[HikariCP-4.0.3.jar!/:na]
	at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:100) ~[HikariCP-4.0.3.jar!/:na]
	at com.walmart.platform.data.walmartjdbc.jdbc.WmtDataSource.getConnection(WmtDataSource.java:36) ~[walmart-jdbc-connector-1.0.42.jar!/:na]
	at org.hibernate.engine.jdbc.connections.internal.DatasourceConnectionProviderImpl.getConnection(DatasourceConnectionProviderImpl.java:122) ~[hibernate-core-5.6.15.Final.jar!/:5.6.15.Final]
	at org.hibernate.internal.NonContextualJdbcConnectionAccess.obtainConnection(NonContextualJdbcConnectionAccess.java:38) ~[hibernate-core-5.6.15.Final.jar!/:5.6.15.Final]
	at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.acquireConnectionIfNeeded(LogicalConnectionManagedImpl.java:108) ~[hibernate-core-5.6.15.Final.jar!/:5.6.15.Final]
	... 53 common frames omitted

When the total connections reach 0, all requests to the connection pool fail due to connection not available and the following exception is thrown:

org.springframework.jdbc.CannotGetJdbcConnectionException: Failed to obtain JDBC Connection; nested exception is java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 60002ms.

Here's the details for our application:

socket timeout: 77 seconds
login timeout: 85 seconds
hikari connection timeout: 60 seconds

Hikari version: 4.0.3
mssql-jdbc version: 12.6.3.jre8
msal4j version: 1.15.1

Issue Details:

We have observed multiple instances where connections are gradually drained, resulting in no available connections in the pool to handle incoming traffic. Consequently, requests are left waiting for connections and ultimately time out. Ideally, Hikari should maintain a minimum number of connections, with the default minimum set at 10. This issue has been noted in a limited number of pods and may be triggered by network problems or server degradation. Total, idle and actrive connections are zero for hours, as you can see in the below graph. To address this situation, we had to terminate the affected pods.

Screenshot 2024-12-17 at 5 22 41 PM

Captured a thread dump from one of the impacted pods for analysis. It appears that a single-threaded connection adder is responsible for adding connections to the pool when the count falls below the minimum idle threshold. However, this adder became unresponsive while waiting to establish a connection. Below is the complete stack trace for reference.

"wm-jdbc-pool-1 connection adder" #130175 daemon prio=5 os_prio=0 cpu=12.23ms elapsed=13998.58s tid=0x000070cf721939f0 nid=0x1fc8a waiting on condition [0x000070cf68dfc000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@17.0.8.1.101/Native Method)
- parking to wait for <0x00000006d5c03260> (a java.util.concurrent.CompletableFuture$Signaller)
at java.util.concurrent.locks.LockSupport.park(java.base@17.0.8.1.101/LockSupport.java:211)
at java.util.concurrent.CompletableFuture$Signaller.block(java.base@17.0.8.1.101/CompletableFuture.java:1864)
at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@17.0.8.1.101/ForkJoinPool.java:3465)
at java.util.concurrent.ForkJoinPool.managedBlock(java.base@17.0.8.1.101/ForkJoinPool.java:3436)
at java.util.concurrent.CompletableFuture.waitingGet(java.base@17.0.8.1.101/CompletableFuture.java:1898)
at java.util.concurrent.CompletableFuture.get(java.base@17.0.8.1.101/CompletableFuture.java:2072)
at com.microsoft.sqlserver.jdbc.SQLServerMSAL4JUtils.getSqlFedAuthToken(SQLServerMSAL4JUtils.java:96)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.getFedAuthToken(SQLServerConnection.java:6022)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.onFedAuthInfo(SQLServerConnection.java:5989)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.processFedAuthInfo(SQLServerConnection.java:5823)
at com.microsoft.sqlserver.jdbc.TDSTokenHandler.onFedAuthInfo(tdsparser.java:322)
at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:130)
at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:42)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.sendLogon(SQLServerConnection.java:6881)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.logon(SQLServerConnection.java:5428)
at com.microsoft.sqlserver.jdbc.SQLServerConnection$LogonCommand.doExecute(SQLServerConnection.java:5360)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7748)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:4410)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:3847)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:3372)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:3197)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:1971)
at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:1263)
at com.walmart.platform.data.walmartjdbc.jdbc.WalmartJDBCDriver.connect(WalmartJDBCDriver.java:73)
at com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:121)
at com.zaxxer.hikari.pool.PoolBase.newConnection(PoolBase.java:364)
at com.zaxxer.hikari.pool.PoolBase.newPoolEntry(PoolBase.java:206)
at com.zaxxer.hikari.pool.HikariPool.createPoolEntry(HikariPool.java:476)
at com.zaxxer.hikari.pool.HikariPool.access$100(HikariPool.java:71)
at com.zaxxer.hikari.pool.HikariPool$PoolEntryCreator.call(HikariPool.java:726)
at com.zaxxer.hikari.pool.HikariPool$PoolEntryCreator.call(HikariPool.java:712)
at java.util.concurrent.FutureTask.run(java.base@17.0.8.1.101/FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.8.1.101/ThreadPoolExecutor.java:1136)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.8.1.101/ThreadPoolExecutor.java:635)
at java.lang.Thread.run(java.base@17.0.8.1.101/Thread.java:833)

The call to future.get() is a blocking call that waits for the result and if it takes too long or fails to complete, it blocks the current thread indefinitely.

We had a call with mssql-jdbc developers and they agreed that calls to MSAL authentication library done from the mssql-jdbc java driver code use future.get(), potentially leading to hangs in case the library does not receive timely response from the server. All these calls done as part of getFedAuthToken need to have timeouts imposed. And they are working on it as part of this PR.

I was considering that Hikari should be resilient enough to handle these situations, this can happen with any other underlying JDBC driver. It would be beneficial for Hikari to execute the newPoolEntry in a new thread and make the operation time bound. @brettwooldridge What are your thoughts on this?

@001mertiya
Copy link
Author

@brettwooldridge , we encountered the same issue again this week with the same error, where the process is stuck at the same point with a single-threaded connection adder. What are your thoughts on this approach?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant