We are encountering an issue where the total number of connections in the pool gradually decreases from the maximum pool size to zero. The situation is made worse by HikariCP's single-threaded connection adder: once that thread hangs, HikariCP can no longer renew or create connections.
2024-11-19 10:53:55.674 ERROR [MGQL Discovery Service,d27a993960a2d692,d27a993960a2d692] 1 --- [ scheduling-1] o.h.engine.jdbc.spi.SqlExceptionHelper : wm-jdbc-pool-1 - Connection is not available, request timed out after 60000ms.
2024-11-19 10:53:55.677 ERROR [MGQL Discovery Service,d27a993960a2d692,d27a993960a2d692] 1 --- [ scheduling-1] c.w.s.k.kstar.services.DBMonitorService : DBMonitor failed
org.springframework.transaction.CannotCreateTransactionException: Could not open JPA EntityManager for transaction; nested exception is org.hibernate.exception.JDBCConnectionException: Unable to acquire JDBC Connection
at org.springframework.orm.jpa.JpaTransactionManager.doBegin(JpaTransactionManager.java:467) ~[spring-orm-5.3.31.jar!/:5.3.31]
at org.springframework.transaction.support.AbstractPlatformTransactionManager.startTransaction(AbstractPlatformTransactionManager.java:400) ~[spring-tx-5.3.31.jar!/:5.3.31]
at org.springframework.transaction.support.AbstractPlatformTransactionManager.getTransaction(AbstractPlatformTransactionManager.java:373) ~[spring-tx-5.3.31.jar!/:5.3.31]
at org.springframework.transaction.interceptor.TransactionAspectSupport.createTransactionIfNecessary(TransactionAspectSupport.java:595) ~[spring-tx-5.3.31.jar!/:5.3.31]
at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:382) ~[spring-tx-5.3.31.jar!/:5.3.31]
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:119) ~[spring-tx-5.3.31.jar!/:5.3.31]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.3.31.jar!/:5.3.31]
at org.springframework.dao.support.PersistenceExceptionTranslationInterceptor.invoke(PersistenceExceptionTranslationInterceptor.java:137) ~[spring-tx-5.3.31.jar!/:5.3.31]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.3.31.jar!/:5.3.31]
at org.springframework.data.jpa.repository.support.CrudMethodMetadataPostProcessor$CrudMethodMetadataPopulatingMethodInterceptor.invoke(CrudMethodMetadataPostProcessor.java:174) ~[spring-data-jpa-2.7.18.jar!/:2.7.18]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.3.31.jar!/:5.3.31]
at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97) ~[spring-aop-5.3.31.jar!/:5.3.31]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.3.31.jar!/:5.3.31]
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:241) ~[spring-aop-5.3.31.jar!/:5.3.31]
at jdk.proxy2/jdk.proxy2.$Proxy301.count(Unknown Source) ~[na:na]
at com.walmart.store.kgraph.kstar.services.DBMonitorService.check(DBMonitorService.java:52) ~[mgql-registry-core-0.524.0.jar!/:0.524.0]
at jdk.internal.reflect.GeneratedMethodAccessor85.invoke(Unknown Source) ~[na:na]
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:na]
at java.base/java.lang.reflect.Method.invoke(Method.java:568) ~[na:na]
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344) ~[spring-aop-5.3.31.jar!/:5.3.31]
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198) ~[spring-aop-5.3.31.jar!/:5.3.31]
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:798) ~[spring-aop-5.3.31.jar!/:5.3.31]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163) ~[spring-aop-5.3.31.jar!/:5.3.31]
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:762) ~[spring-aop-5.3.31.jar!/:5.3.31]
at org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:89) ~[spring-aop-5.3.31.jar!/:5.3.31]
at org.springframework.cloud.sleuth.instrument.scheduling.TraceSchedulingAspect.traceBackgroundThread(TraceSchedulingAspect.java:70) ~[spring-cloud-sleuth-instrumentation-3.1.10.jar!/:3.1.10]
at jdk.internal.reflect.GeneratedMethodAccessor88.invoke(Unknown Source) ~[na:na]
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:na]
at java.base/java.lang.reflect.Method.invoke(Method.java:568) ~[na:na]
at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:634) ~[spring-aop-5.3.31.jar!/:5.3.31]
at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:624) ~[spring-aop-5.3.31.jar!/:5.3.31]
at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:72) ~[spring-aop-5.3.31.jar!/:5.3.31]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.3.31.jar!/:5.3.31]
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:762) ~[spring-aop-5.3.31.jar!/:5.3.31]
at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97) ~[spring-aop-5.3.31.jar!/:5.3.31]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.3.31.jar!/:5.3.31]
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:762) ~[spring-aop-5.3.31.jar!/:5.3.31]
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:707) ~[spring-aop-5.3.31.jar!/:5.3.31]
at com.walmart.store.kgraph.kstar.services.DBMonitorService$$EnhancerBySpringCGLIB$$154d6a6b.check(<generated>) ~[mgql-registry-core-0.524.0.jar!/:0.524.0]
at jdk.internal.reflect.GeneratedMethodAccessor85.invoke(Unknown Source) ~[na:na]
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:na]
at java.base/java.lang.reflect.Method.invoke(Method.java:568) ~[na:na]
at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:84) ~[spring-context-5.3.31.jar!/:5.3.31]
at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54) ~[spring-context-5.3.31.jar!/:5.3.31]
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[na:na]
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ~[na:na]
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[na:na]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[na:na]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[na:na]
at java.base/java.lang.Thread.run(Thread.java:833) ~[na:na]
Caused by: org.hibernate.exception.JDBCConnectionException: Unable to acquire JDBC Connection
at org.hibernate.exception.internal.SQLExceptionTypeDelegate.convert(SQLExceptionTypeDelegate.java:48) ~[hibernate-core-5.6.15.Final.jar!/:5.6.15.Final]
at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:37) ~[hibernate-core-5.6.15.Final.jar!/:5.6.15.Final]
at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:113) ~[hibernate-core-5.6.15.Final.jar!/:5.6.15.Final]
at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:99) ~[hibernate-core-5.6.15.Final.jar!/:5.6.15.Final]
at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.acquireConnectionIfNeeded(LogicalConnectionManagedImpl.java:111) ~[hibernate-core-5.6.15.Final.jar!/:5.6.15.Final]
at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.getPhysicalConnection(LogicalConnectionManagedImpl.java:138) ~[hibernate-core-5.6.15.Final.jar!/:5.6.15.Final]
at org.hibernate.internal.SessionImpl.connection(SessionImpl.java:516) ~[hibernate-core-5.6.15.Final.jar!/:5.6.15.Final]
at org.springframework.orm.jpa.vendor.HibernateJpaDialect.beginTransaction(HibernateJpaDialect.java:152) ~[spring-orm-5.3.31.jar!/:5.3.31]
at org.springframework.orm.jpa.JpaTransactionManager.doBegin(JpaTransactionManager.java:421) ~[spring-orm-5.3.31.jar!/:5.3.31]
... 49 common frames omitted
Caused by: java.sql.SQLTransientConnectionException: wm-jdbc-pool-1 - Connection is not available, request timed out after 60000ms.
at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:696) ~[HikariCP-4.0.3.jar!/:na]
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:197) ~[HikariCP-4.0.3.jar!/:na]
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:162) ~[HikariCP-4.0.3.jar!/:na]
at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:100) ~[HikariCP-4.0.3.jar!/:na]
at com.walmart.platform.data.walmartjdbc.jdbc.WmtDataSource.getConnection(WmtDataSource.java:36) ~[walmart-jdbc-connector-1.0.42.jar!/:na]
at org.hibernate.engine.jdbc.connections.internal.DatasourceConnectionProviderImpl.getConnection(DatasourceConnectionProviderImpl.java:122) ~[hibernate-core-5.6.15.Final.jar!/:5.6.15.Final]
at org.hibernate.internal.NonContextualJdbcConnectionAccess.obtainConnection(NonContextualJdbcConnectionAccess.java:38) ~[hibernate-core-5.6.15.Final.jar!/:5.6.15.Final]
at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.acquireConnectionIfNeeded(LogicalConnectionManagedImpl.java:108) ~[hibernate-core-5.6.15.Final.jar!/:5.6.15.Final]
... 53 common frames omitted
When the total number of connections reaches 0, every request to the connection pool fails because no connection is available, and the following exception is thrown:
org.springframework.jdbc.CannotGetJdbcConnectionException: Failed to obtain JDBC Connection; nested exception is java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 60002ms.
We have observed multiple instances where connections are gradually drained until the pool has none left to handle incoming traffic. Requests are then left waiting for a connection and ultimately time out. Ideally, Hikari should maintain a minimum number of connections, with the default minimum set at 10. The issue has been seen on a limited number of pods and may be triggered by network problems or server degradation. Total, idle and active connections stay at zero for hours, as you can see in the below graph. To recover, we had to terminate the affected pods.
We captured a thread dump from one of the impacted pods for analysis. It appears that a single-threaded connection adder is responsible for adding connections to the pool when the count falls below the minimum idle threshold. However, this adder thread hung while waiting to establish a connection and never returned. Below is the complete stack trace for reference.
"wm-jdbc-pool-1 connection adder" #130175 daemon prio=5 os_prio=0 cpu=12.23ms elapsed=13998.58s tid=0x000070cf721939f0 nid=0x1fc8a waiting on condition [0x000070cf68dfc000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@17.0.8.1.101/Native Method)
- parking to wait for <0x00000006d5c03260> (a java.util.concurrent.CompletableFuture$Signaller)
at java.util.concurrent.locks.LockSupport.park(java.base@17.0.8.1.101/LockSupport.java:211)
at java.util.concurrent.CompletableFuture$Signaller.block(java.base@17.0.8.1.101/CompletableFuture.java:1864)
at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@17.0.8.1.101/ForkJoinPool.java:3465)
at java.util.concurrent.ForkJoinPool.managedBlock(java.base@17.0.8.1.101/ForkJoinPool.java:3436)
at java.util.concurrent.CompletableFuture.waitingGet(java.base@17.0.8.1.101/CompletableFuture.java:1898)
at java.util.concurrent.CompletableFuture.get(java.base@17.0.8.1.101/CompletableFuture.java:2072)
at com.microsoft.sqlserver.jdbc.SQLServerMSAL4JUtils.getSqlFedAuthToken(SQLServerMSAL4JUtils.java:96)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.getFedAuthToken(SQLServerConnection.java:6022)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.onFedAuthInfo(SQLServerConnection.java:5989)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.processFedAuthInfo(SQLServerConnection.java:5823)
at com.microsoft.sqlserver.jdbc.TDSTokenHandler.onFedAuthInfo(tdsparser.java:322)
at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:130)
at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:42)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.sendLogon(SQLServerConnection.java:6881)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.logon(SQLServerConnection.java:5428)
at com.microsoft.sqlserver.jdbc.SQLServerConnection$LogonCommand.doExecute(SQLServerConnection.java:5360)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7748)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:4410)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:3847)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:3372)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:3197)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:1971)
at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:1263)
at com.walmart.platform.data.walmartjdbc.jdbc.WalmartJDBCDriver.connect(WalmartJDBCDriver.java:73)
at com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:121)
at com.zaxxer.hikari.pool.PoolBase.newConnection(PoolBase.java:364)
at com.zaxxer.hikari.pool.PoolBase.newPoolEntry(PoolBase.java:206)
at com.zaxxer.hikari.pool.HikariPool.createPoolEntry(HikariPool.java:476)
at com.zaxxer.hikari.pool.HikariPool.access$100(HikariPool.java:71)
at com.zaxxer.hikari.pool.HikariPool$PoolEntryCreator.call(HikariPool.java:726)
at com.zaxxer.hikari.pool.HikariPool$PoolEntryCreator.call(HikariPool.java:712)
at java.util.concurrent.FutureTask.run(java.base@17.0.8.1.101/FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.8.1.101/ThreadPoolExecutor.java:1136)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.8.1.101/ThreadPoolExecutor.java:635)
at java.lang.Thread.run(java.base@17.0.8.1.101/Thread.java:833)
The call to future.get() blocks until the result is available. Because it is called without a timeout, a future that never completes blocks the calling thread indefinitely.
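To illustrate the difference between the unbounded and the bounded wait, here is a minimal sketch that uses only the JDK. It is not the mssql-jdbc code: the class `BoundedGetDemo` and the helper `getWithTimeout` are hypothetical names, and the never-completed future only stands in for a token request that gets no response.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedGetDemo {

    /**
     * Waits for the future for at most timeoutMillis.
     * Returns an empty Optional if the result does not arrive in time.
     */
    static Optional<String> getWithTimeout(CompletableFuture<String> future, long timeoutMillis) {
        try {
            // Bounded wait: unlike future.get(), this cannot park forever.
            return Optional.ofNullable(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // Marks the future as cancelled so late completions are ignored.
            // For CompletableFuture this does not interrupt the underlying work.
            future.cancel(true);
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return Optional.empty();
        } catch (ExecutionException e) {
            throw new IllegalStateException("token request failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        // Stand-in for a request that never receives a response.
        CompletableFuture<String> neverCompletes = new CompletableFuture<>();
        // neverCompletes.get() here would block this thread indefinitely.
        System.out.println("timed out: " + !getWithTimeout(neverCompletes, 200).isPresent());

        CompletableFuture<String> ready = CompletableFuture.completedFuture("token");
        System.out.println("result: " + getWithTimeout(ready, 200).orElse("none"));
    }
}
```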
We had a call with the mssql-jdbc developers, and they agreed that the driver's calls into the MSAL authentication library use future.get(), which can hang if the library does not receive a timely response from the server. All of the calls made as part of getFedAuthToken need to have timeouts imposed, and they are working on that as part of this PR.
That said, I think Hikari should be resilient enough to handle these situations, since the same thing can happen with any other underlying JDBC driver. It would be beneficial for Hikari to execute newPoolEntry in a new thread and make the operation time-bound. @brettwooldridge What are your thoughts on this?
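One way the time-bound idea could look, sketched with plain JDK executors. This is not HikariCP's implementation or API: `BoundedCreator`, `createWithin`, and the `Callable` standing in for `newPoolEntry` are assumptions made purely for illustration.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedCreator {

    // Separate daemon workers, so a hung attempt cannot block the calling (adder) thread.
    private final ExecutorService workers = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "bounded-connection-creator");
        t.setDaemon(true);
        return t;
    });

    /**
     * Runs the factory (hypothetical stand-in for newPoolEntry) on a worker thread
     * and waits at most timeoutMillis. Returns empty if the attempt timed out.
     */
    <T> Optional<T> createWithin(Callable<T> factory, long timeoutMillis) {
        Future<T> attempt = workers.submit(factory);
        try {
            return Optional.ofNullable(attempt.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // Interrupts the worker; the caller is free to retry either way.
            attempt.cancel(true);
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            attempt.cancel(true);
            return Optional.empty();
        } catch (ExecutionException e) {
            throw new IllegalStateException("connection creation failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        BoundedCreator creator = new BoundedCreator();

        // Simulates a driver stuck in an unbounded future.get().
        Optional<String> hung =
                creator.createWithin(() -> new CompletableFuture<String>().get(), 200);
        System.out.println("hung attempt produced a connection: " + hung.isPresent());

        // A factory that returns promptly.
        Optional<String> ok = creator.createWithin(() -> "connection", 200);
        System.out.println("fast attempt: " + ok.orElse("none"));
    }
}
```

Cancelling only interrupts the worker thread. A driver call that ignores interrupts would keep holding that worker, so a real implementation would probably also need to cap the number of in-flight attempts.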
@brettwooldridge, we encountered the same issue again this week with the same error: the process is stuck at the same point in the single-threaded connection adder. What are your thoughts on this approach?
Hi @brettwooldridge,
Here are the details for our application:
socket timeout: 77 seconds
login timeout: 85 seconds
hikari connection timeout: 60 seconds
Hikari version: 4.0.3
mssql-jdbc version: 12.6.3.jre8
msal4j version: 1.15.1