Direct memory grows from 5 MB to 1 GB, leading to application crash #3558

abhishek-sharma-20 · 2024-12-22T05:57:12Z

We have a reactive routing application that takes a request from a client and routes it to a backend. We originally had a non-reactive solution and are now moving to a reactive one. In one flow the backend response is large (1 MB to 5 MB). When we run this flow at 2 to 3 TPS for 30 minutes to 1 hour, the application restarts automatically. On investigation, we found that it restarts because it consumes more memory than the pod's memory limit allows. Looking at the memory usage patterns, we saw direct memory growing steadily from 5 MB to 1024 MB, after which the application restarts.

Expected Behavior

The application should support large response sizes without consuming more memory than expected.

Actual Behavior

The application consumes more memory than expected, which leads to restarts.

Steps to Reproduce

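### HttpConfig.java

```java
import io.netty.buffer.PooledByteBufAllocator;
import io.netty.handler.ssl.SslContext;
import io.netty.handler.ssl.SslContextBuilder;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.KeyStore;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.TrustManagerFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import reactor.netty.resources.ConnectionProvider;
import reactor.netty.resources.LoopResources;

@Configuration
public class HttpConfig {

    @Bean
    public ConnectionProvider connectionProvider() {
        return ConnectionProvider.builder("httpConn")
                .maxConnections(20)
                .metrics(true)
                .build();
    }

    @Bean
    LoopResources loopResources() {
        // 100 daemon event loop threads with the "loop" name prefix
        return LoopResources.create("loop", 100, true);
    }

    @Bean
    public PooledByteBufAllocator byteBufAllocator() {
        return PooledByteBufAllocator.DEFAULT;
    }

    // Assumed to be a bean so that HttpWebClient can inject it as armSSLContext.
    @Bean
    public SslContext armSSLContext() {
        String keyStoreFile = System.getProperty("javax.net.ssl.keyStore");
        char[] keyStorePassword = System.getProperty("javax.net.ssl.keyStorePassword").toCharArray();
        try {
            KeyStore keyStore = loadKeyStore(keyStoreFile, keyStorePassword);
            KeyManagerFactory keyManagerFactory = KeyManagerFactory.getInstance("SunX509");
            keyManagerFactory.init(keyStore, keyStorePassword);

            // The key store file is also used as the trust store, as in the original setup.
            KeyStore trustStore = loadKeyStore(keyStoreFile, keyStorePassword);
            TrustManagerFactory trustManagerFactory = TrustManagerFactory.getInstance("SunX509");
            trustManagerFactory.init(trustStore);

            return SslContextBuilder.forClient()
                    .keyManager(keyManagerFactory)
                    .trustManager(trustManagerFactory)
                    .build();
        } catch (Exception e) {
            // Fail fast instead of returning null, which would only fail later in HttpClient.secure(...)
            throw new IllegalStateException("Could not build the client SslContext", e);
        }
    }

    // Loads a store with try-with-resources so the file handle is always closed
    // (the original trust store FileInputStream was never closed).
    private static KeyStore loadKeyStore(String file, char[] password) throws Exception {
        KeyStore store = KeyStore.getInstance(KeyStore.getDefaultType());
        try (InputStream in = Files.newInputStream(Path.of(file))) {
            store.load(in, password);
        }
        return store;
    }
}
```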

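### HttpWebClient.java

```java
import io.netty.buffer.PooledByteBufAllocator;
import io.netty.channel.ChannelOption;
import io.netty.handler.timeout.ReadTimeoutHandler;
import io.netty.resolver.DefaultAddressResolverGroup;
import java.util.concurrent.TimeUnit;
import org.springframework.http.client.reactive.ReactorClientHttpConnector;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.netty.http.client.HttpClient;

// Method of HttpWebClient; connectionProvider, armSSLContext, loopResources,
// connectionTimeout (milliseconds) and armProperties are injected fields.
public WebClient reactiveWebClientBuilder() {
    HttpClient httpClient = HttpClient.create(connectionProvider)
            .resolver(DefaultAddressResolverGroup.INSTANCE)
            .secure(spec -> spec.sslContext(armSSLContext))
            .compress(true)
            .metrics(true, uri -> uri)
            .option(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT)
            .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 0) // 0 disables the connect timeout
            .option(ChannelOption.TCP_NODELAY, true)
            // Keep the millisecond unit: truncating to whole seconds turns values
            // below 1000 ms into 0, which disables ReadTimeoutHandler.
            .doOnConnected(conn -> conn.addHandlerLast(
                    new ReadTimeoutHandler(connectionTimeout.longValue(), TimeUnit.MILLISECONDS)))
            .runOn(loopResources);
    return WebClient.builder()
            .clientConnector(new ReactorClientHttpConnector(httpClient))
            .codecs(configurer -> configurer.defaultCodecs()
                    .maxInMemorySize(armProperties.getHttp().getByteBufferSize() * 1024 * 1024)) // 6 MB
            .build();
}
```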
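Converting `connectionTimeout` from milliseconds to whole seconds with `TimeUnit.MILLISECONDS.toSeconds` truncates toward zero, so any value below 1000 ms becomes `ReadTimeoutHandler(0)`, and Netty treats a non-positive read timeout as disabled. The stdlib-only sketch below prints what that conversion yields for a few sample values; `ReadTimeoutConversion` is a hypothetical helper, and `connectionTimeout` is assumed to be in milliseconds, as the conversion in the issue implies.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper that reproduces the millisecond-to-second conversion
// used before constructing ReadTimeoutHandler(int timeoutSeconds).
final class ReadTimeoutConversion {

    // Same arithmetic as (int) TimeUnit.MILLISECONDS.toSeconds(connectionTimeout):
    // the result is truncated toward zero.
    static int truncatedSeconds(long timeoutMillis) {
        return (int) TimeUnit.MILLISECONDS.toSeconds(timeoutMillis);
    }

    public static void main(String[] args) {
        long[] samples = {500, 999, 1000, 1500, 30_000};
        for (long millis : samples) {
            System.out.printf("%6d ms -> ReadTimeoutHandler(%d)%n", millis, truncatedSeconds(millis));
        }
    }
}
```

Netty's `ReadTimeoutHandler(long timeout, TimeUnit unit)` constructor accepts the value together with its unit, which avoids this truncation entirely.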

### ResponseController

```java
@GetMapping("/response")
public Mono<ResponseEntity<String>> getResponse(@RequestParam int size) {
    return reactiveWebClientBuilder.method(HttpMethod.GET)
            .uri(uriBuilder -> uriBuilder.scheme("https")
                    .host(armProp.getHttp().getHostname())
                    .port(armProp.getHttp().getPort())
                    .path("/path")
                    .build())
            .contentType(MediaType.APPLICATION_JSON)
            .headers(this::addHttpHeaders)
            .retrieve()
            // Returning Mono.empty() treats 4xx/5xx as a normal response instead of an error;
            // HttpStatusCode::isError also avoids HttpStatus.valueOf failing on non-standard codes.
            .onStatus(HttpStatusCode::isError, response -> Mono.empty())
            .toEntity(String.class); // aggregates the whole body (up to maxInMemorySize) into a String
}
```

The classes above are the ones used by this application. It runs with the following JVM arguments:
-Xmx3072M -Xss256K -Xms3072M -XX:+HeapDumpOnOutOfMemoryError -verbose:gc -XX:+ParallelRefProcEnabled -XX:ParallelGCThreads=4 -XX:ConcGCThreads=2 -XX:MaxGCPauseMillis=500 -XX:MetaspaceSize=64m -XX:MaxMetaspaceSize=512m -XX:+DisableExplicitGC -XX:MaxJavaStackTraceDepth=15 -Dspring.config.location=optional:classpath:/,optional:classpath:/config/ -Dorg.springframework.boot.logging.LoggingSystem=none

Possible Solution

Your Environment

reactor-netty-core: 1.2.0
netty: 4.1.111.Final
spring-framework: 6.1.13
spring-boot: 3.2.9

System details:

OS: Linux (5.4.0-200-generic)
JVM: OpenJDK 17.0.13
Kubernetes pod with 3 CPU cores and 4 GB RAM



violetagg · 2024-12-30T13:47:10Z

@abhishek-sharma-20 Is it possible that you are hitting spring-projects/spring-framework#29772?

abhishek-sharma-20 · 2024-12-31T06:45:05Z

Hi @violetagg, after further analysis we see the same issue on VMs as well. Because memory is shared on the VMs we don't see any crashes, but we observed that the reactor_netty_bytebuf_allocator_used_direct_memory gauge climbs to 1.3 GB.

violetagg · 2025-01-02T09:12:30Z

@abhishek-sharma-20 Can you share your direct memory limit configuration?
The issue linked above points to an explanation of how the direct memory limit is calculated: #2590 (comment)
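When `-XX:MaxDirectMemorySize` is not set, as in the JVM arguments listed in this issue, the JDK defaults the direct memory limit to the maximum heap size (`Runtime.getRuntime().maxMemory()`), i.e. roughly 3 GB here rather than the ~1 GB left in a 4 GB pod; by default Netty's `PlatformDependent.maxDirectMemory()` derives its limit from the same value. The stdlib-only sketch below approximates that lookup. `DirectMemoryLimit` and its methods are hypothetical helpers, and the parser handles only plain byte counts and the common `k`/`m`/`g`/`t` suffixes.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Locale;

// Hypothetical helper approximating how the effective direct memory limit is chosen.
final class DirectMemoryLimit {

    private static final String FLAG = "-XX:MaxDirectMemorySize=";

    // Parses a HotSpot size value such as "512m", "1G" or "1048576" into bytes.
    static long parseSize(String value) {
        String v = value.trim().toLowerCase(Locale.ROOT);
        long multiplier = 1L;
        switch (v.charAt(v.length() - 1)) {
            case 'k': multiplier = 1L << 10; break;
            case 'm': multiplier = 1L << 20; break;
            case 'g': multiplier = 1L << 30; break;
            case 't': multiplier = 1L << 40; break;
            default: break;
        }
        String digits = multiplier == 1L ? v : v.substring(0, v.length() - 1);
        return Long.parseLong(digits) * multiplier;
    }

    // Uses an explicit -XX:MaxDirectMemorySize if present (the last one wins);
    // otherwise falls back to the maximum heap size, the JDK default.
    static long effectiveLimit(List<String> jvmArgs, long maxHeapBytes) {
        long limit = maxHeapBytes;
        for (String arg : jvmArgs) {
            if (arg.startsWith(FLAG)) {
                limit = parseSize(arg.substring(FLAG.length()));
            }
        }
        return limit;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        long limit = effectiveLimit(jvmArgs, Runtime.getRuntime().maxMemory());
        System.out.printf("Effective direct memory limit: %d MB%n", limit >> 20);
    }
}
```

Setting `-XX:MaxDirectMemorySize` explicitly makes the cap visible and keeps heap plus direct memory within the pod limit by construction.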

abhishek-sharma-20 · 2025-01-02T10:03:37Z

Hi @violetagg, we don't have any direct memory configuration; we only have a memory limit of 4 GB per pod. The application starts with a 3 GB heap, and under load direct memory reaches 1 GB, which exceeds the pod memory limit (3 GB heap + 1 GB direct memory) and causes the pod to restart.
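The budget in this comment counts only heap and direct memory. A rough worst-case sum using the limits from the JVM arguments in the issue shows how far the configured ceilings exceed the 4 GB pod. `PodMemoryBudget` is a hypothetical helper, the thread count of 200 is an assumed value (not from the issue), the direct limit is taken as approximately the max heap because `-XX:MaxDirectMemorySize` is not set, and code cache, GC structures and other native allocations are left out.

```java
// Hypothetical helper summing the configured memory ceilings of the JVM.
final class PodMemoryBudget {

    static final long KB = 1L << 10;
    static final long MB = 1L << 20;

    // Worst case from configured limits only; code cache, GC bookkeeping and
    // native libraries are not included, so the real footprint is higher.
    static long worstCaseBytes(long heap, long directLimit, long maxMetaspace,
                               int threads, long stackSize) {
        return heap + directLimit + maxMetaspace + threads * stackSize;
    }

    public static void main(String[] args) {
        long heap = 3072 * MB;       // -Xms3072M / -Xmx3072M
        long direct = heap;          // no -XX:MaxDirectMemorySize: defaults to about the max heap
        long metaspace = 512 * MB;   // -XX:MaxMetaspaceSize=512m
        int threads = 200;           // assumed thread count, not taken from the issue
        long stack = 256 * KB;       // -Xss256K
        long podLimit = 4096 * MB;   // 4 GB pod memory limit

        long worstCase = worstCaseBytes(heap, direct, metaspace, threads, stack);
        System.out.printf("worst case: %d MB vs pod limit: %d MB%n", worstCase / MB, podLimit / MB);
        System.out.printf("left for everything outside the heap: %d MB%n", (podLimit - heap) / MB);
    }
}
```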

Additionally, we have loop-resource=300 and each response is 5 MB, which works out to 300 * 5 MB = 1.5 GB at peak, and that is roughly the direct memory usage we see. However, our understanding is that if direct memory is pooled, it should not reach 1.5 GB (please correct us if we are wrong).
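The estimate in this comment is concurrency multiplied by response size. Pooling lets released buffers be reused, but it cannot shrink memory that is live at the same time, so the peak depends on how many responses are buffered concurrently. The sketch below parameterizes that arithmetic. `PeakDirectMemoryEstimate` is hypothetical; which concurrency bound actually applies (the `maxConnections(20)` pool setting, the 100 event loop threads configured in `HttpConfig`, or the 300 loop resources mentioned here) is an assumption, and the count ignores the TLS, HTTP parsing and codec buffers discussed later in the thread.

```java
// Hypothetical helper for the "concurrency x response size" upper bound.
final class PeakDirectMemoryEstimate {

    static final long MB = 1L << 20;

    // Upper bound assuming each of `concurrentResponses` holds one fully
    // buffered response at the same moment.
    static long peakBytes(int concurrentResponses, long responseBytes) {
        return concurrentResponses * responseBytes;
    }

    public static void main(String[] args) {
        long response = 5 * MB;
        // Candidate concurrency bounds taken from the configuration and comments in the issue.
        for (int concurrent : new int[] {20, 100, 300}) {
            System.out.printf("%d concurrent x 5 MB = %d MB%n",
                    concurrent, peakBytes(concurrent, response) / MB);
        }
    }
}
```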

violetagg · 2025-01-02T10:22:31Z

@abhishek-sharma-20 Then isn't this expected?

violetagg · 2025-01-02T10:30:27Z

> Additional we have loop-resource=300 and each response size is 5MB which equate to 300*5MB =1.5GB(at peak) which is what we see direct memory used how ever if direct memory is getting pooled this should not reach 1.5GB as per our understanding (please correct)

You don't have sequential requests/responses, I assume? Direct memory is used not only for the actual request/response data but also by the internal implementation (TLS handshake, HTTP parsing, etc.). Spring Framework also uses it for decoding/encoding.

abhishek-sharma-20 · 2025-01-03T12:23:42Z

Yes, that is correct. We wanted to understand whether direct memory is released once the load (number of requests or response size) is reduced. In our case, direct memory does not go down even after the load is reduced.

We are also running performance tests with fewer loop resources and will share more insights once the tests are done.

violetagg · 2025-01-03T12:30:50Z

> Yes That is correct we wanted to understand that once the load(number of request or size of response) is reduced will the direct memory be released or not? In our case even after reduced load direct memory is not reduced.

Check this https://projectreactor.io/docs/netty/release/reference/http-client.html#metrics

reactor.netty.bytebuf.allocator.used.direct.memory will not be reduced, but reactor.netty.bytebuf.allocator.active.direct.memory will be.
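The difference between the two gauges can be illustrated with a deliberately simplified pool model: "used" tracks memory the pool has reserved, and "active" tracks memory currently handed out in buffers. In this toy model, released chunks are cached for reuse rather than returned to the OS, so "used" stays at its high-water mark after the load drops while "active" falls back. `ToyPool` is hypothetical and is not Netty's `PooledByteBufAllocator` algorithm; Netty's actual rules for releasing chunks (arenas, chunk lists, thread-local caches) are more involved, and the 4 MB chunk size here is arbitrary.

```java
// Toy model only: NOT Netty's PooledByteBufAllocator implementation.
final class ToyPool {

    private final long chunkSize;
    private int cachedChunks;   // released chunks kept for reuse
    private long usedBytes;     // memory reserved by the pool (analogous to "used")
    private long activeBytes;   // memory currently handed out (analogous to "active")

    ToyPool(long chunkSize) {
        this.chunkSize = chunkSize;
    }

    // Hands out one chunk-sized buffer, reusing a cached chunk when one is available.
    void allocate() {
        if (cachedChunks > 0) {
            cachedChunks--;
        } else {
            usedBytes += chunkSize; // reserve new memory only when the cache is empty
        }
        activeBytes += chunkSize;
    }

    // Returns one buffer to the pool; the chunk is cached, not given back to the OS.
    void release() {
        if (activeBytes < chunkSize) {
            throw new IllegalStateException("no active buffer to release");
        }
        activeBytes -= chunkSize;
        cachedChunks++;
    }

    long used() { return usedBytes; }

    long active() { return activeBytes; }

    public static void main(String[] args) {
        long mb = 1L << 20;
        ToyPool pool = new ToyPool(4 * mb);
        for (int i = 0; i < 10; i++) pool.allocate(); // load peak
        for (int i = 0; i < 10; i++) pool.release();  // load drops
        System.out.printf("after peak: used=%d MB active=%d MB%n", pool.used() / mb, pool.active() / mb);
        for (int i = 0; i < 3; i++) pool.allocate();  // lighter load reuses cached chunks
        System.out.printf("light load: used=%d MB active=%d MB%n", pool.used() / mb, pool.active() / mb);
    }
}
```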

abhishek-sharma-20 added status/need-triage A new issue that still need to be evaluated as a whole type/bug A general bug labels Dec 22, 2024

violetagg self-assigned this Dec 30, 2024

violetagg added for/user-attention This issue needs user attention (feedback, rework, etc...) and removed status/need-triage A new issue that still need to be evaluated as a whole labels Dec 30, 2024
