Build Efficient HTTP Clients in Java

2025-04-25

#Java #Spring Boot #HTTP Clients #Performance

Introduction

In the world of modern microservices, efficient inter-service communication is crucial.
Most Java-based services rely on HTTP clients like WebClient or RestTemplate to interact with other systems, APIs, or internal services.

However, a surprisingly common anti-pattern is still found in many production systems:
creating a new HTTP client instance for every request.

At first glance, this might seem harmless — but under the hood, it creates serious performance problems:

  • Increased memory usage and garbage collection pressure
  • Unnecessary connection pools per request
  • Higher load on proxies and load balancers
  • Unstable performance under high traffic

This pattern often appears in code written by junior developers or even experienced engineers
who copy snippets from tutorials, blog posts, or ChatGPT — without fully understanding the implications.

Many of these examples demonstrate creating a WebClient or RestTemplate inline,
and that code, unfortunately, ends up in production environments unchanged and unoptimized.


In this article, we'll walk through:

  • Why creating new HTTP clients per request is dangerous
  • How to configure and reuse clients efficiently using a shared ConnectionProvider
  • How to align your client settings with Linux kernel parameters

This level of tuning is often overlooked — but it makes a critical difference in real-world production environments.

Why creating new HTTP clients per request is dangerous

At first glance, instantiating a new WebClient or RestTemplate inside each method may seem like a safe and straightforward approach.

import org.springframework.web.reactive.function.client.WebClient;

public class MyService {

    public String fetchData() {
        // Anti-pattern: a brand-new client (and its underlying resources) on every call
        WebClient client = WebClient.create("https://example.com");
        return client.get()
                     .retrieve()
                     .bodyToMono(String.class)
                     .block();
    }
}

It works, it’s easy to copy-paste, and many online tutorials demonstrate this pattern.
However, in production systems — especially under high load — this practice quickly becomes problematic.

Here are the core reasons why:

1. Excessive object creation leads to memory pressure and GC overhead

Each time a new HTTP client is created, it can bring with it a new underlying connection pool, thread pool, and associated memory structures (what exactly gets allocated depends on the client and its defaults).
When done repeatedly, this results in:

  • Increased memory consumption
  • More frequent garbage collection cycles
  • Higher CPU usage due to object churn

The JVM has to allocate and reclaim memory constantly, which negatively affects throughput and responsiveness — especially if the GC is not tuned for this pattern.


2. Unnecessary connection pools destroy performance

HTTP clients like WebClient or HttpClient typically maintain a connection pool per instance (unless wired to a shared provider).
Creating a new client for each request means:

  • You don’t reuse existing TCP connections
  • You pay the full cost of connection establishment every time (handshakes, SSL/TLS, etc.)
  • Proxies and load balancers are overwhelmed by short-lived connections

This adds significant latency to every request and causes random slowdowns that are hard to debug in distributed environments.
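
To put rough numbers on it: a brand-new TCP connection costs one network round trip for the handshake, and TLS 1.3 adds at least one more (older TLS versions add two). At a 20 ms round-trip time, that is roughly 40 ms of pure setup per request before any data flows; a pooled, reused connection pays that cost once.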


3. No centralized control over timeouts and connection limits

When you create a client inline, it usually uses default settings: default timeouts, unbounded connection lifetimes, no eviction policies, and so on.

Without centralized configuration:

  • Timeouts are inconsistent across services
  • Limits like maxConnections or idleTimeout are not enforced
  • Observability and tuning become almost impossible
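
A shared client removes that inconsistency: you declare the limits once and every request inherits them. Here is a minimal sketch of centralizing timeouts on a Reactor Netty HttpClient (the values are illustrative, not recommendations):

import io.netty.channel.ChannelOption;
import reactor.netty.http.client.HttpClient;

import java.time.Duration;

// Declared once, applied to every request made through this client
HttpClient httpClient = HttpClient.create()
    .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 3_000) // TCP connect timeout
    .responseTimeout(Duration.ofSeconds(5));             // max wait for the response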

4. It doesn’t scale — especially in containerized environments

In Kubernetes, every container runs with limited CPU, memory, and file descriptors.
Spinning up new connection pools for each request quickly leads to:

  • Socket exhaustion
  • Increased file descriptor usage
  • Unexpected latency spikes due to OS-level throttling

The more microservices you run, the worse this gets.
What works in local dev breaks in staging or prod.


In short: creating a new HTTP client per request is like opening a new database connection every time you query the DB.
It might work, but it's absolutely not how you build scalable, production-ready software.

How to configure and reuse clients efficiently using a shared ConnectionProvider

The right way to manage HTTP clients in Spring Boot is to configure them as singleton beans and reuse them across the application.
This ensures consistent connection settings, avoids unnecessary object creation, and allows for connection reuse via a shared connection pool.

The key to this setup is using a shared ConnectionProvider, which underpins the client's connection pool behavior.

A ConnectionProvider is a Reactor Netty abstraction that controls how connections are created, reused, and managed for a reactive HTTP client.
It handles things like:

  • Connection pooling
  • Maximum number of concurrent connections
  • Idle timeout and connection lifetime
  • Background eviction of expired or idle connections

By default, if you don't explicitly configure a ConnectionProvider, each WebClient falls back to defaults you don't control — and, depending on how its underlying HttpClient is built, that can mean a separate connection pool per client.

Instead of letting each WebClient instance create its own connection pool, you should create a single, shared provider and inject it wherever needed.
This allows multiple clients to share the same connection pool, which leads to much better performance and resource usage.


Step 1: Define a shared ConnectionProvider

private final ConnectionProvider connectionProvider = ConnectionProvider.builder("shared-conn")
    .maxConnections(4096)                           // Maximum number of connections in the pool
    .pendingAcquireTimeout(Duration.ofMillis(5000)) // How long to wait for an available connection
    .maxIdleTime(Duration.ofSeconds(10))            // Idle time before a connection is closed
    .maxLifeTime(Duration.ofMinutes(2))             // Total lifetime of a connection
    .evictInBackground(Duration.ofSeconds(120))     // How often background eviction runs
    .build();

This configuration is especially effective in high-load environments like Kubernetes,
where resources are shared and connection overhead must be minimized.

You can of course adjust the limits depending on expected traffic:
for moderately loaded services, maxConnections = 1024 is usually sufficient.

We'll go deeper into connection sizing, resource limits, and how to align these settings with system-level constraints —
in the next section: "How to align your client settings with Linux kernel parameters".


Step 2: Create a reusable WebClient bean

import java.time.Duration;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.client.reactive.ReactorClientHttpConnector;
import org.springframework.web.reactive.function.client.WebClient;

import reactor.netty.http.client.HttpClient;
import reactor.netty.resources.ConnectionProvider;

@Configuration
public class WebClientConfig {

    // One provider for the whole application: every client built on it shares the pool
    private final ConnectionProvider connectionProvider = ConnectionProvider.builder("shared-conn")
        .maxConnections(4096)
        .pendingAcquireTimeout(Duration.ofMillis(5000))
        .maxIdleTime(Duration.ofSeconds(10))
        .maxLifeTime(Duration.ofMinutes(2))
        .evictInBackground(Duration.ofSeconds(120))
        .build();

    @Bean
    public WebClient webClient(WebClient.Builder builder) {
        return builder
            .clientConnector(new ReactorClientHttpConnector(
                HttpClient.create(connectionProvider)))
            .baseUrl("https://example.com")
            .build();
    }
}

This creates a single WebClient instance with a shared connection pool that is reused across your service.
You can also define multiple WebClients with different base URLs but the same connection provider.
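
A minimal sketch of that, inside the same WebClientConfig class (the base URLs here are hypothetical):

@Bean
public WebClient ordersClient(WebClient.Builder builder) {
    return builder
        .clientConnector(new ReactorClientHttpConnector(
            HttpClient.create(connectionProvider)))  // same shared pool
        .baseUrl("https://orders.internal")          // hypothetical internal service
        .build();
}

@Bean
public WebClient paymentsClient(WebClient.Builder builder) {
    return builder
        .clientConnector(new ReactorClientHttpConnector(
            HttpClient.create(connectionProvider)))  // same shared pool
        .baseUrl("https://payments.internal")        // hypothetical internal service
        .build();
}

With multiple beans of the same type, inject them by parameter name or with @Qualifier.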


Step 3: Inject the WebClient in your service

import org.springframework.stereotype.Service;
import org.springframework.web.reactive.function.client.WebClient;

@Service
public class MyService {

    private final WebClient webClient;

    public MyService(WebClient webClient) {
        this.webClient = webClient;
    }

    public String fetchData() {
        return webClient.get()
                        .uri("/data")
                        .retrieve()
                        .bodyToMono(String.class)
                        .block(); // blocking for simplicity; reactive callers can return the Mono directly
    }
}

Now your client is:

  • Configured once
  • Reused across requests
  • Connected to a shared pool with optimized connection reuse
  • Ready to be aligned with the underlying Linux kernel (covered in the next section)

This approach dramatically improves memory efficiency, reduces latency,
and gives you full control over how connections are managed at scale.

The same principle applies to other HTTP clients as well —
for example, RestTemplate can also be configured with a shared HttpComponentsClientHttpRequestFactory that reuses a pooled HttpClient.
No matter what client you use, connection reuse and centralized configuration are essential for production-grade performance.
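
A sketch of that RestTemplate setup, assuming Spring Framework 6 with Apache HttpClient 5 on the classpath (the pool sizes are illustrative):

import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.client5.http.impl.io.PoolingHttpClientConnectionManager;
import org.apache.hc.core5.util.TimeValue;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.client.HttpComponentsClientHttpRequestFactory;
import org.springframework.web.client.RestTemplate;

@Configuration
public class RestTemplateConfig {

    @Bean
    public RestTemplate restTemplate() {
        // One pooled connection manager shared by every call through this RestTemplate
        PoolingHttpClientConnectionManager connectionManager = new PoolingHttpClientConnectionManager();
        connectionManager.setMaxTotal(1024);          // pool-wide cap; keep below ulimit -n
        connectionManager.setDefaultMaxPerRoute(256); // cap per target host

        CloseableHttpClient httpClient = HttpClients.custom()
            .setConnectionManager(connectionManager)
            .evictIdleConnections(TimeValue.ofSeconds(30)) // background cleanup of idle connections
            .build();

        return new RestTemplate(new HttpComponentsClientHttpRequestFactory(httpClient));
    }
}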

How to align your client settings with Linux kernel parameters

Even the most optimized HTTP client configuration can run into problems if it doesn’t respect the limits of the underlying operating system.
When running on Linux — especially inside containers — your connection pool settings should be aligned with kernel-level networking parameters.
Otherwise, your service may appear slow, flaky, or even start dropping requests under load.


Understand the environment your service is running in

Before tuning your client, make sure you understand:

  • How many CPU cores and threads are available
  • What ulimit settings (like nofile) apply to your process
  • Which kernel parameters control connection queues and sockets

You can get this information via:

ulimit -n                            # max open file descriptors
sysctl net.core.somaxconn            # size of the TCP accept backlog
sysctl net.ipv4.tcp_max_syn_backlog  # limit on half-open (SYN) connections

Key Linux kernel parameters to consider

  • net.core.somaxconn
    Caps the accept (listen) backlog queue for TCP sockets (the default is often too low — like 128 on older kernels).
    This matters for incoming connections: if your service accepts a high rate of new connections, increase it (e.g. to 4096 or higher).

  • ulimit -n (nofile)
    Sets the maximum number of open file descriptors per process.
    Every TCP connection consumes a file descriptor — so your connection pool size must not exceed this limit.

  • tcp_max_syn_backlog
    Defines the maximum number of half-open connections (SYN received, handshake not yet completed) the kernel will queue.
    Important for services exposed externally or with high connection churn.
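
If you control the host (or can use a privileged init container), these can be raised persistently. The values below are examples only, not universal recommendations:

# /etc/sysctl.d/99-net-tuning.conf
net.core.somaxconn = 4096
net.ipv4.tcp_max_syn_backlog = 8192

# Apply without a reboot:
sysctl --system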


Aligning client settings with kernel-level limits

If your ConnectionProvider is configured with maxConnections = 4096,
but your Linux ulimit -n is set to 1024, you're going to hit unexpected failures —
like Too many open files, refused connections, or degraded performance under load.

To avoid this:

  • Keep maxConnections comfortably below the file descriptor limit (ulimit -n); the process also needs descriptors for files, inbound sockets, and logs
  • Consider increasing net.core.somaxconn if your service handles spikes or burst traffic
  • Monitor connection metrics — such as active connections, pending acquisitions, and socket errors

Tuning only the Java layer is not enough —
for real production resilience, your configuration must be aligned with how the OS and container runtime behave under pressure.
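
One practical aid for the monitoring point above: Reactor Netty can publish connection pool metrics through Micrometer, assuming Micrometer is on the classpath. A minimal sketch:

ConnectionProvider provider = ConnectionProvider.builder("shared-conn")
    .maxConnections(1024)
    .metrics(true) // publishes pool gauges (active, idle, pending acquisitions) via Micrometer, if present
    .build();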


Production Tip: Calculating client limits in containerized environments

In real-world Kubernetes environments, multiple containers often share the same physical node.
Even if each container is isolated, they all rely on the same Linux kernel, which enforces global limits like file descriptors and connection backlogs.

To avoid resource contention or unexpected socket failures, your maxConnections and related settings must be calculated conservatively based on available host resources.


Step-by-step guide to sizing

  1. Estimate expected concurrent connections per service
    Multiply expected concurrency by number of replicas: connections_per_instance * replica_count

  2. Check the host’s global limits
    Run on the node:

   ulimit -n                       # Max file descriptors per process
   cat /proc/sys/fs/file-max      # Max open files for the whole system
   sysctl net.core.somaxconn      # TCP accept backlog limit

  3. Apply safe per-service limits

    A rough formula:

   maxConnections = min(
       (ulimit -n) * 0.8,
       (cpu_cores_available) * 200
   )
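
For example, on a pod where ulimit -n reports 65536 and 4 CPU cores are available, this gives min(65536 * 0.8, 4 * 200) = min(52428, 800) = 800: the CPU term dominates, and maxConnections lands around 800.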

For most services, maxConnections = 256–1024 is a safe starting point.


General recommendations

  • Avoid scheduling too many pods per node — distribute load horizontally
  • Monitor metrics like active connections, timeouts, and pending acquisitions
  • Prefer lower limits with shared pools instead of oversizing per pod
  • Validate limits via load testing — don’t rely on theory alone

Remember: even the best WebClient setup will fail under pressure if you exceed OS-level boundaries.
Tuning the Java layer is essential, but aligning with the host kernel is what makes it resilient in production.

Conclusion

Building efficient REST clients isn’t just about using the right library — it’s about understanding how your application interacts with the underlying system.

Creating a new WebClient or RestTemplate per request might work in development, but it falls apart at scale.
Without connection reuse and proper tuning, your service will consume more memory, increase latency, and degrade under load — often in ways that are hard to detect until it’s too late.

By using a shared ConnectionProvider, aligning with kernel-level constraints, and setting realistic connection pool limits,
you’ll ensure that your microservices behave predictably — even under production pressure.

Don’t just copy code from StackOverflow or ChatGPT.
Understand how it works, tune it for your environment, and build systems that scale.


If you found this article helpful, feel free to reach out or share it with your team —
especially those who are still instantiating clients inside @Service methods 😉