[BUG] McpStreamableServerSession does not close server-side socket when client disconnects, causing CLOSE-WAIT leak and thread pool exhaustion

## Description

When using the MCP Java SDK's Streamable HTTP server transport (via `spring-ai-starter-mcp-server-webmvc`), the server-side socket is not closed after the client disconnects (sends TCP FIN). Connections remain in `CLOSE-WAIT` state indefinitely, each holding a Tomcat worker thread. Under moderate load, the entire Tomcat thread pool is exhausted within about 30 seconds, leaving the server unresponsive to all new requests, including health checks.

## Environment

- Spring AI: 1.1.4
- MCP Java SDK: (bundled with Spring AI 1.1.4)
- Java: JDK 25
- Server: Tomcat 10.1.34 (WAR deployment via Spring Boot 3.4.1)
- Transport: Streamable HTTP (`spring.ai.mcp.server.protocol=STREAMABLE`)
- OS: Linux (Kubernetes pod, 2 CPU / 4GB RAM)

## Configuration

```yaml
spring:
  ai:
    mcp:
      server:
        type: SYNC
        protocol: STREAMABLE
        streamable-http:
          mcp-endpoint: /mcp
          keep-alive-interval: 0s
```

## Steps to Reproduce

1. Deploy an MCP Server with Streamable HTTP transport (WebMVC, SYNC mode)
2. Have an external MCP client send requests to `POST /mcp` (initialize + tools/call)
3. Client receives the tool response and closes the TCP connection (sends FIN)
4. Repeat with multiple clients (or a single client with retry logic)
5. Observe server-side socket states with `ss -tnp | grep 8080`

## Observed Behavior

After the client closes the connection:

- Server-side socket enters `CLOSE-WAIT` and is **never** closed
- The Tomcat worker thread handling that request is never released back to the pool
- Under load from a single upstream LB doing health-check retries, all 150 Tomcat threads are exhausted within ~30 seconds
- New connections (including K8s readiness probes) queue in the TCP backlog and time out

```
$ ss -tlnp | grep 8080
LISTEN 151    150    *:8080    *:*

$ ss -tnp | grep 8080 | head -5
CLOSE-WAIT 115  0  [::ffff:10.125.87.86]:8080  [::ffff:10.125.87.4]:47140
CLOSE-WAIT 115  0  [::ffff:10.125.87.86]:8080  [::ffff:10.125.87.4]:42756
CLOSE-WAIT 115  0  [::ffff:10.125.87.86]:8080  [::ffff:10.125.87.4]:47446
CLOSE-WAIT 115  0  [::ffff:10.125.87.86]:8080  [::ffff:10.125.87.4]:50138
CLOSE-WAIT 115  0  [::ffff:10.125.87.86]:8080  [::ffff:10.125.87.4]:43160

$ ss -tnp | grep 8080 | wc -l
150

$ curl --max-time 5 http://localhost:8080/health
curl: (28) Failed to connect to localhost port 8080: Connection timed out
```

All 150 connections are from the same upstream IP (load balancer), all in CLOSE-WAIT.

## Expected Behavior

When the client closes the TCP connection (sends FIN), the server should:

1. Detect the peer shutdown (e.g., via an `IOException` on write, or `SocketChannel.read(ByteBuffer)` returning `-1`)
2. Close the SSE stream / Reactor Sink associated with that session
3. Remove the session from the internal session map
4. Close the server-side socket
5. Release the Tomcat thread back to the pool
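
To illustrate step 1 at the TCP level, here is a minimal, self-contained sketch using plain `java.nio` on the loopback interface. It is not MCP SDK or Tomcat code: the class `PeerShutdownProbe` and its methods are hypothetical names for this example, and a servlet container does not normally expose the raw channel to application code. It only shows that a FIN from the peer surfaces as `read(...) == -1`, after which the local side still has to call `close()` to leave `CLOSE-WAIT`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class PeerShutdownProbe {

    /**
     * Performs one blocking read and reports whether it hit end-of-stream.
     * A return value of -1 from read() means the peer has sent FIN.
     * Note: if the peer sent data instead, this consumes one byte of it.
     */
    public static boolean readIndicatesEof(SocketChannel channel) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(1);
        return channel.read(buf) == -1;
    }

    /** Connects a client to a local server, closes the client, and probes the server side. */
    public static boolean demo() {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            // Port 0 lets the OS pick a free ephemeral port.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel accepted = server.accept()) {
                // Client sends FIN; the accepted socket is now half-closed (CLOSE-WAIT).
                client.close();
                // try-with-resources closes 'accepted' afterwards, ending CLOSE-WAIT.
                return readIndicatesEof(accepted);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("peer closed detected: " + demo());
    }
}
```

The important part is the ordering: the server observes end-of-stream and then closes its own end. If the second step never happens, the socket stays in `CLOSE-WAIT`, which matches the `ss` output above.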

## Root Cause Analysis

The MCP Streamable HTTP transport opens an SSE stream for each session. When the client disconnects:
- The server-side `Sinks.Many` has no subscribers, but the stream is never terminated
- The Servlet async context is never completed
- The socket remains open on the server side (only client sent FIN)
- Tomcat's NIO connector holds the thread waiting for the async context to complete

## Impact

- **Severity: Critical** — renders the server completely unresponsive
- Makes rolling deployments impossible in production (new pods get flooded by retrying clients immediately after startup)
- K8s readiness probes fail → pod marked unhealthy → never enters service
- No automatic recovery — requires pod restart AND stopping upstream traffic simultaneously

## Workaround

Set Tomcat connection timeout to force-close idle connections:

```yaml
server:
  tomcat:
    connection-timeout: 30000
    keep-alive-timeout: 30000
    max-connections: 200
    threads:
      max: 200
```

This allows Tomcat to reclaim CLOSE-WAIT connections after 30 seconds, but is not a proper fix — it just limits the damage window.

## Suggested Fix

The Streamable HTTP transport provider should register a listener for client disconnect events. In the WebMVC integration:

```java
// When setting up the async response for SSE:
asyncContext.addListener(new AsyncListener() {
    @Override
    public void onComplete(AsyncEvent event) {
        cleanupSession(sessionId);
    }
    @Override
    public void onTimeout(AsyncEvent event) {
        cleanupSession(sessionId);
    }
    @Override
    public void onError(AsyncEvent event) {
        cleanupSession(sessionId);
    }
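    @Override
    public void onStartAsync(AsyncEvent event) {
        // Nothing to clean up when a new async cycle starts.
    }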
});
```

Or detect write failures when attempting to send data to the client and trigger session cleanup.
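
The write-failure alternative could look roughly like the sketch below. It is a simplified model, not the SDK's actual session handling: `WriteFailureCleanup`, `register`, `send`, and `isActive` are hypothetical names, and a plain `OutputStream` stands in for the SSE response stream. `cleanupSession` mirrors the placeholder used in the listener example above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WriteFailureCleanup {

    // Hypothetical registry: session id -> stream the server writes SSE events to.
    private final Map<String, OutputStream> sessions = new ConcurrentHashMap<>();

    public void register(String sessionId, OutputStream out) {
        sessions.put(sessionId, out);
    }

    public boolean isActive(String sessionId) {
        return sessions.containsKey(sessionId);
    }

    /** Writes one SSE event. On failure the session is removed and closed. */
    public boolean send(String sessionId, String data) {
        OutputStream out = sessions.get(sessionId);
        if (out == null) {
            return false; // unknown or already cleaned-up session
        }
        try {
            out.write(("data: " + data + "\n\n").getBytes(StandardCharsets.UTF_8));
            out.flush();
            return true;
        } catch (IOException e) {
            // The peer is gone (e.g. broken pipe): release everything tied to the session.
            cleanupSession(sessionId);
            return false;
        }
    }

    /** Removes the session from the map and closes its stream; safe to call twice. */
    public void cleanupSession(String sessionId) {
        OutputStream out = sessions.remove(sessionId);
        if (out != null) {
            try {
                out.close();
            } catch (IOException ignored) {
                // The stream is already unusable; nothing more to do.
            }
        }
    }

    public static void main(String[] args) {
        WriteFailureCleanup registry = new WriteFailureCleanup();
        registry.register("alive", new ByteArrayOutputStream());
        // Simulates a disconnected client: every write fails.
        registry.register("gone", new OutputStream() {
            @Override
            public void write(int b) throws IOException {
                throw new IOException("Broken pipe");
            }
        });
        System.out.println("alive sent: " + registry.send("alive", "ping"));
        System.out.println("gone sent: " + registry.send("gone", "ping"));
        System.out.println("gone still registered: " + registry.isActive("gone"));
    }
}
```

One caveat with this approach: the first write after the peer's FIN can still succeed at the socket level, and the failure often shows up only on a later write. Detection therefore depends on the server writing something periodically, so it complements the listener-based cleanup rather than replacing it.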

