Why Stateful Continuations in AI Agents Make Transport Layers Critical: HTTP vs WebSocket Benchmarks

Anirudh Mendiratta explores how stateful continuation can dramatically improve performance for agentic coding workflows compared to stateless HTTP APIs.

Apr. 11, 2026 at 2:57pm

Image: Glowing digital infrastructure powers the next generation of AI-driven coding workflows. (Chicago Today)

Anirudh Mendiratta examines how the transport layer has become a critical concern for AI agents that engage in multi-turn, tool-heavy workflows. He finds that stateless HTTP APIs suffer from linear payload growth as conversation context accumulates, while stateful WebSocket APIs that cache context server-side can reduce client-sent data by over 80% and improve end-to-end execution time by 15-29%. The performance benefits scale with workflow complexity, making transport layer architecture a first-order concern for the next generation of AI coding assistants.

Why it matters

As AI coding agents become a daily workflow tool for many organizations, the transport layer has emerged as a critical factor in their performance. Stateless HTTP APIs struggle to handle the growing context of agentic coding loops, leading to ballooning payloads and latency. Stateful WebSocket designs that cache conversation history on the server can dramatically reduce overhead, especially for complex workflows involving many tool calls. This architectural innovation has implications beyond a single protocol, highlighting the importance of minimizing redundant data transmission for AI agents at scale.
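The payload-growth argument can be made concrete with a toy model (the per-turn sizes below are illustrative assumptions, not the article's measurements): a stateless client must re-upload the whole accumulated context on every turn, so its total upload grows quadratically with turn count, while a stateful client sends only each turn's delta and grows linearly.

```python
# Toy model of client upload cost in a multi-turn agentic task.
# A stateless transport resends the full accumulated context each turn;
# a stateful one sends only the new delta. Byte counts are illustrative.

def client_bytes(turns: int, delta_bytes: int, stateful: bool) -> int:
    """Total bytes the client uploads across a multi-turn task."""
    total = 0
    context = 0
    for _ in range(turns):
        context += delta_bytes                      # context grows every turn
        total += delta_bytes if stateful else context
    return total

stateless = client_bytes(10, 3_000, stateful=False)  # resend everything
stateful = client_bytes(10, 3_000, stateful=True)    # send deltas only
print(stateless, stateful, 1 - stateful / stateless)
```

With a uniform 3 KB of new context per turn over 10 turns, the toy model uploads 165 KB statelessly versus 30 KB statefully, roughly an 82% reduction, in the same ballpark as the measured figures the article reports.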

The details

Anirudh Mendiratta's benchmarks show that WebSocket mode for OpenAI's Responses API can reduce client-sent data per agentic coding task by 82% compared to the standard HTTP API. Because the WebSocket connection maintains server-side state, the client sends only incremental updates rather than the full conversation history on each turn. For a typical 10-turn coding task with GPT-5.4, HTTP sent 176KB per task while WebSocket sent just 32KB, translating to 29% faster end-to-end execution. The gains held across models, with GPT-4o-mini seeing a 15% speedup over WebSocket. The key innovation is the stateful architecture, not the WebSocket protocol itself: any approach that avoids retransmitting the full context can achieve similar gains.

  • In February 2026, OpenAI introduced WebSocket mode for their Responses API.
  • OpenAI reports over 1.6 million weekly active users on Codex as of March 2026.
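The caching pattern behind stateful continuation can be sketched without reference to any real API (the class and method names below are hypothetical, not OpenAI's): the server keeps the conversation history tied to the connection, so each client message carries only the newest turn.

```python
# Minimal sketch of server-side stateful continuation. All names are
# hypothetical; this illustrates the caching pattern the article
# describes, not an actual provider API. The server accumulates the
# conversation, so the client never re-sends earlier turns.

class StatefulSession:
    """Server-side cache: accumulates context across turns of one connection."""

    def __init__(self):
        self.history = []

    def handle(self, delta: dict) -> str:
        self.history.append(delta)          # server stores the full context
        return f"reply to turn {len(self.history)}"

session = StatefulSession()
session.handle({"role": "user", "content": "write a failing test"})
session.handle({"role": "tool", "content": "pytest output ..."})
print(len(session.history))                 # full history lives server-side
```

On a stateless transport, each `handle` call would instead have to carry the entire `history` list from the client, which is exactly the linear per-turn payload growth the benchmarks measure.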

The players

OpenAI

An artificial intelligence research company that has developed influential language models like GPT and Codex, which power many AI coding assistants.

Anirudh Mendiratta

The author of the article, who conducted benchmarks comparing HTTP and WebSocket performance for agentic coding workflows.


What’s next

As more AI coding agents adopt WebSocket or similar stateful continuation approaches, the open questions are whether the performance benefits hold across providers and workloads, and how much they compound into overall developer productivity gains.

The takeaway

The transport layer has emerged as a critical architectural concern for the next generation of AI coding assistants. Stateful continuation designs that avoid redundant context transmission can deliver significant performance improvements, especially for complex agentic workflows involving many tool calls. This innovation highlights the importance of minimizing data overhead as AI agents become ubiquitous in developer workflows.