The cloud was built for a world where an app receives a request, processes it, and sends a response. AI agents do none of those things tidily. They run for hours, maintain state across dozens of tool calls, spawn sub-tasks in parallel, and persist across sessions that resume days later. Two April 2026 announcements — Cloudflare's Agents Week and OpenChoreo's CNCF-accepted 1.0 — mark the point where infrastructure began catching up to that reality.

The Request-Response Cloud Hits Its Limits

The standard cloud compute model — spin up a container, handle a request, tear it down — works well for stateless web applications. It breaks when workloads need to persist across multiple tool invocations, hold conversation context over hours, or manage identity and credentials across different services. These are not edge cases for agents; they are the default behavior.

At Cloudflare's first Agents Week (April 13–20, 2026), CTO Dane Knecht and VP of Product Rita Kozlov framed the scale directly: if even a fraction of the world's knowledge workers each run a few agents in parallel, the infrastructure must handle tens of millions of simultaneous sessions. The one-app-serves-many-users model that cloud platforms were built on does not map to that demand pattern.

The same week, Microsoft shipped Agent Framework 1.0 for .NET and Python, AWS announced new Bedrock AgentCore features, and OpenChoreo — an open-source internal developer platform built on Kubernetes — shipped its 1.0 release as a CNCF Sandbox project. These are not isolated product launches. They are coordinated moves toward a different compute substrate.

Cloudflare's Agent Primitives: Six Products, One Thesis

Cloudflare's announcements during Agents Week were organized around a single thesis: agents need a different kind of compute, and the containerless, serverless platform the company built for Workers over eight years happens to be well-suited for it. Here is what shipped.

| Product | What It Does | Agent Problem Solved |
| --- | --- | --- |
| Sandboxes (GA) | Persistent, isolated environments with a real shell, filesystem, and background processes | Agents need a computer that stays on and picks up where it left off — not a fresh container per invocation |
| Dynamic Workers | Execute AI-generated code in sandboxed isolates with millisecond startup | Code that agents write at runtime needs safe, fast execution without provisioning |
| Artifacts | Git-compatible versioned storage for agent code and data (tens of millions of repos) | Agents that write code need a home for that code, with version control built in |
| Durable Object Facets | Each dynamically-created Worker gets its own isolated SQLite database | Agent-generated apps need stateful persistence without shared database contention |
| Outbound Workers for Sandboxes | Programmable, zero-trust egress proxy injecting credentials without exposing tokens to untrusted code | Agents that call external APIs need identity-aware routing that does not leak credentials |
| Workflows (rearchitected) | Durable, resumable multi-step execution with built-in retry and state management | Agent pipelines that span many steps need fault tolerance without manual orchestration |
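The Workflows row is the easiest to make concrete. The core trick behind durable, resumable execution is to checkpoint each step's result so a restarted run replays completed steps from storage and only re-executes what failed. A minimal sketch of that pattern (the `DurableWorkflow` class and step names are illustrative, not Cloudflare's API):

```python
class DurableWorkflow:
    """Toy durable executor: each step's result is checkpointed, so a
    crashed or evicted run can resume without redoing completed work."""

    def __init__(self, store: dict):
        self.store = store  # stands in for durable storage (a DB row, a KV key)

    def step(self, name, fn, retries=3):
        if name in self.store:             # completed earlier: replay from checkpoint
            return self.store[name]
        last_error = None
        for _ in range(retries):
            try:
                result = fn()
                self.store[name] = result  # checkpoint before moving on
                return result
            except Exception as exc:       # real systems would back off here
                last_error = exc
        raise last_error

def run(store: dict) -> str:
    # A three-step agent pipeline; passing the same store to a second
    # run resumes wherever the first one stopped.
    wf = DurableWorkflow(store)
    wf.step("plan", lambda: ["fetch", "summarize"])
    data = wf.step("fetch", lambda: {"doc": "raw text"})
    return wf.step("summarize", lambda: "summary of " + data["doc"])
```

If the process dies after `fetch`, a new invocation with the same store skips `plan` and `fetch` entirely, which is the fault-tolerance-without-manual-orchestration property the table describes.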

What connects these is not a buzzword. It is a shift from ephemeral, stateless compute to persistent, identity-aware, stateful compute — exactly what agents require. Sandboxes give agents a machine that stays on. Artifacts give them a Git-native workspace. Durable Object Facets give each dynamically-spawned app its own database. Outbound Workers give them zero-trust networking. The pieces fit together as a stack, not a bundle.
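The Outbound Workers idea, injecting credentials at the egress boundary so that untrusted agent code never holds a token, can be sketched independently of Cloudflare's API. Everything below (the allowlist, the secret store, the header names) is hypothetical:

```python
# Hypothetical zero-trust egress proxy: agent code sends requests without
# credentials; the proxy matches the destination against an allowlist and
# injects the secret only on the trusted side of the boundary.

SECRETS = {"api.github.com": "ghp_example_token"}   # never visible to agent code
ALLOWED = {"api.github.com", "api.stripe.com"}

def egress(agent_id: str, host: str, headers: dict) -> dict:
    """Return the headers the proxy would actually send upstream."""
    if host not in ALLOWED:
        raise PermissionError(f"{agent_id}: egress to {host} denied")
    out = dict(headers)
    if host in SECRETS:
        # Credential injected here, at the proxy, not inside the sandbox.
        out["Authorization"] = f"Bearer {SECRETS[host]}"
    out["X-Agent-Id"] = agent_id  # identity-aware: every call is attributable
    return out
```

The point of the pattern is that a compromised or hallucinating agent can at worst ask the proxy to call an allowed host; it cannot exfiltrate a token it never received.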

OpenChoreo 1.0: Platform Engineering Meets the Agentic Era

While Cloudflare rebuilds the compute layer, OpenChoreo addresses a different problem: how teams manage and deploy agent workloads on Kubernetes. OpenChoreo, the open-source evolution of WSO2's Choreo SaaS platform, shipped its 1.0 release in April 2026, having been accepted into the CNCF Sandbox in January 2026 — less than a year after its first commit. The project now counts 785 contributors across 240 organizations and 694 GitHub stars.

OpenChoreo's architecture is organized around four planes, each with a specific responsibility:

  • Experience Plane: A Backstage-powered developer portal, CLI, GitOps interface, and — notably — MCP servers that let AI agents interact with the platform as first-class participants. Agents can create and deploy components, manage configurations, and reason about platform state.
  • Control Plane: Translates high-level abstractions (components, APIs, environments, pipelines, namespaces) into Kubernetes manifests. Programmable without forking or writing low-level controllers.
  • Data Plane: Enforces isolation between projects, traffic policies, and security boundaries. These are guarantees the platform makes, not configurations teams must manage.
  • Observability Plane: Feeds metrics, logs, and traces back through the same abstractions developers already understand, not raw Kubernetes primitives.
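The Control Plane's job, turning a high-level component description into the Kubernetes objects it implies, is the classic platform-abstraction move. A toy version of that translation, using a made-up component schema rather than OpenChoreo's actual CRDs, shows the shape of it:

```python
def component_to_manifests(component: dict) -> list[dict]:
    """Expand a platform-level component into Kubernetes manifests.
    The input schema here is invented for illustration, not OpenChoreo's."""
    name = component["name"]
    image = component["image"]
    port = component.get("port", 8080)
    deployment = {
        "apiVersion": "apps/v1", "kind": "Deployment",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "replicas": component.get("replicas", 1),
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {"containers": [{"name": name, "image": image,
                                         "ports": [{"containerPort": port}]}]},
            },
        },
    }
    service = {
        "apiVersion": "v1", "kind": "Service",
        "metadata": {"name": name},
        "spec": {"selector": {"app": name},
                 "ports": [{"port": 80, "targetPort": port}]},
    }
    return [deployment, service]
```

A developer (or an agent, via the MCP interface the Experience Plane exposes) declares one component; the platform decides how many low-level objects that implies and keeps them reconciled.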

The key differentiator is the built-in SRE agent, which uses LLMs to analyze logs, metrics, and traces and surface likely root causes. This is not an add-on to an existing platform. The platform was designed to treat AI agents — both the ones developers use and the ones it runs — as first-class participants from day one.

Sameera Jayasoma, Distinguished Engineer at WSO2, put it plainly: "OpenChoreo is being built to treat AI agents as first-class participants." The project enters a space alongside KubriX (launched August 2025), which takes a similar approach of assembling established tools (Argo CD, Backstage, Kyverno) into a ready-to-use internal developer platform. The competition signals that the market for Kubernetes-based IDPs with agent integration is real, not hypothetical.

The Problem These Products Are Solving

Both Cloudflare and OpenChoreo are responding to production failures that existing infrastructure was not designed to handle. A VentureBeat investigation published April 25, 2026, documented four recurring failure patterns in enterprise AI deployments that standard monitoring does not catch:

  1. Context degradation: The model reasons over stale or incomplete retrieval data. The output looks polished, but the grounding is gone. Detection happens weeks later — through downstream consequences, not system alerts.
  2. Orchestration drift: Agentic pipelines fail not because one component breaks, but because the sequence of interactions between retrieval, inference, tool use, and action diverges under real-world load.
  3. Silent partial failure: One component underperforms without crossing an alert threshold. The system degrades behaviorally before it degrades operationally. By the time the signal reaches a postmortem, the erosion has been happening for weeks.
  4. Automation blast radius: A single misinterpretation propagates across steps, systems, and business decisions. The cost is organizational, not just technical.

These failures share a common root: the infrastructure was built to answer the question "is the service up?" rather than "is the service behaving correctly?" That gap is precisely what the agentic cloud products are trying to close — by giving agents persistent state (Sandboxes, Durable Objects), identity-aware networking (Outbound Workers), and resumable, fault-tolerant execution (Workflows).
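"Silent partial failure" in particular can be caught without new vendor tooling: compare a behavioral metric against its own rolling baseline instead of a fixed alert threshold. A minimal sketch, with window sizes and limits invented for illustration:

```python
from collections import deque
from statistics import mean, stdev

class DriftDetector:
    """Flags when a metric drifts from its own recent baseline, even if it
    never crosses a static alert threshold (silent partial failure)."""

    def __init__(self, window: int = 50, z_limit: float = 3.0):
        self.baseline = deque(maxlen=window)
        self.z_limit = z_limit

    def observe(self, value: float) -> bool:
        drifted = False
        if len(self.baseline) >= 10:  # need enough history to compare against
            mu, sigma = mean(self.baseline), stdev(self.baseline)
            if sigma > 0 and abs(value - mu) / sigma > self.z_limit:
                drifted = True
        if not drifted:
            self.baseline.append(value)  # only healthy points update the baseline
        return drifted
```

Feeding it something behavioral, such as groundedness scores or tool-call success rates rather than CPU or uptime, is what shifts the question from "is the service up?" toward "is it behaving correctly?"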

Exceptions: When the Old Cloud Still Wins

Not every workload needs agentic infrastructure. Three scenarios where the traditional model remains the better fit:

| Scenario | Why Traditional Cloud Suffices | Where the Line Blurs |
| --- | --- | --- |
| Single-turn API calls (chatbots, summarization, translation) | Stateless request-response is exactly right; no persistence needed | If the chatbot accumulates context over a session, it drifts into agentic territory |
| Batch inference jobs (nightly scoring, bulk classification) | Finite duration, no interactivity, no tool calls; containers handle this well | If the batch pipeline spawns sub-agents or calls external tools, it needs orchestration |
| Small teams, early experiments | Overhead of agentic infrastructure (MCP servers, SRE agents, identity proxies) is not justified at low scale | Once agents touch production data or external APIs, credential management becomes necessary regardless of scale |

The agentic cloud is not a replacement for existing compute. It is a complementary layer that activates when workloads cross from stateless request handling into persistent, multi-step, identity-bound execution.

Decision Framework: Which Agentic Infrastructure Fits

| Decision Point | Cloudflare Agent Cloud | OpenChoreo + Kubernetes | Traditional Cloud (Containers/Serverless) |
| --- | --- | --- | --- |
| Team already on Kubernetes | Adopt as supplementary edge layer | Strong fit — drop-in IDP with agent integration | Keep for existing workloads |
| Agent needs full OS (install packages, run terminal) | Sandboxes provide exactly this | Limited — relies on standard Kubernetes pods | Requires VMs or privileged containers |
| Agent generates code dynamically | Dynamic Workers + Artifacts (Git-compatible storage) | CI Plane with Argo Workflows handles build/test | No native solution — custom CI/CD needed |
| Agent calls external APIs with credentials | Outbound Workers (zero-trust egress proxy) | Service mesh + secrets management (Kyverno, Vault) | Standard secrets injection, limited runtime isolation |
| Need multi-step, fault-tolerant pipelines | Workflows (durable, resumable) | Argo Workflows + GitOps reconciliation | Custom orchestration (Step Functions, Temporal) |
| Platform team wants Backstage integration | Not natively provided | Built-in (OpenChoreo console = Backstage) | Manual Backstage setup and maintenance |
| Want SRE agent for root cause analysis | Not provided out of the box | Built-in LLM-powered SRE agent | Manual observability stack setup |
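The table reduces to a small routing function. The sketch below encodes the rows directly; the trait names and the check ordering are this article's framing, not any vendor's:

```python
def recommend_platform(workload: dict) -> str:
    """Map workload traits onto the decision table. Earlier checks win;
    the trait keys are invented for this sketch."""
    if workload.get("needs_full_os"):
        return "cloudflare"      # Sandboxes: real shell, filesystem, processes
    if workload.get("generates_code"):
        return "cloudflare"      # Dynamic Workers + Artifacts
    if workload.get("on_kubernetes") and (
            workload.get("wants_backstage") or workload.get("wants_sre_agent")):
        return "openchoreo"      # drop-in IDP, Backstage console, SRE agent
    if workload.get("stateless_single_turn"):
        return "traditional"     # request-response is exactly right
    if workload.get("multi_step_pipeline"):
        return "cloudflare"      # durable, resumable Workflows
    return "traditional"         # default: no agentic traits detected
```

The default branch matters: per the exceptions section above, a workload with none of the agentic traits should stay on the compute model it already has.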

What Comes Next

This is not a prediction. It is an observation of what has already shipped. Cloudflare's Agents Week produced six production primitives for agent workloads. OpenChoreo entered CNCF with a platform that treats agents as first-class participants. Microsoft, AWS, and Google all released agent framework updates in the same month. The infrastructure layer is being rewired.

Three things to track over the next quarter:

  • Behavioral telemetry: The VentureBeat piece identified a monitoring gap: current tools answer "is the service up?" but not "is it behaving correctly?" Expect new observability products that track context integrity, orchestration fidelity, and semantic drift as first-class metrics.
  • Agent identity and credential management: Cloudflare's Outbound Workers solve this at the proxy layer. Expect the service-mesh ecosystem (Istio, Linkerd) to add similar identity-aware egress capabilities.
  • OpenChoreo vs. KubriX convergence: Both are Kubernetes IDPs with agent integration. Both use Backstage. The market is too small for two dominant players. Watch for consolidation or clear differentiation.

The cloud was designed for applications that serve users. Agents are not users, and they do not behave like applications. The infrastructure that recognizes that difference — with persistent compute, identity-aware networking, and built-in state management — is the infrastructure that will run the next generation of production workloads.