WebMCP Doesn’t Look Revolutionary. That’s Why It Might Be.

By Ellen Björnberg
February 26, 2026

Introduction

In February, Google quietly introduced an early preview of WebMCP in Chrome Canary - a new browser capability that allows websites to expose structured actions directly to AI agents. It wasn’t a flashy launch, and most people won’t encounter it unless they deliberately enable preview flags and experiment in controlled environments.

At Predli, we spent time testing these early implementations to understand whether this was just another experimental feature or something more structural. The demos themselves are modest. A conversational interface. A few exposed tools. A response that looks similar to what we’ve seen from tool-enabled assistants before.

It doesn’t feel like a breakthrough. But the longer you experiment with it, the clearer it becomes that the interface is not the story. The shift is happening underneath - in how the browser exposes capabilities, how agents discover them, and how interaction moves from interpreting pages to executing declared actions. Importantly, this shift does not necessarily require rebuilding existing sites, but can begin by exposing the actions that already exist.

The web was built for humans. WebMCP introduces a frontend for agents.

For decades, the web has been designed around human interaction. HTML structures content for readability, CSS shapes visual hierarchy, and JavaScript enables interactions that match how people navigate and interpret information. Even accessibility standards, while essential, are still framed around human needs.

When machines needed to interact with web services, APIs emerged as a parallel interface. They made systems integrable, but they were never designed as a native interface for autonomous agents. APIs assume prior knowledge of endpoints, authentication flows, documentation, and developer intent. In practice, they are a developer interface, not an agent interface.

It’s tempting to frame WebMCP as just another API standard, but that misses what’s changing. APIs expose endpoints; capability layers expose affordances.

As a result, AI systems attempting to operate on the web have often relied on brittle strategies: scraping HTML, parsing unstructured text, inferring possible actions from context, or depending on hardcoded integrations. This has led to an ecosystem of one-off integrations rather than a web that agents can reliably operate on.

WebMCP introduces a different possibility. Instead of forcing agents to interpret pages or rely on bespoke integrations, environments can expose capabilities in a structured way that software can understand and use. Rather than documenting endpoints, they declare affordances.

One way to understand this shift is to think of WebMCP as a parallel frontend - not for humans, but for agents. Where a human sees buttons and forms, an agent sees declared actions and schemas. The UI remains for us, but a structured interaction layer begins to exist alongside it.

This is subtle in demos but meaningful in architecture. Agents no longer need to guess what’s possible or map natural language to undocumented endpoints. They interact with a purpose-built interface that encodes what can be done and how.

Agents have been guessing

Today, most agentic browsing still relies on DOM parsing, screenshots, or simulated clicks. It works, but it is fragile. A small UI change can break an automation flow, and agents spend significant compute trying to interpret interfaces that were never designed for them.

By allowing websites to define actions explicitly, WebMCP replaces guesswork with a contract. Instead of asking an agent to figure out which button submits a form, the page can declare the action and its parameters. The interaction becomes less about interpreting pixels and more about executing defined capabilities. That reliability, of course, depends on schemas being maintained with the same rigor as APIs.
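To make the contract idea concrete, here is a minimal sketch of what declaring an action might look like. The actual WebMCP API surface is still in flux, so the registry below is a local stand-in rather than the real browser interface; the tool shape (name, description, input schema, execute function) follows the general MCP-style convention, and `submit_contact_form` is a hypothetical example.

```javascript
// Stand-in for the browser's tool registry; the real WebMCP surface in
// the Chrome Canary preview may name things differently.
const toolRegistry = new Map();

function registerTool(tool) {
  toolRegistry.set(tool.name, tool);
}

// Instead of an agent hunting for the right submit button, the page
// declares the action and its parameters up front.
registerTool({
  name: "submit_contact_form",
  description: "Submit the contact form with a name and message.",
  inputSchema: {
    type: "object",
    properties: {
      name: { type: "string" },
      message: { type: "string" },
    },
    required: ["name", "message"],
  },
  // Delegates to the same logic the human-facing form already uses.
  execute({ name, message }) {
    return { status: "sent", to: "support", from: name };
  },
});

// An agent invokes the declared capability rather than simulating clicks.
const tool = toolRegistry.get("submit_contact_form");
const result = tool.execute({ name: "Ada", message: "Hello" });
```

The point is not the specific names but the shape: a declared action, a schema constraining its inputs, and an execute path that bypasses pixel interpretation entirely.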

This does not remove the human interface. It adds a parallel layer - one that software can use without pretending to be human.

Token efficiency

One of the more immediate technical implications of this shift is its effect on token usage. When agents interact with traditional web pages, they often process large amounts of irrelevant or redundant information: HTML markup, navigation elements, verbose field names, and natural-language instructions. Even structured APIs frequently use payloads designed for developer readability rather than model efficiency.

In agent loops, where context is reconstructed across multiple steps, this overhead compounds. The model repeatedly ingests large contexts, infers structure from loosely defined inputs, and generates verbose outputs that must be parsed downstream. This increases cost, latency, and instability.

Capability-driven interaction changes this dynamic. When an environment exposes machine-readable schemas, the agent no longer needs to interpret full pages or infer structure from natural language. Instead of reading a page to determine what actions are possible, it receives a compact description of available capabilities and their parameters.

In multi-step workflows, this can significantly reduce token usage by eliminating redundant context reconstruction. The gain is not only cost-related. Lower token load improves latency, reduces context overflow risk, and makes planning more stable. The agent spends less time reconstructing the world and more time acting within it.
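The size difference is easy to see side by side. The snippet below contrasts the markup an agent would otherwise ingest with a compact capability description it can plan against; the numbers are illustrative rather than a benchmark, the `search_flights` capability is hypothetical, and the character-based token estimate is a crude proxy for a real tokenizer.

```javascript
// What an agent ingests when it must read the page itself.
const pageHtml = `
  <nav>…</nav>
  <div class="hero"><h1>Book a flight</h1><p>Find the best fares…</p></div>
  <form id="search" onsubmit="return doSearch()">
    <label for="from">From</label><input id="from" name="from">
    <label for="to">To</label><input id="to" name="to">
    <label for="date">Date</label><input id="date" name="date" type="date">
    <button type="submit">Search flights</button>
  </form>
  <footer>…</footer>`;

// What it ingests when the same action is exposed as a capability.
const capability = {
  name: "search_flights",
  description: "Search flights between two airports on a date.",
  inputSchema: {
    type: "object",
    properties: {
      from: { type: "string" },
      to: { type: "string" },
      date: { type: "string", format: "date" },
    },
    required: ["from", "to", "date"],
  },
};

// Crude proxy for token count (~4 characters per token); real tokenizers differ.
const approxTokens = (s) => Math.ceil(s.length / 4);
console.log("page markup ~", approxTokens(pageHtml), "tokens");
console.log("capability  ~", approxTokens(JSON.stringify(capability)), "tokens");
```

And this toy page is tiny; on a production page with navigation, scripts, and styling, the gap grows by orders of magnitude, and it is paid again on every step of an agent loop.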

Our experiment

To understand how much effort agent-readiness actually requires, we built a small demo page and experimented with capability exposure directly in the browser. The page itself was intentionally simple - a mock interface resembling a typical operational workflow - and when we first loaded it, WebMCP detected no usable tools. From the agent’s perspective, it was just another web page: fully functional for humans, but opaque to structured interaction.

Rather than modifying the backend or rebuilding the page, we registered a set of tools directly in the browser using the Model Context API. By declaring actions, defining input schemas, and linking them to existing frontend functions, we were able to expose capabilities that the agent could discover and invoke. The interface did not change visually, and no new UI elements were introduced. Yet from the agent’s perspective, the environment had shifted from something to be interpreted to something it could operate within.

What made this particularly striking was how little was required. The actions we registered were straightforward - creating a campaign, notifying a sales team, resolving a signal - and each was described through a schema that defined the expected parameters. Once registered, the agent could invoke these actions deterministically instead of attempting to infer intent from layout or text. Even without deep technical expertise, it was possible to expose structured actions at runtime, which suggests that agent-readiness may not require a full platform rewrite. In many cases, it may begin with making existing actions explicit.
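The registration step can be sketched roughly as follows. The `modelContext` object here is a local stand-in for the browser's preview API, whose exact names and shapes may differ from this sketch, and the three frontend functions are simplified placeholders for the demo page's existing handlers.

```javascript
// Stand-in for the browser's Model Context API; the preview surface in
// Chrome Canary may expose registration and invocation differently.
const modelContext = {
  tools: new Map(),
  registerTool(tool) { this.tools.set(tool.name, tool); },
  invoke(name, params) { return this.tools.get(name).execute(params); },
};

// Simplified placeholders for the frontend functions the page already had.
// The registrations below only make them discoverable; the UI is unchanged.
function createCampaign(name, budget) { return { id: "c-1", name, budget }; }
function notifySalesTeam(message) { return { notified: true, message }; }
function resolveSignal(signalId) { return { resolved: signalId }; }

modelContext.registerTool({
  name: "create_campaign",
  description: "Create a marketing campaign with a name and budget.",
  inputSchema: {
    type: "object",
    properties: { name: { type: "string" }, budget: { type: "number" } },
    required: ["name", "budget"],
  },
  execute: ({ name, budget }) => createCampaign(name, budget),
});

modelContext.registerTool({
  name: "notify_sales_team",
  description: "Send a notification to the sales team.",
  inputSchema: {
    type: "object",
    properties: { message: { type: "string" } },
    required: ["message"],
  },
  execute: ({ message }) => notifySalesTeam(message),
});

modelContext.registerTool({
  name: "resolve_signal",
  description: "Mark an operational signal as resolved.",
  inputSchema: {
    type: "object",
    properties: { signalId: { type: "string" } },
    required: ["signalId"],
  },
  execute: ({ signalId }) => resolveSignal(signalId),
});

// The agent can now discover and invoke these deterministically.
const campaign = modelContext.invoke("create_campaign", {
  name: "Q2 Launch",
  budget: 5000,
});
```

Nothing in the page's backend or visual layer changed; the work was purely declarative, which is why it took so little effort.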

This small experiment changed how we think about adoption. The transition from human-only interfaces to agent-usable environments does not have to be disruptive. It can start incrementally, by exposing the capabilities that already exist.

Where this matters

While still experimental, the implications are easiest to understand through familiar workflows rather than abstract scenarios. Many of the tasks we automate today rely on brittle scripts that simulate human interaction - clicking buttons, parsing layouts, and navigating interfaces that were never designed for machines. When those interfaces change, the automation breaks.

Consider booking flows. Automating reservations today often involves fragile DOM selectors or visual automation tools. If booking actions were exposed as structured capabilities, an agent could interact with them directly, reducing failure points when interfaces evolve. The interaction would no longer depend on where a button is located, but on whether the action is declared.

The same applies to e-commerce. Agents currently scrape product pages, interpret availability, and navigate checkout flows designed for humans. If product queries, configuration options, and purchase actions were exposed as capabilities, agents could operate within defined constraints rather than attempting to reconstruct intent from markup. The result would be more reliable interactions and fewer edge cases caused by layout changes.

In many scenarios, the goal is not full automation but smoother handoffs. An agent might gather options, prefill forms, or prepare a transaction, while a human reviews and confirms the final step. This makes the handoff more reliable and far less sensitive to interface changes.
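A gated handoff can be sketched in a few lines. Everything here is illustrative rather than part of any WebMCP specification: the agent-facing capability only produces a pending draft, and a separate, human-triggered step is the only path to execution.

```javascript
// The agent-facing capability: prepares a transaction but executes nothing.
function prepareBooking(params) {
  return { status: "pending_approval", draft: params };
}

// The human-gated step: only an explicit approval turns the draft into
// a real booking; anything else cancels it.
function confirmBooking(pending, humanApproved) {
  if (!humanApproved) {
    return { status: "cancelled" };
  }
  return { status: "booked", ...pending.draft };
}

const pending = prepareBooking({ flight: "AA123", date: "2026-03-01", seats: 2 });
const outcome = confirmBooking(pending, true); // the human clicks "Confirm"
```

Splitting prepare from confirm keeps the agent useful while leaving the irreversible step behind a human decision, which is often the adoption path enterprises actually want.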

Customer portals offer another example. Tasks such as retrieving invoices, updating details, or managing subscriptions are typically buried in layered interfaces. Exposing these actions in a structured way would allow agents to perform them reliably without simulating navigation. This does not introduce new functionality; it makes existing functionality operable.

These examples are not speculative. They describe workflows that already exist, but are currently mediated through interfaces designed exclusively for humans. Capability exposure simply allows those same workflows to be used in a different way.

Reliability through structure

Traditional APIs are static and require documentation. Capability layers are discoverable at runtime. That difference may seem small, but it changes how agents operate.

Instead of relying on a predefined toolset, agents can reason over available capabilities in a given environment. Tool selection becomes a planning task rather than a configuration task. As schemas constrain inputs and outputs, planning becomes more reliable and execution more predictable. Validation can happen before actions are taken, and failure modes become clearer.
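Pre-execution validation is where schemas pay off. The hand-rolled checker below is a minimal sketch covering only required fields and primitive types; a real implementation would use a full JSON Schema validator, but the principle is the same: a malformed call is rejected before any action runs.

```javascript
// Minimal schema check: required fields and primitive types only.
// A production system would use a complete JSON Schema validator.
function validate(schema, params) {
  const errors = [];
  for (const field of schema.required ?? []) {
    if (!(field in params)) errors.push(`missing required field: ${field}`);
  }
  for (const [key, value] of Object.entries(params)) {
    const expected = schema.properties?.[key]?.type;
    if (expected && typeof value !== expected) {
      errors.push(`field ${key}: expected ${expected}, got ${typeof value}`);
    }
  }
  return errors;
}

const schema = {
  type: "object",
  properties: { amount: { type: "number" }, currency: { type: "string" } },
  required: ["amount", "currency"],
};

const bad = validate(schema, { amount: "ten" });                // rejected up front
const good = validate(schema, { amount: 10, currency: "USD" }); // passes
```

Because the check happens before execution, failure modes surface as explicit, auditable validation errors rather than half-completed actions - exactly the property enterprise deployments need.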

These properties are particularly important in enterprise contexts, where uncontrolled automation is not acceptable and actions must be auditable and policy-compliant. They also enable controlled collaboration between humans and agents, where actions can be prepared by software but gated by human review or approval.

Still experimental

At the moment, WebMCP is still experimental. Meaningful testing requires Chrome Canary and preview setups, and the current ecosystem is small. Most examples resemble controlled demos rather than real-world deployments.

That context matters. The tooling is rough, interoperability is limited, and capability exposure is far from standardized.

Still, even in this early state, the direction is clear. The browser starts to look less like a document viewer and more like an execution environment. Instead of just rendering pages, it begins to expose what can be done on a site in a structured way that software can interpret.

The demos may be modest. The interaction model underneath is not.

Beyond SEO

We’ve spent the last two decades optimizing the web so humans can find and understand pages. Search Engine Optimization focuses on visibility, relevance, crawlability, and structure - all in service of human discovery.

Efforts like llms.txt focus on making information easier for models to access and understand. WebMCP adds a different layer, enabling agents to take action once that information has been found.

As agents become more capable and more embedded in workflows, another layer of optimization begins to matter: whether an environment can be reliably used once it has been found.

This includes exposing machine-readable capabilities, providing clear schemas, minimizing ambiguity in actions, and designing interactions that are predictable and token-efficient. A page that ranks highly for humans but is opaque to agents may become less useful in agent-mediated workflows. This doesn’t replace SEO. It expands the surface from discovery to operability.

The shift is architectural

WebMCP does not look revolutionary in a demo. The interface is familiar. The flows resemble what we’ve seen before. But infrastructure rarely announces itself with spectacle.

What changes here is the substrate. When environments expose capabilities instead of forcing agents to infer them, when interactions become structured and token-efficient, and when the browser begins to mediate not just rendering but execution, the constraints that have limited AI systems begin to loosen.

The web has long been navigable by humans and integrable by developers. It is now starting to become usable by agents.

That shift is easy to miss at first glance. It becomes harder to ignore the longer you experiment with it.
