Protocol · 8 min read · February 27, 2026

The Agent Protocol Layer

A protocol specification for agent-to-world interaction.

Version 0.1 (Draft) · RFC

1. The Problem

AI agents can think. They can plan, reason, and make decisions. What they cannot do is act.

Every agent today hits the same wall: the last mile between intent and execution. An agent can decide to book a flight, but it has no standardized way to prove who it represents, pay for the ticket, or receive the confirmation. So developers build bespoke integrations for each system, each payment provider, each communication channel. Every connection is custom. Nothing is reusable.

MCP solved part of this. It gave agents a standard way to access tools and data sources. But MCP operates within the model's context: it connects agents to information. What's missing is the layer that connects agents to the real world, the layer where money moves, messages get sent, and actions have consequences.

MCP is the nervous system. The Agent Protocol Layer is the hands.

┌─────────────────────────────────────────────────┐
│                   AI Agent                       │
│         (Claude, GPT, Gemini, local)             │
└──────────────────────┬──────────────────────────┘
                       │
              ┌────────▼────────┐
              │       MCP       │  ← Tools, data, context
              └────────┬────────┘
                       │
          ┌────────────▼────────────┐
          │  Agent Protocol Layer   │  ← Identity, payments,
          │   (this spec)           │    permissions, comms,
          └────────────┬────────────┘    actions
                       │
    ┌──────────┬───────┼───────┬──────────┐
    ▼          ▼       ▼       ▼          ▼
  Banks    Email    APIs    Forms    Databases
         SMS/WhatsApp    Gov Systems

2. The Five Primitives

Every agent-to-world interaction reduces to five primitives. They are orthogonal (each solves one problem), composable (they combine for complex workflows), and universal (they apply regardless of agent framework or provider).

2.1 Identity

Problem: How does a system know that an agent acts on behalf of a specific person or organization?

An agent presents a signed identity token that binds three things: the principal (the human or org), the agent (the software instance), and the scope (what the agent is authorized to do). Systems verify the token without contacting the principal.

AgentIdentityToken {
  principal:   "did:web:federicodeponte.com"
  agent:       "did:agent:openchat-v4:instance-8f3a"
  scope:       ["payments:read", "payments:write:max_500_eur"]
  issued:      "2026-02-27T10:00:00Z"
  expires:     "2026-02-27T22:00:00Z"
  signature:   <principal_private_key_signature>
}

Built on DIDs (Decentralized Identifiers, W3C standard). No centralized identity provider required.
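
A minimal sketch of how a receiving system might verify such a token offline. This draft does not fix a serialization or signature scheme, so the sketch makes two assumptions: the token is serialized as canonical JSON, and HMAC-SHA256 from the Python standard library stands in for the principal's signature. A real deployment would more likely verify an asymmetric signature against the public key published in the principal's DID document. The names `sign_token` and `verify_token` are hypothetical.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone

def _canonical(token: dict) -> bytes:
    """Serialize every field except the signature in a stable key order."""
    body = {k: v for k, v in token.items() if k != "signature"}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()

def _parse(ts: str) -> datetime:
    """Parse the 'Z'-suffixed UTC timestamps used in the token example."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def sign_token(token: dict, key: bytes) -> dict:
    """Return a copy of the token with a signature attached (HMAC stand-in)."""
    sig = hmac.new(key, _canonical(token), hashlib.sha256).hexdigest()
    return {**token, "signature": sig}

def verify_token(token: dict, key: bytes, required_scope: str, now: datetime) -> bool:
    """Check signature, validity window, and scope without contacting the principal."""
    expected = hmac.new(key, _canonical(token), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, token.get("signature", "")):
        return False  # tampered with, or signed by a different key
    if not (_parse(token["issued"]) <= now < _parse(token["expires"])):
        return False  # not yet valid, or expired
    return required_scope in token["scope"]

KEY = b"demo-principal-key"  # hypothetical key, for illustration only
token = sign_token({
    "principal": "did:web:federicodeponte.com",
    "agent": "did:agent:openchat-v4:instance-8f3a",
    "scope": ["payments:read", "payments:write:max_500_eur"],
    "issued": "2026-02-27T10:00:00Z",
    "expires": "2026-02-27T22:00:00Z",
}, KEY)
noon = datetime(2026, 2, 27, 12, 0, tzinfo=timezone.utc)
print(verify_token(token, KEY, "payments:read", noon))  # True
```

Because the signature covers every field, widening the scope or extending the expiry after issuance invalidates the token.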

2.2 Permissions

Problem: What can an agent do, and what is off-limits?

Permissions are declarative, granular, and revocable in real time. A principal defines a permission policy. The agent carries it. Any receiving system can verify it.

PermissionPolicy {
  allow: [
    "email:send:to=*@company.com",
    "calendar:read",
    "payments:execute:max_amount=200,currency=EUR"
  ]
  deny: [
    "email:send:to=*@competitor.com",
    "data:delete:*"
  ]
  requires_confirmation: [
    "payments:execute:amount>100"
  ]
  ttl: "8h"
}

The requires_confirmation field introduces a human-in-the-loop checkpoint. Agents operate autonomously within bounds, but flag high-stakes actions for approval.
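
One way a receiving system might evaluate a request against such a policy is sketched below. The rule grammar is not defined in this draft, so the sketch assumes the following: a request names its action as `domain:verb`; constraints are comma-separated; `key=value` is a glob match; `max_<field>=N` is an upper bound on `<field>`; `key>N` is a lower bound; deny rules win over allow rules; and anything not explicitly allowed is denied. The function names are hypothetical.

```python
from fnmatch import fnmatchcase

def _matches(rule: str, request: dict) -> bool:
    """True if the request falls under this rule (assumed grammar, see lead-in)."""
    parts = rule.split(":", 2)
    if ":".join(parts[:2]) != request["action"]:
        return False
    if len(parts) == 2 or parts[2] == "*":
        return True  # no constraints, or a wildcard covering everything
    for constraint in parts[2].split(","):
        if ">" in constraint:
            field, limit = constraint.split(">", 1)
            if not float(request.get(field, 0)) > float(limit):
                return False
        else:
            field, expected = constraint.split("=", 1)
            if field.startswith("max_"):
                if float(request.get(field[4:], 0)) > float(expected):
                    return False
            elif not fnmatchcase(str(request.get(field, "")), expected):
                return False
    return True

def evaluate(policy: dict, request: dict) -> str:
    """Return 'deny', 'confirm', or 'allow' for a single request."""
    if any(_matches(r, request) for r in policy.get("deny", [])):
        return "deny"
    if not any(_matches(r, request) for r in policy.get("allow", [])):
        return "deny"  # assumption: default-deny
    if any(_matches(r, request) for r in policy.get("requires_confirmation", [])):
        return "confirm"
    return "allow"

policy = {
    "allow": [
        "email:send:to=*@company.com",
        "calendar:read",
        "payments:execute:max_amount=200,currency=EUR",
    ],
    "deny": ["email:send:to=*@competitor.com", "data:delete:*"],
    "requires_confirmation": ["payments:execute:amount>100"],
}
print(evaluate(policy, {"action": "payments:execute", "amount": 150, "currency": "EUR"}))  # confirm
```

Under these assumptions a €49.99 payment is allowed outright, a €150 payment is held for confirmation, and a €250 payment is denied because no allow rule covers it.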

2.3 Payments

Problem: How does an agent spend money within defined limits?

Agents need to transact, not just request transactions. The protocol defines a payment intent that any payment processor can fulfill, without the agent ever touching raw credentials.

PaymentIntent {
  from:        <agent_identity_token>
  to:          "merchant:stripe:acct_1234"
  amount:      49.99
  currency:    "EUR"
  purpose:     "Flight booking LH1234, 2026-03-15"
  policy_ref:  <permission_policy_hash>
  max_amount:  200.00
}

The payment processor verifies the agent's identity, checks the permission policy, and executes if everything passes. The agent never sees card numbers, bank accounts, or credentials. Payment processors implement a standard interface; agents don't need to know which one is behind it.
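
The processor-side checks can be sketched as follows, leaving identity verification aside. Two things here are assumptions rather than part of the draft: `policy_ref` is taken to be a SHA-256 hash over a canonical JSON serialization of the policy, and amounts are carried as decimal strings so that currency values are compared exactly instead of as binary floats. `policy_hash` and `check_intent` are hypothetical names.

```python
import hashlib
import json
from decimal import Decimal

def policy_hash(policy: dict) -> str:
    """Hash a policy over a stable serialization (assumed format for policy_ref)."""
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode()).hexdigest()

def check_intent(intent: dict, policy: dict) -> list:
    """Return the reasons to reject the intent; an empty list means it may proceed."""
    problems = []
    if intent["policy_ref"] != policy_hash(policy):
        problems.append("policy_ref does not match the presented policy")
    amount = Decimal(intent["amount"])
    if amount <= 0:
        problems.append("amount must be positive")
    if amount > Decimal(intent["max_amount"]):
        problems.append("amount exceeds max_amount")
    return problems

policy = {"allow": ["payments:execute:max_amount=200,currency=EUR"], "ttl": "8h"}
intent = {
    "to": "merchant:stripe:acct_1234",
    "amount": "49.99",
    "currency": "EUR",
    "purpose": "Flight booking LH1234, 2026-03-15",
    "policy_ref": policy_hash(policy),
    "max_amount": "200.00",
}
print(check_intent(intent, policy))  # []
```

Binding the intent to a policy hash means the processor can detect an agent presenting a different, more permissive policy than the one the intent was created under.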

2.4 Communication

Problem: How does an agent send and receive messages across channels?

An agent shouldn't need a different integration for email, SMS, WhatsApp, and USSD. The protocol defines a unified message envelope. Channel adapters handle delivery.

MessageEnvelope {
  from:        <agent_identity_token>
  to:          "channel:email:user@example.com"
  body:        "Your flight LH1234 is confirmed for March 15."
  reply_to:    "channel:webhook:https://agent.example.com/inbox"
  channel_preference: ["whatsapp", "sms", "email"]
  metadata: {
    thread_id:  "booking-8f3a-flight"
    priority:   "normal"
  }
}

The channel_preference array lets agents try the best channel first and fall back gracefully. The reply_to field lets any system send structured responses back to the agent, regardless of the originating channel.
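
The fallback behaviour might look like the sketch below. It assumes each channel adapter is a callable that takes the envelope and returns `True` once the channel has accepted the message; that adapter interface, and the `deliver` function itself, are hypothetical.

```python
from typing import Callable, Dict, Optional

Adapter = Callable[[dict], bool]  # assumed interface: True means "accepted"

def deliver(envelope: dict, adapters: Dict[str, Adapter]) -> Optional[str]:
    """Try channels in preference order; return the one that worked, or None."""
    for channel in envelope["channel_preference"]:
        adapter = adapters.get(channel)
        if adapter is None:
            continue  # no adapter installed for this channel
        try:
            if adapter(envelope):
                return channel
        except Exception:
            continue  # treat an adapter error as a failed attempt and fall back
    return None

def whatsapp_down(envelope: dict) -> bool:
    raise ConnectionError("gateway unreachable")

def sms_rejected(envelope: dict) -> bool:
    return False

def email_ok(envelope: dict) -> bool:
    return True

envelope = {
    "to": "channel:email:user@example.com",
    "body": "Your flight LH1234 is confirmed for March 15.",
    "channel_preference": ["whatsapp", "sms", "email"],
}
adapters = {"whatsapp": whatsapp_down, "sms": sms_rejected, "email": email_ok}
print(deliver(envelope, adapters))  # email
```

Returning the channel that succeeded lets the caller record it in the audit trail; returning `None` leaves the retry decision to the agent.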

2.5 Actions

Problem: How does an agent interact with systems that weren't built for agents?

Most of the world's systems expose forms, not APIs. Government portals, university applications, insurance claims, booking systems. Actions separate intent from execution: the agent declares what it wants to do, and a registered action adapter for that target handles how.

ActionRequest {
  target:      "https://portal.gov.example/visa-application"
  type:        "form_submission"
  identity:    <agent_identity_token>
  fields: {
    "full_name":     "Federico De Ponte",
    "passport_no":   "<from_secure_vault>",
    "purpose":       "Business"
  }
  attachments: ["doc:vault:passport_scan_2026"]
  confirm_before_submit: true
}

Action adapters are the protocol's extension point. Anyone can register an adapter for a target system. The adapter registry maps target URLs to known adapters, with versioning and trust scores:

AdapterRegistry {
  "portal.gov.example/visa-application": {
    adapter:    "did:adapter:gov-forms-eu:v2.1"
    method:     "browser_automation"
    trust:      "verified"      // audited; tiers: unverified, community, verified
    last_tested: "2026-02-20"
  }
}

This is the hardest primitive. Browser automation is fragile; forms change without notice; CAPTCHAs exist. The protocol doesn't pretend to solve all of this. What it does is standardize the interface so that solutions are reusable. An adapter that handles German government portals today works for every agent tomorrow, not just the one that built it.
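
A registry lookup could be sketched like this. The draft shows registry keys as host plus path, so the sketch assumes a target URL is normalized by dropping the scheme, query string, and trailing slash before lookup. That normalization rule and the function names are assumptions.

```python
from typing import Optional
from urllib.parse import urlsplit

REGISTRY = {
    "portal.gov.example/visa-application": {
        "adapter": "did:adapter:gov-forms-eu:v2.1",
        "method": "browser_automation",
        "trust": "verified",
        "last_tested": "2026-02-20",
    }
}

def registry_key(target_url: str) -> str:
    """Reduce a target URL to 'host/path' (assumed normalization)."""
    parts = urlsplit(target_url)
    # .hostname is already lowercased and excludes any port or credentials
    return (parts.hostname or "") + parts.path.rstrip("/")

def resolve_adapter(target_url: str, registry: dict = REGISTRY) -> Optional[dict]:
    """Return the registry entry for a target, or None if no adapter is known."""
    return registry.get(registry_key(target_url))

entry = resolve_adapter("https://portal.gov.example/visa-application")
print(entry["adapter"] if entry else "no adapter registered")
```

An agent that gets `None` back knows before attempting anything that no shared adapter exists for the target, which is the point at which bespoke work would begin.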

3. How They Compose

The primitives are designed to combine. Here's a complete workflow.

You tell your agent to book the cheapest Berlin-to-Lisbon flight on March 15. The agent presents its identity token to the airline's booking system. The airline verifies that this agent represents you and is authorized to act. The agent finds a €49.99 fare, checks its own permission policy (payments up to €200, confirmation required only above €100), and submits the booking through a registered adapter that handles the airline's form: seat selection, passenger details, terms acceptance. The airline returns a payment intent. The agent's payment gateway fulfills it through Stripe; the agent never sees your card number. The transaction is logged with a policy reference for your audit trail. Thirty seconds later, you get a WhatsApp message with your booking confirmation and receipt.

No custom integration. The airline accepted a standard identity token. The agent used a registered adapter. The payment went through a standard gateway. Five primitives, one workflow, every step auditable.

Same protocol, different world

A farmer in Kenya has an agent negotiate seed prices across three suppliers.

Identity: cooperative membership credential
Permissions: max 50,000 KES per transaction
Action: adapter for supplier's USSD ordering system
Payment: M-Pesa, not Stripe
Comms: confirmation via SMS, not WhatsApp

Different adapters. Same five primitives. Same protocol.

The farmer didn't need someone to build an app for her. She needed the protocol to exist.

4. Security Model

A protocol that moves money and submits government forms on someone's behalf requires an explicit threat model.

Token security. Identity tokens are short-lived (default: 12 hours), scoped to specific capabilities, and signed with the principal's private key. A stolen token is usable only within its scope and TTL. Tokens also carry a jti (unique ID, not shown in the example in 2.1) for revocation; principals can revoke any active token through their identity provider.

Replay protection. Every request includes a monotonic nonce and timestamp. Receiving systems reject duplicate nonces and requests older than a configurable window (default: 5 minutes). Payment intents additionally require idempotency keys, so a replayed payment request returns the original result, not a duplicate charge.
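
A minimal in-memory sketch of both mechanisms, assuming timestamps are Unix seconds and that a single process holds the state. A real receiver would need shared, persistent storage. `ReplayGuard` and `IdempotentPayments` are hypothetical names.

```python
class ReplayGuard:
    """Rejects duplicate nonces and requests outside the freshness window."""

    def __init__(self, window_seconds: float = 300.0):  # default: 5 minutes
        self.window = window_seconds
        self._seen = {}  # nonce -> timestamp of the request that used it

    def accept(self, nonce: str, timestamp: float, now: float) -> bool:
        # Forget nonces from requests that have aged out of the window;
        # a replay of those would fail the timestamp check below anyway.
        self._seen = {n: t for n, t in self._seen.items() if now - t <= self.window}
        if abs(now - timestamp) > self.window:
            return False  # too old, or too far in the future
        if nonce in self._seen:
            return False  # duplicate
        self._seen[nonce] = timestamp
        return True

class IdempotentPayments:
    """Returns the original result when a payment request is replayed."""

    def __init__(self, charge):
        self._charge = charge  # callable that actually moves the money
        self._results = {}     # idempotency key -> first result

    def execute(self, idempotency_key: str, intent: dict):
        if idempotency_key not in self._results:
            self._results[idempotency_key] = self._charge(intent)
        return self._results[idempotency_key]

charges = []

def fake_charge(intent: dict) -> dict:
    charges.append(intent["amount"])
    return {"status": "succeeded", "charge_no": len(charges)}

guard = ReplayGuard()
payments = IdempotentPayments(fake_charge)
first = payments.execute("booking-8f3a", {"amount": "49.99"})
again = payments.execute("booking-8f3a", {"amount": "49.99"})
print(first == again, len(charges))  # True 1
```

The two checks cover different failures: the nonce guard stops an attacker resending a captured request, while the idempotency key protects against an honest client retrying after a timeout.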

Adapter trust. Action adapters run code against third-party websites. The adapter registry uses a trust model with three tiers: unverified (anyone can publish), community (peer-reviewed, open source), verified (audited by a recognized security firm). Agents can set minimum trust levels in their permission policy. High-stakes actions (payments, legal submissions) default to verified adapters only.
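
The tier check can be sketched as an ordered comparison. The action-type labels in `HIGH_STAKES` are hypothetical, and the sketch treats the verified-only default for high-stakes actions as a floor that a policy can raise but not lower, which is one possible reading of the paragraph above.

```python
TRUST_TIERS = ("unverified", "community", "verified")  # weakest to strongest
HIGH_STAKES = {"payment", "legal_submission"}  # hypothetical action-type labels

def required_trust(action_type: str, policy_minimum: str = "unverified") -> str:
    """Pick the stricter of the policy's minimum and the high-stakes floor."""
    floor = "verified" if action_type in HIGH_STAKES else "unverified"
    return max(policy_minimum, floor, key=TRUST_TIERS.index)

def adapter_allowed(adapter_trust: str, action_type: str,
                    policy_minimum: str = "unverified") -> bool:
    """True if the adapter's tier meets the requirement for this action."""
    needed = required_trust(action_type, policy_minimum)
    return TRUST_TIERS.index(adapter_trust) >= TRUST_TIERS.index(needed)

print(adapter_allowed("community", "form_submission", policy_minimum="community"))  # True
print(adapter_allowed("community", "payment"))  # False
```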

Credential isolation. Agents never access raw credentials. Payments go through tokenized gateways. Form fields marked as sensitive (passport numbers, SSNs) are retrieved at execution time from an encrypted vault; the agent sees a reference, not the value. The adapter receives the decrypted value in a sandboxed runtime that prevents exfiltration.

Audit trail. Every primitive interaction produces a signed log entry: who (identity), what (action), when (timestamp), result (success/failure), and policy_ref (which permission authorized it). The principal can audit their agent's full activity history.
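
A signed log entry could be produced along these lines. As an assumption for the sake of a standard-library-only sketch, HMAC-SHA256 stands in for whatever signature scheme an implementation would actually use, and the entry is serialized as canonical JSON. The function names and example values are hypothetical.

```python
import hashlib
import hmac
import json

def _payload(entry: dict) -> bytes:
    """Serialize the entry without its signature, in a stable key order."""
    body = {k: v for k, v in entry.items() if k != "signature"}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()

def make_log_entry(who: str, what: str, when: str, result: str,
                   policy_ref: str, key: bytes) -> dict:
    """Build one audit record and sign it (HMAC stand-in)."""
    entry = {"who": who, "what": what, "when": when,
             "result": result, "policy_ref": policy_ref}
    entry["signature"] = hmac.new(key, _payload(entry), hashlib.sha256).hexdigest()
    return entry

def verify_log_entry(entry: dict, key: bytes) -> bool:
    """True if the entry is unchanged since it was signed."""
    expected = hmac.new(key, _payload(entry), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, entry.get("signature", ""))

LOG_KEY = b"demo-log-key"  # hypothetical key, for illustration only
entry = make_log_entry(
    who="did:agent:openchat-v4:instance-8f3a",
    what="payments:execute",
    when="2026-02-27T10:05:00Z",
    result="success",
    policy_ref="sha256:example",  # placeholder value
    key=LOG_KEY,
)
print(verify_log_entry(entry, LOG_KEY))  # True
```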

5. What Exists vs. What's Missing

Capability	Exists Today	Gap
Agent-to-tool	MCP, function calling	Solved. Not the problem.
Agent identity	Nothing standard	No way for an agent to prove delegation
Agent payments	Manual API keys per provider	No delegated, scoped, auditable payment
Agent comms	Twilio, SendGrid (human-oriented)	No unified agent-native message envelope
Agent permissions	Per-platform, bespoke	No portable, verifiable permission policy
Agent actions	Browser automation (fragile, siloed)	No shared adapter registry or standard intent format

The tools exist in pieces. Email APIs, payment processors, identity providers. What's missing is the protocol that unifies them under a single, agent-native interface. Today's integrations are human-oriented: they assume a person is clicking buttons, entering credentials, confirming transactions. The Agent Protocol Layer makes these interactions first-class for software agents.

6. Why Open

The usual argument for open protocols is interoperability. That's true but insufficient. The real argument is about who agents can serve.

A proprietary protocol has a business model problem: it needs to charge for access. That means gatekeeper economics. Every adapter, every payment rail, every communication channel becomes a revenue opportunity for the protocol owner. The result: agents work well for enterprises that can afford the connectors, and poorly for everyone else.

This matters because agent capabilities are about to become a commodity. The models are converging. Within two years, the differentiator between agent platforms will not be intelligence but reach: how many systems can your agent interact with? A proprietary protocol fragments reach by design. An open protocol maximizes it.

The protocol spec and reference implementations are Apache 2.0. The business model sits above the protocol: managed infrastructure for teams that don't want to run their own identity resolvers and payment gateways, plus enterprise compliance tooling. The protocol gets adopted because it's free. The company gets paid because operating infrastructure is work most teams don't want to do.

7. Next Steps

This is a draft specification. The primitives are defined; the implementations are not.

What's needed

1. Reference implementation of Identity + Permissions (the foundation everything else builds on)
2. Adapter registry with at least three verified adapters as proof of concept
3. Payment gateway for two providers (Stripe + one mobile money provider)
4. Communication gateway for email + one messaging platform
5. SDK in Python and TypeScript
6. Security audit of the identity, permission, and credential isolation model

The protocol works when the farmer in Kenya and the enterprise in Frankfurt use the same five primitives. That only happens if it's open, and if it's built by more than one person.

Licensed under Apache 2.0
