The Agent Protocol Layer
A protocol specification for agent-to-world interaction.
Version 0.1 (Draft) · RFC
1. The Problem
AI agents can think. They can plan, reason, and make decisions. What they cannot do is act.
Every agent today hits the same wall: the last mile between intent and execution. An agent can decide to book a flight, but it has no standardized way to prove who it represents, pay for the ticket, or receive the confirmation. So developers build bespoke integrations for each system, each payment provider, each communication channel. Every connection is custom. Nothing is reusable.
MCP solved part of this. It gave agents a standard way to access tools and data sources. But MCP operates within the model's context: it connects agents to information. What's missing is the layer that connects agents to the real world, the layer where money moves, messages get sent, and actions have consequences.
MCP is the nervous system. The Agent Protocol Layer is the hands.
┌─────────────────────────────────────────────────┐
│                    AI Agent                     │
│          (Claude, GPT, Gemini, local)           │
└──────────────────────┬──────────────────────────┘
                       │
              ┌────────▼────────┐
              │       MCP       │  ← Tools, data, context
              └────────┬────────┘
                       │
          ┌────────────▼────────────┐
          │  Agent Protocol Layer   │  ← Identity, payments,
          │       (this spec)       │    permissions, comms,
          └────────────┬────────────┘    actions
                       │
    ┌──────────┬───────┼───────┬──────────┐
    ▼          ▼       ▼       ▼          ▼
  Banks      Email    APIs   Forms    Databases
         SMS/WhatsApp     Gov Systems
2. The Five Primitives
Every agent-to-world interaction reduces to five primitives. They are orthogonal (each solves one problem), composable (they combine for complex workflows), and universal (they apply regardless of agent framework or provider).
2.1 Identity
Problem: How does a system know that an agent acts on behalf of a specific person or organization?
An agent presents a signed identity token that binds three things: the principal (the human or org), the agent (the software instance), and the scope (what the agent is authorized to do). Systems verify the token without contacting the principal.
AgentIdentityToken {
  principal: "did:web:federicodeponte.com"
  agent: "did:agent:openchat-v4:instance-8f3a"
  scope: ["payments:read", "payments:write:max_500_eur"]
  issued: "2026-02-27T10:00:00Z"
  expires: "2026-02-27T22:00:00Z"
  signature: <principal_private_key_signature>
}
Built on DIDs (Decentralized Identifiers, W3C standard). No centralized identity provider required.
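A minimal sketch of how a receiving system might check such a token offline: signature first, then the validity window, then scope. The spec calls for a signature made with the principal's private key; to stay runnable on the standard library alone, this sketch substitutes HMAC-SHA256 with a shared demo key, which is not an asymmetric signature and is only a stand-in. The helper names (`canonical_bytes`, `sign_token`, `verify_token`) and the canonical JSON encoding are assumptions, not part of the spec.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone

# ASSUMPTION: HMAC with a shared key stands in for the principal's asymmetric
# signature so that the sketch needs no third-party crypto library.
DEMO_KEY = b"demo-key-not-for-production"


def canonical_bytes(token: dict) -> bytes:
    """Serialize every field except the signature in a stable order."""
    unsigned = {k: v for k, v in token.items() if k != "signature"}
    return json.dumps(unsigned, sort_keys=True, separators=(",", ":")).encode()


def sign_token(token: dict, key: bytes = DEMO_KEY) -> dict:
    """Return a copy of the token with a signature over its other fields."""
    sig = hmac.new(key, canonical_bytes(token), hashlib.sha256).hexdigest()
    return {**token, "signature": sig}


def parse_utc(value: str) -> datetime:
    """Parse timestamps of the form used in the token, e.g. 2026-02-27T10:00:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def verify_token(token: dict, required_scope: str, now: datetime,
                 key: bytes = DEMO_KEY) -> bool:
    """Check signature, validity window, and scope without contacting the principal."""
    expected = hmac.new(key, canonical_bytes(token), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, token.get("signature", "")):
        return False
    if not (parse_utc(token["issued"]) <= now < parse_utc(token["expires"])):
        return False
    return required_scope in token["scope"]


token = sign_token({
    "principal": "did:web:federicodeponte.com",
    "agent": "did:agent:openchat-v4:instance-8f3a",
    "scope": ["payments:read", "payments:write:max_500_eur"],
    "issued": "2026-02-27T10:00:00Z",
    "expires": "2026-02-27T22:00:00Z",
})
noon = datetime(2026, 2, 27, 12, 0, tzinfo=timezone.utc)
accepted = verify_token(token, "payments:read", noon)
```

Because the signature covers the scope list, a token whose scope was widened after signing fails the first check rather than the last one.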
2.2 Permissions
Problem: What can an agent do, and what is off-limits?
Permissions are declarative, granular, and revocable in real-time. A principal defines a permission policy. The agent carries it. Any receiving system can verify it.
PermissionPolicy {
  allow: [
    "email:send:to=*@company.com",
    "calendar:read",
    "payments:execute:max_amount=200,currency=EUR"
  ]
  deny: [
    "email:send:to=*@competitor.com",
    "data:delete:*"
  ]
  requires_confirmation: [
    "payments:execute:amount>100"
  ]
  ttl: "8h"
}
The requires_confirmation field introduces a human-in-the-loop checkpoint. Agents operate autonomously within bounds, but flag high-stakes actions for approval.
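One way a receiving system could evaluate such a policy is sketched below. The spec does not define the rule grammar or the precedence between lists, so the sketch makes three assumptions: a rule is `resource:verb` followed by optional comma-separated constraints, deny rules win over allow rules, and anything not explicitly allowed is denied. `rule_matches` and `evaluate` are hypothetical names.

```python
from decimal import Decimal
from fnmatch import fnmatchcase

POLICY = {
    "allow": [
        "email:send:to=*@company.com",
        "calendar:read",
        "payments:execute:max_amount=200,currency=EUR",
    ],
    "deny": ["email:send:to=*@competitor.com", "data:delete:*"],
    "requires_confirmation": ["payments:execute:amount>100"],
}


def rule_matches(rule: str, capability: str, attrs: dict) -> bool:
    """True if the rule names this capability and all its constraints hold."""
    parts = rule.split(":", 2)
    if ":".join(parts[:2]) != capability:
        return False
    if len(parts) == 2 or parts[2] == "*":
        return True  # no constraints, or an explicit wildcard
    for constraint in parts[2].split(","):
        if ">" in constraint:
            # e.g. amount>100
            name, limit = constraint.split(">", 1)
            ok = name in attrs and Decimal(str(attrs[name])) > Decimal(limit)
        else:
            name, pattern = constraint.split("=", 1)
            if name.startswith("max_"):
                # e.g. max_amount=200 caps the "amount" attribute
                field = name[len("max_"):]
                ok = field in attrs and Decimal(str(attrs[field])) <= Decimal(pattern)
            else:
                # e.g. to=*@company.com is a glob over the attribute value
                ok = name in attrs and fnmatchcase(str(attrs[name]), pattern)
        if not ok:
            return False
    return True


def evaluate(policy: dict, capability: str, attrs: dict) -> str:
    """Return 'deny', 'confirm', or 'allow' (ASSUMPTION: deny wins, default deny)."""
    if any(rule_matches(r, capability, attrs) for r in policy["deny"]):
        return "deny"
    if not any(rule_matches(r, capability, attrs) for r in policy["allow"]):
        return "deny"
    if any(rule_matches(r, capability, attrs) for r in policy["requires_confirmation"]):
        return "confirm"
    return "allow"


small = evaluate(POLICY, "payments:execute", {"amount": "49.99", "currency": "EUR"})
large = evaluate(POLICY, "payments:execute", {"amount": "150", "currency": "EUR"})
```

Under these assumptions a 49.99 EUR payment proceeds without a prompt, a 150 EUR payment is held for confirmation, and anything above 200 EUR is denied because no allow rule covers it.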
2.3 Payments
Problem: How does an agent spend money within defined limits?
Agents need to transact, not just request transactions. The protocol defines a payment intent that any payment processor can fulfill, without the agent ever touching raw credentials.
PaymentIntent {
  from: <agent_identity_token>
  to: "merchant:stripe:acct_1234"
  amount: 49.99
  currency: "EUR"
  purpose: "Flight booking LH1234, 2026-03-15"
  policy_ref: <permission_policy_hash>
  max_amount: 200.00
}
The payment processor verifies the agent's identity, checks the permission policy, and executes if everything passes. The agent never sees card numbers, bank accounts, or credentials. Payment processors implement a standard interface; agents don't need to know which one is behind it.
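The "standard interface" idea can be sketched as a gateway that routes an intent to whichever processor the `to` address names. Everything here is illustrative: `PaymentProcessor`, `Gateway`, and `FakeStripe` are hypothetical names, the fake processor only returns a receipt string, and identity and policy verification are left out to keep the focus on routing and the intent's own `max_amount` ceiling.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, Protocol


@dataclass(frozen=True)
class PaymentIntent:
    to: str              # e.g. "merchant:stripe:acct_1234"
    amount: Decimal      # Decimal avoids binary floating-point rounding on money
    currency: str
    purpose: str
    max_amount: Decimal


class PaymentProcessor(Protocol):
    """ASSUMPTION: the one method every processor adapter would expose."""

    def execute(self, intent: PaymentIntent) -> str: ...


class FakeStripe:
    """Stand-in processor; a real adapter would call the provider's API."""

    def execute(self, intent: PaymentIntent) -> str:
        return f"stripe-receipt:{intent.amount} {intent.currency}"


class Gateway:
    def __init__(self, processors: Dict[str, PaymentProcessor]) -> None:
        self._processors = processors

    def pay(self, intent: PaymentIntent) -> str:
        # Enforce the ceiling carried by the intent before touching a processor.
        if intent.amount <= 0 or intent.amount > intent.max_amount:
            raise ValueError("amount is outside the intent's allowed range")
        # "merchant:<provider>:<account>" -> pick the processor by provider name.
        _, provider, _ = intent.to.split(":", 2)
        processor = self._processors.get(provider)
        if processor is None:
            raise LookupError(f"no processor registered for {provider!r}")
        return processor.execute(intent)


gateway = Gateway({"stripe": FakeStripe()})
intent = PaymentIntent(
    to="merchant:stripe:acct_1234",
    amount=Decimal("49.99"),
    currency="EUR",
    purpose="Flight booking LH1234, 2026-03-15",
    max_amount=Decimal("200.00"),
)
receipt = gateway.pay(intent)
```

The agent only ever builds a `PaymentIntent`; swapping Stripe for a mobile money provider would mean registering another processor under a different key, with no change on the agent side.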
2.4 Communication
Problem: How does an agent send and receive messages across channels?
An agent shouldn't need a different integration for email, SMS, WhatsApp, and USSD. The protocol defines a unified message envelope. Channel adapters handle delivery.
MessageEnvelope {
  from: <agent_identity_token>
  to: "channel:email:user@example.com"
  body: "Your flight LH1234 is confirmed for March 15."
  reply_to: "channel:webhook:https://agent.example.com/inbox"
  channel_preference: ["whatsapp", "sms", "email"]
  metadata: {
    thread_id: "booking-8f3a-flight"
    priority: "normal"
  }
}
The channel_preference array lets agents try the best channel first and fall back gracefully. The reply_to field lets any system send structured responses back to the agent, regardless of the originating channel.
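The fallback behaviour can be sketched as a loop over `channel_preference`. The adapter signature here is an assumption: each channel adapter is modelled as a callable that takes the whole envelope and returns whether delivery succeeded. How an adapter maps the `to` address onto its own channel is left open by the spec and is not modelled. The three fake adapters exist only to exercise the loop.

```python
from typing import Callable, Dict, Optional

# ASSUMPTION: an adapter takes the envelope and reports delivery success.
ChannelAdapter = Callable[[dict], bool]


def deliver(envelope: dict, adapters: Dict[str, ChannelAdapter]) -> Optional[str]:
    """Try channels in preference order; return the one that delivered, or None."""
    for channel in envelope["channel_preference"]:
        adapter = adapters.get(channel)
        if adapter is None:
            continue  # no adapter installed for this channel
        try:
            if adapter(envelope):
                return channel
        except ConnectionError:
            continue  # channel unreachable, fall through to the next one
    return None


def whatsapp_down(envelope: dict) -> bool:
    raise ConnectionError("whatsapp gateway unreachable")


def sms_rejected(envelope: dict) -> bool:
    return False


def email_ok(envelope: dict) -> bool:
    return True


envelope = {
    "to": "channel:email:user@example.com",
    "body": "Your flight LH1234 is confirmed for March 15.",
    "channel_preference": ["whatsapp", "sms", "email"],
}
used = deliver(
    envelope,
    {"whatsapp": whatsapp_down, "sms": sms_rejected, "email": email_ok},
)
```

Returning the channel that actually delivered gives the caller something concrete to record, for example in the audit trail.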
2.5 Actions
Problem: How does an agent interact with systems that weren't built for agents?
Most of the world's systems expose forms, not APIs. Government portals, university applications, insurance claims, booking systems. Actions separate intent from execution: the agent declares what it wants to do, and a registered action adapter for that target handles how.
ActionRequest {
  target: "https://portal.gov.example/visa-application"
  type: "form_submission"
  identity: <agent_identity_token>
  fields: {
    "full_name": "Federico De Ponte",
    "passport_no": "<from_secure_vault>",
    "purpose": "Business"
  }
  attachments: ["doc:vault:passport_scan_2026"]
  confirm_before_submit: true
}
Action adapters are the protocol's extension point. Anyone can register an adapter for a target system. The adapter registry maps target URLs to known adapters, with versioning and trust scores:
AdapterRegistry {
  "portal.gov.example/visa-application": {
    adapter: "did:adapter:gov-forms-eu:v2.1"
    method: "browser_automation"
    trust: "verified" // highest tier: independently audited (see Section 4)
    last_tested: "2026-02-20"
  }
}
This is the hardest primitive. Browser automation is fragile; forms change without notice; CAPTCHAs exist. The protocol doesn't pretend to solve all of this. What it does is standardize the interface so that solutions are reusable. An adapter that handles German government portals today works for every agent tomorrow, not just the one that built it.
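A registry lookup might look like the sketch below: normalize the target URL to the `host/path` key used in the registry entry, then refuse adapters below a minimum trust tier. The tier ordering (unverified, community, verified) follows the Security Model section; the key normalization, the `resolve_adapter` name, and the second registry entry are assumptions added for illustration.

```python
from typing import Optional
from urllib.parse import urlsplit

# Lowest to highest trust, following the three tiers in the Security Model.
TRUST_ORDER = ["unverified", "community", "verified"]

REGISTRY = {
    "portal.gov.example/visa-application": {
        "adapter": "did:adapter:gov-forms-eu:v2.1",
        "method": "browser_automation",
        "trust": "verified",
        "last_tested": "2026-02-20",
    },
    # Hypothetical second entry, used only to show the trust filter.
    "forms.example.org/newsletter": {
        "adapter": "did:adapter:example-newsletter:v0.3",
        "method": "browser_automation",
        "trust": "community",
        "last_tested": "2026-02-20",
    },
}


def resolve_adapter(target_url: str, min_trust: str = "verified",
                    registry: dict = REGISTRY) -> Optional[dict]:
    """Return the registry entry for a target, or None if absent or under-trusted."""
    parts = urlsplit(target_url)
    # ASSUMPTION: registry keys are "<host><path>" without scheme or trailing slash.
    key = parts.netloc + parts.path.rstrip("/")
    entry = registry.get(key)
    if entry is None:
        return None
    if TRUST_ORDER.index(entry["trust"]) < TRUST_ORDER.index(min_trust):
        return None
    return entry


visa = resolve_adapter("https://portal.gov.example/visa-application")
newsletter = resolve_adapter("https://forms.example.org/newsletter")
```

With the default minimum of `verified`, the visa adapter resolves and the community-tier newsletter adapter does not; passing `min_trust="community"` would admit both.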
3. How They Compose
The primitives are designed to combine. Here's a complete workflow.
You tell your agent to book the cheapest Berlin-to-Lisbon flight on March 15. The agent presents its identity token to the airline's booking system. The airline verifies that this agent represents you and is authorized to act. The agent finds a €49.99 fare, checks its own permission policy (max €500, no confirmation needed under €200), and submits the booking through a registered adapter that handles the airline's form: seat selection, passenger details, and terms acceptance. The airline returns a payment intent. The agent's payment gateway fulfills it through Stripe; the agent never sees your card number. The transaction is logged with a policy reference for your audit trail. Thirty seconds later, you get a WhatsApp message with your booking confirmation and receipt.
No custom integration. The airline accepted a standard identity token. The agent used a registered adapter. The payment went through a standard gateway. Five primitives, one workflow, every step auditable.
Same protocol, different world
A farmer in Kenya has an agent negotiate seed prices across three suppliers.
- Identity: cooperative membership credential
- Permissions: max 50,000 KES per transaction
- Action: adapter for supplier's USSD ordering system
- Payment: M-Pesa, not Stripe
- Comms: confirmation via SMS, not WhatsApp
Different adapters. Same five primitives. Same protocol.
The farmer didn't need someone to build an app for her. She needed the protocol to exist.
4. Security Model
A protocol that moves money and submits government forms on someone's behalf requires an explicit threat model.
Token security. Identity tokens are short-lived (default: 12 hours), scoped to specific capabilities, and signed with the principal's private key. A stolen token is usable only within its scope and TTL. Tokens include a jti (unique ID, omitted from the example in 2.1) for revocation; principals can revoke any active token through their identity provider.
Replay protection. Every request includes a monotonic nonce and timestamp. Receiving systems reject duplicate nonces and requests older than a configurable window (default: 5 minutes). Payment intents additionally require idempotency keys, so a replayed payment request returns the original result, not a duplicate charge.
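Both mechanisms fit in a few lines, sketched here with in-memory state. Assumptions: nonces are treated as opaque strings (the monotonic ordering mentioned above is not checked), timestamps too far in the future are rejected as well as old ones, and the nonce and timestamp are covered by the request signature so an attacker cannot change one without the other. `ReplayGuard` and `IdempotentPayments` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict


class ReplayGuard:
    """Reject requests outside the time window or with an already-seen nonce."""

    def __init__(self, window: timedelta = timedelta(minutes=5)) -> None:
        self.window = window
        self._seen: Dict[str, datetime] = {}

    def accept(self, nonce: str, sent_at: datetime, now: datetime) -> bool:
        # Forget nonces whose timestamps have aged out; a replay of those
        # requests would fail the age check below anyway.
        self._seen = {n: t for n, t in self._seen.items()
                      if now - t <= self.window}
        if abs(now - sent_at) > self.window:
            return False
        if nonce in self._seen:
            return False
        self._seen[nonce] = sent_at
        return True


class IdempotentPayments:
    """Return the stored result when the same idempotency key is seen again."""

    def __init__(self, charge: Callable[[dict], str]) -> None:
        self._charge = charge
        self._results: Dict[str, str] = {}

    def execute(self, idempotency_key: str, intent: dict) -> str:
        if idempotency_key not in self._results:
            self._results[idempotency_key] = self._charge(intent)
        return self._results[idempotency_key]


charges = []


def fake_charge(intent: dict) -> str:
    """Stand-in for a real processor call; records each actual charge."""
    charges.append(intent)
    return f"receipt-{len(charges)}"


payments = IdempotentPayments(fake_charge)
first = payments.execute("key-1", {"amount": "49.99", "currency": "EUR"})
replayed = payments.execute("key-1", {"amount": "49.99", "currency": "EUR"})
```

The replayed call returns the first receipt and `fake_charge` runs only once, which is the behaviour the paragraph above describes for payment intents.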
Adapter trust. Action adapters run code against third-party websites. The adapter registry uses a trust model with three tiers: unverified (anyone can publish), community (peer-reviewed, open source), verified (audited by a recognized security firm). Agents can set minimum trust levels in their permission policy. High-stakes actions (payments, legal submissions) default to verified adapters only.
Credential isolation. Agents never access raw credentials. Payments go through tokenized gateways. Form fields marked as sensitive (passport numbers, SSNs) are retrieved at execution time from an encrypted vault; the agent sees a reference, not the value. The adapter receives the decrypted value in a sandboxed runtime that prevents exfiltration.
Audit trail. Every primitive interaction produces a signed log entry: who (identity), what (action), when (timestamp), result (success/failure), and policy_ref (which permission authorized it). The principal can audit their agent's full activity history.
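A log entry with the five fields named above could be produced and checked as follows. As with any standard-library-only sketch, HMAC-SHA256 with a demo key stands in for a real signature scheme, which the spec does not specify; `make_entry`, `entry_is_intact`, the example action string, and the example `policy_ref` value are hypothetical.

```python
import hashlib
import hmac
import json

# ASSUMPTION: a shared demo key replaces whatever signing key the logger holds.
LOG_KEY = b"demo-log-key-not-for-production"


def _payload(fields: dict) -> bytes:
    """Stable byte encoding of the entry fields, excluding the signature."""
    body = {k: v for k, v in fields.items() if k != "signature"}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def make_entry(who: str, what: str, when: str, result: str,
               policy_ref: str, key: bytes = LOG_KEY) -> dict:
    """Build a log entry carrying who/what/when/result/policy_ref plus a signature."""
    entry = {"who": who, "what": what, "when": when,
             "result": result, "policy_ref": policy_ref}
    entry["signature"] = hmac.new(key, _payload(entry), hashlib.sha256).hexdigest()
    return entry


def entry_is_intact(entry: dict, key: bytes = LOG_KEY) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = hmac.new(key, _payload(entry), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, entry.get("signature", ""))


entry = make_entry(
    who="did:agent:openchat-v4:instance-8f3a",
    what="payments:execute",
    when="2026-02-27T10:05:00Z",
    result="success",
    policy_ref="policy-hash-placeholder",
)
intact = entry_is_intact(entry)
```

Changing any field after the fact, for example flipping `result` from failure to success, makes `entry_is_intact` return False.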
5. What Exists vs. What's Missing
| Capability | Exists Today | Gap |
|---|---|---|
| Agent-to-tool | MCP, function calling | Solved. Not the problem. |
| Agent identity | Nothing standard | No way for an agent to prove delegation |
| Agent payments | Manual API keys per provider | No delegated, scoped, auditable payment |
| Agent comms | Twilio, SendGrid (human-oriented) | No unified agent-native message envelope |
| Agent permissions | Per-platform, bespoke | No portable, verifiable permission policy |
| Agent actions | Browser automation (fragile, siloed) | No shared adapter registry or standard intent format |
The tools exist in pieces. Email APIs, payment processors, identity providers. What's missing is the protocol that unifies them under a single, agent-native interface. Today's integrations are human-oriented: they assume a person is clicking buttons, entering credentials, confirming transactions. The Agent Protocol Layer makes these interactions first-class for software agents.
6. Why Open
The usual argument for open protocols is interoperability. That's true but insufficient. The real argument is about who agents can serve.
A proprietary protocol has a business model problem: it needs to charge for access. That means gatekeeper economics. Every adapter, every payment rail, every communication channel becomes a revenue opportunity for the protocol owner. The result: agents work well for enterprises that can afford the connectors, and poorly for everyone else.
This matters because agent capabilities are about to become a commodity. The models are converging. Within two years, the differentiator between agent platforms will not be intelligence; it will be reach: how many systems can your agent interact with? A proprietary protocol fragments reach by design. An open protocol maximizes it.
The protocol spec and reference implementations are Apache 2.0. The business model sits above the protocol: managed infrastructure for teams that don't want to run their own identity resolvers and payment gateways, plus enterprise compliance tooling. The protocol gets adopted because it's free. The company gets paid because operating infrastructure is work most teams don't want to do.
7. Next Steps
This is a draft specification. The primitives are defined; the implementations are not.
What's needed
1. Reference implementation of Identity + Permissions (the foundation everything else builds on)
2. Adapter registry with at least three verified adapters as proof of concept
3. Payment gateway for two providers (Stripe + one mobile money provider)
4. Communication gateway for email + one messaging platform
5. SDK in Python and TypeScript
6. Security audit of the identity, permission, and credential isolation model
The protocol works when the farmer in Kenya and the enterprise in Frankfurt use the same five primitives. That only happens if it's open, and if it's built by more than one person.
Licensed under Apache 2.0