We graded 315,739 ERC-8004 agents, and 1 earned S-tier.
The ERC-8004 registry team told us: "S means 95 and 100. The best of the best." A reporter covering agent infrastructure in depth tested our grader and said: "Most of these things don't work. Be more ruthless."
They were right, so we rebuilt the grader around a narrower promise: when a high score appears next to an agent, the public endpoint, metadata, and protocol evidence should support using that agent in a real workflow today.
The rubric has six tiers, and each tier is a cap on what the checker actually proved.
Best of the best
These agents pass the S gates and also clear the 95-point bar, which keeps the top slice reserved for unusually complete registrations.
Verified working today
S-tier means the checker found usable metadata, enough community signal to reach 90, and a live callable path rather than just an online homepage.
Responds
The endpoint responds and the metadata is usable, but the checker has not yet proved the agent can complete a real protocol call.
Partial
The endpoint exists and answers, but the available evidence is still too thin to treat the agent as dependable.
Metadata only
The agent card can be filled from metadata, while the service endpoint is dead, stale, or has not produced a useful response.
Dead or spam
There is no usable endpoint, or the agent comes from a flagged mass-production pattern that should not rank.
What earns an S
A raw score of 90 is not enough on its own. The agent also has to pass three gates that prove the score came from a working service, not from a well-filled card.
Liveness ≥ 40 of 50
DNS has to resolve, the endpoint has to return a useful 200 or 402 response, the body has to be machine-readable, and at least one protocol category has to max out under the response-time budget.
Metadata ≥ 18 of 25
The registration needs a real name, real description, image, on-chain agentURI, and service declarations that match something the checker can inspect.
Proven callable
At least one of the following has to hold: a real MCP tools/call returned valid JSON-RPC, an A2A message/send returned 200 or 202, the endpoint returned a 402 with a valid accepts[], or an OpenAPI-documented API was reachable.
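The three gates can be read as a simple conjunction on top of the raw score. The following is a minimal sketch, not the checker's actual code: the Evidence record, its field names, and the earns_s helper are all hypothetical, and the score bar is passed in by the caller rather than fixed here.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical record of what one probe run found for an agent."""
    raw_score: float         # overall score, 0-100
    liveness: float          # liveness section, 0-50
    metadata: float          # metadata section, 0-25
    mcp_call_ok: bool        # MCP tools/call returned valid JSON-RPC
    a2a_send_ok: bool        # A2A message/send returned 200 or 202
    x402_accepts_ok: bool    # 402 response carried a valid accepts[]
    openapi_reachable: bool  # OpenAPI-documented API was reachable


def proven_callable(e: Evidence) -> bool:
    # Any single callable path is enough for this gate.
    return any([e.mcp_call_ok, e.a2a_send_ok,
                e.x402_accepts_ok, e.openapi_reachable])


def passes_s_gates(e: Evidence) -> bool:
    # Thresholds taken from the text: liveness >= 40 of 50, metadata >= 18 of 25.
    return e.liveness >= 40 and e.metadata >= 18 and proven_callable(e)


def earns_s(e: Evidence, score_bar: float) -> bool:
    # The score alone is never sufficient; all three gates must also pass.
    return e.raw_score >= score_bar and passes_s_gates(e)
```

Under these assumptions, an agent with a high raw score but a liveness of 39 fails earns_s, no matter how complete its card is.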
Four protocol categories. The section caps at 20.
MCP, A2A, x402 HTTP, and OpenAPI each contribute up to 15 points, but the protocol section caps at 20. One strong implementation is enough to rank well; registering the same endpoint under several labels should not manufacture a higher score.
The only protocol bonus is a narrow +5 for a self-documenting 402 response with a full payment descriptor, a Bazaar block or demo URL, and a follow-up roundtrip that returns 2xx. In this rubric, x402 counts when it behaves like a product surface instead of a generic payment error.
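The cap arithmetic is easy to check by hand. Here is a minimal sketch, assuming each category is scored independently from 0 to 15 and the section total is then clipped at 20; the function name and dictionary keys are illustrative. The +5 bonus is left out because the text does not say whether it sits inside or outside the cap.

```python
PER_PROTOCOL_MAX = 15  # each of MCP, A2A, x402 HTTP, OpenAPI contributes up to 15
SECTION_CAP = 20       # the protocol section as a whole caps at 20


def protocol_section(points: dict) -> float:
    """Sum per-protocol points, clipping each category and then the section."""
    total = sum(min(max(p, 0), PER_PROTOCOL_MAX) for p in points.values())
    return min(total, SECTION_CAP)


# One strong implementation already reaches 15 of the 20 available points.
one_strong = protocol_section({"mcp": 15})

# Registering under all four labels cannot push the section past the cap.
all_four = protocol_section({"mcp": 15, "a2a": 15, "x402": 15, "openapi": 15})
```

With these numbers, one_strong is 15 and all_four is 20, so three extra labels are worth at most 5 more points.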
What disqualifies you
Liveness below 15: The checker did not get enough usable response data, so the agent is capped at C.
Metadata below 10: The card does not contain enough real content to support a higher tier.
Spam platform: A flagged template or mass-production pattern forces the result to F.
Liveness check older than 14 days: A stale check can still rank, but it cannot earn S until the service is probed again.
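The four rules above can be sketched as a function that reports which disqualifiers apply to an agent. This is an illustrative sketch only: the function name, parameters, and reason strings are hypothetical, and it reports reasons rather than tiers because the text does not state the exact cap for the metadata rule.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=14)  # liveness checks older than this block S


def disqualifiers(liveness: float, metadata: float, spam_flagged: bool,
                  last_checked: datetime, now: datetime) -> list:
    """Return the names of every disqualifying rule that applies."""
    reasons = []
    if liveness < 15:
        reasons.append("liveness_below_15")     # capped at C
    if metadata < 10:
        reasons.append("metadata_below_10")     # cannot support a higher tier
    if spam_flagged:
        reasons.append("spam_platform")         # forced to F
    if now - last_checked > STALE_AFTER:
        reasons.append("stale_liveness_check")  # can rank, but cannot earn S
    return reasons


now = datetime(2026, 4, 15, tzinfo=timezone.utc)
stale = disqualifiers(45, 20, False, now - timedelta(days=15), now)
fresh = disqualifiers(45, 20, False, now - timedelta(days=1), now)
```

Here stale contains only "stale_liveness_check" and fresh is empty, which matches the rule that an otherwise strong agent loses S eligibility until it is probed again.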
The number next to an agent should be an audit trail, not a decoration. On The Spawn, it means the checker ran the public evidence through the same gates and recorded what came back.
15 April 2026