Testing Index
Empirical measurement of what happens between “agent fetches URL” and “user sees output”: retrieval mechanism behavior, content transformation, and architectural constraints. The focus is on platforms that don’t document these details and whose fetch-to-output pipeline may pass through multiple opaque layers. Testing follows a two-track approach: the interpreted track captures model self-perception and output variance, and the raw track produces citable data for the Agent Docs Spec.
Blogs
| Post | Focus |
|---|---|
| Field Notes from a Yelper: Guerrilla Testing Agents | Methodology evolution: what broke, what changed, and letting data lead |
Testing Documentation Structure
| Section | Purpose |
|---|---|
| Methodology | Testing approach details and constraints |
| Interpreted vs Raw | Two-track values and measurements |
| Findings: Interpreted | What the model reports vs. what it received; run-to-run variation |
| Findings: Raw | Metrics extracted programmatically - reproducible, spec-ready |
| Friction Note | Known issues, gaps, or edge cases encountered during testing |
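As a rough illustration of the kind of measurement the raw track produces, the sketch below compares a known source text against the text a platform returned. It is a minimal example under stated assumptions, not the project's actual tooling: the names `RawMetrics`, `raw_metrics`, and `common_prefix_length` are hypothetical, and the prefix check only makes sense when both strings are in the same form (for example, both already rendered to plain text).

```python
import hashlib
from dataclasses import dataclass


@dataclass
class RawMetrics:
    """Hypothetical record of programmatically extracted metrics."""
    source_chars: int
    received_chars: int
    received_bytes: int
    common_prefix_chars: int
    truncated: bool
    received_sha256: str


def common_prefix_length(a: str, b: str) -> int:
    """Count how many leading characters the two strings share."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n


def raw_metrics(source: str, received: str) -> RawMetrics:
    """Compare the known source text with what was actually received.

    Assumption: 'truncated' here means the received text is a strict
    prefix of the source. Content that was transformed in transit will
    not match this definition and needs a different comparison.
    """
    prefix = common_prefix_length(source, received)
    encoded = received.encode("utf-8")
    return RawMetrics(
        source_chars=len(source),
        received_chars=len(received),
        received_bytes=len(encoded),
        common_prefix_chars=prefix,
        truncated=prefix == len(received) and len(received) < len(source),
        received_sha256=hashlib.sha256(encoded).hexdigest(),
    )


if __name__ == "__main__":
    source = "x" * 1000
    received = source[:400]  # simulate a pipeline that cuts content short
    print(raw_metrics(source, received))
```

Recording a hash alongside the lengths lets two runs be compared for exact equality without storing the full payload, which is useful when cross-session variance is one of the things being measured.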
Results Summary
| Platform | Key Finding | Focus |
|---|---|---|
| Anthropic Claude API | Character-based truncation at ~100KB of rendered content | Baseline reference; establishing two-track methodology |
| Anysphere Cursor | Agent-routed fetch with undocumented truncation - 28KB–240KB+; high cross-session variance | Reverse-engineering opaque, closed consumer tools |
| Cognition Windsurf Cascade | Testing in progress | Reverse-engineering partially documented, closed consumer tools |
| Google Gemini API | Hard limit: 20 URLs per request; supports PDF and JSON | Identifying architectural constraints and format support |
| Microsoft GitHub Copilot | Two retrieval paths: agent-routed `fetch_webpage` returns relevance-ranked excerpts with no fixed ceiling detected; `curl` returns byte-perfect full retrieval | Separating retrieval mechanism from retrieval quality through tool-call visibility |
| OpenAI Web Search | Tool invocation conditional and model-dependent; differs by API surface | Comparing behavior across API endpoints |