Testing Index
Empirical measurement of what happens between “agent fetches URL” and “user sees output” - retrieval mechanism behavior, content transformation, and architectural constraints. Observes the URL-to-response pipeline through layers platforms don’t disclose. Documents output variation with a two-track approach: interpreted captures agent self-perception, raw produces citable data for the Agent-Friendly Documentation Spec.
Blogs
| Post | Focus |
|---|---|
| Field Notes from a Yelper: Guerrilla Testing Agents | Methodology evolution: what broke, what changed, and letting data lead |
Documentation Structure
| Section | Purpose |
|---|---|
| Methodology | Testing approach details, track design, constraints |
| Interpreted vs Raw | Observations, implications for agent devs, docs teams |
| Findings: Interpreted | Agentic retrieval, reasoning, reporting |
| Findings: Raw | Agentic write behavior, programmatically extracted metrics |
| Friction Note | Known issues, gaps, or edge cases encountered during testing |
Results Summary
More analysis in Platform Comparisons. Platform links lead to testing methodologies.
| Platform | Key Finding | Focus |
|---|---|---|
| Anthropic Claude API | Char-based truncation at ~100 KB of rendered content | Baseline reference; establishing two-track methodology |
| Anysphere Cursor | Agent-routed fetch with undocumented truncation 28 KB–240 KB+; high cross-session variance |
Reverse-engineering opaque, closed consumer tools |
| Cognition Windsurf Cascade | Two-stage chunking-pipeline, no fixed ceiling; retrieval completeness agent and source size-dependent; read-write asymmetry; @web redundant with a URL |
Three-track design; truncation testing partially documented lossy architecture |
| Google Gemini API | Hard limit: 20 URLs per request; supports PDF and JSON | Identifying architectural constraints, format support |
| Microsoft GitHub Copilot | Agent-routed fetch_webpage→relevance-ranked excerpts, no fixed ceiling detected vs curl byte-perfect full retrieval |
Separating retrieval mechanism from retrieval quality through toolchain visibility |
| OpenAI Codex | testing in progress | Surface Comparison |
| OpenAI Web Search | Tool invocation conditional, agent-dependent; differs by API surface | Comparing behavior across different APIs |
Agent Ecosystem Testing