# Agent Ecosystem Testing

## Testing Index

Empirical measurement of what happens between "agent fetches URL" and "user sees output": retrieval-mechanism behavior, content transformation, and architectural constraints, on platforms where the fetch-to-output pipeline may pass through multiple opaque layers and none of it is documented. The work uses a two-track approach: the interpreted track captures model self-perception and output variance, and the raw track produces citable data for the Agent Docs Spec.
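As a minimal sketch of how the two tracks line up per run (the function and field names here are illustrative, not the project's actual tooling):

```python
def compare_tracks(reported_chars: int, raw_bytes: bytes) -> dict:
    """Pair the interpreted measurement (what the model says it received)
    with the raw measurement (the bytes actually delivered)."""
    raw_chars = len(raw_bytes.decode("utf-8", errors="replace"))
    return {
        "reported_chars": reported_chars,  # interpreted track
        "raw_chars": raw_chars,            # raw track
        "truncated": reported_chars < raw_chars,
        "delta": raw_chars - reported_chars,
    }
```

For example, `compare_tracks(90_000, b"x" * 120_000)` flags truncation with a 30,000-character gap between what the model reports and what was served.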


## Blogs

| Post | Focus |
| --- | --- |
| Field Notes from a Yelper: Guerrilla Testing Agents | Methodology evolution: what broke, what changed, and letting data lead |

## Testing Documentation Structure

| Section | Purpose |
| --- | --- |
| Methodology | Testing approach details and constraints |
| Interpreted vs Raw | Two-track values and measurements |
| Findings: Interpreted | What the model reports vs what it received; run variation |
| Findings: Raw | Metrics extracted programmatically; reproducible, spec-ready |
| Friction Note | Known issues, gaps, or edge cases encountered during testing |

## Results Summary

| Platform | Key Finding | Focus |
| --- | --- | --- |
| Anthropic Claude API | Character-based truncation at ~100KB of rendered content | Baseline reference; establishing the two-track methodology |
| Anysphere Cursor | Agent-routed fetch with undocumented truncation (28KB–240KB+); high cross-session variance | Reverse-engineering opaque, closed consumer tools |
| Cognition Windsurf Cascade | Testing in progress | Reverse-engineering partially documented, closed consumer tools |
| Google Gemini API | Hard limit: 20 URLs per request; supports PDF and JSON | Identifying architectural constraints and format support |
| Microsoft GitHub Copilot | Agent-routed `fetch_webpage` (relevance-ranked excerpts, no fixed ceiling detected) and/or `curl` (byte-perfect full retrieval) | Separating retrieval mechanism from retrieval quality through tool-call visibility |
| OpenAI Web Search | Tool invocation is conditional and model-dependent; differs by API surface | Comparing behavior across API endpoints |
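Truncation ceilings like the ones above can be estimated without platform cooperation by serving a payload seeded with numbered sentinel markers, then checking which marker survives the fetch. A hypothetical sketch of that probe (the marker format and 1 KB granularity are assumptions, not any platform's documented behavior):

```python
import re

MARKER_EVERY = 1024  # assumed probe granularity: one sentinel per KB

def make_payload(total_kb: int) -> str:
    """Build a test payload with a numbered sentinel at the start of each KB."""
    chunks = []
    for i in range(total_kb):
        marker = f"<<MARK:{i:05d}>>"
        chunks.append(marker + "x" * (MARKER_EVERY - len(marker)))
    return "".join(chunks)

def last_marker(text: str) -> int:
    """Highest surviving marker index, or -1 if none made it through."""
    hits = re.findall(r"<<MARK:(\d{5})>>", text)
    return int(hits[-1]) if hits else -1
```

Running `last_marker` on whatever the agent echoes back bounds the cutoff to within `MARKER_EVERY` bytes; repeating the probe across sessions is one way to surface variance like Cursor's 28KB–240KB+ spread.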