Agent Ecosystem Testing

Methodology


Turn-by-Turn

Chat-based measurement through interaction, without direct code instrumentation

The Cascade testing framework is the third chat-based agent testing framework in this collection, following Cursor and Copilot. Each platform has surfaced a different relationship between user-facing fetch syntax and actual agent behavior, and that pattern directly shapes this framework’s design.


Syntax Comparison

An evolving relationship between fetch syntax and agent behavior

Across three platforms, the role of explicit web fetch directives has shifted in a consistent direction: explicit syntax → autonomous behavior → documented-but-effect-unknown. This pattern frames the central methodological question Cascade testing inherits from the two prior frameworks. The three-track design below exists specifically to isolate @web as a variable, rather than assume its effect in either direction.

| Platform | Syntax | Testing Reveal |
| --- | --- | --- |
| Cursor | @Web context attachment, undocumented | Inclusion unnecessary as capability autonomous when testing began; WebFetch, mcp_web_fetch called regardless of @Web syntax |
| Copilot | None documented | No user-invocable syntax exists; testing identified fetch_webpage in agent output |
| Cascade | @web directive, partially documented | Does invoking @web change retrieval behavior, ceiling, tool chain, chunking? |
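
To treat @web as an isolated variable, the implicit and explicit prompts should differ only in the directive. The sketch below shows one way to generate such a pair from a single template. The function name, the instruction wording, and the example URL are all hypothetical and not taken from the framework.

```python
WEB_DIRECTIVE = "@web"  # the Cascade directive under test


def build_prompt_pair(url: str, instruction: str) -> dict:
    """Return two prompts that are identical except for the @web prefix.

    Both the instruction text and this helper are illustrative assumptions.
    """
    implicit = f"{instruction} {url}"
    explicit = f"{WEB_DIRECTIVE} {implicit}"
    return {"implicit": implicit, "explicit": explicit}


pair = build_prompt_pair(
    "https://example.com/docs",  # placeholder URL
    "Fetch this URL and report how many characters you received:",
)
for name, prompt in pair.items():
    print(f"{name}: {prompt}")
```

Generating both prompts from one string keeps accidental wording differences from being mistaken for an @web effect.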

Architecture Comparison

Testing a closed consumer application vs an open API

Rather than target specific API endpoints with documented interfaces, Cascade testing targets a consumer application with proprietary chat behavior and a partially documented tool layer. Cascade’s web fetch implementation surfaces three named tools, each reported by Cascade itself during runs: read_url_content for direct URL fetch, view_content_chunk for paginating large documents via DocumentId, and search_web for query-based lookup. The available documentation names these tools but gives few details about their behavior. By contrast, in the Claude API testing methodology, fetch behavior is directly inspectable via tool_result.

| Aspect | Cursor | Copilot | Cascade |
| --- | --- | --- | --- |
| User Fetch Syntax | @Web context attachment | None | @web directive |
| Tools Observed | WebFetch, mcp_web_fetch | fetch_webpage and/or curl | read_url_content, search_web, view_content_chunk |
| Repeatability | Medium | Low, agent routing variance | Low, approval interaction may affect routing |
| Questions | Does MCP override @Web? Does agent auto-chunk? | Does fetch_webpage have a consistent ceiling? Does it vary by agent? | Does @web change the ceiling, tool chain, or chunking behavior? |
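
Because the Cascade tools are identified from the agent's own reports rather than from instrumentation, one simple way to record them is to scan a saved transcript for the three tool names. The sketch below assumes the transcript has been copied out of the chat as plain text. The function name and the sample transcript are hypothetical.

```python
import re
from typing import List

# Tool names as reported by Cascade during runs.
CASCADE_TOOLS = ("read_url_content", "view_content_chunk", "search_web")

_TOOL_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in CASCADE_TOOLS) + r")\b"
)


def tools_mentioned(transcript: str) -> List[str]:
    """Return the tool names found in the transcript, in first-seen order."""
    seen: List[str] = []
    for match in _TOOL_PATTERN.finditer(transcript):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen


sample = "I called read_url_content, then view_content_chunk for the rest."
print(tools_mentioned(sample))
```

A match only shows that the agent mentioned a tool, not that the tool actually ran, so the result is best read as self-report.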

Track Design

|   | Interpreted | Explicit | Raw |
| --- | --- | --- | --- |
| Question | What does Cascade report back without steering? Does it accurately perceive truncation? | Does adding @web change truncation limits, tool chain, or chunking behavior? | What does read_url_content actually return? Where exactly does truncation occur? |
| Method | Fetch URL, report measurements; no @web | Identical to interpreted track, prefixed with @web | Fetch URL, return output verbatim; no @web; verification script extracts measurements |
| Measurements | Model estimates: “appears truncated at ~X chars,” “Markdown seems complete” | Same as interpreted track; compared against implicit baseline | Character count via len(), token count via tiktoken, exact truncation point, last 50 characters |
| Repeatability | Low, approval interaction may affect routing | Medium, @web may stabilize tool selection | Medium, same URL should yield consistent read_url_content output |
| Best For | Understanding DX; surfacing approval-gated fetch: does approval-gating affect routing consistency? | @web effect on ceiling: does @web change or repeat the Cursor finding that the directive is redundant? | Auto-pagination behavior: does view_content_chunk paginate automatically, or only when prompted? |
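
The raw-track measurements can be sketched as a small verification script. This is a minimal illustration, not the framework's actual script. It assumes the verbatim output has been saved as a string and that a reference copy of the full page text is available for locating the truncation point. The function names, the `cl100k_base` encoding choice, and the tail-matching approach are all assumptions. tiktoken is treated as optional, so the token count is reported as `None` when the library or its encoding files cannot be loaded.

```python
from typing import Optional


def count_tokens(text: str, encoding_name: str = "cl100k_base") -> Optional[int]:
    """Return a tiktoken token count, or None if tiktoken is unavailable."""
    try:
        import tiktoken  # third-party dependency, optional in this sketch

        encoding = tiktoken.get_encoding(encoding_name)
        # disallowed_special=() treats special-token text as ordinary text.
        return len(encoding.encode(text, disallowed_special=()))
    except Exception:
        # Package missing, or encoding data could not be loaded.
        return None


def measure(raw_output: str, reference: Optional[str] = None) -> dict:
    """Extract raw-track measurements from verbatim fetch output."""
    tail = raw_output[-50:]  # last 50 characters
    result = {
        "char_count": len(raw_output),
        "token_count": count_tokens(raw_output),
        "last_50_chars": tail,
        "truncation_offset": None,
    }
    # Assumption: the output tail appears verbatim in the reference text.
    if reference is not None and tail:
        index = reference.find(tail)
        if index != -1:
            result["truncation_offset"] = index + len(tail)
    return result
```

If the fetched output is converted or reflowed relative to the reference, the tail may not match and `truncation_offset` stays `None`. In that case the last 50 characters still mark where the output ended.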

Limitations: results vary between runs; read_url_content requires approval before the fetch executes, and because the approval interaction itself may influence routing, it is logged per run; view_content_chunk pagination via DocumentId is only partially observable, through the agent thought panel and self-report.
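
A per-run log entry that captures the approval interaction alongside the track could look like the sketch below. The record fields and the JSON Lines format are hypothetical choices for illustration, not the framework's actual log schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class RunRecord:
    """One logged run; every field name here is an illustrative assumption."""

    track: str  # "interpreted", "explicit", or "raw"
    url: str
    approval_prompted: bool  # did Cascade ask for approval before fetching?
    tools_reported: List[str] = field(default_factory=list)
    notes: str = ""


def to_json_line(record: RunRecord) -> str:
    """Serialize a record as a single JSON line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)


record = RunRecord(
    track="raw",
    url="https://example.com/docs",  # placeholder URL
    approval_prompted=True,
    tools_reported=["read_url_content"],
)
print(to_json_line(record))
```

Recording the approval interaction as its own field makes it possible to check later whether routing differences line up with approval prompts.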