Methodology
- Chat-based measurement through interaction, without direct code instrumentation
The Cascade testing framework is the third in a series of chat-based agent testing frameworks in this collection, following Cursor and Copilot. Each platform has surfaced a different relationship between user-facing fetch syntax and actual agent behavior — a pattern that directly shapes this framework’s design.
- An evolving relationship between fetch syntax and agent behavior
Across three platforms, the role of explicit web fetch directives has shifted in a consistent direction. This pattern — explicit syntax → autonomous behavior → documented-but-effect-unknown — is the central methodological question Cascade testing inherits from the two prior frameworks. The three-track design below exists specifically to isolate `@web` as a variable, rather than assume its effect in either direction.

| Platform | User-Facing Syntax | What Testing Revealed |
| --- | --- | --- |
| Cursor | `@Web` context attachment | Direct invocation proved unnecessary: the capability had become autonomous by the time testing began; backend mechanisms `WebFetch` and `mcp_web_fetch` are invoked regardless of `@Web` syntax |
| Copilot | None documented | No user-invocable syntax exists; testing surfaced the undocumented `fetch_webpage` tool from agent output |
| Cascade | `@web` directive, documented | Open question: does invoking `@web` change retrieval behavior — ceiling, tool chain, chunking? |
- Testing a closed consumer application vs an open API
Rather than target specific API endpoints with documented interfaces, Cascade testing targets a consumer application with proprietary chat behavior and a partially documented tool layer. Cascade's web fetch implementation surfaces three named tools — `read_url_content` for direct URL fetch, `view_content_chunk` for paginating large documents via `DocumentId`, and `search_web` for query-based lookup — reported by Cascade itself during runs. While these tools are referenced in documentation, the documentation offers few details. Compare to Claude API testing, in which fetch behavior is directly inspectable via `tool_result`.

| Aspect | Cursor | Copilot | Cascade |
| --- | --- | --- | --- |
| User Fetch Syntax | `@Web` context attachment | None | `@web` directive |
| Tools Observed | `WebFetch`, `mcp_web_fetch` | `fetch_webpage` and/or `curl` | `read_url_content`, `search_web`, `view_content_chunk` |
| Repeatability | Medium | Low — model routing variance | Low — approval interaction may affect routing |
| Open Questions | Does MCP override `@Web`? Does the agent auto-chunk? | Does `fetch_webpage` have a consistent ceiling? Does it vary by model? | Does `@web` change the ceiling, tool chain, or chunking behavior? |
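Because Cascade's tool output is not directly inspectable the way a `tool_result` is, any truncation measurement needs a ground-truth copy of the page gathered out of band. A minimal sketch using the Python standard library — the function name, timeout default, and UTF-8 fallback are illustrative assumptions, not part of any framework described here:

```python
import urllib.request

def fetch_baseline(url: str, timeout: float = 30.0) -> str:
    # Out-of-band fetch of the page under test; the decoded text serves as
    # ground truth when checking the agent's tool output for truncation.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

Comparing this baseline against what the agent returns is what makes "exact truncation point" measurable at all.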
- Measuring with three complementary tracks
| | Interpreted Track | Raw Track | Explicit Track |
| --- | --- | --- | --- |
| Question | What does Cascade report back without steering? Does it accurately perceive truncation? | What does `read_url_content` actually return? Where exactly does truncation occur? | Does adding `@web` change truncation limits, tool chain, or chunking behavior? |
| Method | Fetch URL, report measurements; no `@web` | Fetch URL, return output verbatim; no `@web`; verification script extracts measurements | Identical to interpreted track, prefixed with `@web` |
| Measurements | Model estimates: "appears truncated at ~X chars," "Markdown seems complete" | Character count via `len()`, token count via `tiktoken`, exact truncation point, last 50 characters | Same as interpreted track; compared against implicit baseline |
| Repeatability | Low — approval interaction may affect routing | Medium — same URL should yield consistent `read_url_content` output | Medium — `@web` may stabilize tool selection |
| Best For | Understanding DX; surfacing approval-gated fetch: does approval gating affect routing consistency? | Auto-pagination behavior: does `view_content_chunk` paginate automatically, or only when prompted? | `@web` effect on ceiling: does `@web` confirm or overturn the Cursor finding that the directive is redundant? |

Known limitations: interpreted and explicit track results vary between runs; `read_url_content` requires approval before the fetch executes, and the approval interaction itself may influence routing, so it is logged per run; `view_content_chunk` pagination via `DocumentId` is only partially observable through model output.
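Read concretely, the table above is an experiment plan: three prompt wrappers around the same URL, plus a verification step over the raw track's verbatim output. The sketch below is illustrative only; the prompt wording and function names are assumptions, and a whitespace split stands in for the `tiktoken` count named in the Measurements row:

```python
# Illustrative three-track protocol; prompt text is an assumption,
# not Cascade's actual interface.
PROMPTS = {
    "interpreted": "Fetch {url} and report how much content you received.",  # no @web
    "raw": "Fetch {url} and return the fetched content verbatim.",           # no @web
    "explicit": "@web Fetch {url} and report how much content you received.",
}

def measure(agent_output: str, full_content: str) -> dict:
    """Raw-track verification over verbatim output vs. a ground-truth baseline."""
    # Exact truncation point = length of the longest common prefix.
    cut = 0
    for a, b in zip(agent_output, full_content):
        if a != b:
            break
        cut += 1
    truncated = len(agent_output) < len(full_content)
    return {
        "char_count": len(agent_output),
        # tiktoken would give a model-accurate token count; a whitespace
        # split is a stdlib-only stand-in for this sketch.
        "approx_token_count": len(agent_output.split()),
        "truncated": truncated,
        "truncation_point": cut if truncated else None,
        "last_50_chars": agent_output[-50:],
    }
```

Running `measure()` on both the raw and explicit tracks yields directly comparable ceilings: identical numbers would repeat the Cursor finding that the directive is redundant, while a shifted truncation point would not.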
Agent Ecosystem Testing