Methodology
Turn-by-Turn
Chat-based measurement through interaction, without direct code instrumentation
The Cascade testing framework is the third in a series of chat-based agent testing frameworks in this collection, following Cursor and Copilot. Each platform has surfaced a different relationship between user-facing fetch syntax and actual agent behavior - a pattern that directly shapes this framework’s design.
Syntax Comparison
An evolving relationship between fetch syntax and agent behavior
Across three platforms, the role of explicit web fetch directives has shifted in a consistent
direction: explicit syntax → autonomous behavior → documented-but-effect-unknown. Whether the
directive still changes anything is the central methodological question Cascade testing inherits
from the two prior frameworks. The three-track design below exists specifically to isolate @web as
a variable, rather than assume its effect in either direction.
| Platform | Syntax | Testing Reveal |
|---|---|---|
| Cursor | @Web context attachment, undocumented | Inclusion unnecessary as capability autonomous when testing began; WebFetch, mcp_web_fetch called regardless of @Web syntax |
| Copilot | None documented | No user-invocable syntax exists; testing identified fetch_webpage in agent output |
| Cascade | @web directive, partially documented | Does invoking @web change retrieval behavior, ceiling, tool chain, chunking? |
Architecture Comparison
Testing a closed consumer application vs an open API
Rather than targeting specific API endpoints with documented interfaces, Cascade testing targets a
consumer application with proprietary chat behavior and a partially documented tool layer. Cascade’s
web fetch implementation surfaces three named tools, as reported by Cascade itself during runs:
read_url_content for direct URL fetch, view_content_chunk for paginating large documents via
DocumentId, and search_web for query-based lookup. Documentation for these tools exists but gives
few details. Compare this with the methodology of the Claude API testing, in which fetch behavior is
directly inspectable via tool_result.
| Aspect | Cursor | Copilot | Cascade |
|---|---|---|---|
| User Fetch Syntax | @Web context attachment | None | @web directive |
| Tools Observed | WebFetch, mcp_web_fetch | fetch_webpage and/or curl | read_url_content, search_web, view_content_chunk |
| Repeatability | Medium | Low, agent routing variance | Low, approval interaction may affect routing |
| Questions | Does MCP override @Web? Does agent auto-chunk? | Does fetch_webpage have a consistent ceiling? Does it vary by agent? | Does @web change the ceiling, tool chain, or chunking behavior? |
Track Design
| | Interpreted | Explicit | Raw |
|---|---|---|---|
| Question | What does Cascade report back without steering? Does it accurately perceive truncation? | Does adding @web change truncation limits, tool chain, or chunking behavior? | What does read_url_content actually return? Where exactly does truncation occur? |
| Method | Fetch URL, report measurements; no @web | Identical to interpreted track, prefixed with @web | Fetch URL, return output verbatim; no @web; verification script extracts measurements |
| Measurements | Model estimates: “appears truncated at ~X chars,” “Markdown seems complete” | Same as interpreted track; compared against implicit baseline | Character count via len(), token count via tiktoken, exact truncation point, last 50 characters |
| Repeatability | Low, approval interaction may affect routing | Medium, @web may stabilize tool selection | Medium, same URL should yield consistent read_url_content output |
| Best For | Understanding DX; surfacing approval-gated fetch: does approval gating affect routing consistency? | @web effect on ceiling: does @web change the ceiling, or repeat the Cursor finding that the directive is redundant? | Auto-pagination behavior: does view_content_chunk paginate automatically, or only when prompted? |
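The raw track's measurements can be illustrated with a minimal sketch. This is not the framework's actual verification script. The function names, the `cl100k_base` encoding choice, and the use of a separately obtained reference copy to locate the truncation point are all assumptions made for illustration. The token count is left as `None` when tiktoken or its encoding data is unavailable.

```python
from typing import Optional


def measure_raw_output(text: str, tail_chars: int = 50,
                       encoding_name: str = "cl100k_base") -> dict:
    """Summarize a verbatim fetch result pasted from a raw-track run.

    encoding_name is an assumed choice; the source only says "tiktoken".
    """
    token_count: Optional[int] = None
    try:
        import tiktoken  # optional third-party dependency
        token_count = len(tiktoken.get_encoding(encoding_name).encode(text))
    except Exception:
        # tiktoken missing, or its encoding data could not be loaded
        token_count = None
    return {
        "char_count": len(text),
        "token_count": token_count,
        # last N characters, useful for eyeballing where the output stops
        "tail": text[-tail_chars:] if tail_chars > 0 else "",
    }


def truncation_point(returned: str, reference: str) -> Optional[int]:
    """Return the first offset where `returned` stops matching `reference`.

    Returns None when the two strings are identical. Assumes a reference
    copy in the same format as the tool output is available.
    """
    if returned == reference:
        return None
    limit = min(len(returned), len(reference))
    for i in range(limit):
        if returned[i] != reference[i]:
            return i
    return limit
```

If the tool converts pages to Markdown before returning them, a byte-for-byte reference may not exist. In that case only the character count, token count, and tail are meaningful.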
Limitations: results vary between runs; read_url_content requires approval before the fetch executes, and the approval interaction itself may influence routing, so it is logged per run; view_content_chunk pagination via DocumentId is only partially observable, through the agent thought panel and self-report.
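One way to keep these limitations visible in the data is to record the approval interaction and the observed tool chain for every run. The sketch below is a hypothetical record layout, not the framework's actual log format. The `RunRecord` fields and the `tool_chain_counts` helper are illustrative names.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class RunRecord:
    """One chat run, as noted by the tester (hypothetical layout)."""
    track: str                       # "interpreted", "explicit", or "raw"
    url: str
    approval_prompted: bool          # did an approval prompt precede the fetch?
    tools_observed: List[str] = field(default_factory=list)  # in order seen
    pagination_source: str = "not observed"  # e.g. "thought panel", "self-report"


def tool_chain_counts(records: Iterable[RunRecord], track: str) -> Counter:
    """Count how often each ordered tool chain appeared within one track."""
    return Counter(tuple(r.tools_observed) for r in records if r.track == track)


runs = [
    RunRecord("raw", "https://example.com/doc", True, ["read_url_content"]),
    RunRecord("raw", "https://example.com/doc", True,
              ["read_url_content", "view_content_chunk"], "thought panel"),
    RunRecord("explicit", "https://example.com/doc", False, ["search_web"]),
]
print(tool_chain_counts(runs, "raw"))
```

Counting ordered tool chains per track gives a rough view of routing variance between runs without assuming anything about why the agent chose a given tool.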
Agent Ecosystem Testing