Methodology

Turn-by-Turn

Chat-based measurement through interaction, without direct code instrumentation

The Cascade testing framework is the third in a series of chat-based agent testing frameworks in this collection, following Cursor and Copilot. Each platform has surfaced a different relationship between user-facing fetch syntax and actual agent behavior, and that pattern directly shapes this framework’s design.

Syntax Comparison

An evolving relationship between fetch syntax and agent behavior

Across three platforms, the role of explicit web fetch directives has shifted in a consistent direction. This progression (explicit syntax → autonomous behavior → documented-but-effect-unknown) frames the central methodological question Cascade testing inherits from the two prior frameworks. The three-track design below exists specifically to isolate @web as a variable, rather than assume its effect in either direction.

Platform	Syntax	Finding / Open Question
Cursor	`@Web` context attachment, undocumented	`@Web` unnecessary: fetch capability was already autonomous when testing began; `WebFetch` and `mcp_web_fetch` were called regardless of `@Web` syntax
Copilot	None documented	No user-invocable syntax exists; testing identified `fetch_webpage` in agent output
Cascade	`@web` directive, partially documented	Open question: does invoking `@web` change retrieval behavior, ceiling, tool chain, or chunking?
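One way to keep `@web` the only variable between paired runs is to generate both prompts from a single template. The sketch below is illustrative only: the function name `paired_prompts`, the prompt wording, and the example URL are all hypothetical and are not taken from the framework itself.

```python
def paired_prompts(url: str,
                   task: str = "Fetch {url} and report how much of the page you received.") -> dict:
    """Build a baseline prompt and an otherwise identical prompt prefixed with @web.

    The task wording is a placeholder (assumption); substitute the real test prompt.
    """
    baseline = task.format(url=url)
    return {
        "implicit": baseline,            # no directive
        "explicit": f"@web {baseline}",  # same text, directive prepended
    }


prompts = paired_prompts("https://example.com/docs")
# The two prompts differ only by the leading directive.
assert prompts["explicit"] == "@web " + prompts["implicit"]
```

Generating both prompts from one string avoids accidental wording differences that could be mistaken for an `@web` effect.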

Architecture Comparison

Testing a closed consumer application vs an open API

Rather than target specific API endpoints with documented interfaces, Cascade testing targets a consumer application with proprietary chat behavior and a partially documented tool layer. Cascade’s web fetch implementation surfaces three named tools, as reported by Cascade itself during runs: read_url_content for direct URL fetch, view_content_chunk for paginating large documents via DocumentId, and search_web for query-based lookup. Documentation for these tools is partial and gives few details about their behavior. By contrast, in the Claude API testing methodology, fetch behavior is directly inspectable via tool_result.

Aspect	Cursor	Copilot	Cascade
User Fetch Syntax	`@Web` context attachment	None	`@web` directive
Tools Observed	`WebFetch`, `mcp_web_fetch`	`fetch_webpage` and/or `curl`	`read_url_content`, `search_web`, `view_content_chunk`
Repeatability	Medium	Low, agent routing variance	Low, approval interaction may affect routing
Questions	Does MCP override `@Web`? Does agent auto-chunk?	Does `fetch_webpage` have a consistent ceiling? Does it vary by agent?	Does `@web` change the ceiling, tool chain, or chunking behavior?

Track Design

	Interpreted	Explicit	Raw
Question	What does Cascade report back without steering? Does it accurately perceive truncation?	Does adding `@web` change truncation limits, tool chain, or chunking behavior?	What does `read_url_content` actually return? Where exactly does truncation occur?
Method	Fetch URL, report measurements; no `@web`	Identical to interpreted track, prefixed with `@web`	Fetch URL, return output verbatim; no `@web`; verification script extracts measurements
Measurements	Model estimates: “appears truncated at ~X chars,” “Markdown seems complete”	Same as interpreted track; compared against implicit baseline	Character count via `len()`, token count via `tiktoken`, exact truncation point, last 50 characters
Repeatability	Low, approval interaction may affect routing	Medium, `@web` may stabilize tool selection	Medium, same URL should yield consistent `read_url_content` output
Best For	Understanding DX; surfacing approval-gated fetch: does approval gating affect routing consistency?	`@web` effect on ceiling: does `@web` change behavior, or repeat the Cursor finding that the directive is redundant?	Auto-pagination behavior: does `view_content_chunk` paginate automatically, or only when prompted?
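The Raw track measurements could be extracted by a script along the following lines. This is a minimal sketch, not the framework's actual verification script: the names `measure` and `count_tokens`, the choice of the `cl100k_base` encoding, and the approach of locating the truncation point by searching a locally saved reference copy for the output's last 50 characters are all assumptions.

```python
from typing import Optional


def count_tokens(text: str, encoding_name: str = "cl100k_base") -> Optional[int]:
    """Token count via tiktoken; None if tiktoken or its encoding data is unavailable."""
    try:
        import tiktoken
        return len(tiktoken.get_encoding(encoding_name).encode(text))
    except Exception:
        # tiktoken not installed, or the encoding file could not be loaded.
        return None


def measure(raw_output: str, reference: Optional[str] = None, tail: int = 50) -> dict:
    """Measure verbatim agent output, optionally against a reference copy of the page."""
    last_chars = raw_output[-tail:] if tail > 0 else ""
    truncation_offset = None
    if reference is not None and last_chars:
        # Assumption: the output's tail appears verbatim in the reference.
        idx = reference.find(last_chars)
        if idx != -1:
            truncation_offset = idx + len(last_chars)
    truncated = None
    if truncation_offset is not None:
        truncated = truncation_offset < len(reference)
    return {
        "char_count": len(raw_output),
        "token_count": count_tokens(raw_output),
        "last_chars": last_chars,
        "truncation_offset": truncation_offset,  # None when no reference match
        "truncated": truncated,                  # None when undetermined
    }
```

If the agent reformats the page (for example, HTML converted to Markdown), the tail may not match the reference verbatim; in that case `truncation_offset` stays `None` and the truncation point has to be located by hand.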

Limitations: results vary between runs. read_url_content requires approval before the fetch executes; the approval interaction itself may influence routing, so it is logged per run. view_content_chunk pagination via DocumentId is only partially observable, through the agent thought panel and self-report.
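Because runs vary and the approval interaction is logged per run, a simple per-run record makes the spread visible afterwards. The sketch below is one possible shape for such a log, under stated assumptions: the `RunRecord` fields and the `summarize` helper are hypothetical and do not describe the framework's actual log format.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RunRecord:
    track: str                      # "interpreted", "explicit", or "raw"
    url: str
    approval_prompted: bool         # was approval requested before the fetch ran?
    tools_observed: List[str] = field(default_factory=list)  # from thought panel / self-report
    char_count: Optional[int] = None  # measured (raw) or model-estimated (other tracks)


def summarize(runs: List[RunRecord]) -> dict:
    """Group runs by (track, url) and collect tool chains, approvals, and character counts."""
    summary: dict = {}
    for run in runs:
        entry = summary.setdefault(
            (run.track, run.url),
            {"tool_chains": Counter(), "approvals": 0, "char_counts": []},
        )
        entry["tool_chains"][tuple(run.tools_observed)] += 1
        entry["approvals"] += int(run.approval_prompted)
        if run.char_count is not None:
            entry["char_counts"].append(run.char_count)
    return summary
```

More than one distinct tool chain under the same key indicates routing variance for that track and URL; comparing `approvals` against the number of runs shows how often the approval gate was involved.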