Methodology
Turn-by-turn
Chat-based measurement through interaction, without direct code instrumentation
Software instrumentation is the process of adding code to a system to collect data about how it works. While the Cursor chat is public and accessible, testing it differs from calling an API to extract measurements programmatically: measurements come from interacting with the chat, not from code that inspects responses.
Approach Comparison
Testing a closed consumer application vs an open API
Rather than targeting specific endpoints with documented interfaces,
Cursor testing targets a consumer application with proprietary chat
behavior and multiple fetch mechanisms. Cursor’s chat web fetch and MCP
implementations don’t have a public API. MCP servers are user-configured,
and implementations vary (mcp-server-fetch, fetch-browser-mcp, third-party
servers); they are observable through Cursor’s agent behavior, but not
instrumentable. Compare to this collection’s
Claude API Web Fetch testing:
| Aspect | Claude API | Cursor |
|---|---|---|
| Interface | Python API call, response object available | Chat UI: observable only through output |
| Layers | Single: URL → fetch → return | Two: URL → fetch → @Web* output, then agent interprets |
| Instrumentation Access | Full: can inspect ToolResult.content directly | Partial: can only read agent output or manually copy @Web result |
| Repeatability | High: same URL yields identical API response | Medium: LLM interpretation varies, but @Web raw content should be stable |
| Fetch Mechanisms | One web fetch tool | Multiple: @Web, mcp-server-fetch, fetch-browser-mcp, third party |
| Best Findings | Hard limits: Claude API truncates at ~100 KB | Comparative limits: does MCP override @Web? Does agent auto-chunk? |
*Results logged as “Methods tested: @Web” reflect the user-facing prompt syntax. However, post-analysis revealed that testing misused @Web as a fetch command rather than a context attachment mechanism. The backend mechanisms WebFetch and mcp_web_fetch were possibly invoked autonomously by Cursor regardless of @Web syntax; see the Friction Note for analysis.
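
The Repeatability row in the comparison table can be checked mechanically once raw @Web output has been copied out of the chat by hand. The following is a minimal sketch, not part of the original test harness: the function names are hypothetical, and the whitespace normalization is an assumption intended to absorb copy-paste differences, not fetch differences.

```python
import hashlib
from typing import Sequence


def content_fingerprint(text: str) -> str:
    """Return a SHA-256 hex digest of one manually copied fetch output.

    Assumption: line endings and leading/trailing whitespace are artifacts
    of copying from the chat UI, so they are normalized before hashing.
    """
    normalized = text.replace("\r\n", "\n").strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def is_repeatable(captures: Sequence[str]) -> bool:
    """True when every capture of the same URL has the same fingerprint."""
    return len({content_fingerprint(capture) for capture in captures}) <= 1
```

Hashing keeps the comparison cheap for large pages; when fingerprints differ, the captures still need a manual diff to see where they diverge.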
Track Design
| | Interpreted | Raw |
|---|---|---|
| Question | What does Cursor report back? Does it accurately perceive truncation? Are there systematic estimation errors?* | What actually came through the @Web command? Where exactly does truncation occur? Is the boundary consistent? |
| Method | Chat prompt asks @Web to fetch URL and report measurements | Chat prompt asks @Web to fetch URL and return output verbatim; a human manually extracts measurements |
| Captures | Cursor and underlying LLM’s interpretation of truncation, completeness | Actual response content from @Web command, post-processing, exact character boundaries |
| Measurements | LLM estimates: “appears truncated,” “approximately X KB,” “markdown seems complete” | Manual: character count via len(), token count via tiktoken, exact truncation point, last 50 characters |
| Repeatability | Varies between runs | Reproducible: same URL fetched multiple times yields consistent content |
| Best For | Understanding DX, identifying perception gaps | Citable baseline measurements for Agent-Friendly Docs Spec |
Approach limitations: general variation between runs; can’t programmatically inspect a surfaced API field; variation expected between MCP server versions, IDE versions, and LLM selection; some URLs possibly gated
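
The manual measurements listed for the Raw track (character count via len(), token count via tiktoken, exact truncation point, last 50 characters) can be sketched as below. This is an illustrative sketch under stated assumptions, not the script used for the logged results: the function names are hypothetical, the cl100k_base encoding is an assumed choice, and locating the truncation point assumes a full reference copy of the page is available for comparison.

```python
from typing import Optional


def measure_raw_output(text: str, encoding_name: str = "cl100k_base") -> dict:
    """Collect Raw-track measurements for one manually copied fetch output."""
    try:
        # tiktoken is an optional third-party package; the encoding name is
        # an assumption, since the source does not say which one was used.
        import tiktoken

        encoder = tiktoken.get_encoding(encoding_name)
        token_count: Optional[int] = len(encoder.encode(text, disallowed_special=()))
    except Exception:
        # Package missing or encoding data unavailable: skip the token count.
        token_count = None
    return {
        "char_count": len(text),
        "token_count": token_count,
        "last_50_chars": text[-50:],
    }


def truncation_point(fetched: str, reference: str) -> Optional[int]:
    """Return the index where `fetched` stops matching `reference`.

    Returns None when `fetched` covers the whole reference, meaning no
    truncation was detected.
    """
    limit = min(len(fetched), len(reference))
    index = 0
    while index < limit and fetched[index] == reference[index]:
        index += 1
    return None if index == len(reference) else index
```

A prefix comparison only gives a meaningful boundary when the fetched text and the reference are in the same form (both markdown or both HTML); if the tool converts content before returning it, the reference needs the same conversion first.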
Cursor-Specific Unknowns
| Question | Details | Approach | Value |
|---|---|---|---|
| Multiple Fetch Mechanisms | @Web: native, proprietary; mcp-server-fetch: configurable; fetch-browser-mcp: headless browser; third-party servers | Compare side-by-side on identical URLs | Determines if one mechanism has different limits; unique to Cursor, addresses ecosystem testing gap |
| HTML-to-Markdown Conversion Timing | Does Cursor truncate before or after HTML→markdown conversion? | SC-1 through SC-4 measure truncation relative to content structure | Pre-conversion: lose 40-50% of characters to HTML/CSS overhead; post-conversion: Markdown smaller, but structure may break at boundary |
| Agent Auto-chunking | After truncation, does @Web automatically request next chunk or require manual request? | OP-4 agent retry pattern: observe unprompted follow-up fetches | Not well-explored in Claude API testing; key gap in ecosystem methodology, shapes DX with large docs |
| Model Variability | Cursor’s chat defaults to the Auto setting; additionally supports Claude’s Opus, Sonnet, Gemini, GPT-5 | Run tests with one LLM, tracked per run | Isolates fetch behavior from LLM inference variance; differences documented separately |
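
The side-by-side comparison and per-run model tracking described in the table can be sketched as a small run log. This is a hypothetical structure for illustration only: the FetchRun fields and both function names are assumptions, not the format used for the logged results.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FetchRun:
    """One manually recorded test run (hypothetical record layout)."""
    url: str
    mechanism: str   # e.g. "@Web", "mcp-server-fetch", "fetch-browser-mcp"
    model: str       # LLM selected in Cursor for this run
    char_count: int  # character count measured from the raw output


def side_by_side(runs: List[FetchRun], model: str) -> Dict[str, Dict[str, List[int]]]:
    """Group char counts as url -> mechanism -> [counts], for a single model.

    Filtering to one model keeps LLM variance out of the fetch comparison.
    """
    table: Dict[str, Dict[str, List[int]]] = defaultdict(lambda: defaultdict(list))
    for run in runs:
        if run.model == model:
            table[run.url][run.mechanism].append(run.char_count)
    return {url: dict(by_mechanism) for url, by_mechanism in table.items()}


def mechanisms_disagree(by_mechanism: Dict[str, List[int]]) -> bool:
    """True when mechanisms returned different sets of sizes for one URL."""
    distinct = {tuple(sorted(set(counts))) for counts in by_mechanism.values()}
    return len(distinct) > 1
```

URLs flagged by mechanisms_disagree are the ones worth re-running, since a size difference on an identical URL is the signal that one mechanism may have a different limit.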
Agent Ecosystem Testing