Key Findings for Cursor’s Web Fetch Behavior, Cursor-interpreted
Test Workflow
- Run `python web_fetch_testing_framework.py --test {test ID} --track interpreted`
- Review the terminal output
- Copy the provided prompt, which asks the agent to report `@Web`* fetch results: character count, token estimate, truncation status, content completeness, and Markdown formatting integrity
- Open a new Cursor session and paste the prompt into the chat window
- Capture the agent’s full text response and observations as the interpreted finding; any gap between the agent’s self-report and the actual fetch behavior is itself a finding
- Log structured metadata as described in `framework-reference.md`
- Ensure the results are saved to `/results/cursor-interpreted/results.csv`
*Results logged as “Methods tested: `@Web`” reflect the user-facing prompt syntax. Post-analysis revealed that testing misused `@Web` as a fetch command rather than a context attachment. Cursor may autonomously call the backend mechanisms `WebFetch` and `mcp_web_fetch` regardless of `@Web` syntax; see the Friction Note for analysis.
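Because the agent's self-reported numbers are themselves under test, the size metrics requested in the prompt can also be computed independently from the captured response. The following is a minimal sketch, not the framework's own code: the function name `fetch_metrics` is hypothetical, and the 4-characters-per-token ratio is an assumed heuristic (it happens to agree with figures reported on this page, such as 1,953 chars ≈ 488 tokens).

```python
def fetch_metrics(text: str, chars_per_token: int = 4) -> dict:
    """Return independent size metrics for a captured fetch response.

    chars_per_token is an assumed heuristic, not a real tokenizer.
    """
    char_count = len(text)
    return {
        "char_count": char_count,
        # Rough estimate only; an actual tokenizer will give different counts.
        "token_estimate": char_count // chars_per_token,
    }


if __name__ == "__main__":
    sample = "x" * 1953  # same length as the BL-1 run 1 response
    print(fetch_metrics(sample))
```

Comparing these values against what the agent reports is one way to surface the self-report gap described in the workflow.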
Platform Limit Summary
| Limit | Observed |
|---|---|
| Hard Character Limit | None detected: tested up to 702 KB |
| Hard Token Limit | None detected: tested up to ~179K tokens, average 33,912 |
| Output Consistency, Small | High variance: 2-3x across sessions, 1.9 KB → 5.6 KB for the same URL |
| Output Consistency, Large | Highly stable: <1% variance across sessions, 245 KB identical across 3 runs |
| Content Selection Behavior | Non-deterministic for small files; size-dependent |
| Truncation Pattern | Respects content boundaries when it occurs; no mid-sentence cuts |
| JavaScript-heavy SPAs | Truncation at ~6 KB, ~1.5K tokens; free tier times out, Pro tier truncates cleanly |
| Redirect Chains | Successfully follows, tested 5-level redirect chain |
| Self-reported Completeness | Unreliable: model claims “full content” when returning subset |
Results Details
| Parameter | Value |
|---|---|
| Model | Auto |
| Total Tests | 26 |
| Distinct URLs | 13 |
| Input Size Range | 2 KB–256 KB |
| Truncation Detection | Model assertion, verbatim last-50-chars, Markdown integrity |
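Two of the truncation-detection signals listed above, the verbatim last-50-chars comparison and the Markdown integrity check, lend themselves to a mechanical test. The sketch below is illustrative only and is not taken from the framework: both function names are hypothetical, and "Markdown integrity" is narrowed here to the single assumption that code fences must be balanced.

```python
FENCE = "`" * 3  # a Markdown code-fence marker


def ends_with_source_tail(fetched: str, source: str, n: int = 50) -> bool:
    """True if the last n characters of the fetched text match the source's tail."""
    return fetched.rstrip()[-n:] == source.rstrip()[-n:]


def has_balanced_code_fences(markdown: str) -> bool:
    """True if the number of fence lines is even, i.e. no fence is left open."""
    fence_lines = [
        line for line in markdown.splitlines() if line.lstrip().startswith(FENCE)
    ]
    return len(fence_lines) % 2 == 0


if __name__ == "__main__":
    source = "intro\n" + FENCE + "\ncode\n" + FENCE + "\n" + "body " * 20 + "THE END"
    cut = source[:15]  # simulate a response cut off inside the code block
    print(ends_with_source_tail(source, source), has_balanced_code_fences(source))
    print(ends_with_source_tail(cut, source), has_balanced_code_fences(cut))
```

A tail mismatch alone does not prove truncation, since footer or navigation filtering also changes the ending, so these checks are best read alongside the size comparison.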
Cross-run Output Variance
| Test | Category | Run 1 chars | Run 2 chars | Run 3 chars | Variance |
|---|---|---|---|---|---|
| BL-1 | Small - 87 KB | 1,953 | 5,595 | 4,100 | 2.9x |
| BL-2 | Small - 20 KB | 1,953 | 4,200 | 4,350 | 2.2x |
| SC-2 | Large - 80 KB | 702,885 | 702,885 | 702,885 | 1.0x |
| OP-4 | Large - 256 KB | 245,000 | 245,465 | 245,466 | 1.0x |
| EC-1 | SPA - 100 KB | 0 - timeout | 5,857 | null | null |
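The page does not state how the Variance column is computed. Taking the ratio of the largest to the smallest run reproduces the reported values, so the sketch below assumes that definition; `variance_ratio` is a hypothetical helper name, and treating timeouts or missing runs as "no ratio" is likewise an assumption modeled on the EC-1 row.

```python
from typing import Optional, Sequence


def variance_ratio(char_counts: Sequence[Optional[int]]) -> Optional[float]:
    """Ratio of the largest to the smallest response size across runs.

    Returns None when the list is empty or any run is missing or
    returned nothing (for example, a timeout).
    """
    if not char_counts or any(c is None or c <= 0 for c in char_counts):
        return None
    return round(max(char_counts) / min(char_counts), 1)


if __name__ == "__main__":
    runs = {
        "BL-1": [1953, 5595, 4100],
        "BL-2": [1953, 4200, 4350],
        "SC-2": [702885, 702885, 702885],
        "OP-4": [245000, 245465, 245466],
        "EC-1": [0, 5857, None],
    }
    for test_id, counts in runs.items():
        print(test_id, variance_ratio(counts))
```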
Truncation Analysis
| # | Finding | Tests | Observed | Spec |
|---|---|---|---|---|
| 1 | JavaScript-heavy SPA truncation ceiling | EC-1 r1 & r2, multiple sizes | Free tier: timeout, 0 bytes; Pro tier: truncated at 5,857 chars, ~1.5K tokens, clean ending at last link block; suggests a ~6 KB or ~1.5K-token ceiling specifically for SPA endpoints | SPAs are truncated aggressively, not completely blocked; free-tier timeouts mask Pro-tier truncation behavior |
| 2 | Static HTML/Markdown pages have no detected ceiling | BL-1 through OP-4; SC-2 - 702 KB; OP-4 - 245 KB | Successfully returned 702,885 characters from SC-2 and 245,465 characters from OP-4; no truncation observed on static content | No practical character ceiling detected for static docs; tested up to 700 KB |
| 3 | Output consistency is size-dependent | BL-1, BL-2, SC-2, OP-4 | Small files, 1-20 KB: 2-3× variance across sessions, 1.9K → 5.6K; large files, 80-256 KB: <1% variance, 702.8K identical, 245.5K identical | Fetch reliability depends on size: small docs unreliable, large docs stable |
| 4 | Content selection is non-deterministic for small files and session-dependent | BL-1 r1-r4, BL-2 r1-r3 | Identical prompts in different chat sessions produced 1,953 → 5,595 → 4,100 → 5,500 chars on BL-1; new sessions returned larger content than the original session | New chat sessions influence @Web output; conversation state affects fetch behavior |
| 5 | Same logical content, different formats, different sizes | BL-1 HTML vs BL-2 Markdown, both r1 | Both returned 1,953 chars despite different source formats, HTML vs .md; later runs diverged, 5,595 vs 4,200, suggesting format-dependent processing | Format affects fetch behavior; Cursor may process HTML and Markdown sources differently |
| 6 | Intelligent content filtering, not hard truncation | SC-4, EC-6 | SC-4: 30 KB page returned 28 KB, excluding footer/nav/metadata; EC-6 returned the full 71 KB including complex Markdown; output always ends at section boundaries | For static content, Cursor doesn’t truncate mid-content; it filters non-essential structural elements while preserving document integrity |
| 7 | Agent’s self-reported completeness diverges from actual content | SC-3, BL-1 r3-r4, EC-1 r2 | SC-3: agent reports “no truncation, complete reference” but content cuts off mid-references section; EC-1 r2: agent acknowledges truncation at ~6 KB despite 100 KB expected | Self-reports of content completeness are unreliable; the agent perceives filtered excerpts as “complete” because they are internally valid |
| 8 | Redirect chains handled transparently | EC-3 | 5-level redirect chain successfully followed; returned final destination content, 850 chars of JSON, without truncation | Follows HTTP redirects without user awareness or latency penalty |
| 9 | H2 supported for static content | SC-2, OP-4, EC-6, BL-3 | Token counts range from 488 (BL-1) to 175,721 (SC-2) with no observable limit; successfully returned the 61K-token document, OP-4, multiple times identically | For static pages: if a token-based ceiling exists, it is extremely high, 200K+; effectively no practical limit |
| 10 | H3 confirmed for static content | BL-1, BL-2, SC-2, SC-3, SC-4 r1-r3 | 8 tests matched H3: content selection respects Markdown section boundaries; truncation occurs at header boundaries, code fence closes, and list endings | For static pages, Cursor uses intelligent, structure-aware content selection rather than char/token-based cutting |
Size-Dependent Behavior
While the exact bifurcation point is unclear, Cursor’s behavior splits into two patterns: high variance on some pages and stable output on others.
Variance may depend on content type, structure, and size.
| Characteristic | High-Variance Cases | Stable Cases |
|---|---|---|
| Examples | BL-1 87 KB, BL-2 20 KB, SC-1 40 KB | SC-2 80 KB - 702 KB, OP-4 256 KB - 245 KB, EC-6 61 KB |
| Consistency | 2-3× variance across sessions | <1% variance across sessions |
| Session Dependency | New chat, different results | Reproducible: same URL, same content |
| Reliability | Unreliable | More consistent |
Perception Gap
Use character/token count comparisons to detect content subsetting; do not rely on the agent’s self-report.
| Test | Size | Returned | Reported | Gap | Why “Complete” |
|---|---|---|---|---|---|
| SC-3 Wikipedia | 100 KB+ | 38 KB | “Complete reference” | 62% missing | Clean section boundary masks truncation |
| BL-1 MongoDB | 87 KB | 1.9 KB | “Internally valid” | 95% missing | No mid-sentence cutoff, valid Markdown |
| SC-4 Markdown Guide | 65 KB | 28 KB | “All syntax sections” | 57% missing | Footer intentionally filtered, excerpt coherent |
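The Gap column can be approximated from the Size and Returned columns alone, which is the kind of count-based comparison recommended above. A minimal sketch, assuming the gap is one minus the returned-to-expected ratio; `missing_percent` is a hypothetical name, and for SC-3 the listed "100 KB+" is treated as exactly 100 KB, so the result is a lower bound.

```python
def missing_percent(expected_kb: float, returned_kb: float) -> int:
    """Percentage of the expected content absent from the response, rounded."""
    if expected_kb <= 0:
        raise ValueError("expected_kb must be positive")
    return round((1 - returned_kb / expected_kb) * 100)


if __name__ == "__main__":
    print("SC-3:", missing_percent(100, 38))  # expected size is a lower bound
    print("SC-4:", missing_percent(65, 28))
```

A large missing percentage alongside a self-report of "complete" is the signature of the perception gap, regardless of how clean the excerpt's ending looks.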