Key Findings: Cursor’s Web Fetch Behavior, Cursor-Interpreted Track

Test Workflow

1. Run `python web_fetch_testing_framework.py --test {test ID} --track interpreted`.
2. Review the terminal output.
3. Copy the generated prompt, which asks the agent to report its `@Web`* fetch results: character count, token estimate, truncation status, content completeness, and Markdown formatting integrity.
4. Open a new Cursor session and paste the prompt into the chat window.
5. Capture the agent’s full text response and observations as the interpreted finding; any gap between the agent’s self-report and the actual fetch behavior is itself a finding.
6. Log structured metadata as described in `framework-reference.md`.
7. Confirm the logged results are saved to `/results/cursor-interpreted/results.csv`.
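A minimal sketch of the logging step, assuming a flat CSV with hypothetical column names (the real schema is whatever `framework-reference.md` defines). `log_result` is an illustrative helper, not part of `web_fetch_testing_framework.py`.

```python
import csv
from pathlib import Path

# Hypothetical schema: replace with the columns defined in framework-reference.md.
FIELDS = ["test_id", "run", "method", "reported_chars", "reported_tokens",
          "reported_truncated", "agent_response"]


def log_result(row: dict, path: Path) -> None:
    """Append one interpreted-track result, writing the header only for a new file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as fh:
        # Keys missing from `row` are written as empty strings (DictWriter's default).
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)
```

For example, `log_result({"test_id": "BL-1", "run": 1, "method": "@Web", "reported_chars": 1953}, Path("results/cursor-interpreted/results.csv"))` appends one row, treating the documented path as relative to the project root (an assumption).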

*Results logged as “Methods tested: @Web” reflect the user-facing prompt syntax. Post-analysis revealed that testing misused `@Web` as a fetch command rather than as a context attachment. Cursor may autonomously call backend mechanisms such as `WebFetch` or `mcp_web_fetch` regardless of `@Web` syntax; see the Friction Note for analysis.

Platform Limit Summary

Limit	Observed
Hard Character Limit	None detected; tested up to 702 KB
Hard Token Limit	None detected; tested up to ~176K tokens (`SC-2`, 175,721), average 33,912 tokens
Output Consistency, Small Files	High variance: 2–3× across sessions; 1.9 KB → 5.6 KB for the same URL
Output Consistency, Large Files	Highly stable: <1% variance across sessions; `SC-2` returned 702,885 chars identically across 3 runs
Content Selection Behavior	Non-deterministic for small files; size-dependent
Truncation Pattern	When truncation occurs, it respects content boundaries; no mid-sentence cuts observed
JavaScript-heavy SPAs	Truncated at ~6 KB (~1.5K tokens); free tier times out, Pro tier truncates cleanly
Redirect Chains	Followed successfully; tested with a 5-level redirect chain
Self-reported Completeness	Unreliable: model claims “full content” when returning a subset

Results Details

Model	`Auto`
Total Tests	26
Distinct URLs	13
Input Size Range	2 KB–256 KB
Truncation Detection	Model assertion; verbatim last 50 characters; Markdown integrity check
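The three detection signals can be approximated mechanically once the source text is available. A minimal sketch with hypothetical helper names; the 4-characters-per-token heuristic is an assumption, though it reproduces the token figures in this report (1,953 chars → 488 tokens for `BL-1`; 702,885 chars → 175,721 tokens for `SC-2`).

```python
from typing import Optional

FENCE = "`" * 3  # three backticks, built indirectly so the snippet renders inside Markdown


def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token (heuristic, not a real tokenizer)."""
    return len(text) // 4


def coverage_from_tail(reported_tail: str, source: str) -> Optional[float]:
    """Estimate how much of the source the agent saw, from its verbatim last characters.

    Returns the fraction of the source ending at the reported tail, or None if the
    tail is empty or not found verbatim (e.g. the agent paraphrased it).
    If the tail occurs more than once, the last occurrence is used (an upper bound).
    """
    tail = reported_tail.strip()
    body = source.rstrip()
    if not tail or not body:
        return None
    idx = body.rfind(tail)
    if idx == -1:
        return None
    return (idx + len(tail)) / len(body)


def fences_balanced(markdown: str) -> bool:
    """Markdown integrity check: an even number of fence lines means no block is left open.

    Simplification: only backtick fences are counted; tilde fences are ignored.
    """
    fence_lines = [ln for ln in markdown.splitlines() if ln.lstrip().startswith(FENCE)]
    return len(fence_lines) % 2 == 0
```

A coverage well below 1.0, or an unbalanced fence count, is a mechanical signal of subsetting that does not depend on the agent’s own completeness claim.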

Cross-run Output Variance

Test	Category	Run 1 (chars)	Run 2 (chars)	Run 3 (chars)	Variance (max/min)
`BL-1`	Small, 87 KB	1,953	5,595	4,100	2.9×
`BL-2`	Small, 20 KB	1,953	4,200	4,350	2.2×
`SC-2`	Large, 80 KB	702,885	702,885	702,885	1.0×
`OP-4`	Large, 256 KB	245,000	245,465	245,466	1.0×
`EC-1`	SPA, 100 KB	0 (timeout)	5,857	n/a	n/a
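The Variance figures are consistent with a max/min ratio over the runs that returned content (for example, 5,595 / 1,953 ≈ 2.9× for `BL-1`). A minimal sketch of that calculation, assuming timeouts (0 chars) and unrecorded runs are excluded; `variance_ratio` is a hypothetical helper name.

```python
from typing import Optional, Sequence


def variance_ratio(run_chars: Sequence[Optional[int]]) -> Optional[float]:
    """Max/min ratio of returned character counts across runs.

    Timeouts (0 chars) and unrecorded runs (None) are excluded;
    fewer than two usable runs yields None.
    """
    usable = [c for c in run_chars if c]  # drops both None and 0
    if len(usable) < 2:
        return None
    return max(usable) / min(usable)


runs = {
    "BL-1": [1953, 5595, 4100],
    "BL-2": [1953, 4200, 4350],
    "SC-2": [702885, 702885, 702885],
    "OP-4": [245000, 245465, 245466],
    "EC-1": [0, 5857, None],
}
for test_id, chars in runs.items():
    ratio = variance_ratio(chars)
    print(test_id, "n/a" if ratio is None else f"{ratio:.1f}x")
# prints: BL-1 2.9x, BL-2 2.2x, SC-2 1.0x, OP-4 1.0x, EC-1 n/a (one per line)
```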

Truncation Analysis

#	Finding	Tests	Observed	Spec
1	Truncation ceiling for JavaScript-heavy SPAs	`EC-1` r1 & r2, multiple sizes	Free tier: timeout, 0 bytes; Pro tier: truncated at 5,857 chars (~1.5K tokens) with a clean ending at the last link block; suggests a ~6 KB or ~1.5K-token ceiling specific to SPA endpoints	SPAs are truncated aggressively but not blocked outright; free-tier timeouts mask Pro-tier truncation behavior
2	No ceiling detected for static HTML/Markdown pages	`BL-1` through `OP-4`; `SC-2` (702 KB), `OP-4` (245 KB)	Returned 702,885 characters from `SC-2` and 245,465 from `OP-4`; no truncation observed on static content	No practical character ceiling detected for static docs; tested up to ~700 KB
3	Output consistency is size-dependent	`BL-1`, `BL-2`, `SC-2`, `OP-4`	Small files (1–20 KB): 2–3× variance across sessions, 1.9K → 5.6K chars; large files (80–256 KB): <1% variance, 702.8K identical, 245.5K near-identical	Fetch reliability depends on size: small docs are unreliable, large docs stable
4	Content selection is non-deterministic and session-dependent for small files	`BL-1` r1–r4, `BL-2` r1–r3	Identical prompts in different chat sessions produced 1,953 → 5,595 → 4,100 → 5,500 chars on `BL-1`; new sessions returned more content than the original session	New chat sessions influence `@Web` output; conversation state appears to affect fetch behavior
5	Same logical content in different formats returns different sizes	`BL-1` (HTML) vs `BL-2` (Markdown)	Both returned 1,953 chars in r1 despite different source formats (HTML vs `.md`); later runs diverged (5,595 vs 4,200), suggesting format-dependent processing	Source format may affect fetch behavior; HTML and Markdown sources may be processed differently
6	Intelligent content filtering, not hard truncation	`SC-4`, `EC-6`	`SC-4`: 30 KB page returned 28 KB, excluding footer/nav/metadata; `EC-6`: returned the full 71 KB, including complex Markdown; output always ends at section boundaries	For static content, does not truncate mid-content; filters non-essential structural elements while preserving document integrity
7	Agent’s self-reported completeness diverges from actual content	`SC-3`, `BL-1` r3–r4, `EC-1` r2	`SC-3`: agent reports “no truncation, complete reference” but content cuts off mid-references section; `EC-1` r2: agent acknowledges truncation at ~6 KB despite 100 KB expected	Self-reported completeness is unreliable; the agent perceives filtered excerpts as “complete” because they are internally valid
8	Redirect chains handled transparently	`EC-3`	5-level redirect chain followed successfully; returned the final destination content, 850 chars of JSON, without truncation	Follows HTTP redirects without user-visible steps; no latency penalty observed
9	Hypothesis `H2` supported for static content	`SC-2`, `OP-4`, `EC-6`, `BL-3`	Token counts range from 488 (`BL-1`) to 175,721 (`SC-2`) with no observable limit; the 61K-token `OP-4` document was returned multiple times with near-identical output	For static pages: if a token-based ceiling exists, it lies above the ~176K tokens tested; no practical limit observed
10	Hypothesis `H3` confirmed for static content	`BL-1`, `BL-2`, `SC-2`, `SC-3`, `SC-4` r1–r3	8 tests matched `H3`: content selection respects Markdown section boundaries; truncation occurs at heading boundaries, closed code fences, and list endings	For static pages, uses structure-aware content selection rather than character- or token-based cutting
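Findings 6, 7, and 10 rest on judging where an excerpt stops relative to its source. A rough heuristic sketch, assuming the source Markdown is available and the excerpt’s last 50 characters appear verbatim in it (mirroring the last-50-chars detection method); `classify_cut` and its labels are hypothetical, not the framework’s own classifier.

```python
import re
from typing import Optional

FENCE = "`" * 3  # three backticks, built indirectly so the snippet renders inside Markdown
LIST_ITEM = re.compile(r"([-*+]|\d+\.)\s")  # bullet or numbered list marker


def classify_cut(excerpt: str, source: str, tail_len: int = 50) -> str:
    """Heuristically label where an excerpt stops relative to its source Markdown.

    Labels: "empty", "source_end", "mid_fence", "heading_boundary",
    "fence_closed", "list_end", "sentence_end", "mid_content".
    """
    cut = excerpt.rstrip()
    if not cut:
        return "empty"
    tail = cut[-tail_len:]
    pos = source.find(tail)  # first verbatim occurrence of the excerpt's tail
    # Source text after the cut point; None if the tail was not found verbatim.
    following: Optional[str] = source[pos + len(tail):].lstrip() if pos != -1 else None
    if following == "":
        return "source_end"
    if cut.count(FENCE) % 2 == 1:
        return "mid_fence"  # an unclosed code fence: the cut fell inside code
    last_line = cut.splitlines()[-1].strip()
    if following is not None and following.startswith("#"):
        return "heading_boundary"
    if last_line.startswith(FENCE):
        return "fence_closed"
    if LIST_ITEM.match(last_line) and not (following and LIST_ITEM.match(following)):
        return "list_end"
    if last_line.endswith((".", "!", "?", ":")):
        return "sentence_end"
    return "mid_content"
```

Only "mid_fence" and "mid_content" indicate the hard, structure-blind cutting that `H3` predicts against; every other label is consistent with boundary-aware selection.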

Size-Dependent Behavior

Although the exact threshold is unclear, Cursor’s behavior splits into two distinct patterns.
Variance may depend on content type and structure as well as size.

Characteristic	High-Variance Cases	Stable Cases
Examples	`BL-1` (87 KB), `BL-2` (20 KB), `SC-1` (40 KB)	`SC-2` (80 KB → 702 KB returned), `OP-4` (256 KB → 245 KB returned), `EC-6` (61 KB)
Consistency	2–3× variance across sessions	<1% variance across sessions
Session Dependency	A new chat yields different results	Reproducible: same URL returns the same content
Reliability	Unreliable	More consistent
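The two regimes can be separated with a simple spread test. A minimal sketch, assuming the “<1% variance” criterion is measured as (max − min) / max over runs that returned content; that definition and the helper names are assumptions, not the framework’s own metric.

```python
from typing import Optional, Sequence


def relative_spread(run_chars: Sequence[Optional[int]]) -> Optional[float]:
    """(max - min) / max over runs that returned content; None if fewer than two."""
    usable = [c for c in run_chars if c]  # skip timeouts (0) and missing runs (None)
    if len(usable) < 2:
        return None
    return (max(usable) - min(usable)) / max(usable)


def classify_stability(run_chars: Sequence[Optional[int]], threshold: float = 0.01) -> str:
    """Label a test "stable" when the spread is under the threshold (1% by default)."""
    spread = relative_spread(run_chars)
    if spread is None:
        return "insufficient data"
    return "stable" if spread < threshold else "high-variance"
```

With the Cross-run Output Variance figures, `OP-4` (245,000 to 245,466 chars, a spread of about 0.2%) lands in the stable regime, while `BL-1` and `BL-2` land in the high-variance one.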

Perception Gap

Use character/token count comparisons, not agent self-report, to detect when returned content is a subset of the source.

Test	Size	Returned	Reported	Gap	Why “Complete”
`SC-3` Wikipedia	100 KB+	38 KB	“Complete reference”	62% missing	Clean section boundary masks truncation
`BL-1` MongoDB	87 KB	1.9 KB	“Internally valid”	95% missing	No mid-sentence cutoff, valid Markdown
`SC-4` Markdown Guide	65 KB	28 KB	“All syntax sections”	57% missing	Footer intentionally filtered, excerpt coherent
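The Gap figures for `SC-3` and `SC-4` match 1 − returned/source (38 of 100 KB → 62% missing; 28 of 65 KB → 57% missing). A minimal sketch of the recommended size comparison, assuming sizes in KB and a hypothetical 5% tolerance before a “complete” claim is flagged; both helper names are illustrative.

```python
def missing_fraction(returned_kb: float, source_kb: float) -> float:
    """Fraction of the source absent from the returned content (clamped at 0)."""
    if source_kb <= 0:
        raise ValueError("source size must be positive")
    # Clamp: returned output can exceed the nominal source size.
    return max(0.0, 1.0 - returned_kb / source_kb)


def flag_false_completeness(claims_complete: bool, returned_kb: float,
                            source_kb: float, tolerance: float = 0.05) -> bool:
    """True when the agent claims completeness but the measured sizes disagree."""
    return claims_complete and missing_fraction(returned_kb, source_kb) > tolerance
```

A size gap alone cannot distinguish deliberate filtering of navigation and footers (as with `SC-4`) from lost body content, so flagged cases still need a manual read.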