Platform Comparisons
| Section | Description |
|---|---|
| Retrieval | How and when an agent fetches content |
| Truncation | What gets lost and whether agents report it |
| Summarization | What happens to content between retrieval and generation |
Retrieval
The web fetch gap isn’t in retrieval, but in what follows: how agents attend to various content types during generation, whether that’s context window handling, chunking losses, or summarization. Platform links lead to each tool’s official documentation.
| Platform | Prompt Syntax | Invocation Pattern | Retrieval Behavior |
|---|---|---|---|
| Claude API web fetch | Enable tool to augment Claude’s context with URL | Mid-generation deterministic: tool requires enablement in API request, includes URL validation and results cache, may or may not provide live web content | Visibility high: only platform where response body includes raw tool result; no JavaScript execution, CSS-heavy pages and/or SPAs often return little to no prose |
| Cursor | No web fetch behavior publicly documented; @Web context attachment redundant; agents don’t correct misuse | Mid-generation nondeterministic: Auto default setting, autonomous LLM and fetch method selection per request | Visibility low: fetch method not explicitly named; no JavaScript execution; CSS-heavy pages and/or SPAs often return little to no prose; prefers Markdown, content negotiation documented with Accept: text/markdown; sends full browser fingerprint: User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36 with Chrome client hints Sec-Ch-Ua, Sec-Fetch-* |
| Gemini API URL context | Enable tool to augment Gemini’s context with URL; request requires url_context with full, unnested URLs | Pre-generation injection deterministic: two-step process, fetches from internal cache, then live fetch if unsuccessful; documentation includes parsing limitations | Visibility low: retrieved content injected into context without a testable field; retrieval orchestration and generation process opaque; url_context_metadata order nondeterministic, authoritative signal url_retrieval_status, tool_use_prompt_token_count only size proxy |
| GitHub Copilot | No web fetch behavior publicly documented; prompt with URL | Mid-generation nondeterministic: Auto default setting, autonomous LLM and fetch method selection per request | Visibility medium: tools intermittently named via error; fetch_webpage returns relevance-ranked excerpts with elision markers, occasional nonlinear, inaccurate reassembly; curl byte-perfect retrieval, but no prose; content negotiation tool-dependent, presents as a browser but overclaims: User-Agent: Mozilla/5.0, AppleWebKit, Accept: full HTML; curl/8.7.1 no preference, Accept: */* |
| OpenAI web search | Chat Completions API augments GPT’s search with URL; Responses API for web_search | Mid-generation nondeterministic, integration- and agent-dependent: static facts and trivial math don’t invoke the tool; Chat Completions search implicit, Responses web_search_preview conditional; external_web_access controls cached/indexed vs live content | Visibility low: Responses response.output’s web_search_call names tools, but search context does not equal the LLM context window; no JavaScript execution; search_context_size: low/medium/high controls context amount, a consistent latency lever in Chat Completions but inconsistent in Responses |
| Windsurf Cascade | Web and docs search partially documented; @web directive redundant with URL; agents don’t correct misuse | Mid-generation deterministic: autonomous two-stage pipeline designed to emulate human browsing and skimming; documentation acknowledges not all pages are parseable | Visibility medium: read_url_content returns a chunk index with summaries and metadata and requires sequential view_content_chunk calls; curl substitution for CSS-heavy pages; SPAs return ~20–35% of expected size, little or no prose; agents used @web’s web_search as verification once every ~60 turns; presentation transparent about using a crawler-scraper, but underdelivers: User-Agent: Colly |
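Of the platforms above, the Claude API is the only one where the tool must be explicitly enabled in the request and the raw tool result is inspectable in the response. A minimal sketch of what that request body might look like, built as a plain dict so it can be inspected without credentials; the model name and tool type/version string are illustrative assumptions, not confirmed values — check the current documentation for the exact identifiers:

```python
# Sketch: enabling a web fetch tool on a Messages API request.
# The tool type string and model name below are placeholders.
import json

def build_web_fetch_request(url: str, max_uses: int = 3) -> dict:
    """Assemble a Messages-API-style payload with a web fetch tool enabled."""
    return {
        "model": "claude-sonnet-4-5",      # placeholder model name
        "max_tokens": 1024,
        "tools": [{
            "type": "web_fetch_20250910",  # hypothetical tool version string
            "name": "web_fetch",
            "max_uses": max_uses,          # cap the number of fetches per request
        }],
        "messages": [{
            "role": "user",
            "content": f"Summarize the page at {url}",
        }],
    }

payload = build_web_fetch_request("https://example.com/docs")
print(json.dumps(payload, indent=2))
```

Because enablement is explicit and the tool result appears in the response body, this is the one surface where a caller can diff what was retrieved against what the model generated.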
Truncation
Pipelines are lossy by design, an attempt to balance token cost, speed, and access to fresh content. Agents intermittently acknowledge architectural constraints, misattribute truncation causes, or self-report completeness when content is incomplete or unusable. Platform links lead to interpreted vs raw track analysis.
| Platform | Truncation Limit | Observations |
|---|---|---|
| Claude API web fetch | ~20,700 chars and/or ~100 KB of rendered content by default when unset | max_content_tokens is approximate: setting 5,000 returned 17,186 chars, and truncation occurs mid-token. Default limit identified in raw track; self-report attributed missing content to JavaScript rendering, masking the character limit. |
| Cursor | 28 KB–240 KB+, method-dependent, nondeterministic filtering | WebFetch MCP ~28 KB, urllib ~72 KB, unknown path 245 KB+, curl no ceiling detected; appears to apply structure-aware content filtering, stripping navigation and CSS, but heuristic content selection presents as complete, so agents don’t report truncation. |
| Gemini API URL context | No fixed ceiling or silent dropping detected; hard limit of 20 URLs per request | API-layer rejection returns 400 and doesn’t consume tokens; retrieval-layer failure completes the request but records URL_RETRIEVAL_STATUS_ERROR. Format support inconsistent with documentation: PDF fails, YouTube succeeds, JSON nondeterministic; Google Docs fail consistently. |
| GitHub Copilot | No fixed ceiling detected, nondeterministic excerpting, tested to 6.68M tokens | Pipeline with fetch_webpage discards whole sections, or more granularly, before generation; curl delivers all raw bytes but unreadable; chat rendering cutoff visible in output, not persisted as requested; agents don’t reliably report these results as truncation. |
| OpenAI web search | No fixed ceiling or silent dropping detected | Raw source count stable at 12 regardless of search_context_size setting. Query construction not temporally aware: internal queries append training-era date strings despite running in 2026. Documented domain filtering limits not functional in Python SDK. |
| Windsurf Cascade | No fixed ceiling detected at retrieval stage, nondeterministic agent-dependent write ceiling | Full retrieval agent and doc-size-dependent. Agents often retrieve fully under ~14 chunks, spotty at ~35, sparse sampling at 50+. Chunk index summary population not guaranteed, those present often include byte-count loss notices. Unique read-write asymmetry. Agents often self-report full retrieval, but fail to prove it with a write task or report truncation. |
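Since Gemini’s url_context_metadata arrives in nondeterministic order, callers should key on the URL rather than the position and treat url_retrieval_status as the authoritative per-URL signal, as the Retrieval section notes. A sketch of that check against a mock, response-shaped dict; the field names follow the url_context_metadata structure described in this document, and the mock itself is fabricated for illustration:

```python
# Sketch: reading url_retrieval_status per URL without relying on entry order.
# The "mock" dict below imitates the url_context_metadata shape; it is not
# real API output.

def retrieval_statuses(url_context_metadata: dict) -> dict:
    """Map each retrieved URL to its retrieval status, ignoring entry order."""
    return {
        entry["retrieved_url"]: entry["url_retrieval_status"]
        for entry in url_context_metadata.get("url_metadata", [])
    }

mock = {
    "url_metadata": [
        {"retrieved_url": "https://b.example",
         "url_retrieval_status": "URL_RETRIEVAL_STATUS_ERROR"},
        {"retrieved_url": "https://a.example",
         "url_retrieval_status": "URL_RETRIEVAL_STATUS_SUCCESS"},
    ]
}

statuses = retrieval_statuses(mock)
# URLs whose content never reached the context window:
failed = [u for u, s in statuses.items()
          if s != "URL_RETRIEVAL_STATUS_SUCCESS"]
print(failed)  # ['https://b.example']
```

This distinguishes the two failure modes in the table above: an API-layer 400 never produces metadata at all, while a retrieval-layer failure completes the request and surfaces only through this status field.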
Summarization
While default settings abstract orchestrator-subagent relationships away, platforms offer agent configuration, which is outside this testing’s scope. Observable outputs inform the conclusions below. Platform links lead to testing methodologies.
| Platform | Processing Layer | Inference |
|---|---|---|
| Claude API web fetch | Dynamic filtering optional, web_fetch_20260209 | Server-side tool called directly, with inspectable tool result in the response. Dynamic filtering, available with certain LLMs, lets Claude write and execute code to filter content before it reaches the context window, but it’s not default behavior. |
| Cursor | Inferred via filtering, undocumented for web fetch | Codebase research, terminal commands, and browser automation requests trigger built-in subagents explore, bash, and browser. AET prompts likely invoked explore and bash alongside web fetch. Backend routing and structure-aware content filtering suggest a pre-generation processing layer, not a passive, linear pipeline. |
| Gemini API URL context | API-layer pipeline, undocumented | Pre-generation injection suggests any processing occurs before LLM invocation. No transformation layer observed between retrieval and generation: the LLM receives content directly, and any summarization occurs as part of generation, not as an intermediate pipeline stage. |
| GitHub Copilot | Inferred via relevance-ranking, undocumented for web fetch | Reassembled excerpts, outputs that don’t note discarded content, browser masquerading, and tool substitution patterns suggest an orchestrator-subagent relationship, not a linear, passive pipeline. Agent loop descriptions vary by implementation: VS Code–Copilot docs describe subagent delegation as main-agent-initiated for complex tasks with further config available, but Copilot SDK docs describe subagents only as configurable, not default architecture. |
| OpenAI web search | Differs by API surface, undocumented | Chat Completions retrieves autonomously, but Responses’ LLM actively manages search in the chain of thought with open_page and find_in_page, suggesting a processing layer that is not explicitly documented or named in either API’s responses. |
| Windsurf Cascade | Inferred via chunking, undocumented for web and docs search | Codebase research triggers the built-in subagent Fast Context. AET prompts likely invoked Fast Context alongside web search. Chunk analysis, tool substitution, terminal execution, and workspace referencing suggest an extensive processing layer, not a passive, linear pipeline. |
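For OpenAI’s Responses surface, the one externally visible lever over this processing layer is search_context_size, which, as noted above, controls search context rather than the LLM context window. A sketch of a request body carrying that setting, again built as a plain dict for inspection; the model name and tool type string are assumptions for illustration, not confirmed identifiers:

```python
# Sketch: a Responses-API-style payload with web search enabled and
# search_context_size set. Tool type and model name are placeholders.

def build_web_search_request(query: str, context_size: str = "low") -> dict:
    """Assemble a Responses-style payload with a web search tool attached."""
    if context_size not in ("low", "medium", "high"):
        raise ValueError("search_context_size must be low, medium, or high")
    return {
        "model": "gpt-4o",                        # placeholder model name
        "tools": [{
            "type": "web_search_preview",         # tool name as described above
            "search_context_size": context_size,  # search-context lever, not
                                                  # the LLM context window
        }],
        "input": query,
    }

req = build_web_search_request("site reliability postmortem examples", "medium")
print(req["tools"][0]["search_context_size"])  # medium
```

Whether raising this setting changes what reaches generation is exactly the kind of question the observations above address: raw source count stayed at 12 regardless of the value.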
Agent Ecosystem Testing