Key Findings for Claude’s Web Fetch Tool, Raw
Test Workflow
- Call Claude’s API with the web fetch tool enabled
- Give Claude a minimal prompt, just enough to trigger the fetch
- Claude fetches the page but is not asked to interpret or describe it
- Extract the raw content directly from the web_fetch_tool_result block in the response object
- Run analysis in Python: character counts, CSS indicator detection, boilerplate estimation, truncation detection
- Results saved to claude-api/results/
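
The extraction and CSS-check steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: it works on a response already converted to a plain dict, the nesting below the web_fetch_tool_result block (content, source, data) is an assumption about the response shape, and the CSS patterns are hypothetical stand-ins for whatever indicators the real analysis used.

```python
import re
from typing import Any

# Hypothetical CSS indicators; the real analysis may have used a different set.
CSS_PATTERNS = [
    r"<style\b",                    # embedded stylesheet tag
    r"\bstyle\s*=",                 # inline style attribute
    r"\{[^{}]*:[^{}]*;[^{}]*\}",    # rule body such as { color: red; }
]


def extract_fetched_text(response: dict[str, Any]) -> list[str]:
    """Collect the raw text of every web fetch result in a response dict."""
    texts = []
    for block in response.get("content", []):
        if block.get("type") != "web_fetch_tool_result":
            continue
        # Assumed nesting: result -> document -> source -> data.
        result = block.get("content", {})
        data = result.get("content", {}).get("source", {}).get("data")
        if isinstance(data, str):
            texts.append(data)
    return texts


def count_css_indicators(text: str) -> int:
    """Count matches of the hypothetical CSS patterns in the fetched text."""
    return sum(len(re.findall(p, text, flags=re.IGNORECASE)) for p in CSS_PATTERNS)


# Hand-built sample standing in for a real API response.
sample_response = {
    "content": [
        {"type": "text", "text": "Fetching the page."},
        {
            "type": "web_fetch_tool_result",
            "content": {
                "type": "web_fetch_result",
                "url": "https://example.com/docs",
                "content": {
                    "type": "document",
                    "source": {"type": "text", "data": "# Title\nBody text"},
                },
            },
        },
    ]
}

for page in extract_fetched_text(sample_response):
    print(len(page), count_css_indicators(page))
```

Reading the block directly, rather than asking Claude to describe the page, keeps the measurement independent of how the model summarizes what it fetched.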
Results Summary
| Finding | Detail |
|---|---|
| CSS indicators: 0 across all tests | The API tool strips CSS effectively before content reaches the LLM. Claude Code choked on inline CSS, but here no CSS indicators appeared in any test, including the short HTML page that broke Claude Code entirely. |
| Boilerplate still significant | Short HTML: 81% boilerplate before the first heading. Long HTML: 97.5%. Nav menus, image links, product listings consume the majority of the content budget before any docs appear. |
| Markdown dramatically reduces boilerplate | Short HTML: 25,925 chars, first heading at char 21,009 - 81% boilerplate. Short Markdown: 6,024 chars, first heading at char 314 - 5.2% boilerplate. Serving Markdown reduces total content by 77% and moves the first heading 98% earlier. |
| max_content_tokens truncation approximate | Setting max_content_tokens=5000 reduced content from 20,696 to 17,186 chars, not 5,000. Consistent with the API docs noting the parameter is approximate, but the margin is substantial. |
| max_content_tokens truncates mid-content | Fourth test ends with /docs/current, a mid-URL cutoff. Truncation isn’t clean and can occur inside a token, not at a sentence or element boundary. |
| Truncation limit undocumented | Third test with no max_content_tokens set still truncated at 20,696 chars, ending mid-word, last char: 'e'. This suggests a default limit exists even with the parameter omitted, but it isn’t mentioned in the API docs. |
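
The boilerplate percentages and truncation checks in the table can be approximated with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, the first heading is located by looking for a Markdown-style `#` heading (the original analysis may have detected headings differently), and the truncation test is a simple heuristic on the final character.

```python
import re
from typing import Optional


def first_heading_offset(text: str) -> Optional[int]:
    """Return the character offset of the first Markdown-style heading, if any."""
    match = re.search(r"^#{1,6} ", text, flags=re.MULTILINE)
    return match.start() if match else None


def boilerplate_pct(total_chars: int, first_heading_at: int) -> float:
    """Share of the content, in percent, that precedes the first heading."""
    if total_chars == 0:
        return 0.0
    return 100 * first_heading_at / total_chars


def looks_truncated(text: str) -> bool:
    """Heuristic: text ending in a letter, digit, or slash likely stopped mid-word or mid-URL."""
    stripped = text.rstrip()
    return bool(stripped) and (stripped[-1].isalnum() or stripped[-1] == "/")


# Figures taken from the results table.
short_html_pct = boilerplate_pct(25_925, 21_009)
short_md_pct = boilerplate_pct(6_024, 314)
size_reduction_pct = 100 * (1 - 6_024 / 25_925)

print(f"Short HTML boilerplate: {short_html_pct:.1f}%")
print(f"Short Markdown boilerplate: {short_md_pct:.1f}%")
print(f"Total size reduction: {size_reduction_pct:.1f}%")
print(looks_truncated("see /docs/current"))
```

The heuristic flags both endings reported above (the mid-URL `/docs/current` and the final character `'e'`), but it also flags any page that legitimately ends without punctuation, so it is best treated as a prompt for manual inspection.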