Key Findings for Gemini’s URL Context Tool, Raw

Call the Gemini API with the URL context tool enabled.
Give Gemini a minimal prompt, just enough to trigger URL retrieval.
Gemini fetches each URL in its pre-retrieval step but isn’t asked to interpret, describe, or reflect on what it received.
Extract raw retrieval outcomes directly from `url_context_metadata` in the response object: `retrieved_url` and `url_retrieval_status` for each URL.
Extract token accounting from `usage_metadata`: `tool_use_prompt_token_count` (URL content tokens) and `prompt_token_count` (text prompt tokens), which are recorded separately.
Run all analysis in Python: URL counts, status enum tallies, success/failure rates, token breakdowns.
Gemini never interprets or reflects on the retrieval results.
Results are stored in `google-gemini-url-context/results/raw/`.
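
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual harness. The field names `url_context_metadata`, `retrieved_url`, `url_retrieval_status`, `usage_metadata`, `tool_use_prompt_token_count`, and `prompt_token_count` come from the steps above. The prompt wording, the helper names `fetch_raw` and `extract_raw`, the dict-style tool config, and the nesting `candidates[0].url_context_metadata.url_metadata` are assumptions about how the `google-genai` Python SDK exposes the response.

```python
MODEL = "gemini-2.5-flash"


def fetch_raw(client, urls):
    """Send a minimal prompt whose only job is to trigger URL retrieval."""
    # Placeholder wording (an assumption): the prompt just has to name the URLs.
    prompt = "Fetch: " + " ".join(urls)
    return client.models.generate_content(
        model=MODEL,
        contents=prompt,
        # Assumed dict form of the URL context tool configuration.
        config={"tools": [{"url_context": {}}]},
    )


def extract_raw(response):
    """Read retrieval outcomes and token counts; the model's text is ignored."""
    meta = response.candidates[0].url_context_metadata
    entries = (meta.url_metadata or []) if meta is not None else []
    usage = response.usage_metadata
    return {
        "urls": [
            {
                "retrieved_url": e.retrieved_url,
                # Works for an enum member (uses .name) or a plain string.
                "status": getattr(e.url_retrieval_status, "name", e.url_retrieval_status),
            }
            for e in entries
        ],
        "tool_use_prompt_token_count": usage.tool_use_prompt_token_count or 0,
        "prompt_token_count": usage.prompt_token_count or 0,
    }
```

With the SDK installed, `client` would typically be created with `genai.Client()` after `from google import genai`; any object exposing `models.generate_content` works for a dry run.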

Results summary

5 raw track runs on gemini-2.5-flash: runs 1–3 on the free tier (daily cap exhausted after r3 test 2) and runs 4–5 on the paid tier. Canonical results are r4 and r5:

Test	URLs requested	URLs OK	Tool tokens (r4 / r5)	Result
`test_1_single_html`	1	1	3,099 / 3,128	Consistent across runs
`test_2_single_pdf`	1	0	119 / 126	`URL_RETRIEVAL_STATUS_ERROR` consistent
`test_3_multi_url_5`	5	5	27,508 / 27,506	Consistent across runs
`test_4_multi_url_20`	20	20	111,326 / 111,326	Consistent across runs
`test_5_multi_url_21`	21	0	-	`400 INVALID_ARGUMENT` consistent
`test_6_unsupported_youtube`	1	1	1,584 / 1,570	`URL_RETRIEVAL_STATUS_SUCCESS`, although docs say unsupported
`test_7_unsupported_google_doc`	1	0	162 / 219	`URL_RETRIEVAL_STATUS_ERROR` consistent
`test_8_json_content`	1	1 (r1, r2) / 0 (r4, r5)	116 / 112	Non-deterministic: succeeded in r1 and r2, failed in r4 and r5
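
As a rough check on the "consistent across runs" entries, the run-to-run spread of the tool-token counts can be computed from the r4 and r5 values in the table above. This is only a sketch: the spread definition, (max - min) / min, is an assumption and may not be the formula used in the original analysis, and `spread_pct` is a hypothetical helper name.

```python
# Tool-token counts for r4 and r5, copied from the results table.
TOOL_TOKENS = {
    "test_1_single_html": (3099, 3128),
    "test_3_multi_url_5": (27508, 27506),
    "test_4_multi_url_20": (111326, 111326),
    "test_6_unsupported_youtube": (1584, 1570),
}


def spread_pct(counts):
    """Relative spread between runs, (max - min) / min, as a percentage."""
    lo, hi = min(counts), max(counts)
    return (hi - lo) / lo * 100


if __name__ == "__main__":
    for name, counts in TOOL_TOKENS.items():
        print(f"{name}: {spread_pct(counts):.2f}%")
```

Under this definition, every successful test in the table stays below 1% between r4 and r5.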

#	Finding	Tests	Observed	Spec contribution
1	20-URL limit is a hard API error	`test_5` r3, r4, r5	`400 INVALID_ARGUMENT`: `"Number of urls to lookup exceeds the limit (21 > 20)"`. Zero URL content tokens consumed. Reproduced on all clean runs.	Limit enforced at the API layer before retrieval. Not truncation or silent dropping.
2	YouTube succeeds despite being documented as unsupported	`test_6` r1, r4, r5	`URL_RETRIEVAL_STATUS_SUCCESS` on all clean runs. Tool tokens: 1,525 / 1,584 / 1,570 - variance <4%.	Documented limitation doesn’t reflect current behavior on `gemini-2.5-flash` as of March 2026.
3	PDF retrieval fails consistently on a valid public PDF	`test_2` all 5 runs	`URL_RETRIEVAL_STATUS_ERROR` on every run. Tool tokens: 119–126, minimal and consistent. PDF is a documented supported type.	PDF retrieval fails reliably for this W3C URL. Follow-up needed with a different PDF source before drawing a firm conclusion.
4	Google Docs fail at the retrieval layer, not the API layer	`test_7` r1, r4, r5	`URL_RETRIEVAL_STATUS_ERROR` with tool tokens 156–219. Request completes normally, with no API-level rejection.	Two distinct failure modes exist: API-layer rejection (hard error, zero tokens, as in `test_5`) and retrieval-layer failure (request completes, status recorded in metadata).
5	JSON API endpoint retrieval is non-deterministic	`test_8` all runs	`URL_RETRIEVAL_STATUS_SUCCESS` in r1 and r2 (~2,490 tool tokens); `URL_RETRIEVAL_STATUS_ERROR` in r4 and r5 (112–116 tool tokens). No change in endpoint or prompt between runs.	The Gemini URL context tool’s handling of `application/json` responses from this endpoint is unreliable. Treat JSON API endpoints as non-deterministic until confirmed with a stable public endpoint.
6	Tool tokens dominate cost at scale and are stable across runs	`test_1`, `test_3`, `test_4` r4 & r5	At 20 URLs, tool tokens = 111,326 on both r4 and r5 (0% variance). At 5 URLs: 27,506–27,508. At 1 URL: 3,099–3,128.	`tool_use_prompt_token_count` is reproducible to <1% across runs and accounts for ~98.6% of the total at 5 URLs and ~98.9% at 20 URLs. Use it for cost estimation.
7	`url_context_metadata` order is non-deterministic	`test_3`, `test_4` r4 & r5	Metadata order shuffled relative to input order on every run. Shuffle pattern itself varies between runs.	Match results by `retrieved_url` string, not array index.
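
Findings 1 and 7 suggest two client-side precautions: keep each request at or below 20 URLs, and pair results with inputs by URL string instead of position. The sketch below illustrates both. The helper names and the `"MISSING"` sentinel are hypothetical, and it assumes `retrieved_url` echoes the requested URL exactly, which the findings above do not confirm for redirects or normalized URLs.

```python
MAX_URLS_PER_REQUEST = 20  # limit quoted in the 400 INVALID_ARGUMENT message


def chunk_urls(urls, size=MAX_URLS_PER_REQUEST):
    """Split a URL list into batches that stay within the per-request limit."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def match_by_url(requested, metadata):
    """Map each requested URL to its status, ignoring metadata order.

    `metadata` is a list of dicts with `retrieved_url` and
    `url_retrieval_status` keys, in whatever order the API returned them.
    """
    by_url = {m["retrieved_url"]: m["url_retrieval_status"] for m in metadata}
    # "MISSING" is a hypothetical sentinel for URLs absent from the metadata.
    return {url: by_url.get(url, "MISSING") for url in requested}
```

For example, 21 URLs passed through `chunk_urls` yield one batch of 20 and one batch of 1, so neither request trips the API-layer limit.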

Token Scaling - r4 and r5 Averages

Test	URLs	Tool tokens avg	% of total
`test_1_single_html`	1	3,114	~86%
`test_3_multi_url_5`	5	27,507	~98.6%
`test_4_multi_url_20`	20	111,326	~98.9%
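
For back-of-the-envelope cost estimation, the averages above can be turned into a per-URL figure. The sketch below is illustrative only: `tokens_per_url` and `estimate_tool_tokens` are hypothetical helpers, and linear scaling from the 20-URL batch is an assumption, since the real count depends on the content of the pages fetched.

```python
# URL count -> average tool tokens (r4 and r5), copied from the table above.
AVG_TOOL_TOKENS = {1: 3114, 5: 27507, 20: 111326}


def tokens_per_url(table=AVG_TOOL_TOKENS):
    """Average tool tokens per URL for each measured batch size."""
    return {n: total / n for n, total in table.items()}


def estimate_tool_tokens(n_urls, table=AVG_TOOL_TOKENS):
    """Rough linear estimate based on the largest measured batch."""
    largest = max(table)
    return round(n_urls * table[largest] / largest)


if __name__ == "__main__":
    for n, per_url in tokens_per_url().items():
        print(f"{n} URLs: {per_url:.1f} tool tokens per URL")
    print("Estimate for 10 URLs:", estimate_tool_tokens(10))
```

The per-URL figure differs between the 1-URL test and the multi-URL tests, so an estimate like this is best treated as an order-of-magnitude guide for pages similar to those tested.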