Key Findings for OpenAI Web Search, Raw

Raw Test Workflow

Call the Responses API with gpt-4o, web_search_preview tool enabled
Give the model a minimal prompt, just enough to trigger retrieval
The model may or may not invoke web_search_preview depending on the query
Extract raw outcomes directly from response.output items:
- web_search_call items: type, action.query - the internal search query issued
- message items: output_text
Extract sources list from response.sources - all URLs consulted, not just cited
Extract token accounting from response.usage
Run all analysis in Python: tool invocation flag, source counts, latency
The model never interprets or reflects on the retrieval results
Ensure results are saved to open-ai-web-search/results/raw/
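
The extraction steps above can be sketched roughly as follows. This is a minimal sketch, not the harness used for these runs: it works on a plain dictionary shaped like a serialized Responses API response, and the function name `summarize_response`, the sample payload, and the assumption that consulted URLs sit under `action.sources` are all illustrative.

```python
from typing import Any


def summarize_response(response: dict[str, Any], latency_ms: float) -> dict[str, Any]:
    """Collect raw retrieval facts from a response dict; no model interpretation."""
    queries: list[str] = []
    source_urls: list[str] = []
    texts: list[str] = []
    tool_invoked = False

    for item in response.get("output", []):
        if item.get("type") == "web_search_call":
            tool_invoked = True
            action = item.get("action") or {}
            if action.get("query"):
                queries.append(action["query"])
            # Assumption: consulted URLs are listed here when sources are requested.
            for source in action.get("sources") or []:
                source_urls.append(source["url"])
        elif item.get("type") == "message":
            for part in item.get("content", []):
                if part.get("type") == "output_text":
                    texts.append(part["text"])

    return {
        "tool_invoked": tool_invoked,
        "search_queries_issued": queries,
        "source_count": len(source_urls),
        "output_text": "".join(texts),
        "usage": response.get("usage", {}),
        "latency_ms": latency_ms,
    }


# Hypothetical payload for illustration only; not data from the test runs.
sample = {
    "output": [
        {
            "type": "web_search_call",
            "action": {
                "query": "example query",
                "sources": [{"url": "https://example.com/a"}],
            },
        },
        {
            "type": "message",
            "content": [{"type": "output_text", "text": "Example answer."}],
        },
    ],
    "usage": {"input_tokens": 10, "output_tokens": 5},
}

summary = summarize_response(sample, latency_ms=1234.0)
```

Keeping the analysis in a pure function over dictionaries means the same code can run against saved raw results without calling the API again.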

Platform Limit Summary

Limit	Observation
Tool Invocation	Conditional: skipped for static facts and trivial math, consistently across runs
Tool Invocation Visibility	Available: explicit `web_search_call` item in `response.output`
`search_context_size` Latency Impact	Inconsistent: `high` was slower than `low` in runs 1 and 3 but faster in run 2
`search_context_size` Source Count Impact	None observed: source count was 12 at both context sizes in every run
Sources List All URLs Consulted	Available via `include=["web_search_call.action.sources"]`
Domain Filtering Allow List	Worked once, on `web_search_preview` in run 2; failed in every other run, including all runs on `web_search`
Domain Filtering Block List	Never succeeded: `filters` parameter rejected in all configurations and models tested
`search_queries_issued` Date Accuracy	Unreliable: the model appends a training-era year to internal queries despite the tests running in 2026
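
The date-accuracy observation could be checked mechanically along these lines. This is a sketch under stated assumptions: the helper name `stale_years`, the four-digit-year pattern, and the example queries are hypothetical and were not taken from the recorded runs.

```python
import re

# Matches four-digit years in the 2000s, e.g. "2024".
YEAR_PATTERN = re.compile(r"\b(20\d{2})\b")


def stale_years(query: str, current_year: int) -> list[int]:
    """Return every year mentioned in the query that is earlier than current_year."""
    return [int(y) for y in YEAR_PATTERN.findall(query) if int(y) < current_year]


# Hypothetical internal queries for illustration only.
flagged = stale_years("latest AI news 2024", current_year=2026)
clean = stale_years("weather in Paris today", current_year=2026)
```

A query is flagged only when it names a year before the run year, so queries without any year, or with the correct year, pass through unflagged.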

Results Details

Run 5 covered test_8 and test_9 only, as a targeted domain filter retry on web_search_preview.
Run 5 used gpt-5; all other runs used gpt-4o. In the table below, null marks a test that was not executed in that run.

Cross-run Tool Invocation

Test	Label	R1	R2	R3	R4	R5	R6
`test_1_live_data`	Live Data	✓	✓	✓	✓	null	✓
`test_2_recent_event`	Recent Event	✓	✓	✓	✓	null	✓
`test_3_static_fact`	Static Fact	✗	✗	✗	✗	null	✗
`test_4_trivial_math`	Trivial Math	✗	✗	✗	✗	null	✗
`test_5_open_research`	Open-ended Research	✓	✓	✓	✓	null	✓
`test_6_context_size_low`	`context_size` Low	✓	✓	✓	✓	null	✓
`test_7_context_size_high`	`context_size` High	✓	✓	✓	✓	null	✓
`test_8_domain_filter_allowed`	Allow List Filter	`ERR`†	✓*	`ERR`‡	`ERR`‡	`ERR`¶	`ERR`‡
`test_9_domain_filter_blocked`	Block List Filter	`ERR`†	`ERR`§	`ERR`‡	`ERR`‡	`ERR`¶	`ERR`‡
`test_10_ambiguous_query`	Ambiguous Query	✓	✓	✓	✓	null	✓

Domain Filter Error Progression

† Run 1: "Unknown parameter: 'tools[0].filters.type'". The initial schema included a type: "domain" key.
* Run 2, test_8: filter_respected: true with 2 "apnews.com" sources, using web_search_preview with allowed_domains. This was the only success across all runs.
§ Run 2, test_9: "Unknown parameter: 'tools[0].filters.excluded_domains'". This was the first block-list key attempted.
‡ Runs 3, 4, and 6: "Unsupported parameter 'filters'" after switching to web_search per docs guidance.
¶ Run 5 with gpt-5: "Unsupported parameter 'filters'". Changing the model produced an identical error.
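
One plausible way to compute a `filter_respected` flag like the one reported for run 2 is to compare each source hostname against the allow list. This is a sketch only: the function body, the subdomain-matching rule, and the example URLs are assumptions, not the logic or data from the actual test harness.

```python
from urllib.parse import urlparse


def filter_respected(source_urls: list[str], allowed_domains: list[str]) -> bool:
    """True only if there is at least one source and every source is on the allow list."""
    allowed = {domain.lower() for domain in allowed_domains}
    for url in source_urls:
        host = (urlparse(url).hostname or "").lower()
        # Accept an exact match or a subdomain such as www.apnews.com.
        if not any(host == d or host.endswith("." + d) for d in allowed):
            return False
    # An empty source list verifies nothing, so it does not count as respected.
    return bool(source_urls)


# Hypothetical URLs for illustration only.
ok = filter_respected(
    ["https://apnews.com/article/a", "https://www.apnews.com/hub/b"],
    ["apnews.com"],
)
leaked = filter_respected(
    ["https://apnews.com/article/a", "https://example.com/b"],
    ["apnews.com"],
)
```

Matching on the parsed hostname rather than a substring avoids false positives from lookalike domains or from the allowed domain appearing in a URL path.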

`search_context_size` Latency Detail

Metric	R1	R2	R3
`low` latency (ms)	10,867	10,251	9,531
`high` latency (ms)	15,984	8,614	11,233
`low` source count	12	12	12
`high` source count	12	12	12
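
The per-run latency difference can be derived directly from the table above. The sketch below uses the recorded values; the variable and function names are illustrative.

```python
# Latencies in milliseconds, copied from the table above.
latency_ms = {
    "R1": {"low": 10867, "high": 15984},
    "R2": {"low": 10251, "high": 8614},
    "R3": {"low": 9531, "high": 11233},
}


def high_minus_low(latencies: dict[str, dict[str, int]]) -> dict[str, int]:
    """Positive values mean `high` was slower than `low` in that run."""
    return {run: values["high"] - values["low"] for run, values in latencies.items()}


deltas = high_minus_low(latency_ms)
# deltas == {"R1": 5117, "R2": -1637, "R3": 1702}
```

The sign flips in run 2, which is why the latency impact is reported as inconsistent. With one sample per setting per run, these differences may reflect ordinary request variance rather than the setting itself.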
