Cascade Framework Reference
This framework generates standardized test prompts and logs CSV results, enabling consistent testing across cases, measurement tracking over time, truncation pattern identification, and web search behavior comparisons across three tracks: interpreted, explicit, and raw.
Requirements: Python 3.8+, Windsurf IDE
Installation
# Clone and/or navigate to `agent-ecosystem-testing` directory
cd agent-ecosystem-testing
# Create virtual environment
python3 -m venv venv
# Activate virtual environment
# Windows: venv\Scripts\activate
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Navigate to the Cascade testing directory
cd windsurf-cascade-web-search
If the virtual environment stops working for any reason, such as an incompatible Python version or accidental corruption, run `rm -rf venv` to remove the `venv` directory and start over.
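Since an incompatible Python version is one reason a virtual environment can fail, it may help to check the interpreter against the stated Python 3.8+ requirement before recreating `venv`. The following is a minimal sketch; `meets_minimum` is a hypothetical helper written for this example and is not part of the framework.

```python
import sys

MINIMUM = (3, 8)  # taken from the stated requirement: Python 3.8+


def meets_minimum(version=None, minimum=MINIMUM):
    """Return True if `version` (default: the running interpreter) is at least `minimum`."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor); the patch level does not matter here.
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "too old"
    print(f"Python {found}: {status}")
```

Running this with the same `python3` used to create the virtual environment shows whether that interpreter is a likely cause of the problem.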
Workflow
- List Available Tests

  python web_search_testing_framework.py --list-tests

- Generate Test Prompt for a Single Test

  Print a formatted test harness with a structured prompt to copy into the Cascade chat window, fields requiring values, and expected size reference:

  # Cascade-interpreted track - ask model to report measurements; no @web
  python web_search_testing_framework.py --test BL-1 --track interpreted
  # Explicit track - identical to interpreted track, prefixed with @web
  python web_search_testing_framework.py --test BL-1 --track explicit
  # Raw track - request verbatim output, no @web
  python web_search_testing_framework.py --test BL-1 --track raw

- Copy Prompt → Run in Cascade
  - Review the terminal output → copy the prompt
  - Open Cascade chat window → paste the prompt
  - Inspect Cascade’s web search behavior → examine agent output
- Assess Hypotheses

  Before logging test results, assess the run against each hypothesis based on the model’s self-reported metrics and tool visibility output:

  | ID | Description | Question |
  |---|---|---|
  | H1 | Character-based truncation at fixed limit | Is there a ceiling at ~10–100 KB? |
  | H2 | Token-based truncation | Is there a ceiling at ~2,000 tokens? |
  | H3 | Structure-aware truncation | Does truncation fall on Markdown boundaries rather than arbitrary byte positions? |
  | H4 | `@web` directive changes retrieval ceiling, tool chain, or chunking behavior | Is Cascade’s `@web` redundant, like Cursor’s `@Web`? |
  | H5 | `view_content_chunk` auto-paginates via `DocumentId` without explicit prompting | Does the agent fetch with auto-chunking, or only when reasoned into it? |

- Log Results
  Run the interactive logger and follow the prompts. Fields are grouped by track: session fields first, then track-specific output fields, then hypothesis and notes. Quotation marks are not necessary; optional fields can be skipped with `Enter`.

  # Call the logger
  python web_search_log_results.py
  # Logger prompts and validates fields before confirmation
  ✓ Result logged to results/{track name}/results.csv

  If a log command fails before completing, the CSV may be written without headers. Run `python web_search_add_csv_headers.py --track {track name}` to recover.

  Framework fields logged per track:

  | Column | Description | Example |
  |---|---|---|
  | `test_id` | Test identifier | `BL-1`, `SC-2`, `EC-1` |
  | `timestamp` | ISO 8601 format | `2026-03-16T17:05:02.998376` |
  | `date` | Date tested | `2026-03-16` |
  | `url` | Full URL tested | `https://www.mongodb.com/docs...` |
  | `track` | Test track | `interpreted`, `raw`, `explicit` |
  | `method`\* | Retrieval method | `cascade-implicit`, `cascade-explicit` |
  | `model_selector` | Model selector setting | `Hybrid Arena` |
  | `model_observed` | Model invoked | `SWE-1.5`, `Claude Sonnet 4.5` |
  | `approval_required` | Fetch approval prompted? | `yes` / `no` / `unknown` |
  | `pagination_observed` | `view_content_chunk` invoked? | `yes-auto` / `yes-prompted` / `no` / `unknown` |
  | `input_est_chars` | Expected input size in characters | `87040` |
  | `hypothesis_match` | Hypothesis success/failure | `H1-no`, `H2-yes`, `H3-partial` |
  | `windsurf_version` | Windsurf-Cascade version | `1.9600.40-pro` |
  | `notes` | Observations | `Auto-paginated...` |
  | `output_chars` | Interpreted/explicit: Cascade-measured output length | `27890` |
  | `truncated` | Interpreted/explicit: truncation detected | `yes` / `no` |
  | `truncation_char_num` | Interpreted/explicit: character position if truncated | `5857` |
  | `tokens_est` | Interpreted/explicit: estimated token count | `16890` |
  | `tools_used`\*\* | Raw: observed tool chain | `read_url_content, view_content_chunk` |
  | `tools_blocked`\*\* | Raw: tools requested but blocked/skipped | `search_web` |
  | `execution_attempts`\*\* | Raw: total tool calls including fallbacks | `3` |
  | `cascade_reported_output_chars`\*\* | Raw: Cascade-measured output character count | `9876` |
  | `cascade_reported_truncated`\*\* | Raw: Cascade-measured truncation status | `yes` / `no` |
  | `cascade_reported_truncation_point`\*\* | Raw: Cascade-measured truncation position | `9876` |
  | `cascade_reported_tokens_est`\*\* | Raw: Cascade-estimated token count | `2469` |
  | `cascade_reported_file_size_bytes`\*\* | Raw: Cascade-measured file size in bytes | `4817` |
  | `cascade_reported_md5_checksum`\*\* | Raw: Cascade-measured MD5 checksum | `abc123...` |
  | `cascade_reported_lines`\*\* | Raw: Cascade-measured line count | `143` |
  | `cascade_reported_words`\*\* | Raw: Cascade-measured word count | `564` |
  | `cascade_reported_code_blocks`\*\* | Raw: Cascade-measured code block count | `2` |
  | `cascade_reported_table_rows`\*\* | Raw: Cascade-measured table row count | `57` |
  | `cascade_reported_headers`\*\* | Raw: Cascade-measured header count | `4` |
  | `verified_file_size_bytes`\*\* | Raw: Verifier-measured file size in bytes | `4817` |
  | `verified_md5_checksum`\*\* | Raw: Verifier-measured MD5 checksum | `d6ad8451d3778bf3544574...` |
  | `verified_total_lines`\*\* | Raw: Verifier-measured line count | `143` |
  | `verified_total_words`\*\* | Raw: Verifier-measured word count | `564` |
  | `verified_tokens`\*\* | Raw: Verifier-measured token count | `197` |
  | `verified_chars_per_token`\*\* | Raw: Verifier-measured chars/token ratio | `4.43` |
  | `verified_code_blocks`\*\* | Raw: Verifier-measured code block count | `2` |
  | `verified_table_rows`\*\* | Raw: Verifier-measured table row count | `57` |
  | `verified_headers`\*\* | Raw: Verifier-measured header count | `4` |

  \* `cascade-implicit` is used for interpreted and raw tracks, no `@web`; `cascade-explicit` is used for the explicit track, `@web` prefixed.

  \*\* Optional field, raw track only. `cascade_reported` fields may reflect tool output or payload estimates; `web_search_verify_raw_results.py` calculates values against `raw_output_{test_id}.txt` files.
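
The verifier-measured columns are described as being calculated from saved raw output files. As a rough illustration of what such measurements might involve, the sketch below computes a few of them from a text payload using only the standard library. It is not the logic of `web_search_verify_raw_results.py`: the function name `measure_text`, the returned key names, and the counting rules (code blocks as pairs of fence lines, table rows as any line starting with `|` including separator rows, headers as lines starting with `#` outside code blocks) are all assumptions made for this example.

```python
import hashlib

FENCE = "`" * 3  # Markdown code fence marker


def measure_text(text):
    """Compute illustrative size and structure metrics for a saved raw output."""
    data = text.encode("utf-8")
    lines = text.splitlines()

    fence_lines = 0
    table_rows = 0
    headers = 0
    in_fence = False
    for line in lines:
        stripped = line.lstrip()
        if stripped.startswith(FENCE):
            fence_lines += 1
            in_fence = not in_fence
            continue
        if in_fence:
            continue  # ignore '#' and '|' that appear inside code blocks
        if stripped.startswith("|"):
            table_rows += 1
        elif stripped.startswith("#"):
            headers += 1

    return {
        "file_size_bytes": len(data),
        "md5_checksum": hashlib.md5(data).hexdigest(),
        "total_lines": len(lines),
        "total_words": len(text.split()),
        "code_blocks": fence_lines // 2,  # one block per opening/closing pair
        "table_rows": table_rows,
        "headers": headers,
    }
```

Comparing values like these with the corresponding Cascade-reported entries for the same run would show where self-reported figures diverge from the saved file. Different counting rules will produce different numbers, so any comparison should use one rule set consistently.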
Baseline Testing Path
- Run interpreted track to identify baseline behavioral patterns
- Run explicit track to isolate the `@web` directive effect against the interpreted baseline
- Run raw track for ground truth measurements and to verify previous tracks
- Run each test a minimum of 5 times per track to capture variance:
| Test IDs | Purpose | Key Question |
|---|---|---|
| `BL-1`, `BL-2` | Baseline truncation threshold on small pages | What is the interpreted vs explicit delta? |
| `SC-2` | Code blocks, HTML-to-Markdown conversion | How does `read_url_content` handle code structure? |
| `OP-1` | Fragment identifier navigation | Does Cascade jump to specific section via URL fragment? |
| `OP-4` | Auto-pagination hypothesis | Does `view_content_chunk` invoke automatically via `DocumentId`? |
| `BL-3` | Hard ceiling | What is the absolute output limit across `read_url_content` runs? |
| `SC-1`, `SC-3`, `SC-4` | Structured content | Does truncation respect Markdown boundaries? |
| `EC-1`, `EC-3`, `EC-6` | Edge cases | What are the failure modes and approval-gating edge behaviors? |
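
For the structured-content question in the table above, one rough way to judge whether a truncation point respects Markdown boundaries is to look at what sits at the cut. The sketch below is a heuristic under stated assumptions, not part of the framework: `classify_cut` is a hypothetical helper, and treating a "boundary" as "the retained text ends at a line break and outside an open code fence" is only one possible definition.

```python
FENCE = "`" * 3  # Markdown code fence marker


def classify_cut(full_text, cut):
    """Describe where a truncation at character offset `cut` falls in `full_text`.

    Returns one of: "not-truncated", "inside-code-block", "line-boundary", "mid-line".
    """
    if cut >= len(full_text):
        return "not-truncated"

    kept = full_text[:cut]

    # An odd number of fence lines in the kept text means a code block is still open.
    fence_lines = sum(
        1 for line in kept.splitlines() if line.lstrip().startswith(FENCE)
    )
    if fence_lines % 2 == 1:
        return "inside-code-block"

    # The cut is on a line boundary if the kept text is empty or ends with a newline.
    if kept == "" or kept.endswith("\n"):
        return "line-boundary"
    return "mid-line"
```

Here `full_text` would be the complete page content and `cut` the recorded truncation position. A run of `mid-line` or `inside-code-block` results across tests would count against structure-aware truncation, while consistent `line-boundary` results would be compatible with it without proving it.
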
Raw track only: rename raw output files to capture variance; if results are consistent, remove files to prevent test contamination between runs
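One way to keep raw outputs from successive runs apart is to give each file a run suffix so that a later run cannot replace it. The sketch below assumes the `raw_output_{test_id}.txt` naming used by the raw track; the `_runN` suffix scheme and the `archive_raw_output` helper are hypothetical choices for illustration, not framework behavior.

```python
from pathlib import Path


def archive_raw_output(directory, test_id):
    """Rename raw_output_{test_id}.txt to the next free raw_output_{test_id}_runN.txt.

    Returns the new path, or None if there is no raw output file to rename.
    """
    directory = Path(directory)
    source = directory / f"raw_output_{test_id}.txt"
    if not source.exists():
        return None

    # Find the first run number that is not already taken.
    run = 1
    while (directory / f"raw_output_{test_id}_run{run}.txt").exists():
        run += 1

    target = directory / f"raw_output_{test_id}_run{run}.txt"
    source.rename(target)
    return target
```

Calling `archive_raw_output(".", "BL-1")` after each raw run would leave a numbered series of files to compare. Once the runs prove consistent, the numbered files can be deleted as described above.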
Analyzing Results
Examine truncation analysis, method and track comparison, hypothesis matching:
# Single track — full analysis or summary
python web_search_results_analyzer.py --csv results/cascade-interpreted/results.csv --full
python web_search_results_analyzer.py --csv results/cascade-interpreted/results.csv --summary
python web_search_results_analyzer.py --csv results/raw/results.csv --full
python web_search_results_analyzer.py --csv results/explicit/results.csv --summary
# Filter by method
python web_search_results_analyzer.py \
--csv results/cascade-interpreted/results.csv --method "cascade-explicit"
# Compare implicit and explicit tracks to observe @web effect
python web_search_results_analyzer.py \
--csv results/cascade-interpreted/results.csv results/explicit/results.csv --full
# Compare all three tracks
python web_search_results_analyzer.py \
--csv results/cascade-interpreted/results.csv \
results/raw/results.csv \
results/explicit/results.csv --full
Provide the full relative path, including the subdirectory: `results/cascade-interpreted/results.csv`, `results/raw/results.csv`, or `results/explicit/results.csv`.
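Because each test is run several times per track, it can be useful to look at the spread of a measurement across runs before comparing tracks. The following sketch is independent of `web_search_results_analyzer.py` and makes no claim about that script's output. It assumes only that the CSV has `test_id` and `output_chars` columns; `summarize_output_chars` is a hypothetical helper, and the sample rows are made up for demonstration.

```python
import csv
import io
import statistics
from collections import defaultdict


def summarize_output_chars(csv_file):
    """Group rows by test_id and report count, mean, and spread of output_chars."""
    values = defaultdict(list)
    for row in csv.DictReader(csv_file):
        raw = (row.get("output_chars") or "").strip()
        if raw.isdigit():  # skip blank or non-numeric entries
            values[row["test_id"]].append(int(raw))

    summary = {}
    for test_id, chars in values.items():
        summary[test_id] = {
            "runs": len(chars),
            "mean": statistics.mean(chars),
            # stdev needs at least two data points
            "stdev": statistics.stdev(chars) if len(chars) > 1 else 0.0,
            "min": min(chars),
            "max": max(chars),
        }
    return summary


# Made-up rows for illustration; real use would pass an open results.csv file.
sample = io.StringIO(
    "test_id,output_chars\n"
    "BL-1,100\n"
    "BL-1,300\n"
    "SC-2,\n"
)
print(summarize_output_chars(sample))
```

With a real file, the call would look like `with open("results/explicit/results.csv", newline="") as f: summarize_output_chars(f)`. A large standard deviation for a single test ID suggests that more runs are needed before drawing conclusions about a ceiling.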