Cursor Framework Reference
This framework generates standardized test prompts and logs results to CSV, enabling consistent testing across cases, measurement tracking over time, truncation-pattern identification, and fetch-method comparison.
Requirements: Python 3.8+, Cursor IDE
Installation
```shell
# Clone and/or navigate to the `agent-ecosystem-testing` directory
cd agent-ecosystem-testing

# Create a virtual environment
python3 -m venv venv

# Activate the virtual environment
source venv/bin/activate      # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Navigate to the Cursor testing directory
cd cursor-web-fetch
```
If the environment breaks for any reason, such as an incompatible Python version or accidental corruption, remove it with `rm -rf venv` and start over.
Workflow
1. List Available Tests

   ```shell
   python web_fetch_testing_framework.py --list-tests
   ```

2. Generate a Test Prompt for a Single Test

   Print a formatted test harness with a structured prompt to copy into the Cursor chat window, the fields requiring values, and an expected-size reference:

   ```shell
   # Cursor-interpreted track - ask the model to report measurements
   python web_fetch_testing_framework.py --test BL-1 --track interpreted

   # Raw track - request verbatim output
   python web_fetch_testing_framework.py --test BL-2 --track raw
   ```
3. Copy Prompt → Run in Cursor

   - Review the Terminal output → copy the prompt
   - Open the Cursor chat window → paste the prompt
   - Review Cursor's fetch behavior → examine the response
4. Log Results

   Depending on the track, results are stored in `cursor-web-fetch/results/{track}/results.csv` with the following fields:

   | Column | Description | Example |
   |---|---|---|
   | `test_id` | Test identifier | `BL-1`, `SC-2`, `EC-1` |
   | `timestamp` | ISO 8601 timestamp | `2026-03-16T17:05:02.998376` |
   | `date` | Date tested | `2026-03-16` |
   | `url` | Full URL tested | `https://www.mongodb.com/docs...` |
   | `method` | Fetch method | `@Web`* |
   | `model` | Model used | `Auto` (Cursor's agent router) |
   | `input_est_chars` | Expected input size | 87040 |
   | `output_chars` | Actual output length in chars, via `wc -m` | 27890 |
   | `truncated` | Truncation detected | yes/no |
   | `truncation_char_num` | Character position if truncated | 5857 |
   | `tokens` | Token count via tiktoken | 16890 |
   | `hypothesis_match` | Hypothesis matched | H1-no, H2-yes, H3-yes |
   | `notes` | Observations and findings | Pro-plan retry: successfully… |
   | `track` | Test track | interpreted/raw |
   | `cursor_version` | Cursor IDE version | 2.6.19, 2.6.19-pro |
   | `file_size_bytes`** | Exact file size via `ls -l` | 28158 |
   | `md5_checksum`** | MD5 of the saved output file | d542d945f2b5dc15c5254d… |
   | `total_lines`** | Line count | 979 |
   | `total_words`** | Word count | 4871 |
   | `code_blocks`** | Fenced code block count | 24 |
   | `table_rows`** | Table row count | 87 |
   | `headers`** | Header count | 63 |

   \* `@Web` is a Cursor UI composer feature, but the underlying mechanism is `WebFetch` or `mcp_web_fetch`; more information in the Friction Note.

   \*\* Optional fields, measured for raw-track results only.
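Because the results file is plain CSV, it can also be inspected outside the framework. A minimal sketch of reading it with the standard library, using a synthetic in-memory sample in place of the real file (the field names come from the table above; this parsing code is not part of the framework):

```python
import csv
import io

# Synthetic stand-in for cursor-web-fetch/results/{track}/results.csv
sample = """test_id,timestamp,url,method,truncated,output_chars,tokens
BL-1,2026-03-16T17:05:02,https://example.com/a,@Web,no,27890,16890
BL-2,2026-03-16T17:09:41,https://example.com/b,@Web,yes,9876,2469
"""

rows = list(csv.DictReader(io.StringIO(sample)))

# Pull out only the runs where truncation was detected
truncated_runs = [r for r in rows if r["truncated"] == "yes"]
for r in truncated_runs:
    print(r["test_id"], r["output_chars"])
```

In practice, replace the `StringIO` sample with `open("results/raw/results.csv")`.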
Key Hypotheses:

- H1: Character-based truncation at a fixed limit, ~10-100KB?
- H2: Token-based truncation, ~2,000 tokens?
- H3: Structure-aware truncation, respects Markdown boundaries
- H4: MCP servers override native `@Web` limits*
- H5: Agent auto-chunks after truncation, requesting the next chunk automatically

\* `@Web` may route to `mcp_web_fetch` internally; the mechanism is the agent's choice and not user-controllable, so H4 is not testable through `@Web` alone; see the Friction Note.
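To make H1 and H2 concrete, here is an illustrative sketch of how a single measurement might be checked against them. The thresholds and tolerance windows are assumptions for the example only; the real limits are exactly what the tests aim to discover, and this function is not part of the framework:

```python
# Hypothetical thresholds -- assumed for illustration only
CHAR_LIMIT = 10_000   # H1: fixed character limit
TOKEN_LIMIT = 2_000   # H2: fixed token limit

def consistent_hypotheses(output_chars, tokens, truncated):
    """Return the hypotheses a single measurement is consistent with."""
    matches = []
    # H1 predicts truncation right around the character limit
    if truncated and abs(output_chars - CHAR_LIMIT) < 500:
        matches.append("H1")
    # H2 predicts truncation right around the token limit
    if truncated and abs(tokens - TOKEN_LIMIT) < 100:
        matches.append("H2")
    return matches

print(consistent_hypotheses(9876, 2469, True))    # near the char limit only
print(consistent_hypotheses(27890, 16890, False)) # not truncated -> []
```

A single run can be consistent with several hypotheses at once, which is why the framework records `hypothesis_match` per hypothesis (e.g. `H1-no, H2-yes`).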
```shell
# Log an interpreted-track result
python web_fetch_testing_framework.py --log BL-1 \
  --track interpreted \
  --method @Web \
  --model "Auto" \
  --cursor-version "2.6.19" \
  --output-chars 48500 \
  --truncated no \
  --tokens 12000 \
  --hypothesis "H1-no" \
  --notes "Full content returned, no truncation observed..."

# Verify key metrics before logging raw-track runs
python web_fetch_verify_raw_results.py BL-1
```
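The raw-track fields correspond to standard shell measurements (`wc -m`, `ls -l`, `md5sum`, `wc -l`, `wc -w`). A hedged Python equivalent, shown only to clarify what each field measures; the verifier script's actual implementation may differ:

```python
import hashlib
import re

def raw_metrics(text):
    """Compute raw-track measurements for a saved fetch output."""
    data = text.encode("utf-8")
    return {
        "output_chars": len(text),                       # like `wc -m`
        "file_size_bytes": len(data),                    # like `ls -l`
        "md5_checksum": hashlib.md5(data).hexdigest(),   # like `md5sum`
        "total_lines": text.count("\n"),                 # like `wc -l`
        "total_words": len(text.split()),                # like `wc -w`
        "code_blocks": text.count("```") // 2,           # fence pairs
        "headers": len(re.findall(r"^#{1,6} ", text, re.M)),
    }

sample = "# Title\n\nSome words here.\n\n```python\nprint('hi')\n```\n"
m = raw_metrics(sample)
print(m["total_words"], m["code_blocks"], m["headers"])
```

Note that `output_chars` and `file_size_bytes` differ whenever the output contains multi-byte UTF-8 characters, which is why both are recorded.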
```shell
# Log a raw-track result
python web_fetch_testing_framework.py --log BL-1 \
  --track raw \
  --method @Web \
  --model "Auto" \
  --cursor-version "2.6.19" \
  --output-chars 9876 \
  --truncated yes \
  --truncation-point 9876 \
  --tokens 2469 \
  --hypothesis "H1-yes" \
  --file-size-bytes 4817 \
  --md5-checksum "d6ad8451d3778bf3544574431203a3a7" \
  --total-lines 143 \
  --total-words 564 \
  --code-blocks 2 \
  --table-rows 57 \
  --headers 4 \
  --notes "@Web returns converted..."
```
Be sure to provide all required flags: `--method`, `--model`, `--cursor-version`, `--output-chars`, `--truncated`, `--tokens`, `--hypothesis`.
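The required set behaves like `argparse` flags with `required=True`. An illustrative sketch of that behavior (this is not the framework's actual argument parser, and the flag types are assumptions):

```python
import argparse

# Illustrative only: mirrors the required flags listed above
parser = argparse.ArgumentParser()
parser.add_argument("--log", required=True)
parser.add_argument("--track", required=True, choices=["interpreted", "raw"])
parser.add_argument("--method", required=True)
parser.add_argument("--model", required=True)
parser.add_argument("--cursor-version", required=True)
parser.add_argument("--output-chars", required=True, type=int)
parser.add_argument("--truncated", required=True, choices=["yes", "no"])
parser.add_argument("--tokens", required=True, type=int)
parser.add_argument("--hypothesis", required=True)

args = parser.parse_args([
    "--log", "BL-1", "--track", "interpreted", "--method", "@Web",
    "--model", "Auto", "--cursor-version", "2.6.19",
    "--output-chars", "48500", "--truncated", "no",
    "--tokens", "12000", "--hypothesis", "H1-no",
])
print(args.output_chars, args.truncated)
```

Omitting any of these flags would make `parse_args` exit with an error naming the missing argument, which is the behavior to expect from the framework as well.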
Baseline Testing Path
Complete the interpreted track first to establish behavioral observations, then run the raw track for exact measurements. Run each test ID a minimum of 3 times to capture variance on both tracks:
- `BL-1`, `BL-2`: baseline quick wins, establish the basic truncation threshold
- `SC-2`: code blocks, tests HTML-to-Markdown conversion
- `OP-3`: `@Web` vs MCP, do MCP servers have different limits?*
- `OP-4`: auto-chunking, determines DX and a key ecosystem testing gap
- `BL-3`: hard ceiling, identifies the absolute limit
- `SC-1`, `SC-3`, `SC-4`: structured content, tests the structure-aware truncation hypothesis
- `EC-1`, `EC-3`, `EC-6`: edge cases, identify failure modes and unusual inputs
While the interpreted track captures Cursor’s self-report and perceived completeness, the raw track provides ground truth measurements for validation. Cross-referencing reveals where Cursor’s self-assessment diverges from reality. Comprehensive truncation pattern analysis requires both datasets.
\* OP-3 is not executable as designed; `@Web` may route to `mcp_web_fetch`, so the two "sides" of the comparison aren't separable through `@Web` alone; see the Friction Note.
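Cross-referencing the two tracks amounts to joining rows on `test_id` and comparing the self-reported size against the measured one. A minimal sketch with synthetic rows (not framework code; the real analyzer may do this differently):

```python
# Synthetic rows standing in for the two results.csv files
interpreted = [
    {"test_id": "BL-1", "output_chars": 48500},
    {"test_id": "BL-2", "output_chars": 10200},
]
raw = [
    {"test_id": "BL-1", "output_chars": 48500},
    {"test_id": "BL-2", "output_chars": 9876},
]

raw_by_id = {r["test_id"]: r for r in raw}
for row in interpreted:
    ground_truth = raw_by_id[row["test_id"]]
    delta = row["output_chars"] - ground_truth["output_chars"]
    if delta != 0:
        # Cursor's self-report diverged from the measured output
        print(row["test_id"], "divergence:", delta)
```

A nonzero delta flags the tests where Cursor's self-assessment diverges from reality.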
Analyzing Results
Examine truncation-threshold analysis, method comparison, interpreted-vs-raw track comparisons, and hypothesis matching:
```shell
# Generate the full analysis report
python web_fetch_results_analyzer.py --csv results.csv --full

# Generate a summary
python web_fetch_results_analyzer.py --csv results.csv --summary

# Analyze specific methods
python web_fetch_results_analyzer.py --csv results.csv --method "@Web"
```
Provide the full relative path to the CSV file when running the analyzer, including the subdirectory: `results/cursor-interpreted/results.csv` or `results/raw/results.csv`.
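Conceptually, the truncation-threshold analysis brackets the limit between the largest untruncated output and the smallest truncated one. A sketch of that idea with synthetic measurements (the analyzer's real implementation may differ):

```python
# Synthetic measurements: (output_chars, truncated)
runs = [(4800, False), (9900, True), (7200, False), (9650, True)]

untruncated = [chars for chars, truncated in runs if not truncated]
truncated = [chars for chars, truncated in runs if truncated]

# The true limit must lie between these two bounds
lower = max(untruncated)
upper = min(truncated)
print(f"truncation threshold between {lower} and {upper} chars")
```

Running each test at least three times, as recommended above, tightens these bounds and exposes variance in where truncation occurs.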