#13
Stepfun · Release: 2026-02-01 · stepfun/step-3.5-flash::medium
Flaky tests
2
Flaky tests had mixed outcomes across runs (at least one pass and one fail).
Did not follow instructions: 3 · Wrong answer: 3
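The flaky-test definition above (at least one pass and at least one fail across runs) can be illustrated with a minimal sketch. The function name `is_flaky`, the test names, and the run outcomes below are hypothetical and are not taken from the benchmark's actual data or tooling.

```python
from typing import Sequence


def is_flaky(outcomes: Sequence[bool]) -> bool:
    """Return True if the runs include at least one pass and at least one fail."""
    return any(outcomes) and not all(outcomes)


# Hypothetical per-test run outcomes (True = pass, False = fail).
runs = {
    "test_a": [True, True, True],     # always passes: not flaky
    "test_b": [True, False, True],    # mixed outcomes: flaky
    "test_c": [False, False, False],  # always fails: not flaky
}

flaky = [name for name, outcomes in runs.items() if is_flaky(outcomes)]
print(flaky)  # ['test_b']
```

Under this reading, a test that fails on every run counts as a consistent failure rather than a flaky one.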
Category Breakdown
| Category | Avg Score | Consistency | Tests Correct |
|---|---|---|---|
| Anti-AI Tricks | 10.0 | 10.0 | |
| Combined | 10.0 | 10.0 | |
| Data parsing and extraction | 10.0 | 10.0 | |
| Domain specific | 4.0 | 7.2 | |
| General Intelligence | 6.0 | 10.0 | |
| Instructions following | 9.0 | 6.8 | |
| Puzzle Solving | 4.0 | 10.0 | |
| Tool Calling | 10.0 | 10.0 | |