#55
Stepfun
Release: 2026-02-01
Tested on: 2026-05-22 00:30
stepfun/step-3.5-flash::medium
(medium)
(none)
Score: 7.4
Consistency: 9.4
10.0
Total Output Tokens: 264,022
Total Input Tokens: 33,555
Input Price: $0.090 / 1M tokens
Output Price: $0.300 / 1M tokens
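As a rough illustration of how the token counts and per-million prices above combine, the sketch below multiplies each token total by its listed price. It assumes the listed prices apply uniformly to every token counted; the function name `estimate_cost_usd` is hypothetical, and the page's own Total Cost figure may be computed differently (for example per run, or at different prices), so the result need not match it.

```python
TOKENS_PER_PRICE_UNIT = 1_000_000  # prices are quoted per 1M tokens


def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate spend in USD from token counts and per-1M-token prices."""
    input_cost = input_tokens / TOKENS_PER_PRICE_UNIT * input_price_per_million
    output_cost = output_tokens / TOKENS_PER_PRICE_UNIT * output_price_per_million
    return input_cost + output_cost


# Figures taken from the fields above; the uniform-price assumption is ours.
estimated = estimate_cost_usd(33_555, 264_022, 0.090, 0.300)
print(f"Estimated cost: ${estimated:.4f}")
```

Under these assumptions the output tokens account for nearly all of the estimate, since there are about eight times as many of them and each costs more than three times as much.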
Flaky tests: 1
Flaky tests had mixed outcomes across runs (at least one pass and one fail).
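The definition above can be sketched in a few lines: a test counts as flaky when its repeated runs contain at least one pass and at least one fail. The names `is_flaky` and `count_flaky` and the sample outcomes are hypothetical and only illustrate the rule; they are not the benchmark's actual code or data.

```python
from typing import Mapping, Sequence


def is_flaky(outcomes: Sequence[bool]) -> bool:
    """Return True when the runs include at least one pass and one fail."""
    return any(outcomes) and not all(outcomes)


def count_flaky(results: Mapping[str, Sequence[bool]]) -> int:
    """Count the tests whose outcomes were mixed across runs."""
    return sum(is_flaky(outcomes) for outcomes in results.values())


# Hypothetical per-test outcomes across three runs (True = pass).
example_results = {
    "test_a": [True, True, True],     # always passes: not flaky
    "test_b": [True, False, True],    # mixed outcomes: flaky
    "test_c": [False, False, False],  # always fails: not flaky
}
flaky_count = count_flaky(example_results)
print(flaky_count)
```

Note that a test that fails on every run is consistently wrong rather than flaky, so it lowers the score without being counted here.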
Run history
| Tested on | Score | Reliability | Tests Correct | Total Cost | Notes |
|---|---|---|---|---|---|
| 2026-05-22 00:30 | 7.4 | 9.3 | | $0.015 | Current run; suite changed |
| 2026-05-08 15:30 | 7.6 | 10.0 | | $0.011 | Suite changed |
| 2026-04-11 00:35 | 7.9 | N/A | | $0.000 | First recorded run |
This run used a different benchmark suite. Keep suite changes in mind when reading historical movement.
Price History
Historical pricing data for this model from OpenRouter.
| Date | Input Price | Output Price |
|---|---|---|
| 2026-06-03 21:35 | $0.090 / 1M | $0.300 / 1M |
Charts
[Charts: Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens]
Category Breakdown
| Category | Score | Consistency | Tests Correct |
|---|---|---|---|
| Anti-AI Tricks | 10.0 | 10.0 | |
| Coding | 3.5 | 7.8 | |
| Combined | 10.0 | 10.0 | |
| Data parsing and extraction | 10.0 | 10.0 | |
| Domain specific | 5.3 | 7.2 | |
| General Intelligence | 5.5 | 10.0 | |
| Instructions following | 8.3 | 10.0 | |
| Puzzle Solving | 5.3 | 10.0 | |
| Tool Calling | 10.0 | 10.0 | |
| Trivia | 3.0 | 10.0 | |