AI BENCHY

#108

Step 3.5 Flash

Stepfun
Release: 2026-02-01
Tested on: 2026-04-11 01:44
stepfun/step-3.5-flash::none (medium) (none)

Archived model: this model is no longer updated or tested on new tests.

Score

3.0

Consistency

10.0

Reliability

N/A

Total Cost

$0.000

Total Output Tokens

0

Input Price

$0.100 / 1M

Output Price

$0.300 / 1M

Tests Correct

Wrong Tests: 1

Attempt pass rate: 0.0%

Flaky tests

0

Flaky tests had mixed outcomes across runs (at least one pass and one fail).
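
The flaky-test definition above can be illustrated with a minimal sketch. The function names and the sample outcomes are hypothetical and are not taken from the benchmark's own code; the sketch only assumes that each test has a list of pass/fail outcomes, one per run.

```python
from typing import Dict, List


def is_flaky(outcomes: List[bool]) -> bool:
    """Return True if the outcomes contain at least one pass and one fail."""
    return any(outcomes) and not all(outcomes)


def count_flaky(results: Dict[str, List[bool]]) -> int:
    """Count the tests whose runs had mixed outcomes."""
    return sum(1 for outcomes in results.values() if is_flaky(outcomes))


# Hypothetical per-test outcomes across three runs (True = pass, False = fail).
example = {
    "test_a": [True, True, True],     # always passes: not flaky
    "test_b": [False, False, False],  # always fails: not flaky
    "test_c": [True, False, True],    # mixed outcomes: flaky
}

print(count_flaky(example))  # prints 1
```

Under this definition a test that fails on every run, like the single wrong test in this run, counts as wrong but not as flaky.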

Response Time (avg)

0ms

Response Time (max): 0ms

Response Time (total): 0ms

Run history

Tested on | Score | Reliability | Tests Correct | Total Cost
2026-05-08 15:30 (New test added) | 7.8 | 10.0 | 6/9 | $0.020
2026-04-11 01:44 (First recorded run, current run) | 3.0 | N/A | 0/1 | $0.000

Run comparison

Run | Score | Consistency | Reliability | Tests Correct | Flaky tests | Total Output Tokens | Total Cost | Response Time (avg)
2026-04-11 01:44 · First recorded run | 3.0 | 10.0 | N/A | 0/1 | 0 | 0 | $0.000 | 0ms
2026-05-08 15:30 · New test added | 7.8 | 10.0 | 10.0 | 6/9 | 0 | 64,795 | $0.020 | 39.03s
Difference | -4.8 | 0.0 | | -9 | 0 | -64795 | -$0.020 | -39032ms

These two runs used different benchmark suites, so the deltas reflect both model changes and suite changes.
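
As a rough check on the deltas, the sketch below recomputes the unambiguous ones. It assumes the Difference row is the current run minus the compared run, which matches the Score, token, cost, and response-time figures reported on this page. The `RunSummary` class and `delta` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    score: float
    output_tokens: int
    total_cost_usd: float
    avg_response_ms: int


def delta(current: RunSummary, other: RunSummary) -> dict:
    """Difference of each metric, assumed to be current minus other."""
    return {
        "score": round(current.score - other.score, 1),
        "output_tokens": current.output_tokens - other.output_tokens,
        "total_cost_usd": round(current.total_cost_usd - other.total_cost_usd, 3),
        "avg_response_ms": current.avg_response_ms - other.avg_response_ms,
    }


# Values as reported on this page for the two runs.
first_run = RunSummary(score=3.0, output_tokens=0,
                       total_cost_usd=0.000, avg_response_ms=0)
later_run = RunSummary(score=7.8, output_tokens=64795,
                       total_cost_usd=0.020, avg_response_ms=39032)

print(delta(first_run, later_run))
```

Because the suites differ (1 test versus 9), these numbers describe the two runs side by side rather than a like-for-like change in the model.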

Charts

[Charts: Total Output Tokens; Score vs Total Output Tokens]

Category Breakdown

Category | Score | Consistency | Tests Correct
Coding | 3.0 | 10.0 | 0/1