AI BENCHY
❤️ Made by XCS

Model Name

StepFun: Step 3.5 Flash

Last updated: Feb 19, 2026

Metric              StepFun: Step 3.5 Flash
Rank                #18
Company             Stepfun
Score               4.92
Consistency         7.34
Cost per result     0.0000
Total Cost          $0.00000
Tests Correct       5/12
Attempt pass rate   58.3%
Flaky tests         4
Output Tokens       46,871
Reasoning Tokens    95,440

Category Breakdown

Category                     Fully passed tests  Score  Consistency  Attempt pass rate  Flaky tests  Reasoning score  Cost
Anti-AI Tricks               1/2                 5.50   5.81         83.3%              1            10.00            $0.00000
Data parsing and extraction  1/2                 5.00   10.00        50.0%              0            9.75             $0.00000
Domain specific              1/3                 4.00   7.21         44.4%              1            8.44             $0.00000
Instructions following       2/2                 10.00  10.00        100.0%             0            9.67             $0.00000
Puzzle Solving               0/3                 2.00   4.96         33.3%              2            9.22             $0.00000
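
The category rows can be cross-checked against the headline figures (Tests Correct 5/12, Flaky tests 4, Attempt pass rate 58.3%). The sketch below is only an illustration: it assumes every test is attempted 3 times, a number the page does not state but which fits the reported percentages. The names `CATEGORIES`, `ATTEMPTS_PER_TEST`, and `aggregate` are hypothetical.

```python
# Rows copied from the Category Breakdown table:
# (category, fully passed tests, total tests, flaky tests, attempt pass rate in %)
CATEGORIES = [
    ("Anti-AI Tricks", 1, 2, 1, 83.3),
    ("Data parsing and extraction", 1, 2, 0, 50.0),
    ("Domain specific", 1, 3, 1, 44.4),
    ("Instructions following", 2, 2, 0, 100.0),
    ("Puzzle Solving", 0, 3, 2, 33.3),
]

# ASSUMPTION: each test is run 3 times. The page does not say this;
# it is inferred because the category percentages match k/6 and k/9.
ATTEMPTS_PER_TEST = 3


def aggregate(rows, attempts_per_test=ATTEMPTS_PER_TEST):
    """Sum the category rows into overall totals."""
    passed_tests = sum(row[1] for row in rows)
    total_tests = sum(row[2] for row in rows)
    flaky_tests = sum(row[3] for row in rows)

    total_attempts = total_tests * attempts_per_test
    # Recover the passed-attempt count per category from its rounded rate.
    passed_attempts = sum(
        round(row[4] / 100 * row[2] * attempts_per_test) for row in rows
    )
    overall_rate = round(100 * passed_attempts / total_attempts, 1)
    return passed_tests, total_tests, flaky_tests, overall_rate


if __name__ == "__main__":
    print(aggregate(CATEGORIES))
```

Under that assumption the categories sum to 5 of 12 tests fully passed, 4 flaky tests, and 21 of 36 passing attempts, which agrees with the summary metrics for this model.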

Compared models

#17 · MiniMax

MiniMax: MiniMax M2.5

Reasoning (medium)

Score: 5.08

Consistency: 6.00

Attempt pass rate: 61.1%

Flaky tests: 6

Cost per result: 4.0276

Tests Correct: 5/12

Total Cost: $0.20138

#19 · OpenAI

OpenAI: GPT-4o-mini

No Reasoning

Score: 4.00

Consistency: 9.98

Attempt pass rate: 25.0%

Flaky tests: 0

Cost per result: 0.0576

Tests Correct: 3/12

Total Cost: $0.00173

#16 · Anthropic

Anthropic: Claude Opus 4.6

Reasoning (medium)

Score: 5.42

Consistency: 8.60

Attempt pass rate: 55.5%

Flaky tests: 2

Cost per result: 12.8695

Tests Correct: 6/12

Total Cost: $0.77217
