AI BENCHY Compare

Anthropic: Claude Sonnet 5 vs StepFun: Step 3.7 Flash

Summary

Claude Sonnet 5 vs Step 3.7 Flash benchmark comparison: Claude Sonnet 5 leads on average score (7.9 vs 7.1), has the lower total benchmark cost ($0.550 vs $1.148), and responds faster on average (9.94s vs 64.46s). Its attempt pass rate is also higher: 79.4% vs 63.5%.

Recommended model: Claude Sonnet 5. It has the higher score here (7.9 vs 7.1), and Step 3.7 Flash cost about 2.1x as much to run ($1.148 vs $0.550).
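
The 2.1x cost figure in the recommendation can be reproduced from the two total costs, and the same arithmetic applies to the average response times. The snippet below is a minimal sketch of that calculation; the variable names are illustrative and the numbers are copied from the summary.

```python
# Figures copied from the summary above.
sonnet_cost, step_cost = 0.550, 1.148   # total benchmark cost, USD
sonnet_time, step_time = 9.94, 64.46    # average response time, seconds

# How many times more Step 3.7 Flash cost, and how many times longer it took.
cost_ratio = step_cost / sonnet_cost
speed_ratio = step_time / sonnet_time

print(f"cost ratio:  {cost_ratio:.1f}x")   # prints "cost ratio:  2.1x"
print(f"speed ratio: {speed_ratio:.1f}x")  # prints "speed ratio: 6.5x"
```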

Last updated at: 2026-06-30

| Metric | Claude Sonnet 5 (medium) | Step 3.7 Flash (high) |
|---|---|---|
| Release | 2026-06-30 | 2026-05-29 |
| Score | 7.9 | 7.1 |
| Rank | #30 | #65 |
| Reliability | 10.0 | 10.0 |
| Consistency | 9.0 | 8.2 |
| Tests Correct | | |
| Attempt pass rate | 79.4% | 63.5% |
| Flaky tests | 3 | 4 |
| Total Runs | 63 | 63 |
| Cost per result | 3.662 | 10.434 |
| Total Cost | $0.550 | $1.148 |
| Input Price | $2.000 / 1M | $0.200 / 1M |
| Output Price | $10.000 / 1M | $1.150 / 1M |
| Total Input Tokens | 67,416 | 38,391 |
| Output Tokens | 34,012 | 991,355 |
| Reasoning Tokens | 7,673 | 0 |
| Response Time (avg) | 9.94s | 64.46s |
| Response Time (max) | 56.94s | 364.99s |
| Response Time (total) | 208.71s | 1353.57s |
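
The total costs above appear to follow from the token counts and per-million prices. The sketch below is one way to check this. It rests on an assumption the page does not state: that reasoning tokens are billed at the output price and are counted separately from output tokens. The helper `estimate_cost` is hypothetical, not part of any benchmark tooling.

```python
def estimate_cost(input_tokens, output_tokens, reasoning_tokens,
                  input_price_per_m, output_price_per_m):
    """Estimate total cost in USD from token counts and per-1M-token prices.

    ASSUMPTION: reasoning tokens are billed at the output rate and are not
    already included in output_tokens. The page does not confirm this.
    """
    billed_output = output_tokens + reasoning_tokens
    return (input_tokens * input_price_per_m
            + billed_output * output_price_per_m) / 1_000_000

# Token counts and prices copied from the metrics above.
sonnet = estimate_cost(67_416, 34_012, 7_673, 2.000, 10.000)
step = estimate_cost(38_391, 991_355, 0, 0.200, 1.150)

print(f"Claude Sonnet 5: ${sonnet:.3f} (reported $0.550)")
print(f"Step 3.7 Flash:  ${step:.3f} (reported $1.148)")
```

Under that assumption the estimates land within a fraction of a cent of the reported totals. The Step 3.7 Flash bill comes almost entirely from its 991,355 output tokens, which outweigh its much lower per-token prices.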

Generation showcase

Hamster playing table tennis

Prompt: Create a detailed SVG illustration of a hamster playing table tennis.

#30 Claude Sonnet 5 (medium): Cost $0.007, Time 6.4s, Tokens 832

#65 Step 3.7 Flash (high): Cost $0.007, Time 63.6s, Tokens 6,030

[Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens]

Category Breakdown

| Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Input Tokens | Output Tokens | Reasoning Tokens |
|---|---|---|---|---|---|---|---|---|---|---|
| Anti-AI Tricks | Claude Sonnet 5 | 10.0 | 10.0 | 100.0% | 0 | | 3.80s | 834 | 1,220 | 446 |
| Anti-AI Tricks | Step 3.7 Flash | 10.0 | 10.0 | 100.0% | 0 | | 13.40s | 696 | 42,656 | 0 |
| Coding | Claude Sonnet 5 | 9.0 | 7.9 | 88.9% | 1 | | 17.28s | 10,590 | 13,153 | 2,379 |
| Coding | Step 3.7 Flash | 4.0 | 6.0 | 22.2% | 1 | | 206.21s | 6,057 | 327,340 | 0 |
| Combined | Claude Sonnet 5 | 4.5 | 2.1 | 66.7% | 1 | | 37.01s | 29,394 | 4,848 | 2,170 |
| Combined | Step 3.7 Flash | 10.0 | 10.0 | 100.0% | 0 | | 13.01s | 13,638 | 8,802 | 0 |
| Data parsing and extraction | Claude Sonnet 5 | 10.0 | 10.0 | 100.0% | 0 | | 3.16s | 10,503 | 312 | 0 |
| Data parsing and extraction | Step 3.7 Flash | 10.0 | 10.0 | 100.0% | 0 | | 14.72s | 7,368 | 23,113 | 0 |
| Domain specific | Claude Sonnet 5 | 7.7 | 10.0 | 66.7% | 0 | | 20.38s | 975 | 12,140 | 1,994 |
| Domain specific | Step 3.7 Flash | 4.1 | 4.4 | 44.5% | 2 | | 149.64s | 783 | 410,502 | 0 |
| General Intelligence | Claude Sonnet 5 | 4.8 | 3.2 | 33.3% | 1 | | 4.32s | 708 | 264 | 0 |
| General Intelligence | Step 3.7 Flash | 5.5 | 10.0 | 0.0% | 0 | | 4.17s | 510 | 2,862 | 0 |
| Instructions following | Claude Sonnet 5 | 9.9 | 10.0 | 100.0% | 0 | | 3.10s | 909 | 318 | 269 |
| Instructions following | Step 3.7 Flash | 9.8 | 10.0 | 100.0% | 0 | | 1.52s | 705 | 2,010 | 0 |
| Puzzle Solving | Claude Sonnet 5 | 7.7 | 10.0 | 66.7% | 0 | | 2.98s | 894 | 407 | 121 |
| Puzzle Solving | Step 3.7 Flash | 5.3 | 7.2 | 44.4% | 1 | | 10.22s | 711 | 25,422 | 0 |
| Tool Calling | Claude Sonnet 5 | 10.0 | 10.0 | 100.0% | 0 | | 10.70s | 12,351 | 433 | 90 |
| Tool Calling | Step 3.7 Flash | 10.0 | 10.0 | 100.0% | 0 | | 2.79s | 7,701 | 1,172 | 0 |
| Trivia | Claude Sonnet 5 | 3.0 | 10.0 | 0.0% | 0 | | 7.06s | 258 | 917 | 204 |
| Trivia | Step 3.7 Flash | 3.0 | 10.0 | 0.0% | 0 | | 149.34s | 222 | 147,476 | 0 |
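
The per-category output token counts can be cross-checked against the headline totals, and ranking them shows where Step 3.7 Flash spends most of its output. The following is a minimal sketch with the figures copied from the breakdown above; the dictionary layout and variable names are illustrative.

```python
# (Claude Sonnet 5, Step 3.7 Flash) output tokens per category,
# copied from the category breakdown above.
OUTPUT_TOKENS = {
    "Anti-AI Tricks": (1_220, 42_656),
    "Coding": (13_153, 327_340),
    "Combined": (4_848, 8_802),
    "Data parsing and extraction": (312, 23_113),
    "Domain specific": (12_140, 410_502),
    "General Intelligence": (264, 2_862),
    "Instructions following": (318, 2_010),
    "Puzzle Solving": (407, 25_422),
    "Tool Calling": (433, 1_172),
    "Trivia": (917, 147_476),
}

sonnet_total = sum(sonnet for sonnet, _ in OUTPUT_TOKENS.values())
step_total = sum(step for _, step in OUTPUT_TOKENS.values())

# Cross-check against the "Output Tokens" totals in the headline metrics.
assert sonnet_total == 34_012
assert step_total == 991_355

# Rank categories by Step 3.7 Flash output volume and print each one's share.
ranked = sorted(OUTPUT_TOKENS.items(), key=lambda item: item[1][1], reverse=True)
for category, (_, step_tokens) in ranked[:3]:
    print(f"{category}: {step_tokens / step_total:.1%} of Step 3.7 Flash output tokens")
```

The category figures sum exactly to the headline totals. Domain specific, Coding, and Trivia together account for roughly 89% of Step 3.7 Flash's output tokens, and these are also the categories with its longest average response times.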
