
AI BENCHY Compare

StepFun: Step 3.7 Flash vs xAI: Grok 4.3

Last updated at: 2026-05-29

| Metric | Step 3.7 Flash low (Release: 2026-05-29) | Grok 4.3 medium (Release: 2026-05-01) |
|---|---|---|
| Score | 7.4 | 7.8 |
| Rank | #60 | #36 |
| Reliability | 10.0 | 10.0 |
| Consistency | 8.7 | 8.4 |
| Tests Correct | | |
| Attempt pass rate | 68.3% | 75.0% |
| Flaky tests | 3 | 4 |
| Total Runs | 60 | 60 |
| Cost per result | 2.796 | 4.557 |
| Total Cost | $0.336 | $0.593 |
| Input Price | $0.200 / 1M | $1.250 / 1M |
| Output Price | $1.150 / 1M | $2.500 / 1M |
| Output Tokens | 285,209 | 1,485 |
| Reasoning Tokens | 0 | 214,710 |
| Response Time (avg) | 16.06s | 49.23s |
| Response Time (max) | 124.75s | 216.69s |
| Response Time (total) | 321.11s | 984.52s |
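
As a rough cross-check on the cost rows, the sketch below estimates how much of each Total Cost the output side could account for. It assumes that reasoning tokens are billed at the Output Price and that Total Cost is simply input spend plus output spend; the page states neither, and all function and dictionary names are illustrative. Because Total Cost is rounded to three decimals, the implied input token counts are approximate.

```python
# Figures copied from the summary table; prices are USD per 1M tokens.
MODELS = {
    "Step 3.7 Flash": {
        "input_price": 0.200,
        "output_price": 1.150,
        "output_tokens": 285_209,
        "reasoning_tokens": 0,
        "total_cost": 0.336,
    },
    "Grok 4.3": {
        "input_price": 1.250,
        "output_price": 2.500,
        "output_tokens": 1_485,
        "reasoning_tokens": 214_710,
        "total_cost": 0.593,
    },
}


def output_side_cost(model: dict) -> float:
    """Estimated USD spent on generated tokens.

    Assumption (not stated on the page): reasoning tokens are billed
    at the same rate as output tokens.
    """
    billed = model["output_tokens"] + model["reasoning_tokens"]
    return billed * model["output_price"] / 1_000_000


def implied_input_tokens(model: dict) -> float:
    """Input tokens implied by whatever cost the output side leaves over.

    Assumption: total cost = input cost + output-side cost, nothing else.
    """
    remainder = model["total_cost"] - output_side_cost(model)
    return remainder / model["input_price"] * 1_000_000


if __name__ == "__main__":
    for name, model in MODELS.items():
        print(
            f"{name}: output-side ~${output_side_cost(model):.3f}, "
            f"implied input tokens ~{implied_input_tokens(model):,.0f}"
        )
```

Under these assumptions the output side comes to roughly $0.328 for Step 3.7 Flash and $0.540 for Grok 4.3, leaving an implied input volume of about 40,000 tokens for each model. The two implied volumes being close is consistent with both models having received the same prompts, though it does not prove the billing assumption.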

Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens

Category Breakdown

| Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
|---|---|---|---|---|---|---|---|---|---|
| Anti-AI Tricks | Step 3.7 Flash | 8.7 | 7.9 | 91.7% | 1 | | 4.02s | 10,896 | 0 |
| Anti-AI Tricks | Grok 4.3 | 10.0 | 10.0 | 100.0% | 0 | | 8.83s | 88 | 8,207 |
| Coding | Step 3.7 Flash | 10.0 | 10.0 | 100.0% | 0 | | 9.43s | 14,569 | 0 |
| Coding | Grok 4.3 | 7.4 | 6.5 | 66.7% | 1 | | 55.26s | 532 | 24,554 |
| Combined | Step 3.7 Flash | 10.0 | 10.0 | 100.0% | 0 | | 7.98s | 6,426 | 0 |
| Combined | Grok 4.3 | 10.0 | 10.0 | 100.0% | 0 | | 63.99s | 234 | 15,301 |
| Data parsing and extraction | Step 3.7 Flash | 7.3 | 5.8 | 83.3% | 1 | | 2.29s | 2,667 | 0 |
| Data parsing and extraction | Grok 4.3 | 10.0 | 10.0 | 100.0% | 0 | | 18.97s | 180 | 9,546 |
| Domain specific | Step 3.7 Flash | 5.3 | 7.2 | 44.4% | 1 | | 43.31s | 104,487 | 0 |
| Domain specific | Grok 4.3 | 5.3 | 7.2 | 44.4% | 1 | | 181.74s | 14 | 111,300 |
| General Intelligence | Step 3.7 Flash | 3.4 | 9.3 | 0.0% | 0 | | 7.00s | 4,604 | 0 |
| General Intelligence | Grok 4.3 | 5.4 | 2.5 | 66.7% | 1 | | 24.70s | 70 | 5,020 |
| Instructions following | Step 3.7 Flash | 9.8 | 10.0 | 100.0% | 0 | | 1.58s | 1,857 | 0 |
| Instructions following | Grok 4.3 | 9.8 | 10.0 | 100.0% | 0 | | 18.58s | 57 | 8,713 |
| Puzzle Solving | Step 3.7 Flash | 5.5 | 9.9 | 33.3% | 0 | | 1.84s | 3,564 | 0 |
| Puzzle Solving | Grok 4.3 | 5.9 | 7.2 | 55.6% | 1 | | 22.52s | 128 | 14,468 |
| Tool Calling | Step 3.7 Flash | 10.0 | 10.0 | 100.0% | 0 | | 3.25s | 1,360 | 0 |
| Tool Calling | Grok 4.3 | 10.0 | 10.0 | 100.0% | 0 | | 17.66s | 168 | 4,615 |
| Trivia | Step 3.7 Flash | 3.0 | 10.0 | 0.0% | 0 | | 124.75s | 134,779 | 0 |
| Trivia | Grok 4.3 | 3.0 | 10.0 | 0.0% | 0 | | 44.47s | 14 | 12,986 |
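
The per-category token and flaky-test counts can be reconciled against the summary figures. The sketch below is a minimal consistency check using only numbers from the page; the variable and function names are illustrative, and the category order follows the breakdown (Anti-AI Tricks through Trivia).

```python
# Per-category values in breakdown order:
# Anti-AI Tricks, Coding, Combined, Data parsing and extraction,
# Domain specific, General Intelligence, Instructions following,
# Puzzle Solving, Tool Calling, Trivia.
BREAKDOWN = {
    "Step 3.7 Flash": {
        "output_tokens": [10_896, 14_569, 6_426, 2_667, 104_487,
                          4_604, 1_857, 3_564, 1_360, 134_779],
        "reasoning_tokens": [0] * 10,
        "flaky_tests": [1, 0, 0, 1, 1, 0, 0, 0, 0, 0],
    },
    "Grok 4.3": {
        "output_tokens": [88, 532, 234, 180, 14, 70, 57, 128, 168, 14],
        "reasoning_tokens": [8_207, 24_554, 15_301, 9_546, 111_300,
                             5_020, 8_713, 14_468, 4_615, 12_986],
        "flaky_tests": [0, 1, 0, 0, 1, 1, 0, 1, 0, 0],
    },
}

# Totals as reported in the summary table.
SUMMARY = {
    "Step 3.7 Flash": {"output_tokens": 285_209, "reasoning_tokens": 0,
                       "flaky_tests": 3},
    "Grok 4.3": {"output_tokens": 1_485, "reasoning_tokens": 214_710,
                 "flaky_tests": 4},
}


def category_totals(model_name: str) -> dict:
    """Sum each per-category metric for one model."""
    return {metric: sum(values)
            for metric, values in BREAKDOWN[model_name].items()}


def reconciles(model_name: str) -> bool:
    """True if the category sums equal the summary-table totals."""
    return category_totals(model_name) == SUMMARY[model_name]


if __name__ == "__main__":
    for name in BREAKDOWN:
        print(name, category_totals(name), "matches summary:", reconciles(name))
```

Both models reconcile: the category sums for output tokens, reasoning tokens, and flaky tests equal the summary totals. Two categories dominate the token counts: Domain specific and Trivia together account for 239,266 of Step 3.7 Flash's 285,209 output tokens.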
