AI BENCHY Compare

StepFun: Step 3.7 Flash vs Z.ai: GLM 5.1

Last updated at: 2026-05-29

Metric                  Step 3.7 Flash (low)    GLM 5.1 (medium)
Release                 2026-05-29              2026-04-07
Score                   7.4                     7.4
Rank                    #60                     #56
Reliability             10.0                    5.0
Consistency             8.7                     8.3
Tests Correct
Attempt pass rate       68.3%                   71.7%
Flaky tests             3                       4
Total Runs              60                      60
Cost per result         2.796                   2.382
Total Cost              $0.336                  $0.286
Input Price             $0.200 / 1M             $0.980 / 1M
Output Price            $1.150 / 1M             $3.080 / 1M
Output Tokens           285,209                 11,511
Reasoning Tokens        0                       71,979
Response Time (avg)     16.06s                  33.45s
Response Time (max)     124.75s                 172.60s
Response Time (total)   321.11s                 635.63s
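
As a rough cross-check of the cost figures, the sketch below estimates how much of each model's Total Cost comes from generated tokens. It rests on assumptions this page does not state: that prices are per 1M tokens, that reasoning tokens are billed at the output rate, and that the Output Tokens count excludes reasoning tokens. The helper name `output_cost_usd` is hypothetical.

```python
def output_cost_usd(output_tokens: int, reasoning_tokens: int,
                    price_per_million_usd: float) -> float:
    """Estimate output-side spend in USD.

    Assumption: reasoning tokens are billed like output tokens and are
    not already included in output_tokens.
    """
    billed_tokens = output_tokens + reasoning_tokens
    return billed_tokens * price_per_million_usd / 1_000_000


# Token counts and output prices copied from the summary table.
step_output_cost = output_cost_usd(285_209, 0, 1.150)
glm_output_cost = output_cost_usd(11_511, 71_979, 3.080)

print(f"Step 3.7 Flash: ${step_output_cost:.3f} of $0.336 total")
print(f"GLM 5.1:        ${glm_output_cost:.3f} of $0.286 total")
```

Under those assumptions the estimate comes to about $0.328 for Step 3.7 Flash and about $0.257 for GLM 5.1. Both fall below the reported totals, which would leave the remainder for input tokens.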

[Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens]

Category Breakdown

Anti-AI Tricks               Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Step 3.7 Flash               8.7    7.9          91.7%              1                           4.02s                10,896         0
GLM 5.1                      10.0   10.0         100.0%             0                           8.31s                401            5,122

Coding                       Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Step 3.7 Flash               10.0   10.0         100.0%             0                           9.43s                14,569         0
GLM 5.1                      4.7    1.6          66.7%              2                           145.56s              4,727          34,384

Combined                     Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Step 3.7 Flash               10.0   10.0         100.0%             0                           7.98s                6,426          0
GLM 5.1                      9.5    10.0         100.0%             0                           43.11s               327            4,206

Data parsing and extraction  Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Step 3.7 Flash               7.3    5.8          83.3%              1                           2.29s                2,667          0
GLM 5.1                      10.0   10.0         100.0%             0                           9.33s                991            4,552

Domain specific              Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Step 3.7 Flash               5.3    7.2          44.4%              1                           43.31s               104,487        0
GLM 5.1                      5.3    10.0         33.3%              0                           29.77s               969            11,314

General Intelligence         Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Step 3.7 Flash               3.4    9.3          0.0%               0                           7.00s                4,604          0
GLM 5.1                      10.0   10.0         100.0%             0                           20.95s               2,875          2,875

Instructions following       Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Step 3.7 Flash               9.8    10.0         100.0%             0                           1.58s                1,857          0
GLM 5.1                      6.4    5.8          66.7%              1                           7.47s                204            1,617

Puzzle Solving               Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Step 3.7 Flash               5.5    9.9          33.3%              0                           1.84s                3,564          0
GLM 5.1                      8.2    7.2          88.9%              1                           31.64s               935            5,730

Tool Calling                 Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Step 3.7 Flash               10.0   10.0         100.0%             0                           3.25s                1,360          0
GLM 5.1                      3.0    10.0         0.0%               0                           0ms                  0              0

Trivia                       Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Step 3.7 Flash               3.0    10.0         0.0%               0                           124.75s              134,779        0
GLM 5.1                      3.0    10.0         0.0%               0                           29.40s               82             2,179
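
The per-category token counts can be checked against the summary totals. The sketch below copies the Output Tokens and Reasoning Tokens values from the category breakdown (the variable names are illustrative, not part of the site) and confirms that they add up to the totals reported in the summary table. It also computes how much of Step 3.7 Flash's output comes from its two heaviest categories.

```python
# Output Tokens per category for Step 3.7 Flash, copied from the breakdown.
step_output = {
    "Anti-AI Tricks": 10_896,
    "Coding": 14_569,
    "Combined": 6_426,
    "Data parsing and extraction": 2_667,
    "Domain specific": 104_487,
    "General Intelligence": 4_604,
    "Instructions following": 1_857,
    "Puzzle Solving": 3_564,
    "Tool Calling": 1_360,
    "Trivia": 134_779,
}

# GLM 5.1 values in the same category order as step_output.
glm_output = [401, 4_727, 327, 991, 969, 2_875, 204, 935, 0, 82]
glm_reasoning = [5_122, 34_384, 4_206, 4_552, 11_314, 2_875, 1_617, 5_730, 0, 2_179]

# Category sums should match the summary table totals.
assert sum(step_output.values()) == 285_209
assert sum(glm_output) == 11_511
assert sum(glm_reasoning) == 71_979

# Share of Step 3.7 Flash output produced by Domain specific and Trivia.
heavy_share = (step_output["Domain specific"] + step_output["Trivia"]) / sum(step_output.values())
print(f"Domain specific + Trivia share of output: {heavy_share:.1%}")
```

The sums match the summary table, and Domain specific plus Trivia account for roughly 84% of Step 3.7 Flash's output tokens. Those are also the two categories with its slowest average response times.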
