AI BENCHY

OpenAI: gpt-oss-120b vs Z.ai: GLM 5.2

Summary

gpt-oss-120b vs GLM 5.2 benchmark comparison: GLM 5.2 leads on average score, 7.1 vs 6.7. gpt-oss-120b has the lower benchmark cost, $0.013 vs $0.076. GLM 5.2 is also faster (6.34s vs 22.28s average response time) and has the higher attempt pass rate (60.3% vs 52.4% for gpt-oss-120b).

Recommended model: gpt-oss-120b. Its score stays close to the best score here (6.7 vs 7.1), while its total benchmark cost is roughly one-sixth of GLM 5.2's ($0.013 vs $0.076).
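The recommendation weighs score against cost and speed. That arithmetic can be sketched in Python using the rounded figures from the summary; the dictionary layout and the `ratio` helper are illustrative, not part of the site. Because the inputs are rounded, the multiples may differ slightly from ones computed on unrounded data.

```python
# Figures as displayed on this page (rounded), so every ratio is approximate.
models = {
    "gpt-oss-120b": {"score": 6.7, "total_cost": 0.013, "avg_time_s": 22.28},
    "GLM 5.2": {"score": 7.1, "total_cost": 0.076, "avg_time_s": 6.34},
}

def ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, rounded to two decimals for display."""
    return round(numerator / denominator, 2)

gpt = models["gpt-oss-120b"]
glm = models["GLM 5.2"]

# Fraction of the higher score that the cheaper model retains.
score_share = ratio(gpt["score"], glm["score"])
# How many times more GLM 5.2 cost for the same benchmark run.
cost_ratio = ratio(glm["total_cost"], gpt["total_cost"])
# How many times longer gpt-oss-120b took per response on average.
speed_ratio = ratio(gpt["avg_time_s"], glm["avg_time_s"])

print(f"score share: {score_share}, cost ratio: {cost_ratio}, speed ratio: {speed_ratio}")
# prints "score share: 0.94, cost ratio: 5.85, speed ratio: 3.51"
```

In other words, gpt-oss-120b keeps about 94% of GLM 5.2's score for well under a fifth of the cost, at the price of responses that take roughly 3.5 times as long on average.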

Last updated at: 2026-06-18

gpt-oss-120b (OpenAI): reasoning: medium; released 2025-08-05; free variant available
GLM 5.2 (Z.ai): reasoning: none; released 2026-06-17

Metric                   gpt-oss-120b         GLM 5.2
Score                    6.7                  7.1
Rank                     #78                  #61
Reliability              10.0                 9.9
Consistency              8.0                  9.6
Attempt pass rate        52.4%                60.3%
Flaky tests              5                    1
Total Runs               63                   63
Cost per result          0.141                0.628
Total Cost               $0.013               $0.076
Input Price              $0.039 / 1M tokens   $1.400 / 1M tokens
Output Price             $0.180 / 1M tokens   $4.400 / 1M tokens
Total Input Tokens       39,084               38,671
Output Tokens            20,013               4,817
Reasoning Tokens         50,233               0
Response Time (avg)      22.28s               6.34s
Response Time (max)      68.16s               20.69s
Response Time (total)    311.96s              133.19s
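The token counts and list prices in the table are enough to approximate each model's total cost. The sketch below assumes that reasoning tokens are billed at the output rate and that the attempt pass rate is passing attempts divided by total runs; the site does not document either formula, and the helper names are hypothetical.

```python
# Token counts and list prices (USD per 1M tokens) from the table above.
# Assumption: reasoning tokens are billed at the output rate; the site does
# not state its exact costing formula.
PER_MILLION = 1_000_000

runs = {
    "gpt-oss-120b": {"input": 39_084, "output": 20_013, "reasoning": 50_233,
                     "in_price": 0.039, "out_price": 0.180,
                     "pass_rate": 0.524, "total_runs": 63},
    "GLM 5.2": {"input": 38_671, "output": 4_817, "reasoning": 0,
                "in_price": 1.400, "out_price": 4.400,
                "pass_rate": 0.603, "total_runs": 63},
}

def estimated_cost(r: dict) -> float:
    """Estimate USD cost: input at the input rate, output + reasoning at the output rate."""
    billed_output = r["output"] + r["reasoning"]
    return (r["input"] * r["in_price"] + billed_output * r["out_price"]) / PER_MILLION

def passing_attempts(r: dict) -> int:
    """Recover the approximate number of passing attempts from a rounded pass rate."""
    return round(r["pass_rate"] * r["total_runs"])

for name, r in runs.items():
    print(f"{name}: ~${estimated_cost(r):.4f}, "
          f"~{passing_attempts(r)}/{r['total_runs']} attempts passed")
```

Under these assumptions the estimates come out at about $0.0142 and $0.0753, close to but not exactly the reported $0.013 and $0.076, so the site's accounting presumably differs in some detail (for example, the provider actually used or how reasoning tokens are priced). The recovered pass counts, 33 and 38 of 63 attempts, reproduce the displayed 52.4% and 60.3% after rounding.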

Generation showcase

Hamster playing table tennis

Prompt: Create a detailed SVG illustration of a hamster playing table tennis.

gpt-oss-120b (rank #78, reasoning: medium): cost $0.001, time 26.7s, 555 tokens

GLM 5.2 (rank #61, reasoning: none): output flagged as Invalid SVG; cost $0.033, time 87.7s, 7,455 tokens
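The page does not say how it decides that an output is "Invalid SVG". One plausible minimum bar is that the response parses as XML and has an `<svg>` root element. A sketch of that check with Python's standard `xml.etree.ElementTree` follows; the function name is hypothetical, and the site's real criteria may be stricter or different.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"

def looks_like_valid_svg(markup: str) -> bool:
    """Return True if markup is well-formed XML whose root element is <svg>.

    This only checks well-formedness and the root tag; it does not validate
    against the SVG specification or check that the drawing renders.
    """
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        return False
    # Accept both namespaced and un-namespaced <svg> roots.
    return root.tag in (f"{SVG_NS}svg", "svg")

ok = '<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10"><circle cx="5" cy="5" r="4"/></svg>'
truncated = '<svg xmlns="http://www.w3.org/2000/svg"><circle cx="5"'
print(looks_like_valid_svg(ok), looks_like_valid_svg(truncated))  # True False
```

A truncated response, as can happen when a long generation is cut off, fails the parse and is rejected.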

Charts (not reproduced): Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens.

Category Breakdown

Pass rate = attempt pass rate; Flaky = number of flaky tests; Avg time = average response time; token columns are per-category totals.

Model         Score  Consistency  Pass rate  Flaky  Avg time  Input tok  Output tok  Reasoning tok

Anti-AI Tricks
gpt-oss-120b  6.7    9.9          50.0%      0      10.21s    1,314      3,518       2,177
GLM 5.2       8.3    10.0         75.0%      0      3.70s     567        313         0

Coding
gpt-oss-120b  5.9    7.0          55.6%      1      38.37s    7,782      3,365       11,973
GLM 5.2       3.7    9.5          0.0%       0      7.55s     7,263      1,958       0

Combined
gpt-oss-120b  10.0   10.0         100.0%     0      31.18s    11,535     694         5,072
GLM 5.2       10.0   10.0         100.0%     0      20.69s    14,296     1,489       0

Data parsing and extraction
gpt-oss-120b  6.4    5.9          66.7%      1      1.98s     7,476      241         1,114
GLM 5.2       10.0   10.0         100.0%     0      7.17s     7,113      204         0

Domain specific
gpt-oss-120b  2.9    4.4          22.2%      2      50.92s    1,266      6,784       20,606
GLM 5.2       5.3    10.0         33.3%      0      6.50s     696        27          0

General Intelligence
gpt-oss-120b  4.3    10.0         0.0%       0      7.90s     659        107         387
GLM 5.2       6.1    3.1          66.7%      1      4.42s     480        82          0

Instructions following
gpt-oss-120b  9.9    10.0         100.0%     0      7.63s     1,036      126         1,799
GLM 5.2       9.8    10.0         100.0%     0      3.84s     642        66          0

Puzzle Solving
gpt-oss-120b  5.3    7.2          44.4%      1      21.71s    1,190      1,790       2,264
GLM 5.2       7.7    10.0         66.7%      0      3.31s     618        265         0

Tool Calling
gpt-oss-120b  9.8    10.0         100.0%     0      6.91s     6,514      287         1,083
GLM 5.2       10.0   10.0         100.0%     0      15.76s    6,807      400         0

Trivia
gpt-oss-120b  3.0    10.0         0.0%       0      26.51s    312        3,101       3,758
GLM 5.2       3.0    10.0         0.0%       0      3.41s     189        13          0
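The category breakdown can be summarized by counting which model scores higher in each category and by checking that the per-category token counts add up to the totals in the main comparison table. A minimal Python sketch, with the figures copied from the breakdown (the variable and function names are illustrative):

```python
from collections import Counter

# Category scores from the breakdown: (gpt-oss-120b, GLM 5.2).
scores = {
    "Anti-AI Tricks": (6.7, 8.3),
    "Coding": (5.9, 3.7),
    "Combined": (10.0, 10.0),
    "Data parsing and extraction": (6.4, 10.0),
    "Domain specific": (2.9, 5.3),
    "General Intelligence": (4.3, 6.1),
    "Instructions following": (9.9, 9.8),
    "Puzzle Solving": (5.3, 7.7),
    "Tool Calling": (9.8, 10.0),
    "Trivia": (3.0, 3.0),
}

# Per-category (input, output, reasoning) tokens, in the same category order.
tokens = {
    "gpt-oss-120b": [(1_314, 3_518, 2_177), (7_782, 3_365, 11_973), (11_535, 694, 5_072),
                     (7_476, 241, 1_114), (1_266, 6_784, 20_606), (659, 107, 387),
                     (1_036, 126, 1_799), (1_190, 1_790, 2_264), (6_514, 287, 1_083),
                     (312, 3_101, 3_758)],
    "GLM 5.2": [(567, 313, 0), (7_263, 1_958, 0), (14_296, 1_489, 0),
                (7_113, 204, 0), (696, 27, 0), (480, 82, 0),
                (642, 66, 0), (618, 265, 0), (6_807, 400, 0),
                (189, 13, 0)],
}

def category_winners(table: dict) -> Counter:
    """Count categories won by each model; equal scores count as ties."""
    tally = Counter()
    for gpt, glm in table.values():
        if gpt > glm:
            tally["gpt-oss-120b"] += 1
        elif glm > gpt:
            tally["GLM 5.2"] += 1
        else:
            tally["tie"] += 1
    return tally

def token_totals(rows: list) -> tuple:
    """Sum (input, output, reasoning) token tuples column by column."""
    return tuple(sum(col) for col in zip(*rows))

print(category_winners(scores))
for model, rows in tokens.items():
    print(model, token_totals(rows))
```

Run as-is, it reports six categories where GLM 5.2 scores higher (Anti-AI Tricks, Data parsing and extraction, Domain specific, General Intelligence, Puzzle Solving, Tool Calling), two where gpt-oss-120b does (Coding, Instructions following), and two ties (Combined, Trivia). The token sums reproduce the table totals of 39,084 / 20,013 / 50,233 for gpt-oss-120b and 38,671 / 4,817 / 0 for GLM 5.2.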
