
AI BENCHY Compare

Inception: Mercury 2 vs Z.ai: GLM 5

Last updated at: 2026-06-03

| Metric | Mercury 2 | GLM 5 |
| --- | --- | --- |
| Variant | Mercury 2 medium | GLM 5 none |
| Release | 2026-02-24 | 2026-02-12 |
| Score | 6.5 | 6.3 |
| Rank | #89 | #96 |
| Reliability | 10.0 | 10.0 |
| Consistency | 8.8 | 9.7 |
| Tests Correct | | |
| Attempt pass rate | 51.7% | 46.7% |
| Flaky tests | 3 | 1 |
| Total Runs | 60 | 60 |
| Cost per result | 0.611 | 0.246 |
| Total Cost | $0.055 | $0.025 |
| Input Price | $0.250 / 1M | $0.600 / 1M |
| Output Price | $0.750 / 1M | $1.920 / 1M |
| Total Input Tokens | 32,570 | 34,537 |
| Output Tokens | 4,022 | 1,985 |
| Reasoning Tokens | 58,405 | 0 |
| Response Time (avg) | 2.27s | 3.95s |
| Response Time (max) | 14.63s | 11.07s |
| Response Time (total) | 43.20s | 51.38s |
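
The Total Cost figures can be approximately reproduced from the token counts and per-token prices listed above. The following is a minimal sketch, not the site's actual billing logic: the `Usage` class and `estimated_cost` function are hypothetical names, and it assumes that reasoning tokens are billed at the output price, an assumption that happens to match the reported totals after rounding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Token counts and prices copied from the comparison table."""
    input_tokens: int
    output_tokens: int
    reasoning_tokens: int
    input_price_per_m: float   # USD per 1M input tokens
    output_price_per_m: float  # USD per 1M output tokens


def estimated_cost(u: Usage) -> float:
    """Estimate total cost in USD.

    Assumption: reasoning tokens are charged at the output price.
    """
    billed_output = u.output_tokens + u.reasoning_tokens
    return (
        u.input_tokens * u.input_price_per_m
        + billed_output * u.output_price_per_m
    ) / 1_000_000


mercury_2 = Usage(32_570, 4_022, 58_405, 0.250, 0.750)
glm_5 = Usage(34_537, 1_985, 0, 0.600, 1.920)

for name, usage in (("Mercury 2", mercury_2), ("GLM 5", glm_5)):
    print(f"{name}: ${estimated_cost(usage):.3f}")
# Mercury 2: $0.055
# GLM 5: $0.025
```

Under this assumption, most of Mercury 2's cost comes from its 58,405 reasoning tokens, which is why it costs more in total despite lower per-token prices.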

Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens.

Category Breakdown

| Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Input Tokens | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Anti-AI Tricks | Mercury 2 | 6.9 | 9.9 | 50.0% | 0 | | 1.12s | 554 | 2,546 | 2,609 |
| Anti-AI Tricks | GLM 5 | 4.8 | 10.0 | 25.0% | 0 | | 2.37s | 510 | 275 | 0 |
| Coding | Mercury 2 | 7.2 | 6.5 | 66.7% | 1 | | 2.29s | 4,519 | 270 | 8,514 |
| Coding | GLM 5 | 4.6 | 6.8 | 16.7% | 1 | | 5.18s | 4,658 | 424 | 0 |
| Combined | Mercury 2 | 10.0 | 10.0 | 100.0% | 0 | | 3.28s | 12,909 | 268 | 4,887 |
| Combined | GLM 5 | 3.0 | 10.0 | 0.0% | 0 | | 4.98s | 12,812 | 406 | 0 |
| Data parsing and extraction | Mercury 2 | 7.3 | 5.9 | 83.3% | 1 | | 1.11s | 6,234 | 183 | 1,656 |
| Data parsing and extraction | GLM 5 | 10.0 | 10.0 | 100.0% | 0 | | 5.78s | 7,107 | 203 | 0 |
| Domain specific | Mercury 2 | 2.9 | 7.2 | 11.1% | 1 | | 6.48s | 695 | 41 | 30,754 |
| Domain specific | GLM 5 | 3.0 | 10.0 | 0.0% | 0 | | 2.24s | 643 | 19 | 0 |
| General Intelligence | Mercury 2 | 4.8 | 10.0 | 0.0% | 0 | | 821ms | 456 | 137 | 542 |
| General Intelligence | GLM 5 | 10.0 | 10.0 | 100.0% | 0 | | 3.27s | 477 | 103 | 0 |
| Instructions following | Mercury 2 | 10.0 | 10.0 | 100.0% | 0 | | 1.07s | 340 | 14 | 958 |
| Instructions following | GLM 5 | 10.0 | 10.0 | 100.0% | 0 | | 1.48s | 636 | 61 | 0 |
| Puzzle Solving | Mercury 2 | 5.4 | 10.0 | 33.3% | 0 | | 949ms | 601 | 361 | 2,781 |
| Puzzle Solving | GLM 5 | 7.7 | 10.0 | 66.7% | 0 | | 1.91s | 609 | 261 | 0 |
| Tool Calling | Mercury 2 | 10.0 | 10.0 | 100.0% | 0 | | 1.89s | 6,080 | 180 | 1,956 |
| Tool Calling | GLM 5 | 10.0 | 10.0 | 100.0% | 0 | | 11.07s | 6,899 | 220 | 0 |
| Trivia | Mercury 2 | 3.0 | 10.0 | 0.0% | 0 | | 2.58s | 182 | 22 | 3,748 |
| Trivia | GLM 5 | 3.0 | 10.0 | 0.0% | 0 | | 3.62s | 186 | 13 | 0 |
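
As a consistency check, the per-category token counts can be summed and compared with the overall totals reported at the top of the page. The sketch below is illustrative only: the dictionary names and the `totals` helper are hypothetical, and the numbers are copied by hand from the category breakdown.

```python
# (input, output, reasoning) tokens per category, copied from the breakdown.
MERCURY_2 = {
    "Anti-AI Tricks": (554, 2_546, 2_609),
    "Coding": (4_519, 270, 8_514),
    "Combined": (12_909, 268, 4_887),
    "Data parsing and extraction": (6_234, 183, 1_656),
    "Domain specific": (695, 41, 30_754),
    "General Intelligence": (456, 137, 542),
    "Instructions following": (340, 14, 958),
    "Puzzle Solving": (601, 361, 2_781),
    "Tool Calling": (6_080, 180, 1_956),
    "Trivia": (182, 22, 3_748),
}

GLM_5 = {
    "Anti-AI Tricks": (510, 275, 0),
    "Coding": (4_658, 424, 0),
    "Combined": (12_812, 406, 0),
    "Data parsing and extraction": (7_107, 203, 0),
    "Domain specific": (643, 19, 0),
    "General Intelligence": (477, 103, 0),
    "Instructions following": (636, 61, 0),
    "Puzzle Solving": (609, 261, 0),
    "Tool Calling": (6_899, 220, 0),
    "Trivia": (186, 13, 0),
}


def totals(by_category: dict[str, tuple[int, int, int]]) -> tuple[int, ...]:
    """Return column-wise sums: (input, output, reasoning) tokens."""
    return tuple(sum(column) for column in zip(*by_category.values()))


print(totals(MERCURY_2))  # (32570, 4022, 58405)
print(totals(GLM_5))      # (34537, 1985, 0)
```

Both sums agree with the Total Input Tokens, Output Tokens, and Reasoning Tokens rows of the summary. The breakdown also shows that the Domain specific category alone accounts for 30,754 of Mercury 2's 58,405 reasoning tokens.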
