AI BENCHY

AI BENCHY Compare

xAI: Grok Build 0.1 vs Z.ai: GLM 5.1

Last updated at: 2026-05-22

Metric                 Grok Build 0.1 (medium)   GLM 5.1 (medium)
Release                2026-05-21                2026-04-07
Score                  7.6                       7.4
Rank                   #45                       #51
Reliability            10.0                      3.3
Consistency            8.5                       8.3
Tests Correct
Attempt pass rate      70.0%                     71.7%
Flaky tests            4                         4
Total Runs             60                        60
Cost per result        5.271                     2.379
Total Cost             $0.633                    $0.286
Input Price            $1.000 / 1M               $0.980 / 1M
Output Price           $2.000 / 1M               $3.080 / 1M
Output Tokens          2,167                     11,475
Reasoning Tokens       293,436                   71,876
Response Time (avg)    26.36s                    32.22s
Response Time (max)    103.89s                   172.60s
Response Time (total)  527.19s                   612.25s

[Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens]

Category Breakdown

Anti-AI Tricks
Model           Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Grok Build 0.1  10.0   10.0         100.0%             0                           5.46s                195            9,825
GLM 5.1         10.0   10.0         100.0%             0                           8.31s                401            5,122

Coding
Model           Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Grok Build 0.1  5.3    2.9          50.0%              2                           67.43s               574            87,798
GLM 5.1         4.7    1.6          66.7%              2                           145.56s              4,727          34,384

Combined
Model           Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Grok Build 0.1  10.0   10.0         100.0%             0                           30.81s               231            18,779
GLM 5.1         9.5    10.0         100.0%             0                           43.11s               327            4,206

Data parsing and extraction
Model           Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Grok Build 0.1  10.0   10.0         100.0%             0                           7.76s                180            10,343
GLM 5.1         10.0   10.0         100.0%             0                           9.33s                991            4,552

Domain specific
Model           Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Grok Build 0.1  5.3    10.0         33.3%              0                           77.75s               501            111,807
GLM 5.1         5.3    10.0         33.3%              0                           29.77s               969            11,314

General Intelligence
Model           Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Grok Build 0.1  3.8    2.5          33.3%              1                           10.14s               78             5,386
GLM 5.1         10.0   10.0         100.0%             0                           20.95s               2,875          2,875

Instructions following
Model           Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Grok Build 0.1  9.8    10.0         100.0%             0                           9.62s                57             12,436
GLM 5.1         6.4    5.8          66.7%              1                           7.47s                204            1,617

Puzzle Solving
Model           Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Grok Build 0.1  6.2    7.5          55.6%              1                           8.67s                161            15,476
GLM 5.1         8.2    7.2          88.9%              1                           23.85s               899            5,627

Tool Calling
Model           Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Grok Build 0.1  10.0   10.0         100.0%             0                           9.40s                180            5,319
GLM 5.1         3.0    10.0         0.0%               0                           0ms                  0              0

Trivia
Model           Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Grok Build 0.1  3.0    10.0         0.0%               0                           26.07s               10             16,267
GLM 5.1         3.0    10.0         0.0%               0                           29.40s               82             2,179
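
As a sanity check on the figures above, the per-category token counts can be summed and compared with the Output Tokens and Reasoning Tokens totals in the summary table. The following is a minimal Python sketch of that check; the dictionary layout and the names `GROK`, `GLM`, and `token_totals` are illustrative assumptions, and the numbers are copied by hand from the Category Breakdown rows.

```python
# Per-category token counts copied from the Category Breakdown rows.
# Each tuple is (output tokens, reasoning tokens). Names are illustrative only.
GROK = {
    "Anti-AI Tricks": (195, 9_825),
    "Coding": (574, 87_798),
    "Combined": (231, 18_779),
    "Data parsing and extraction": (180, 10_343),
    "Domain specific": (501, 111_807),
    "General Intelligence": (78, 5_386),
    "Instructions following": (57, 12_436),
    "Puzzle Solving": (161, 15_476),
    "Tool Calling": (180, 5_319),
    "Trivia": (10, 16_267),
}

GLM = {
    "Anti-AI Tricks": (401, 5_122),
    "Coding": (4_727, 34_384),
    "Combined": (327, 4_206),
    "Data parsing and extraction": (991, 4_552),
    "Domain specific": (969, 11_314),
    "General Intelligence": (2_875, 2_875),
    "Instructions following": (204, 1_617),
    "Puzzle Solving": (899, 5_627),
    "Tool Calling": (0, 0),
    "Trivia": (82, 2_179),
}


def token_totals(per_category):
    """Return (total output tokens, total reasoning tokens) across categories."""
    output_total = sum(out for out, _ in per_category.values())
    reasoning_total = sum(reasoning for _, reasoning in per_category.values())
    return output_total, reasoning_total


if __name__ == "__main__":
    # Compare against the totals reported in the summary table.
    print("Grok Build 0.1:", token_totals(GROK))
    print("GLM 5.1:", token_totals(GLM))
```

With the values as transcribed, the sums agree with the summary table for both models, which suggests the summary totals are plain sums over the ten categories.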
