AI BENCHY Compare

OpenAI: GPT-5.4 vs Z.ai: GLM 5

Last updated at: 2026-03-06

| Metric | OpenAI: GPT-5.4 (medium) | Z.ai: GLM 5 (none) |
| --- | --- | --- |
| Release | 2026-03-05 | 2026-02-12 |
| Avg Score | 8.2 | 5.8 |
| Rank | #7 | #32 |
| Tests Correct | | |
| Consistency | 8.9 | 10.0 |
| Cost per result | 6.533 | 0.219 |
| Total Cost | $0.784 | $0.018 |
| Attempt pass rate | 86.7% | 53.3% |
| Flaky tests | 2 | 0 |
| Total runs | 45 (15 x 3) | 45 (15 x 3) |
| Output Tokens | 1,611 | 1,445 |
| Reasoning Tokens | 46,321 | 0 |
| Response Time (avg) | 21.06s | 4.13s |
| Response Time (max) | 100.41s | 11.07s |
| Response Time (total) | 315.95s | 33.03s |
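
The run counts and pass rates above can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes that "Attempt pass rate" means passed attempts divided by total runs, and that "45 (15 x 3)" means 15 tests with 3 attempts each. Neither definition is stated on this page, and the names `summary` and `passed_attempts` are hypothetical.

```python
# Figures copied from the summary table above.
TESTS = 15
ATTEMPTS_PER_TEST = 3

summary = {
    "OpenAI: GPT-5.4": {"pass_rate": 0.867, "total_cost_usd": 0.784},
    "Z.ai: GLM 5": {"pass_rate": 0.533, "total_cost_usd": 0.018},
}

total_runs = TESTS * ATTEMPTS_PER_TEST  # 15 x 3 = 45


def passed_attempts(pass_rate: float, runs: int) -> int:
    """Recover the passed-attempt count from a rounded pass rate.

    Assumption: pass rate = passed attempts / total runs.
    """
    return round(pass_rate * runs)


for model, row in summary.items():
    n = passed_attempts(row["pass_rate"], total_runs)
    print(f"{model}: {n} of {total_runs} attempts passed")

# How many times more the first model's full run cost than the second's.
cost_ratio = (
    summary["OpenAI: GPT-5.4"]["total_cost_usd"]
    / summary["Z.ai: GLM 5"]["total_cost_usd"]
)
print(f"Total cost ratio: {cost_ratio:.1f}x")
```

Under those assumptions, the pass rates correspond to 39 of 45 attempts for GPT-5.4 and 24 of 45 for GLM 5, and the GPT-5.4 run cost roughly 43.6 times as much in total.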

[Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Avg Score vs Response Time (avg)]

Category Breakdown

| Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Anti-AI Tricks | OpenAI: GPT-5.4 | 10.0 | 10.0 | 100.0% | 0 | | 5.02s | 216 | 1,466 |
| Anti-AI Tricks | Z.ai: GLM 5 | 4.0 | 10.0 | 33.3% | 0 | | 3.39s | 272 | 0 |
| Combined | OpenAI: GPT-5.4 | 10.0 | 10.0 | 100.0% | 0 | | 20.57s | 301 | 3,543 |
| Combined | Z.ai: GLM 5 | 10.0 | 10.0 | 0.0% | 0 | | 4.98s | 406 | 0 |
| Data parsing and extraction | OpenAI: GPT-5.4 | 9.9 | 10.0 | 100.0% | 0 | | 5.32s | 234 | 804 |
| Data parsing and extraction | Z.ai: GLM 5 | 9.9 | 10.0 | 100.0% | 0 | | 5.78s | 203 | 0 |
| Domain specific | OpenAI: GPT-5.4 | 4.0 | 7.2 | 44.4% | 1 | | 74.27s | 61 | 34,748 |
| Domain specific | Z.ai: GLM 5 | 10.0 | 10.0 | 0.0% | 0 | | 2.24s | 19 | 0 |
| Instructions following | OpenAI: GPT-5.4 | 10.0 | 10.0 | 100.0% | 0 | | 3.11s | 93 | 897 |
| Instructions following | Z.ai: GLM 5 | 10.0 | 10.0 | 100.0% | 0 | | 1.48s | 61 | 0 |
| Puzzle Solving | OpenAI: GPT-5.4 | 7.0 | 7.2 | 88.9% | 1 | | 9.13s | 442 | 3,832 |
| Puzzle Solving | Z.ai: GLM 5 | 7.0 | 10.0 | 66.7% | 0 | | 2.05s | 264 | 0 |
| Tool Calling | OpenAI: GPT-5.4 | 10.0 | 10.0 | 100.0% | 0 | | 13.28s | 264 | 1,031 |
| Tool Calling | Z.ai: GLM 5 | 10.0 | 10.0 | 100.0% | 0 | | 11.07s | 220 | 0 |
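
The per-category token counts appear to add up to the overall Output Tokens and Reasoning Tokens figures reported earlier. The following is a minimal sketch of that check, assuming the overall figures are plain sums over categories (the page does not say so explicitly). The names `category_tokens` and `token_totals` are hypothetical.

```python
# (output_tokens, reasoning_tokens) per category, copied from the breakdown above.
category_tokens = {
    "OpenAI: GPT-5.4": {
        "Anti-AI Tricks": (216, 1466),
        "Combined": (301, 3543),
        "Data parsing and extraction": (234, 804),
        "Domain specific": (61, 34748),
        "Instructions following": (93, 897),
        "Puzzle Solving": (442, 3832),
        "Tool Calling": (264, 1031),
    },
    "Z.ai: GLM 5": {
        "Anti-AI Tricks": (272, 0),
        "Combined": (406, 0),
        "Data parsing and extraction": (203, 0),
        "Domain specific": (19, 0),
        "Instructions following": (61, 0),
        "Puzzle Solving": (264, 0),
        "Tool Calling": (220, 0),
    },
}

# Overall (output_tokens, reasoning_tokens) from the summary table.
reported_totals = {
    "OpenAI: GPT-5.4": (1611, 46321),
    "Z.ai: GLM 5": (1445, 0),
}


def token_totals(per_category: dict) -> tuple:
    """Sum output and reasoning tokens across all categories."""
    output = sum(out for out, _ in per_category.values())
    reasoning = sum(reason for _, reason in per_category.values())
    return output, reasoning


for model, per_category in category_tokens.items():
    matches = token_totals(per_category) == reported_totals[model]
    print(f"{model}: totals match summary = {matches}")

# Share of GPT-5.4's reasoning tokens spent on the Domain specific category.
gpt = category_tokens["OpenAI: GPT-5.4"]
domain_share = gpt["Domain specific"][1] / token_totals(gpt)[1]
print(f"Domain specific share of reasoning tokens: {domain_share:.0%}")
```

Both models' category sums match the summary figures, and about three quarters of GPT-5.4's reasoning tokens fall in the Domain specific category, which is also where its average response time is highest.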
