
AI BENCHY Compare

Anthropic: Claude Opus 4.8 vs OpenAI: GPT-5.4 Nano

Last updated at: 2026-05-28

Metric                 Claude Opus 4.8 (none)  GPT-5.4 Nano (medium)
Release                2026-05-28              2026-03-17
Score                  7.3                     7.2
Rank                   #63                     #69
Reliability            10.0                    10.0
Consistency            9.2                     8.8
Tests Correct          -                       -
Attempt pass rate      65.0%                   63.3%
Flaky tests            2                       3
Total Runs             60                      60
Cost per result        4.324                   0.900
Total Cost             $0.519                  $0.099
Input Price            $5.000 / 1M             $0.200 / 1M
Output Price           $25.000 / 1M            $1.250 / 1M
Output Tokens          8,098                   2,993
Reasoning Tokens       0                       70,928
Response Time (avg)    3.51s                   11.79s
Response Time (max)    17.73s                  94.06s
Response Time (total)  70.19s                  235.81s
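
The Total Cost row can be partly cross-checked against the listed output price and token counts. The sketch below is a rough estimate only: it assumes, which the page does not state, that reasoning tokens are billed at the output price and are not already included in the Output Tokens figure. Input token counts are not listed, so the input side of the cost is left out. The names `MODELS`, `output_side_cost`, and `summarize` are illustrative.

```python
# Figures copied from the comparison table above.
MODELS = {
    "Claude Opus 4.8": {
        "output_price_per_m": 25.000,  # USD per 1M output tokens
        "output_tokens": 8_098,
        "reasoning_tokens": 0,
        "total_cost": 0.519,           # USD
    },
    "GPT-5.4 Nano": {
        "output_price_per_m": 1.250,
        "output_tokens": 2_993,
        "reasoning_tokens": 70_928,
        "total_cost": 0.099,
    },
}


def output_side_cost(model):
    """Estimated USD spent on generated tokens.

    Assumption (not stated on the page): reasoning tokens are billed
    at the output price and are counted separately from output tokens.
    """
    tokens = model["output_tokens"] + model["reasoning_tokens"]
    return tokens * model["output_price_per_m"] / 1_000_000


def summarize(models):
    """Map each model name to its estimated output cost and share of total cost."""
    summary = {}
    for name, model in models.items():
        estimate = output_side_cost(model)
        summary[name] = {
            "output_cost": estimate,
            "share_of_total": estimate / model["total_cost"],
        }
    return summary


if __name__ == "__main__":
    for name, row in summarize(MODELS).items():
        print(f"{name}: ~${row['output_cost']:.3f} on generated tokens "
              f"({row['share_of_total']:.0%} of total cost)")
```

Under that assumption, generated tokens account for roughly $0.20 of Claude Opus 4.8's $0.519 and roughly $0.092 of GPT-5.4 Nano's $0.099. The remainder would be input tokens, which the page does not break out.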

Charts (not reproduced here): Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens.

Category Breakdown

Anti-AI Tricks               Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Claude Opus 4.8              6.5    10.0         50.0%              0            -              3.40s                1,472          0
GPT-5.4 Nano                 8.3    10.0         75.0%              0            -              4.52s                683            2,254

Coding                       Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Claude Opus 4.8              6.8    10.0         50.0%              0            -              3.59s                1,323          0
GPT-5.4 Nano                 6.8    6.2          66.7%              1            -              21.10s               495            15,186

Combined                     Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Claude Opus 4.8              9.5    10.0         100.0%             0            -              17.73s               3,259          0
GPT-5.4 Nano                 9.8    10.0         100.0%             0            -              24.13s               349            5,719

Data parsing and extraction  Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Claude Opus 4.8              7.3    5.8          83.3%              1            -              1.77s                308            0
GPT-5.4 Nano                 10.0   10.0         100.0%             0            -              2.54s                234            516

Domain specific              Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Claude Opus 4.8              5.3    7.2          44.4%              1            -              1.66s                61             0
GPT-5.4 Nano                 5.9    7.2          55.6%              1            -              38.18s               60             43,325

General Intelligence         Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Claude Opus 4.8              10.0   10.0         100.0%             0            -              3.48s                230            0
GPT-5.4 Nano                 4.5    10.0         0.0%               0            -              4.15s                179            443

Instructions following       Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Claude Opus 4.8              9.9    10.0         100.0%             0            -              1.37s                95             0
GPT-5.4 Nano                 9.8    10.0         100.0%             0            -              1.88s                95             521

Puzzle Solving               Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Claude Opus 4.8              7.7    10.0         66.7%              0            -              2.74s                783            0
GPT-5.4 Nano                 4.1    7.2          22.2%              1            -              3.79s                594            1,408

Tool Calling                 Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Claude Opus 4.8              10.0   10.0         100.0%             0            -              5.35s                355            0
GPT-5.4 Nano                 10.0   10.0         100.0%             0            -              7.71s                234            382

Trivia                       Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Claude Opus 4.8              3.0    10.0         0.0%               0            -              3.41s                212            0
GPT-5.4 Nano                 3.0    10.0         0.0%               0            -              4.81s                70             1,174
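
The overall scores are close (7.3 vs 7.2), but the per-category scores diverge much more. A minimal sketch for ranking those gaps, using only the Score column from the breakdown above, might look like this. The names `CATEGORY_SCORES`, `score_gaps`, and `leaders` are illustrative, and the page does not say how category scores are weighted into the overall score, so no attempt is made to reproduce it.

```python
# (category, Claude Opus 4.8 score, GPT-5.4 Nano score), copied from the breakdown above.
CATEGORY_SCORES = [
    ("Anti-AI Tricks", 6.5, 8.3),
    ("Coding", 6.8, 6.8),
    ("Combined", 9.5, 9.8),
    ("Data parsing and extraction", 7.3, 10.0),
    ("Domain specific", 5.3, 5.9),
    ("General Intelligence", 10.0, 4.5),
    ("Instructions following", 9.9, 9.8),
    ("Puzzle Solving", 7.7, 4.1),
    ("Tool Calling", 10.0, 10.0),
    ("Trivia", 3.0, 3.0),
]


def score_gaps(rows):
    """Return (category, claude_minus_nano) pairs, largest absolute gap first."""
    gaps = [(category, round(claude - nano, 1)) for category, claude, nano in rows]
    return sorted(gaps, key=lambda pair: abs(pair[1]), reverse=True)


def leaders(rows):
    """Count the categories each model leads, plus ties."""
    counts = {"Claude Opus 4.8": 0, "GPT-5.4 Nano": 0, "tie": 0}
    for _, claude, nano in rows:
        if claude > nano:
            counts["Claude Opus 4.8"] += 1
        elif nano > claude:
            counts["GPT-5.4 Nano"] += 1
        else:
            counts["tie"] += 1
    return counts


if __name__ == "__main__":
    for category, gap in score_gaps(CATEGORY_SCORES):
        print(f"{category}: {gap:+.1f}")
    print(leaders(CATEGORY_SCORES))
```

A positive gap means Claude Opus 4.8 scored higher in that category; a negative gap means GPT-5.4 Nano did.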
