AI BENCHY

AI BENCHY Compare

Anthropic: Claude Opus 4.8 vs OpenAI: GPT-5 Mini

Summary

Claude Opus 4.8 vs GPT-5 Mini benchmark comparison: GPT-5 Mini leads on average score, 8.5 vs 7.7, and has the lower total benchmark cost, $0.159 vs $1.270. Claude Opus 4.8 is faster on average response time, 10.83s vs 23.64s, and has the higher attempt pass rate, 79.4% vs 63.5%.

Recommended model: GPT-5 Mini. It has the higher score here (8.5) at about one-eighth the total cost of Claude Opus 4.8 ($0.159 vs $1.270).
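The cost and speed gaps quoted in the summary follow directly from the reported totals. The short sketch below recomputes them; the dictionary keys and the `ratio` helper are illustrative names, not part of the benchmark site.

```python
# Figures copied from the summary above.
claude = {"score": 7.7, "total_cost_usd": 1.270, "avg_time_s": 10.83}
gpt5_mini = {"score": 8.5, "total_cost_usd": 0.159, "avg_time_s": 23.64}


def ratio(a: float, b: float) -> float:
    """Return how many times larger a is than b."""
    return a / b


cost_ratio = ratio(claude["total_cost_usd"], gpt5_mini["total_cost_usd"])
time_ratio = ratio(gpt5_mini["avg_time_s"], claude["avg_time_s"])

print(f"Claude Opus 4.8 costs {cost_ratio:.1f}x as much as GPT-5 Mini")
print(f"GPT-5 Mini takes {time_ratio:.1f}x as long per response")
```

With the figures above, the cost ratio rounds to 8.0 and the response-time ratio to 2.2.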

Last updated at: 2026-07-02

| Metric | Claude Opus 4.8 (low) | GPT-5 Mini (medium) |
| --- | --- | --- |
| Release | 2026-05-28 | 2025-08-07 |
| Score | 7.7 | 8.5 |
| Rank | #38 | #16 |
| Reliability | 10.0 | 10.0 |
| Consistency | 8.8 | 9.1 |
| Tests Correct | | |
| Attempt pass rate | 79.4% | 63.5% |
| Flaky tests | 3 | 2 |
| Total Runs | 63 | 63 |
| Cost per result | 8.466 | 1.319 |
| Total Cost | $1.270 | $0.159 |
| Input Price | $5.000 / 1M | $0.250 / 1M |
| Output Price | $25.000 / 1M | $2.000 / 1M |
| Total Input Tokens | 60,946 | 37,100 |
| Output Tokens | 31,771 | 6,801 |
| Reasoning Tokens | 6,831 | 67,690 |
| Response Time (avg) | 10.83s | 23.64s |
| Response Time (max) | 127.97s | 88.15s |
| Response Time (total) | 227.39s | 496.44s |
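The Total Cost row can be roughly reconstructed from the token counts and per-token prices. The sketch below is an estimate, not the site's documented accounting: it assumes reasoning tokens are billed at the output rate and are not already included in the Output Tokens figure. The `Usage` class and `estimated_cost` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    input_tokens: int
    output_tokens: int
    reasoning_tokens: int
    input_price_per_m: float   # USD per 1M input tokens
    output_price_per_m: float  # USD per 1M output tokens


def estimated_cost(u: Usage) -> float:
    """Estimate total cost in USD from token counts and list prices."""
    # Assumption: reasoning tokens are billed at the output rate and are
    # counted separately from the "Output Tokens" figure.
    billed_output = u.output_tokens + u.reasoning_tokens
    return (u.input_tokens * u.input_price_per_m
            + billed_output * u.output_price_per_m) / 1_000_000


# Token counts and prices copied from the comparison table.
claude = Usage(60_946, 31_771, 6_831, 5.00, 25.00)
gpt5_mini = Usage(37_100, 6_801, 67_690, 0.25, 2.00)

for name, usage in [("Claude Opus 4.8", claude), ("GPT-5 Mini", gpt5_mini)]:
    print(f"{name}: ${estimated_cost(usage):.3f}")
```

Under that assumption the estimate is about $1.270 for Claude Opus 4.8, matching the listed total, and about $0.158 for GPT-5 Mini, roughly a tenth of a cent below the listed $0.159. The small gap suggests the benchmark's own accounting differs slightly in some way that is not stated on the page.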

Generation showcase

Hamster playing table tennis

Prompt: Create a detailed SVG illustration of a hamster playing table tennis.

#38 Claude Opus 4.8 (low): Cost $0.031, Time 14.1s, Tokens 1,345

#16 GPT-5 Mini (medium): Cost $0.007, Time 42.9s, Tokens 3,432

[Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens]

Category Breakdown

| Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Input Tokens | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Anti-AI Tricks | Claude Opus 4.8 | 10.0 | 10.0 | 100.0% | 0 | | 3.30s | 834 | 793 | 371 |
| Anti-AI Tricks | GPT-5 Mini | 7.1 | 7.6 | 66.7% | 1 | | 13.86s | 606 | 1,715 | 6,378 |
| Coding | Claude Opus 4.8 | 6.6 | 4.6 | 77.8% | 2 | | 7.58s | 10,590 | 3,637 | 809 |
| Coding | GPT-5 Mini | 10.0 | 10.0 | 100.0% | 0 | | 27.63s | 7,302 | 658 | 17,152 |
| Combined | Claude Opus 4.8 | 9.8 | 10.0 | 100.0% | 0 | | 20.84s | 23,500 | 2,216 | 1,081 |
| Combined | GPT-5 Mini | 10.0 | 10.0 | 100.0% | 0 | | 88.15s | 14,118 | 754 | 11,520 |
| Data parsing and extraction | Claude Opus 4.8 | 6.3 | 5.8 | 66.7% | 1 | | 2.27s | 10,503 | 310 | 0 |
| Data parsing and extraction | GPT-5 Mini | 10.0 | 10.0 | 100.0% | 0 | | 12.58s | 7,140 | 453 | 3,200 |
| Domain specific | Claude Opus 4.8 | 5.3 | 10.0 | 33.3% | 0 | | 45.53s | 975 | 23,311 | 3,908 |
| Domain specific | GPT-5 Mini | 3.6 | 7.2 | 22.2% | 1 | | 44.63s | 515 | 293 | 14,016 |
| General Intelligence | Claude Opus 4.8 | 10.0 | 10.0 | 100.0% | 0 | | 2.55s | 708 | 231 | 0 |
| General Intelligence | GPT-5 Mini | 4.5 | 10.0 | 0.0% | 0 | | 13.50s | 477 | 349 | 1,856 |
| Instructions following | Claude Opus 4.8 | 9.8 | 10.0 | 100.0% | 0 | | 2.78s | 909 | 111 | 221 |
| Instructions following | GPT-5 Mini | 10.0 | 10.0 | 100.0% | 0 | | 11.59s | 660 | 310 | 3,968 |
| Puzzle Solving | Claude Opus 4.8 | 10.0 | 10.0 | 100.0% | 0 | | 3.01s | 894 | 592 | 184 |
| Puzzle Solving | GPT-5 Mini | 5.6 | 9.8 | 33.3% | 0 | | 15.20s | 642 | 1,622 | 6,144 |
| Tool Calling | Claude Opus 4.8 | 10.0 | 10.0 | 100.0% | 0 | | 6.85s | 11,775 | 370 | 35 |
| Tool Calling | GPT-5 Mini | 10.0 | 10.0 | 100.0% | 0 | | 18.64s | 5,445 | 487 | 1,600 |
| Trivia | Claude Opus 4.8 | 3.0 | 10.0 | 0.0% | 0 | | 5.48s | 258 | 200 | 222 |
| Trivia | GPT-5 Mini | 3.0 | 10.0 | 0.0% | 0 | | 9.99s | 195 | 160 | 1,856 |
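To see how the two models split the categories, the per-category scores can be tallied by leader. This is a minimal sketch using only the Score column from the breakdown; the `leader` helper and the "Tie" label are illustrative choices, not something the benchmark site defines.

```python
from collections import Counter

# (category, Claude Opus 4.8 score, GPT-5 Mini score), copied from the breakdown.
scores = [
    ("Anti-AI Tricks", 10.0, 7.1),
    ("Coding", 6.6, 10.0),
    ("Combined", 9.8, 10.0),
    ("Data parsing and extraction", 6.3, 10.0),
    ("Domain specific", 5.3, 3.6),
    ("General Intelligence", 10.0, 4.5),
    ("Instructions following", 9.8, 10.0),
    ("Puzzle Solving", 10.0, 5.6),
    ("Tool Calling", 10.0, 10.0),
    ("Trivia", 3.0, 3.0),
]


def leader(claude_score: float, gpt_score: float) -> str:
    """Name the model with the higher category score, or 'Tie'."""
    if claude_score > gpt_score:
        return "Claude Opus 4.8"
    if gpt_score > claude_score:
        return "GPT-5 Mini"
    return "Tie"


tally = Counter(leader(c, g) for _, c, g in scores)

for category, c, g in scores:
    print(f"{category}: {leader(c, g)}")
print(dict(tally))
```

On these numbers each model leads four categories and two are tied, so the overall score gap is not explained by a simple count of category wins.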
