
AI BENCHY Compare

Anthropic: Claude Sonnet 4.6 vs OpenAI: GPT-5.4

Summary

Claude Sonnet 4.6 vs GPT-5.4 benchmark comparison: GPT-5.4 leads on average score (8.5 vs 7.3) and has the higher attempt pass rate (76.2% vs 55.6%). Claude Sonnet 4.6 has the lower total benchmark cost ($0.316 vs $1.210) and the shorter average response time (5.04s vs 22.35s).

Recommended model: Claude Sonnet 4.6. It offers the best overall trade-off: a lower but competitive score (7.3 vs 8.5) at roughly a quarter of GPT-5.4's total cost, with a much shorter average response time.

Last updated: 2026-06-12

| Metric | Claude Sonnet 4.6 (none) | GPT-5.4 (medium) |
| --- | --- | --- |
| Release | 2026-02-17 | 2026-03-05 |
| Score | 7.3 | 8.5 |
| Rank | #56 | #20 |
| Reliability | 10.0 | 10.0 |
| Consistency | 9.7 | 8.6 |
| Attempt pass rate | 55.6% | 76.2% |
| Flaky tests | 1 | 4 |
| Total Runs | 63 | 63 |
| Cost per result | 2.870 | 8.640 |
| Total Cost | $0.316 | $1.210 |
| Input Price | $3.000 / 1M tokens | $2.500 / 1M tokens |
| Output Price | $15.000 / 1M tokens | $15.000 / 1M tokens |
| Total Input Tokens | 57,886 | 34,108 |
| Output Tokens | 9,465 | 2,242 |
| Reasoning Tokens | 0 | 72,707 |
| Response Time (avg) | 5.04s | 22.35s |
| Response Time (max) | 23.84s | 100.41s |
| Response Time (total) | 70.60s | 469.29s |
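
The Total Cost row can be reproduced from the token counts and the per-million-token prices in the table above. The sketch below assumes that reasoning tokens are billed at the output price; the page does not state this, but the assumption reproduces both totals. The dictionary names and the `total_cost` helper are hypothetical, introduced only for illustration.

```python
# Prices in dollars per 1M tokens, copied from the comparison table.
PRICES = {
    "Claude Sonnet 4.6": {"input": 3.00, "output": 15.00},
    "GPT-5.4": {"input": 2.50, "output": 15.00},
}

# Token counts copied from the comparison table.
TOKENS = {
    "Claude Sonnet 4.6": {"input": 57_886, "output": 9_465, "reasoning": 0},
    "GPT-5.4": {"input": 34_108, "output": 2_242, "reasoning": 72_707},
}


def total_cost(model: str) -> float:
    """Dollar cost of a benchmark run.

    ASSUMPTION: reasoning tokens are billed at the output-token price.
    """
    price, tokens = PRICES[model], TOKENS[model]
    billed_output = tokens["output"] + tokens["reasoning"]
    dollars_times_million = tokens["input"] * price["input"] + billed_output * price["output"]
    return dollars_times_million / 1_000_000


for model in PRICES:
    print(f"{model}: ${total_cost(model):.3f}")
```

Under the same assumption, GPT-5.4's 72,707 reasoning tokens account for about $1.09 of its $1.210 total, which explains most of the cost gap despite its lower input price.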

Generation showcase

Hamster playing table tennis

Prompt: Create a detailed SVG illustration of a hamster playing table tennis.

#56 Claude Sonnet 4.6 (none): cost $0.038, time 27.3s, 2,598 tokens

#20 GPT-5.4 (medium): cost $0.214, time 199.6s, 14,349 tokens

Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens.

Category Breakdown

| Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Response Time (avg) | Input Tokens | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Anti-AI Tricks | Claude Sonnet 4.6 | 4.8 | 10.0 | 25.0% | 0 | 2.94s | 636 | 1,214 | 0 |
| Anti-AI Tricks | GPT-5.4 | 8.3 | 10.0 | 75.0% | 0 | 4.11s | 606 | 240 | 1,511 |
| Coding | Claude Sonnet 4.6 | 5.5 | 10.0 | 33.3% | 0 | 5.19s | 8,522 | 2,127 | 0 |
| Coding | GPT-5.4 | 8.8 | 7.8 | 88.9% | 1 | 44.36s | 7,305 | 433 | 24,216 |
| Combined | Claude Sonnet 4.6 | 9.5 | 10.0 | 100.0% | 0 | 23.84s | 26,024 | 3,766 | 0 |
| Combined | GPT-5.4 | 10.0 | 10.0 | 100.0% | 0 | 20.57s | 11,019 | 301 | 3,543 |
| Data parsing and extraction | Claude Sonnet 4.6 | 10.0 | 10.0 | 100.0% | 0 | 3.43s | 8,574 | 252 | 0 |
| Data parsing and extraction | GPT-5.4 | 10.0 | 10.0 | 100.0% | 0 | 5.32s | 7,140 | 234 | 804 |
| Domain specific | Claude Sonnet 4.6 | 7.7 | 10.0 | 66.7% | 0 | 3.54s | 759 | 413 | 0 |
| Domain specific | GPT-5.4 | 5.3 | 7.2 | 44.4% | 1 | 74.27s | 619 | 61 | 34,748 |
| General Intelligence | Claude Sonnet 4.6 | 6.1 | 3.1 | 66.7% | 1 | 2.56s | 513 | 192 | 0 |
| General Intelligence | GPT-5.4 | 4.7 | 3.1 | 33.3% | 1 | 4.92s | 477 | 145 | 321 |
| Instructions following | Claude Sonnet 4.6 | 6.5 | 10.0 | 50.0% | 0 | 1.96s | 690 | 90 | 0 |
| Instructions following | GPT-5.4 | 10.0 | 10.0 | 100.0% | 0 | 3.11s | 660 | 93 | 897 |
| Puzzle Solving | Claude Sonnet 4.6 | 7.7 | 10.0 | 66.7% | 0 | 2.53s | 663 | 533 | 0 |
| Puzzle Solving | GPT-5.4 | 8.2 | 7.2 | 88.9% | 1 | 9.14s | 642 | 441 | 3,815 |
| Tool Calling | Claude Sonnet 4.6 | 10.0 | 10.0 | 100.0% | 0 | 4.11s | 11,301 | 447 | 0 |
| Tool Calling | GPT-5.4 | 10.0 | 10.0 | 100.0% | 0 | 13.28s | 5,445 | 264 | 1,031 |
| Trivia | Claude Sonnet 4.6 | 3.0 | 10.0 | 0.0% | 0 | 4.67s | 204 | 431 | 0 |
| Trivia | GPT-5.4 | 3.0 | 10.0 | 0.0% | 0 | 13.95s | 195 | 30 | 1,821 |
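
The per-category token counts add up exactly to the totals in the main comparison table. A short Python check, with the values transcribed by hand from the Category Breakdown (the `CATEGORY_TOKENS` layout and the `token_totals` helper are hypothetical, not an export format offered by the site):

```python
# (input, output, reasoning) token counts per category, copied from the
# Category Breakdown table.
CATEGORY_TOKENS = {
    "Claude Sonnet 4.6": {
        "Anti-AI Tricks": (636, 1_214, 0),
        "Coding": (8_522, 2_127, 0),
        "Combined": (26_024, 3_766, 0),
        "Data parsing and extraction": (8_574, 252, 0),
        "Domain specific": (759, 413, 0),
        "General Intelligence": (513, 192, 0),
        "Instructions following": (690, 90, 0),
        "Puzzle Solving": (663, 533, 0),
        "Tool Calling": (11_301, 447, 0),
        "Trivia": (204, 431, 0),
    },
    "GPT-5.4": {
        "Anti-AI Tricks": (606, 240, 1_511),
        "Coding": (7_305, 433, 24_216),
        "Combined": (11_019, 301, 3_543),
        "Data parsing and extraction": (7_140, 234, 804),
        "Domain specific": (619, 61, 34_748),
        "General Intelligence": (477, 145, 321),
        "Instructions following": (660, 93, 897),
        "Puzzle Solving": (642, 441, 3_815),
        "Tool Calling": (5_445, 264, 1_031),
        "Trivia": (195, 30, 1_821),
    },
}


def token_totals(model: str) -> tuple:
    """Sum the input, output and reasoning columns across all categories."""
    rows = CATEGORY_TOKENS[model].values()
    return tuple(sum(column) for column in zip(*rows))


for model in CATEGORY_TOKENS:
    print(model, token_totals(model))
```

The same data shows where GPT-5.4's reasoning budget goes: Domain specific (34,748) and Coding (24,216) together account for 58,964 of its 72,707 reasoning tokens, about 81%.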
