
AI BENCHY Compare

Anthropic: Claude Sonnet 4.6 vs OpenAI: GPT-5.2 Chat

Summary

Claude Sonnet 4.6 vs GPT-5.2 Chat benchmark comparison: GPT-5.2 Chat leads on average score, 8.5 vs 7.8. It also has the lower total benchmark cost ($0.393 vs $1.418), the faster average response time (7.13s vs 17.06s), and the higher attempt pass rate (74.6% vs 65.1%).

Recommended model: GPT-5.2 Chat - It has the best score here (8.5), while costing about 3.6x less than Claude Sonnet 4.6.
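
The "about 3.6x" figure follows from the two total costs quoted in the summary. The short Python sketch below recomputes it, along with the matching speed ratio from the average response times. The variable names are illustrative and not part of the benchmark.

```python
# Totals quoted in the summary (USD and seconds).
claude_cost, gpt_cost = 1.418, 0.393
claude_avg_s, gpt_avg_s = 17.06, 7.13

# How many times cheaper and faster GPT-5.2 Chat was in this run.
cost_ratio = claude_cost / gpt_cost
speed_ratio = claude_avg_s / gpt_avg_s

print(f"cost ratio: {cost_ratio:.1f}x, speed ratio: {speed_ratio:.1f}x")
```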

Last updated at: 2026-07-02

| Metric | Claude Sonnet 4.6 (medium; release 2026-02-17) | GPT-5.2 Chat (none; release 2025-12-11) |
| --- | --- | --- |
| Score | 7.8 | 8.5 |
| Rank | #32 | #19 |
| Reliability | 10.0 | 10.0 |
| Consistency | 9.1 | 8.9 |
| Tests Correct | | |
| Attempt pass rate | 65.1% | 74.6% |
| Flaky tests | 2 | 3 |
| Total Runs | 63 | 63 |
| Cost per result | 10.904 | 2.803 |
| Total Cost | $1.418 | $0.393 |
| Input Price | $3.000 / 1M | $1.750 / 1M |
| Output Price | $15.000 / 1M | $14.000 / 1M |
| Total Input Tokens | 49,112 | 34,212 |
| Output Tokens | 54,703 | 23,744 |
| Reasoning Tokens | 29,970 | 0 |
| Response Time (avg) | 17.06s | 7.13s |
| Response Time (max) | 46.35s | 38.52s |
| Response Time (total) | 221.83s | 149.69s |
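
The page does not say how Total Cost is derived, but the token counts and prices in the table above roughly reproduce it. The sketch below is a sanity check under one assumption that the page does not confirm: reasoning tokens are billed at the output price and are not already counted in Output Tokens. The function name `estimated_cost` is hypothetical.

```python
def estimated_cost(input_tokens, output_tokens, reasoning_tokens,
                   input_price_per_m, output_price_per_m):
    """Estimate total cost in USD from token counts and per-1M-token prices.

    Assumption (not confirmed by the source): reasoning tokens are billed
    at the output rate and are not included in the output token count.
    """
    billed_output = output_tokens + reasoning_tokens
    return (input_tokens * input_price_per_m
            + billed_output * output_price_per_m) / 1_000_000

# Token counts and prices taken from the comparison table.
claude = estimated_cost(49_112, 54_703, 29_970, 3.00, 15.00)
gpt = estimated_cost(34_212, 23_744, 0, 1.75, 14.00)

print(f"Claude Sonnet 4.6: ${claude:.3f} (reported $1.418)")
print(f"GPT-5.2 Chat:      ${gpt:.3f} (reported $0.393)")
```

Under that assumption both estimates land within about a tenth of a cent of the reported totals, which suggests the reasoning tokens account for a sizable share of the Claude Sonnet 4.6 cost in this run.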

Generation showcase

Hamster playing table tennis

Prompt: Create a detailed SVG illustration of a hamster playing table tennis.

#32 Claude Sonnet 4.6 (medium): Invalid SVG. Cost: $0.000. Time: 300.0s. Tokens: 0 tok.

#19 GPT-5.2 Chat (none): Cost: $0.010. Time: 15.3s. Tokens: 797 tok.

[Charts not reproduced: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens]

Category Breakdown

| Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Input Tokens | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Anti-AI Tricks | Claude Sonnet 4.6 | 6.5 | 10.0 | 50.0% | 0 | | 2.98s | 789 | 1,046 | 1,093 |
| Anti-AI Tricks | GPT-5.2 Chat | 8.7 | 7.9 | 91.7% | 1 | | 3.40s | 606 | 1,807 | 0 |
| Coding | Claude Sonnet 4.6 | 5.7 | 6.6 | 44.4% | 1 | | 33.29s | 6,995 | 16,089 | 3,686 |
| Coding | GPT-5.2 Chat | 8.8 | 7.8 | 88.9% | 1 | | 9.82s | 7,305 | 6,731 | 0 |
| Combined | Claude Sonnet 4.6 | 10.0 | 10.0 | 100.0% | 0 | | 46.35s | 18,351 | 5,871 | 3,962 |
| Combined | GPT-5.2 Chat | 10.0 | 10.0 | 100.0% | 0 | | 9.12s | 11,019 | 1,243 | 0 |
| Data parsing and extraction | Claude Sonnet 4.6 | 10.0 | 10.0 | 100.0% | 0 | | 13.90s | 8,676 | 649 | 742 |
| Data parsing and extraction | GPT-5.2 Chat | 10.0 | 10.0 | 100.0% | 0 | | 3.05s | 7,140 | 980 | 0 |
| Domain specific | Claude Sonnet 4.6 | 2.9 | 7.2 | 11.1% | 1 | | 0ms | 471 | 25,790 | 16,919 |
| Domain specific | GPT-5.2 Chat | 5.3 | 10.0 | 33.3% | 0 | | 17.78s | 723 | 7,810 | 0 |
| General Intelligence | Claude Sonnet 4.6 | 10.0 | 10.0 | 100.0% | 0 | | 4.94s | 564 | 256 | 433 |
| General Intelligence | GPT-5.2 Chat | 4.4 | 3.0 | 33.3% | 1 | | 3.20s | 477 | 335 | 0 |
| Instructions following | Claude Sonnet 4.6 | 10.0 | 10.0 | 100.0% | 0 | | 2.61s | 792 | 318 | 552 |
| Instructions following | GPT-5.2 Chat | 9.8 | 10.0 | 100.0% | 0 | | 5.51s | 660 | 1,441 | 0 |
| Puzzle Solving | Claude Sonnet 4.6 | 10.0 | 10.0 | 100.0% | 0 | | 5.31s | 816 | 592 | 646 |
| Puzzle Solving | GPT-5.2 Chat | 7.7 | 10.0 | 66.7% | 0 | | 4.10s | 642 | 1,603 | 0 |
| Tool Calling | Claude Sonnet 4.6 | 10.0 | 10.0 | 100.0% | 0 | | 7.48s | 11,454 | 655 | 351 |
| Tool Calling | GPT-5.2 Chat | 10.0 | 10.0 | 100.0% | 0 | | 4.68s | 5,445 | 555 | 0 |
| Trivia | Claude Sonnet 4.6 | 3.0 | 10.0 | 0.0% | 0 | | 30.09s | 204 | 3,437 | 1,586 |
| Trivia | GPT-5.2 Chat | 3.0 | 10.0 | 0.0% | 0 | | 6.89s | 195 | 1,239 | 0 |
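
The overall scores hide how evenly the categories split. As a rough illustration, the Python sketch below tallies which model has the higher Score in each category, using the values from the breakdown above. The `tally` helper is hypothetical, and a simple win count is not how the page computes its overall score.

```python
# Category scores copied from the breakdown: (Claude Sonnet 4.6, GPT-5.2 Chat).
scores = {
    "Anti-AI Tricks": (6.5, 8.7),
    "Coding": (5.7, 8.8),
    "Combined": (10.0, 10.0),
    "Data parsing and extraction": (10.0, 10.0),
    "Domain specific": (2.9, 5.3),
    "General Intelligence": (10.0, 4.4),
    "Instructions following": (10.0, 9.8),
    "Puzzle Solving": (10.0, 7.7),
    "Tool Calling": (10.0, 10.0),
    "Trivia": (3.0, 3.0),
}

def tally(scores):
    """Group categories by which model has the higher score, or a tie."""
    result = {"Claude Sonnet 4.6": [], "GPT-5.2 Chat": [], "Tie": []}
    for category, (claude, gpt) in scores.items():
        if claude > gpt:
            result["Claude Sonnet 4.6"].append(category)
        elif gpt > claude:
            result["GPT-5.2 Chat"].append(category)
        else:
            result["Tie"].append(category)
    return result

wins = tally(scores)
for name, categories in wins.items():
    print(f"{name}: {len(categories)} ({', '.join(categories)})")
```

By this count each model leads in three categories and four are tied. GPT-5.2 Chat's leads are in Anti-AI Tricks, Coding, and Domain specific, while Claude Sonnet 4.6 leads in General Intelligence, Instructions following, and Puzzle Solving.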
