Claude Opus 4.7 vs GPT-5.5 benchmark comparison: GPT-5.5 leads on average score with 9.3 vs 7.4. Claude Opus 4.7 has the lower benchmark cost at $0.505 vs $0.907. Claude Opus 4.7 is faster, averaging 3.02s vs 9.76s per response, but has the lower attempt pass rate at 76.2% vs 85.7%.
Recommended model: Claude Opus 4.7 - It offers the best overall trade-off: a competitive score (7.4), lower cost than GPT-5.5, and balanced response time.
Last updated at: 2026-06-18
Metric: Claude Opus 4.7 vs GPT-5.5
Claude Opus 4.7 (none): Release: 2026-04-16. Archived model: this model is no longer updated or tested on new tests.
Score: 7.4 vs 9.3. Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.
Rank: #49 vs #4
Reliability: 10.0 vs 10.0. First-attempt success score: 10.0 means no retryable target API or rate-limit failures before successful calls; tracked failures lower the score.
Consistency: 9.0 vs 10.0. Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).
Tests Correct: 3 wrong answers vs 3 wrong answers. A test is fully passed only if every run passed for that test.
Attempt pass rate: 76.2% vs 85.7%. Attempt pass rate = passed attempts / total attempts across runs.
Flaky tests: 0 vs 0. Flaky tests had mixed outcomes across runs (at least one pass and one fail).
Total Runs: 57 vs 63
Cost per result: 3.154 vs 5.035. Shows the average cost per correct benchmark answer in cents (lower is better).
Total Cost (Current Price): $0.505 vs $0.907
Input Price: $5.000 / 1M vs $5.000 / 1M
Output Price: $25.000 / 1M vs $30.000 / 1M
Total Input Tokens: 69,576 vs 34,209
Output Tokens: 6,265 vs 2,046
Reasoning Tokens: 0 vs 22,460
Response Time (avg): 3.02s vs 9.76s
Response Time (max): 18.27s vs 56.19s
Response Time (total): 57.44s vs 204.92s
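The cost figures above can be roughly cross-checked against the token counts and per-token prices. The sketch below is a minimal illustration, not the benchmark's own accounting: the `Usage` class and `estimated_cost_usd` function are hypothetical names, and it assumes that reasoning tokens are billed at the output price, which the page does not state.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Token counts and prices as listed in the comparison table."""
    input_tokens: int
    output_tokens: int
    reasoning_tokens: int
    input_price_per_m: float   # USD per 1M input tokens
    output_price_per_m: float  # USD per 1M output tokens


def estimated_cost_usd(u: Usage) -> float:
    # Assumption (not stated on the page): reasoning tokens are billed
    # at the same rate as output tokens.
    billed_output = u.output_tokens + u.reasoning_tokens
    total = u.input_tokens * u.input_price_per_m + billed_output * u.output_price_per_m
    return total / 1_000_000


claude = Usage(69_576, 6_265, 0, 5.0, 25.0)
gpt = Usage(34_209, 2_046, 22_460, 5.0, 30.0)

# (name, usage, reported total cost in USD, reported cost per result in cents)
rows = [
    ("Claude Opus 4.7", claude, 0.505, 3.154),
    ("GPT-5.5", gpt, 0.907, 5.035),
]

for name, usage, reported_total, cents_per_result in rows:
    estimate = estimated_cost_usd(usage)
    # Inference only: total cost divided by cost per correct answer.
    implied_correct = reported_total * 100 / cents_per_result
    print(f"{name}: estimated ${estimate:.4f}, reported ${reported_total:.3f}, "
          f"implied correct answers ~{implied_correct:.1f}")
```

Under that assumption the estimates land within a tenth of a cent of the reported totals. Dividing each total cost by its cost per result suggests roughly 16 and 18 correct answers respectively; that count is an inference from the listed numbers, not a figure the page reports.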
Generation showcase
Hamster playing table tennis
Prompt: Create a detailed SVG illustration of a hamster playing table tennis.
#49 Claude Opus 4.7 (none): Cost $0.051; Time 24.2s; Tokens 2,181 tok
#4 GPT-5.5 (low): Cost $0.068; Time 37.0s; Tokens 2,339 tok
[Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens]
Category Breakdown
Anti-AI Tricks
Claude Opus 4.7: Score 8.3; Consistency 10.0; Attempt pass rate 75.0%; Flaky tests 0; Tests Correct: 1 wrong answer; Response Time avg 2.12s, max 3.75s, total 8.50s
GPT-5.5: Score 10.0; Consistency 10.0; Attempt pass rate 100.0%; Flaky tests 0; Tests Correct: no failed answers; Response Time avg 4.41s, max 6.32s, total 17.64s; Input Tokens 606; Output Tokens 238; Reasoning Tokens 1,020
Coding
Claude Opus 4.7: Score 3.3; Consistency 3.3; Attempt pass rate 33.3%; Flaky tests 0; Tests Correct: no failed answers; Response Time avg 2.84s, max 2.84s, total 2.84s
GPT-5.5: Score 10.0; Consistency 10.0; Attempt pass rate 100.0%; Flaky tests 0; Tests Correct: no failed answers; Response Time avg 15.04s, max 21.06s, total 45.11s; Input Tokens 7,302; Output Tokens 423; Reasoning Tokens 6,402
Combined
Claude Opus 4.7: Score 9.5; Consistency 10.0; Attempt pass rate 100.0%; Flaky tests 0; Tests Correct: no failed answers; Response Time avg 18.27s, max 18.27s, total 18.27s
GPT-5.5: Score 10.0; Consistency 10.0; Attempt pass rate 100.0%; Flaky tests 0; Tests Correct: no failed answers; Response Time avg 9.56s, max 9.56s, total 9.56s; Input Tokens 11,019; Output Tokens 303; Reasoning Tokens 717
Data parsing and extraction
Claude Opus 4.7: Score 10.0; Consistency 10.0; Attempt pass rate 100.0%; Flaky tests 0; Tests Correct: no failed answers; Response Time avg 2.15s, max 2.33s, total 4.29s
GPT-5.5: Score 10.0; Consistency 10.0; Attempt pass rate 100.0%; Flaky tests 0; Tests Correct: no failed answers; Response Time avg 3.28s, max 5.13s, total 6.56s; Input Tokens 7,140; Output Tokens 228; Reasoning Tokens 157
Domain specific
Claude Opus 4.7: Score 7.7; Consistency 10.0; Attempt pass rate 66.7%; Flaky tests 0; Tests Correct: 1 wrong answer; Response Time avg 1.19s, max 1.40s, total 3.58s
GPT-5.5: Score 5.3; Consistency 10.0; Attempt pass rate 33.3%; Flaky tests 0; Tests Correct: 2 wrong answers; Response Time avg 28.05s, max 56.19s, total 84.16s; Input Tokens 723; Output Tokens 69; Reasoning Tokens 11,609
General Intelligence
Claude Opus 4.7: Score 10.0; Consistency 10.0; Attempt pass rate 100.0%; Flaky tests 0; Tests Correct: no failed answers; Response Time avg 3.47s, max 3.47s, total 3.47s
GPT-5.5: Score 10.0; Consistency 10.0; Attempt pass rate 100.0%; Flaky tests 0; Tests Correct: no failed answers; Response Time avg 5.17s, max 5.17s, total 5.17s; Input Tokens 477; Output Tokens 133; Reasoning Tokens 245
Instructions following
Claude Opus 4.7: Score 10.0; Consistency 10.0; Attempt pass rate 100.0%; Flaky tests 0; Tests Correct: no failed answers; Response Time avg 1.46s, max 1.68s, total 2.91s
GPT-5.5: Score 9.9; Consistency 10.0; Attempt pass rate 100.0%; Flaky tests 0; Tests Correct: no failed answers; Response Time avg 3.74s, max 3.99s, total 7.48s; Input Tokens 660; Output Tokens 93; Reasoning Tokens 415
Puzzle Solving
Claude Opus 4.7: Score 10.0; Consistency 10.0; Attempt pass rate 100.0%; Flaky tests 0; Tests Correct: no failed answers; Response Time avg 2.46s, max 3.72s, total 7.38s
GPT-5.5: Score 10.0; Consistency 10.0; Attempt pass rate 100.0%; Flaky tests 0; Tests Correct: no failed answers; Response Time avg 4.74s, max 5.61s, total 14.21s; Input Tokens 642; Output Tokens 279; Reasoning Tokens 954
Tool Calling
Claude Opus 4.7: Score 10.0; Consistency 10.0; Attempt pass rate 100.0%; Flaky tests 0; Tests Correct: no failed answers; Response Time avg 4.74s, max 4.74s, total 4.74s
GPT-5.5: Score 10.0; Consistency 10.0; Attempt pass rate 100.0%; Flaky tests 0; Tests Correct: no failed answers; Response Time avg 4.96s, max 4.96s, total 4.96s; Input Tokens 5,445; Output Tokens 250; Reasoning Tokens 101
Trivia
Claude Opus 4.7: Score 3.0; Consistency 10.0; Attempt pass rate 0.0%; Flaky tests 0; Tests Correct: 1 wrong answer; Response Time avg 1.46s, max 1.46s, total 1.46s
GPT-5.5: Score 3.0; Consistency 10.0; Attempt pass rate 0.0%; Flaky tests 0; Tests Correct: 1 wrong answer; Response Time avg 10.06s, max 10.06s, total 10.06s
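The run-level definitions used throughout these tables (attempt pass rate, flaky tests, fully passed tests) can be sketched in a few lines. This is a minimal illustration assuming each test is stored as a list of pass/fail outcomes; the function names and the `example_runs` data are hypothetical and do not come from the benchmark itself.

```python
from typing import Dict, List


def attempt_pass_rate(runs: Dict[str, List[bool]]) -> float:
    """Passed attempts divided by total attempts across all runs."""
    attempts = [ok for outcomes in runs.values() for ok in outcomes]
    return sum(attempts) / len(attempts)


def flaky_tests(runs: Dict[str, List[bool]]) -> List[str]:
    """Tests with mixed outcomes: at least one pass and at least one fail."""
    return [name for name, outcomes in runs.items()
            if any(outcomes) and not all(outcomes)]


def fully_passed(runs: Dict[str, List[bool]]) -> List[str]:
    """Tests where every run passed."""
    return [name for name, outcomes in runs.items()
            if outcomes and all(outcomes)]


# Hypothetical outcomes for three tests, three runs each.
example_runs = {
    "test_a": [True, True, True],     # fully passed
    "test_b": [False, False, False],  # consistently wrong, so not flaky
    "test_c": [True, False, True],    # flaky
}

print(f"attempt pass rate: {attempt_pass_rate(example_runs):.1%}")
print("flaky:", flaky_tests(example_runs))
print("fully passed:", fully_passed(example_runs))
```

In this made-up data, `test_b` fails every run, so it lowers the pass rate without counting as flaky. That mirrors how a category such as Trivia can show a 0.0% attempt pass rate alongside a consistency score of 10.0 and zero flaky tests.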