Anthropic: Claude Opus 4.7 vs Google: Gemini 3 Flash Preview
Summary
Claude Opus 4.7 vs Gemini 3 Flash Preview benchmark comparison: the two models tie on average score at 7.4. Gemini 3 Flash Preview has the lower benchmark cost at $0.111 vs $0.505 and the higher attempt pass rate at 79.4% vs 76.2%. Claude Opus 4.7 is faster on average at 3.02s vs 5.76s per response.
Recommended model: Gemini 3 Flash Preview. Its score matches the best score here (7.4 vs 7.4), while it costs about 4.6x less than Claude Opus 4.7.
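As a rough check on the "about 4.6x" figure, the ratio can be computed from the two cost metrics reported on this page. This is an illustrative sketch only: the variable and function names are made up here, and which metric the site uses for its multiplier is an assumption.

```python
# Figures copied from this page (Claude Opus 4.7 vs Gemini 3 Flash Preview).
total_cost_usd = {"claude": 0.505, "gemini": 0.111}
cost_per_result_cents = {"claude": 3.154, "gemini": 0.689}


def ratio(values):
    """Return how many times more expensive Claude is than Gemini for one metric."""
    return values["claude"] / values["gemini"]


total_cost_ratio = ratio(total_cost_usd)
per_result_ratio = ratio(cost_per_result_cents)

print(f"total cost ratio: {total_cost_ratio:.2f}x")
print(f"cost per result ratio: {per_result_ratio:.2f}x")
```

Both ratios land between 4.5x and 4.6x, so the recommendation's multiplier is consistent with either metric once rounding is taken into account.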
Last updated at: 2026-06-18
Metric | Claude Opus 4.7 | Gemini 3 Flash Preview
Score | 7.4 | 7.4
Rank | #49 | #52
Reliability | 10.0 | 10.0
Consistency | 9.0 | 9.2
Tests Correct | 3 wrong answers | 5 wrong answers
Attempt pass rate | 76.2% | 79.4%
Flaky tests | 0 | 2
Total Runs | 57 | 63
Cost per result (cents) | 3.154 | 0.689
Total Cost (Current Price) | $0.505 | $0.111
Input Price | $5.000 / 1M | $0.500 / 1M
Output Price | $25.000 / 1M | $3.000 / 1M
Total Input Tokens | 69,576 | 36,769
Output Tokens | 6,265 | 2,076
Reasoning Tokens | 0 | 28,518
Response Time (avg) | 3.02s | 5.76s
Response Time (max) | 18.27s | 14.72s
Response Time (total) | 57.44s | 120.93s
Claude Opus 4.7 (none) is an archived model: it is no longer updated or tested on new tests. Release: 2026-04-16
Metric definitions
Score: summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.
Reliability: first-attempt success score. 10.0 means no retryable target API or rate-limit failures before successful calls; tracked failures lower the score.
Consistency: reflects run-to-run stability (10 = very consistent, even if consistently wrong).
Tests Correct: a test is fully passed only if every run passed for that test.
Attempt pass rate: passed attempts / total attempts across runs.
Flaky tests: tests that had mixed outcomes across runs (at least one pass and one fail).
Cost per result: the average cost per correct benchmark answer in cents (lower is better).
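The Total Cost row can be approximately reconstructed from the token counts and per-1M-token prices listed above. The sketch below is a hedged estimate, not the site's own formula: the function name `run_cost_usd` is hypothetical, and it assumes reasoning tokens are billed at the output price.

```python
def run_cost_usd(input_tokens, output_tokens, reasoning_tokens,
                 input_price_per_m, output_price_per_m):
    """Estimate total cost in USD from token counts and per-1M-token prices.

    Assumption (not stated on the page): reasoning tokens are billed
    at the same rate as output tokens.
    """
    billed_output = output_tokens + reasoning_tokens
    total = input_tokens * input_price_per_m + billed_output * output_price_per_m
    return total / 1_000_000


# Token counts and prices copied from the table above.
claude_cost = run_cost_usd(69_576, 6_265, 0, 5.0, 25.0)
gemini_cost = run_cost_usd(36_769, 2_076, 28_518, 0.5, 3.0)

print(f"Claude Opus 4.7: ${claude_cost:.3f}")
print(f"Gemini 3 Flash Preview: ${gemini_cost:.3f}")
```

Under this assumption the estimate reproduces the listed $0.505 for Claude Opus 4.7 and comes out near $0.110 for Gemini 3 Flash Preview, slightly below the listed $0.111, so the site's exact billing rule may differ in small ways.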
Generation showcase
Hamster playing table tennis
Prompt: Create a detailed SVG illustration of a hamster playing table tennis.
#49 Claude Opus 4.7 (none): Cost $0.051; Time 24.2s; Tokens 2,181 tok
#52 Gemini 3 Flash Preview (low): Cost $0.007; Time 12.1s; Tokens 2,289 tok
[Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens]
Category Breakdown
Anti-AI Tricks
Claude Opus 4.7: Score 8.3; Consistency 10.0; Attempt pass rate 75.0%; Flaky tests 0; Tests Correct: 1 wrong answer; Response Time 2.12s avg, 3.75s max, 8.50s total
Gemini 3 Flash Preview: Score 10.0; Consistency 10.0; Attempt pass rate 100.0%; Flaky tests 0; Tests Correct: no failed answers; Response Time 3.48s avg, 4.31s max, 13.94s total; Input Tokens 500; Output Tokens 281; Reasoning Tokens 3,082
Coding
Claude Opus 4.7: Score 3.3; Consistency 3.3; Attempt pass rate 33.3%; Flaky tests 0; Tests Correct: no failed answers; Response Time 2.84s avg, 2.84s max, 2.84s total
Gemini 3 Flash Preview: Score 5.8; Consistency 7.2; Attempt pass rate 44.4%; Flaky tests 1; Tests Correct: 2 wrong answers; Response Time 6.00s avg, 6.94s max, 18.00s total; Input Tokens 8,122; Output Tokens 456; Reasoning Tokens 7,421
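The page defines attempt pass rate as passed attempts / total attempts across runs, a fully passed test as one where every run passed, and a flaky test as one with at least one pass and one fail. A minimal sketch of how those three figures relate is shown below. The per-run outcomes and test names are hypothetical (the page does not publish raw results); they were chosen so that the output has the same shape as a 44.4% pass rate with one flaky test.

```python
def summarize(runs_by_test):
    """Summarize per-test run outcomes (True = pass) using the page's definitions."""
    attempts = [ok for runs in runs_by_test.values() for ok in runs]
    # Fully passed: every run of the test passed.
    fully_passed = [t for t, runs in runs_by_test.items() if all(runs)]
    # Flaky: mixed outcomes, at least one pass and at least one fail.
    flaky = [t for t, runs in runs_by_test.items() if any(runs) and not all(runs)]
    return {
        "attempt_pass_rate": sum(attempts) / len(attempts),
        "fully_passed": fully_passed,
        "flaky": flaky,
    }


# Hypothetical outcomes: three tests, three runs each.
example = {
    "test_a": [True, True, True],     # fully passed
    "test_b": [True, False, False],   # flaky
    "test_c": [False, False, False],  # consistently wrong
}

summary = summarize(example)
print(f"attempt pass rate: {summary['attempt_pass_rate']:.1%}")
print(f"fully passed: {summary['fully_passed']}, flaky: {summary['flaky']}")
```

Note that a flaky test still counts as not fully passed, which is why the attempt pass rate (4 of 9 attempts here) can sit above the share of tests counted as correct (1 of 3).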
Combined
Claude Opus 4.7: Score 9.5; Consistency 10.0; Attempt pass rate 100.0%; Flaky tests 0; Tests Correct: no failed answers; Response Time 18.27s avg, 18.27s max, 18.27s total
Gemini 3 Flash Preview: Score 3.0; Consistency 10.0; Attempt pass rate 0.0%; Flaky tests 0; Tests Correct: 1 wrong answer; Response Time 3.27s avg, 3.27s max, 3.27s total; Input Tokens 12,860; Output Tokens 326; Reasoning Tokens 0
Data parsing and extraction
Claude Opus 4.7: Score 10.0; Consistency 10.0; Attempt pass rate 100.0%; Flaky tests 0; Tests Correct: no failed answers; Response Time 2.15s avg, 2.33s max, 4.29s total
Gemini 3 Flash Preview: Score 10.0; Consistency 10.0; Attempt pass rate 100.0%; Flaky tests 0; Tests Correct: no failed answers; Response Time 9.40s avg, 14.72s max, 18.80s total; Input Tokens 7,261; Output Tokens 279; Reasoning Tokens 3,656
Domain specific
Claude Opus 4.7: Score 7.7; Consistency 10.0; Attempt pass rate 66.7%; Flaky tests 0; Tests Correct: 1 wrong answer; Response Time 1.19s avg, 1.40s max, 3.58s total
Gemini 3 Flash Preview: Score 5.3; Consistency 7.2; Attempt pass rate 44.4%; Flaky tests 1; Tests Correct: 2 wrong answers; Response Time 8.05s avg, 14.40s max, 24.15s total; Input Tokens 645; Output Tokens 12; Reasoning Tokens 6,410
General Intelligence
Claude Opus 4.7: Score 10.0; Consistency 10.0; Attempt pass rate 100.0%; Flaky tests 0; Tests Correct: no failed answers; Response Time 3.47s avg, 3.47s max, 3.47s total
Gemini 3 Flash Preview: Score 10.0; Consistency 10.0; Attempt pass rate 100.0%; Flaky tests 0; Tests Correct: no failed answers; Response Time 3.68s avg, 3.68s max, 3.68s total; Input Tokens 492; Output Tokens 120; Reasoning Tokens 981
Instructions following
Claude Opus 4.7: Score 10.0; Consistency 10.0; Attempt pass rate 100.0%; Flaky tests 0; Tests Correct: no failed answers; Response Time 1.46s avg, 1.68s max, 2.91s total
Gemini 3 Flash Preview: Score 9.9; Consistency 10.0; Attempt pass rate 100.0%; Flaky tests 0; Tests Correct: no failed answers; Response Time 7.02s avg, 7.35s max, 14.03s total; Input Tokens 621; Output Tokens 71; Reasoning Tokens 2,752
Puzzle Solving
Claude Opus 4.7: Score 10.0; Consistency 10.0; Attempt pass rate 100.0%; Flaky tests 0; Tests Correct: no failed answers; Response Time 2.46s avg, 3.72s max, 7.38s total
Gemini 3 Flash Preview: Score 10.0; Consistency 10.0; Attempt pass rate 100.0%; Flaky tests 0; Tests Correct: no failed answers; Response Time 5.77s avg, 10.27s max, 17.32s total; Input Tokens 562; Output Tokens 288; Reasoning Tokens 3,168
Tool Calling
Claude Opus 4.7: Score 10.0; Consistency 10.0; Attempt pass rate 100.0%; Flaky tests 0; Tests Correct: no failed answers; Response Time 4.74s avg, 4.74s max, 4.74s total
Gemini 3 Flash Preview: Score 10.0; Consistency 10.0; Attempt pass rate 100.0%; Flaky tests 0; Tests Correct: no failed answers; Response Time 4.99s avg, 4.99s max, 4.99s total; Input Tokens 5,550; Output Tokens 234; Reasoning Tokens 415
Trivia
Claude Opus 4.7: Score 3.0; Consistency 10.0; Attempt pass rate 0.0%; Flaky tests 0; Tests Correct: 1 wrong answer; Response Time 1.46s avg, 1.46s max, 1.46s total
Gemini 3 Flash Preview: Score 10.0; Consistency 10.0; Attempt pass rate 100.0%; Flaky tests 0; Tests Correct: no failed answers; Response Time 2.75s avg, 2.75s max, 2.75s total