AI BENCHY Compare

Anthropic: Claude Opus 4.6 vs Google: Gemini 2.5 Flash

Last updated at: 2026-04-07

Metric                 Claude Opus 4.6 (medium)   Gemini 2.5 Flash (medium)
Release                2026-02-05                 2025-06-17
Score                  7.5                        8.1
Rank                   #35                        #17
Consistency            9.0                        9.5
Tests Correct          -                          -
Attempt pass rate      68.6%                      74.5%
Flaky tests            2                          1
Total Runs             51                         51
Cost per result        11.973                     2.430
Total Cost             $1.317                     $0.292
Input Price            $5.000 / 1M                $0.300 / 1M
Output Price           $25.000 / 1M               $2.500 / 1M
Output Tokens          26,343                     1,376
Reasoning Tokens       17,434                     111,923
Response Time (avg)    20.87s                     11.88s
Response Time (max)    83.40s                     95.48s
Response Time (total)  208.73s                    201.89s
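
The token counts and output prices above allow a rough check of how much of each model's total cost comes from generated tokens. The following is a minimal sketch, not the benchmark's own cost formula: it assumes reasoning tokens are billed at the output price, and it ignores input tokens because the table does not report them. The names `MODELS` and `estimated_output_cost` are illustrative.

```python
# Figures copied from the summary table above.
MODELS = {
    "Claude Opus 4.6": {
        "output_price_per_m": 25.0,   # USD per 1M output tokens
        "output_tokens": 26_343,
        "reasoning_tokens": 17_434,
        "total_cost": 1.317,          # USD, as reported
    },
    "Gemini 2.5 Flash": {
        "output_price_per_m": 2.5,
        "output_tokens": 1_376,
        "reasoning_tokens": 111_923,
        "total_cost": 0.292,
    },
}


def estimated_output_cost(model: dict) -> float:
    """Estimate the cost of generated tokens in USD.

    Assumption (not stated in the source): reasoning tokens are
    billed at the same rate as regular output tokens.
    """
    billed_tokens = model["output_tokens"] + model["reasoning_tokens"]
    return billed_tokens * model["output_price_per_m"] / 1_000_000


for name, model in MODELS.items():
    estimate = estimated_output_cost(model)
    print(f"{name}: about ${estimate:.3f} of ${model['total_cost']:.3f} reported")
```

Under that assumption, the estimate stays below the reported total for both models, which would leave the remainder to input tokens. Gemini 2.5 Flash uses far more reasoning tokens, but its lower output price keeps its total cost lower.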

[Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens]

Category Breakdown

Anti-AI Tricks
Model             Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Claude Opus 4.6   6.4    5.8          66.7%              2            -              7.45s                986            1,071
Gemini 2.5 Flash  8.4    10.0         75.0%              0            -              6.30s                255            10,233

Combined
Model             Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Claude Opus 4.6   10.0   10.0         100.0%             0            -              76.66s               8,178          5,194
Gemini 2.5 Flash  10.0   10.0         100.0%             0            -              28.44s               303            11,922

Data parsing and extraction
Model             Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Claude Opus 4.6   10.0   10.0         100.0%             0            -              7.37s                691            757
Gemini 2.5 Flash  10.0   10.0         100.0%             0            -              4.06s                279            2,325

Domain specific
Model             Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Claude Opus 4.6   3.0    10.0         0.0%               0            -              83.40s               14,642         8,687
Gemini 2.5 Flash  5.9    7.2          55.6%              1            -              37.34s               18             80,702

General Intelligence
Model             Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Claude Opus 4.6   10.0   10.0         100.0%             0            -              5.04s                188            292
Gemini 2.5 Flash  4.8    10.0         0.0%               0            -              4.86s                92             1,899

Instructions following
Model             Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Claude Opus 4.6   10.0   10.0         100.0%             0            -              2.43s                266            467
Gemini 2.5 Flash  9.8    10.0         100.0%             0            -              2.62s                69             1,203

Puzzle Solving
Model             Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Claude Opus 4.6   7.7    10.0         66.7%              0            -              4.60s                531            637
Gemini 2.5 Flash  7.7    10.0         66.7%              0            -              3.94s                126            2,499

Tool Calling
Model             Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Claude Opus 4.6   10.0   10.0         100.0%             0            -              9.73s                861            329
Gemini 2.5 Flash  10.0   10.0         100.0%             0            -              6.20s                234            1,140
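
As a consistency check, the per-category token counts above can be summed and compared with the totals in the summary table. This is a small sketch with the figures copied by hand from the rows above, listed in the order the categories appear. The names `CATEGORY_TOKENS`, `REPORTED_TOTALS`, and `category_totals` are illustrative and not part of the benchmark.

```python
# Per-category token counts, in the order the categories appear above:
# Anti-AI Tricks, Combined, Data parsing and extraction, Domain specific,
# General Intelligence, Instructions following, Puzzle Solving, Tool Calling.
CATEGORY_TOKENS = {
    "Claude Opus 4.6": {
        "output": [986, 8_178, 691, 14_642, 188, 266, 531, 861],
        "reasoning": [1_071, 5_194, 757, 8_687, 292, 467, 637, 329],
    },
    "Gemini 2.5 Flash": {
        "output": [255, 303, 279, 18, 92, 69, 126, 234],
        "reasoning": [10_233, 11_922, 2_325, 80_702, 1_899, 1_203, 2_499, 1_140],
    },
}

# Totals as reported in the summary table.
REPORTED_TOTALS = {
    "Claude Opus 4.6": {"output": 26_343, "reasoning": 17_434},
    "Gemini 2.5 Flash": {"output": 1_376, "reasoning": 111_923},
}


def category_totals(model_name: str) -> dict:
    """Sum the per-category token counts for one model."""
    return {kind: sum(counts) for kind, counts in CATEGORY_TOKENS[model_name].items()}


for name in CATEGORY_TOKENS:
    totals = category_totals(name)
    matches = totals == REPORTED_TOTALS[name]
    print(f"{name}: {totals} (matches summary: {matches})")
```

For both models the category sums equal the reported totals. Most of Gemini 2.5 Flash's reasoning tokens come from the Domain specific category (80,702 of 111,923).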
