AI BENCHY

AI BENCHY Compare

Anthropic: Claude Opus 4.6 vs Google: Gemini 3.5 Flash

Summary

Claude Opus 4.6 vs Gemini 3.5 Flash benchmark comparison: Claude Opus 4.6 leads on overall score, 7.7 vs 7.0. Gemini 3.5 Flash has the lower total benchmark cost ($1.079 vs $2.053), the faster average response time (9.93s vs 25.89s), and the higher attempt pass rate (77.8% vs 61.9%).

Recommended model: Gemini 3.5 Flash. Its score is close to the best score in this comparison (7.0 vs 7.7), and its total benchmark cost is about 1.9x lower than that of Claude Opus 4.6 ($1.079 vs $2.053).
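The ratios behind the summary and recommendation can be reproduced from the table figures. Below is a minimal Python sketch, not the site's own code. The dictionary layout and the `compare` helper are hypothetical, and the values are copied from this page.

```python
# Headline figures copied from the comparison table on this page.
models = {
    "Claude Opus 4.6": {"score": 7.7, "total_cost": 2.053, "avg_time_s": 25.89},
    "Gemini 3.5 Flash": {"score": 7.0, "total_cost": 1.079, "avg_time_s": 9.93},
}


def compare(a: dict, b: dict) -> dict:
    """Compare model b against model a.

    cost_ratio > 1 means b is cheaper, speed_ratio > 1 means b is faster,
    and score_share is b's score as a fraction of a's score.
    """
    return {
        "score_gap": round(a["score"] - b["score"], 2),
        "cost_ratio": round(a["total_cost"] / b["total_cost"], 2),
        "speed_ratio": round(a["avg_time_s"] / b["avg_time_s"], 2),
        "score_share": round(b["score"] / a["score"], 3),
    }


result = compare(models["Claude Opus 4.6"], models["Gemini 3.5 Flash"])
print(result)
```

With these inputs, `compare` returns a score gap of 0.7, a cost ratio of 1.9, a speed ratio of 2.61, and a score share of 0.909. In other words, Gemini 3.5 Flash keeps about 91% of Claude Opus 4.6's score at roughly half the total cost.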

Last updated: 2026-06-18

Metric                | Claude Opus 4.6 (medium) | Gemini 3.5 Flash (none)
Release date          | 2026-02-05               | 2026-05-19
Score                 | 7.7                      | 7.0
Rank                  | #38                      | #66
Reliability           | 10.0                     | 10.0
Consistency           | 8.8                      | 8.9
Tests Correct         |                          |
Attempt pass rate     | 61.9%                    | 77.8%
Flaky tests           | 3                        | 3
Total runs            | 63                       | 63
Cost per result       | 17.103                   | 7.190
Total cost            | $2.053                   | $1.079
Input price           | $5.000 per 1M tokens     | $1.500 per 1M tokens
Output price          | $25.000 per 1M tokens    | $9.000 per 1M tokens
Total input tokens    | 53,227                   | 13,843
Total output tokens   | 47,446                   | 117,518
Reasoning tokens      | 24,000                   | 0
Response time (avg)   | 25.89s                   | 9.93s
Response time (max)   | 83.40s                   | 64.36s
Response time (total) | 362.49s                  | 178.68s
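The page does not say how Total Cost is derived. One assumption reproduces both totals to within about $0.001: reasoning tokens are billed at the output price, on top of the listed output tokens. The page does not state this. The sketch below applies that assumption to the token counts and prices in the table. The `Usage` class and the `estimated_cost` function are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    input_tokens: int
    output_tokens: int
    reasoning_tokens: int
    input_price_per_m: float   # USD per 1M input tokens
    output_price_per_m: float  # USD per 1M output tokens


def estimated_cost(u: Usage) -> float:
    """Estimate total USD cost.

    ASSUMPTION: reasoning tokens are billed at the output-token price in
    addition to the reported output tokens (not stated on the page).
    """
    billed_output = u.output_tokens + u.reasoning_tokens
    return (u.input_tokens * u.input_price_per_m
            + billed_output * u.output_price_per_m) / 1_000_000


claude = Usage(53_227, 47_446, 24_000, 5.0, 25.0)
gemini = Usage(13_843, 117_518, 0, 1.5, 9.0)

for name, usage, reported in [("Claude Opus 4.6", claude, 2.053),
                              ("Gemini 3.5 Flash", gemini, 1.079)]:
    est = estimated_cost(usage)
    print(f"{name}: estimated ${est:.4f} vs reported ${reported:.3f}")
```

The sketch prints estimates of $2.0523 and $1.0784. Without the reasoning tokens, the Claude Opus 4.6 estimate would fall to about $1.45, well below the reported $2.053, so the sketch bills them. The same rates fit the showcase entry: if all 25,004 tokens are billed at the $9.000 output price, the cost is about $0.225, which matches the listed figure.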

Generation showcase

Hamster playing table tennis

Prompt: Create a detailed SVG illustration of a hamster playing table tennis.

#38 Claude Opus 4.6 (medium): Invalid SVG | Cost $0.000 | Time 300.0s | Tokens 0

#66 Gemini 3.5 Flash (none): Cost $0.225 | Time 125.5s | Tokens 25,004
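The page marks Claude Opus 4.6's showcase output as "Invalid SVG" (0 tokens, 300.0s) but does not describe the check it uses. The sketch below shows one plausible check. It is an assumption, not the benchmark's actual validator: pull the SVG out of an optional fenced block, then require well-formed XML whose root element is `<svg>`. `extract_svg` and `is_valid_svg` are hypothetical helpers built on Python's standard `re` and `xml.etree.ElementTree` modules.

```python
import re
import xml.etree.ElementTree as ET

# Matches a fenced block opened by three backticks plus an optional svg/xml tag
# (the backticks are written as `{3} in the pattern).
FENCED_BLOCK = re.compile(r"`{3}(?:svg|xml)?[ \t]*\n(.*?)`{3}", re.DOTALL)
# Root tag with and without the standard SVG namespace.
SVG_TAGS = {"svg", "{http://www.w3.org/2000/svg}svg"}


def extract_svg(response: str) -> str:
    """Return the body of the first fenced svg/xml block, or the whole response."""
    match = FENCED_BLOCK.search(response)
    return match.group(1) if match else response


def is_valid_svg(text: str) -> bool:
    """True if text is well-formed XML whose root element is <svg>."""
    text = text.strip()
    if not text:
        return False  # an empty response (0 tokens) cannot be an SVG
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False  # truncated or malformed markup
    return root.tag in SVG_TAGS


fence = "`" * 3
reply = ("Here it is:\n" + fence + "svg\n"
         "<svg xmlns='http://www.w3.org/2000/svg'><circle r='5'/></svg>\n" + fence)
print(is_valid_svg(extract_svg(reply)))
print(is_valid_svg(""))
```

This checks well-formedness only. It does not validate against the SVG schema, and it cannot tell whether the drawing actually depicts a hamster playing table tennis.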

Charts (not reproduced here): Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens.

Category Breakdown

Model            | Score | Consistency | Attempt pass rate | Flaky tests | Response time (avg) | Input tokens | Output tokens | Reasoning tokens

Anti-AI Tricks
Claude Opus 4.6  | 6.4  | 5.8  | 66.7%  | 2 | 7.45s  | 840    | 986    | 1,071
Gemini 3.5 Flash | 10.0 | 10.0 | 100.0% | 0 | 2.53s  | 492    | 5,101  | 0

Coding
Claude Opus 4.6  | 5.7  | 7.1  | 44.4%  | 1 | 30.10s | 8,522  | 13,057 | 4,121
Gemini 3.5 Flash | 8.8  | 7.8  | 88.9%  | 1 | 34.69s | 8,122  | 75,927 | 0

Combined
Claude Opus 4.6  | 10.0 | 10.0 | 100.0% | 0 | 76.66s | 20,685 | 8,178  | 5,194
Gemini 3.5 Flash | 3.0  | 10.0 | 0.0%   | 0 | 0.00s  | 0      | 0      | 0

Data parsing and extraction
Claude Opus 4.6  | 10.0 | 10.0 | 100.0% | 0 | 7.37s  | 8,676  | 691    | 757
Gemini 3.5 Flash | 6.5  | 10.0 | 50.0%  | 0 | 8.10s  | 2,781  | 5,895  | 0

Domain specific
Claude Opus 4.6  | 3.0  | 10.0 | 0.0%   | 0 | 83.40s | 674    | 14,642 | 8,687
Gemini 3.5 Flash | 7.6  | 7.2  | 77.8%  | 1 | 10.64s | 633    | 17,910 | 0

General Intelligence
Claude Opus 4.6  | 10.0 | 10.0 | 100.0% | 0 | 5.04s  | 564    | 188    | 292
Gemini 3.5 Flash | 10.0 | 10.0 | 100.0% | 0 | 3.46s  | 486    | 1,620  | 0

Instructions following
Claude Opus 4.6  | 10.0 | 10.0 | 100.0% | 0 | 2.43s  | 792    | 266    | 467
Gemini 3.5 Flash | 9.8  | 10.0 | 100.0% | 0 | 3.38s  | 615    | 3,928  | 0

Puzzle Solving
Claude Opus 4.6  | 7.7  | 10.0 | 66.7%  | 0 | 4.71s  | 816    | 532    | 630
Gemini 3.5 Flash | 10.0 | 10.0 | 100.0% | 0 | 3.13s  | 558    | 4,640  | 0

Tool Calling
Claude Opus 4.6  | 10.0 | 10.0 | 100.0% | 0 | 9.73s  | 11,454 | 861    | 329
Gemini 3.5 Flash | 3.0  | 10.0 | 0.0%   | 0 | 0.00s  | 0      | 0      | 0

Trivia
Claude Opus 4.6  | 3.0  | 10.0 | 0.0%   | 0 | 63.24s | 204    | 8,045  | 2,452
Gemini 3.5 Flash | 2.8  | 1.6  | 33.3%  | 1 | 4.87s  | 156    | 2,497  | 0
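The category rows agree with the headline table. Summing each model's per-category input, output and reasoning tokens gives exactly the totals listed above: 53,227 / 47,446 / 24,000 for Claude Opus 4.6 and 13,843 / 117,518 / 0 for Gemini 3.5 Flash. The sketch below checks those sums and counts per-category wins. The unweighted mean of the ten category scores is 7.58 vs 7.15, which does not match the reported overall scores of 7.7 vs 7.0. The overall score therefore presumably uses a weighting the page does not describe. Variable and function names here are illustrative only.

```python
# (score, input_tokens, output_tokens, reasoning_tokens) per category,
# copied from the Category Breakdown table.
CLAUDE = {
    "Anti-AI Tricks": (6.4, 840, 986, 1_071),
    "Coding": (5.7, 8_522, 13_057, 4_121),
    "Combined": (10.0, 20_685, 8_178, 5_194),
    "Data parsing and extraction": (10.0, 8_676, 691, 757),
    "Domain specific": (3.0, 674, 14_642, 8_687),
    "General Intelligence": (10.0, 564, 188, 292),
    "Instructions following": (10.0, 792, 266, 467),
    "Puzzle Solving": (7.7, 816, 532, 630),
    "Tool Calling": (10.0, 11_454, 861, 329),
    "Trivia": (3.0, 204, 8_045, 2_452),
}
GEMINI = {
    "Anti-AI Tricks": (10.0, 492, 5_101, 0),
    "Coding": (8.8, 8_122, 75_927, 0),
    "Combined": (3.0, 0, 0, 0),
    "Data parsing and extraction": (6.5, 2_781, 5_895, 0),
    "Domain specific": (7.6, 633, 17_910, 0),
    "General Intelligence": (10.0, 486, 1_620, 0),
    "Instructions following": (9.8, 615, 3_928, 0),
    "Puzzle Solving": (10.0, 558, 4_640, 0),
    "Tool Calling": (3.0, 0, 0, 0),
    "Trivia": (2.8, 156, 2_497, 0),
}


def token_totals(rows: dict) -> tuple:
    """Sum input, output and reasoning tokens across all categories."""
    return tuple(sum(r[i] for r in rows.values()) for i in (1, 2, 3))


def head_to_head(a: dict, b: dict) -> tuple:
    """Count categories where a scores higher, b scores higher, or they tie."""
    wins_a = sum(a[c][0] > b[c][0] for c in a)
    wins_b = sum(a[c][0] < b[c][0] for c in a)
    return wins_a, wins_b, len(a) - wins_a - wins_b


def unweighted_mean(rows: dict) -> float:
    """Plain average of the category scores (no per-category weights)."""
    return sum(r[0] for r in rows.values()) / len(rows)


print(token_totals(CLAUDE))          # (53227, 47446, 24000)
print(token_totals(GEMINI))          # (13843, 117518, 0)
print(head_to_head(CLAUDE, GEMINI))  # (5, 4, 1)
print(round(unweighted_mean(CLAUDE), 2), round(unweighted_mean(GEMINI), 2))
```

By category score, Claude Opus 4.6 is ahead in 5 categories and Gemini 3.5 Flash in 4, and the two tie in General Intelligence.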
