AI BENCHY Compare

Anthropic: Claude Opus 4.6 vs Google: Gemini 3.1 Pro Preview

Summary

Claude Opus 4.6 vs Gemini 3.1 Pro Preview benchmark comparison: Gemini 3.1 Pro Preview leads on average score with 9.2 vs 7.7. It also has the lower benchmark cost ($1.054 vs $2.053), the faster average response time (20.14s vs 25.89s), and the higher attempt pass rate (90.5% vs 61.9%).

Recommended model: Gemini 3.1 Pro Preview. It has the higher score here (9.2 vs 7.7), and its benchmark run cost about half as much as Claude Opus 4.6's ($1.054 vs $2.053, a ratio of about 1.9x).
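The headline ratios in the summary can be recomputed from the figures it quotes. The Python sketch below does only that arithmetic; the dictionary keys are illustrative names chosen here, not fields exposed by the site.

```python
# Headline figures copied from the summary (Claude Opus 4.6 vs Gemini 3.1 Pro Preview).
# The key names are hypothetical labels for this sketch.
claude = {"score": 7.7, "cost_usd": 2.053, "avg_time_s": 25.89}
gemini = {"score": 9.2, "cost_usd": 1.054, "avg_time_s": 20.14}

# How many times more the Claude benchmark run cost than the Gemini run.
cost_ratio = claude["cost_usd"] / gemini["cost_usd"]

# How many times longer Claude's average response took.
latency_ratio = claude["avg_time_s"] / gemini["avg_time_s"]

# Gap between the two scores, in points.
score_gap = gemini["score"] - claude["score"]

print(f"cost ratio:    {cost_ratio:.2f}x")
print(f"latency ratio: {latency_ratio:.2f}x")
print(f"score gap:     {score_gap:.1f} points")
```

The cost ratio comes out near 1.95, which the recommendation rounds to about 1.9x; the average-latency ratio is about 1.29x.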

Last updated at: 2026-06-18

Metric	Claude Opus 4.6 (medium, released 2026-02-05)	Gemini 3.1 Pro Preview (medium, released 2026-02-19)
Score	7.7	9.2
Rank	#38	#7
Reliability	10.0	10.0
Consistency	8.8	10.0
Tests Correct
Attempt pass rate	61.9%	90.5%
Flaky tests	3	0
Total Runs	63	63
Cost per result	17.103	5.546
Total Cost	$2.053	$1.054
Input Price	$5.000 / 1M	$2.000 / 1M
Output Price	$25.000 / 1M	$12.000 / 1M
Total Input Tokens	53,227	41,617
Total Output Tokens	47,446	1,977
Total Reasoning Tokens	24,000	78,896
Response Time (avg)	25.89s	20.14s
Response Time (max)	83.40s	88.68s
Response Time (total)	362.49s	281.92s
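The Total Cost row is consistent with a simple pricing model: input tokens at the input price, plus output and reasoning tokens together at the output price. That billing rule is an assumption here (the page does not state it), but the sketch below reproduces both listed totals to within a tenth of a cent. It also converts the rounded attempt pass rates back into passing-run counts out of 63 runs, which is likewise an inference rather than a figure the page reports.

```python
# Per-model totals copied from the metrics table.
# Assumption (not stated on the page): reasoning tokens are billed at the output price.
MODELS = {
    "Claude Opus 4.6": {
        "input_tokens": 53_227, "output_tokens": 47_446, "reasoning_tokens": 24_000,
        "input_price_per_m": 5.0, "output_price_per_m": 25.0,
        "listed_total_cost": 2.053, "pass_rate": 0.619, "runs": 63,
    },
    "Gemini 3.1 Pro Preview": {
        "input_tokens": 41_617, "output_tokens": 1_977, "reasoning_tokens": 78_896,
        "input_price_per_m": 2.0, "output_price_per_m": 12.0,
        "listed_total_cost": 1.054, "pass_rate": 0.905, "runs": 63,
    },
}


def estimated_cost(m: dict) -> float:
    """Input tokens at the input price plus (output + reasoning) tokens at the output price."""
    billed_output = m["output_tokens"] + m["reasoning_tokens"]
    return (m["input_tokens"] * m["input_price_per_m"]
            + billed_output * m["output_price_per_m"]) / 1_000_000


def passed_runs(m: dict) -> int:
    """Passing runs implied by the rounded attempt pass rate (an inference, not a listed value)."""
    return round(m["pass_rate"] * m["runs"])


for name, m in MODELS.items():
    print(f"{name}: estimated ${estimated_cost(m):.4f} "
          f"(listed ${m['listed_total_cost']:.3f}), "
          f"{passed_runs(m)}/{m['runs']} runs passed")
```

Under this assumption most of Claude Opus 4.6's cost comes from output and reasoning tokens (about $1.79 of roughly $2.05), and nearly all of Gemini 3.1 Pro Preview's billed output is reasoning (78,896 of 80,873 tokens).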

Generation showcase

Hamster playing table tennis

Prompt: Create a detailed SVG illustration of a hamster playing table tennis.

#38 Claude Opus 4.6 (medium)

Invalid SVG

Cost: $0.000
Time: 300.0s
Tokens: 0 tok

#7 Gemini 3.1 Pro Preview (medium)

Cost: $0.115
Time: 87.2s
Tokens: 9,629 tok

Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens

Category Breakdown

Anti-AI Tricks	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Claude Opus 4.6	6.4	5.8	66.7%	2		7.45s	840	986	1,071
Gemini 3.1 Pro Preview	10.0	10.0	100.0%	0		7.90s	498	112	3,218

Coding	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Claude Opus 4.6	5.7	7.1	44.4%	1		30.10s	8,522	13,057	4,121
Gemini 3.1 Pro Preview	7.9	9.9	66.7%	0		40.17s	8,124	435	41,247

Combined	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Claude Opus 4.6	10.0	10.0	100.0%	0		76.66s	20,685	8,178	5,194
Gemini 3.1 Pro Preview	9.5	10.0	100.0%	0		40.61s	17,240	432	9,281

Data parsing and extraction	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Claude Opus 4.6	10.0	10.0	100.0%	0		7.37s	8,676	691	757
Gemini 3.1 Pro Preview	10.0	10.0	100.0%	0		7.72s	7,265	279	3,904

Domain specific	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Claude Opus 4.6	3.0	10.0	0.0%	0		83.40s	674	14,642	8,687
Gemini 3.1 Pro Preview	7.7	10.0	66.7%	0		32.73s	635	18	12,424

General Intelligence	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Claude Opus 4.6	10.0	10.0	100.0%	0		5.04s	564	188	292
Gemini 3.1 Pro Preview	10.0	10.0	100.0%	0		11.77s	490	108	1,179

Instructions following	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Claude Opus 4.6	10.0	10.0	100.0%	0		2.43s	792	266	467
Gemini 3.1 Pro Preview	10.0	10.0	100.0%	0		9.56s	621	72	2,236

Puzzle Solving	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Claude Opus 4.6	7.7	10.0	66.7%	0		4.71s	816	532	630
Gemini 3.1 Pro Preview	10.0	10.0	100.0%	0		6.90s	570	235	3,128

Tool Calling	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Claude Opus 4.6	10.0	10.0	100.0%	0		9.73s	11,454	861	329
Gemini 3.1 Pro Preview	10.0	10.0	100.0%	0		23.15s	6,018	274	982

Trivia	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
Claude Opus 4.6	3.0	10.0	0.0%	0		63.24s	204	8,045	2,452
Gemini 3.1 Pro Preview	10.0	10.0	100.0%	0		6.27s	156	12	1,297
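The per-category token and flaky-test columns add up to the model-level totals in the main metrics table. The Python check below, with values transcribed by hand from the tables, makes that explicit. Score and response time are left out because the page does not say how those columns are aggregated across categories.

```python
# Columns: (input tokens, output tokens, reasoning tokens, flaky tests),
# copied from the Category Breakdown tables.
CATEGORY_ROWS = {
    "Claude Opus 4.6": {
        "Anti-AI Tricks": (840, 986, 1_071, 2),
        "Coding": (8_522, 13_057, 4_121, 1),
        "Combined": (20_685, 8_178, 5_194, 0),
        "Data parsing and extraction": (8_676, 691, 757, 0),
        "Domain specific": (674, 14_642, 8_687, 0),
        "General Intelligence": (564, 188, 292, 0),
        "Instructions following": (792, 266, 467, 0),
        "Puzzle Solving": (816, 532, 630, 0),
        "Tool Calling": (11_454, 861, 329, 0),
        "Trivia": (204, 8_045, 2_452, 0),
    },
    "Gemini 3.1 Pro Preview": {
        "Anti-AI Tricks": (498, 112, 3_218, 0),
        "Coding": (8_124, 435, 41_247, 0),
        "Combined": (17_240, 432, 9_281, 0),
        "Data parsing and extraction": (7_265, 279, 3_904, 0),
        "Domain specific": (635, 18, 12_424, 0),
        "General Intelligence": (490, 108, 1_179, 0),
        "Instructions following": (621, 72, 2_236, 0),
        "Puzzle Solving": (570, 235, 3_128, 0),
        "Tool Calling": (6_018, 274, 982, 0),
        "Trivia": (156, 12, 1_297, 0),
    },
}

# Model-level totals from the main metrics table, in the same column order.
LISTED_TOTALS = {
    "Claude Opus 4.6": (53_227, 47_446, 24_000, 3),
    "Gemini 3.1 Pro Preview": (41_617, 1_977, 78_896, 0),
}


def column_sums(rows: dict[str, tuple[int, ...]]) -> tuple[int, ...]:
    """Sum each column across all category rows."""
    return tuple(sum(column) for column in zip(*rows.values()))


for model, rows in CATEGORY_ROWS.items():
    sums = column_sums(rows)
    status = "matches" if sums == LISTED_TOTALS[model] else "differs from"
    print(f"{model}: category sums {sums} {status} listed totals {LISTED_TOTALS[model]}")
```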
