Anthropic: Claude Opus 4.8 vs Google: Gemini 3.1 Flash Lite Preview

The average scores are tied at 7.3 each. Gemini 3.1 Flash Lite Preview (medium) costs far less to benchmark ($0.115 vs $1.166) and is slightly faster on average (4.61s vs 4.91s). Claude Opus 4.8 has the higher per-attempt pass rate (63.6% vs 59.1%).

Recommended model: Gemini 3.1 Flash Lite Preview (medium). It ties for the top score in this comparison (7.3) and is about 10.2 times cheaper than Claude Opus 4.8.

Benchmarks generated from the AI BENCHY test suite on 2026-07-17.

Metric	Claude Opus 4.8 (none), released 2026-05-28	Gemini 3.1 Flash Lite Preview (medium), released 2026-03-03
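Score	7.3	7.3
Rank	#63	#61
Reliability	10.0	10.0
Consistency	9.2	9.9
Correct tests
Pass rate per attempt	63.6%	59.1%
Flaky tests	2	0
Total runs	66	66
Cost per result	8.969	0.884
Total cost	$1.166	$0.115
Input price	$5.000 / 1M tokens	$0.250 / 1M tokens
Output price	$25.000 / 1M tokens	$1.500 / 1M tokens
Total input tokens	149,206	117,480
Output tokens	16,797	10,589
Reasoning tokens	0	46,394
Response time (avg)	4.91s	4.61s
Response time (max)	35.03s	18.34s
Response time (total)	108.03s	101.39s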
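The total-cost row can be reproduced from the token counts and list prices in the table. The sketch below assumes that reasoning tokens are billed at the output-token price. The page does not state this, but under that assumption both totals, and the roughly 10.2× cost ratio quoted above, come out matching the table.

```python
# Reconstruct the "Total cost" row from the token counts and list prices above.
# Assumption (not stated on the page): reasoning tokens are billed at the
# output-token price. With that assumption the Gemini total matches the table.

CLAUDE = "Claude Opus 4.8"
GEMINI = "Gemini 3.1 Flash Lite Preview"

# USD per 1M tokens: (input price, output price)
PRICES_PER_M = {
    CLAUDE: (5.000, 25.000),
    GEMINI: (0.250, 1.500),
}

# (total input tokens, output tokens, reasoning tokens), copied from the table
TOKENS = {
    CLAUDE: (149_206, 16_797, 0),
    GEMINI: (117_480, 10_589, 46_394),
}


def total_cost(model: str) -> float:
    """Return the estimated total benchmark cost in USD for one model."""
    price_in, price_out = PRICES_PER_M[model]
    tok_in, tok_out, tok_reason = TOKENS[model]
    billed_out = tok_out + tok_reason  # reasoning billed as output (assumption)
    return (tok_in * price_in + billed_out * price_out) / 1_000_000


if __name__ == "__main__":
    costs = {model: total_cost(model) for model in TOKENS}
    for model, cost in costs.items():
        print(f"{model}: ${cost:.3f}")
    print(f"Cost ratio: {costs[CLAUDE] / costs[GEMINI]:.1f}x")
```

Running it prints $1.166 for Claude Opus 4.8 and $0.115 for Gemini 3.1 Flash Lite Preview, a ratio of about 10.2x. If reasoning tokens were excluded, the Gemini total would drop to about $0.045 and no longer match the table, which is why the assumption is used here.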

Hamster playing table tennis

Prompt: Create a detailed SVG illustration of a hamster playing table tennis.

#63 Claude Opus 4.8 (none)

Cost: $0.053
Time: 22.0s
Tokens: 2,253

#61 Gemini 3.1 Flash Lite Preview (medium)

Cost: $0.003
Time: 5.2s
Tokens: 1,944

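Charts (not reproduced): top models by score; score vs. total cost; average response time; score vs. average response time; total output tokens; score vs. total output tokens.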

Category breakdown

Anti-AI tricks	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Input tokens	Output tokens	Reasoning tokens
Claude Opus 4.8	6.5	10.0	50.0%	0		3.40s	834	1,472	0
Gemini 3.1 Flash Lite Preview	9.1	10.0	75.0%	0		2.33s	512	570	4,305

Coding	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Input tokens	Output tokens	Reasoning tokens
Claude Opus 4.8	5.5	10.0	33.3%	0		3.29s	10,590	1,332	0
Gemini 3.1 Flash Lite Preview	5.5	10.0	33.3%	0		4.09s	8,126	461	8,597

Composite	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Input tokens	Output tokens	Reasoning tokens
Claude Opus 4.8	9.8	10.0	100.0%	0		26.38s	111,760	11,949	0
Gemini 3.1 Flash Lite Preview	7.2	9.1	50.0%	0		16.63s	93,097	8,706	16,997

Data parsing and extraction	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Input tokens	Output tokens	Reasoning tokens
Claude Opus 4.8	7.3	5.8	83.3%	1		1.77s	10,503	308	0
Gemini 3.1 Flash Lite Preview	10.0	10.0	100.0%	0		2.29s	7,362	279	2,952

Domain-specific	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Input tokens	Output tokens	Reasoning tokens
Claude Opus 4.8	5.3	7.2	44.4%	1		1.70s	975	61	0
Gemini 3.1 Flash Lite Preview	3.0	10.0	0.0%	0		4.21s	639	18	5,325

General intelligence	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Input tokens	Output tokens	Reasoning tokens
Claude Opus 4.8	10.0	10.0	100.0%	0		3.48s	708	230	0
Gemini 3.1 Flash Lite Preview	10.0	10.0	100.0%	0		3.16s	488	96	1,488

Instruction following	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Input tokens	Output tokens	Reasoning tokens
Claude Opus 4.8	9.9	10.0	100.0%	0		1.37s	909	95	0
Gemini 3.1 Flash Lite Preview	10.0	10.0	100.0%	0		1.91s	621	72	2,121

Puzzle solving	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Input tokens	Output tokens	Reasoning tokens
Claude Opus 4.8	7.7	10.0	66.7%	0		2.74s	894	783	0
Gemini 3.1 Flash Lite Preview	7.7	10.0	66.7%	0		5.30s	566	141	1,896

Tool calling	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Input tokens	Output tokens	Reasoning tokens
Claude Opus 4.8	10.0	10.0	100.0%	0		5.35s	11,775	355	0
Gemini 3.1 Flash Lite Preview	10.0	10.0	100.0%	0		3.80s	5,909	234	912

Trivia	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Input tokens	Output tokens	Reasoning tokens
Claude Opus 4.8	3.0	10.0	0.0%	0		3.41s	258	212	0
Gemini 3.1 Flash Lite Preview	3.0	10.0	0.0%	0		2.68s	160	12	1,801
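The tied overall score hides a split across categories. This minimal sketch tallies which model scores higher in each category table above. The scores are copied by hand, equal scores count as ties, and the sketch does not attempt to reproduce how the overall 7.3 is computed, which the page does not specify.

```python
# Tally which model scores higher in each category of the breakdown above.
# Scores are copied by hand from the category tables; equal scores count as ties.

CLAUDE = "Claude Opus 4.8"
GEMINI = "Gemini 3.1 Flash Lite Preview"

# category -> (Claude Opus 4.8 score, Gemini 3.1 Flash Lite Preview score)
CATEGORY_SCORES = {
    "Anti-AI tricks": (6.5, 9.1),
    "Coding": (5.5, 5.5),
    "Composite": (9.8, 7.2),
    "Data parsing and extraction": (7.3, 10.0),
    "Domain-specific": (5.3, 3.0),
    "General intelligence": (10.0, 10.0),
    "Instruction following": (9.9, 10.0),
    "Puzzle solving": (7.7, 7.7),
    "Tool calling": (10.0, 10.0),
    "Trivia": (3.0, 3.0),
}


def tally(scores: dict[str, tuple[float, float]]) -> dict[str, list[str]]:
    """Group category names by which model has the higher score, or 'tie'."""
    wins: dict[str, list[str]] = {CLAUDE: [], GEMINI: [], "tie": []}
    for category, (claude, gemini) in scores.items():
        if claude > gemini:
            wins[CLAUDE].append(category)
        elif gemini > claude:
            wins[GEMINI].append(category)
        else:
            wins["tie"].append(category)
    return wins


if __name__ == "__main__":
    for outcome, categories in tally(CATEGORY_SCORES).items():
        print(f"{outcome} ({len(categories)}): {', '.join(categories)}")
```

With the scores as listed, Claude Opus 4.8 leads in two categories (Composite, Domain-specific), Gemini 3.1 Flash Lite Preview leads in three (Anti-AI tricks, Data parsing and extraction, Instruction following), and the remaining five are tied.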
