AI BENCHY Compare

Google: Gemini 3.1 Pro Preview vs Grok 4.20 Beta

Benchmarks generated from AI BENCHY test suites on: 2026-04-26

Metric	Gemini 3.1 Pro Preview (medium, release date: 2026-02-19)	Grok 4.20 Beta (none, release date: 2026-03-12)
Score	9.6	5.3
Rank	#2	#93
Reliability	n/a	n/a
Consistency	10.0	9.2
Correct tests
Pass rate per attempt	94.4%	29.6%
Flaky tests	0	2
Total runs	54	52
Cost per result	3.400	2.255
Total cost	$0.578	$0.091
Input price	$2.000 / 1M	$0.000 / 1M
Output price	$12.000 / 1M	$0.000 / 1M
Output tokens	1,932	1,591
Reasoning tokens	40,542	0
Response time (avg)	15.96s	1.19s
Response time (max)	40.61s	6.48s
Response time (total)	175.52s	21.37s
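
The summary rows are easier to compare as ratios. The following is a minimal Python sketch that assumes the score, total cost, and average response time values from the table above. The `summary` dictionary and the `ratio` helper are hypothetical names used for illustration, and the derived ratios are not metrics reported by AI BENCHY.

```python
# Values copied from the summary table; the ratios computed below are
# illustrative only and are not part of the AI BENCHY output.
summary = {
    "Gemini 3.1 Pro Preview": {
        "score": 9.6,
        "total_cost_usd": 0.578,
        "avg_response_s": 15.96,
    },
    "Grok 4.20 Beta": {
        "score": 5.3,
        "total_cost_usd": 0.091,
        "avg_response_s": 1.19,
    },
}


def ratio(metric: str, a: str, b: str) -> float:
    """Return how many times larger `metric` is for model `a` than for model `b`."""
    return summary[a][metric] / summary[b][metric]


gemini, grok = "Gemini 3.1 Pro Preview", "Grok 4.20 Beta"
for metric in ("score", "total_cost_usd", "avg_response_s"):
    # Print each ratio rounded to two decimals, e.g. "score: 1.81x".
    print(f"{metric}: {ratio(metric, gemini, grok):.2f}x")
```

Under these assumptions, Gemini 3.1 Pro Preview scores roughly 1.8 times higher, while its run cost about 6.4 times as much in total and took about 13.4 times longer per response on average.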

Charts (not reproduced here): Top models by score; Score vs total cost; Response time (avg); Score vs response time (avg); Total output tokens; Score vs total output tokens.

Category breakdown

Anti-AI tricks	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Output tokens	Reasoning tokens
Gemini 3.1 Pro Preview	10.0	10.0	100.0%	0		7.90s	112	3,218
Grok 4.20 Beta	4.0	8.4	16.7%	1		597ms	251	0

Programming	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Output tokens	Reasoning tokens
Gemini 3.1 Pro Preview	10.0	10.0	100.0%	0		19.88s	405	4,201
Grok 4.20 Beta	5.5	10.0	0.0%	0		1.14s	74	0

Combined	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Output tokens	Reasoning tokens
Gemini 3.1 Pro Preview	9.5	10.0	100.0%	0		40.61s	432	9,281
Grok 4.20 Beta	3.0	10.0	0.0%	0		6.48s	282	0

Data parsing and extraction	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Output tokens	Reasoning tokens
Gemini 3.1 Pro Preview	10.0	10.0	100.0%	0		7.72s	279	3,904
Grok 4.20 Beta	10.0	10.0	100.0%	0		601ms	197	0

Domain-specific	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Output tokens	Reasoning tokens
Gemini 3.1 Pro Preview	7.7	10.0	66.7%	0		32.73s	18	12,424
Grok 4.20 Beta	3.0	10.0	0.0%	0		611ms	160	0

General intelligence	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Output tokens	Reasoning tokens
Gemini 3.1 Pro Preview	10.0	10.0	100.0%	0		11.77s	108	1,179
Grok 4.20 Beta	5.0	10.0	0.0%	0		541ms	87	0

Instruction following	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Output tokens	Reasoning tokens
Gemini 3.1 Pro Preview	10.0	10.0	100.0%	0		9.56s	72	2,236
Grok 4.20 Beta	4.8	10.0	0.0%	0		687ms	60	0

Puzzle solving	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Output tokens	Reasoning tokens
Gemini 3.1 Pro Preview	10.0	10.0	100.0%	0		7.15s	232	3,117
Grok 4.20 Beta	5.9	7.2	55.6%	1		541ms	291	0

Tool calling	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Output tokens	Reasoning tokens
Gemini 3.1 Pro Preview	10.0	10.0	100.0%	0		23.15s	274	982
Grok 4.20 Beta	10.0	10.0	100.0%	0		4.79s	189	0
