AI BENCHY Compare

Inception: Mercury 2 vs xAI: Grok 4.20

Summary

Benchmark comparison of Mercury 2 vs Grok 4.20: Mercury 2 leads in average score, 4.6 vs 4.4. Mercury 2 has the lower benchmark cost, $0.011 vs $0.057. Mercury 2 is faster on average, 653ms vs 1.11s. Pass rates are 23.8% vs 28.6%, so Grok 4.20 passes a higher share of attempts.

Recommended model: Mercury 2 - It has the best score here (4.6) and costs about 5.5x less than Grok 4.20.

Benchmarks generated from AI BENCHY test suites on: 2026-06-18

Metric	Mercury 2 (none, release date: 2026-02-24)	Grok 4.20 (none, release date: 2026-03-31)
Score	4.6	4.4
Rank	#151	#155
Reliability	10.0	n/a
Consistency	9.2	8.5
Correct tests
Pass rate per attempt	23.8%	28.6%
Flaky tests	2	0
Total runs	63	54
Cost per result	0.259	1.570
Total cost	$0.011	$0.057
Input price	$0.250 / 1M	$1.250 / 1M
Output price	$0.750 / 1M	$2.500 / 1M
Total input tokens	28,113	41,313
Output tokens	4,439	1,923
Reasoning tokens	0	0
Response time (avg)	653ms	1.11s
Response time (max)	1.43s	6.04s
Response time (total)	13.72s	19.96s
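
As a rough cross-check of the cost figures, the sketch below recomputes each model's cost from the token counts and per-1M-token prices in the table. The formula (tokens times list price, nothing else) is an assumption, not the site's documented method. Under that assumption the estimates come out near $0.0104 and $0.0564, slightly below the reported totals of $0.011 and $0.057, with a ratio of about 5.45x, which is consistent with the "about 5.5x" in the summary.

```python
# Token counts and prices copied from the metrics table.
# Prices are in USD per 1M tokens.
MODELS = {
    "Mercury 2": {"input_tokens": 28_113, "output_tokens": 4_439,
                  "input_price": 0.250, "output_price": 0.750},
    "Grok 4.20": {"input_tokens": 41_313, "output_tokens": 1_923,
                  "input_price": 1.250, "output_price": 2.500},
}


def estimated_cost(model):
    """Assumed formula: token count times per-1M list price, summed."""
    return (model["input_tokens"] * model["input_price"]
            + model["output_tokens"] * model["output_price"]) / 1_000_000


costs = {name: estimated_cost(m) for name, m in MODELS.items()}
ratio = costs["Grok 4.20"] / costs["Mercury 2"]

for name, cost in costs.items():
    print(f"{name}: ${cost:.4f}")
print(f"Grok 4.20 / Mercury 2 cost ratio: {ratio:.2f}x")
```

The small gap between these estimates and the reported totals suggests the site rounds differently or counts something this sketch does not, so treat the numbers as approximate.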

Generation showcase

Hamster playing table tennis

Prompt: Create a detailed SVG illustration of a hamster playing table tennis.

#151 Mercury 2

none

Cost: $0.002
Time: 1.8s
Tokens: 1,514 tok

#155 xAI: Grok 4.20

none

Cost: $0.004
Time: 6.5s
Tokens: 1,367 tok

Charts: Top models by score; Score vs total cost; Response time (avg); Score vs response time (avg); Total output tokens; Score vs total output tokens

Category breakdown

Anti-AI tricks	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Input tokens	Output tokens	Reasoning tokens
Mercury 2	3.0	10.0	0.0%	0		483ms	631	286	0
Grok 4.20	4.8	10.0	25.0%	0		501ms	1,986	267	0

Programming	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Input tokens	Output tokens	Reasoning tokens
Mercury 2	3.4	9.6	0.0%	0		1.03s	7,229	3,088	0
Grok 4.20	1.1	3.1	0.0%	0		1.22s	1,074	312	0

Combined	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Input tokens	Output tokens	Reasoning tokens
Mercury 2	3.0	10.0	0.0%	0		606ms	4,821	131	0
Grok 4.20	3.0	10.0	0.0%	0		6.04s	17,673	282	0

Data parsing and extraction	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Input tokens	Output tokens	Reasoning tokens
Mercury 2	7.3	5.9	83.3%	1		667ms	6,362	180	0
Grok 4.20	10.0	10.0	100.0%	0		522ms	7,749	207	0

Domain-specific	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Input tokens	Output tokens	Reasoning tokens
Mercury 2	5.3	7.2	44.4%	1		534ms	784	46	0
Grok 4.20	3.0	10.0	0.0%	0		687ms	1,746	325	0

General intelligence	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Input tokens	Output tokens	Reasoning tokens
Mercury 2	4.8	10.0	0.0%	0		628ms	495	159	0
Grok 4.20	4.8	10.0	0.0%	0		659ms	819	83	0

Instruction following	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Input tokens	Output tokens	Reasoning tokens
Mercury 2	6.5	10.0	50.0%	0		551ms	691	82	0
Grok 4.20	6.3	10.0	50.0%	0		445ms	1,350	60	0

Puzzle solving	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Input tokens	Output tokens	Reasoning tokens
Mercury 2	3.1	10.0	0.0%	0		535ms	694	251	0
Grok 4.20	5.3	10.0	33.3%	0		473ms	1,671	198	0

Tool calling	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Input tokens	Output tokens	Reasoning tokens
Mercury 2	10.0	10.0	100.0%	0		1.27s	6,193	197	0
Grok 4.20	10.0	10.0	100.0%	0		4.63s	7,245	189	0

General knowledge	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Input tokens	Output tokens	Reasoning tokens
Mercury 2	3.0	10.0	0.0%	0		548ms	213	19	0
Grok 4.20	0.0	0.0	0.0%	0		0ms	0	0	0
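
To see how the per-category scores above split between the two models, the sketch below tallies which model has the higher score in each category. It is only an illustration built on the published scores. The `tally` helper is hypothetical, and the all-zero Grok 4.20 row for general knowledge is taken at face value as a score of 0.0, although it may simply mean no runs were recorded.

```python
# (category, Mercury 2 score, Grok 4.20 score), copied from the tables above.
SCORES = [
    ("Anti-AI tricks", 3.0, 4.8),
    ("Programming", 3.4, 1.1),
    ("Combined", 3.0, 3.0),
    ("Data parsing and extraction", 7.3, 10.0),
    ("Domain-specific", 5.3, 3.0),
    ("General intelligence", 4.8, 4.8),
    ("Instruction following", 6.5, 6.3),
    ("Puzzle solving", 3.1, 5.3),
    ("Tool calling", 10.0, 10.0),
    # Grok 4.20's row is all zeros in the source; treated as 0.0 here.
    ("General knowledge", 3.0, 0.0),
]


def tally(scores):
    """Group categories by which model has the higher score (hypothetical helper)."""
    result = {"Mercury 2": [], "Grok 4.20": [], "tie": []}
    for category, mercury, grok in scores:
        if mercury > grok:
            result["Mercury 2"].append(category)
        elif grok > mercury:
            result["Grok 4.20"].append(category)
        else:
            result["tie"].append(category)
    return result


result = tally(SCORES)
for leader, categories in result.items():
    print(f"{leader} ({len(categories)}): {', '.join(categories)}")
```

On these figures Mercury 2 leads in four categories, Grok 4.20 in three, and three are tied. A simple count like this ignores how many tests each category contains, so it is not a substitute for the overall score.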

Quick comparison

Mercury 2 (none) vs Qwen3 Coder Next (medium)
Grok 4.20 (none) vs GLM 4.7 Flash (medium)
Mercury 2 (none) vs MiniMax M2.5 (medium)
Cobuddy (medium) vs Mercury 2 (none)
Qwen3 Coder Next (medium) vs Grok 4.20 (none)
Mercury 2 (none) vs GLM 4.7 Flash (medium)
MiniMax M2.5 (medium) vs Grok 4.20 (none)
Mercury 2 (none) vs Mistral Small 4 (medium)
Mercury 2 (none) vs MiniMax M2.7 (medium)
Cobuddy (medium) vs Grok 4.20 (none)
Qwen3.5-9B (medium) vs Grok 4.20 (none)
Mistral Small 4 (medium) vs Grok 4.20 (none)