AI BENCHY Compare

OpenAI: GPT-5.4 vs Elephant Alpha

Benchmarks generated from AI BENCHY test suites on: 2026-05-29

Metric	GPT-5.4 (reasoning: none), released 2026-03-05	Elephant Alpha (reasoning: medium), released 2026-04-14
Score	5.6	5.4
Rank	#120	#127
Reliability	10.0	n/a
Consistency	9.1	9.6
Correct tests
Pass rate per attempt	38.3%	33.3%
Flaky tests	2	1
Total runs	60	60
Cost per result	1.644	0.000
Total cost	$0.116	$0.000
Input price	$2.500 / 1M tokens	$0.000 / 1M tokens
Output price	$15.000 / 1M tokens	$0.000 / 1M tokens
Output tokens	2,402	2,596
Reasoning tokens	0	0
Response time (avg)	1.45s	1.27s
Response time (max)	2.95s	3.70s
Response time (total)	29.00s	22.82s

[Charts: top models by score; score vs. total cost; average response time; score vs. average response time; total output tokens; score vs. total output tokens]

Category breakdown

Anti-AI tricks	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Output tokens	Reasoning tokens
GPT-5.4	3.2	8.0	8.3%	1		1.21s	406	0
Elephant Alpha	6.6	10.0	50.0%	0		1.19s	815	0

Programming	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Output tokens	Reasoning tokens
GPT-5.4	6.8	10.0	50.0%	0		1.99s	501	0
Elephant Alpha	4.0	6.7	16.7%	1		1.30s	365	0

Combined	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Output tokens	Reasoning tokens
GPT-5.4	3.0	10.0	0.0%	0		2.89s	291	0
Elephant Alpha	3.0	10.0	0.0%	0		3.70s	562	0

Data parsing and extraction	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Output tokens	Reasoning tokens
GPT-5.4	10.0	10.0	100.0%	0		1.04s	222	0
Elephant Alpha	6.5	10.0	50.0%	0		979ms	246	0

Domain-specific	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Output tokens	Reasoning tokens
GPT-5.4	5.3	7.2	44.4%	1		1.07s	50	0
Elephant Alpha	3.0	10.0	0.0%	0		925ms	24	0

General intelligence	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Output tokens	Reasoning tokens
GPT-5.4	4.4	9.9	0.0%	0		1.78s	184	0
Elephant Alpha	4.3	10.0	0.0%	0		920ms	105	0

Instruction following	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Output tokens	Reasoning tokens
GPT-5.4	6.5	10.0	50.0%	0		1.07s	81	0
Elephant Alpha	9.8	10.0	100.0%	0		987ms	82	0

Puzzle solving	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Output tokens	Reasoning tokens
GPT-5.4	5.6	9.8	33.3%	0		1.44s	381	0
Elephant Alpha	5.3	10.0	33.3%	0		868ms	166	0

Tool calling	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Output tokens	Reasoning tokens
GPT-5.4	10.0	10.0	100.0%	0		2.75s	246	0
Elephant Alpha	3.0	10.0	0.0%	0		2.83s	231	0

General knowledge	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg)	Output tokens	Reasoning tokens
GPT-5.4	3.0	10.0	0.0%	0		990ms	40	0
Elephant Alpha	0.0	0.0	0.0%	0		0ms	0	0
