AI BENCHY Compare

OpenAI: GPT-5.4 vs Owl Alpha

Benchmarks generated from AI BENCHY test suites on 2026-04-30.

Metric	GPT-5.4 (none), released 2026-03-05	Owl Alpha (medium), released 2026-04-30
Score	5.9	5.8
Rank	#89	#91
Reliability	n/a	10.0
Consistency	9.1	9.5
Correct tests
Pass rate per attempt	42.6%	40.7%
Flaky tests	2	1
Total runs	54	54
Cost per result	1.477	0.000
Total cost	$0.104	$0.000
Input price	$2.500 / 1M tokens	$0.000 / 1M tokens
Output price	$15.000 / 1M tokens	$0.000 / 1M tokens
Output tokens	2,317	1,596
Reasoning tokens	0	0
Response time (avg.)	1.51s	11.04s
Response time (max)	2.95s	58.63s
Response time (total)	27.21s	198.65s
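The headline pass rates and costs can be cross-checked with simple arithmetic. The sketch below is an illustration under stated assumptions, not AI BENCHY's own calculation: it assumes "pass rate per attempt" means passed runs divided by total runs, and that total cost is just input cost plus output cost. The passed-run counts (23 and 22) are not on the page; they are inferred because 23/54 and 22/54 round to the listed percentages. The implied input-token figure is likewise derived, not reported, and is only approximate because $0.104 is itself rounded.

```python
# Hypothetical cross-check of the headline metrics (not AI BENCHY's code).
TOTAL_RUNS = 54

# Inferred, not reported: 23/54 -> 42.6% and 22/54 -> 40.7% after rounding.
passed_runs = {"GPT-5.4": 23, "Owl Alpha": 22}


def pass_rate(passed: int, total: int) -> float:
    """Pass rate per attempt as a percentage of all runs (assumed definition)."""
    return 100.0 * passed / total


def token_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of a token count at a price quoted per 1M tokens."""
    return tokens * price_per_million / 1_000_000


for model, passed in passed_runs.items():
    print(f"{model}: {pass_rate(passed, TOTAL_RUNS):.1f}%")

# GPT-5.4 figures taken from the table above.
output_cost = token_cost(2_317, 15.0)        # 2,317 output tokens at $15.000 / 1M
implied_input_cost = 0.104 - output_cost     # assumes total = input + output cost only
implied_input_tokens = implied_input_cost / 2.5 * 1_000_000  # at $2.500 / 1M
print(f"output cost: ${output_cost:.4f}")
print(f"implied input cost: ${implied_input_cost:.4f} (~{implied_input_tokens:,.0f} tokens)")
```

Under these assumptions, output tokens account for roughly a third of GPT-5.4's $0.104 total, so most of the spend would come from input tokens.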

Charts (not reproduced): Top models by score; Score vs. total cost; Response time (avg.); Score vs. response time (avg.); Total output tokens; Score vs. total output tokens.

Category breakdown

Anti-AI tricks	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg.)	Output tokens	Reasoning tokens
GPT-5.4	3.2	8.0	8.3%	1		1.21s	406	0
Owl Alpha	4.8	10.0	25.0%	0		3.97s	87	0

Programming	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg.)	Output tokens	Reasoning tokens
GPT-5.4	10.0	10.0	100.0%	0		2.95s	480	0
Owl Alpha	10.0	10.0	100.0%	0		7.35s	402	0

Combined	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg.)	Output tokens	Reasoning tokens
GPT-5.4	3.0	10.0	0.0%	0		2.89s	291	0
Owl Alpha	3.0	10.0	0.0%	0		10.01s	315	0

Data parsing and extraction	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg.)	Output tokens	Reasoning tokens
GPT-5.4	10.0	10.0	100.0%	0		1.04s	222	0
Owl Alpha	10.0	10.0	100.0%	0		21.64s	246	0

Domain-specific	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg.)	Output tokens	Reasoning tokens
GPT-5.4	5.3	7.2	44.4%	1		1.07s	50	0
Owl Alpha	5.3	10.0	33.3%	0		8.58s	28	0

General intelligence	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg.)	Output tokens	Reasoning tokens
GPT-5.4	4.4	9.9	0.0%	0		1.78s	184	0
Owl Alpha	4.3	10.0	0.0%	0		58.63s	98	0

Instruction following	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg.)	Output tokens	Reasoning tokens
GPT-5.4	6.5	10.0	50.0%	0		1.07s	81	0
Owl Alpha	6.3	10.0	50.0%	0		9.59s	57	0

Puzzle solving	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg.)	Output tokens	Reasoning tokens
GPT-5.4	5.6	9.8	33.3%	0		1.52s	357	0
Owl Alpha	3.4	7.2	11.1%	1		3.44s	135	0

Tool calling	Score	Consistency	Pass rate per attempt	Flaky tests	Correct tests	Response time (avg.)	Output tokens	Reasoning tokens
GPT-5.4	10.0	10.0	100.0%	0		2.75s	246	0
Owl Alpha	10.0	10.0	100.0%	0		8.26s	228	0
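To see where the two models actually diverge, the per-category scores can be subtracted and sorted by the size of the gap. The sketch below is a hypothetical helper, not part of AI BENCHY. It copies the category scores from the breakdown tables and uses a plain score difference. This is a simplification: the overall score is not a simple mean of these categories.

```python
# Hypothetical helper: rank categories by the GPT-5.4 minus Owl Alpha score gap.
# Scores are copied from the category breakdown: (GPT-5.4, Owl Alpha).
scores = {
    "Anti-AI tricks": (3.2, 4.8),
    "Programming": (10.0, 10.0),
    "Combined": (3.0, 3.0),
    "Data parsing and extraction": (10.0, 10.0),
    "Domain-specific": (5.3, 5.3),
    "General intelligence": (4.4, 4.3),
    "Instruction following": (6.5, 6.3),
    "Puzzle solving": (5.6, 3.4),
    "Tool calling": (10.0, 10.0),
}


def score_gaps(table: dict[str, tuple[float, float]]) -> list[tuple[str, float]]:
    """Return (category, gap) pairs sorted by absolute gap, largest first.

    A positive gap means GPT-5.4 scored higher. Gaps are rounded to one
    decimal place to match the precision of the source table.
    """
    gaps = [(cat, round(a - b, 1)) for cat, (a, b) in table.items()]
    return sorted(gaps, key=lambda item: abs(item[1]), reverse=True)


for category, gap in score_gaps(scores):
    print(f"{category:30} {gap:+.1f}")
```

On these numbers, puzzle solving (in GPT-5.4's favor) and anti-AI tricks (in Owl Alpha's favor) account for nearly all of the difference. Five of the nine categories are tied.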
