Wrong Answer Error Leaderboard

See which AI models run into Wrong answer errors most often, so you can spot reliability risks before you choose. Sorted by: Score ↑ (ascending).

Models shown: 209/209

Total errors: 1558

Most affected model: LFM2-24B-A2B (9)

Categories:
Domain-specific: 412
Anti-AI tricks: 293
Programming: 252
Puzzle solving: 201
General knowledge: 168
Combined: 68
Instruction following: 61
General intelligence: 59
Data parsing and extraction: 41
Tool calling: 3

Rank	Model	Company	Wrong-answer count	Score	Total cost	Correct tests	Failed tests	Avg. response time
#195	Elephant Alpha medium	OpenRouter	9	4.3	$0.000	6/21	15	1.27s
#194	GLM 4.7 Flash medium	Z.ai	9	4.3	$0.166	4/22	18	142.6s
#193	Elephant Alpha none	OpenRouter	9	4.3	$0.000	5/21	16	1.22s
#192	Laguna M.1 none	Poolside	10	4.4	$0.009	4/19	15	2.89s
#191	Grok 4.20 Beta none	X AI	10	4.4	$0.087	6/18	12	1.19s
#190	MiniMax M2.5 medium	Minimax	7	4.6	$0.340	5/22	17	68.3s
#189	Mercury 2 none	Inception	17	4.6	$0.030	4/22	18	829ms
#188	Cobuddy medium	Baidu	9	4.7	$0.000	7/21	14	39.9s
#187	Qwen3 Coder Next medium	Qwen	13	4.7	$0.032	4/22	18	9.61s
#186	Laguna M.1 medium	Poolside	4	4.7	$0.033	9/19	10	14.7s
#185	Grok 4.1 Fast medium	X AI	4	4.7	$0.069	9/19	10	23.8s
#184	Hunter Alpha medium	OpenRouter	4	4.7	$0.000	8/18	10	10.3s
#183	Trinity Large Preview none	Arcee AI	12	4.8	$0.008	4/21	17	2.98s
#182	KAT-Coder-Air V2.5 none	Kwaipilot	13	4.8	$0.067	5/22	17	12.2s
#181	Grok 4.20 Multi Agent Beta medium	X AI	4	4.8	$5.599	8/18	10	9.69s

Wrong answer errors

Charts: Top models by wrong-answer count; Wrong-answer count vs. score; Top models by average response time