Error leaderboard for Wrong answer

See which AI models most often run into the Wrong answer error, so you can spot reliability risks before you choose. Sorted by: error count, ascending.

Models shown

Total errors: 1585

Most affected model: Gemini 3.6 Flash 1

Categories

Domain-specific: 421
Anti-AI tricks: 293
Programming: 259
Puzzle solving: 204
General knowledge: 172
Combined: 69
General intelligence: 62
Instruction following: 61
Data parsing and extraction: 41
Tool calling: 3

215/215

Rank	Model	Company	Wrong answer count	Score	Total cost	Correct tests	Failed tests	Response time (avg.)
#81	Kimi K2.5 medium	Moonshot AI	5	7.0	$0.600	10/22	12	99.0s
#92	Gemini 3.5 Flash minimal	Google	5	6.8	$0.300	14/22	8	2.65s
#107	MiMo-V2.5 medium	Xiaomi	5	6.5	$0.082	12/22	10	32.2s
#115	Mimo V2 PRO medium	Xiaomi	5	6.3	$0.333	12/21	9	22.2s
#119	MiMo-V2-Flash medium	Xiaomi	5	6.3	$0.043	12/21	9	20.1s
#140	Mimo V2 Omni medium	Xiaomi	5	5.9	$0.683	10/21	11	41.2s
#146	Nemotron 3 Super medium	NVIDIA	5	5.7	$0.055	8/22	14	52.0s
#185	Ring-2.6-1T none	Inclusionai	5	4.8	$0.026	9/22	13	55.1s
#23	Grok 4.5 low	X AI	6	8.4	$0.935	16/22	6	15.6s
#25	Grok 4.5 medium	X AI	6	8.3	$1.928	16/22	6	61.7s
#27	Muse Spark 1.1 low	Meta	6	8.3	$0.647	13/22	9	11.5s
#28	Gemini 2.5 Flash medium	Google	6	8.2	$0.643	15/22	7	21.2s
#31	Gemini 3.5 Flash-Lite high	Google	6	8.1	$0.584	14/22	8	9.48s
#34	GPT-5.2 Chat none	OpenAI	6	8.0	$0.604	14/22	8	7.65s
#49	DeepSeek V4 Flash high	DeepSeek	6	7.7	$0.041	13/22	9	49.7s

Wrong answer errors

Chart: Top models by Wrong answer count

Chart: Wrong answer count vs. Score

Chart: Top models by Response time (avg.)