Failure leaderboard for Wrong answer

See which AI models hit Wrong answer most often, so you can spot reliability risks before choosing one. Sorted by failure count, ascending.

Models shown

Total failures

1558

Most affected model

Gemini 3 Flash Preview 1

Categories

Domain-specific: 412
Anti-AI tricks: 293
Programming: 252
Puzzle solving: 201
General knowledge: 168
Combined: 68
Instruction following: 61
General intelligence: 59
Data parsing and extraction: 41
Tool calling: 3

209/209

Rank	Model	Company	Wrong answer count	Score	Total cost	Tests passed	Response time (avg)
#1	Gemini 3 Flash Preview medium	Google	1	9.6	$0.742	21/22	19.2s
#2	Gemini 3.5 Flash high	Google	1	9.5	$1.976	20/22	15.1s
#209	Step 3.5 Flash none	Stepfun	1	2.3	$0.020	6/12	39.0s
#7	Gemini 3.1 Pro Preview medium	Google	2	9.2	$1.361	20/22	21.5s
#9	Gemini 3.5 Flash medium	Google	2	9.1	$0.642	19/22	8.20s
#11	Gemini 3.5 Flash low	Google	2	8.9	$0.433	19/22	5.55s
#12	Grok 4.5 high	X AI	2	8.9	$1.707	17/22	76.5s
#17	Claude Fable 5 medium	Anthropic	2	8.6	$3.478	17/22	17.2s
#110	Gemma 4 31B medium	Google	2	6.3	$0.163	14/22	75.4s
#119	Qwen3.5-35B-A3B medium	Qwen	2	6.2	$0.837	11/22	112.5s
#163	Gemini 3.1 Flash Lite Preview high	Google	2	5.3	$2.310	13/16	68.1s
#175	Qwen3.6 Plus Preview medium	Qwen	2	4.9	$0.000	9/19	15.2s
#204	Qwen3.5-9B medium	Qwen	2	3.8	$0.036	3/22	82.2s
#6	GPT-5.5 low	OpenAI	3	9.3	$1.253	19/22	10.1s
#8	Qwen3.7 Max medium	Qwen	3	9.2	$1.116	18/22	40.6s

Wrong answer failures

Charts: Top models by Wrong answer count; Wrong answer count vs Score; Top models by Response time (avg)