General Intelligence x Wrong Answer Leaderboard

See which AI models are most likely to return a Wrong Answer on General Intelligence tests, so you can find their weak points faster.

Models shown

Total failures

Most affected model: Grok 4.5 (1)

Failure reasons

Did not follow instructions: 78; Wrong answer: 59; API error: 12; Timed out: 4
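
The failure-reason counts above are easier to compare as shares of the total. The following is a minimal sketch, assuming the four counts cover the same set of tests and that each failed test is counted under exactly one reason; the page does not state either point. The names `failure_counts` and `failure_shares` are hypothetical and used only for illustration.

```python
# Counts copied from the "Failure reasons" line above.
# Assumption: the reasons are mutually exclusive and share one filter scope.
failure_counts = {
    "Did not follow instructions": 78,
    "Wrong answer": 59,
    "API error": 12,
    "Timed out": 4,
}


def failure_shares(counts):
    """Return each reason's share of all failures, in percent (one decimal)."""
    total = sum(counts.values())
    return {reason: round(100 * n / total, 1) for reason, n in counts.items()}


shares = failure_shares(failure_counts)
for reason, pct in shares.items():
    print(f"{reason}: {pct}%")
```

Under these assumptions, Wrong Answer accounts for a bit more than a third of the listed failures, second to instruction-following failures.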

Categories

Domain-specific: 412; Anti-AI tricks: 293; Programming: 252; Puzzle solving: 201; General knowledge: 168; Combined: 68; Instruction following: 61; General intelligence: 59; Data parsing and extraction: 41; Tool calling: 3

59/59

Rank	Model	Company	Wrong Answer count	Category score	Total cost	Correct tests	Response time (avg)
#82	DeepSeek V4 Pro none	DeepSeek	1	5.0	$0.096	0/1	2.05s
#83	GPT-5.6 Sol none	OpenAI	1	6.5	$0.524	0/1	1.52s
#85	Qwen3.6 Flash medium	Qwen	1	4.8	$0.738	0/1	9.88s
#86	Step 3.7 Flash high	Stepfun	1	5.5	$1.207	0/1	4.17s
#91	LongCat 2.0 low	Meituan	1	3.4	$0.391	0/1	22.5s
#92	KAT-Coder-Pro V2.5 none	Kwaipilot	1	4.8	$0.476	0/1	5.16s
#96	GLM 5.2 none	Z.ai	1	6.1	$0.151	0/1	4.42s
#97	LongCat 2.0 high	Meituan	1	5.1	$0.469	0/1	17.0s
#98	Qwen3.6 Max Preview none	Qwen	1	4.3	$0.231	0/1	1.62s
#102	Laguna XS 2.1 medium	Poolside	1	5.0	$0.068	0/1	4.15s
#105	Gemini 3.1 Flash Lite low	Google	1	4.0	$0.621	0/1	1.37s
#107	Qwen3.5 Plus 2026-02-15 none	Qwen	1	4.4	$0.073	0/1	2.26s
#111	LongCat 2.0 none	Meituan	1	5.0	$0.044	0/1	2.76s
#117	GPT-5.6 Luna low	OpenAI	1	5.0	$0.249	0/1	2.25s
#118	Gemini 2.5 Flash none	Google	1	5.0	$0.017	0/1	615ms
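
The response-time column mixes units (`s` and `ms`), so the rows need normalizing before they can be sorted or charted. The following is a rough sketch of one way to parse the tab-separated rows, assuming the column order shown in the table header. The `Row` class and the `parse_seconds` and `parse_row` helpers are hypothetical names, not part of any tooling behind this page; the three sample rows are copied from the table.

```python
from dataclasses import dataclass


@dataclass
class Row:
    rank: int
    model: str
    company: str
    wrong_answers: int
    category_score: float
    total_cost_usd: float
    correct: int
    total: int
    response_seconds: float


def parse_seconds(text: str) -> float:
    """Convert '2.05s' or '615ms' to seconds."""
    text = text.strip()
    if text.endswith("ms"):  # check 'ms' first, since it also ends with 's'
        return float(text[:-2]) / 1000.0
    if text.endswith("s"):
        return float(text[:-1])
    raise ValueError(f"unrecognized duration: {text!r}")


def parse_row(line: str) -> Row:
    """Parse one tab-separated leaderboard row (assumed column order)."""
    rank, model, company, wrong, score, cost, tests, elapsed = line.split("\t")
    correct, total = tests.split("/")
    return Row(
        rank=int(rank.lstrip("#")),
        model=model,
        company=company,
        wrong_answers=int(wrong),
        category_score=float(score),
        total_cost_usd=float(cost.lstrip("$")),
        correct=int(correct),
        total=int(total),
        response_seconds=parse_seconds(elapsed),
    )


# Three rows copied from the table, for illustration only.
SAMPLE = [
    "#82\tDeepSeek V4 Pro none\tDeepSeek\t1\t5.0\t$0.096\t0/1\t2.05s",
    "#91\tLongCat 2.0 low\tMeituan\t1\t3.4\t$0.391\t0/1\t22.5s",
    "#118\tGemini 2.5 Flash none\tGoogle\t1\t5.0\t$0.017\t0/1\t615ms",
]

rows = [parse_row(line) for line in SAMPLE]
slowest_first = sorted(rows, key=lambda r: r.response_seconds, reverse=True)
for r in slowest_first:
    print(f"{r.model}: {r.response_seconds:.3f}s, ${r.total_cost_usd:.3f}")
```

Checking the `ms` suffix before the `s` suffix matters here, because a millisecond value would otherwise be read as a malformed number of seconds.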


General Intelligence: Wrong Answer

Charts: Top models by Wrong Answer count; Wrong Answer count vs. score; Top models by response time (avg); Top models by estimated wasted cost.