Programming x Wrong Answer Leaderboard

AI BENCHY: Failures by Category

See which AI models are most likely to give a Wrong Answer on Programming tests, so you can spot their weak points faster.

Models shown

Total failures

230

Most affected model

Qwen3.6 Flash (3)

Failure reasons

Wrong answer: 230
API error: 43
Timeout: 23
No response: 18
Did not follow instructions: 16
Extra formatting: 12
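The reason counts are easier to compare as proportions. The short Python sketch below converts them into shares of their sum. It assumes (the page does not say so explicitly) that all six counts describe the same pool of Programming failures, so their total of 342 can serve as the denominator; `failure_reasons` and `reason_shares` are illustrative names, not part of AI BENCHY.

```python
# Failure-reason counts as listed on this page.
# Assumption: all six counts refer to the same pool of Programming failures.
failure_reasons = {
    "Wrong answer": 230,
    "API error": 43,
    "Timeout": 23,
    "No response": 18,
    "Did not follow instructions": 16,
    "Extra formatting": 12,
}


def reason_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each reason's share of the summed count, as a fraction in [0, 1]."""
    total = sum(counts.values())
    return {reason: n / total for reason, n in counts.items()}


shares = reason_shares(failure_reasons)

# Print reasons from most to least frequent, as percentages.
for reason, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{reason}: {share:.1%}")
```

Under that assumption, wrong answers make up roughly two thirds (about 67.3%) of the listed failures.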

Categories

Domain-specific: 367
Anti-AI tricks: 270
Programming: 230
Puzzle solving: 172
General knowledge: 149
Combined: 58
Instruction following: 56
General intelligence: 49
Data parsing and extraction: 36
Tool calling: 3

134/134

Rank	Model	Company	Wrong answer count	Category score	Total cost	Correct tests	Response time (avg)
#59	Qwen3.6 Flash medium	Qwen	3	5.0	$0.288	0/3	42.9s
#115	Qwen3.6 Max Preview none	Qwen	3	3.8	$0.075	0/3	3.12s
#117	GLM 5 none	Z.ai	3	4.0	$0.027	0/3	5.12s
#122	Qwen3.5 Plus 2026-02-15 none	Qwen	3	4.3	$0.016	0/3	2.05s
#123	North Mini Code medium	Cohere	3	4.5	$0.000	0/3	320.4s
#131	Claude Sonnet 5 none	Anthropic	3	4.6	$0.287	0/3	3.67s
#133	GLM 5.1 none	Z.ai	3	3.9	$0.057	0/3	4.96s
#134	DeepSeek V4 Flash none	DeepSeek	3	4.2	$0.007	0/3	17.1s
#140	GLM 5 Turbo none	Z.ai	3	3.9	$0.047	0/3	2.41s
#141	Laguna XS 2.1 none	Poolside	3	4.3	$0.003	0/3	623ms
#142	GPT-5.6 Luna none	OpenAI	3	3.8	$0.047	0/3	980ms
#144	Qwen3.5-122B-A10B none	Qwen	3	3.7	$0.020	0/3	2.77s
#148	Mistral Small 4 none	Mistral	3	3.7	$0.007	0/3	901ms
#149	Qwen3 Coder Next none	Qwen	3	4.6	$0.009	0/3	2.22s
#150	North Mini Code none	Cohere	3	3.9	$0.000	0/3	22.0s
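The average response times in the table mix milliseconds and seconds, so they have to be normalised before models can be ranked by speed. The Python sketch below is one way to do that, using the model names and times copied from the table; the helper `to_seconds` is hypothetical and only handles the two units that appear here ("ms" and "s").

```python
# Model names and average response times, copied as displayed in the table.
rows = [
    ("Qwen3.6 Flash medium", "42.9s"),
    ("Qwen3.6 Max Preview none", "3.12s"),
    ("GLM 5 none", "5.12s"),
    ("Qwen3.5 Plus 2026-02-15 none", "2.05s"),
    ("North Mini Code medium", "320.4s"),
    ("Claude Sonnet 5 none", "3.67s"),
    ("GLM 5.1 none", "4.96s"),
    ("DeepSeek V4 Flash none", "17.1s"),
    ("GLM 5 Turbo none", "2.41s"),
    ("Laguna XS 2.1 none", "623ms"),
    ("GPT-5.6 Luna none", "980ms"),
    ("Qwen3.5-122B-A10B none", "2.77s"),
    ("Mistral Small 4 none", "901ms"),
    ("Qwen3 Coder Next none", "2.22s"),
    ("North Mini Code none", "22.0s"),
]


def to_seconds(value: str) -> float:
    """Convert a displayed duration such as '623ms' or '42.9s' to seconds."""
    # Check "ms" first, because "623ms" also ends with "s".
    if value.endswith("ms"):
        return float(value[:-2]) / 1000
    if value.endswith("s"):
        return float(value[:-1])
    raise ValueError(f"unrecognised duration: {value!r}")


# Rank from fastest to slowest average response time.
fastest_first = sorted(rows, key=lambda row: to_seconds(row[1]))
for model, shown in fastest_first[:3]:
    print(model, shown)
```

Normalising first matters: sorting the displayed strings directly would place "980ms" after "42.9s".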

Charts (not reproduced): top models by wrong-answer count; wrong-answer count vs. score; top models by average response time; top models by estimated wasted cost.
