Failure leaderboard for Extra formatting

See which AI models run into Extra formatting most often, so you can spot reliability risks before choosing one. Sorted by: Correct tests ↑.

Models shown

Total failures

Most affected model

Granite 4.1 8B: 1

Categories

Anti-AI tricks: 20
Programming: 18
Domain-specific: 17
Puzzle solving: 8
Data parsing and extraction: 6
Instruction following: 3
Combined: 1

42/42

Rank	Model	Company	Extra formatting count	Score	Total cost	Correct tests	Avg. response time
#201	Granite 4.1 8B none	IBM Granite	1	4.0	$0.007	2/22	1.45s
#204	Qwen3.5-9B medium	Qwen	1	3.8	$0.036	3/22	82.2s
#171	North Mini Code none	Cohere	2	5.1	$0.000	4/22	29.9s
#199	Hy3 preview none	Tencent	1	4.0	$0.003	4/21	12.9s
#200	MiMo-V2-Flash none	Xiaomi	1	4.0	$0.025	4/21	2.76s
#150	DeepSeek V4 Flash none	DeepSeek	2	5.6	$0.044	5/22	36.8s
#166	Qwen3 Coder Next none	Qwen	1	5.1	$0.025	5/22	9.12s
#168	MiMo-V2.5 none	Xiaomi	1	5.1	$0.025	5/22	4.62s
#182	KAT-Coder-Air V2.5 none	Kwaipilot	3	4.8	$0.067	5/22	12.2s
#159	GPT-5.6 Luna none	OpenAI	1	5.4	$0.142	6/22	1.50s
#164	Inkling none	Thinkingmachines	1	5.2	$0.147	6/22	3.50s
#173	DeepSeek V3.2 none	DeepSeek	2	5.0	$0.054	6/22	18.3s
#111	LongCat 2.0 none	Meituan	1	6.3	$0.044	7/22	5.18s
#144	KAT-Coder-Air V2.5 high	Kwaipilot	3	5.6	$0.077	7/22	15.9s
#158	KAT-Coder-Air V2.5 low	Kwaipilot	4	5.4	$0.041	7/22	10.1s

Extra formatting failures


Top models by Extra formatting count

Extra formatting count vs. Score

Top models by average response time