Anti-AI Tricks x Wrong Answer Leaderboard

See which AI models are most likely to give a Wrong Answer on Anti-AI Tricks, so you can find their weak spots faster.

Models shown

Total failures: 293

Most affected model: Seed-2.0-Lite (4)

Failure reasons

Wrong answer: 293
Did not follow instructions: 33
Extra formatting: 20
API error: 14
No response: 4
Timeout: 4

Categories

Domain-specific: 421
Anti-AI Tricks: 293
Programming: 259
Puzzle solving: 204
General knowledge: 172
Combined: 69
General intelligence: 62
Instruction following: 61
Data parsing and extraction: 41
Tool calling: 3

140/140

Rank	Model	Company	Wrong Answer count	Category score	Total cost	Tests correct	Response time (avg)
#188	KAT-Coder-Air V2.5 none	Kwaipilot	3	5.3	$0.067	1/4	2.68s
#193	Qwen3 Coder Next medium	Qwen	3	3.5	$0.032	0/4	8.64s
#198	Laguna M.1 none	Poolside	3	3.4	$0.009	0/4	705ms
#203	Grok 4.20 none	X AI	3	4.8	$0.057	1/4	501ms
#209	Grok 4.1 Fast none	X AI	3	3.2	$0.008	0/4	1.07s
#216	LFM2-24B-A2B none	Liquid	3	2.5	$0.001	0/3	471ms
#27	Muse Spark 1.1 low	Meta	2	7.9	$0.647	2/4	4.36s
#50	DeepSeek V4 Pro high	DeepSeek	2	5.7	$0.200	1/4	25.7s
#51	MiniMax M3 medium	Minimax	2	5.5	$0.286	1/4	14.9s
#56	Kimi K2.7 Code medium	Moonshot AI	2	7.3	$0.740	2/4	11.6s
#63	Qwen3.7 Max none	Qwen	2	6.5	$0.197	2/4	1.08s
#66	KAT-Coder-Pro V2.5 low	Kwaipilot	2	6.9	$0.387	2/4	4.20s
#73	KAT-Coder-Pro V2.5 high	Kwaipilot	2	7.0	$0.482	2/4	3.17s
#75	Qwen3.7 Plus none	Qwen	2	6.5	$0.106	2/4	1.38s
#86	DeepSeek V4 Pro none	DeepSeek	2	3.2	$0.096	0/4	4.02s

Charts: Top models by Wrong Answer count; Wrong Answer count vs. Score; Top models by Response time (avg); Top models by Estimated wasted cost
