Model leaderboard for Instruction Following

See which AI models perform best at instruction following, which stay reliable, and where the largest differences between them appear.

Models shown

Average Instruction Following score: 8.6

Best model: Grok 4.1 Fast 3.0

Failure reasons

Wrong answer: 61
Did not follow instructions: 19
Extra formatting: 3
No answer: 2
API error: 1
Timeout: 1
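As a quick check on the failure breakdown, the six listed reasons sum to 87 failures, and wrong answers alone account for about 70% of them. A minimal Python sketch of that arithmetic, assuming these counts are the complete set of failures shown; the dictionary and function names are illustrative only.

```python
# Failure-reason counts as listed on the leaderboard page.
FAILURE_COUNTS = {
    "Wrong answer": 61,
    "Did not follow instructions": 19,
    "Extra formatting": 3,
    "No answer": 2,
    "API error": 1,
    "Timeout": 1,
}


def failure_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each reason's share of all listed failures, as a percentage."""
    total = sum(counts.values())
    return {reason: 100.0 * n / total for reason, n in counts.items()}


if __name__ == "__main__":
    for reason, pct in failure_shares(FAILURE_COUNTS).items():
        print(f"{reason}: {pct:.1f}%")
```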

216/216

Rank	Model	Company	Instruction Following score	Overall score	Total cost	Correct tests	Response time (avg)
#102	LongCat 2.0 high	Meituan	6.5	6.6	$0.469	1/2	6.96s
#117	LongCat 2.0 none	Meituan	6.5	6.3	$0.044	1/2	2.82s
#121	Gemma 4 31B none	Google	6.5	6.2	$0.021	1/2	2.84s
#144	Kimi K2.6 none	Moonshot AI	6.5	5.8	$0.184	1/2	1.64s
#145	GPT-5.4 none	OpenAI	6.5	5.8	$0.397	1/2	1.07s
#151	GLM 5V Turbo none	Z.ai	6.5	5.6	$0.052	1/2	1.97s
#152	Owl Alpha medium	Openrouter	6.5	5.6	$0.000	1/2	10.2s
#153	Mimo V2 PRO none	Xiaomi	6.5	5.6	$0.045	1/2	2.51s
#156	DeepSeek V4 Flash none	DeepSeek	6.5	5.6	$0.042	1/2	17.5s
#161	Kimi K2.5 none	Moonshot AI	6.5	5.5	$0.127	1/2	2.67s
#163	Mimo V2 Omni none	Xiaomi	6.5	5.5	$0.021	1/2	4.26s
#171	Mistral Small 4 none	Mistral	6.5	5.1	$0.022	1/2	380ms
#174	MiMo-V2.5 none	Xiaomi	6.5	5.1	$0.025	1/2	751ms
#175	Qwen3.5-9B none	Qwen	6.5	5.1	$0.021	1/2	514ms
#176	GLM 5 Turbo none	Z.ai	6.5	5.1	$0.047	1/2	2.13s
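The response-time column mixes milliseconds and seconds, so comparing models by speed requires normalizing units first. A minimal Python sketch, assuming the tab-separated column order of the table and using a few rows copied from it; the `Row` type and the helper names are hypothetical and not part of the leaderboard itself.

```python
from dataclasses import dataclass

# A few rows copied from the table (tab-separated, same column order).
ROWS = """\
#102\tLongCat 2.0 high\tMeituan\t6.5\t6.6\t$0.469\t1/2\t6.96s
#117\tLongCat 2.0 none\tMeituan\t6.5\t6.3\t$0.044\t1/2\t2.82s
#171\tMistral Small 4 none\tMistral\t6.5\t5.1\t$0.022\t1/2\t380ms
#175\tQwen3.5-9B none\tQwen\t6.5\t5.1\t$0.021\t1/2\t514ms
"""


@dataclass
class Row:
    rank: int
    model: str
    company: str
    if_score: float    # Instruction Following score
    overall: float     # overall score
    cost_usd: float    # total cost in dollars
    correct: int       # correct tests
    total: int         # total tests
    latency_s: float   # average response time, normalized to seconds


def parse_latency(text: str) -> float:
    """Convert a value such as '380ms' or '6.96s' to seconds."""
    if text.endswith("ms"):
        return float(text[:-2]) / 1000.0
    if text.endswith("s"):
        return float(text[:-1])
    raise ValueError(f"unknown latency unit: {text!r}")


def parse_row(line: str) -> Row:
    """Split one tab-separated table row into typed fields."""
    rank, model, company, if_score, overall, cost, tests, latency = line.split("\t")
    correct, total = (int(x) for x in tests.split("/"))
    return Row(
        rank=int(rank.lstrip("#")),
        model=model,
        company=company,
        if_score=float(if_score),
        overall=float(overall),
        cost_usd=float(cost.lstrip("$")),
        correct=correct,
        total=total,
        latency_s=parse_latency(latency),
    )


rows = [parse_row(line) for line in ROWS.splitlines() if line.strip()]
# Fastest first, after converting ms and s to the same unit.
by_latency = sorted(rows, key=lambda r: r.latency_s)

if __name__ == "__main__":
    for r in by_latency:
        print(f"{r.model}: {r.latency_s:.3f}s, ${r.cost_usd:.3f}")
```

Sorting on the raw strings would rank "380ms" after "2.82s", which is why the conversion happens before any comparison.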

Instruction Following leaderboard


Top models by Instruction Following score

Instruction Following score vs. total cost

Top models by average response time