Model Leaderboard for Instruction Following

See which AI models perform best at Instruction Following, which remain reliable, and where the largest gaps appear.

Models shown

Average Instruction Following Score

8.6

Best model

Gemini 3 Flash Preview 10.0

Failure reasons

Wrong answer: 61
Did not follow instructions: 19
Extra formatting: 3
No response: 2
API error: 1
Timeout: 1

216/216

Rank	Model	Company	Instruction Following Score	Score	Total cost	Tests passed	Response time (avg)
#67	Claude Sonnet 4.6 none	Anthropic	6.5	7.3	$0.661	1/2	1.96s
#96	LongCat 2.0 low	Meituan	6.5	6.7	$0.391	1/2	6.39s
#102	LongCat 2.0 high	Meituan	6.5	6.6	$0.469	1/2	6.96s
#117	LongCat 2.0 none	Meituan	6.5	6.3	$0.044	1/2	2.82s
#121	Gemma 4 31B none	Google	6.5	6.2	$0.021	1/2	2.84s
#144	Kimi K2.6 none	Moonshot AI	6.5	5.8	$0.184	1/2	1.64s
#145	GPT-5.4 none	OpenAI	6.5	5.8	$0.397	1/2	1.07s
#151	GLM 5V Turbo none	Z.ai	6.5	5.6	$0.052	1/2	1.97s
#152	Owl Alpha medium	Openrouter	6.5	5.6	$0.000	1/2	10.2s
#153	Mimo V2 PRO none	Xiaomi	6.5	5.6	$0.045	1/2	2.51s
#156	DeepSeek V4 Flash none	DeepSeek	6.5	5.6	$0.042	1/2	17.5s
#161	Kimi K2.5 none	Moonshot AI	6.5	5.5	$0.127	1/2	2.67s
#163	Mimo V2 Omni none	Xiaomi	6.5	5.5	$0.021	1/2	4.26s
#171	Mistral Small 4 none	Mistral	6.5	5.1	$0.022	1/2	380ms
#174	MiMo-V2.5 none	Xiaomi	6.5	5.1	$0.025	1/2	751ms

Instruction Following Leaderboard

Top models by Instruction Following Score

Instruction Following Score vs. total cost

Top models by Response time (avg)