Failure ranking: Did not follow instructions

See which AI models hit the "Did not follow instructions" failure most often, so you can spot reliability risks before choosing one. The table is sorted by average response time, descending.

Models shown: 140/140

Total failures: 245

Most affected model: Step 3.5 Flash (3)

Failures by category

Puzzle solving: 90
General intelligence: 78
Anti-AI tricks: 33
Instruction following: 18
Programming: 16
Tool calling: 8
Combined: 1
Domain-specific: 1

Rank	Model	Company	Did not follow instructions count	Score	Total cost	Tests correct	Avg. response time
#108	Ring-2.6-1T medium	Inclusionai	2	6.3	$0.103	11/22	68.7s
#76	DeepSeek V3.2 medium	DeepSeek	1	7.0	$0.078	11/22	68.6s
#190	MiniMax M2.5 medium	Minimax	3	4.6	$0.340	5/22	68.3s
#163	Gemini 3.1 Flash Lite Preview high	Google	1	5.3	$2.310	13/16	68.1s
#28	Inkling high	Thinkingmachines	1	8.0	$1.006	15/22	64.2s
#31	GLM 5.2 high	Z.ai	1	8.0	$0.970	14/22	62.7s
#143	Gemini 3.1 Flash Lite high	Google	3	5.6	$2.044	10/18	62.0s
#90	Qwen3.6 35B A3B medium	Qwen	1	6.7	$0.746	13/22	58.1s
#179	Ring-2.6-1T none	Inclusionai	2	4.8	$0.026	9/22	55.1s
#128	GPT-5 Nano medium	OpenAI	2	6.1	$0.114	9/22	54.9s
#140	Nemotron 3 Super medium	NVIDIA	3	5.7	$0.050	8/22	52.0s
#45	DeepSeek V4 Flash high	DeepSeek	2	7.7	$0.042	13/22	49.7s
#35	Seed-2.0-Lite medium	Bytedance Seed	2	7.9	$0.234	14/22	48.5s
#73	Grok 4.3 medium	X AI	2	7.1	$0.779	13/22	47.4s
#85	Qwen3.6 Flash medium	Qwen	1	6.9	$0.738	12/22	44.7s
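For readers who want to work with the table offline, here is a minimal sketch of how its tab-separated rows could be parsed, how the wrong-test count follows from the "correct/total" column, and how the descending sort by average response time can be reproduced. The `Row` class, `parse_row` helper, and `SAMPLE` list are hypothetical names introduced for illustration; only the three sample rows come from the table above.

```python
from dataclasses import dataclass


@dataclass
class Row:
    model: str
    failures: int       # "Did not follow instructions" count
    correct: int        # tests answered correctly
    total: int          # tests run
    avg_seconds: float  # average response time in seconds

    @property
    def wrong(self) -> int:
        # Wrong tests are simply total minus correct.
        return self.total - self.correct


def parse_row(line: str) -> Row:
    # Assumed column order: rank, model, company, failure count,
    # score, total cost, correct/total, average response time.
    _, model, _, failures, _, _, tests, seconds = line.split("\t")
    correct, total = tests.split("/")
    return Row(model, int(failures), int(correct), int(total),
               float(seconds.rstrip("s")))


# Three rows copied from the table, deliberately out of order.
SAMPLE = [
    "#163\tGemini 3.1 Flash Lite Preview high\tGoogle\t1\t5.3\t$2.310\t13/16\t68.1s",
    "#85\tQwen3.6 Flash medium\tQwen\t1\t6.9\t$0.738\t12/22\t44.7s",
    "#190\tMiniMax M2.5 medium\tMinimax\t3\t4.6\t$0.340\t5/22\t68.3s",
]

# Slowest first, matching the page's sort order.
rows = sorted((parse_row(line) for line in SAMPLE),
              key=lambda r: r.avg_seconds, reverse=True)

for r in rows:
    print(f"{r.model}: {r.wrong} wrong of {r.total}, "
          f"{r.failures} instruction failures, {r.avg_seconds}s avg")
```

Sorting on a different field, such as `failures`, only requires changing the `key` function.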

Charts (not reproduced here): "Did not follow instructions" failures by model; top models by "Did not follow instructions" count; "Did not follow instructions" count vs. score; top models by average response time.