Failure ranking: Did not follow instructions

See which AI models most often hit the "Did not follow instructions" failure, so you can identify reliability risks before choosing one. Sorted by: Correct tests (ascending).

Models shown

Total failures: 245

Most affected model: Granite 4.1 8B 4

Categories

Puzzle solving: 90
General intelligence: 78
Anti-AI tricks: 33
Instruction following: 18
Programming: 16
Tool calling: 8
Combined: 1
Domain-specific: 1

140/140

Rank	Model	Company	Did not follow instructions count	Score	Total cost	Correct tests	Wrong tests	Response time (avg)
#200	MiMo-V2-Flash none	Xiaomi	2	4.0	$0.025	4/21	17	2.76s
#207	Nemotron 3 Nano Omni 30b A3b Reasoning medium	NVIDIA	1	3.4	$0.000	4/19	15	17.1s
#150	DeepSeek V4 Flash none	DeepSeek	1	5.6	$0.044	5/22	17	36.8s
#160	Laguna XS 2.1 none	Poolside	1	5.3	$0.008	5/22	17	1.55s
#165	Mistral Small 4 none	Mistral	1	5.1	$0.022	5/22	17	1.20s
#166	Qwen3 Coder Next none	Qwen	1	5.1	$0.025	5/22	17	9.12s
#167	Mistral Small 4 medium	Mistral	2	5.1	$0.096	5/22	17	10.8s
#168	MiMo-V2.5 none	Xiaomi	1	5.1	$0.025	5/22	17	4.62s
#172	MiniMax M2.7 medium	Minimax	5	5.0	$0.163	5/22	17	41.3s
#174	GPT-4o-mini none	OpenAI	1	5.0	$0.010	5/22	17	1.99s
#177	Nemotron 3 Super none	NVIDIA	2	4.9	$0.008	5/22	17	5.97s
#190	MiniMax M2.5 medium	Minimax	3	4.6	$0.340	5/22	17	68.3s
#193	Elephant Alpha none	Openrouter	3	4.3	$0.000	5/21	16	1.22s
#205	Laguna Xs.2 none	Poolside	1	3.8	$0.004	5/19	14	806ms
#136	GPT-5.4 Mini none	OpenAI	3	5.9	$0.095	6/22	16	1.53s
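
The correct-tests cells in the table are fractions such as 4/21 (correct out of total tests). As a minimal sketch, not part of the leaderboard itself, the snippet below shows how the wrong-test count and a pass rate could be derived from such a cell. The helper name `parse_result` and the choice of three sample rows are assumptions made for illustration; the values are copied from the table above.

```python
from fractions import Fraction

# (model, "correct/total") pairs copied from three rows of the table above.
rows = [
    ("MiMo-V2-Flash none", "4/21"),
    ("Laguna Xs.2 none", "5/19"),
    ("GPT-5.4 Mini none", "6/22"),
]


def parse_result(cell):
    """Split a 'correct/total' cell into (correct, wrong, pass_rate).

    Hypothetical helper: the page does not describe how it computes
    these values, so this is only one plausible reading of the column.
    """
    correct, total = (int(part) for part in cell.split("/"))
    return correct, total - correct, Fraction(correct, total)


for model, cell in rows:
    correct, wrong, rate = parse_result(cell)
    print(f"{model}: {correct} correct, {wrong} wrong, pass rate {float(rate):.1%}")
```

Because the models were not all run on the same number of tests (19, 21, or 22), comparing pass rates rather than raw correct counts may give a fairer picture than the sort order alone.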

Failures by Did not follow instructions

[Chart: Top models by Did not follow instructions count]

[Chart: Did not follow instructions count vs. Score]

[Chart: Top models by Response time (avg)]