AI BENCHY

AI BENCHY Failures

Failures: Did not follow instructions

See which AI models most often fail with "Did not follow instructions", so you can spot reliability risks before choosing one. Sorted by: Score ↑ (ascending).

Models shown: 15

Total failures: 215

Most affected model: Granite 4.1 8B 4

| Rank | Model | Company | "Did not follow instructions" count | Score | Correct tests | Response time (avg) |
|---|---|---|---|---|---|---|
| #131 | Qwen3.5-122B-A10B none | Qwen | 2 | 5.3 | 6/21 | 3.41s |
| #130 | MiniMax M2.7 medium | MiniMax | 5 | 5.3 | 5/21 | 38.2s |
| #129 | MiniMax M2.5 medium | MiniMax | 3 | 5.3 | 5/21 | 65.4s |
| #128 | Qwen3.6 Flash none | Qwen | 1 | 5.4 | 7/21 | 1.60s |
| #126 | gpt-oss-120b none | OpenAI | 2 | 5.4 | 6/19 | 21.6s |
| #124 | Kimi K2.6 none | Moonshot AI | 3 | 5.5 | 7/21 | 13.3s |
| #125 | GPT-5.4 none | OpenAI | 1 | 5.5 | 7/21 | 1.42s |
| #123 | MiMo-V2.5-Pro none | Xiaomi | 4 | 5.5 | 6/21 | 1.78s |
| #121 | Owl Alpha none | OpenRouter | 3 | 5.5 | 7/21 | 9.88s |
| #122 | GLM 4.7 Flash none | Z.ai | 1 | 5.5 | 6/21 | 2.86s |
| #120 | Mimo V2 PRO none | Xiaomi | 2 | 5.6 | 7/21 | 2.27s |
| #119 | Cobuddy medium | Baidu | 3 | 5.6 | 7/21 | 39.9s |
| #118 | Qwen3.6 27B none | Qwen | 2 | 5.6 | 7/21 | 3.72s |
| #117 | Qwen3.5-35B-A3B none | Qwen | 2 | 5.6 | 7/21 | 3.37s |
| #116 | Hunter Alpha none | OpenRouter | 2 | 5.7 | 6/18 | 4.70s |
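
The page sorts these rows by score, but the same 15 rows can also be ranked by failure count. The following is a minimal sketch, not part of AI BENCHY itself: the `Row` type, the `rank_by_failures` helper, and the tie-breaking rule (lower score first) are illustrative assumptions, and the data is transcribed by hand from the rows listed here. It covers only the listed rows, so it says nothing about models that are not shown, such as the Granite 4.1 8B model named in the summary.

```python
from typing import NamedTuple


class Row(NamedTuple):
    """One leaderboard row (hypothetical structure, for illustration only)."""
    model: str
    failures: int  # "Did not follow instructions" count
    score: float


# Transcribed from the 15 rows listed on the page.
ROWS = [
    Row("Qwen3.5-122B-A10B", 2, 5.3),
    Row("MiniMax M2.7", 5, 5.3),
    Row("MiniMax M2.5", 3, 5.3),
    Row("Qwen3.6 Flash", 1, 5.4),
    Row("gpt-oss-120b", 2, 5.4),
    Row("Kimi K2.6", 3, 5.5),
    Row("GPT-5.4", 1, 5.5),
    Row("MiMo-V2.5-Pro", 4, 5.5),
    Row("Owl Alpha", 3, 5.5),
    Row("GLM 4.7 Flash", 1, 5.5),
    Row("Mimo V2 PRO", 2, 5.6),
    Row("Cobuddy", 3, 5.6),
    Row("Qwen3.6 27B", 2, 5.6),
    Row("Qwen3.5-35B-A3B", 2, 5.6),
    Row("Hunter Alpha", 2, 5.7),
]


def rank_by_failures(rows):
    """Sort by failure count (highest first); ties go to the lower score."""
    return sorted(rows, key=lambda r: (-r.failures, r.score))


# Failures across the listed rows only.
shown_total = sum(r.failures for r in ROWS)
worst = rank_by_failures(ROWS)[0]

print(shown_total)                  # 36
print(worst.model, worst.failures)  # MiniMax M2.7 5
```

The listed rows account for 36 failures, well below the page total of 215, so the total presumably includes models outside this 15-row view.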

[Chart: Top models by "Did not follow instructions" count]

[Chart: "Did not follow instructions" count vs. Score]

[Chart: Top models by response time (avg)]