AI BENCHY

AI BENCHY failures

"Did not follow instructions" failures

See which AI models hit "Did not follow instructions" failures most often, so you can spot reliability risks before choosing a model.

Models shown

15

Total failures

215

Most affected model

MiniMax M2.7 (5)
Rank  Model                                 Company    "Did not follow instructions" count  Score  Correct tests  Response time (avg)
#26   Qwen3.6 Plus medium                   Qwen       1                                    7.9    14/21          30.7s
#28   Gemini 2.5 Flash medium               Google     1                                    7.8    14/21          15.5s
#32   Gemini 3.5 Flash minimal              Google     1                                    7.7    14/21          1.57s
#33   Hy3 preview medium                    Tencent    1                                    7.7    14/21          16.3s
#39   Qwen3.6 Flash medium                  Qwen       1                                    7.5    12/21          19.2s
#40   Gemini 3.1 Flash Lite Preview medium  Google     1                                    7.5    13/21          3.96s
#44   Gemini 3.1 Flash Lite medium          Google     1                                    7.5    13/21          3.23s
#46   Qwen3.6 35B A3B medium                Qwen       1                                    7.4    13/21          18.1s
#49   Qwen3.5-Flash medium                  Qwen       1                                    7.4    12/21          63.3s
#50   Gemini 3.1 Flash Lite Preview low     Google     1                                    7.4    13/21          2.77s
#51   Mimo V2 PRO medium                    Xiaomi     1                                    7.4    12/21          22.2s
#56   MiMo-V2.5 medium                      Xiaomi     1                                    7.3    12/21          27.1s
#59   GLM 5V Turbo medium                   Z.ai       1                                    7.2    11/21          23.1s
#64   MiMo-V2-Flash medium                  Xiaomi     1                                    7.2    12/21          20.1s
#68   Claude Opus 4.8 none                  Anthropic  1                                    7.0    12/21          3.47s
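
The "correct tests" column (for example 14/21) and the average response time can be easier to compare as a pass rate and a sorted list. The following is a minimal Python sketch of that arithmetic, not something the site itself provides. It uses three rows copied from the table; the names `rows`, `pass_rate`, and `summary` are hypothetical and chosen only for illustration.

```python
# Three rows copied from the table: (model, correct tests, average response time in seconds).
# The variable and function names here are illustrative assumptions, not part of AI BENCHY.
rows = [
    ("Qwen3.6 Plus medium", "14/21", 30.7),
    ("Gemini 3.5 Flash minimal", "14/21", 1.57),
    ("Claude Opus 4.8 none", "12/21", 3.47),
]


def pass_rate(correct_tests: str) -> float:
    """Convert a 'passed/total' string such as '14/21' into a fraction between 0 and 1."""
    passed, total = correct_tests.split("/")
    return int(passed) / int(total)


# Compute the pass rate for each row, then order the rows from fastest to slowest.
summary = sorted(
    ((name, pass_rate(tests), seconds) for name, tests, seconds in rows),
    key=lambda row: row[2],
)

for name, rate, seconds in summary:
    print(f"{name}: {rate:.1%} of tests correct, {seconds}s average response time")
```

Sorting by response time alone ignores accuracy, so a pass rate next to each time gives a fairer picture when choosing between a fast model and a more accurate one.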

Top models by "Did not follow instructions" count

"Did not follow instructions" count vs. score

Top models by response time (average)