AI BENCHY

AI BENCHY Failures

No response failures

See which AI models run into No response failures most often, so you can spot reliability risks before choosing one.

Models shown

15

Total failures

43

Most affected model

Step 3.7 Flash 4

| Rank | Model | Company | No response count | Score | Correct tests | Response time (avg) |
| --- | --- | --- | --- | --- | --- | --- |
| #46 | Qwen3.6 35B A3B medium | Qwen | 1 | 7.4 | 13/21 | 18.1s |
| #53 | Gemini 3.1 Flash Lite high | Google | 1 | 7.3 | 10/18 | 62.0s |
| #55 | GLM 5.1 medium | Z.ai | 1 | 7.3 | 12/21 | 33.7s |
| #56 | MiMo-V2.5 medium | Xiaomi | 1 | 7.3 | 12/21 | 27.1s |
| #57 | Step 3.7 Flash low | Stepfun | 1 | 7.3 | 12/21 | 15.7s |
| #60 | Kimi K2.6 medium | Moonshot AI | 1 | 7.2 | 12/21 | 71.7s |
| #62 | Step 3.5 Flash medium | Stepfun | 1 | 7.2 | 11/20 | 72.5s |
| #67 | MiniMax M3 medium | Minimax | 1 | 7.1 | 11/21 | 68.2s |
| #68 | Claude Opus 4.8 none | Anthropic | 1 | 7.0 | 12/21 | 3.47s |
| #86 | Grok 4.1 Fast medium | X AI | 1 | 6.5 | 9/19 | 23.8s |
| #92 | Laguna M.1 medium | Poolside | 1 | 6.4 | 9/19 | 14.7s |
| #105 | Nemotron 3 Super medium | NVIDIA | 1 | 5.8 | 8/21 | 32.0s |
| #129 | MiniMax M2.5 medium | Minimax | 1 | 5.3 | 5/21 | 65.4s |
| #130 | MiniMax M2.7 medium | Minimax | 1 | 5.3 | 5/21 | 38.2s |
| #149 | Nemotron 3 Nano Omni 30b A3b Reasoning medium | NVIDIA | 1 | 4.6 | 4/19 | 17.1s |
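
The Correct tests column uses different denominators per model (18 to 21 tests), so raw counts are not directly comparable. The following is a minimal sketch of how a reader might turn those cells into a pass rate. The names `ROWS`, `pass_rate`, and `rank_by_pass_rate` are hypothetical and not part of AI BENCHY, and this ratio is not the site's Score, whose formula is not given here.

```python
# Illustrative only: a few (model, "correct/attempted") pairs copied from the table.
ROWS = [
    ("Qwen3.6 35B A3B medium", "13/21"),
    ("Gemini 3.1 Flash Lite high", "10/18"),
    ("Claude Opus 4.8 none", "12/21"),
    ("Nemotron 3 Nano Omni 30b A3b Reasoning medium", "4/19"),
]


def pass_rate(cell: str) -> float:
    """Convert a 'correct/attempted' cell into a value between 0 and 1."""
    correct, attempted = (int(part) for part in cell.split("/"))
    if attempted <= 0:
        raise ValueError(f"no attempted tests in cell {cell!r}")
    return correct / attempted


def rank_by_pass_rate(rows):
    """Return (model, pass rate) pairs, highest pass rate first."""
    scored = [(model, pass_rate(cell)) for model, cell in rows]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for model, rate in rank_by_pass_rate(ROWS):
        print(f"{model}: {rate:.1%}")
```

Comparing rates rather than counts keeps a model that attempted fewer tests from looking worse than it is.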

Top models by No response count

No response count vs. Score

Top models by Response time (avg)