AI BENCHY

AI BENCHY category failures

Category: Anti-AI techniques
Failure type: Incorrect answer

See which AI models are most likely to give an incorrect answer in the Anti-AI techniques category, so you can spot weaknesses quickly.

Models shown: 26

Total failures: 53

Most affected model: Qwen3.5-Flash (3 incorrect answers)
| Rank | Model | Company | Incorrect answers | Category score | Correct attempts | Response time (avg) |
|---|---|---|---|---|---|---|
| #37 | Qwen3.5-Flash none | Qwen | 3 | 2.3 | 0/3 | 1.62s |
| #38 | Gemini 2.5 Flash none | Google | 3 | 10.0 | 0/3 | 668ms |
| #42 | Qwen3.5-35B-A3B none | Qwen | 3 | 10.0 | 0/3 | 1.76s |
| #44 | GPT-5.4 none | OpenAI | 3 | 10.0 | 0/3 | 1.41s |
| #45 | Trinity Large Preview none | Arcee AI | 3 | 10.0 | 0/3 | 3.59s |
| #46 | Kimi K2.5 none | Moonshot AI | 3 | 2.7 | 0/3 | 11.4s |
| #49 | GLM 4.7 Flash none | Z.ai | 3 | 10.0 | 0/3 | 6.59s |
| #51 | Mercury 2 none | Inception | 3 | 10.0 | 0/3 | 466ms |
| #54 | MiMo-V2-Flash none | Xiaomi | 3 | 10.0 | 0/3 | 1.36s |
| #55 | LFM2-24B-A2B none | Liquid | 3 | 10.0 | 0/3 | 471ms |
| #29 | Qwen3.5 Plus 2026-02-15 none | Qwen | 2 | 4.0 | 1/3 | 2.74s |
| #31 | GLM 5 none | Z.ai | 2 | 4.0 | 1/3 | 3.39s |
| #40 | Qwen3.5-122B-A10B none | Qwen | 2 | 4.0 | 1/3 | 927ms |
| #41 | Qwen3.5-27B none | Qwen | 2 | 4.0 | 1/3 | 796ms |
| #47 | GPT-4o-mini none | OpenAI | 2 | 4.0 | 1/3 | 1.83s |
| #50 | Qwen3 Coder Next medium | Qwen | 2 | 1.3 | 0/3 | 15.3s |
| #53 | Grok 4.1 Fast none | X AI | 2 | 1.3 | 0/3 | 1.73s |
| #16 | Gemini 2.5 Flash medium | Google | 1 | 7.3 | 2/3 | 6.98s |
| #17 | Gemini 3.1 Flash Lite Preview low | Google | 1 | 7.0 | 2/3 | 2.18s |
| #18 | DeepSeek V3.2 medium | DeepSeek | 1 | 7.0 | 2/3 | 33.4s |
| #20 | Gemini 3 Flash Preview none | Google | 1 | 7.0 | 2/3 | 1.59s |
| #22 | Gemini 3.1 Flash Lite Preview none | Google | 1 | 6.0 | 1/3 | 1.16s |
| #33 | DeepSeek V3.2 none | DeepSeek | 1 | 10.0 | 0/3 | 8.79s |
| #34 | GPT-5 Nano medium | OpenAI | 1 | 7.0 | 2/3 | 37.7s |
| #48 | Qwen3 Coder Next none | Qwen | 1 | 2.3 | 0/3 | 4.39s |
| #52 | GLM 4.7 Flash medium | Z.ai | 1 | 4.0 | 1/3 | 27.1s |
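
The summary figures at the top of the page (26 models, 53 failures, a maximum of 3 per model) can be cross-checked against the table rows. The sketch below is only an illustration, not AI BENCHY's own code: the variable names are hypothetical, and the counts are copied by hand from the incorrect-answer column in row order.

```python
# Incorrect-answer counts copied from the table, in row order:
# ten models with 3, seven models with 2, nine models with 1.
incorrect_counts = [3] * 10 + [2] * 7 + [1] * 9

models_shown = len(incorrect_counts)        # number of table rows
total_failures = sum(incorrect_counts)      # sum of the incorrect-answer column
worst_count = max(incorrect_counts)         # highest count for any single model
tied_for_worst = incorrect_counts.count(worst_count)

print(models_shown, total_failures, worst_count, tied_for_worst)
# prints: 26 53 3 10
```

Ten models share the maximum of 3 incorrect answers. The page names Qwen3.5-Flash, which is the first of those rows in the table; how the site breaks that tie is not stated.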

Chart: Top models by incorrect answer count

Chart: Incorrect answer count vs. average score

Chart: Top models by response time (average)

Chart: Top models by estimated wasted cost