AI BENCHY

AI BENCHY category failures

Anti-AI tricks: Wrong answer


See which AI models are most likely to give a Wrong answer in the Anti-AI tricks category, so you can spot weaknesses quickly.

Models shown

15

Total failures

245

Most affected model

Gemini 2.5 Flash (4)
Rank | Model | Company | Wrong answers | Category score | Correct tests | Avg. response time
-----|-------|---------|---------------|----------------|---------------|-------------------
#142 | Mistral Small 4 (none) | Mistral | 4 | 3.4 | 0/4 | 395ms
#143 | MiMo-V2.5 (none) | Xiaomi | 4 | 3.5 | 0/4 | 2.19s
#144 | GPT-5.4 Mini (none) | OpenAI | 4 | 3.1 | 0/4 | 929ms
#148 | GPT-5.4 Nano (none) | OpenAI | 4 | 3.5 | 0/4 | 1.18s
#151 | Trinity Large Preview (none) | Arcee AI | 4 | 3.1 | 0/4 | 2.07s
#152 | MiMo-V2-Flash (none) | Xiaomi | 4 | 3.2 | 0/4 | 1.19s
#153 | Qwen3.6 35B A3B (none) | Qwen | 4 | 3.6 | 0/4 | 2.10s
#154 | Qwen3.5-9B (none) | Qwen | 4 | 3.1 | 0/4 | 1.71s
#155 | Mercury 2 (none) | Inception | 4 | 3.0 | 0/4 | 483ms
#159 | Ling-2.6-1T (none) | Inclusionai | 4 | 3.4 | 0/4 | 6.55s
#74 | Qwen3.6 Max Preview (none) | Qwen | 3 | 5.2 | 1/4 | 2.63s
#95 | Qwen3.5 Plus 2026-02-15 (none) | Qwen | 3 | 4.8 | 1/4 | 1.91s
#98 | GLM 5 (none) | Z.ai | 3 | 4.8 | 1/4 | 2.37s
#101 | Mimo V2 Omni (none) | Xiaomi | 3 | 3.6 | 0/4 | 1.63s
#109 | GLM 5V Turbo (none) | Z.ai | 3 | 4.8 | 1/4 | 3.13s
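The rows in the ranking table above appear to be ordered by wrong-answer count (highest first), with ties broken by overall rank. That ordering rule is inferred from the displayed rows, not documented by the site. The sketch below reproduces it from a subset of the table's columns and sums the wrong answers of the 15 models shown. The `Row` type and the `dashboard_order` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Row:
    rank: int      # overall rank shown in the table (without the "#")
    model: str
    wrong: int     # Wrong answer count in the Anti-AI tricks category
    score: float   # category score
    correct: int   # correct tests, out of 4


# Values copied from the table; only some columns are kept.
ROWS = [
    Row(142, "Mistral Small 4", 4, 3.4, 0),
    Row(143, "MiMo-V2.5", 4, 3.5, 0),
    Row(144, "GPT-5.4 Mini", 4, 3.1, 0),
    Row(148, "GPT-5.4 Nano", 4, 3.5, 0),
    Row(151, "Trinity Large Preview", 4, 3.1, 0),
    Row(152, "MiMo-V2-Flash", 4, 3.2, 0),
    Row(153, "Qwen3.6 35B A3B", 4, 3.6, 0),
    Row(154, "Qwen3.5-9B", 4, 3.1, 0),
    Row(155, "Mercury 2", 4, 3.0, 0),
    Row(159, "Ling-2.6-1T", 4, 3.4, 0),
    Row(74, "Qwen3.6 Max Preview", 3, 5.2, 1),
    Row(95, "Qwen3.5 Plus 2026-02-15", 3, 4.8, 1),
    Row(98, "GLM 5", 3, 4.8, 1),
    Row(101, "Mimo V2 Omni", 3, 3.6, 0),
    Row(109, "GLM 5V Turbo", 3, 4.8, 1),
]


def dashboard_order(rows):
    # Assumed rule: most wrong answers first, then better (lower) overall rank.
    return sorted(rows, key=lambda r: (-r.wrong, r.rank))


shown_wrong_total = sum(r.wrong for r in ROWS)
print(shown_wrong_total)
```

The 15 shown models account for 55 wrong answers. The "Total failures" figure of 245 is therefore presumably counted over more models than the 15 listed, although the page does not say so explicitly.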

Top models by Wrong answer count

Wrong answer count vs. score

Top models by average response time

Top models by estimated wasted cost