AI BENCHY

AI BENCHY category failures

Puzzle solving: Wrong answer


See which AI models are most likely to give a wrong answer in puzzle solving, so you can spot weaknesses quickly.

Models shown: 15

Total failures: 85

Most affected model: Kimi K2.5 (3 wrong answers)
Rank | Model | Reasoning effort | Company | Wrong answers | Category score | Correct attempts | Avg. response time
#76 | Kimi K2.5 | none | Moonshot AI | 3 | 3.1 | 0/3 | 4.73 s
#87 | Qwen3 Coder Next | none | Qwen | 3 | 3.2 | 0/3 | 22.9 s
#89 | GPT-4o-mini | none | OpenAI | 3 | 3.7 | 0/3 | 1.30 s
#91 | Mercury 2 | none | Inception | 3 | 3.1 | 0/3 | 0.533 s
#94 | MiMo-V2-Flash | none | Xiaomi | 3 | 3.6 | 0/3 | 1.38 s
#95 | Grok 4.1 Fast | none | X AI | 3 | 3.2 | 0/3 | 1.28 s
#59 | Qwen3.5-Flash | none | Qwen | 2 | 3.3 | 0/3 | 5.90 s
#61 | Seed-2.0-Lite | none | Bytedance Seed | 2 | 5.2 | 1/3 | 2.46 s
#63 | Qwen3.5-35B-A3B | none | Qwen | 2 | 3.9 | 0/3 | 1.34 s
#70 | Qwen3.5-122B-A10B | none | Qwen | 2 | 5.4 | 1/3 | 0.982 s
#78 | Trinity Large Preview | none | Arcee AI | 2 | 5.4 | 1/3 | 3.30 s
#85 | Elephant | none | Openrouter | 2 | 3.3 | 0/3 | 0.849 s
#93 | GLM 4.7 Flash | medium | Z.ai | 2 | 2.9 | 0/3 | 12.9 s
#96 | GPT-5.4 Nano | none | OpenAI | 2 | 3.7 | 0/3 | 1.29 s
#11 | Gemini 3.1 Flash Lite Preview | high | Google | 1 | 7.7 | 2/3 | 46.3 s
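The row order in the table above appears to follow a simple rule: most wrong answers first, with ties broken by leaderboard rank (lower rank number first). The Python below is a minimal sketch assuming that rule, which is inferred from the rows shown rather than documented by AI BENCHY. The `Row` type and the helpers `latency_seconds` and `order_by_failures` are hypothetical names, only four rows are included for brevity, and the latency helper accepts values written either in milliseconds (e.g. `533ms`) or seconds (e.g. `4.73s`).

```python
from dataclasses import dataclass


@dataclass
class Row:
    rank: int      # leaderboard position, e.g. 76 for "#76"
    model: str
    wrong: int     # number of wrong answers in puzzle solving
    latency: str   # average response time as written, e.g. "533ms" or "4.73s"


def latency_seconds(text: str) -> float:
    """Convert a latency string such as '533ms' or '4.73s' to seconds."""
    text = text.replace(" ", "")
    if text.endswith("ms"):
        return float(text[:-2]) / 1000.0
    if text.endswith("s"):
        return float(text[:-1])
    raise ValueError(f"unrecognized latency: {text!r}")


def order_by_failures(rows):
    """Assumed ordering: most wrong answers first, then lower rank number first."""
    return sorted(rows, key=lambda r: (-r.wrong, r.rank))


# A subset of the table, deliberately shuffled.
rows = [
    Row(11, "Gemini 3.1 Flash Lite Preview", 1, "46.3s"),
    Row(91, "Mercury 2", 3, "533ms"),
    Row(61, "Seed-2.0-Lite", 2, "2.46s"),
    Row(76, "Kimi K2.5", 3, "4.73s"),
]

for r in order_by_failures(rows):
    print(f"#{r.rank} {r.model}: {r.wrong} wrong, {latency_seconds(r.latency):.3f} s")
```

Sorting on the tuple `(-wrong, rank)` applies both criteria in a single stable pass, so Kimi K2.5 (#76) lands ahead of Mercury 2 (#91) even though both have three wrong answers.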

Chart: Best models by wrong-answer count

Chart: Wrong-answer count vs. score

Chart: Best models by average response time

Chart: Best models by estimated wasted cost