AI BENCHY category failures
Anti-AI Tricks: Did not follow instructions
See which AI models are most likely to fail with "Did not follow instructions" in the Anti-AI Tricks category, so you can spot weaknesses quickly. Sorted by: Total cost ↓.
| Rank | Model | Company | "Did not follow instructions" count | Category score | Total cost | Correct tests | Response time (avg) |
|---|---|---|---|---|---|---|---|
| #35 | Kimi K2.6 medium | Moonshot AI | 1 | 7.0 | $0.889 | 2/4 | 11.6s |
| #22 | GPT-5.2 medium | OpenAI | 1 | 6.5 | $0.548 | 2/4 | 7.81s |
| #56 | GLM 5V Turbo medium | Z.ai | 1 | 7.2 | $0.457 | 2/4 | 10.8s |
| #45 | GPT-5.3 Chat none | OpenAI | 1 | 6.7 | $0.433 | 2/4 | 3.86s |
| #20 | Step 3.7 Flash medium | Stepfun | 1 | 8.7 | $0.376 | 3/4 | 9.65s |
| #146 | MiniMax M2.5 medium | Minimax | 1 | 7.9 | $0.303 | 2/4 | 20.8s |
| #16 | GPT-5 Mini medium | OpenAI | 1 | 7.1 | $0.159 | 2/4 | 13.9s |
| #40 | MiniMax M3 medium | Minimax | 1 | 5.5 | $0.131 | 1/4 | 14.9s |
| #127 | MiniMax M2.7 medium | Minimax | 1 | 7.9 | $0.104 | 2/4 | 40.3s |
| #34 | Gemini 3.1 Flash Lite medium | Google | 1 | 9.1 | $0.071 | 3/4 | 2.39s |
| #32 | Gemini 3.1 Flash Lite Preview medium | Google | 1 | 9.1 | $0.068 | 3/4 | 2.33s |
| #44 | Mercury 2 medium | Inception | 1 | 6.9 | $0.058 | 2/4 | 1.12s |
| #157 | GLM 4.7 Flash medium | Z.ai | 1 | 4.7 | $0.054 | 1/4 | 15.0s |
| #58 | DeepSeek V4 Pro none | DeepSeek | 1 | 3.2 | $0.034 | 0/4 | 4.02s |
| #144 | Ring-2.6-1T none | Inclusionai | 1 | 9.2 | $0.026 | 3/4 | 43.3s |