AI BENCHY Category Failures
Anti-AI Tricks: Wrong answer
See which AI models most often return a wrong answer on Anti-AI Tricks tests, so you can spot weak points faster. Sorted by Tests Correct, ascending.
Failure Reasons
| Rank | Model | Company | Wrong answer Count | Category Score | Tests Correct | Response Time (avg) |
|---|---|---|---|---|---|---|
| #86 | Grok 4.1 Fast medium | X AI | 1 | 8.7 | 3/4 | 3.81s |
| #87 | Gemini 3.1 Flash Lite minimal | | 1 | 8.3 | 3/4 | 1.10s |
| #100 | Grok Build 0.1 none | X AI | 1 | 8.7 | 3/4 | 6.30s |
| #102 | Gemma 4 26B A4B none | | 1 | 8.3 | 3/4 | 1.28s |
| #119 | Cobuddy medium | Baidu | 1 | 8.7 | 3/4 | 10.00s |