AI BENCHY Category
Anti-AI Tricks Ranking
See which AI models perform best on Anti-AI Tricks, which ones stay reliable, and where the biggest gaps appear.
| Overall Rank | Model | Company | Anti-AI Tricks Score | Overall Score | Tests Correct | Response Time (avg) |
|---|---|---|---|---|---|---|
| #136 | Elephant Alpha medium | Openrouter | 6.6 | 5.1 | 2/4 | 1.19s |
| #137 | Elephant Alpha none | Openrouter | 6.6 | 5.1 | 2/4 | 963ms |
| #42 | GPT-5.2 medium | OpenAI | 6.5 | 7.5 | 2/4 | 7.81s |
| #32 | Gemini 3.5 Flash minimal | | 6.5 | 7.7 | 2/4 | 892ms |
| #34 | Qwen3.7 Max none | Qwen | 6.5 | 7.7 | 2/4 | 1.08s |
| #52 | Claude Sonnet 4.6 medium | Anthropic | 6.5 | 7.4 | 2/4 | 2.98s |
| #68 | Claude Opus 4.8 none | Anthropic | 6.5 | 7.0 | 2/4 | 3.40s |
| #85 | Gemma 4 31B none | | 6.5 | 6.5 | 2/4 | 1.85s |
| #88 | Qwen3.7 Plus none | Qwen | 6.5 | 6.4 | 2/4 | 1.38s |
| #92 | Laguna M.1 medium | Poolside | 6.5 | 6.4 | 2/4 | 4.87s |
| #126 | gpt-oss-120b none | OpenAI | 6.5 | 5.4 | 2/4 | 32.8s |
| #94 | GPT-5 Nano medium | OpenAI | 6.5 | 6.3 | 2/4 | 25.5s |
| #82 | Hy3 preview high | Tencent | 6.4 | 6.6 | 2/4 | 15.1s |
| #103 | DeepSeek V4 Pro high | DeepSeek | 6.4 | 6.0 | 2/4 | 16.5s |
| #149 | Nemotron 3 Nano Omni 30b A3b Reasoning medium | NVIDIA | 6.4 | 4.6 | 2/4 | 1.20s |