AI BENCHY

AI BENCHY Category Failures

Instructions following: No answer

See which AI models most often fail Instructions following tests with No answer, so you can spot weak points faster.

Models Shown: 2
Total Failures: 2
Most Affected Model: Gemini 3.1 Flash Lite (1 failure)

Charts (not reproduced): Top Models by No answer Count; No answer Count vs Score; Top Models by Response Time (avg); Top Models by Estimated Wasted Cost.