AI BENCHY


Data parsing and extraction: Wrong answer


See which AI models are most likely to produce a Wrong answer failure on Data parsing and extraction tests, so you can spot weak points faster.

Models Shown: 15
Total Failures: 19
Most Affected Model: GPT-5 Nano (2 wrong answers)

Rank  Model                    Company      Wrong answer Count  Category Score  Tests Correct  Response Time (avg)
#57   GPT-5 Nano medium        OpenAI       2                   3.7             0/2            21.4s
#71   MiniMax M2.5 medium      Minimax      2                   4.6             0/2            7.48s
#98   LFM2-24B-A2B none        Liquid       2                   3.0             0/2            714ms
#23   MiMo-V2-Pro medium       Xiaomi       1                   7.3             1/2            17.2s
#54   Mercury 2 medium         Inception    1                   7.3             1/2            1.11s
#64   DeepSeek V3.2 none       DeepSeek     1                   6.3             1/2            9.42s
#68   gpt-oss-120b medium      OpenAI       1                   6.4             1/2            1.98s
#74   GLM 4.7 Flash none       Z.ai         1                   7.3             1/2            4.82s
#76   Kimi K2.5 none           Moonshot AI  1                   7.3             1/2            42.1s
#80   MiniMax M2.7 medium      Minimax      1                   6.3             1/2            21.9s
#81   Elephant medium          Openrouter   1                   6.5             1/2            979ms
#85   Elephant none            Openrouter   1                   6.5             1/2            1.04s
#87   Qwen3 Coder Next none    Qwen         1                   6.5             1/2            1.32s
#91   Mercury 2 none           Inception    1                   7.3             1/2            667ms
#92   Qwen3 Coder Next medium  Qwen         1                   6.5             1/2            81.8s

[Charts: Top Models by Wrong answer Count; Wrong answer Count vs Score; Top Models by Response Time (avg); Top Models by Estimated Wasted Cost]