AI BENCHY
❤️ Made by XCS


Data parsing and extraction Ranking

See which AI models perform best on data parsing and extraction, which ones stay reliable, and where the biggest gaps appear. The table below is sorted by average response time, ascending.

Models shown: 55
Average Data parsing and extraction score: 8.8

| Rank | Model | Reasoning | Company | Data parsing and extraction Score | Avg Score | Tests Correct | Response Time (avg) |
|---|---|---|---|---|---|---|---|
| #21 | MiMo-V2-Flash | medium | Xiaomi | 5.5 | 7.2 | 1/2 | 0ms |
| #38 | Gemini 2.5 Flash | none | Google | 9.9 | 5.2 | 2/2 | 652ms |
| #51 | Mercury 2 | none | Inception | 5.5 | 3.4 | 1/2 | 667ms |
| #55 | LFM2-24B-A2B | none | Liquid | 10.0 | 2.6 | 0/2 | 714ms |
| #53 | Grok 4.1 Fast | none | X AI | 9.9 | 2.9 | 2/2 | 943ms |
| #40 | Qwen3.5-122B-A10B | none | Qwen | 9.9 | 5.0 | 2/2 | 1.01s |
| #44 | GPT-5.4 | none | OpenAI | 9.9 | 4.5 | 2/2 | 1.04s |
| #36 | Mercury 2 | medium | Inception | 5.5 | 5.3 | 1/2 | 1.11s |
| #42 | Qwen3.5-35B-A3B | none | Qwen | 9.9 | 4.7 | 2/2 | 1.16s |
| #22 | Gemini 3.1 Flash Lite Preview | none | Google | 9.9 | 7.1 | 2/2 | 1.22s |
| #47 | GPT-4o-mini | none | OpenAI | 9.9 | 4.0 | 2/2 | 1.27s |
| #48 | Qwen3 Coder Next | none | Qwen | 5.4 | 4.0 | 1/2 | 1.32s |
| #20 | Gemini 3 Flash Preview | none | Google | 9.9 | 7.2 | 2/2 | 1.41s |
| #41 | Qwen3.5-27B | none | Qwen | 9.9 | 4.9 | 2/2 | 1.43s |
| #52 | GLM 4.7 Flash | medium | Z.ai | 5.0 | 3.1 | 1/2 | 1.51s |
| #37 | Qwen3.5-Flash | none | Qwen | 9.9 | 5.2 | 2/2 | 1.57s |
| #29 | Qwen3.5 Plus 2026-02-15 | none | Qwen | 9.9 | 6.2 | 2/2 | 1.89s |
| #39 | gpt-oss-120b | medium | OpenAI | 5.5 | 5.1 | 1/2 | 1.98s |
| #19 | GPT-5.3 Chat | none | OpenAI | 9.9 | 7.3 | 2/2 | 2.21s |
| #12 | Gemini 3.1 Flash Lite Preview | medium | Google | 9.9 | 7.5 | 2/2 | 2.29s |
| #17 | Gemini 3.1 Flash Lite Preview | low | Google | 9.9 | 7.3 | 2/2 | 3.00s |
| #15 | GPT-5.2 Chat | none | OpenAI | 9.9 | 7.4 | 2/2 | 3.05s |
| #3 | GPT-5.3-Codex | medium | OpenAI | 9.9 | 8.4 | 2/2 | 3.07s |
| #27 | GPT-5.2 | medium | OpenAI | 9.9 | 6.5 | 2/2 | 3.15s |
| #45 | Trinity Large Preview | none | Arcee AI | 9.9 | 4.2 | 2/2 | 3.26s |
| #25 | Claude Sonnet 4.6 | none | Anthropic | 9.9 | 6.8 | 2/2 | 3.43s |
| #16 | Gemini 2.5 Flash | medium | Google | 9.9 | 7.4 | 2/2 | 4.06s |
| #1 | Gemini 3 Flash Preview | medium | Google | 9.9 | 10.0 | 2/2 | 4.72s |
| #49 | GLM 4.7 Flash | none | Z.ai | 5.4 | 3.9 | 1/2 | 4.82s |
| #9 | GPT-5.4 | medium | OpenAI | 9.9 | 8.0 | 2/2 | 5.32s |
| #31 | GLM 5 | none | Z.ai | 9.9 | 6.0 | 2/2 | 5.78s |
| #30 | Grok 4.1 Fast | medium | X AI | 9.9 | 6.2 | 2/2 | 6.63s |
| #8 | Gemini 3.1 Flash Lite Preview | high | Google | 9.9 | 8.2 | 2/2 | 7.16s |
| #26 | Claude Opus 4.6 | medium | Anthropic | 9.9 | 6.6 | 2/2 | 7.37s |
| #43 | MiniMax M2.5 | medium | Minimax | 10.0 | 4.7 | 0/2 | 7.48s |
| #2 | Gemini 3.1 Pro Preview | medium | Google | 9.9 | 9.4 | 2/2 | 7.72s |
| #14 | GLM 5 | medium | Z.ai | 5.0 | 7.4 | 1/2 | 8.90s |
| #5 | Gemini 3 Flash Preview | low | Google | 9.9 | 8.2 | 2/2 | 9.40s |
| #33 | DeepSeek V3.2 | none | DeepSeek | 5.4 | 5.5 | 1/2 | 9.42s |
| #6 | Gemini 3 Pro Preview | medium | Google | 9.9 | 8.2 | 2/2 | 10.8s |
| #32 | GPT-5 Mini | medium | OpenAI | 9.9 | 6.0 | 2/2 | 12.6s |
| #11 | Claude Sonnet 4.6 | medium | Anthropic | 9.9 | 7.7 | 2/2 | 13.9s |
| #13 | Step 3.5 Flash | medium | Stepfun | 10.0 | 7.4 | 2/2 | 15.0s |
| #54 | MiMo-V2-Flash | none | Xiaomi | 10.0 | 2.9 | 0/2 | 19.7s |
| #34 | GPT-5 Nano | medium | OpenAI | 10.0 | 5.5 | 0/2 | 21.4s |
| #10 | Qwen3.5-122B-A10B | medium | Qwen | 9.9 | 7.7 | 2/2 | 23.4s |
| #23 | Seed-2.0-Mini | medium | Bytedance Seed | 9.9 | 6.9 | 2/2 | 24.3s |
| #7 | Qwen3.5-27B | medium | Qwen | 9.9 | 8.2 | 2/2 | 30.3s |
| #18 | DeepSeek V3.2 | medium | DeepSeek | 9.9 | 7.3 | 2/2 | 36.1s |
| #46 | Kimi K2.5 | none | Moonshot AI | 5.4 | 4.1 | 1/2 | 42.1s |
| #4 | Qwen3.5 Plus 2026-02-15 | medium | Qwen | 9.9 | 8.3 | 2/2 | 46.9s |
| #28 | Kimi K2.5 | medium | Moonshot AI | 9.9 | 6.4 | 2/2 | 49.8s |
| #24 | Qwen3.5-Flash | medium | Qwen | 5.5 | 6.9 | 1/2 | 57.0s |
| #35 | Qwen3.5-35B-A3B | medium | Qwen | 5.5 | 5.5 | 1/2 | 59.3s |
| #50 | Qwen3 Coder Next | medium | Qwen | 5.4 | 3.5 | 1/2 | 81.8s |
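The "Response Time (avg)" column mixes millisecond and second display units, so sorting it as text would be wrong. A minimal sketch of how the rows can be normalized and ordered, using three rows from the table above (the helper function is illustrative, not part of AI BENCHY):

```python
def parse_response_time(value: str) -> float:
    """Convert a display string like '652ms' or '1.01s' to seconds."""
    if value.endswith("ms"):
        return float(value[:-2]) / 1000.0
    if value.endswith("s"):
        return float(value[:-1])
    raise ValueError(f"unrecognized time format: {value!r}")

# Three rows from the leaderboard: (rank, model, avg response time).
rows = [
    (1, "Gemini 3 Flash Preview", "4.72s"),
    (21, "MiMo-V2-Flash", "0ms"),
    (38, "Gemini 2.5 Flash", "652ms"),
]

# Sort ascending by normalized response time, matching the table's order.
ordered = sorted(rows, key=lambda r: parse_response_time(r[2]))
print([model for _, model, _ in ordered])
# → ['MiMo-V2-Flash', 'Gemini 2.5 Flash', 'Gemini 3 Flash Preview']
```

The `ms` suffix must be checked before the bare `s` suffix, since every millisecond value also ends in `s`.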

[Charts: Top Models by Data parsing and extraction Score · Data parsing and extraction Score vs Total Cost · Top Models by Response Time (avg)]