Gemini 2.5 Flash vs Qwen3.5-27B benchmark comparison: Gemini 2.5 Flash leads on average score with 8.2 vs 7.9. Gemini 2.5 Flash has the lower benchmark cost at $0.379 vs $0.536. Gemini 2.5 Flash is faster at 15.49s vs 68.39s, with pass rates of 69.8% vs 73.0%.
Recommended model: Gemini 2.5 Flash - It has the best score here (8.2), while responding about 4.4x faster than Qwen3.5-27B.
8.2Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
7.9Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
Rank
#24
#29
Reliability
10.0First-attempt success score: 10.0 means no retryable target API or rate-limit failures before successful calls; tracked failures lower the score.…
10.0First-attempt success score: 10.0 means no retryable target API or rate-limit failures before successful calls; tracked failures lower the score.…
Consistency
9.6Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
8.5Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
Tests Correct
A test is fully passed only if every run passed for that test.Wrong answer: 6Did not follow instructions: 1Response Time (avg)15.49sResponse Time (max)95.48sResponse Time (total)325.39sA test is fully passed only if every run passed for that test.…
8.4Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
10.0Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
75.0%Attempt pass rate = passed attempts / total attempts across runs.…
0Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.Wrong answer: 1Response Time (avg)6.30sResponse Time (max)15.56sResponse Time (total)25.21sA test is fully passed only if every run passed for that test.…
8.7Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
7.9Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
91.7%Attempt pass rate = passed attempts / total attempts across runs.…
1Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.Extra formatting: 1Response Time (avg)19.75sResponse Time (max)49.95sResponse Time (total)79.01sA test is fully passed only if every run passed for that test.…
7.8Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
10.0Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
66.7%Attempt pass rate = passed attempts / total attempts across runs.…
0Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.Wrong answer: 1Response Time (avg)41.01sResponse Time (max)92.88sResponse Time (total)123.04sA test is fully passed only if every run passed for that test.…
6.2Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
7.1Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
55.6%Attempt pass rate = passed attempts / total attempts across runs.…
1Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.Wrong answer: 2Response Time (avg)160.69sResponse Time (max)234.36sResponse Time (total)482.07sA test is fully passed only if every run passed for that test.…
10.0Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
10.0Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
100.0%Attempt pass rate = passed attempts / total attempts across runs.…
0Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.No failed answers.Response Time (avg)28.44sResponse Time (max)28.44sResponse Time (total)28.44sA test is fully passed only if every run passed for that test.…
10.0Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
10.0Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
100.0%Attempt pass rate = passed attempts / total attempts across runs.…
0Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.No failed answers.Response Time (avg)163.96sResponse Time (max)163.96sResponse Time (total)163.96sA test is fully passed only if every run passed for that test.…
10.0Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
10.0Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
100.0%Attempt pass rate = passed attempts / total attempts across runs.…
0Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.No failed answers.Response Time (avg)4.06sResponse Time (max)5.06sResponse Time (total)8.11sA test is fully passed only if every run passed for that test.…
10.0Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
10.0Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
100.0%Attempt pass rate = passed attempts / total attempts across runs.…
0Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.No failed answers.Response Time (avg)30.26sResponse Time (max)32.03sResponse Time (total)60.52sA test is fully passed only if every run passed for that test.…
5.9Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
7.2Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
55.6%Attempt pass rate = passed attempts / total attempts across runs.…
1Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.Wrong answer: 2Response Time (avg)37.34sResponse Time (max)95.48sResponse Time (total)112.01sA test is fully passed only if every run passed for that test.…
5.3Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
10.0Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
33.3%Attempt pass rate = passed attempts / total attempts across runs.…
0Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.Timed out: 1Wrong answer: 1Response Time (avg)79.53sResponse Time (max)95.52sResponse Time (total)238.59sA test is fully passed only if every run passed for that test.…
4.8Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
10.0Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
0.0%Attempt pass rate = passed attempts / total attempts across runs.…
0Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.Did not follow instructions: 1Response Time (avg)4.86sResponse Time (max)4.86sResponse Time (total)4.86sA test is fully passed only if every run passed for that test.…
6.1Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
3.1Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
66.7%Attempt pass rate = passed attempts / total attempts across runs.…
1Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.Did not follow instructions: 1Response Time (avg)101.41sResponse Time (max)101.41sResponse Time (total)101.41sA test is fully passed only if every run passed for that test.…
9.8Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
10.0Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
100.0%Attempt pass rate = passed attempts / total attempts across runs.…
0Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.No failed answers.Response Time (avg)2.62sResponse Time (max)2.78sResponse Time (total)5.24sA test is fully passed only if every run passed for that test.…
10.0Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
10.0Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
100.0%Attempt pass rate = passed attempts / total attempts across runs.…
0Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.No failed answers.Response Time (avg)19.66sResponse Time (max)32.25sResponse Time (total)39.32sA test is fully passed only if every run passed for that test.…
7.7Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
10.0Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
66.7%Attempt pass rate = passed attempts / total attempts across runs.…
0Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.Wrong answer: 1Response Time (avg)3.18sResponse Time (max)4.05sResponse Time (total)9.54sA test is fully passed only if every run passed for that test.…
8.2Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
7.7Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
77.8%Attempt pass rate = passed attempts / total attempts across runs.…
1Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.Did not follow instructions: 1Response Time (avg)59.60sResponse Time (max)123.57sResponse Time (total)178.80sA test is fully passed only if every run passed for that test.…
10.0Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
10.0Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
100.0%Attempt pass rate = passed attempts / total attempts across runs.…
0Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.No failed answers.Response Time (avg)6.20sResponse Time (max)6.20sResponse Time (total)6.20sA test is fully passed only if every run passed for that test.…
10.0Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
10.0Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
100.0%Attempt pass rate = passed attempts / total attempts across runs.…
0Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.No failed answers.Response Time (avg)7.45sResponse Time (max)7.45sResponse Time (total)7.45sA test is fully passed only if every run passed for that test.…
3.0Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
10.0Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
0.0%Attempt pass rate = passed attempts / total attempts across runs.…
0Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.Wrong answer: 1Response Time (avg)2.76sResponse Time (max)2.76sResponse Time (total)2.76sA test is fully passed only if every run passed for that test.…
3.0Summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.…
10.0Consistency score reflects run-to-run stability (10 = very consistent, even if consistently wrong).…
0.0%Attempt pass rate = passed attempts / total attempts across runs.…
0Flaky tests had mixed outcomes across runs (at least one pass and one fail).…
A test is fully passed only if every run passed for that test.Wrong answer: 1Response Time (avg)85.11sResponse Time (max)85.11sResponse Time (total)85.11sA test is fully passed only if every run passed for that test.…