A test is fully passed only if every run passed for that test.
No answer: 2, Wrong answer: 2; Response time avg 17.01s, max 80.80s, total 357.17s…
Anti-AI Tricks: score 10.0; No failed answers; Response time avg 6.20s, max 6.57s, total 24.81s
Coding: score 10.0; No failed answers; Response time avg 15.59s, max 22.20s, total 46.76s
Combined: score 10.0; No failed answers; Response time avg 33.70s, max 33.70s, total 33.70s
Data parsing and extraction: score 10.0; No failed answers; Response time avg 7.18s, max 7.44s, total 14.35s
Domain specific: score 5.3; Wrong answer: 2; Response time avg 53.40s, max 80.80s, total 160.21s
General Intelligence: score 10.0; No failed answers; Response time avg 7.42s, max 7.42s, total 7.42s
Instructions following: score 10.0; No failed answers; Response time avg 5.90s, max 5.96s, total 11.79s
Puzzle Solving: score 7.7; No answer: 1; Response time avg 5.18s, max 5.58s, total 15.53s
Tool Calling: score 10.0; No failed answers; Response time avg 16.96s, max 16.96s, total 16.96s
Trivia: score 3.0; No answer: 1; Response time avg 25.64s, max 25.64s, total 25.64s
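The scoring rule stated above says a test counts as fully passed only if every one of its runs passed. The sketch below illustrates that rule. It is not the benchmark's own code. The `TestResult` class, its fields, and `count_fully_passed` are hypothetical names, and it assumes each run reduces to a simple pass/fail flag.

```python
from dataclasses import dataclass


@dataclass
class TestResult:
    """Outcome of one benchmark test across repeated runs (hypothetical structure)."""
    name: str
    runs_passed: list[bool]  # one entry per run; True if that run passed

    @property
    def fully_passed(self) -> bool:
        # Fully passed only if every run passed; a test with no runs does not count.
        return len(self.runs_passed) > 0 and all(self.runs_passed)


def count_fully_passed(results: list[TestResult]) -> int:
    """Count tests whose runs all passed."""
    return sum(r.fully_passed for r in results)


results = [
    TestResult("t1", [True, True, True]),
    TestResult("t2", [True, False, True]),  # one failed run, so not fully passed
]
print(count_fully_passed(results))  # 1
```

One failing run is enough to disqualify a test, so a model that is only intermittently correct gets no credit for that test under this rule.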

Anti-AI Tricks: score 6.4; Extra formatting: 2; Response time avg 7.45s, max 11.88s, total 14.90s
Combined: score 10.0; No failed answers; Response time avg 76.66s, max 76.66s, total 76.66s
Data parsing and extraction: score 10.0; No failed answers; Response time avg 7.37s, max 7.37s, total 7.37s
General Intelligence: score 10.0; No failed answers; Response time avg 5.04s, max 5.04s, total 5.04s
Instructions following: score 10.0; No failed answers; Response time avg 2.43s, max 2.43s, total 2.43s
Puzzle Solving: score 7.7; Wrong answer: 1; Response time avg 4.71s, max 4.75s, total 9.41s
Tool Calling: score 10.0; No failed answers; Response time avg 9.73s, max 9.73s, total 9.73s
Trivia: score 3.0; Wrong answer: 1; Response time avg 63.24s, max 63.24s, total 63.24s
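Each entry reports response time as avg, max and total. The page does not say how these are computed. Assuming avg is the arithmetic mean over the timed responses in that category, the three figures relate as in the sketch below. The helpers `response_time_summary` and `estimate_run_count` are hypothetical, for illustration only.

Under that assumption, total divided by avg gives the approximate number of timed responses. For example, 357.17 / 17.01 ≈ 21 for the first summary line. Entries shown as 0ms (such as after an API error) leave the ratio undefined, so the helper rejects a non-positive average.

```python
from statistics import fmean


def response_time_summary(times_s: list[float]) -> dict[str, float]:
    """Aggregate per-response times (seconds) into avg, max and total."""
    if not times_s:
        raise ValueError("need at least one response time")
    return {"avg": fmean(times_s), "max": max(times_s), "total": sum(times_s)}


def estimate_run_count(total_s: float, avg_s: float) -> int:
    """Recover the number of timed responses, assuming avg = total / count.

    Both inputs are rounded to two decimals on the page, so the ratio is
    only approximately an integer and is rounded to the nearest one.
    """
    if avg_s <= 0:
        raise ValueError("average must be positive (0ms entries carry no count)")
    return round(total_s / avg_s)


# Hypothetical per-response times for one category.
print(response_time_summary([2.0, 4.0]))  # {'avg': 3.0, 'max': 4.0, 'total': 6.0}
print(estimate_run_count(357.17, 17.01))  # 21
```

When avg, max and total are all equal, as in most single-test "Combined" entries, the same assumption implies exactly one timed response.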

Wrong answer: 4, Extra formatting: 3, Timed out: 1; Response time avg 17.06s, max 46.35s, total 221.83s…
Coding: score 5.7; Extra formatting: 1, Wrong answer: 1; Response time avg 33.29s, max 35.76s, total 99.86s
Combined: score 10.0; No failed answers; Response time avg 46.35s, max 46.35s, total 46.35s
Data parsing and extraction: score 10.0; No failed answers; Response time avg 13.90s, max 13.90s, total 13.90s
General Intelligence: score 10.0; No failed answers; Response time avg 4.94s, max 4.94s, total 4.94s
Instructions following: score 10.0; No failed answers; Response time avg 2.61s, max 2.61s, total 2.61s
Puzzle Solving: score 10.0; No failed answers; Response time avg 5.31s, max 6.24s, total 10.62s
Tool Calling: score 10.0; No failed answers; Response time avg 7.48s, max 7.48s, total 7.48s
Trivia: score 3.0; Wrong answer: 1; Response time avg 30.09s, max 30.09s, total 30.09s

Wrong answer: 3, No answer: 1; Response time avg 9.66s, max 38.03s, total 202.89s…
Anti-AI Tricks: score 10.0; No failed answers; Response time avg 3.95s, max 5.76s, total 15.79s
Coding: score 10.0; No failed answers; Response time avg 15.33s, max 22.27s, total 45.98s
Combined: score 9.8; No failed answers; Response time avg 38.03s, max 38.03s, total 38.03s
Data parsing and extraction: score 7.1; Wrong answer: 1; Response time avg 12.29s, max 19.64s, total 24.59s
Domain specific: score 5.3; Wrong answer: 2; Response time avg 14.15s, max 28.41s, total 42.46s
General Intelligence: score 10.0; No failed answers; Response time avg 2.46s, max 2.46s, total 2.46s
Instructions following: score 10.0; No failed answers; Response time avg 3.32s, max 5.07s, total 6.63s
Puzzle Solving: score 10.0; No failed answers; Response time avg 3.95s, max 4.33s, total 11.85s
Tool Calling: score 10.0; No failed answers; Response time avg 8.96s, max 8.96s, total 8.96s
Trivia: score 3.0; No answer: 1; Response time avg 6.14s, max 6.14s, total 6.14s

Wrong answer: 3, Timed out: 1; Response time avg 4.73s, max 23.18s, total 94.51s…
Anti-AI Tricks: score 8.3; Wrong answer: 1; Response time avg 1.85s, max 2.71s, total 7.38s
Coding: score 7.6; Wrong answer: 1; Response time avg 12.96s, max 23.18s, total 38.89s
Combined: score 10.0; No failed answers; Response time avg 21.45s, max 21.45s, total 21.45s
Data parsing and extraction: score 10.0; No failed answers; Response time avg 2.37s, max 3.30s, total 4.74s
Domain specific: score 7.7; Timed out: 1; Response time avg 1.17s, max 1.40s, total 2.35s
General Intelligence: score 10.0; No failed answers; Response time avg 2.87s, max 2.87s, total 2.87s
Instructions following: score 10.0; No failed answers; Response time avg 1.57s, max 1.66s, total 3.14s
Puzzle Solving: score 10.0; No failed answers; Response time avg 2.43s, max 2.89s, total 7.28s
Tool Calling: score 10.0; No failed answers; Response time avg 4.17s, max 4.17s, total 4.17s
Trivia: score 3.0; Wrong answer: 1; Response time avg 2.25s, max 2.25s, total 2.25s

Anti-AI Tricks: score 6.5; Extra formatting: 2; Response time avg 3.40s, max 6.36s, total 13.58s
Combined: score 9.5; No failed answers; Response time avg 17.73s, max 17.73s, total 17.73s
Data parsing and extraction: score 7.3; Wrong answer: 1; Response time avg 1.77s, max 1.93s, total 3.53s
Domain specific: score 5.3; Wrong answer: 2; Response time avg 1.66s, max 2.16s, total 4.99s
General Intelligence: score 10.0; No failed answers; Response time avg 3.48s, max 3.48s, total 3.48s
Instructions following: score 9.9; No failed answers; Response time avg 1.37s, max 1.40s, total 2.73s
Puzzle Solving: score 7.7; Extra formatting: 1; Response time avg 2.74s, max 3.46s, total 8.22s
Tool Calling: score 10.0; No failed answers; Response time avg 5.35s, max 5.35s, total 5.35s
Trivia: score 3.0; No answer: 1; Response time avg 3.41s, max 3.41s, total 3.41s

Wrong answer: 3; Response time avg 3.02s, max 18.27s, total 57.44s…
Anti-AI Tricks: score 8.3; Wrong answer: 1; Response time avg 2.12s, max 3.75s, total 8.50s
Coding: score 3.3; No failed answers; Response time avg 2.84s, max 2.84s, total 2.84s
Combined: score 9.5; No failed answers; Response time avg 18.27s, max 18.27s, total 18.27s
Data parsing and extraction: score 10.0; No failed answers; Response time avg 2.15s, max 2.33s, total 4.29s
Domain specific: score 7.7; Wrong answer: 1; Response time avg 1.19s, max 1.40s, total 3.58s
General Intelligence: score 10.0; No failed answers; Response time avg 3.47s, max 3.47s, total 3.47s
Instructions following: score 10.0; No failed answers; Response time avg 1.46s, max 1.68s, total 2.91s
Puzzle Solving: score 10.0; No failed answers; Response time avg 2.46s, max 3.72s, total 7.38s
Tool Calling: score 10.0; No failed answers; Response time avg 4.74s, max 4.74s, total 4.74s
Trivia: score 3.0; Wrong answer: 1; Response time avg 1.46s, max 1.46s, total 1.46s

Coding: score 6.0; Wrong answer: 2; Response time avg 63.38s, max 95.88s, total 190.15s
Combined: score 6.9; Invalid tool call: 1; Response time avg 15.06s, max 15.06s, total 15.06s
Data parsing and extraction: score 10.0; No failed answers; Response time avg 9.60s, max 9.92s, total 19.19s
Domain specific: score 5.3; Wrong answer: 2; Response time avg 38.15s, max 67.08s, total 114.45s
General Intelligence: score 10.0; No failed answers; Response time avg 11.09s, max 11.09s, total 11.09s
Instructions following: score 9.9; No failed answers; Response time avg 3.74s, max 5.23s, total 7.47s
Puzzle Solving: score 7.7; Wrong answer: 1; Response time avg 10.24s, max 16.95s, total 30.72s
Tool Calling: score 7.0; Invalid tool call: 1; Response time avg 12.53s, max 12.53s, total 12.53s
Trivia: score 3.0; Wrong answer: 1; Response time avg 40.96s, max 40.96s, total 40.96s

Wrong answer: 3, No answer: 2, Timed out: 1; Response time avg 23.28s, max 101.36s, total 488.94s…
Anti-AI Tricks: score 10.0; No failed answers; Response time avg 5.89s, max 7.10s, total 23.56s
Coding: score 8.2; Wrong answer: 1; Response time avg 40.96s, max 60.28s, total 122.88s
Combined: score 10.0; No failed answers; Response time avg 51.96s, max 51.96s, total 51.96s
Data parsing and extraction: score 10.0; No failed answers; Response time avg 13.44s, max 16.02s, total 26.88s
General Intelligence: score 10.0; No failed answers; Response time avg 17.39s, max 17.39s, total 17.39s
Instructions following: score 9.9; No failed answers; Response time avg 7.90s, max 10.13s, total 15.80s
Puzzle Solving: score 8.2; Wrong answer: 1; Response time avg 13.13s, max 25.90s, total 39.40s
Tool Calling: score 10.0; No failed answers; Response time avg 20.41s, max 20.41s, total 20.41s
Trivia: score 3.0; No answer: 1; Response time avg 34.25s, max 34.25s, total 34.25s

Anti-AI Tricks: score 10.0; No failed answers; Response time avg 4.82s, max 7.69s, total 19.26s
Coding: score 8.2; No answer: 1; Response time avg 45.90s, max 95.57s, total 137.71s
Combined: score 10.0; No failed answers; Response time avg 13.88s, max 13.88s, total 13.88s
Data parsing and extraction: score 10.0; No failed answers; Response time avg 6.19s, max 6.42s, total 12.38s
Domain specific: score 2.9; Wrong answer: 2, Timed out: 1; Response time avg 71.07s, max 194.23s, total 213.22s
General Intelligence: score 6.1; Wrong answer: 1; Response time avg 10.05s, max 10.05s, total 10.05s
Instructions following: score 10.0; No failed answers; Response time avg 5.38s, max 5.70s, total 10.77s
Puzzle Solving: score 8.7; Did not follow instructions: 1; Response time avg 5.23s, max 7.26s, total 15.69s
Tool Calling: score 10.0; No failed answers; Response time avg 9.84s, max 9.84s, total 9.84s
Trivia: score 3.0; Wrong answer: 1; Response time avg 40.17s, max 40.17s, total 40.17s

Coding: score 5.5; Extra formatting: 1, Wrong answer: 1; Response time avg 5.19s, max 9.79s, total 15.56s
Combined: score 9.5; No failed answers; Response time avg 23.84s, max 23.84s, total 23.84s
Data parsing and extraction: score 10.0; No failed answers; Response time avg 3.43s, max 3.43s, total 3.43s
Domain specific: score 7.7; Wrong answer: 1; Response time avg 3.54s, max 3.54s, total 3.54s
Instructions following: score 6.5; Wrong answer: 1; Response time avg 1.96s, max 1.96s, total 1.96s
Puzzle Solving: score 7.7; Extra formatting: 1; Response time avg 2.53s, max 2.54s, total 5.06s
Tool Calling: score 10.0; No failed answers; Response time avg 4.11s, max 4.11s, total 4.11s
Trivia: score 3.0; Wrong answer: 1; Response time avg 4.67s, max 4.67s, total 4.67s

Anti-AI Tricks: score 10.0; No failed answers; Response time avg 8.31s, max 14.20s, total 33.24s
Coding: score 4.6; Extra formatting: 1, No answer: 1, Timed out: 1; Response time avg 109.63s, max 172.60s, total 328.90s
Combined: score 9.5; No failed answers; Response time avg 43.11s, max 43.11s, total 43.11s
Data parsing and extraction: score 10.0; No failed answers; Response time avg 9.33s, max 9.40s, total 18.66s
Domain specific: score 5.3; Timed out: 1, Wrong answer: 1; Response time avg 29.77s, max 32.22s, total 89.30s
General Intelligence: score 10.0; No failed answers; Response time avg 20.95s, max 20.95s, total 20.95s
Instructions following: score 6.4; Wrong answer: 1; Response time avg 7.47s, max 10.16s, total 14.94s
Puzzle Solving: score 8.2; Wrong answer: 1; Response time avg 31.64s, max 46.04s, total 94.91s
Tool Calling: score 3.0; API error: 1; Response time avg 0ms, max 0ms, total 0ms
Trivia: score 3.0; Wrong answer: 1; Response time avg 29.40s, max 29.40s, total 29.40s

Anti-AI Tricks: score 10.0; No failed answers; Response time avg 23.66s, max 25.06s, total 47.32s
Coding: score 10.0; No failed answers; Response time avg 74.30s, max 99.85s, total 222.91s
Combined: score 10.0; No failed answers; Response time avg 28.96s, max 28.96s, total 28.96s
Data parsing and extraction: score 7.1; No answer: 1; Response time avg 8.90s, max 8.90s, total 8.90s
Domain specific: score 3.5; Wrong answer: 2, Timed out: 1; Response time avg 0ms, max 0ms, total 0ms
Instructions following: score 10.0; No failed answers; Response time avg 7.25s, max 7.25s, total 7.25s
Puzzle Solving: score 10.0; No failed answers; Response time avg 11.33s, max 16.34s, total 22.66s
Tool Calling: score 10.0; No failed answers; Response time avg 15.93s, max 15.93s, total 15.93s
Trivia: score 3.0; Wrong answer: 1; Response time avg 67.37s, max 67.37s, total 67.37s

Wrong answer: 8, Did not follow instructions: 1; Response time avg 6.34s, max 20.69s, total 133.19s…
Anti-AI Tricks: score 8.3; Wrong answer: 1; Response time avg 3.70s, max 5.66s, total 14.80s
Combined: score 10.0; No failed answers; Response time avg 20.69s, max 20.69s, total 20.69s
Data parsing and extraction: score 10.0; No failed answers; Response time avg 7.17s, max 11.71s, total 14.35s
Domain specific: score 5.3; Wrong answer: 2; Response time avg 6.50s, max 7.79s, total 19.51s
General Intelligence: score 6.1; Wrong answer: 1; Response time avg 4.42s, max 4.42s, total 4.42s
Instructions following: score 9.8; No failed answers; Response time avg 3.84s, max 4.88s, total 7.68s
Puzzle Solving: score 7.7; Wrong answer: 1; Response time avg 3.31s, max 3.63s, total 9.92s
Tool Calling: score 10.0; No failed answers; Response time avg 15.76s, max 15.76s, total 15.76s
Trivia: score 3.0; Wrong answer: 1; Response time avg 3.41s, max 3.41s, total 3.41s

Wrong answer: 13, Invalid tool call: 1; Response time avg 4.10s, max 32.57s, total 86.18s…
Anti-AI Tricks: score 4.0; Wrong answer: 4; Response time avg 2.11s, max 3.94s, total 8.46s
Coding: score 3.9; Wrong answer: 3; Response time avg 4.96s, max 9.79s, total 14.89s
Combined: score 2.8; Invalid tool call: 1; Response time avg 32.57s, max 32.57s, total 32.57s
Data parsing and extraction: score 10.0; No failed answers; Response time avg 1.08s, max 1.62s, total 2.15s
Domain specific: score 2.9; Wrong answer: 3; Response time avg 1.99s, max 3.99s, total 5.98s
General Intelligence: score 5.0; Wrong answer: 1; Response time avg 790ms, max 790ms, total 790ms
Instructions following: score 9.8; No failed answers; Response time avg 1.98s, max 2.28s, total 3.97s
Puzzle Solving: score 7.7; Wrong answer: 1; Response time avg 1.45s, max 2.09s, total 4.36s
Tool Calling: score 10.0; No failed answers; Response time avg 10.68s, max 10.68s, total 10.68s
Trivia: score 3.0; Wrong answer: 1; Response time avg 2.34s, max 2.34s, total 2.34s

Coding: score 3.2; Timed out: 2, No answer: 1; Response time avg 55.33s, max 89.40s, total 110.66s
Combined: score 2.8; Invalid tool call: 1; Response time avg 65.57s, max 65.57s, total 65.57s
Data parsing and extraction: score 6.3; No answer: 1; Response time avg 1.51s, max 1.51s, total 1.51s
Domain specific: score 3.5; Wrong answer: 2, No answer: 1; Response time avg 174.55s, max 174.55s, total 174.55s
General Intelligence: score 3.6; Wrong answer: 1; Response time avg 18.14s, max 18.14s, total 18.14s
Instructions following: score 6.2; Wrong answer: 1; Response time avg 2.97s, max 2.97s, total 2.97s
Tool Calling: score 10.0; No failed answers; Response time avg 15.95s, max 15.95s, total 15.95s
Trivia: score 3.0; Wrong answer: 1; Response time avg 11.13s, max 11.13s, total 11.13s

Wrong answer: 11, Did not follow instructions: 2; Response time avg 2.99s, max 6.51s, total 62.74s…
Anti-AI Tricks: score 4.8; Wrong answer: 3; Response time avg 3.13s, max 5.90s, total 12.50s
Coding: score 5.5; Wrong answer: 2; Response time avg 3.13s, max 5.30s, total 9.40s
Combined: score 3.0; Wrong answer: 1; Response time avg 6.51s, max 6.51s, total 6.51s
Data parsing and extraction: score 10.0; No failed answers; Response time avg 3.81s, max 5.69s, total 7.62s
Domain specific: score 5.3; Wrong answer: 2; Response time avg 2.09s, max 2.39s, total 6.26s
Instructions following: score 6.5; Wrong answer: 1; Response time avg 1.97s, max 2.43s, total 3.93s
Tool Calling: score 10.0; No failed answers; Response time avg 4.86s, max 4.86s, total 4.86s
Trivia: score 3.0; Wrong answer: 1; Response time avg 2.23s, max 2.23s, total 2.23s

Wrong answer: 13, Did not follow instructions: 2; Response time avg 2.82s, max 8.21s, total 59.29s…
Anti-AI Tricks: score 3.0; Wrong answer: 4; Response time avg 2.84s, max 4.15s, total 11.35s
Coding: score 3.9; Wrong answer: 3; Response time avg 2.41s, max 3.93s, total 7.22s
Combined: score 3.0; Wrong answer: 1; Response time avg 4.89s, max 4.89s, total 4.89s
Data parsing and extraction: score 10.0; No failed answers; Response time avg 2.47s, max 2.48s, total 4.95s
Domain specific: score 5.3; Wrong answer: 2; Response time avg 1.97s, max 2.65s, total 5.92s
Instructions following: score 6.5; Wrong answer: 1; Response time avg 2.13s, max 2.53s, total 4.27s
Tool Calling: score 10.0; No failed answers; Response time avg 8.21s, max 8.21s, total 8.21s
Trivia: score 3.0; Wrong answer: 1; Response time avg 2.37s, max 2.37s, total 2.37s

Wrong answer: 12; Response time avg 4.03s, max 11.07s, total 56.37s…
Anti-AI Tricks: score 4.8; Wrong answer: 3; Response time avg 2.37s, max 3.39s, total 4.75s
Coding: score 4.0; Wrong answer: 3; Response time avg 5.12s, max 8.84s, total 15.36s
Combined: score 3.0; Wrong answer: 1; Response time avg 4.98s, max 4.98s, total 4.98s
Data parsing and extraction: score 10.0; No failed answers; Response time avg 5.78s, max 5.78s, total 5.78s
Domain specific: score 3.0; Wrong answer: 3; Response time avg 2.24s, max 2.24s, total 2.24s
General Intelligence: score 10.0; No failed answers; Response time avg 3.27s, max 3.27s, total 3.27s
Instructions following: score 10.0; No failed answers; Response time avg 1.48s, max 1.48s, total 1.48s
Puzzle Solving: score 7.7; Wrong answer: 1; Response time avg 1.91s, max 2.08s, total 3.82s
Tool Calling: score 10.0; No failed answers; Response time avg 11.07s, max 11.07s, total 11.07s
Trivia: score 3.0; Wrong answer: 1; Response time avg 3.62s, max 3.62s, total 3.62s

Anti-AI Tricks: score 5.2; Wrong answer: 3; Response time avg 5.51s, max 6.59s, total 11.02s
Coding: score 4.3; Wrong answer: 3; Response time avg 2.54s, max 5.57s, total 7.62s
Combined: score 3.0; Invalid tool call: 1; Response time avg 3.22s, max 3.22s, total 3.22s
Data parsing and extraction: score 7.3; Wrong answer: 1; Response time avg 4.82s, max 4.82s, total 4.82s
Domain specific: score 7.7; Wrong answer: 1; Response time avg 744ms, max 744ms, total 744ms
General Intelligence: score 4.0; Wrong answer: 1; Response time avg 1.59s, max 1.59s, total 1.59s
Instructions following: score 6.5; Wrong answer: 1; Response time avg 888ms, max 888ms, total 888ms
Tool Calling: score 2.8; Wrong answer: 1; Response time avg 7.05s, max 7.05s, total 7.05s
Trivia: score 3.0; Wrong answer: 1; Response time avg 692ms, max 692ms, total 692ms