Coding
: 6.8. A test is fully passed only if every run passed for that test. Extra formatting: 1. Response time: avg 6.73s, max 9.79s, total 13.46s.
Combined
: 9.5. No failed answers. Response time: avg 23.84s, max 23.84s, total 23.84s.
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 3.43s, max 3.43s, total 3.43s.
Domain specific
: 7.7. Wrong answer: 1. Response time: avg 3.54s, max 3.54s, total 3.54s.
Instructions following
: 6.5. Wrong answer: 1. Response time: avg 1.96s, max 1.96s, total 1.96s.
Puzzle Solving
: 7.7. Extra formatting: 1. Response time: avg 2.53s, max 2.54s, total 5.06s.
Tool Calling
: 10.0. No failed answers. Response time: avg 4.11s, max 4.11s, total 4.11s.
Trivia
: 3.0. Wrong answer: 1. Response time: avg 4.67s, max 4.67s, total 4.67s.
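
The pass rule quoted in these entries ("a test is fully passed only if every run passed for that test") and the avg/max/total response-time fields can be sketched in a few lines. This is a minimal illustration, not the benchmark's actual implementation: the `Run` type, the function names, the treatment of a test with no runs as not passed, and the per-run split of 3.67s and 9.79s are all assumptions chosen so that the numbers reproduce the avg 6.73s, max 9.79s, total 13.46s reported for the first Coding entry.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One execution of a test (hypothetical structure)."""
    passed: bool
    seconds: float


def fully_passed(runs):
    """A test counts as fully passed only if every one of its runs passed.

    Assumption: a test with no runs is treated as not passed.
    """
    return len(runs) > 0 and all(run.passed for run in runs)


def response_time_summary(runs):
    """Return avg, max and total response time over a non-empty list of runs."""
    times = [run.seconds for run in runs]
    total = sum(times)
    return {"avg": total / len(times), "max": max(times), "total": total}


# Hypothetical runs: two timings that sum to 13.46s; pass flags are made up.
coding_runs = [Run(passed=True, seconds=3.67), Run(passed=False, seconds=9.79)]
summary = response_time_summary(coding_runs)
print(fully_passed(coding_runs))
print(f"avg {summary['avg']:.2f}s, max {summary['max']:.2f}s, total {summary['total']:.2f}s")
```

Under this reading, entries whose avg, max and total are identical would correspond to a single timed run, and entries showing 0ms alongside an API error would correspond to runs that never returned a response.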
Wrong answer: 13. Did not follow instructions: 2. Response time: avg 5.47s, max 16.45s, total 109.43s…
Anti-AI Tricks
: 4.8. Wrong answer: 3. Response time: avg 4.46s, max 9.94s, total 17.83s.
Coding
: 3.4. Wrong answer: 2. Response time: avg 3.02s, max 3.05s, total 6.04s.
Combined
: 3.0. Wrong answer: 1. Response time: avg 16.45s, max 16.45s, total 16.45s.
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 7.92s, max 13.23s, total 15.84s.
Domain specific
: 3.6. Wrong answer: 3. Response time: avg 6.23s, max 14.38s, total 18.70s.
General Intelligence
: 4.6. Wrong answer: 1. Response time: avg 950ms, max 950ms, total 950ms.
Instructions following
: 6.3. Wrong answer: 1. Response time: avg 804ms, max 921ms, total 1.61s.
Tool Calling
: 4.7. Did not follow instructions: 1. Response time: avg 16.00s, max 16.00s, total 16.00s.
Trivia
: 3.0. Wrong answer: 1. Response time: avg 8.94s, max 8.94s, total 8.94s.
Wrong answer: 4. Response time: avg 5.81s, max 14.72s, total 116.25s…
Anti-AI Tricks
: 10.0. No failed answers. Response time: avg 3.48s, max 4.31s, total 13.94s.
Coding
: 7.3. Wrong answer: 1. Response time: avg 6.66s, max 6.94s, total 13.31s.
Combined
: 3.0. Wrong answer: 1. Response time: avg 3.27s, max 3.27s, total 3.27s.
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 9.40s, max 14.72s, total 18.80s.
Domain specific
: 5.3. Wrong answer: 2. Response time: avg 8.05s, max 14.40s, total 24.15s.
General Intelligence
: 10.0. No failed answers. Response time: avg 3.68s, max 3.68s, total 3.68s.
Instructions following
: 9.9. No failed answers. Response time: avg 7.02s, max 7.35s, total 14.03s.
Puzzle Solving
: 10.0. No failed answers. Response time: avg 5.77s, max 10.27s, total 17.32s.
Tool Calling
: 10.0. No failed answers. Response time: avg 4.99s, max 4.99s, total 4.99s.
Trivia
: 10.0. No failed answers. Response time: avg 2.75s, max 2.75s, total 2.75s.
Anti-AI Tricks
: 8.3. Wrong answer: 1. Response time: avg 1.28s, max 2.09s, total 5.13s.
Coding
: 4.1. Timed out: 1. Wrong answer: 1. Response time: avg 3.83s, max 7.07s, total 7.66s.
Combined
: 3.0. Wrong answer: 1. Response time: avg 30.53s, max 30.53s, total 30.53s.
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 1.70s, max 2.21s, total 3.41s.
Domain specific
: 3.6. Wrong answer: 3. Response time: avg 2.49s, max 4.23s, total 7.48s.
Instructions following
: 6.3. Wrong answer: 1. Response time: avg 690ms, max 878ms, total 1.38s.
Tool Calling
: 10.0. No failed answers. Response time: avg 57.10s, max 57.10s, total 57.10s.
Trivia
: 3.0. Wrong answer: 1. Response time: avg 778ms, max 778ms, total 778ms.
Wrong answer: 6. Did not follow instructions: 2. Response time: avg 6.13s, max 18.33s, total 122.61s…
Coding
: 6.9. Wrong answer: 1. Response time: avg 10.52s, max 11.72s, total 21.03s.
Combined
: 10.0. No failed answers. Response time: avg 11.96s, max 11.96s, total 11.96s.
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 2.21s, max 2.52s, total 4.42s.
Domain specific
: 3.5. Wrong answer: 3. Response time: avg 13.01s, max 18.33s, total 39.04s.
Instructions following
: 9.8. No failed answers. Response time: avg 3.51s, max 4.60s, total 7.01s.
Puzzle Solving
: 10.0. No failed answers. Response time: avg 2.99s, max 3.16s, total 8.97s.
Tool Calling
: 10.0. No failed answers. Response time: avg 8.36s, max 8.36s, total 8.36s.
Trivia
: 3.0. Wrong answer: 1. Response time: avg 4.38s, max 4.38s, total 4.38s.
Anti-AI Tricks
: 6.9. API error: 1. Wrong answer: 1. Response time: avg 2.68s, max 3.09s, total 8.04s.
Coding
: 6.3. Wrong answer: 1. Response time: avg 14.36s, max 14.36s, total 14.36s.
Combined
: 3.0. Wrong answer: 1. Response time: avg 15.92s, max 15.92s, total 15.92s.
Data parsing and extraction
: 7.1. No answer: 1. Response time: avg 9.34s, max 16.71s, total 18.68s.
Domain specific
: 4.1. Wrong answer: 2. No answer: 1. Response time: avg 11.12s, max 29.11s, total 33.35s.
General Intelligence
: 3.0. API error: 1. Response time: avg 0ms, max 0ms, total 0ms.
Instructions following
: 10.0. No failed answers. Response time: avg 1.68s, max 2.03s, total 3.36s.
Puzzle Solving
: 5.3. API error: 1. Wrong answer: 1. Response time: avg 1.93s, max 1.97s, total 3.87s.
Tool Calling
: 4.7. Invalid tool call: 1. Response time: avg 3.39s, max 3.39s, total 3.39s.
Trivia
: 3.0. API error: 1. Response time: avg 0ms, max 0ms, total 0ms.
Wrong answer: 6. Did not follow instructions: 1. Response time: avg 6.82s, max 38.52s, total 136.34s…
Anti-AI Tricks
: 8.7. Wrong answer: 1. Response time: avg 3.40s, max 4.78s, total 13.59s.
Coding
: 8.2. Wrong answer: 1. Response time: avg 8.05s, max 8.97s, total 16.09s.
Combined
: 10.0. No failed answers. Response time: avg 9.12s, max 9.12s, total 9.12s.
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 3.05s, max 3.33s, total 6.10s.
Domain specific
: 5.3. Wrong answer: 2. Response time: avg 17.78s, max 38.52s, total 53.33s.
Instructions following
: 9.8. No failed answers. Response time: avg 5.51s, max 6.55s, total 11.02s.
Puzzle Solving
: 7.7. Wrong answer: 1. Response time: avg 4.10s, max 5.04s, total 12.31s.
Tool Calling
: 10.0. No failed answers. Response time: avg 4.68s, max 4.68s, total 4.68s.
Trivia
: 3.0. Wrong answer: 1. Response time: avg 6.89s, max 6.89s, total 6.89s.
Anti-AI Tricks
: 3.4. Wrong answer: 4. Response time: avg 6.55s, max 9.41s, total 26.19s.
Coding
: 4.2. API error: 1. Wrong answer: 1. Response time: avg 10.57s, max 10.57s, total 10.57s.
Combined
: 3.0. Wrong answer: 1. Response time: avg 23.53s, max 23.53s, total 23.53s.
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 1.37s, max 1.37s, total 2.73s.
Domain specific
: 3.0. Wrong answer: 3. Response time: avg 1.04s, max 1.08s, total 3.11s.
Instructions following
: 6.4. Wrong answer: 1. Response time: avg 5.36s, max 9.81s, total 10.73s.
Tool Calling
: 3.0. Invalid tool call: 1. Response time: avg 25.72s, max 25.72s, total 25.72s.
Trivia
: 3.0. API error: 1. Response time: avg 0ms, max 0ms, total 0ms.
Wrong answer: 1. Response time: avg 8.30s, max 34.82s, total 165.92s…
Anti-AI Tricks
: 10.0. No failed answers. Response time: avg 2.57s, max 3.60s, total 10.27s.
Coding
: 10.0. No failed answers. Response time: avg 24.62s, max 34.82s, total 49.24s.
Combined
: 10.0. No failed answers. Response time: avg 22.37s, max 22.37s, total 22.37s.
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 6.43s, max 8.51s, total 12.87s.
Domain specific
: 7.6. Wrong answer: 1. Response time: avg 14.09s, max 22.00s, total 42.27s.
General Intelligence
: 10.0. No failed answers. Response time: avg 3.63s, max 3.63s, total 3.63s.
Instructions following
: 10.0. No failed answers. Response time: avg 3.35s, max 3.42s, total 6.69s.
Puzzle Solving
: 10.0. No failed answers. Response time: avg 3.23s, max 3.68s, total 9.69s.
Tool Calling
: 9.8. No failed answers. Response time: avg 4.96s, max 4.96s, total 4.96s.
Trivia
: 10.0. No failed answers. Response time: avg 3.94s, max 3.94s, total 3.94s.
Anti-AI Tricks
: 5.6. Wrong answer: 3. Response time: avg 2.67s, max 5.03s, total 10.66s.
Coding
: 5.1. Wrong answer: 2. Response time: avg 44.82s, max 59.15s, total 89.64s.
Combined
: 3.0. Wrong answer: 1. Response time: avg 25.25s, max 25.25s, total 25.25s.
Data parsing and extraction
: 7.3. API error: 1. Response time: avg 1.23s, max 1.96s, total 2.46s.
Domain specific
: 5.3. API error: 1. Wrong answer: 1. Response time: avg 6.11s, max 13.72s, total 18.34s.
Instructions following
: 7.3. Wrong answer: 1. Response time: avg 1.38s, max 1.61s, total 2.75s.
Tool Calling
: 10.0. No failed answers. Response time: avg 3.50s, max 3.50s, total 3.50s.
Trivia
: 3.0. Wrong answer: 1. Response time: avg 5.92s, max 5.92s, total 5.92s.
Coding
: 7.0. Extra formatting: 1. Response time: avg 39.68s, max 47.10s, total 79.37s.
Combined
: 3.0. Wrong answer: 1. Response time: avg 21.74s, max 21.74s, total 21.74s.
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 3.60s, max 3.92s, total 7.19s.
Domain specific
: 5.3. Wrong answer: 2. Response time: avg 3.00s, max 4.69s, total 8.99s.
Instructions following
: 6.4. Wrong answer: 1. Response time: avg 2.63s, max 2.77s, total 5.27s.
Tool Calling
: 10.0. No failed answers. Response time: avg 22.78s, max 22.78s, total 22.78s.
Trivia
: 3.0. Wrong answer: 1. Response time: avg 2.50s, max 2.50s, total 2.50s.
API error: 3. Wrong answer: 3. Response time: avg 9.05s, max 64.36s, total 153.86s…
Anti-AI Tricks
: 10.0. No failed answers. Response time: avg 2.53s, max 3.43s, total 10.12s.
Coding
: 8.2. Wrong answer: 1. Response time: avg 39.62s, max 64.36s, total 79.24s.
Combined
: 3.0. API error: 1. Response time: avg 0ms, max 0ms, total 0ms.
Data parsing and extraction
: 6.5. API error: 1. Response time: avg 8.10s, max 8.10s, total 8.10s.
Domain specific
: 7.6. Wrong answer: 1. Response time: avg 10.64s, max 14.00s, total 31.92s.
General Intelligence
: 10.0. No failed answers. Response time: avg 3.46s, max 3.46s, total 3.46s.
Instructions following
: 9.8. No failed answers. Response time: avg 3.38s, max 3.40s, total 6.76s.
Puzzle Solving
: 10.0. No failed answers. Response time: avg 3.13s, max 3.33s, total 9.39s.
Tool Calling
: 3.0. API error: 1. Response time: avg 0ms, max 0ms, total 0ms.
Trivia
: 2.8. Wrong answer: 1. Response time: avg 4.87s, max 4.87s, total 4.87s.
Coding
: 5.4. Wrong answer: 2. Response time: avg 2.01s, max 3.14s, total 4.03s.
Combined
: 3.0. Wrong answer: 1. Response time: avg 45.14s, max 45.14s, total 45.14s.
Data parsing and extraction
: 6.5. Wrong answer: 1. Response time: avg 1.32s, max 1.32s, total 1.32s.
Domain specific
: 5.3. Wrong answer: 2. Response time: avg 962ms, max 962ms, total 962ms.
General Intelligence
: 10.0. No failed answers. Response time: avg 1.34s, max 1.34s, total 1.34s.
Instructions following
: 6.3. Wrong answer: 1. Response time: avg 7.78s, max 14.65s, total 15.56s.
Puzzle Solving
: 3.0. Wrong answer: 3. Response time: avg 24.34s, max 42.58s, total 48.69s.
Tool Calling
: 10.0. No failed answers. Response time: avg 2.47s, max 2.47s, total 2.47s.
Trivia
: 3.0. Wrong answer: 1. Response time: avg 601ms, max 601ms, total 601ms.
API error: 3. Wrong answer: 3. Response time: avg 9.05s, max 26.24s, total 90.53s…
Anti-AI Tricks
: 10.0. No failed answers. Response time: avg 14.99s, max 26.24s, total 29.99s.
Coding
: 3.0. API error: 2. Response time: avg 0ms, max 0ms, total 0ms.
Combined
: 3.0. Wrong answer: 1. Response time: avg 10.37s, max 10.37s, total 10.37s.
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 10.84s, max 10.84s, total 10.84s.
Domain specific
: 5.3. Wrong answer: 2. Response time: avg 7.01s, max 7.01s, total 7.01s.
General Intelligence
: 10.0. No failed answers. Response time: avg 9.34s, max 9.34s, total 9.34s.
Instructions following
: 9.8. No failed answers. Response time: avg 3.26s, max 3.26s, total 3.26s.
Puzzle Solving
: 10.0. No failed answers. Response time: avg 3.88s, max 4.23s, total 7.77s.
Tool Calling
: 10.0. No failed answers. Response time: avg 11.96s, max 11.96s, total 11.96s.
Trivia
: 3.0. API error: 1. Response time: avg 0ms, max 0ms, total 0ms.
Coding
: 4.1. Timed out: 1. Wrong answer: 1. Response time: avg 1.17s, max 1.69s, total 2.34s.
Combined
: 3.0. Wrong answer: 1. Response time: avg 4.28s, max 4.28s, total 4.28s.
Data parsing and extraction
: 6.5. Wrong answer: 1. Response time: avg 81.80s, max 81.80s, total 81.80s.
Domain specific
: 5.3. Wrong answer: 2. Response time: avg 638ms, max 638ms, total 638ms.
Instructions following
: 6.3. Wrong answer: 1. Response time: avg 7.49s, max 13.67s, total 14.99s.
Tool Calling
: 10.0. No failed answers. Response time: avg 2.64s, max 2.64s, total 2.64s.
Trivia
: 3.0. Wrong answer: 1. Response time: avg 399ms, max 399ms, total 399ms.
Coding
: 6.5. API error: 1. Response time: avg 11.21s, max 11.21s, total 11.21s.
Combined
: 3.0. Invalid tool call: 1. Response time: avg 35.34s, max 35.34s, total 35.34s.
Data parsing and extraction
: 6.5. Wrong answer: 1. Response time: avg 8.48s, max 12.71s, total 16.96s.
Domain specific
: 3.0. Wrong answer: 3. Response time: avg 4.95s, max 7.65s, total 14.84s.
General Intelligence
: 4.0. Wrong answer: 1. Response time: avg 1.45s, max 1.45s, total 1.45s.
Instructions following
: 9.8. No failed answers. Response time: avg 5.52s, max 8.19s, total 11.04s.
Tool Calling
: 3.0. Invalid tool call: 1. Response time: avg 18.80s, max 18.80s, total 18.80s.
Trivia
: 3.0. Wrong answer: 1. Response time: avg 1.06s, max 1.06s, total 1.06s.
Wrong answer: 3. No answer: 1. Response time: avg 9.34s, max 38.03s, total 186.84s…
Anti-AI Tricks
: 10.0. No failed answers. Response time: avg 3.95s, max 5.76s, total 15.79s.
Coding
: 10.0. No failed answers. Response time: avg 14.97s, max 22.27s, total 29.93s.
Combined
: 9.8. No failed answers. Response time: avg 38.03s, max 38.03s, total 38.03s.
Data parsing and extraction
: 7.1. Wrong answer: 1. Response time: avg 12.29s, max 19.64s, total 24.59s.
Domain specific
: 5.3. Wrong answer: 2. Response time: avg 14.15s, max 28.41s, total 42.46s.
General Intelligence
: 10.0. No failed answers. Response time: avg 2.46s, max 2.46s, total 2.46s.
Instructions following
: 10.0. No failed answers. Response time: avg 3.32s, max 5.07s, total 6.63s.
Puzzle Solving
: 10.0. No failed answers. Response time: avg 3.95s, max 4.33s, total 11.85s.
Tool Calling
: 10.0. No failed answers. Response time: avg 8.96s, max 8.96s, total 8.96s.
Trivia
: 3.0. No answer: 1. Response time: avg 6.14s, max 6.14s, total 6.14s.
Wrong answer: 3. Response time: avg 9.43s, max 56.19s, total 188.66s…
Anti-AI Tricks
: 10.0. No failed answers. Response time: avg 4.41s, max 6.32s, total 17.64s.
Coding
: 10.0. No failed answers. Response time: avg 14.42s, max 21.06s, total 28.85s.
Combined
: 10.0. No failed answers. Response time: avg 9.56s, max 9.56s, total 9.56s.
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 3.28s, max 5.13s, total 6.56s.
Domain specific
: 5.3. Wrong answer: 2. Response time: avg 28.05s, max 56.19s, total 84.16s.
General Intelligence
: 10.0. No failed answers. Response time: avg 5.17s, max 5.17s, total 5.17s.
Instructions following
: 9.9. No failed answers. Response time: avg 3.74s, max 3.99s, total 7.48s.
Puzzle Solving
: 10.0. No failed answers. Response time: avg 4.74s, max 5.61s, total 14.21s.
Tool Calling
: 10.0. No failed answers. Response time: avg 4.96s, max 4.96s, total 4.96s.
Trivia
: 3.0. Wrong answer: 1. Response time: avg 10.06s, max 10.06s, total 10.06s.
Anti-AI Tricks
: 6.9. Extra formatting: 1. Wrong answer: 1. Response time: avg 3.46s, max 4.38s, total 13.86s.
Coding
: 10.0. No failed answers. Response time: avg 27.11s, max 27.11s, total 27.11s.
Combined
: 3.0. API error: 1. Response time: avg 0ms, max 0ms, total 0ms.
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 5.54s, max 7.51s, total 11.08s.
Instructions following
: 9.8. No failed answers. Response time: avg 3.52s, max 3.80s, total 7.04s.
Tool Calling
: 3.0. API error: 1. Response time: avg 0ms, max 0ms, total 0ms.
Wrong answer: 3. Did not follow instructions: 1. Response time: avg 9.75s, max 31.36s, total 175.48s…
Anti-AI Tricks
: 8.7. Wrong answer: 1. Response time: avg 3.16s, max 3.44s, total 12.65s.
Coding
: 10.0. No failed answers. Response time: avg 31.36s, max 31.36s, total 31.36s.
Combined
: 10.0. No failed answers. Response time: avg 20.93s, max 20.93s, total 20.93s.
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 4.01s, max 4.27s, total 8.02s.
Domain specific
: 5.3. Wrong answer: 2. Response time: avg 21.33s, max 24.21s, total 64.00s.
General Intelligence
: 10.0. No failed answers. Response time: avg 5.78s, max 5.78s, total 5.78s.
Instructions following
: 9.8. No failed answers. Response time: avg 4.89s, max 5.89s, total 9.78s.
Puzzle Solving
: 10.0. No failed answers. Response time: avg 3.52s, max 4.53s, total 10.57s.
Tool Calling
: 3.0. Did not follow instructions: 1. Response time: avg 12.39s, max 12.39s, total 12.39s.
Anti-AI Tricks
: 7.3; Wrong answer: 2; Response Time (avg): 4.75s; Response Time (max): 7.62s; Response Time (total): 19.00s
Coding
: 3.0; API error: 1; Response Time (avg): 0ms; Response Time (max): 0ms; Response Time (total): 0ms
Combined
: 4.7; Timed out: 1; Response Time (avg): 30.53s; Response Time (max): 30.53s; Response Time (total): 30.53s
Data parsing and extraction
: 10.0; No failed answers; Response Time (avg): 23.16s; Response Time (max): 26.55s; Response Time (total): 46.33s
Instructions following
: 9.9; No failed answers; Response Time (avg): 4.18s; Response Time (max): 4.46s; Response Time (total): 8.36s
Tool Calling
: 10.0; No failed answers; Response Time (avg): 17.33s; Response Time (max): 17.33s; Response Time (total): 17.33s
Anti-AI Tricks
: 4.8; Wrong answer: 3; Response Time (avg): 3.97s; Response Time (max): 7.48s; Response Time (total): 15.89s
Coding
: 6.6; API error: 1; Response Time (avg): 19.08s; Response Time (max): 30.81s; Response Time (total): 38.16s
Combined
: 3.0; Wrong answer: 1; Response Time (avg): 10.01s; Response Time (max): 10.01s; Response Time (total): 10.01s
Data parsing and extraction
: 10.0; No failed answers; Response Time (avg): 21.64s; Response Time (max): 29.16s; Response Time (total): 43.28s
Domain specific
: 5.3; Wrong answer: 2; Response Time (avg): 8.58s; Response Time (max): 9.48s; Response Time (total): 25.74s
Instructions following
: 6.5; Wrong answer: 1; Response Time (avg): 10.15s; Response Time (max): 15.94s; Response Time (total): 20.30s
Tool Calling
: 10.0; No failed answers; Response Time (avg): 8.26s; Response Time (max): 8.26s; Response Time (total): 8.26s
Trivia
: 3.0; Wrong answer: 1; Response Time (avg): 2.38s; Response Time (max): 2.38s; Response Time (total): 2.38s
Wrong answer: 7; Did not follow instructions: 2; Response Time (avg): 11.79s; Response Time (max): 94.06s; Response Time (total): 235.81s…
Anti-AI Tricks
: 8.3; Wrong answer: 1; Response Time (avg): 4.52s; Response Time (max): 7.74s; Response Time (total): 18.10s
Coding
: 6.8; Wrong answer: 1; Response Time (avg): 21.10s; Response Time (max): 28.80s; Response Time (total): 42.21s
Combined
: 9.8; No failed answers; Response Time (avg): 24.13s; Response Time (max): 24.13s; Response Time (total): 24.13s
Data parsing and extraction
: 10.0; No failed answers; Response Time (avg): 2.54s; Response Time (max): 3.33s; Response Time (total): 5.08s
Domain specific
: 5.9; Wrong answer: 2; Response Time (avg): 38.18s; Response Time (max): 94.06s; Response Time (total): 114.53s
Instructions following
: 9.8; No failed answers; Response Time (avg): 1.88s; Response Time (max): 2.61s; Response Time (total): 3.75s
Tool Calling
: 10.0; No failed answers; Response Time (avg): 7.71s; Response Time (max): 7.71s; Response Time (total): 7.71s
Trivia
: 3.0; Wrong answer: 1; Response Time (avg): 4.81s; Response Time (max): 4.81s; Response Time (total): 4.81s
Coding
: 5.4; Wrong answer: 2; Response Time (avg): 8.27s; Response Time (max): 14.69s; Response Time (total): 16.54s
Combined
: 9.5; No failed answers; Response Time (avg): 25.49s; Response Time (max): 25.49s; Response Time (total): 25.49s
Data parsing and extraction
: 6.9; API error: 1; Response Time (avg): 30.54s; Response Time (max): 58.65s; Response Time (total): 61.08s
Domain specific
: 5.3; Wrong answer: 2; Response Time (avg): 3.17s; Response Time (max): 6.59s; Response Time (total): 9.52s
Instructions following
: 6.3; Wrong answer: 1; Response Time (avg): 8.23s; Response Time (max): 13.43s; Response Time (total): 16.45s
Puzzle Solving
: 7.6; Extra formatting: 1; Response Time (avg): 15.95s; Response Time (max): 27.12s; Response Time (total): 47.86s
Tool Calling
: 10.0; No failed answers; Response Time (avg): 5.92s; Response Time (max): 5.92s; Response Time (total): 5.92s
Trivia
: 3.0; Wrong answer: 1; Response Time (avg): 15.59s; Response Time (max): 15.59s; Response Time (total): 15.59s
Coding
: 2.7; API error: 1; Wrong answer: 1; Response Time (avg): 4.56s; Response Time (max): 4.56s; Response Time (total): 4.56s
Combined
: 3.0; Wrong answer: 1; Response Time (avg): 35.84s; Response Time (max): 35.84s; Response Time (total): 35.84s
Data parsing and extraction
: 6.5; API error: 1; Response Time (avg): 2.85s; Response Time (max): 2.85s; Response Time (total): 2.85s
Domain specific
: 3.6; Wrong answer: 2; API error: 1; Response Time (avg): 17.61s; Response Time (max): 25.68s; Response Time (total): 52.82s
Instructions following
: 6.3; Extra formatting: 1; Response Time (avg): 12.98s; Response Time (max): 23.51s; Response Time (total): 25.95s
Tool Calling
: 10.0; No failed answers; Response Time (avg): 33.76s; Response Time (max): 33.76s; Response Time (total): 33.76s
Trivia
: 3.0; Wrong answer: 1; Response Time (avg): 2.71s; Response Time (max): 2.71s; Response Time (total): 2.71s
Wrong answer: 10; Did not follow instructions: 3; Response Time (avg): 13.82s; Response Time (max): 238.89s; Response Time (total): 276.39s…
Anti-AI Tricks
: 4.6; Wrong answer: 3; Response Time (avg): 1.39s; Response Time (max): 2.96s; Response Time (total): 5.56s
Coding
: 6.8; Did not follow instructions: 1; Response Time (avg): 122.77s; Response Time (max): 238.89s; Response Time (total): 245.54s
Combined
: 3.0; Wrong answer: 1; Response Time (avg): 3.38s; Response Time (max): 3.38s; Response Time (total): 3.38s
Data parsing and extraction
: 10.0; No failed answers; Response Time (avg): 1.32s; Response Time (max): 1.39s; Response Time (total): 2.64s
Domain specific
: 5.3; Wrong answer: 2; Response Time (avg): 1.48s; Response Time (max): 1.85s; Response Time (total): 4.45s
Instructions following
: 6.5; Wrong answer: 1; Response Time (avg): 1.64s; Response Time (max): 1.80s; Response Time (total): 3.28s
Tool Calling
: 10.0; No failed answers; Response Time (avg): 4.46s; Response Time (max): 4.46s; Response Time (total): 4.46s
Trivia
: 3.0; Wrong answer: 1; Response Time (avg): 1.36s; Response Time (max): 1.36s; Response Time (total): 1.36s
Wrong answer: 3; Response Time (avg): 13.83s; Response Time (max): 33.37s; Response Time (total): 276.53s…
Anti-AI Tricks
: 10.0; No failed answers; Response Time (avg): 6.36s; Response Time (max): 8.75s; Response Time (total): 25.44s
Coding
: 10.0; No failed answers; Response Time (avg): 22.98s; Response Time (max): 32.31s; Response Time (total): 45.96s
Combined
: 10.0; No failed answers; Response Time (avg): 19.60s; Response Time (max): 19.60s; Response Time (total): 19.60s
Data parsing and extraction
: 10.0; No failed answers; Response Time (avg): 8.80s; Response Time (max): 10.25s; Response Time (total): 17.60s
Domain specific
: 5.9; Wrong answer: 2; Response Time (avg): 24.94s; Response Time (max): 29.00s; Response Time (total): 74.81s
General Intelligence
: 10.0; No failed answers; Response Time (avg): 11.70s; Response Time (max): 11.70s; Response Time (total): 11.70s
Instructions following
: 10.0; No failed answers; Response Time (avg): 7.46s; Response Time (max): 10.17s; Response Time (total): 14.92s
Puzzle Solving
: 10.0; No failed answers; Response Time (avg): 8.84s; Response Time (max): 11.71s; Response Time (total): 26.51s
Tool Calling
: 10.0; No failed answers; Response Time (avg): 6.63s; Response Time (max): 6.63s; Response Time (total): 6.63s
Trivia
: 3.0; Wrong answer: 1; Response Time (avg): 33.37s; Response Time (max): 33.37s; Response Time (total): 33.37s
Wrong answer: 14; Response Time (avg): 14.06s; Response Time (max): 42.13s; Response Time (total): 182.72s…
Anti-AI Tricks
: 3.6; Wrong answer: 4; Response Time (avg): 6.24s; Response Time (max): 11.38s; Response Time (total): 12.48s
Coding
: 6.8; Wrong answer: 1; Response Time (avg): 35.97s; Response Time (max): 38.78s; Response Time (total): 71.93s
Combined
: 2.8; Wrong answer: 1; Response Time (avg): 19.16s; Response Time (max): 19.16s; Response Time (total): 19.16s
Data parsing and extraction
: 7.3; Wrong answer: 1; Response Time (avg): 42.13s; Response Time (max): 42.13s; Response Time (total): 42.13s
Domain specific
: 5.3; Wrong answer: 2; Response Time (avg): 4.38s; Response Time (max): 4.38s; Response Time (total): 4.38s
General Intelligence
: 10.0; No failed answers; Response Time (avg): 4.00s; Response Time (max): 4.00s; Response Time (total): 4.00s
Instructions following
: 6.5; Wrong answer: 1; Response Time (avg): 2.67s; Response Time (max): 2.67s; Response Time (total): 2.67s
Puzzle Solving
: 3.0; Wrong answer: 3; Response Time (avg): 4.04s; Response Time (max): 7.81s; Response Time (total): 8.08s
Tool Calling
: 10.0; No failed answers; Response Time (avg): 13.99s; Response Time (max): 13.99s; Response Time (total): 13.99s
Trivia
: 3.0; Wrong answer: 1; Response Time (avg): 3.90s; Response Time (max): 3.90s; Response Time (total): 3.90s
Combined
: 6.5; Invalid tool call: 1; Response Time (avg): 115.89s; Response Time (max): 115.89s; Response Time (total): 115.89s
Data parsing and extraction
: 6.3; Wrong answer: 1; Response Time (avg): 9.42s; Response Time (max): 16.20s; Response Time (total): 18.84s
Domain specific
: 2.9; Wrong answer: 2; API error: 1; Response Time (avg): 4.17s; Response Time (max): 9.09s; Response Time (total): 12.51s
General Intelligence
: 4.7; API error: 1; Response Time (avg): 9.32s; Response Time (max): 9.32s; Response Time (total): 9.32s
Instructions following
: 10.0; No failed answers; Response Time (avg): 1.52s; Response Time (max): 1.99s; Response Time (total): 3.04s
Puzzle Solving
: 7.6; API error: 1; Response Time (avg): 6.91s; Response Time (max): 10.09s; Response Time (total): 20.74s
Tool Calling
: 10.0; No failed answers; Response Time (avg): 11.85s; Response Time (max): 11.85s; Response Time (total): 11.85s
Trivia
: 3.0; Wrong answer: 1; Response Time (avg): 17.23s; Response Time (max): 17.23s; Response Time (total): 17.23s
Anti-AI Tricks
: 6.5; API error: 1; Wrong answer: 1; Response Time (avg): 4.87s; Response Time (max): 6.30s; Response Time (total): 14.62s
Coding
: 4.3; Did not follow instructions: 1; Response Time (avg): 35.61s; Response Time (max): 35.61s; Response Time (total): 35.61s
Combined
: 3.0; No answer: 1; Response Time (avg): 53.14s; Response Time (max): 53.14s; Response Time (total): 53.14s
Data parsing and extraction
: 10.0; No failed answers; Response Time (avg): 4.93s; Response Time (max): 5.03s; Response Time (total): 9.86s
Domain specific
: 5.3; Wrong answer: 2; Response Time (avg): 24.14s; Response Time (max): 45.83s; Response Time (total): 72.43s
General Intelligence
: 3.0; API error: 1; Response Time (avg): 0ms; Response Time (max): 0ms; Response Time (total): 0ms
Instructions following
: 10.0; No failed answers; Response Time (avg): 4.30s; Response Time (max): 6.00s; Response Time (total): 8.59s
Puzzle Solving
: 5.3; API error: 1; Wrong answer: 1; Response Time (avg): 10.19s; Response Time (max): 14.92s; Response Time (total): 20.37s
Tool Calling
: 10.0; No failed answers; Response Time (avg): 6.31s; Response Time (max): 6.31s; Response Time (total): 6.31s
Trivia
: 3.0; API error: 1; Response Time (avg): 0ms; Response Time (max): 0ms; Response Time (total): 0ms