A test is fully passed only if every run passed for that test.
Wrong answer: 7 | No answer: 1 | Response time: avg 16.06s, max 124.75s, total 321.11s …
Anti-AI Tricks: 8.7 | Wrong answer: 1 | Response time: avg 4.02s, max 12.52s, total 16.10s
Coding: 10.0 | No failed answers | Response time: avg 9.43s, max 12.69s, total 18.86s
Combined: 10.0 | No failed answers | Response time: avg 7.98s, max 7.98s, total 7.98s
Data parsing and extraction: 7.3 | Wrong answer: 1 | Response time: avg 2.29s, max 3.15s, total 4.58s
Domain specific: 5.3 | Wrong answer: 2 | Response time: avg 43.31s, max 72.27s, total 129.92s
General Intelligence: 3.4 | Wrong answer: 1 | Response time: avg 7.00s, max 7.00s, total 7.00s
Instructions following: 9.8 | No failed answers | Response time: avg 1.58s, max 1.80s, total 3.16s
Puzzle Solving: 5.5 | Wrong answer: 2 | Response time: avg 1.84s, max 3.42s, total 5.52s
Tool Calling: 10.0 | No failed answers | Response time: avg 3.25s, max 3.25s, total 3.25s
Trivia: 3.0 | No answer: 1 | Response time: avg 124.75s, max 124.75s, total 124.75s
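The pass rule quoted in these results ("a test is fully passed only if every run passed for that test") can be sketched in a few lines of Python. This is only an illustrative sketch, not the benchmark's actual scoring code: the function names, the dictionary layout, and the decision to treat a test with zero recorded runs as not passed are all assumptions made here.

```python
from typing import Dict, List


def fully_passed(runs: List[bool]) -> bool:
    """Return True only if at least one run exists and every run passed.

    Treating an empty run list as "not passed" is an assumption of this
    sketch; the source does not say how such a case is handled.
    """
    return len(runs) > 0 and all(runs)


def count_fully_passed(results: Dict[str, List[bool]]) -> int:
    """Count the tests whose runs all passed."""
    return sum(1 for runs in results.values() if fully_passed(runs))


# Hypothetical per-test run outcomes (True = run passed).
example = {
    "test_a": [True, True, True],   # fully passed
    "test_b": [True, False, True],  # one failed run, so not fully passed
    "test_c": [True],               # single passing run, fully passed
}
```

Under this rule a single failed run is enough to keep a test out of the fully-passed count, regardless of how many other runs succeeded.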
Anti-AI Tricks: 8.7 | Did not follow instructions: 1 | Response time: avg 9.65s, max 35.08s, total 38.62s
Coding: 8.2 | Wrong answer: 1 | Response time: avg 10.64s, max 12.69s, total 21.28s
Combined: 10.0 | No failed answers | Response time: avg 9.06s, max 9.06s, total 9.06s
Data parsing and extraction: 10.0 | No failed answers | Response time: avg 2.75s, max 3.35s, total 5.49s
Domain specific: 7.7 | Wrong answer: 1 | Response time: avg 48.27s, max 97.10s, total 144.81s
General Intelligence: 4.0 | Wrong answer: 1 | Response time: avg 6.85s, max 6.85s, total 6.85s
Instructions following: 9.8 | No failed answers | Response time: avg 1.83s, max 2.21s, total 3.65s
Puzzle Solving: 5.7 | Wrong answer: 2 | Response time: avg 6.19s, max 12.51s, total 18.56s
Tool Calling: 10.0 | No failed answers | Response time: avg 4.16s, max 4.16s, total 4.16s
Trivia: 3.0 | No answer: 1 | Response time: avg 113.98s, max 113.98s, total 113.98s
Wrong answer: 6 | Did not follow instructions: 1 | Response time: avg 6.82s, max 38.52s, total 136.34s …
Anti-AI Tricks: 8.7 | Wrong answer: 1 | Response time: avg 3.40s, max 4.78s, total 13.59s
Coding: 8.2 | Wrong answer: 1 | Response time: avg 8.05s, max 8.97s, total 16.09s
Combined: 10.0 | No failed answers | Response time: avg 9.12s, max 9.12s, total 9.12s
Data parsing and extraction: 10.0 | No failed answers | Response time: avg 3.05s, max 3.33s, total 6.10s
Domain specific: 5.3 | Wrong answer: 2 | Response time: avg 17.78s, max 38.52s, total 53.33s
Instructions following: 9.8 | No failed answers | Response time: avg 5.51s, max 6.55s, total 11.02s
Puzzle Solving: 7.7 | Wrong answer: 1 | Response time: avg 4.10s, max 5.04s, total 12.31s
Tool Calling: 10.0 | No failed answers | Response time: avg 4.68s, max 4.68s, total 4.68s
Trivia: 3.0 | Wrong answer: 1 | Response time: avg 6.89s, max 6.89s, total 6.89s
Wrong answer: 6 | Did not follow instructions: 1 | Response time: avg 15.57s, max 95.48s, total 311.47s …
Anti-AI Tricks: 8.4 | Wrong answer: 1 | Response time: avg 6.30s, max 15.56s, total 25.21s
Coding: 6.6 | Wrong answer: 1 | Response time: avg 54.56s, max 92.88s, total 109.12s
Combined: 10.0 | No failed answers | Response time: avg 28.44s, max 28.44s, total 28.44s
Data parsing and extraction: 10.0 | No failed answers | Response time: avg 4.06s, max 5.06s, total 8.11s
Domain specific: 5.9 | Wrong answer: 2 | Response time: avg 37.34s, max 95.48s, total 112.01s
Instructions following: 9.8 | No failed answers | Response time: avg 2.62s, max 2.78s, total 5.24s
Puzzle Solving: 7.7 | Wrong answer: 1 | Response time: avg 3.18s, max 4.05s, total 9.54s
Tool Calling: 10.0 | No failed answers | Response time: avg 6.20s, max 6.20s, total 6.20s
Trivia: 3.0 | Wrong answer: 1 | Response time: avg 2.76s, max 2.76s, total 2.76s
Anti-AI Tricks: 10.0 | No failed answers | Response time: avg 21.13s, max 34.96s, total 84.53s
Coding: 6.5 | No answer: 1 | Response time: avg 244.54s, max 409.98s, total 489.08s
Combined: 4.7 | No answer: 1 | Response time: avg 75.34s, max 75.34s, total 75.34s
Data parsing and extraction: 7.3 | API error: 1 | Response time: avg 59.33s, max 97.12s, total 118.65s
Domain specific: 4.1 | Timed out: 2 | Wrong answer: 1 | Response time: avg 88.34s, max 106.00s, total 265.01s
General Intelligence: 2.8 | Timed out: 1 | Response time: avg 30.30s, max 30.30s, total 30.30s
Instructions following: 10.0 | No failed answers | Response time: avg 24.45s, max 43.36s, total 48.89s
Puzzle Solving: 8.2 | Timed out: 1 | Response time: avg 33.13s, max 64.81s, total 99.38s
Tool Calling: 10.0 | No failed answers | Response time: avg 4.65s, max 4.65s, total 4.65s
Trivia: 3.0 | Wrong answer: 1 | Response time: avg 177.35s, max 177.35s, total 177.35s
Anti-AI Tricks: 8.3 | Wrong answer: 1 | Response time: avg 12.62s, max 18.61s, total 50.50s
Coding: 6.6 | No answer: 1 | Response time: avg 165.39s, max 168.22s, total 330.78s
Combined: 7.0 | Invalid tool call: 1 | Response time: avg 83.07s, max 83.07s, total 83.07s
Data parsing and extraction: 3.5 | No answer: 2 | Response time: avg 37.30s, max 54.01s, total 74.60s
Domain specific: 2.9 | Wrong answer: 3 | Response time: avg 73.38s, max 101.55s, total 220.15s
Instructions following: 10.0 | No failed answers | Response time: avg 37.96s, max 47.48s, total 75.92s
Puzzle Solving: 7.7 | Wrong answer: 1 | Response time: avg 61.14s, max 97.76s, total 183.42s
Tool Calling: 10.0 | No failed answers | Response time: avg 16.88s, max 16.88s, total 16.88s
Trivia: 3.0 | Wrong answer: 1 | Response time: avg 80.99s, max 80.99s, total 80.99s
API error: 3 | Wrong answer: 3 | Response time: avg 9.05s, max 26.24s, total 90.53s …
Anti-AI Tricks: 10.0 | No failed answers | Response time: avg 14.99s, max 26.24s, total 29.99s
Coding: 3.0 | API error: 2 | Response time: avg 0ms, max 0ms, total 0ms
Combined: 3.0 | Wrong answer: 1 | Response time: avg 10.37s, max 10.37s, total 10.37s
Data parsing and extraction: 10.0 | No failed answers | Response time: avg 10.84s, max 10.84s, total 10.84s
Domain specific: 5.3 | Wrong answer: 2 | Response time: avg 7.01s, max 7.01s, total 7.01s
General Intelligence: 10.0 | No failed answers | Response time: avg 9.34s, max 9.34s, total 9.34s
Instructions following: 9.8 | No failed answers | Response time: avg 3.26s, max 3.26s, total 3.26s
Puzzle Solving: 10.0 | No failed answers | Response time: avg 3.88s, max 4.23s, total 7.77s
Tool Calling: 10.0 | No failed answers | Response time: avg 11.96s, max 11.96s, total 11.96s
Trivia: 3.0 | API error: 1 | Response time: avg 0ms, max 0ms, total 0ms
Wrong answer: 6 | Did not follow instructions: 2 | Response time: avg 6.13s, max 18.33s, total 122.61s …
Coding: 6.9 | Wrong answer: 1 | Response time: avg 10.52s, max 11.72s, total 21.03s
Combined: 10.0 | No failed answers | Response time: avg 11.96s, max 11.96s, total 11.96s
Data parsing and extraction: 10.0 | No failed answers | Response time: avg 2.21s, max 2.52s, total 4.42s
Domain specific: 3.5 | Wrong answer: 3 | Response time: avg 13.01s, max 18.33s, total 39.04s
Instructions following: 9.8 | No failed answers | Response time: avg 3.51s, max 4.60s, total 7.01s
Puzzle Solving: 10.0 | No failed answers | Response time: avg 2.99s, max 3.16s, total 8.97s
Tool Calling: 10.0 | No failed answers | Response time: avg 8.36s, max 8.36s, total 8.36s
Trivia: 3.0 | Wrong answer: 1 | Response time: avg 4.38s, max 4.38s, total 4.38s
Coding: 6.8 | Wrong answer: 1 | Response time: avg 54.83s, max 95.88s, total 109.65s
Combined: 6.9 | Invalid tool call: 1 | Response time: avg 15.06s, max 15.06s, total 15.06s
Data parsing and extraction: 10.0 | No failed answers | Response time: avg 9.60s, max 9.92s, total 19.19s
Domain specific: 5.3 | Wrong answer: 2 | Response time: avg 38.15s, max 67.08s, total 114.45s
General Intelligence: 10.0 | No failed answers | Response time: avg 11.09s, max 11.09s, total 11.09s
Instructions following: 9.9 | No failed answers | Response time: avg 3.74s, max 5.23s, total 7.47s
Puzzle Solving: 7.7 | Wrong answer: 1 | Response time: avg 10.24s, max 16.95s, total 30.72s
Tool Calling: 7.0 | Invalid tool call: 1 | Response time: avg 12.53s, max 12.53s, total 12.53s
Trivia: 3.0 | Wrong answer: 1 | Response time: avg 40.96s, max 40.96s, total 40.96s
Anti-AI Tricks: 8.2 | Wrong answer: 1 | Response time: avg 3.95s, max 5.68s, total 15.80s
Coding: 4.4 | Wrong answer: 2 | Response time: avg 65.07s, max 105.80s, total 130.13s
Combined: 10.0 | No failed answers | Response time: avg 17.40s, max 17.40s, total 17.40s
Data parsing and extraction: 10.0 | No failed answers | Response time: avg 4.17s, max 5.02s, total 8.34s
Instructions following: 9.8 | No failed answers | Response time: avg 4.26s, max 4.46s, total 8.52s
Puzzle Solving: 7.7 | Wrong answer: 1 | Response time: avg 6.22s, max 11.63s, total 18.66s
Tool Calling: 3.0 | Did not follow instructions: 1 | Response time: avg 13.68s, max 13.68s, total 13.68s
Trivia: 3.0 | Wrong answer: 1 | Response time: avg 63.48s, max 63.48s, total 63.48s
Wrong answer: 3 | Response time: avg 13.83s, max 33.37s, total 276.53s …
Anti-AI Tricks: 10.0 | No failed answers | Response time: avg 6.36s, max 8.75s, total 25.44s
Coding: 10.0 | No failed answers | Response time: avg 22.98s, max 32.31s, total 45.96s
Combined: 10.0 | No failed answers | Response time: avg 19.60s, max 19.60s, total 19.60s
Data parsing and extraction: 10.0 | No failed answers | Response time: avg 8.80s, max 10.25s, total 17.60s
Domain specific: 5.9 | Wrong answer: 2 | Response time: avg 24.94s, max 29.00s, total 74.81s
General Intelligence: 10.0 | No failed answers | Response time: avg 11.70s, max 11.70s, total 11.70s
Instructions following: 10.0 | No failed answers | Response time: avg 7.46s, max 10.17s, total 14.92s
Puzzle Solving: 10.0 | No failed answers | Response time: avg 8.84s, max 11.71s, total 26.51s
Tool Calling: 10.0 | No failed answers | Response time: avg 6.63s, max 6.63s, total 6.63s
Trivia: 3.0 | Wrong answer: 1 | Response time: avg 33.37s, max 33.37s, total 33.37s
Wrong answer: 2 | Did not follow instructions: 1 | Response time: avg 4.29s, max 12.05s, total 85.72s …
Anti-AI Tricks: 10.0 | No failed answers | Response time: avg 2.09s, max 2.56s, total 8.35s
Coding: 6.8 | Did not follow instructions: 1 | Response time: avg 9.91s, max 11.59s, total 19.82s
Combined: 10.0 | No failed answers | Response time: avg 12.05s, max 12.05s, total 12.05s
Data parsing and extraction: 10.0 | No failed answers | Response time: avg 4.07s, max 5.60s, total 8.14s
Domain specific: 7.7 | Wrong answer: 1 | Response time: avg 5.24s, max 6.43s, total 15.73s
General Intelligence: 10.0 | No failed answers | Response time: avg 2.52s, max 2.52s, total 2.52s
Instructions following: 9.9 | No failed answers | Response time: avg 2.70s, max 3.07s, total 5.40s
Puzzle Solving: 7.7 | Wrong answer: 1 | Response time: avg 2.38s, max 2.55s, total 7.15s
Tool Calling: 10.0 | No failed answers | Response time: avg 3.81s, max 3.81s, total 3.81s
Trivia: 10.0 | No failed answers | Response time: avg 2.75s, max 2.75s, total 2.75s
Wrong answer: 6 | Did not follow instructions: 3 | Response time: avg 22.10s, max 138.75s, total 442.09s …
Anti-AI Tricks: 8.6 | Wrong answer: 1 | Response time: avg 4.05s, max 6.69s, total 16.20s
Coding: 7.5 | Wrong answer: 1 | Response time: avg 73.25s, max 138.75s, total 146.51s
Combined: 10.0 | No failed answers | Response time: avg 17.81s, max 17.81s, total 17.81s
Data parsing and extraction: 10.0 | No failed answers | Response time: avg 2.43s, max 3.39s, total 4.87s
Domain specific: 4.1 | Wrong answer: 3 | Response time: avg 65.31s, max 102.91s, total 195.92s
Instructions following: 9.8 | No failed answers | Response time: avg 2.13s, max 2.45s, total 4.25s
Puzzle Solving: 7.8 | Did not follow instructions: 1 | Response time: avg 4.37s, max 7.27s, total 13.11s
Tool Calling: 4.7 | Did not follow instructions: 1 | Response time: avg 9.62s, max 9.62s, total 9.62s
Trivia: 3.0 | Wrong answer: 1 | Response time: avg 30.10s, max 30.10s, total 30.10s
Anti-AI Tricks: 8.7 | Extra formatting: 1 | Response time: avg 19.75s, max 49.95s, total 79.01s
Coding: 7.0 | Wrong answer: 1 | Response time: avg 123.86s, max 177.36s, total 247.71s
Combined: 10.0 | No failed answers | Response time: avg 163.96s, max 163.96s, total 163.96s
Data parsing and extraction: 10.0 | No failed answers | Response time: avg 30.26s, max 32.03s, total 60.52s
Domain specific: 5.3 | Timed out: 1 | Wrong answer: 1 | Response time: avg 79.53s, max 95.52s, total 238.59s
Instructions following: 10.0 | No failed answers | Response time: avg 19.66s, max 32.25s, total 39.32s
Puzzle Solving: 8.2 | Did not follow instructions: 1 | Response time: avg 59.60s, max 123.57s, total 178.80s
Tool Calling: 10.0 | No failed answers | Response time: avg 7.45s, max 7.45s, total 7.45s
Trivia: 3.0 | Wrong answer: 1 | Response time: avg 85.11s, max 85.11s, total 85.11s
Coding: 10.0 | No failed answers | Response time: avg 23.15s, max 31.19s, total 46.30s
Combined: 10.0 | No failed answers | Response time: avg 14.06s, max 14.06s, total 14.06s
Data parsing and extraction: 10.0 | No failed answers | Response time: avg 3.15s, max 3.15s, total 3.15s
Domain specific: 5.9 | Timed out: 1 | Wrong answer: 1 | Response time: avg 77.80s, max 77.80s, total 77.80s
Instructions following: 9.9 | No failed answers | Response time: avg 3.12s, max 3.12s, total 3.12s
Puzzle Solving: 7.5 | Did not follow instructions: 1 | Response time: avg 5.80s, max 6.45s, total 11.61s
Tool Calling: 4.7 | No answer: 1 | Response time: avg 10.30s, max 10.30s, total 10.30s
Trivia: 3.0 | Wrong answer: 1 | Response time: avg 28.18s, max 28.18s, total 28.18s
Wrong answer: 3 | Response time: avg 3.02s, max 18.27s, total 57.44s …
Anti-AI Tricks: 8.3 | Wrong answer: 1 | Response time: avg 2.12s, max 3.75s, total 8.50s
Coding: 10.0 | No failed answers | Response time: avg 2.84s, max 2.84s, total 2.84s
Combined: 9.5 | No failed answers | Response time: avg 18.27s, max 18.27s, total 18.27s
Data parsing and extraction: 10.0 | No failed answers | Response time: avg 2.15s, max 2.33s, total 4.29s
Domain specific: 7.7 | Wrong answer: 1 | Response time: avg 1.19s, max 1.40s, total 3.58s
General Intelligence: 10.0 | No failed answers | Response time: avg 3.47s, max 3.47s, total 3.47s
Instructions following: 10.0 | No failed answers | Response time: avg 1.46s, max 1.68s, total 2.91s
Puzzle Solving: 10.0 | No failed answers | Response time: avg 2.46s, max 3.72s, total 7.38s
Tool Calling: 10.0 | No failed answers | Response time: avg 4.74s, max 4.74s, total 4.74s
Trivia: 3.0 | Wrong answer: 1 | Response time: avg 1.46s, max 1.46s, total 1.46s
Wrong answer: 5 | Timed out: 2 | Response time: avg 39.40s, max 168.16s, total 788.00s …
Anti-AI Tricks: 10.0 | No failed answers | Response time: avg 9.75s, max 18.03s, total 39.01s
Coding: 4.1 | Timed out: 1 | Wrong answer: 1 | Response time: avg 119.57s, max 168.16s, total 239.14s
Combined: 10.0 | No failed answers | Response time: avg 107.79s, max 107.79s, total 107.79s
Data parsing and extraction: 10.0 | No failed answers | Response time: avg 23.41s, max 29.79s, total 46.83s
Domain specific: 2.9 | Wrong answer: 3 | Response time: avg 63.40s, max 119.29s, total 190.20s
General Intelligence: 3.4 | Timed out: 1 | Response time: avg 34.11s, max 34.11s, total 34.11s
Instructions following: 10.0 | No failed answers | Response time: avg 9.88s, max 15.44s, total 19.76s
Puzzle Solving: 10.0 | No failed answers | Response time: avg 17.89s, max 31.99s, total 53.68s
Tool Calling: 10.0 | No failed answers | Response time: avg 4.60s, max 4.60s, total 4.60s
Trivia: 3.0 | Wrong answer: 1 | Response time: avg 52.87s, max 52.87s, total 52.87s
Anti-AI Tricks: 6.5 | Extra formatting: 2 | Response time: avg 3.40s, max 6.36s, total 13.58s
Coding: 6.8 | Did not follow instructions: 1 | Response time: avg 3.59s, max 4.34s, total 7.17s
Combined: 9.5 | No failed answers | Response time: avg 17.73s, max 17.73s, total 17.73s
Data parsing and extraction: 7.3 | Wrong answer: 1 | Response time: avg 1.77s, max 1.93s, total 3.53s
Domain specific: 5.3 | Wrong answer: 2 | Response time: avg 1.66s, max 2.16s, total 4.99s
General Intelligence: 10.0 | No failed answers | Response time: avg 3.48s, max 3.48s, total 3.48s
Instructions following: 9.9 | No failed answers | Response time: avg 1.37s, max 1.40s, total 2.73s
Puzzle Solving: 7.7 | Extra formatting: 1 | Response time: avg 2.74s, max 3.46s, total 8.22s
Tool Calling: 10.0 | No failed answers | Response time: avg 5.35s, max 5.35s, total 5.35s
Trivia: 3.0 | No answer: 1 | Response time: avg 3.41s, max 3.41s, total 3.41s
Anti-AI Tricks: 8.7 | Wrong answer: 1 | Response time: avg 6.30s, max 9.80s, total 25.20s
Coding: 10.0 | No failed answers | Response time: avg 21.41s, max 21.41s, total 21.41s
Combined: 3.0 | API error: 1 | Response time: avg 0ms, max 0ms, total 0ms
General Intelligence: 4.3 | Wrong answer: 1 | Response time: avg 12.47s, max 12.47s, total 12.47s
Instructions following: 9.8 | No failed answers | Response time: avg 7.36s, max 11.05s, total 14.73s
Tool Calling: 3.0 | API error: 1 | Response time: avg 0ms, max 0ms, total 0ms
Trivia: 3.0 | Wrong answer: 1 | Response time: avg 36.09s, max 36.09s, total 36.09s
A test is fully passed only if every run passed for that test. Wrong answer: 1. Response time: avg 16.50s, max 117.26s, total 330.06s…
Anti-AI Tricks: 10.0. No failed answers. Response time: avg 3.88s, max 5.73s, total 15.53s.
Coding: 7.9. Wrong answer: 1. Response time: avg 95.96s, max 117.26s, total 191.92s.
Combined: 10.0. No failed answers. Response time: avg 22.42s, max 22.42s, total 22.42s.
Data parsing and extraction: 10.0. No failed answers. Response time: avg 5.43s, max 6.18s, total 10.86s.
Domain specific: 10.0. No failed answers. Response time: avg 15.27s, max 34.09s, total 45.80s.
General Intelligence: 10.0. No failed answers. Response time: avg 5.19s, max 5.19s, total 5.19s.
Instructions following: 10.0. No failed answers. Response time: avg 4.04s, max 4.70s, total 8.08s.
Puzzle Solving: 10.0. No failed answers. Response time: avg 4.05s, max 5.64s, total 12.15s.
Tool Calling: 10.0. No failed answers. Response time: avg 12.60s, max 12.60s, total 12.60s.
Trivia: 10.0. No failed answers. Response time: avg 5.50s, max 5.50s, total 5.50s.
Anti-AI Tricks: 10.0. No failed answers. Response time: avg 8.83s, max 11.20s, total 35.31s.
Coding: 7.4. Extra formatting: 1. Response time: avg 55.26s, max 64.81s, total 110.53s.
Combined: 10.0. No failed answers. Response time: avg 63.99s, max 63.99s, total 63.99s.
Data parsing and extraction: 10.0. No failed answers. Response time: avg 18.97s, max 26.99s, total 37.93s.
Domain specific: 5.3. Wrong answer: 2. Response time: avg 181.74s, max 216.69s, total 545.21s.
Instructions following: 9.8. No failed answers. Response time: avg 18.58s, max 31.48s, total 37.15s.
Tool Calling: 10.0. No failed answers. Response time: avg 17.66s, max 17.66s, total 17.66s.
Trivia: 3.0. Wrong answer: 1. Response time: avg 44.47s, max 44.47s, total 44.47s.
A test is fully passed only if every run passed for that test. Wrong answer: 2; Timed out: 1. Response time: avg 4.48s, max 23.18s, total 85.21s…
Anti-AI Tricks: 8.3. Wrong answer: 1. Response time: avg 1.85s, max 2.71s, total 7.38s.
Coding: 10.0. No failed answers. Response time: avg 14.79s, max 23.18s, total 29.59s.
Combined: 10.0. No failed answers. Response time: avg 21.45s, max 21.45s, total 21.45s.
Data parsing and extraction: 10.0. No failed answers. Response time: avg 2.37s, max 3.30s, total 4.74s.
Domain specific: 7.7. Timed out: 1. Response time: avg 1.17s, max 1.40s, total 2.35s.
General Intelligence: 10.0. No failed answers. Response time: avg 2.87s, max 2.87s, total 2.87s.
Instructions following: 10.0. No failed answers. Response time: avg 1.57s, max 1.66s, total 3.14s.
Puzzle Solving: 10.0. No failed answers. Response time: avg 2.43s, max 2.89s, total 7.28s.
Tool Calling: 10.0. No failed answers. Response time: avg 4.17s, max 4.17s, total 4.17s.
Trivia: 3.0. Wrong answer: 1. Response time: avg 2.25s, max 2.25s, total 2.25s.
Anti-AI Tricks: 10.0. No failed answers. Response time: avg 2.75s, max 4.59s, total 10.98s.
Coding: 3.4. No answer: 1; Wrong answer: 1. Response time: avg 183.89s, max 299.23s, total 367.78s.
Combined: 10.0. No failed answers. Response time: avg 25.87s, max 25.87s, total 25.87s.
Data parsing and extraction: 10.0. No failed answers. Response time: avg 3.04s, max 4.12s, total 6.07s.
General Intelligence: 5.4. Wrong answer: 1. Response time: avg 3.61s, max 3.61s, total 3.61s.
Tool Calling: 10.0. No failed answers. Response time: avg 13.98s, max 13.98s, total 13.98s.
Trivia: 3.0. Wrong answer: 1. Response time: avg 234.19s, max 234.19s, total 234.19s.
A test is fully passed only if every run passed for that test. Wrong answer: 4; Did not follow instructions: 2. Response time: avg 15.95s, max 100.93s, total 319.08s…
Anti-AI Tricks: 8.7. Wrong answer: 1. Response time: avg 4.16s, max 6.68s, total 16.63s.
Coding: 10.0. No failed answers. Response time: avg 18.45s, max 27.96s, total 36.91s.
Combined: 10.0. No failed answers. Response time: avg 19.56s, max 19.56s, total 19.56s.
Data parsing and extraction: 10.0. No failed answers. Response time: avg 3.07s, max 3.59s, total 6.15s.
Domain specific: 5.9. Wrong answer: 2. Response time: avg 64.31s, max 100.93s, total 192.94s.
Instructions following: 10.0. No failed answers. Response time: avg 3.04s, max 3.44s, total 6.07s.
Puzzle Solving: 9.0. Did not follow instructions: 1. Response time: avg 5.05s, max 8.73s, total 15.15s.
Tool Calling: 10.0. No failed answers. Response time: avg 6.37s, max 6.37s, total 6.37s.
Trivia: 2.8. Wrong answer: 1. Response time: avg 14.43s, max 14.43s, total 14.43s.
A test is fully passed only if every run passed for that test. Wrong answer: 4; Extra formatting: 3. Response time: avg 42.39s, max 252.69s, total 847.76s…
Anti-AI Tricks: 8.3. Extra formatting: 1. Response time: avg 7.43s, max 10.89s, total 29.72s.
Coding: 7.0. Extra formatting: 1. Response time: avg 62.62s, max 94.25s, total 125.23s.
Combined: 10.0. No failed answers. Response time: avg 32.81s, max 32.81s, total 32.81s.
Data parsing and extraction: 10.0. No failed answers. Response time: avg 10.72s, max 12.13s, total 21.44s.
Domain specific: 5.3. Extra formatting: 1; Wrong answer: 1. Response time: avg 158.00s, max 252.69s, total 474.01s.
General Intelligence: 4.4. Wrong answer: 1. Response time: avg 18.41s, max 18.41s, total 18.41s.
Instructions following: 9.8. No failed answers. Response time: avg 12.36s, max 20.80s, total 24.73s.
Puzzle Solving: 7.7. Wrong answer: 1. Response time: avg 18.26s, max 44.40s, total 54.79s.
Tool Calling: 10.0. No failed answers. Response time: avg 13.12s, max 13.12s, total 13.12s.
Trivia: 3.0. Wrong answer: 1. Response time: avg 53.51s, max 53.51s, total 53.51s.
A test is fully passed only if every run passed for that test. Wrong answer: 3; Did not follow instructions: 1. Response time: avg 9.75s, max 31.36s, total 175.48s…
Anti-AI Tricks: 8.7. Wrong answer: 1. Response time: avg 3.16s, max 3.44s, total 12.65s.
Coding: 10.0. No failed answers. Response time: avg 31.36s, max 31.36s, total 31.36s.
Combined: 10.0. No failed answers. Response time: avg 20.93s, max 20.93s, total 20.93s.
Data parsing and extraction: 10.0. No failed answers. Response time: avg 4.01s, max 4.27s, total 8.02s.
Domain specific: 5.3. Wrong answer: 2. Response time: avg 21.33s, max 24.21s, total 64.00s.
General Intelligence: 10.0. No failed answers. Response time: avg 5.78s, max 5.78s, total 5.78s.
Instructions following: 9.8. No failed answers. Response time: avg 4.89s, max 5.89s, total 9.78s.
Puzzle Solving: 10.0. No failed answers. Response time: avg 3.52s, max 4.53s, total 10.57s.
Tool Calling: 3.0. Did not follow instructions: 1. Response time: avg 12.39s, max 12.39s, total 12.39s.
Coding: 6.8. No answer: 1. Response time: avg 118.23s, max 129.50s, total 236.47s.
Combined: 10.0. No failed answers. Response time: avg 40.96s, max 40.96s, total 40.96s.
Data parsing and extraction: 10.0. No failed answers. Response time: avg 20.38s, max 22.88s, total 40.76s.
Domain specific: 5.3. Timed out: 2. Response time: avg 202.38s, max 215.85s, total 404.76s.
General Intelligence: 10.0. No failed answers. Response time: avg 17.83s, max 17.83s, total 17.83s.
Instructions following: 10.0. No failed answers. Response time: avg 12.53s, max 19.15s, total 25.06s.
Tool Calling: 10.0. No failed answers. Response time: avg 8.92s, max 8.92s, total 8.92s.
Trivia: 3.0. Wrong answer: 1. Response time: avg 130.27s, max 130.27s, total 130.27s.
A test is fully passed only if every run passed for that test. Wrong answer: 3. Response time: avg 9.43s, max 56.19s, total 188.66s…
Anti-AI Tricks: 10.0. No failed answers. Response time: avg 4.41s, max 6.32s, total 17.64s.
Coding: 10.0. No failed answers. Response time: avg 14.42s, max 21.06s, total 28.85s.
Combined: 10.0. No failed answers. Response time: avg 9.56s, max 9.56s, total 9.56s.
Data parsing and extraction: 10.0. No failed answers. Response time: avg 3.28s, max 5.13s, total 6.56s.
Domain specific: 5.3. Wrong answer: 2. Response time: avg 28.05s, max 56.19s, total 84.16s.
General Intelligence: 10.0. No failed answers. Response time: avg 5.17s, max 5.17s, total 5.17s.
Instructions following: 9.9. No failed answers. Response time: avg 3.74s, max 3.99s, total 7.48s.
Puzzle Solving: 10.0. No failed answers. Response time: avg 4.74s, max 5.61s, total 14.21s.
Tool Calling: 10.0. No failed answers. Response time: avg 4.96s, max 4.96s, total 4.96s.
Trivia: 3.0. Wrong answer: 1. Response time: avg 10.06s, max 10.06s, total 10.06s.
A test is fully passed only if every run passed for that test. Wrong answer: 5. Response time: avg 58.43s, max 238.07s, total 1168.66s…
Anti-AI Tricks: 10.0. No failed answers. Response time: avg 22.13s, max 28.70s, total 88.50s.
Coding: 8.2. Wrong answer: 1. Response time: avg 177.97s, max 238.07s, total 355.94s.
Combined: 10.0. No failed answers. Response time: avg 121.49s, max 121.49s, total 121.49s.
Data parsing and extraction: 10.0. No failed answers. Response time: avg 41.15s, max 48.02s, total 82.30s.
Domain specific: 2.9. Wrong answer: 3. Response time: avg 95.91s, max 186.74s, total 287.73s.
General Intelligence: 10.0. No failed answers. Response time: avg 32.24s, max 32.24s, total 32.24s.
Instructions following: 10.0. No failed answers. Response time: avg 24.31s, max 27.94s, total 48.63s.
Puzzle Solving: 10.0. No failed answers. Response time: avg 24.32s, max 37.68s, total 72.96s.
Tool Calling: 10.0. No failed answers. Response time: avg 18.32s, max 18.32s, total 18.32s.
Trivia: 3.0. Wrong answer: 1. Response time: avg 60.56s, max 60.56s, total 60.56s.
A test is fully passed only if every run passed for that test. API error: 3; Wrong answer: 3. Response time: avg 9.05s, max 64.36s, total 153.86s…
Anti-AI Tricks: 10.0. No failed answers. Response time: avg 2.53s, max 3.43s, total 10.12s.
Coding: 8.2. Wrong answer: 1. Response time: avg 39.62s, max 64.36s, total 79.24s.
Combined: 3.0. API error: 1. Response time: avg 0ms, max 0ms, total 0ms.
Data parsing and extraction: 6.5. API error: 1. Response time: avg 8.10s, max 8.10s, total 8.10s.
Domain specific: 7.6. Wrong answer: 1. Response time: avg 10.64s, max 14.00s, total 31.92s.
General Intelligence: 10.0. No failed answers. Response time: avg 3.46s, max 3.46s, total 3.46s.
Instructions following: 9.8. No failed answers. Response time: avg 3.38s, max 3.40s, total 6.76s.
Puzzle Solving: 10.0. No failed answers. Response time: avg 3.13s, max 3.33s, total 9.39s.
Tool Calling: 3.0. API error: 1. Response time: avg 0ms, max 0ms, total 0ms.
Trivia: 2.8. Wrong answer: 1. Response time: avg 4.87s, max 4.87s, total 4.87s.