A test is fully passed only if every run passed for that test.
Wrong answer: 7 | Response time: avg 1.30s, max 3.92s, total 27.21s …
Anti-AI Tricks: 6.5 | Wrong answer: 2 | Response time: avg 1.08s, max 1.39s, total 4.30s
Coding: 5.5 | Wrong answer: 2 | Response time: avg 1.35s, max 1.63s, total 4.04s
Combined: 3.0 | Wrong answer: 1 | Response time: avg 2.17s, max 2.17s, total 2.17s
Data parsing and extraction: 10.0 | No failed answers | Response time: avg 1.35s, max 1.43s, total 2.69s
Domain specific: 7.7 | Wrong answer: 1 | Response time: avg 975ms, max 1.08s, total 2.92s
General Intelligence: 10.0 | No failed answers | Response time: avg 1.04s, max 1.04s, total 1.04s
Instructions following: 10.0 | No failed answers | Response time: avg 943ms, max 974ms, total 1.89s
Puzzle Solving: 10.0 | No failed answers | Response time: avg 1.13s, max 1.29s, total 3.40s
Tool Calling: 10.0 | No failed answers | Response time: avg 3.92s, max 3.92s, total 3.92s
Trivia: 3.0 | Wrong answer: 1 | Response time: avg 856ms, max 856ms, total 856ms

Wrong answer: 8 | Response time: avg 46.36s, max 189.38s, total 973.57s …
Anti-AI Tricks: 10.0 | No failed answers | Response time: avg 10.84s, max 15.11s, total 43.36s
Coding: 6.2 | Wrong answer: 2 | Response time: avg 125.25s, max 189.38s, total 375.76s
Combined: 10.0 | No failed answers | Response time: avg 92.41s, max 92.41s, total 92.41s
Data parsing and extraction: 10.0 | No failed answers | Response time: avg 38.32s, max 41.70s, total 76.63s
Domain specific: 2.9 | Wrong answer: 3 | Response time: avg 53.10s, max 90.70s, total 159.30s
General Intelligence: 4.9 | Wrong answer: 1 | Response time: avg 25.30s, max 25.30s, total 25.30s
Instructions following: 10.0 | No failed answers | Response time: avg 20.25s, max 21.65s, total 40.50s
Puzzle Solving: 8.2 | Wrong answer: 1 | Response time: avg 17.67s, max 24.83s, total 53.02s
Tool Calling: 10.0 | No failed answers | Response time: avg 14.72s, max 14.72s, total 14.72s
Trivia: 3.0 | Wrong answer: 1 | Response time: avg 92.57s, max 92.57s, total 92.57s

Wrong answer: 3, No answer: 2, Timed out: 2 | Response time: avg 63.41s, max 369.32s, total 1268.28s …
Anti-AI Tricks: 10.0 | No failed answers | Response time: avg 6.20s, max 9.64s, total 24.78s
Coding: 2.9 | No answer: 2, Timed out: 1 | Response time: avg 272.54s, max 369.32s, total 817.61s
Combined: 9.6 | No failed answers | Response time: avg 73.55s, max 73.55s, total 73.55s
Data parsing and extraction: 10.0 | No failed answers | Response time: avg 16.51s, max 20.57s, total 33.02s
Domain specific: 2.9 | Wrong answer: 2, Timed out: 1 | Response time: avg 23.62s, max 27.00s, total 47.23s
General Intelligence: 10.0 | No failed answers | Response time: avg 29.76s, max 29.76s, total 29.76s
Instructions following: 10.0 | No failed answers | Response time: avg 17.54s, max 21.25s, total 35.08s
Puzzle Solving: 10.0 | No failed answers | Response time: avg 5.79s, max 6.85s, total 17.36s
Tool Calling: 10.0 | No failed answers | Response time: avg 9.01s, max 9.01s, total 9.01s
Trivia: 3.0 | Wrong answer: 1 | Response time: avg 180.87s, max 180.87s, total 180.87s

Anti-AI Tricks: 10.0 | No failed answers | Response time: avg 8.83s, max 11.20s, total 35.31s
Coding: 5.9 | Extra formatting: 1, Wrong answer: 1 | Response time: avg 41.23s, max 64.81s, total 123.69s
Combined: 10.0 | No failed answers | Response time: avg 63.99s, max 63.99s, total 63.99s
Data parsing and extraction: 10.0 | No failed answers | Response time: avg 18.97s, max 26.99s, total 37.93s
Domain specific: 5.3 | Wrong answer: 2 | Response time: avg 181.74s, max 216.69s, total 545.21s
Instructions following: 9.8 | No failed answers | Response time: avg 18.58s, max 31.48s, total 37.15s
Tool Calling: 10.0 | No failed answers | Response time: avg 17.66s, max 17.66s, total 17.66s
Trivia: 3.0 | Wrong answer: 1 | Response time: avg 44.47s, max 44.47s, total 44.47s

Wrong answer: 8, Did not follow instructions: 1 | Response time: avg 19.25s, max 122.87s, total 404.20s …
Anti-AI Tricks: 10.0 | No failed answers | Response time: avg 6.10s, max 9.60s, total 24.39s
Coding: 5.0 | Wrong answer: 3 | Response time: avg 42.85s, max 78.01s, total 128.55s
Combined: 10.0 | No failed answers | Response time: avg 20.28s, max 20.28s, total 20.28s
Data parsing and extraction: 10.0 | No failed answers | Response time: avg 9.65s, max 10.35s, total 19.31s
Domain specific: 3.5 | Wrong answer: 3 | Response time: avg 14.65s, max 26.85s, total 43.95s
General Intelligence: 4.8 | Wrong answer: 1 | Response time: avg 9.88s, max 9.88s, total 9.88s
Instructions following: 10.0 | No failed answers | Response time: avg 6.05s, max 6.94s, total 12.10s
Puzzle Solving: 8.2 | Did not follow instructions: 1 | Response time: avg 6.29s, max 8.18s, total 18.87s
Tool Calling: 10.0 | No failed answers | Response time: avg 4.00s, max 4.00s, total 4.00s
Trivia: 3.0 | Wrong answer: 1 | Response time: avg 122.87s, max 122.87s, total 122.87s

Wrong answer: 7, Did not follow instructions: 1 | Response time: avg 3.96s, max 14.93s, total 83.06s …
Anti-AI Tricks: 9.1 | Did not follow instructions: 1 | Response time: avg 2.33s, max 3.89s, total 9.30s
Coding: 5.5 | Wrong answer: 2 | Response time: avg 4.09s, max 4.34s, total 12.27s
Combined: 10.0 | No failed answers | Response time: avg 14.93s, max 14.93s, total 14.93s
Data parsing and extraction: 10.0 | No failed answers | Response time: avg 2.29s, max 2.31s, total 4.59s
Domain specific: 3.0 | Wrong answer: 3 | Response time: avg 4.21s, max 5.86s, total 12.62s
General Intelligence: 10.0 | No failed answers | Response time: avg 3.16s, max 3.16s, total 3.16s
Instructions following: 10.0 | No failed answers | Response time: avg 1.91s, max 1.93s, total 3.82s
Puzzle Solving: 7.7 | Wrong answer: 1 | Response time: avg 5.30s, max 9.55s, total 15.89s
Tool Calling: 10.0 | No failed answers | Response time: avg 3.80s, max 3.80s, total 3.80s
Trivia: 3.0 | Wrong answer: 1 | Response time: avg 2.68s, max 2.68s, total 2.68s

Coding: 10.0 | No failed answers | Response time: avg 22.73s, max 31.19s, total 68.20s
Combined: 10.0 | No failed answers | Response time: avg 14.06s, max 14.06s, total 14.06s
Data parsing and extraction: 10.0 | No failed answers | Response time: avg 3.15s, max 3.15s, total 3.15s
Domain specific: 5.9 | Timed out: 1, Wrong answer: 1 | Response time: avg 77.80s, max 77.80s, total 77.80s
Instructions following: 9.9 | No failed answers | Response time: avg 3.12s, max 3.12s, total 3.12s
Puzzle Solving: 7.5 | Did not follow instructions: 1 | Response time: avg 5.80s, max 6.45s, total 11.61s
Tool Calling: 4.7 | No answer: 1 | Response time: avg 10.30s, max 10.30s, total 10.30s
Trivia: 3.0 | Wrong answer: 1 | Response time: avg 28.18s, max 28.18s, total 28.18s

Anti-AI Tricks: 10.0 | No failed answers | Response time: avg 3.26s, max 6.38s, total 13.06s
Coding: 6.2 | API error: 1, Extra formatting: 1 | Response time: avg 92.07s, max 130.77s, total 276.20s
Combined: 10.0 | No failed answers | Response time: avg 53.36s, max 53.36s, total 53.36s
Data parsing and extraction: 7.3 | Wrong answer: 1 | Response time: avg 18.81s, max 20.29s, total 37.61s
Domain specific: 5.3 | Extra formatting: 2 | Response time: avg 37.87s, max 84.22s, total 113.60s
Instructions following: 9.9 | No failed answers | Response time: avg 2.77s, max 3.21s, total 5.54s
Tool Calling: 10.0 | No failed answers | Response time: avg 16.87s, max 16.87s, total 16.87s
Trivia: 3.0 | Wrong answer: 1 | Response time: avg 12.46s, max 12.46s, total 12.46s

Wrong answer: 7, Did not follow instructions: 1 | Response time: avg 3.23s, max 10.87s, total 67.80s …
Anti-AI Tricks: 9.1 | Did not follow instructions: 1 | Response time: avg 2.39s, max 3.58s, total 9.57s
Coding: 5.5 | Wrong answer: 2 | Response time: avg 3.81s, max 4.25s, total 11.44s
Combined: 10.0 | No failed answers | Response time: avg 10.87s, max 10.87s, total 10.87s
Data parsing and extraction: 10.0 | No failed answers | Response time: avg 2.60s, max 2.69s, total 5.19s
Domain specific: 2.9 | Wrong answer: 3 | Response time: avg 3.16s, max 3.89s, total 9.49s
General Intelligence: 10.0 | No failed answers | Response time: avg 2.60s, max 2.60s, total 2.60s
Instructions following: 9.9 | No failed answers | Response time: avg 2.59s, max 3.04s, total 5.17s
Puzzle Solving: 7.6 | Wrong answer: 1 | Response time: avg 1.95s, max 2.48s, total 5.84s
Tool Calling: 10.0 | No failed answers | Response time: avg 4.55s, max 4.55s, total 4.55s
Trivia: 3.0 | Wrong answer: 1 | Response time: avg 3.08s, max 3.08s, total 3.08s

Wrong answer: 6, Did not follow instructions: 3 | Response time: avg 22.34s, max 138.75s, total 469.20s …
Anti-AI Tricks: 8.6 | Wrong answer: 1 | Response time: avg 4.05s, max 6.69s, total 16.20s
Coding: 8.4 | Wrong answer: 1 | Response time: avg 57.87s, max 138.75s, total 173.61s
Combined: 10.0 | No failed answers | Response time: avg 17.81s, max 17.81s, total 17.81s
Data parsing and extraction: 10.0 | No failed answers | Response time: avg 2.43s, max 3.39s, total 4.87s
Domain specific: 4.1 | Wrong answer: 3 | Response time: avg 65.31s, max 102.91s, total 195.92s
Instructions following: 9.8 | No failed answers | Response time: avg 2.13s, max 2.45s, total 4.25s
Puzzle Solving: 7.8 | Did not follow instructions: 1 | Response time: avg 4.37s, max 7.27s, total 13.11s
Tool Calling: 4.7 | Did not follow instructions: 1 | Response time: avg 9.62s, max 9.62s, total 9.62s
Trivia: 3.0 | Wrong answer: 1 | Response time: avg 30.10s, max 30.10s, total 30.10s

Anti-AI Tricks: 10.0 | No failed answers | Response time: avg 6.02s, max 8.79s, total 24.07s
Coding: 7.7 | No answer: 1 | Response time: avg 50.55s, max 86.11s, total 151.65s
Combined: 3.0 | API error: 1 | Response time: avg 0ms, max 0ms, total 0ms
Data parsing and extraction: 10.0 | No failed answers | Response time: avg 12.99s, max 13.75s, total 25.99s
Domain specific: 5.3 | Wrong answer: 2 | Response time: avg 22.50s, max 45.02s, total 67.51s
Instructions following: 10.0 | No failed answers | Response time: avg 7.50s, max 10.22s, total 15.00s
Puzzle Solving: 8.0 | Wrong answer: 1 | Response time: avg 5.95s, max 8.42s, total 17.84s
Tool Calling: 3.0 | API error: 1 | Response time: avg 0ms, max 0ms, total 0ms
Trivia: 3.0 | Wrong answer: 1 | Response time: avg 32.90s, max 32.90s, total 32.90s

Wrong answer: 5, Extra formatting: 3 | Response time: avg 49.90s, max 252.69s, total 1047.92s …
Anti-AI Tricks: 8.3 | Extra formatting: 1 | Response time: avg 7.43s, max 10.89s, total 29.72s
Coding: 5.7 | Extra formatting: 1, Wrong answer: 1 | Response time: avg 108.46s, max 200.16s, total 325.39s
Combined: 10.0 | No failed answers | Response time: avg 32.81s, max 32.81s, total 32.81s
Data parsing and extraction: 10.0 | No failed answers | Response time: avg 10.72s, max 12.13s, total 21.44s
Domain specific: 5.3 | Extra formatting: 1, Wrong answer: 1 | Response time: avg 158.00s, max 252.69s, total 474.01s
General Intelligence: 4.4 | Wrong answer: 1 | Response time: avg 18.41s, max 18.41s, total 18.41s
Instructions following: 9.8 | No failed answers | Response time: avg 12.36s, max 20.80s, total 24.73s
Puzzle Solving: 7.7 | Wrong answer: 1 | Response time: avg 18.26s, max 44.40s, total 54.79s
Tool Calling: 10.0 | No failed answers | Response time: avg 13.12s, max 13.12s, total 13.12s
Trivia: 3.0 | Wrong answer: 1 | Response time: avg 53.51s, max 53.51s, total 53.51s

Wrong answer: 8 | Response time: avg 1.65s, max 3.56s, total 23.07s …
Anti-AI Tricks: 8.3 | Wrong answer: 1 | Response time: avg 1.25s, max 1.59s, total 2.49s
Coding: 5.5 | Wrong answer: 2 | Response time: avg 1.80s, max 2.79s, total 5.40s
Combined: 4.7 | Wrong answer: 1 | Response time: avg 3.56s, max 3.56s, total 3.56s
Data parsing and extraction: 10.0 | No failed answers | Response time: avg 1.41s, max 1.41s, total 1.41s
Domain specific: 7.7 | Wrong answer: 1 | Response time: avg 963ms, max 963ms, total 963ms
General Intelligence: 10.0 | No failed answers | Response time: avg 1.13s, max 1.13s, total 1.13s
Instructions following: 6.4 | Wrong answer: 1 | Response time: avg 1.58s, max 1.58s, total 1.58s
Puzzle Solving: 7.7 | Wrong answer: 1 | Response time: avg 1.05s, max 1.06s, total 2.11s
Tool Calling: 10.0 | No failed answers | Response time: avg 3.35s, max 3.35s, total 3.35s
Trivia: 3.0 | Wrong answer: 1 | Response time: avg 1.07s, max 1.07s, total 1.07s

Anti-AI Tricks: 10.0 | No failed answers | Response time: avg 59.11s, max 168.31s, total 236.44s
Coding: 3.7 | Wrong answer: 2, Timed out: 1 | Response time: avg 58.87s, max 68.14s, total 176.60s
Combined: 10.0 | No failed answers | Response time: avg 17.78s, max 17.78s, total 17.78s
Data parsing and extraction: 7.3 | API error: 1 | Response time: avg 56.99s, max 80.14s, total 113.98s
Domain specific: 5.3 | Timed out: 1, Wrong answer: 1 | Response time: avg 146.50s, max 234.29s, total 439.49s
Instructions following: 10.0 | No failed answers | Response time: avg 63.49s, max 111.61s, total 126.98s
Puzzle Solving: 8.2 | Timed out: 1 | Response time: avg 27.61s, max 31.84s, total 55.21s
Tool Calling: 10.0 | No failed answers | Response time: avg 10.33s, max 10.33s, total 10.33s
Trivia: 3.0 | Wrong answer: 1 | Response time: avg 48.98s, max 48.98s, total 48.98s

Wrong answer: 7, Did not follow instructions: 1 | Response time: avg 2.77s, max 11.91s, total 58.12s …
Anti-AI Tricks: 8.3 | Wrong answer: 1 | Response time: avg 2.12s, max 3.18s, total 8.50s
Coding: 5.5 | Wrong answer: 2 | Response time: avg 1.39s, max 2.20s, total 4.16s
Combined: 3.0 | Wrong answer: 1 | Response time: avg 11.91s, max 11.91s, total 11.91s
Data parsing and extraction: 10.0 | No failed answers | Response time: avg 3.00s, max 3.74s, total 5.99s
Domain specific: 5.3 | Wrong answer: 2 | Response time: avg 2.36s, max 3.51s, total 7.07s
Instructions following: 10.0 | No failed answers | Response time: avg 1.49s, max 1.66s, total 2.99s
Puzzle Solving: 10.0 | No failed answers | Response time: avg 1.69s, max 1.89s, total 5.08s
Tool Calling: 10.0 | No failed answers | Response time: avg 9.54s, max 9.54s, total 9.54s
Trivia: 3.0 | Wrong answer: 1 | Response time: avg 1.35s, max 1.35s, total 1.35s

Wrong answer: 4, Extra formatting: 3, Timed out: 1 | Response time: avg 17.06s, max 46.35s, total 221.83s …
Coding: 5.7 | Extra formatting: 1, Wrong answer: 1 | Response time: avg 33.29s, max 35.76s, total 99.86s
Combined: 10.0 | No failed answers | Response time: avg 46.35s, max 46.35s, total 46.35s
Data parsing and extraction: 10.0 | No failed answers | Response time: avg 13.90s, max 13.90s, total 13.90s
General Intelligence: 10.0 | No failed answers | Response time: avg 4.94s, max 4.94s, total 4.94s
Instructions following: 10.0 | No failed answers | Response time: avg 2.61s, max 2.61s, total 2.61s
Puzzle Solving: 10.0 | No failed answers | Response time: avg 5.31s, max 6.24s, total 10.62s
Tool Calling: 10.0 | No failed answers | Response time: avg 7.48s, max 7.48s, total 7.48s
Trivia: 3.0 | Wrong answer: 1 | Response time: avg 30.09s, max 30.09s, total 30.09s

Anti-AI Tricks: 8.7 | Wrong answer: 1 | Response time: avg 37.16s, max 140.53s, total 148.65s
Coding: 10.0 | No failed answers | Response time: avg 137.63s, max 137.63s, total 137.63s
Combined: 10.0 | No failed answers | Response time: avg 149.23s, max 149.23s, total 149.23s
Data parsing and extraction: 10.0 | No failed answers | Response time: avg 4.49s, max 4.96s, total 8.98s
Domain specific: 3.6 | Wrong answer: 3 | Response time: avg 139.90s, max 141.40s, total 419.69s
Instructions following: 7.3 | No answer: 1 | Response time: avg 23.26s, max 43.87s, total 46.51s
Puzzle Solving: 5.7 | Did not follow instructions: 2 | Response time: avg 50.83s, max 144.85s, total 152.49s
Tool Calling: 10.0 | No failed answers | Response time: avg 6.44s, max 6.44s, total 6.44s

Coding: 10.0 | No failed answers | Response time: avg 27.63s, max 38.31s, total 82.89s
Combined: 10.0 | No failed answers | Response time: avg 88.15s, max 88.15s, total 88.15s
Data parsing and extraction: 10.0 | No failed answers | Response time: avg 12.58s, max 13.87s, total 25.16s
Domain specific: 3.6 | Wrong answer: 2, Timed out: 1 | Response time: avg 44.63s, max 82.55s, total 133.89s
Instructions following: 10.0 | No failed answers | Response time: avg 11.59s, max 13.66s, total 23.18s
Tool Calling: 10.0 | No failed answers | Response time: avg 18.64s, max 18.64s, total 18.64s
Trivia: 3.0 | Wrong answer: 1 | Response time: avg 9.99s, max 9.99s, total 9.99s

Anti-AI Tricks: 10.0 | No failed answers | Response time: avg 4.14s, max 12.41s, total 16.57s
Coding: 6.2 | Wrong answer: 2 | Response time: avg 97.14s, max 162.44s, total 291.41s
Combined: 10.0 | No failed answers | Response time: avg 16.86s, max 16.86s, total 16.86s
Domain specific: 5.3 | Extra formatting: 1, Wrong answer: 1 | Response time: avg 34.53s, max 86.93s, total 103.59s
Instructions following: 9.9 | No failed answers | Response time: avg 1.80s, max 1.81s, total 3.60s
Puzzle Solving: 8.2 | No answer: 1 | Response time: avg 20.25s, max 57.93s, total 60.76s
Tool Calling: 10.0 | No failed answers | Response time: avg 7.29s, max 7.29s, total 7.29s
Trivia: 3.0 | Wrong answer: 1 | Response time: avg 51.29s, max 51.29s, total 51.29s
Wrong answer: 8, No answer: 1; Response Time: avg 15.74s, max 124.75s, total 330.63s…
Anti-AI Tricks
: 8.7; Wrong answer: 1; Response Time: avg 4.02s, max 12.52s, total 16.10s
Coding
: 8.2; Wrong answer: 1; Response Time: avg 9.46s, max 12.69s, total 28.38s
Combined
: 10.0; No failed answers; Response Time: avg 7.98s, max 7.98s, total 7.98s
Data parsing and extraction
: 7.3; Wrong answer: 1; Response Time: avg 2.29s, max 3.15s, total 4.58s
Domain specific
: 5.3; Wrong answer: 2; Response Time: avg 43.31s, max 72.27s, total 129.92s
General Intelligence
: 3.4; Wrong answer: 1; Response Time: avg 7.00s, max 7.00s, total 7.00s
Instructions following
: 9.8; No failed answers; Response Time: avg 1.58s, max 1.80s, total 3.16s
Puzzle Solving
: 5.5; Wrong answer: 2; Response Time: avg 1.84s, max 3.42s, total 5.52s
Tool Calling
: 10.0; No failed answers; Response Time: avg 3.25s, max 3.25s, total 3.25s
Trivia
: 3.0; No answer: 1; Response Time: avg 124.75s, max 124.75s, total 124.75s
Wrong answer: 7, Did not follow instructions: 2; Response Time: avg 1.21s, max 3.39s, total 25.45s…
Coding
: 5.5; Wrong answer: 2; Response Time: avg 967ms, max 1.47s, total 2.90s
Combined
: 3.0; Wrong answer: 1; Response Time: avg 3.20s, max 3.20s, total 3.20s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 1.22s, max 1.33s, total 2.44s
Domain specific
: 5.3; Wrong answer: 2; Response Time: avg 942ms, max 1.12s, total 2.83s
Instructions following
: 10.0; No failed answers; Response Time: avg 1.13s, max 1.14s, total 2.27s
Puzzle Solving
: 10.0; No failed answers; Response Time: avg 900ms, max 962ms, total 2.70s
Tool Calling
: 10.0; No failed answers; Response Time: avg 3.39s, max 3.39s, total 3.39s
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 814ms, max 814ms, total 814ms
Coding
: 6.0; Wrong answer: 2; Response Time: avg 63.38s, max 95.88s, total 190.15s
Combined
: 6.9; Invalid tool call: 1; Response Time: avg 15.06s, max 15.06s, total 15.06s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 9.60s, max 9.92s, total 19.19s
Domain specific
: 5.3; Wrong answer: 2; Response Time: avg 38.15s, max 67.08s, total 114.45s
General Intelligence
: 10.0; No failed answers; Response Time: avg 11.09s, max 11.09s, total 11.09s
Instructions following
: 9.9; No failed answers; Response Time: avg 3.74s, max 5.23s, total 7.47s
Puzzle Solving
: 7.7; Wrong answer: 1; Response Time: avg 10.24s, max 16.95s, total 30.72s
Tool Calling
: 7.0; Invalid tool call: 1; Response Time: avg 12.53s, max 12.53s, total 12.53s
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 40.96s, max 40.96s, total 40.96s
Coding
: 5.7; No answer: 1, Timed out: 1; Response Time: avg 214.42s, max 406.78s, total 643.25s
Combined
: 10.0; No failed answers; Response Time: avg 40.96s, max 40.96s, total 40.96s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 20.38s, max 22.88s, total 40.76s
Domain specific
: 5.3; Timed out: 2; Response Time: avg 202.38s, max 215.85s, total 404.76s
General Intelligence
: 10.0; No failed answers; Response Time: avg 17.83s, max 17.83s, total 17.83s
Instructions following
: 10.0; No failed answers; Response Time: avg 12.53s, max 19.15s, total 25.06s
Tool Calling
: 10.0; No failed answers; Response Time: avg 8.92s, max 8.92s, total 8.92s
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 130.27s, max 130.27s, total 130.27s
Wrong answer: 9; Response Time: avg 1.89s, max 5.66s, total 39.62s…
Anti-AI Tricks
: 7.3; Wrong answer: 2; Response Time: avg 1.84s, max 3.08s, total 7.37s
Coding
: 5.5; Wrong answer: 2; Response Time: avg 1.53s, max 1.97s, total 4.58s
Combined
: 3.0; Wrong answer: 1; Response Time: avg 4.48s, max 4.48s, total 4.48s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 1.44s, max 1.51s, total 2.89s
Domain specific
: 5.3; Wrong answer: 2; Response Time: avg 1.52s, max 1.63s, total 4.57s
General Intelligence
: 4.0; Wrong answer: 1; Response Time: avg 1.37s, max 1.37s, total 1.37s
Instructions following
: 10.0; No failed answers; Response Time: avg 1.52s, max 1.68s, total 3.04s
Puzzle Solving
: 10.0; No failed answers; Response Time: avg 1.40s, max 1.41s, total 4.20s
Tool Calling
: 10.0; No failed answers; Response Time: avg 5.66s, max 5.66s, total 5.66s
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 1.46s, max 1.46s, total 1.46s
Anti-AI Tricks
: 10.0; No failed answers; Response Time: avg 40.57s, max 110.43s, total 121.72s
Coding
: 3.5; No answer: 1, Timed out: 1; Response Time: avg 258.38s, max 453.94s, total 516.77s
Combined
: 10.0; No failed answers; Response Time: avg 29.57s, max 29.57s, total 29.57s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 15.01s, max 15.01s, total 15.01s
Domain specific
: 5.3; Wrong answer: 2; Response Time: avg 170.45s, max 170.45s, total 170.45s
Tool Calling
: 10.0; No failed answers; Response Time: avg 11.91s, max 11.91s, total 11.91s
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 108.45s, max 108.45s, total 108.45s
Wrong answer: 7, Did not follow instructions: 2; Response Time: avg 6.34s, max 18.33s, total 133.13s…
Coding
: 5.6; Wrong answer: 2; Response Time: avg 10.52s, max 11.72s, total 31.55s
Combined
: 10.0; No failed answers; Response Time: avg 11.96s, max 11.96s, total 11.96s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 2.21s, max 2.52s, total 4.42s
Domain specific
: 3.5; Wrong answer: 3; Response Time: avg 13.01s, max 18.33s, total 39.04s
Instructions following
: 9.8; No failed answers; Response Time: avg 3.51s, max 4.60s, total 7.01s
Puzzle Solving
: 10.0; No failed answers; Response Time: avg 2.99s, max 3.16s, total 8.97s
Tool Calling
: 10.0; No failed answers; Response Time: avg 8.36s, max 8.36s, total 8.36s
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 4.38s, max 4.38s, total 4.38s
Anti-AI Tricks
: 8.1; Extra formatting: 1; Response Time: avg 15.85s, max 20.83s, total 47.55s
Coding
: 6.0; Timed out: 1, Wrong answer: 1; Response Time: avg 10.71s, max 17.72s, total 32.13s
Combined
: 9.8; No failed answers; Response Time: avg 75.68s, max 75.68s, total 75.68s
Data parsing and extraction
: 6.5; API error: 1; Response Time: avg 0ms, max 0ms, total 0ms
Domain specific
: 5.9; Wrong answer: 2; Response Time: avg 96.01s, max 96.01s, total 96.01s
Instructions following
: 10.0; No failed answers; Response Time: avg 4.28s, max 7.37s, total 8.55s
Puzzle Solving
: 7.7; Wrong answer: 1; Response Time: avg 3.87s, max 5.26s, total 7.74s
Tool Calling
: 10.0; No failed answers; Response Time: avg 27.78s, max 27.78s, total 27.78s
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 1.96s, max 1.96s, total 1.96s
Anti-AI Tricks
: 8.2; Wrong answer: 1; Response Time: avg 3.95s, max 5.68s, total 15.80s
Coding
: 6.3; Wrong answer: 2; Response Time: avg 109.93s, max 199.66s, total 329.79s
Combined
: 10.0; No failed answers; Response Time: avg 17.40s, max 17.40s, total 17.40s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 4.17s, max 5.02s, total 8.34s
Instructions following
: 9.8; No failed answers; Response Time: avg 4.26s, max 4.46s, total 8.52s
Puzzle Solving
: 7.7; Wrong answer: 1; Response Time: avg 6.22s, max 11.63s, total 18.66s
Tool Calling
: 3.0; Did not follow instructions: 1; Response Time: avg 13.68s, max 13.68s, total 13.68s
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 63.48s, max 63.48s, total 63.48s
Anti-AI Tricks
: 10.0; No failed answers; Response Time: avg 21.13s, max 34.96s, total 84.53s
Coding
: 5.9; No answer: 1, Timed out: 1; Response Time: avg 206.65s, max 409.98s, total 619.94s
Combined
: 4.7; No answer: 1; Response Time: avg 75.34s, max 75.34s, total 75.34s
Data parsing and extraction
: 7.3; API error: 1; Response Time: avg 59.33s, max 97.12s, total 118.65s
Domain specific
: 4.1; Timed out: 2, Wrong answer: 1; Response Time: avg 88.34s, max 106.00s, total 265.01s
General Intelligence
: 2.8; Timed out: 1; Response Time: avg 30.30s, max 30.30s, total 30.30s
Instructions following
: 10.0; No failed answers; Response Time: avg 24.45s, max 43.36s, total 48.89s
Puzzle Solving
: 8.2; Timed out: 1; Response Time: avg 33.13s, max 64.81s, total 99.38s
Tool Calling
: 10.0; No failed answers; Response Time: avg 4.65s, max 4.65s, total 4.65s
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 177.35s, max 177.35s, total 177.35s
Anti-AI Tricks
: 6.5; Extra formatting: 2; Response Time: avg 3.40s, max 6.36s, total 13.58s
Combined
: 9.5; No failed answers; Response Time: avg 17.73s, max 17.73s, total 17.73s
Data parsing and extraction
: 7.3; Wrong answer: 1; Response Time: avg 1.77s, max 1.93s, total 3.53s
Domain specific
: 5.3; Wrong answer: 2; Response Time: avg 1.66s, max 2.16s, total 4.99s
General Intelligence
: 10.0; No failed answers; Response Time: avg 3.48s, max 3.48s, total 3.48s
Instructions following
: 9.9; No failed answers; Response Time: avg 1.37s, max 1.40s, total 2.73s
Puzzle Solving
: 7.7; Extra formatting: 1; Response Time: avg 2.74s, max 3.46s, total 8.22s
Tool Calling
: 10.0; No failed answers; Response Time: avg 5.35s, max 5.35s, total 5.35s
Trivia
: 3.0; No answer: 1; Response Time: avg 3.41s, max 3.41s, total 3.41s