Anti-AI Tricks
: 3.0. A test is fully passed only if every run passed for that test. Wrong answer: 4. Response time: avg 20.18s, max 26.54s, total 80.73s.
Coding
: 4.8. Wrong answer: 2. Response time: avg 24.47s, max 24.90s, total 48.94s.
Combined
: 4.5. Invalid tool call: 1. Response time: avg 111.96s, max 111.96s, total 111.96s.
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 23.79s, max 23.85s, total 47.57s.
Domain specific
: 5.3. Wrong answer: 2. Response time: avg 19.73s, max 27.66s, total 59.18s.
General Intelligence
: 4.2. Wrong answer: 1. Response time: avg 23.74s, max 23.74s, total 23.74s.
Instructions following
: 6.5. Extra formatting: 1. Response time: avg 17.54s, max 18.51s, total 35.08s.
Tool Calling
: 10.0. No failed answers. Response time: avg 77.93s, max 77.93s, total 77.93s.
Trivia
: 3.0. Wrong answer: 1. Response time: avg 3.07s, max 3.07s, total 3.07s.
Anti-AI Tricks
: 8.7. Wrong answer: 1. Response time: avg 6.30s, max 9.80s, total 25.20s.
Coding
: 10.0. No failed answers. Response time: avg 21.41s, max 21.41s, total 21.41s.
Combined
: 3.0. API error: 1. Response time: avg 0ms, max 0ms, total 0ms.
General Intelligence
: 4.3. Wrong answer: 1. Response time: avg 12.47s, max 12.47s, total 12.47s.
Instructions following
: 9.8. No failed answers. Response time: avg 7.36s, max 11.05s, total 14.73s.
Tool Calling
: 3.0. API error: 1. Response time: avg 0ms, max 0ms, total 0ms.
Trivia
: 3.0. Wrong answer: 1. Response time: avg 36.09s, max 36.09s, total 36.09s.
Coding
: 6.7. No answer: 1. Response time: avg 54.73s, max 91.27s, total 109.46s.
Combined
: 4.7. Invalid tool call: 1. Response time: avg 41.03s, max 41.03s, total 41.03s.
Data parsing and extraction
: 6.3. Wrong answer: 1. Response time: avg 21.95s, max 24.88s, total 43.89s.
Domain specific
: 3.0. Timed out: 2; Wrong answer: 1. Response time: avg 19.00s, max 21.63s, total 38.01s.
Tool Calling
: 4.7. Did not follow instructions: 1. Response time: avg 12.05s, max 12.05s, total 12.05s.
Trivia
: 3.0. Wrong answer: 1. Response time: avg 22.77s, max 22.77s, total 22.77s.
Anti-AI Tricks
: 10.0. No failed answers. Response time: avg 23.66s, max 25.06s, total 47.32s.
Coding
: 10.0. No failed answers. Response time: avg 89.47s, max 99.85s, total 178.93s.
Combined
: 10.0. No failed answers. Response time: avg 28.96s, max 28.96s, total 28.96s.
Data parsing and extraction
: 7.1. No answer: 1. Response time: avg 8.90s, max 8.90s, total 8.90s.
Domain specific
: 3.5. Wrong answer: 2; Timed out: 1. Response time: avg 0ms, max 0ms, total 0ms.
Instructions following
: 10.0. No failed answers. Response time: avg 7.25s, max 7.25s, total 7.25s.
Puzzle Solving
: 10.0. No failed answers. Response time: avg 11.33s, max 16.34s, total 22.66s.
Tool Calling
: 10.0. No failed answers. Response time: avg 15.93s, max 15.93s, total 15.93s.
Trivia
: 3.0. Wrong answer: 1. Response time: avg 67.37s, max 67.37s, total 67.37s.
Anti-AI Tricks
: 10.0. No failed answers. Response time: avg 8.31s, max 14.20s, total 33.24s.
Coding
: 4.6. No answer: 1; Timed out: 1. Response time: avg 145.56s, max 172.60s, total 291.12s.
Combined
: 9.5. No failed answers. Response time: avg 43.11s, max 43.11s, total 43.11s.
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 9.33s, max 9.40s, total 18.66s.
Domain specific
: 5.3. Timed out: 1; Wrong answer: 1. Response time: avg 29.77s, max 32.22s, total 89.30s.
General Intelligence
: 10.0. No failed answers. Response time: avg 20.95s, max 20.95s, total 20.95s.
Instructions following
: 6.4. Wrong answer: 1. Response time: avg 7.47s, max 10.16s, total 14.94s.
Puzzle Solving
: 8.2. Wrong answer: 1. Response time: avg 31.64s, max 46.04s, total 94.91s.
Tool Calling
: 3.0. API error: 1. Response time: avg 0ms, max 0ms, total 0ms.
Trivia
: 3.0. Wrong answer: 1. Response time: avg 29.40s, max 29.40s, total 29.40s.
Coding
: 3.4. No answer: 1; Timed out: 1. Response time: avg 55.33s, max 89.40s, total 110.66s.
Combined
: 2.8. Invalid tool call: 1. Response time: avg 65.57s, max 65.57s, total 65.57s.
Data parsing and extraction
: 6.3. No answer: 1. Response time: avg 1.51s, max 1.51s, total 1.51s.
Domain specific
: 3.5. Wrong answer: 2; No answer: 1. Response time: avg 174.55s, max 174.55s, total 174.55s.
General Intelligence
: 3.6. Wrong answer: 1. Response time: avg 18.14s, max 18.14s, total 18.14s.
Instructions following
: 6.2. Wrong answer: 1. Response time: avg 2.97s, max 2.97s, total 2.97s.
Tool Calling
: 10.0. No failed answers. Response time: avg 15.95s, max 15.95s, total 15.95s.
Trivia
: 3.0. Wrong answer: 1. Response time: avg 11.13s, max 11.13s, total 11.13s.
Anti-AI Tricks
: 10.0. No failed answers. Response time: avg 12.89s, max 26.66s, total 51.55s.
Coding
: 4.1. No answer: 1; Timed out: 1. Response time: avg 110.94s, max 150.90s, total 221.87s.
Combined
: 3.0. API error: 1. Response time: avg 0ms, max 0ms, total 0ms.
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 21.11s, max 21.94s, total 42.21s.
Domain specific
: 7.7. Wrong answer: 1. Response time: avg 38.48s, max 68.92s, total 115.43s.
General Intelligence
: 10.0. No failed answers. Response time: avg 9.57s, max 9.57s, total 9.57s.
Instructions following
: 10.0. No failed answers. Response time: avg 12.76s, max 17.53s, total 25.52s.
Puzzle Solving
: 9.9. No failed answers. Response time: avg 26.91s, max 61.08s, total 80.72s.
Tool Calling
: 3.0. API error: 1. Response time: avg 0ms, max 0ms, total 0ms.
Trivia
: 3.0. Wrong answer: 1. Response time: avg 90.14s, max 90.14s, total 90.14s.
Wrong answer: 5; Did not follow instructions: 2. Response time: avg 36.67s, max 168.71s, total 733.46s…
Anti-AI Tricks
: 8.3. Wrong answer: 1. Response time: avg 17.99s, max 48.33s, total 71.98s.
Coding
: 7.0. Wrong answer: 1. Response time: avg 107.65s, max 140.81s, total 215.30s.
Combined
: 10.0. No failed answers. Response time: avg 37.67s, max 37.67s, total 37.67s.
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 9.07s, max 12.19s, total 18.14s.
Domain specific
: 5.9. Wrong answer: 2. Response time: avg 88.74s, max 168.71s, total 266.21s.
Instructions following
: 10.0. No failed answers. Response time: avg 7.26s, max 9.02s, total 14.52s.
Puzzle Solving
: 9.0. Did not follow instructions: 1. Response time: avg 10.23s, max 11.54s, total 30.68s.
Tool Calling
: 10.0. No failed answers. Response time: avg 12.38s, max 12.38s, total 12.38s.
Trivia
: 3.0. Wrong answer: 1. Response time: avg 48.32s, max 48.32s, total 48.32s.
Wrong answer: 4; Timed out: 1. Response time: avg 36.84s, max 178.04s, total 736.86s…
Anti-AI Tricks
: 10.0. No failed answers. Response time: avg 8.58s, max 12.75s, total 34.33s.
Coding
: 6.5. Timed out: 1. Response time: avg 122.40s, max 178.04s, total 244.81s.
Combined
: 10.0. No failed answers. Response time: avg 65.24s, max 65.24s, total 65.24s.
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 21.75s, max 23.18s, total 43.49s.
Domain specific
: 3.6. Wrong answer: 3. Response time: avg 45.35s, max 88.89s, total 136.04s.
General Intelligence
: 10.0. No failed answers. Response time: avg 25.48s, max 25.48s, total 25.48s.
Instructions following
: 10.0. No failed answers. Response time: avg 16.13s, max 17.18s, total 32.26s.
Puzzle Solving
: 10.0. No failed answers. Response time: avg 16.38s, max 19.42s, total 49.14s.
Tool Calling
: 10.0. No failed answers. Response time: avg 15.02s, max 15.02s, total 15.02s.
Trivia
: 3.0. Wrong answer: 1. Response time: avg 91.07s, max 91.07s, total 91.07s.
Wrong answer: 4. Response time: avg 37.88s, max 332.10s, total 757.66s…
Anti-AI Tricks
: 10.0. No failed answers. Response time: avg 4.66s, max 6.74s, total 18.65s.
Coding
: 8.2. Wrong answer: 1. Response time: avg 69.68s, max 130.26s, total 139.35s.
Combined
: 10.0. No failed answers. Response time: avg 19.29s, max 19.29s, total 19.29s.
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 4.18s, max 4.35s, total 8.36s.
Domain specific
: 5.3. Wrong answer: 2. Response time: avg 164.14s, max 332.10s, total 492.41s.
General Intelligence
: 10.0. No failed answers. Response time: avg 4.16s, max 4.16s, total 4.16s.
Instructions following
: 10.0. No failed answers. Response time: avg 3.36s, max 3.46s, total 6.73s.
Puzzle Solving
: 10.0. No failed answers. Response time: avg 6.76s, max 10.54s, total 20.28s.
Tool Calling
: 10.0. No failed answers. Response time: avg 10.57s, max 10.57s, total 10.57s.
Trivia
: 2.8. Wrong answer: 1. Response time: avg 37.86s, max 37.86s, total 37.86s.
Anti-AI Tricks
: 10.0. No failed answers. Response time: avg 34.99s, max 109.60s, total 139.95s.
Coding
: 3.0. API error: 1. Response time: avg 0ms, max 0ms, total 0ms.
Combined
: 3.0. API error: 1. Response time: avg 0ms, max 0ms, total 0ms.
Data parsing and extraction
: 3.0. API error: 1. Response time: avg 0ms, max 0ms, total 0ms.
Domain specific
: 10.0. No failed answers. Response time: avg 34.54s, max 34.54s, total 34.54s.
Instructions following
: 10.0. No failed answers. Response time: avg 9.30s, max 9.30s, total 9.30s.
Tool Calling
: 3.0. API error: 1. Response time: avg 0ms, max 0ms, total 0ms.
Trivia
: 3.0. Wrong answer: 1. Response time: avg 114.12s, max 114.12s, total 114.12s.
Wrong answer: 5; Timed out: 2. Response time: avg 39.40s, max 168.16s, total 788.00s…
Anti-AI Tricks
: 10.0. No failed answers. Response time: avg 9.75s, max 18.03s, total 39.01s.
Coding
: 4.1. Timed out: 1; Wrong answer: 1. Response time: avg 119.57s, max 168.16s, total 239.14s.
Combined
: 10.0. No failed answers. Response time: avg 107.79s, max 107.79s, total 107.79s.
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 23.41s, max 29.79s, total 46.83s.
Domain specific
: 2.9. Wrong answer: 3. Response time: avg 63.40s, max 119.29s, total 190.20s.
General Intelligence
: 3.4. Timed out: 1. Response time: avg 34.11s, max 34.11s, total 34.11s.
Instructions following
: 10.0. No failed answers. Response time: avg 9.88s, max 15.44s, total 19.76s.
Puzzle Solving
: 10.0. No failed answers. Response time: avg 17.89s, max 31.99s, total 53.68s.
Tool Calling
: 10.0. No failed answers. Response time: avg 4.60s, max 4.60s, total 4.60s.
Trivia
: 3.0. Wrong answer: 1. Response time: avg 52.87s, max 52.87s, total 52.87s.
Anti-AI Tricks
: 8.7. Wrong answer: 1. Response time: avg 10.00s, max 11.53s, total 39.99s.
Combined
: 3.0. Invalid tool call: 1. Response time: avg 47.38s, max 47.38s, total 47.38s.
Data parsing and extraction
: 6.3. Wrong answer: 1. Response time: avg 17.36s, max 26.57s, total 34.71s.
Domain specific
: 2.9. Wrong answer: 3. Response time: avg 128.15s, max 309.02s, total 384.46s.
Instructions following
: 9.8. No failed answers. Response time: avg 11.60s, max 14.49s, total 23.20s.
Tool Calling
: 10.0. No failed answers. Response time: avg 11.19s, max 11.19s, total 11.19s.
Trivia
: 3.0. Wrong answer: 1. Response time: avg 36.98s, max 36.98s, total 36.98s.
Anti-AI Tricks
: 10.0. No failed answers. Response time: avg 2.75s, max 4.59s, total 10.98s.
Coding
: 3.4. No answer: 1; Wrong answer: 1. Response time: avg 183.89s, max 299.23s, total 367.78s.
Combined
: 10.0. No failed answers. Response time: avg 25.87s, max 25.87s, total 25.87s.
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 3.04s, max 4.12s, total 6.07s.
General Intelligence
: 5.4. Wrong answer: 1. Response time: avg 3.61s, max 3.61s, total 3.61s.
Tool Calling
: 10.0. No failed answers. Response time: avg 13.98s, max 13.98s, total 13.98s.
Trivia
: 3.0. Wrong answer: 1. Response time: avg 234.19s, max 234.19s, total 234.19s.
Wrong answer: 4; Extra formatting: 3. Response time: avg 42.39s, max 252.69s, total 847.76s…
Anti-AI Tricks
: 8.3. Extra formatting: 1. Response time: avg 7.43s, max 10.89s, total 29.72s.
Coding
: 7.0. Extra formatting: 1. Response time: avg 62.62s, max 94.25s, total 125.23s.
Combined
: 10.0. No failed answers. Response time: avg 32.81s, max 32.81s, total 32.81s.
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 10.72s, max 12.13s, total 21.44s.
Domain specific
: 5.3. Extra formatting: 1; Wrong answer: 1. Response time: avg 158.00s, max 252.69s, total 474.01s.
General Intelligence
: 4.4. Wrong answer: 1. Response time: avg 18.41s, max 18.41s, total 18.41s.
Instructions following
: 9.8. No failed answers. Response time: avg 12.36s, max 20.80s, total 24.73s.
Puzzle Solving
: 7.7. Wrong answer: 1. Response time: avg 18.26s, max 44.40s, total 54.79s.
Tool Calling
: 10.0. No failed answers. Response time: avg 13.12s, max 13.12s, total 13.12s.
Trivia
: 3.0. Wrong answer: 1. Response time: avg 53.51s, max 53.51s, total 53.51s.
Anti-AI Tricks
: 10.0. No failed answers. Response time: avg 40.57s, max 110.43s, total 121.72s.
Coding
: 3.5. No answer: 1. Response time: avg 62.83s, max 62.83s, total 62.83s.
Combined
: 10.0. No failed answers. Response time: avg 29.57s, max 29.57s, total 29.57s.
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 15.01s, max 15.01s, total 15.01s.
Domain specific
: 5.3. Wrong answer: 2. Response time: avg 170.45s, max 170.45s, total 170.45s.
Tool Calling
: 10.0. No failed answers. Response time: avg 11.91s, max 11.91s, total 11.91s.
Trivia
: 3.0. Wrong answer: 1. Response time: avg 108.45s, max 108.45s, total 108.45s.
Anti-AI Tricks
: 6.5. Wrong answer: 2. Response time: avg 25.50s, max 37.73s, total 51.00s.
Coding
: 5.4. Wrong answer: 2. Response time: avg 47.80s, max 54.86s, total 95.59s.
Combined
: 10.0. No failed answers. Response time: avg 65.96s, max 65.96s, total 65.96s.
Data parsing and extraction
: 3.7. Wrong answer: 2. Response time: avg 21.42s, max 21.42s, total 21.42s.
Domain specific
: 5.2. Timed out: 1; Wrong answer: 1. Response time: avg 204.02s, max 204.02s, total 204.02s.
Instructions following
: 9.8. No failed answers. Response time: avg 15.64s, max 15.64s, total 15.64s.
Tool Calling
: 10.0. No failed answers. Response time: avg 33.30s, max 33.30s, total 33.30s.
Trivia
: 3.0. Wrong answer: 1. Response time: avg 20.13s, max 20.13s, total 20.13s.
Wrong answer: 8. Response time: avg 43.65s, max 189.38s, total 872.90s…
Anti-AI Tricks
: 10.0. No failed answers. Response time: avg 10.84s, max 15.11s, total 43.36s.
Coding
: 4.3. Wrong answer: 2. Response time: avg 137.55s, max 189.38s, total 275.10s.
Combined
: 10.0. No failed answers. Response time: avg 92.41s, max 92.41s, total 92.41s.
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 38.32s, max 41.70s, total 76.63s.
Domain specific
: 2.9. Wrong answer: 3. Response time: avg 53.10s, max 90.70s, total 159.30s.
General Intelligence
: 4.9. Wrong answer: 1. Response time: avg 25.30s, max 25.30s, total 25.30s.
Instructions following
: 10.0. No failed answers. Response time: avg 20.25s, max 21.65s, total 40.50s.
Puzzle Solving
: 8.2. Wrong answer: 1. Response time: avg 17.67s, max 24.83s, total 53.02s.
Tool Calling
: 10.0. No failed answers. Response time: avg 14.72s, max 14.72s, total 14.72s.
Trivia
: 3.0. Wrong answer: 1. Response time: avg 92.57s, max 92.57s, total 92.57s.
Wrong answer: 6; Did not follow instructions: 2. Response time: avg 46.36s, max 218.13s, total 927.27s…
Anti-AI Tricks
: 8.3. Wrong answer: 1. Response time: avg 28.51s, max 39.73s, total 114.05s.
Coding
: 6.8. Wrong answer: 1. Response time: avg 58.13s, max 62.48s, total 116.27s.
Combined
: 10.0. No failed answers. Response time: avg 76.57s, max 76.57s, total 76.57s.
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 28.03s, max 30.49s, total 56.07s.
Domain specific
: 4.1. Wrong answer: 3. Response time: avg 100.31s, max 218.13s, total 300.92s.
Instructions following
: 10.0. No failed answers. Response time: avg 15.36s, max 19.53s, total 30.73s.
Puzzle Solving
: 8.2. Did not follow instructions: 1. Response time: avg 26.11s, max 32.37s, total 78.32s.
Tool Calling
: 10.0. No failed answers. Response time: avg 74.73s, max 74.73s, total 74.73s.
Trivia
: 3.0. Wrong answer: 1. Response time: avg 54.46s, max 54.46s, total 54.46s.
Anti-AI Tricks
: 10.0. No failed answers. Response time: avg 8.83s, max 11.20s, total 35.31s.
Coding
: 7.4. Extra formatting: 1. Response time: avg 55.26s, max 64.81s, total 110.53s.
Combined
: 10.0. No failed answers. Response time: avg 63.99s, max 63.99s, total 63.99s.
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 18.97s, max 26.99s, total 37.93s.
Domain specific
: 5.3. Wrong answer: 2. Response time: avg 181.74s, max 216.69s, total 545.21s.
Instructions following
: 9.8. No failed answers. Response time: avg 18.58s, max 31.48s, total 37.15s.
Tool Calling
: 10.0. No failed answers. Response time: avg 17.66s, max 17.66s, total 17.66s.
Trivia
: 3.0. Wrong answer: 1. Response time: avg 44.47s, max 44.47s, total 44.47s.
Wrong answer: 6; No answer: 3. Response time: avg 49.43s, max 192.75s, total 988.58s…
Anti-AI Tricks
: 10.0. No failed answers. Response time: avg 13.40s, max 45.73s, total 53.58s.
Coding
: 3.7. No answer: 1; Wrong answer: 1. Response time: avg 126.82s, max 192.75s, total 253.65s.
Combined
: 10.0. No failed answers. Response time: avg 13.01s, max 13.01s, total 13.01s.
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 14.72s, max 24.97s, total 29.43s.
Domain specific
: 4.1. Wrong answer: 2; No answer: 1. Response time: avg 149.64s, max 163.21s, total 448.91s.
General Intelligence
: 5.5. Wrong answer: 1. Response time: avg 4.17s, max 4.17s, total 4.17s.
Instructions following
: 9.8. No failed answers. Response time: avg 1.52s, max 1.89s, total 3.03s.
Puzzle Solving
: 5.3. Wrong answer: 2. Response time: avg 10.22s, max 23.65s, total 30.66s.
Tool Calling
: 10.0. No failed answers. Response time: avg 2.79s, max 2.79s, total 2.79s.
Trivia
: 3.0. No answer: 1. Response time: avg 149.34s, max 149.34s, total 149.34s.
Coding
: 3.5; Timed out: 1, Wrong answer: 1; Response time: avg 125.80s, max 125.80s, total 125.80s
Combined
: 4.5; Invalid tool call: 1; Response time: avg 60.39s, max 60.39s, total 60.39s
Data parsing and extraction
: 4.6; Wrong answer: 2; Response time: avg 7.48s, max 7.48s, total 7.48s
Domain specific
: 2.9; Wrong answer: 2, Timed out: 1; Response time: avg 237.27s, max 237.27s, total 237.27s
Puzzle Solving
: 5.3; Timed out: 1, Wrong answer: 1; Response time: avg 11.21s, max 17.37s, total 22.43s
Tool Calling
: 10.0; No failed answers; Response time: avg 15.35s, max 15.35s, total 15.35s
Trivia
: 3.0; Wrong answer: 1; Response time: avg 80.79s, max 80.79s, total 80.79s
Wrong answer: 3, Timed out: 2, No answer: 1; Response time: avg 50.92s, max 369.32s, total 967.47s…
Anti-AI Tricks
: 10.0; No failed answers; Response time: avg 6.20s, max 9.64s, total 24.78s
Coding
: 2.9; No answer: 1, Timed out: 1; Response time: avg 258.40s, max 369.32s, total 516.79s
Combined
: 9.6; No failed answers; Response time: avg 73.55s, max 73.55s, total 73.55s
Data parsing and extraction
: 10.0; No failed answers; Response time: avg 16.51s, max 20.57s, total 33.02s
Domain specific
: 2.9; Wrong answer: 2, Timed out: 1; Response time: avg 23.62s, max 27.00s, total 47.23s
General Intelligence
: 10.0; No failed answers; Response time: avg 29.76s, max 29.76s, total 29.76s
Instructions following
: 10.0; No failed answers; Response time: avg 17.54s, max 21.25s, total 35.08s
Puzzle Solving
: 10.0; No failed answers; Response time: avg 5.79s, max 6.85s, total 17.36s
Tool Calling
: 10.0; No failed answers; Response time: avg 9.01s, max 9.01s, total 9.01s
Trivia
: 3.0; Wrong answer: 1; Response time: avg 180.87s, max 180.87s, total 180.87s
Anti-AI Tricks
: 8.2; API error: 1; Response time: avg 24.23s, max 29.86s, total 96.93s
Coding
: 3.9; Timed out: 1, Wrong answer: 1; Response time: avg 184.97s, max 189.03s, total 369.94s
Combined
: 10.0; No failed answers; Response time: avg 93.11s, max 93.11s, total 93.11s
Data parsing and extraction
: 10.0; No failed answers; Response time: avg 36.09s, max 39.12s, total 72.18s
Domain specific
: 2.9; Wrong answer: 2, Timed out: 1; Response time: avg 24.27s, max 33.91s, total 72.82s
General Intelligence
: 3.4; API error: 1; Response time: avg 58.29s, max 58.29s, total 58.29s
Instructions following
: 10.0; No failed answers; Response time: avg 35.78s, max 47.30s, total 71.56s
Tool Calling
: 10.0; No failed answers; Response time: avg 34.81s, max 34.81s, total 34.81s
Trivia
: 3.0; Wrong answer: 1; Response time: avg 83.99s, max 83.99s, total 83.99s
Coding
: 6.8; No answer: 1; Response time: avg 118.23s, max 129.50s, total 236.47s
Combined
: 10.0; No failed answers; Response time: avg 40.96s, max 40.96s, total 40.96s
Data parsing and extraction
: 10.0; No failed answers; Response time: avg 20.38s, max 22.88s, total 40.76s
Domain specific
: 5.3; Timed out: 2; Response time: avg 202.38s, max 215.85s, total 404.76s
General Intelligence
: 10.0; No failed answers; Response time: avg 17.83s, max 17.83s, total 17.83s
Instructions following
: 10.0; No failed answers; Response time: avg 12.53s, max 19.15s, total 25.06s
Tool Calling
: 10.0; No failed answers; Response time: avg 8.92s, max 8.92s, total 8.92s
Trivia
: 3.0; Wrong answer: 1; Response time: avg 130.27s, max 130.27s, total 130.27s
Anti-AI Tricks
: 9.2; Did not follow instructions: 1; Response time: avg 43.33s, max 71.76s, total 173.31s
Coding
: 6.5; API error: 1; Response time: avg 143.82s, max 143.82s, total 143.82s
Combined
: 3.0; API error: 1; Response time: avg 0ms, max 0ms, total 0ms
Domain specific
: 5.3; Wrong answer: 2; Response time: avg 73.40s, max 90.09s, total 220.20s
General Intelligence
: 4.3; Wrong answer: 1; Response time: avg 15.63s, max 15.63s, total 15.63s
Instructions following
: 9.8; No failed answers; Response time: avg 27.36s, max 40.24s, total 54.72s
Puzzle Solving
: 7.7; Did not follow instructions: 1; Response time: avg 31.47s, max 46.84s, total 94.41s
Tool Calling
: 3.0; API error: 1; Response time: avg 0ms, max 0ms, total 0ms
Trivia
: 3.0; Wrong answer: 1; Response time: avg 133.60s, max 133.60s, total 133.60s
API error: 6, Wrong answer: 3; Response time: avg 56.57s, max 149.94s, total 848.59s…
Anti-AI Tricks
: 6.4; API error: 2; Response time: avg 15.12s, max 19.99s, total 45.37s
Coding
: 6.5; API error: 1; Response time: avg 99.76s, max 99.76s, total 99.76s
Combined
: 10.0; No failed answers; Response time: avg 113.09s, max 113.09s, total 113.09s
Data parsing and extraction
: 6.5; API error: 1; Response time: avg 12.11s, max 12.11s, total 12.11s
Domain specific
: 5.3; Wrong answer: 2; Response time: avg 109.04s, max 149.94s, total 327.11s
General Intelligence
: 3.0; API error: 1; Response time: avg 0ms, max 0ms, total 0ms
Instructions following
: 10.0; No failed answers; Response time: avg 34.36s, max 41.83s, total 68.73s
Puzzle Solving
: 7.7; API error: 1; Response time: avg 27.94s, max 45.06s, total 55.89s
Tool Calling
: 10.0; No failed answers; Response time: avg 78.83s, max 78.83s, total 78.83s
Trivia
: 3.0; Wrong answer: 1; Response time: avg 47.71s, max 47.71s, total 47.71s
Anti-AI Tricks
: 8.3; Wrong answer: 1; Response time: avg 12.62s, max 18.61s, total 50.50s
Coding
: 6.6; No answer: 1; Response time: avg 165.39s, max 168.22s, total 330.78s
Combined
: 7.0; Invalid tool call: 1; Response time: avg 83.07s, max 83.07s, total 83.07s
Data parsing and extraction
: 3.5; No answer: 2; Response time: avg 37.30s, max 54.01s, total 74.60s
Domain specific
: 2.9; Wrong answer: 3; Response time: avg 73.38s, max 101.55s, total 220.15s
Instructions following
: 10.0; No failed answers; Response time: avg 37.96s, max 47.48s, total 75.92s
Puzzle Solving
: 7.7; Wrong answer: 1; Response time: avg 61.14s, max 97.76s, total 183.42s
Tool Calling
: 10.0; No failed answers; Response time: avg 16.88s, max 16.88s, total 16.88s
Trivia
: 3.0; Wrong answer: 1; Response time: avg 80.99s, max 80.99s, total 80.99s
Wrong answer: 5; Response time: avg 58.43s, max 238.07s, total 1168.66s…
Anti-AI Tricks
: 10.0; No failed answers; Response time: avg 22.13s, max 28.70s, total 88.50s
Coding
: 8.2; Wrong answer: 1; Response time: avg 177.97s, max 238.07s, total 355.94s
Combined
: 10.0; No failed answers; Response time: avg 121.49s, max 121.49s, total 121.49s
Data parsing and extraction
: 10.0; No failed answers; Response time: avg 41.15s, max 48.02s, total 82.30s
Domain specific
: 2.9; Wrong answer: 3; Response time: avg 95.91s, max 186.74s, total 287.73s
General Intelligence
: 10.0; No failed answers; Response time: avg 32.24s, max 32.24s, total 32.24s
Instructions following
: 10.0; No failed answers; Response time: avg 24.31s, max 27.94s, total 48.63s
Puzzle Solving
: 10.0; No failed answers; Response time: avg 24.32s, max 37.68s, total 72.96s
Tool Calling
: 10.0; No failed answers; Response time: avg 18.32s, max 18.32s, total 18.32s
Trivia
: 3.0; Wrong answer: 1; Response time: avg 60.56s, max 60.56s, total 60.56s
Anti-AI Tricks
: 6.4; API error: 1, Wrong answer: 1; Response time: avg 16.53s, max 39.91s, total 66.11s
Coding
: 2.7; API error: 1, Timed out: 1; Response time: avg 51.77s, max 51.77s, total 51.77s
Combined
: 10.0; No failed answers; Response time: avg 65.02s, max 65.02s, total 65.02s
Data parsing and extraction
: 7.3; API error: 1; Response time: avg 23.62s, max 36.44s, total 47.24s
Instructions following
: 10.0; No failed answers; Response time: avg 41.16s, max 43.56s, total 82.32s
Puzzle Solving
: 5.9; API error: 1, Wrong answer: 1; Response time: avg 34.84s, max 76.46s, total 104.52s
Tool Calling
: 10.0; No failed answers; Response time: avg 21.33s, max 21.33s, total 21.33s
Trivia
: 3.0; Wrong answer: 1; Response time: avg 39.14s, max 39.14s, total 39.14s