#31
|
|||
|
|||
|
[appleeaters]RE: Benchmarking
Hello,
I have been mostly lurking around, but I intend to compete. I am using Fidian's benchmark, but I have disabled my suicide function and set the time_limit to 500. Otherwise the scores depend on the CPU of the machine used (at least for games where the suicide comes into play).

My scores on my Pentium 4 3.0 GHz (with no time limit/suicide, I repeat) are:

apples: 6.4
robots: 277.5
moves: 73.5
Score: 3264.1
seconds: 8.6

and I go over 60 seconds in one game. Of course on the judge's PC this may change (depending on whether it is faster or slower than my PC).

I have another robot that does:

6.4 300.7 76.5 3497.3 85.2 (SLOW)

It is very slow, but it shows there is room for improvement! What are your scores with no time limit/suicide?

About Std... what I know, and a question: Standard Deviation is a statistic showing the dispersion of a set of numbers. For example, let's say we run 4 games and get these scores:

10, 10, 10, 10: the Mean is 10 and Std = 0
0, 10, 10, 20: the Mean is 10 but Std = 8.164965809
0, 0, 20, 20: the Mean is 10 but Std = 11.54700538

So the Std goes up when the scores are far away from the Mean. In general a small Std is a good thing (it shows an algorithm is stable and will give mostly the same results). But does anyone know what exactly it means? E.g. is Std=5 two times better than Std=10? Obviously not, but how do you actually evaluate the Std?

Regards,
Dimitris
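For anyone who wants to check the three Std figures, they match the sample standard deviation (dividing by n-1 rather than n). A minimal Python sketch, not part of the original post; the function name `sample_std` is just an illustration:

```python
import math

def sample_std(scores):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((x - mean) ** 2 for x in scores) / (n - 1)
    return math.sqrt(variance)

# The three example score sets from the post; each has a mean of 10.
for scores in ([10, 10, 10, 10], [0, 10, 10, 20], [0, 0, 20, 20]):
    mean = sum(scores) / len(scores)
    print(scores, "mean =", mean, "std =", round(sample_std(scores), 4))
```

Dividing by n instead of n-1 would give smaller values, so it is worth checking which version a benchmark script reports before comparing Std numbers between posts.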
|
#32
|
|||
|
|||
|
[appleeaters]RE: Benchmarking
The variance of a sample is:

E((x-u)^2)/(n-1)

(E means summation, x is a single value, u is the mean, and n is the number of values.) The variance for 10,20,20,10 would be:

((10-15)^2 + (10-15)^2 + (20-15)^2 + (20-15)^2)/(4-1) = 33.33

The standard deviation is the square root of the variance. It is a measurement of how spread out the data is. The standard deviation of 10,20,20,10 would be 5.77.

The Z-score, or standard score for a value, can be found with the following equation:

(x-u)/S

(S is the standard deviation.) For 20, the z-score would be:

(20-15)/5.77 = .87

This means that 20 is .87 standard deviations above the mean. You can then find that z-score in a z-score table (http://techniques.geog.ox.ac.uk/mod...les/z-score.htm). This tells you that .8078, or 80.78%, of the data should fall below 20, assuming the scores are roughly normally distributed. With more and more data, this becomes more and more accurate.
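The table lookup can also be done in code. The sketch below is an illustration in Python, not something from the original post: it recomputes the z-score for 20 and then uses the standard normal CDF (written with `math.erf`) in place of the z-score table. The helper names are made up, and the result only means something if the scores are close to normally distributed.

```python
import math

def z_score(x, mean, std):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std

def normal_cdf(z):
    """Fraction of a standard normal distribution that falls below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

scores = [10, 20, 20, 10]
mean = sum(scores) / len(scores)
# Sample standard deviation (n - 1 in the denominator), as in the post.
std = math.sqrt(sum((x - mean) ** 2 for x in scores) / (len(scores) - 1))

z = round(z_score(20, mean, std), 2)  # rounded to two places, like a table lookup
print("z =", z)
print("fraction below 20 =", round(normal_cdf(z), 4))
```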
|
#33
|
|||
|
|||
|
[appleeaters]RE: Benchmarking
Cypher - I think the reason for the difference in scores between the different runs is that the robot gets near the 60-second death time and commits suicide. I'm betting that the load average for your machine was different between the two runs and PHP got more of the CPU time in the latter runs.

What I'm curious about is whether our machines are equivalent. I'm running stuff on a Pentium III, 733 MHz, a little home computer that is set up to run PHP on Linux. What I could really use is a benchmarking program that would run a simulated load on a machine and give us a rating, especially so we know how fast the machine is for the final testing. So, if you have a fast machine, you would know that you have 1.8x the speed of the final box. If you have a dinky 486, you are probably at 0.4x the speed. Everyone can adjust their run times accordingly.

daremon/Dimitris - Glad to see you here! I'll post later what my scores are without time limit and without suicide.
|
#34
|
|||
|
|||
|
[appleeaters]RE: Benchmarking
Just a thought ... could a couple of you run the benchmark.php program for robotavoider and tell me what you get for the total (and maybe average + StD) time? That is about as good of a benchmark as any program I'd whip up. Might be a bit small, but would be an easy place to start.
Edit: The 100 runs of robotavoider takes 442.2 seconds on my Linux machine -- a PIII/733 with 512MB RAM. |
|
#35
|
|||
|
|||
|
[appleeaters]RE: Benchmarking
hi all !
here are my 100-run results for robotavoider.php

P4C 2.6GHz 512DDR400 winXP + php 4.3.5

Sum: 1 apl, 7495 rob, 1418 mov = 72214 pts, 174.9 sec, 0 fail
Avg: 0 apl, 75 rob, 14.2 mov = 722.1 pts, 1.7 sec, 0% fail
StD: 0.1 apl, 72.97 rob, 5.08 mov = 732.54 pts, 1.66 sec
|
#36
|
|||
|
|||
|
[appleeaters]RE: Benchmarking
Hi!
Robotavoider on a 3.06 GHz Pentium 4, with 1GB memory, WinXP and PHP 4.3.5:

Sum: 1 apl, 7495 rob, 1418 mov = 72214 pts, 152.6 sec, 0 fail
Avg: 0 apl, 75 rob, 14.2 mov = 722.1 pts, 1.5 sec, 0% fail
StD: 0.1 apl, 72.97 rob, 5.08 mov = 732.54 pts, 1.48 sec

P.S. Zack, thanks for the info on Standard Deviation!
|
#37
|
|||
|
|||
|
[appleeaters]RE: Benchmarking
Who has the official testing machine? What are the stats there? If it's going to be about 3x faster than my machine (that's about how it looks, judging from my stats versus the other two), then I need to start using another machine or bump up the time limit.
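The "about 3x" estimate can be checked against the 100-run robotavoider times posted so far. This is a rough Python sketch, not anything from the contest kit: the function names are made up, and it assumes run time scales linearly with machine speed, which ignores the different operating systems and PHP versions involved.

```python
# 100-run robotavoider totals (seconds) as reported earlier in this thread.
REPORTED_SECONDS = {
    "PIII 733 MHz (Linux)": 442.2,
    "P4C 2.6 GHz (WinXP)": 174.9,
    "P4 3.06 GHz (WinXP)": 152.6,
}

def speed_ratio(my_seconds, other_seconds):
    """How many times faster the other machine ran the same benchmark."""
    return my_seconds / other_seconds

def local_time_limit(judge_limit, my_seconds, judge_seconds):
    """Local time limit that corresponds to judge_limit seconds on the judging box."""
    return judge_limit * speed_ratio(my_seconds, judge_seconds)

mine = REPORTED_SECONDS["PIII 733 MHz (Linux)"]
for name, seconds in REPORTED_SECONDS.items():
    print(name, "is", round(speed_ratio(mine, seconds), 2), "x the PIII")

# Assumption for illustration only: the judging machine matches the 3.06 GHz P4.
judge = REPORTED_SECONDS["P4 3.06 GHz (WinXP)"]
print("60 s there is roughly", round(local_time_limit(60, mine, judge), 1), "s on the PIII")
```

The real judging machine's specs are not given in the thread, so the last line is only a placeholder until someone posts its robotavoider time.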
|
|
#38
|
|||
|
|||
|
[appleeaters]RE: Benchmarking
Tyler,
I think that you are right that the change in my scores is attributable to the suicide time, but not because of the load on my machine. The first trial I did was with the older robots lib; the second trial was with the newer one (which was *way* faster). Also, it seems that if I run it with php5 RC2 (that's what I have on my linux box) it runs a lot slower than if I upload it to an actual server.
|
#39
|
|||
|
|||
|
[appleeaters]RE: Benchmarking
Quote:
Matt is going to be doing the judging. |
|
#40
|
|||
|
|||
|
[appleeaters]RE: Benchmarking
here are the scores of a first not fully optimized AI:
Sum: 594 apl, 23569 rob, 6719 mov = 281652 pts, 670 sec, 0 fail
Avg: 5.9 apl, 235.7 rob, 67.2 mov = 2816.5 pts, 6.7 sec, 0% fail
StD: 10.83 apl, 322.11 rob, 92.73 mov = 3528.89 pts, 9.73 sec

and I'm still working on a completely new one (more accurate)
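A side note for reading these Sum lines: the totals posted in this thread are all consistent with 100 points per apple, 10 per robot, and minus 2 per move. That formula is inferred from the posted numbers, not taken from the official rules, so treat the Python sketch below as a sanity check only (the function name `score` is made up).

```python
def score(apples, robots, moves):
    """Points implied by the totals posted in this thread.

    ASSUMPTION: the weights (100, 10, -2) are inferred from the posted
    numbers, not read from the official scoring code.
    """
    return 100 * apples + 10 * robots - 2 * moves

# Sum lines and single-game results quoted in the thread: (apl, rob, mov, pts).
posted = [
    (1, 7495, 1418, 72214),     # robotavoider, 100 runs
    (594, 23569, 6719, 281652),
    (608, 21914, 6431, 267078),
    (0, 88, 14, 852),           # game 59
    (6, 331, 50, 3810),         # game 59, another robot
]
for apples, robots, moves, points in posted:
    print(apples, robots, moves, "->", score(apples, robots, moves), "posted:", points)
```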
|
#41
|
|||
|
|||
|
[appleeaters]RE: Benchmarking
the benchmarking thread seems to have been rather quiet in the last week... perhaps more people can start posting their scores? would be nice to have something to compare against

anyway here are my scores (on my 1.3 GHz machine):

Sum: 608 apl, 21914 rob, 6431 mov = 267078 pts, 1825.8 sec, 0 fail
Avg: 6.1 apl, 219.1 rob, 64.3 mov = 2670.8 pts, 18.3 sec, 0% fail
StD: 10.54 apl, 272.7 rob, 93.86 mov = 2996.06 pts, 13.83 sec

(not as impressive as DDY's =P)

btw, i wanted to see if my program would perform as well using the official lib, so i did more benchmarking using the official lib and tyler's benchmark script ... i got the following [it's 10 runs of the same 100 games with different tele locations] :

at a glance the max=296616 and min=225736 ... which is quite far apart... it makes me wonder if judging with 100 games is sufficient ... one alternative way of judging is to run N games M times ... each of the M times, the teleport locations are different ... but within any one run, all the scripts tested get the same teleport locations.

has anyone done a similar benchmark?
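The "N games, M times, same teleports for every script" idea could be sketched roughly like this. It is Python rather than PHP and is not based on the official lib or on tyler's benchmark script; `run_benchmark`, `play_game` and the dummy game are all hypothetical stand-ins. The point is only that seeding the random generator from the run and game number gives every robot the same teleport draws.

```python
import random

def run_benchmark(bots, play_game, n_games=100, m_runs=10, base_seed=0):
    """Return {bot_name: [total score for run 0, ..., run m_runs - 1]}."""
    totals = {name: [] for name in bots}
    for run in range(m_runs):
        for name, bot in bots.items():
            total = 0
            for game in range(n_games):
                # The seed depends only on (run, game), so every bot sees the
                # same random teleports in this game of this run.
                rng = random.Random(f"{base_seed}-{run}-{game}")
                total += play_game(bot, game, rng)
            totals[name].append(total)
    return totals

def dummy_play_game(bot, game, rng):
    """Stand-in for a real game: fixed skill plus teleport luck."""
    teleport_luck = rng.randint(0, 1000)
    return bot["skill"] + teleport_luck

bots = {"a": {"skill": 100}, "b": {"skill": 100}, "c": {"skill": 200}}
results = run_benchmark(bots, dummy_play_game, n_games=5, m_runs=3)
for name, run_totals in results.items():
    print(name, run_totals, "min =", min(run_totals), "max =", max(run_totals))
```

With shared seeds, the spread between the M runs shows how much teleport luck matters, while the gap between robots within one run reflects only the robots themselves.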
|
#42
|
|||
|
|||
|
[appleeaters]RE: Benchmarking
My AI progress seems to have hit somewhat of a wall. That, and I haven't had much time to work on it in the past week. If anyone is interested (I noticed a few people from another thread were wondering this), my AI is currently 880 lines (including comments and line breaks, but there aren't too many comments).

I have good algorithms for getting apples and killing robots, but the problem I am having now is determining *which* apple to get. I was thinking of implementing an antigravity movement algorithm, but haven't really gotten around to it. Hopefully next week I will post some new benchmarks.

Also, I agree with the new idea for benchmarking (posted by an Anonymous user?). The actual judging script (as far as I can tell, maybe they will change this) uses random teleports for every bot, so some people will get luckier than others. Fidian, how about you mod your php benchmark once more to run the same hundred maps, each a hundred times with a hundred different teleports? he he he he. That would give you the *ultimate* benchmark average (but would take really long to process).

I think that the benchmarking we have going now should be good enough; don't bother worrying about a few points here or there. I personally just look at the first two significant digits to get a rough idea of how it will perform.
|
#43
|
|||
|
|||
|
[appleeaters]RE: Benchmarking
I'm getting:
59: 0 apples, 88 robots, 14 moves = 852 points in 42.2 seconds

for game #59.
|
#44
|
|||
|
|||
|
[appleeaters]RE: Benchmarking
I will post my latest benchmarks today... it is running right now...
My question is this: it appears that the averages are suffering from sensitivity to outliers (16000+ point games and such). What about using a quartile mean or a median score? Either of these may provide a better statistical measure and remove some of the 'luck' factor.

-Jeff
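To show what those two measures do with an outlier, here is a small Python sketch. The scores are made up for illustration, and "quartile mean" is read here as the interquartile mean (the average of the middle half of the sorted scores), which may not be exactly what was meant.

```python
import statistics

def interquartile_mean(scores):
    """Mean of the middle half: drop the lowest and the highest quarter."""
    ordered = sorted(scores)
    cut = len(ordered) // 4
    middle = ordered[cut:len(ordered) - cut]
    return sum(middle) / len(middle)

# Made-up game scores with one lucky 16000-point game.
scores = [1000, 2000, 2500, 3000, 3500, 4000, 4500, 16000]

print("mean               =", statistics.mean(scores))
print("median             =", statistics.median(scores))
print("interquartile mean =", interquartile_mean(scores))
```

The plain mean is pulled up by the single big game, while the median and the interquartile mean ignore it. Whether that is desirable depends on whether those big games count as luck or as part of a robot's real performance.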
|
#45
|
|||
|
|||
|
[appleeaters]RE: Benchmarking
Just for comparison with game 59:
59: 6 apples, 331 robots, 50 moves = 3810 points in 55.4 seconds

Quartile mean and median ... doesn't the average/standard deviation give you enough? You will get those wacky 16,000 point games in real testing also. Don't you want to see them? They would certainly affect the outcome, especially if everyone else gets them and you don't because of how you tweaked your robot.

If you want them, I'm pretty sure I could figure them out for you easily enough.