Older Contests
 
  #31  
Old June 2nd, 2004, 09:41 PM
daremon daremon is offline
[appleeaters]RE: Benchmarking

Hello,

I have been mostly lurking, but I intend to compete. I am using Fidian's benchmark, but I have disabled my suicide function and set the time_limit to 500. Otherwise the scores depend on the CPU of the machine used (at least for games where the suicide comes into play).

My scores on my Pentium 4 3.0 GHz (again, with no time limit and no suicide) are:
apples: 6.4
robots: 277.5
moves: 73.5
Score: 3264.1
seconds: 8.6

and I go over 60 seconds in one game. Of course, on the judge's PC this may change (depending on whether it is faster or slower than my PC).

I have another robot that does:
apples: 6.4
robots: 300.7
moves: 76.5
Score: 3497.3
seconds: 85.2 (SLOW)
It is very slow, but it shows there is room for improvement!

What are your scores with no timelimit/suicide?

About Std... what I know and a question:
Standard Deviation is a statistic showing the dispersion of a set of numbers. For example, let's say we run 4 games and we get these scores:
10, 10, 10, 10
The Mean is 10 and Std=0
If we got these scores:
0, 10, 10, 20
The Mean is 10 but Std=8.164965809
If we got these scores:
0, 0, 20, 20
The Mean is 10 but Std=11.54700538

So the Std goes up as the scores move farther from the Mean. In general a small Std is a good thing (it shows an algorithm is stable and will give mostly the same results).
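
As a quick check of the three examples above, here is a minimal sketch in Python (used only because it ships a statistics module; the contest itself is PHP). The quoted figures match the sample form of the standard deviation, which divides by n-1, and that is what `statistics.stdev` computes.

```python
from statistics import mean, stdev

# The three four-game score sets from the examples above.
score_sets = [
    [10, 10, 10, 10],
    [0, 10, 10, 20],
    [0, 0, 20, 20],
]

for scores in score_sets:
    # stdev() is the sample standard deviation (divides by n - 1).
    print(scores, "mean =", mean(scores), "std =", round(stdev(scores), 4))
```

All three sets have the same mean, so the mean alone cannot tell them apart; only the Std does.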

But does anyone know what exactly it means? E.g. is Std=5 two times better than Std=10? Obviously not, but how do you actually evaluate the Std?

Regards,
Dimitris

  #32  
Old June 2nd, 2004, 11:08 PM
zackcoburn zackcoburn is offline

The variance of a sample is:

(E(x-u)^2)/(n-1).

(E means summation, x is a single value, u is the mean, and n is the number of values). The variance for 10,20,20,10 would be:

((10-15)^2 + (10-15)^2 + (20-15)^2 + (20-15)^2)/(4-1) = 33.33

The standard deviation is the square root of the variance. It is a measurement of how spread out the data is. The standard deviation of 10,20,20,10 would be 5.77.

The Z-score, or standard score for a value, can be found with the following equation:

(x-u)/S

(S is the standard deviation).

For 20, the z-score would be:

(20-15)/5.77 = .87

This means that 20 is .87 standard deviations above the mean.

You can then look up that z-score in a z-score table (http://techniques.geog.ox.ac.uk/mod...les/z-score.htm). This tells you that .8078, or 80.78%, of the data should fall below 20, assuming the scores are roughly normally distributed. With more and more data, this becomes more and more accurate.
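
The same calculation can be sketched in Python, with `statistics.NormalDist` standing in for the printed table. The helper name `z_score` is made up for this example, and the final step is only meaningful if the scores are roughly normally distributed.

```python
from statistics import NormalDist, mean, stdev

scores = [10, 20, 20, 10]
mu = mean(scores)    # 15
s = stdev(scores)    # sample standard deviation, about 5.77

def z_score(x, mu, s):
    """Number of standard deviations that x lies above the mean."""
    return (x - mu) / s

z = z_score(20, mu, s)

# Fraction of a standard normal distribution lying below z.
# This replaces the table lookup and assumes roughly normal scores.
below = NormalDist().cdf(z)
print(round(s, 2), round(z, 2), round(below, 3))
```

The result differs slightly from the table value because the table is read at z rounded to two decimals, while the code uses the unrounded z.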

  #33  
Old June 3rd, 2004, 03:01 PM
fidian fidian is offline

Cypher - I think the difference in scores between the runs is due to the robot getting near the 60-second death time and committing suicide. I'm betting that the load average on your machine was different between the two runs and PHP got more of the CPU time in the later runs.

What I'm curious about is whether our machines are equivalent. I'm running stuff on a Pentium III, 733 MHz -- a little home computer that is set up to run PHP on Linux. What I could really use is a benchmarking program that would run a simulated load on a machine and give us a rating, especially so we know how fast the machine is for the final testing. So, if you have a fast machine, you would know that you have 1.8x the speed of the final box. If you have a dinky 486, you are probably at 0.4x the speed. Everyone can adjust their run times accordingly.

daremon/Dimitris - Glad to see you here! I'll post later what my scores are without time limit and without suicide.

  #34  
Old June 3rd, 2004, 06:11 PM
fidian fidian is offline

Just a thought ... could a couple of you run the benchmark.php program for robotavoider and tell me what you get for the total (and maybe average + StD) time? That is about as good of a benchmark as any program I'd whip up. Might be a bit small, but would be an easy place to start.

Edit: The 100 runs of robotavoider take 442.2 seconds on my Linux machine -- a PIII/733 with 512MB RAM.

  #35  
Old June 3rd, 2004, 11:05 PM
DDY DDY is offline

hi all !

here are my 100-run results for robotavoider.php
P4C 2.6GHz 512DDR400
winXP + php 4.3.5

Sum: 1 apl, 7495 rob, 1418 mov = 72214 pts, 174.9 sec, 0 fail
Avg: 0 apl, 75 rob, 14.2 mov = 722.1 pts, 1.7 sec, 0% fail
StD: 0.1 apl, 72.97 rob, 5.08 mov = 732.54 pts, 1.66 sec

  #36  
Old June 4th, 2004, 07:44 AM
daremon daremon is offline

Hi!

Robotavoider on a 3.06Ghz Pentium4, with 1GB memory, WinXP and PHP 4.3.5:

Sum: 1 apl, 7495 rob, 1418 mov = 72214 pts, 152.6 sec, 0 fail
Avg: 0 apl, 75 rob, 14.2 mov = 722.1 pts, 1.5 sec, 0% fail
StD: 0.1 apl, 72.97 rob, 5.08 mov = 732.54 pts, 1.48 sec

P.S. Zack, thanks for the info on Standard Deviation!

  #37  
Old June 4th, 2004, 12:30 PM
fidian fidian is offline

Who has the official testing machine? What are the stats there? If it's going to be about 3x faster than my machine (that's about how it looks, judging from my stats versus the other two), then I need to start using another machine or bump up the time limit.
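
For reference, the "about 3x" estimate can be reproduced from the three robotavoider totals posted in this thread. This is only a rough sketch: the helper names are made up, and it assumes run time scales linearly with machine speed, which ignores differences in OS and PHP version.

```python
# Total seconds for 100 runs of robotavoider, as posted in this thread.
robotavoider_seconds = {
    "PIII 733 MHz": 442.2,
    "P4C 2.6 GHz": 174.9,
    "P4 3.06 GHz": 152.6,
}

def speed_ratio(my_seconds, other_seconds):
    """How many times faster the other machine is than mine."""
    return my_seconds / other_seconds

def local_time_limit(judge_limit, ratio):
    """Hypothetical helper: the local limit matching judge_limit seconds
    on a machine `ratio` times faster (assumes linear scaling)."""
    return judge_limit * ratio

mine = robotavoider_seconds["PIII 733 MHz"]
for name, seconds in robotavoider_seconds.items():
    ratio = speed_ratio(mine, seconds)
    limit = local_time_limit(60, ratio)
    print(f"{name}: {ratio:.2f}x, 60 s there is about {limit:.0f} s here")
```

So a robot that must finish within 60 seconds on a machine like the two Pentium 4s could be allowed roughly 150 to 175 seconds on the PIII/733.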

  #38  
Old June 4th, 2004, 03:35 PM
Cypher Cypher is offline

Tyler,

I think you are right that the change in my scores is attributable to the suicide time, but not because of the load on my machine. The first trial I did was with the older robots lib; the second trial was with the newer one (which was *way* faster). Also, it seems that if I run it with PHP 5 RC2 (that's what I have on my Linux box) it runs a lot slower than if I upload it to an actual server.

  #39  
Old June 4th, 2004, 07:03 PM
zackcoburn zackcoburn is offline

Quote:
Who has the official testing machine?


Matt is going to be doing the judging.

  #40  
Old June 5th, 2004, 03:09 PM
DDY DDY is offline

Here are the scores of a first, not fully optimized AI:

Sum: 594 apl, 23569 rob, 6719 mov = 281652 pts, 670 sec, 0 fail
Avg: 5.9 apl, 235.7 rob, 67.2 mov = 2816.5 pts, 6.7 sec, 0% fail
StD: 10.83 apl, 322.11 rob, 92.73 mov = 3528.89 pts, 9.73 sec

I'm still working on a completely new (more accurate) one.

  #41  
Old June 10th, 2004, 01:30 PM
Anonymous Anonymous is offline

The benchmarking thread seems to have been rather quiet in the last week... perhaps more people can start posting their scores? It would be nice to have something to compare against.

Anyway, here are my scores (on my 1.3 GHz machine):
Sum: 608 apl, 21914 rob, 6431 mov = 267078 pts, 1825.8 sec, 0 fail
Avg: 6.1 apl, 219.1 rob, 64.3 mov = 2670.8 pts, 18.3 sec, 0% fail
StD: 10.54 apl, 272.7 rob, 93.86 mov = 2996.06 pts, 13.83 sec

(not as impressive as DDY's =P)

By the way, I wanted to see if my program would perform as well using the official lib, so I did more benchmarking using the official lib and Tyler's benchmark script. I got the following [it's 10 runs of the same 100 games with different teleport locations]:

benchmark   points
tylerslib   267078 pts
official1   291828 pts
official2   296616 pts  [max]
official3   237362 pts
official4   252788 pts
official5   249416 pts
official6   225736 pts  [min]
official7   268008 pts
official8   261790 pts
official9   246340 pts

average     259696 pts


at a glance the max=296616 and min=225736 ... which is quite far apart... it makes me wonder if judging with 100 games is sufficient ...
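
To put a number on "quite far apart", the ten totals listed above can be summarized in a few lines of Python. This is only a sketch of one way to look at the spread; the choice of statistics is mine, not part of the benchmark script.

```python
from statistics import mean, stdev

# Total points for the same 100 games, as listed above.
totals = {
    "tylerslib": 267078,
    "official1": 291828,
    "official2": 296616,
    "official3": 237362,
    "official4": 252788,
    "official5": 249416,
    "official6": 225736,
    "official7": 268008,
    "official8": 261790,
    "official9": 246340,
}

values = list(totals.values())
avg = mean(values)
spread = max(values) - min(values)

print("mean:", round(avg))
print("min / max:", min(values), "/", max(values))
print("range as % of mean:", round(100 * spread / avg, 1))
print("sample std of the totals:", round(stdev(values)))
```

The gap between the best and worst run is more than a quarter of the average total, which supports the concern that a single 100-game run is heavily influenced by teleport luck.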

Perhaps one alternative way of judging is to run N games M times, with different teleport locations in each of the M runs, but with every script tested getting the same teleport locations within a given run.
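
A rough sketch of that idea in Python, assuming the teleports come from a seedable random generator. Everything here is hypothetical: `play_game` is a placeholder that returns a made-up score, and the real judging script may pick teleports in a completely different way.

```python
import random

def play_game(bot, game_id, rng):
    """Placeholder for one game; a real harness would run the bot here.
    Returns a made-up score that depends on the bot and the teleports."""
    teleport_luck = rng.randint(0, 1000)  # stands in for teleport locations
    return bot["skill"] * 10 + teleport_luck

def run_series(bots, n_games, m_runs, base_seed=2004):
    """Play n_games per bot, m_runs times. Within one run every bot gets
    the same teleport randomness; each run uses different seeds."""
    results = {bot["name"]: [] for bot in bots}
    for run in range(m_runs):
        for bot in bots:
            total = 0
            for game_id in range(n_games):
                # Same seed for every bot in this (run, game) pair.
                rng = random.Random(base_seed + run * n_games + game_id)
                total += play_game(bot, game_id, rng)
            results[bot["name"]].append(total)
    return results

bots = [{"name": "a", "skill": 200}, {"name": "b", "skill": 250}]
results = run_series(bots, n_games=100, m_runs=10)
for name, run_totals in results.items():
    print(name, sum(run_totals) / len(run_totals), min(run_totals), max(run_totals))
```

Because both bots see identical luck within a run, the difference between them in each run reflects only the bots themselves, while the variation across the M runs shows how much the teleports matter.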

Has anyone done a similar benchmark?

  #42  
Old June 10th, 2004, 08:14 PM
Cypher Cypher is offline

My AI progress seems to have hit somewhat of a wall. That, and I haven't had much time to work on it in the past week. If anyone is interested (I noticed a few people from another thread were wondering this), my AI is currently 880 lines (including comments and line breaks, but there aren't too many comments).

I have good algorithms for getting apples and killing robots, but the problem I am having now is determining *which* apple to get. I was thinking of implementing an antigravity movement algorithm, but haven't really gotten around to it.

Hopefully next week I will post some new benchmarks. Also, I agree with the new idea for benchmarking (posted by an Anonymous user?). The actual judging script (as far as I can tell, maybe they will change this) uses random teleports for every bot, so some people will get luckier than others. Fidian, how about you mod your PHP benchmark once more to run the same hundred maps, each a hundred times with a hundred different teleports? he he he he. That would give you the *ultimate* benchmark average (but would take really long to process). I think that the benchmarking we have going now should be good enough; don't bother worrying about a few points here or there. I personally just look at the first two significant digits to get a rough idea of how it will perform.

  #43  
Old June 11th, 2004, 12:13 AM
Anonymous Anonymous is offline

I'm getting:

59: 0 apples, 88 robots, 14 moves = 852 points in 42.2 seconds

For game #59.

  #44  
Old June 11th, 2004, 11:34 AM
jcaughel jcaughel is offline

I will post my latest benchmarks today... it is running right now...
My question is: it appears that the averages are suffering from sensitivity to outliers (16000+ point games and such). What about using a quartile mean or a median score? Either of these may provide a better statistical measure and remove some of the 'luck' factor.
-Jeff
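
To illustrate the outlier point, here is a small Python sketch comparing the three measures. It reads "quartile mean" as the interquartile mean, which is an assumption, and uses one simple variant of it; the scores are made up for the example and are not anyone's real results.

```python
from statistics import mean, median

def interquartile_mean(scores):
    """Mean of the middle half of the scores.
    Simple variant: drop len(scores) // 4 values from each end."""
    ordered = sorted(scores)
    k = len(ordered) // 4
    middle = ordered[k:len(ordered) - k]
    return mean(middle)

# Made-up per-game scores with one 16000-point outlier.
scores = [700, 850, 900, 2800, 3100, 3300, 3800, 16000]

print("mean:", mean(scores))
print("median:", median(scores))
print("interquartile mean:", interquartile_mean(scores))
```

With these numbers the single 16000-point game pulls the plain mean well above the median and the interquartile mean, which is the effect described above.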

  #45  
Old June 11th, 2004, 11:28 PM
fidian fidian is offline

Just for comparison with game 59:
59: 6 apples, 331 robots, 50 moves = 3810 points in 55.4 seconds
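
For anyone comparing the point totals in this thread: they are all consistent with 100 points per apple, 10 per robot, and minus 2 per move. That formula is inferred from the posted numbers, not taken from the contest rules, so treat this Python sketch as a sanity check only.

```python
def score(apples, robots, moves):
    """Inferred from figures posted in this thread, not from the rules:
    100 per apple, 10 per robot, minus 2 per move."""
    return 100 * apples + 10 * robots - 2 * moves

# Game 59 as posted by two different robots.
print(score(0, 88, 14))     # 852
print(score(6, 331, 50))    # 3810

# 100-game sums posted earlier in the thread.
print(score(1, 7495, 1418))     # 72214
print(score(594, 23569, 6719))  # 281652
```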

Quartile mean and median ... doesn't the average/standard deviation give you enough? You will get those wacky 16,000 point games in real testing also -- don't you want to see them? They would certainly affect the outcome. Especially if everyone else gets them and you don't get them because of how you tweaked your robot.

If you want them, I'm pretty sure I could figure them out for you easily enough.
