|
|
|||||||||
|
|||||||||
| |||||||||
|
|
|
| |||||||||
![]() |
|
|
«
Previous Thread
|
Next Thread
»
|
Thread Tools | Search this Thread | Rate Thread | Display Modes |
|
#1
|
|||
|
|||
|
python bites disk stacker !
Though I am late to the party, I still like to present a solution to the disk stacker problem that would have beaten the winning script#10 by 2989 vs. 3160 points, or 9 out of 10 times (see HBL in SUMMARY).
I don't code in PHP, but since the strategy is language independent, and the forum had many posts addressing strategy aspects, I will sketch out my approach in plain text and provide the code (python 2.x, console/file output only) at http://www.efn.org/~hbl/diskstacker/disks.py.zip The results (short list and entire flip/sort history) are at http://www.efn.org/~hbl/diskstacker/disks_results.zip Again, this is not about python versus PHP -- I had great fun working on the problem, and I am grateful to Matt for putting this together and moderating the discussion. And of course, congratulations to Joe Koszo, the winner! SUMMARY: ======== data disks Script10 HBL time/s(*) Input 1 106 128 126 1.986 Input 2 170 218 214 5.218 Input 3 191 250 234 6.311 Input 4 387 515 473 25.12 Input 5 244 295 302 11.09 Input 6 458 592 556 13.49(**) Input 7 260 336 317 12.86 Input 8 199 255 243 12.86 Input 9 124 163 152 4.307 Input 10 295 408 372 23.75 Total 2434 3160 2989 116.992 * - run time on AMD-K7 processor, 550 MHz ** - input6, stack with 458 disks: the 1st optimization exceeded an arbitrary threshold of 40% of 30 seconds; in that case my algorithm is set to terminate and wrap up the result. That's why the larger stack is reported with less run time. OBSERVATIONS, THOUGHTS and STRATEGY: ==================================== a) The entire stack may have unique disks, duplicate disks, and gaps in their numerical sequence. b) Gaps are removed by transforming the initial data into a 'reduced' data set (cuts down on code and increases performance, at least in the case of my algorithm); transformation back-and-forth is easily done by dictionaries (hash tables) c) Fragments of the stack that are in final order (or in reverse order) are 'grouped' together and will NOT be separated again. This is done by first inspecting the initial data, and then subsequently at each flip (flips that reduce the number of groups in a stack (i.e. the entropy) are referred to as 'joins'). This may sound both trivial and intuitive, but in reality it introduces some tough constrains which are responsible for 85% of the code :-( an ugly maze of conditionals). One advantage of that approach is that logical or coding errors inevitably show up as stacks that can never be completely sorted, i.e. you are forced to get it right, e.g. : 5,4,3,9,5,7,5 => grouping [5,4,3],9,[5,7],5 => 9,[3,4,5,5,7],5 (is a dead end) [5,4,3],9,[5,7],5 => [7,5],9,[3,4,5,5] => [5,7,9],[3,4,5,5]... OK d) Branches in the decision making process can only occur (at least from the point of my 'grouping' approach) 1) where no single flip leads to a successful join and you have to pick *one* extra flip that leads to the next join; 2) when duplicates of the same disk# are joined (fairly rare); 3) when a unique disk can join more than one duplicates of a different disk# (rare and tricky) e) A history of the sorting process is recorded by keeping track of the stack, the control data (i.e. un-joined duplicates), and branches that have been explored and found to be less efficient than the current optimum; i.e. we only keep 1.x stacks in memory (the best one and the one we are working on), and we are able to rollback history to any point and continue based on what we have already explored. f) We continue steps (c)-(e) until the stack is sorted. We 'look at the clock' (30 sec.) and, based on the first run, gestimate how many more branches we can explore ... SOUNDS GOOD ? -- well, at that point you may want to test a large number of data sets, iron out some flaws, and narrow in on a strategy about how to walk the tree of possibilities. However, the sobering discovery I made at that point was that a data set of about 500 disks takes 10 to 15 seconds to sort out (python 2.x, K7, 550 MHz). I am not sure how that compares to PHP on Matt's machine, but the bottom line is there won't be too much room to manuever. Search time scales about N^2 . In addition to a good algorithm you also need luck: some data sets are sorted more efficiantly than others. For instance, I flipped the entire input set at the beginning w/o having a need to do so; i.e. I added an additional, redundant step -- however, at half of those 'extra-flippos' the overall performance improved despite the additional step at the beginning. When I realized that (time limitation and luck factor) I lost interest and folded(***). Since I don't code PHP and produce lousy graphics I wouldn't have entered the contest anyhow - I did this all just for fun -- Then, after the weekend I saw results and input data finally posted, plugged those into my algorithm and... was quiet surprised how well this turned out :-) OK, why now doing this all, after the fact? --well, after spending a considerable amount of time in solitude on something that interesting I felt an urge to communicate the results somehow, and the codewalker forum looked like a nice crowd... so here you go. Horst Lueck, hbl-AT-efn-DOT-org (***) some features, like history rollback remained uncoded, though the hooks are largely in place. |
|
#2
|
|||
|
|||
|
RE: python bites disk stacker !
Don't you python people have your own contests??
http://uselesspython.com/programmingcontests.html Just kidding, excellent work. I look forward to you learning PHP and seeing your entries on future contests... |
|
#3
|
|||
|
|||
|
RE: python bites disk stacker !
Matt, thanks for the .py link and taking it with humor.
Actually it was a friend who suggested looking at this contest. He is coding in PHP and wants to learn python -- so he asked 'how would you do it in python' ...and that's how I got hooked (-: - Horst (I am actually not that long coding in python either, and have never participated in a contest -- but I'll check out their link) |
|
#4
|
|||
|
|||
|
RE: python bites disk stacker !
Ahaha, what a funny name.. useless python.. I didn't really get the point of the site, though :O
Thanks for posting, BTW. I hadn't thought about solving gaps like you suggested in part B... The only other thing that I'm still not sure about was how you would deal with duplicates; if there were no duplicates and gaps, it would've been much easier, IMO Also, I don't get how the winner dealt with dups; according to vertigo, it seems the winner somehow weighted the dups, so that the more duplicated numbers would be joined more early...? The joining part is pretty much what I psuedocoded... I do like your vocab though (the entropy part makes sense in explaining) |
|
#5
|
|||
|
|||
|
RE: python bites disk stacker !
Well, gaps aren't really a problem - just go through the numbers you get and numerate them sequentially.. For example:
3 6 1 7 --sort--> 1 3 6 7 --seq.--> 1 2 3 4 --unsort--> 2 3 1 4 The nice thing is that moves are exactly the same, so the solution doesn't need to be translated back in any way... Duplicates are a different thing, of course |
|
#6
|
|||
|
|||
|
RE: python bites disk stacker !
webhappy wrote:
""" The only other thing that I'm still not sure about was how you would deal with duplicates; ... it seems the winner somehow weighted the dups, so that the more duplicated numbers would be joined more early...? """ That's interesting! - at some early stage I considered 'vacuuming up' all duplicates first ('cuz they are trouble), and then flipping the rest. However, that is wasteful since you can handle duplicates(dups) properly at any stage if you consider *all* possibilities that require a distinct approach: - a dup can be by itself (or in a group of its same) - a dup can be part of an ascending/descending group - other members of that ascending/descending group can be dups too, but of a different disk# - ascending/descending is relative since any flip reverts it - all of this can get complicated by a factor of 4 since either, the top of the stack, and/or the target of the join can be any of the above. Having expanded on the complexity of the problem I assume there is no simple formula for handling dups (though I am open to be proven wrong). I think I handled all of those possibilities through a grid of conditionals which, BTW, are responsible for about 80% of my code. If you look at my full output data set you'll find a record of remaining(!) dups(=remDups, not just any dups) at the end(-1) of each stage of the flipping process; a plain number in that record is a dup, a 'number+' indicates that 'dup' is the max of that group, i.e. I eliminated the asc/descending issue since flipping doesn't change the max of a group; same for min(group) / 'number-' If you are curious about dups you can test my data set from the into above with python disks.py "5,4,3,9,5,7,5" or for random data you can look at lots of dups with: python disks.py --generate 10 20 1 15 - but remember, my intermediate output is in 'reduced' space OK, so much about dups .................... Horst |
|
#7
|
|||
|
|||
|
RE: python bites disk stacker !
Horst:
You say duplicates are easy to handle. You just try multiple paths and see which one is the best. I imagine you see PHP and think it is just like C++, but it is not. C and C++ are very quick at doing most simple things, for instance, the subscript operator [] for arrays. In PHP, most of these simple operations are alot slower. Sometimes if you have a few lines such as $this->data[2]->prev->is_set(), it can get quite slow if you do this a few thousand times, which is likely. The main problem with the disc stacking problem was the limited processing power. JoE won because he realised that trying multiple combinations of duplicate weights would be too costly (for PHP). I imagine Python runs a lot faster than PHP, as does C. We must remember that PHP is a CGI type language, and has no strict variable types, memory allocation, etc. These luxuries come at a price. It doesn't perform quite as quick. Nor should we expect it to. I think if you had tried to code it in PHP, you would have found it more difficult. |
|
#8
|
|||
|
|||
|
RE: python bites disk stacker !
how come all/most of you see dubs or gaps as problems?
I didn't have any problems what so ever with dubs and gaps. I guess my solution was just simple enough not to be bothered. But hey, I didn't win ... so maybe there is something about dubs and gaps... ;o) hendrik |
|
#9
|
|||
|
|||
|
RE: python bites disk stacker !
Hi Vertigo,
Could you explain what you mean by weighting duplicates? How is it beneficial? And how does he utilize his weighting? Am I correct when I think that he just flips a duplicated number before unduplicated ones, assuming they have the same priority; eg. they would fix the same number of joins, so we choose to join a duplicated one first |
|
#10
|
|||
|
|||
|
RE: python bites disk stacker !
Vertigo, you raised two interesting points.
a) what a computer language is intended for: yes, maybe we are pushing the limits a bit... b) execution speed: python is, for example like perl, an interpreted language, and ranks in speed somewhat comparable to perl, maybe a bit slower, but no comparison to C or C++. PHP is slower than python in most benchmarks, but not order(s) of magnitude like you find it with C. If you look at http://www.bagley.org/~doug/shootout/bench/hash/ you will find that hash tables can be even faster in PHP. (the link above is actually a very informative site with many short code examples :-) Now the good news is that I got much more exposure to PHP than I originally intended, the result being that I will start looking into PHP, and use it whenever I can avoid doing something in perl - the syntax is certainly much more appealing to me. - Horst |
|
#11
|
|||
|
|||
|
RE: python bites disk stacker !
webhappy:
When I say weight the dups, I mean deciding which one has more weight. Look at this stack: 1-2-3-4-2-3 the correct order would be 1-2-2-3-3-4. Look at the three's. Which three has more weight? In other words, which is the first three in the order, and which is the second? If you look at the 4, it has a 3 next to it. So if we make that 3 the second three in the order, it forms a sorted range. If you look at the other three now, you should know that it matches the 2, so that 2 has more weight than the other 2. Basicly, 1-2-3-4-2-3 becomes 1-2-5-6-3-4 to your algorithm. Now it is easy to sort, because the duplicates have been weighted optimally. JoE's algorithm doesn't do this. He just weights them in order. So 1-2-3-4-2-3 becomes 1-2-4-6-3-5. You see, the first 2 becomes the first 2 in the order, and the first 3 becomes the first 3 in the order. The weights now are not optimal. The algorithm will take longer to sort. The process of deciding how to weight the dups becomes very difficult if, for example, you have 500 discs with 50 or more duplicates. JoE didn't worry about how to assign the weights. You can see this in the first page of his source code. You can try all possible combinations of weights, simply by using nested for loops, but for PHP this is WAY to slow. In some cases, the weights change during the sort. In other words, during the sort, if you swap some of the weights, you can get a faster result. for example, 6-8-4-2-7-5-8-6. I failed to find a good way to handle duplicates. Do you understand better now? |
|
#12
|
|||
|
|||
|
RE: python bites disk stacker !
Oh, I understand now.
So you basically hash in a similar manner that we deal with gaps. In a more clever weighting scheme, it orders the duplicates according to other factors besides its current order. Thanks. |
|
#13
|
|||
|
|||
|
RE: python bites disk stacker !
Vertigo wrote:
""" When I say weight the dups, I mean deciding which one has more weight. Look at this stack: 1-2-3-4-2-3 """ In a 'grouping' approach this becomes at first inspection (w/o any extra flip) [1,2],[3,4],[2,3] // 2 and 3 are 'remaining' dups and can only be joined once, ascending or descending. 'groups' are data structures (some sort of container, array,...) Now I have to look at (loop through) only 3 rather than 6 objects, and that number is shrinking as I proceed. Also, fully joined dups are no 'remaining' problem any more and can be taken out of my list of things to check for (-: I'd call 'grouping' a design pattern which removes irrelevant data from your radar screen; there is more to it but we're just talking about the troubling dups. I think this can be done in PHP. 1,2,3,4,2,3 [1,2],[3,4],[2,3] [2,1],[3,4],[2,3] [4,3],[1,2,2,3] [3,2,2,1],[3,4] [1,2,2,3,3,4] or in 'reduced' space, [groups] :: [remaining dups] [[0, 1], [2, 3], [1, 2]] :: [1, 2, '1+', '2-', '1-', '2+'] [[1, 0], [2, 3], [1, 2]] :: [1, 2, '1+', '2-', '1-', '2+'] [[3, 2], [0, 1, 1, 2]] :: [2, '2-', '2+'] [[2, 1, 1, 0], [2, 3]] :: [2, '2-', '2+'] [[0, 1, 1, 2, 2, 3]] :: [] One last word about removing gaps, i.e. transforming the problem into 'reduced' space: by doing so you reduce the number of candidates for a potential join with 'topDisk' from several hundred to just three: topDisk -1, topDisk, topDisk +1 (with gaps, top+/-1 has no meaning). Depending on the syntax of your language you may be able to find those candidates w/o any loop at all. OK, I'm taking off and moving on --had far too much fun with this... - Horst |
|
#14
|
|||
|
|||
|
RE: python bites disk stacker !
Horst: Have a look at this stack:
2-1-4-3-6-5 There are basicly three methods to sort this stack, one taking 7 flips, one taking 6 flips, and one taking 5 flips. Do you have any ideas about how to make an algorithm smart enough to find the 5 flips result? |
|
#15
|
|||
|
|||
|
RE: python bites disk stacker !
ok I've decided I need to post here.
first of all welcome Horst second my entry did what you are saying (only you described it much better than me Have a look at http://www.jrasplin.ukshells.co.uk/discsphp.txt The way it is designed to deal with 2,1,4,3,6,5 is like this (regions (data structures as Horst described) are shown with square brackets): 1:[1,2][4,3][6,5] 2:[3,4][2,1][6,5] 3:[4,1][6,5] -- 3 and 2 have been removed from the problem 4:[1,4][6,5] 5:[5,6][4,1] 6:[6,1] --5,4 removed 7:[1,6] interestingly, for reasons that now escape me, my code finds the 6 move solution by breaking up a region That and the failure to find a solution for two of the contests input files suggests there is a problem in deciding if regions should be joined. Anyway thats me done on this subject I'll definately shut up now ;) Good luck to everyone entering the current contest - I'm probably not entering, prolog assignment due Thursday John |