[comp.lang.forth] All Sorts of Sorts

ForthNet@willett.UUCP (ForthNet articles from GEnie) (02/09/90)

 Date: 02-07-90 (17:50)              Number: 2869 (Echo)
   To: ALL                           Refer#: NONE
 From: ZAFAR ESSAK                     Read: (N/A)
 Subj: LEARN TO SORT                 Status: PUBLIC MESSAGE

 This is not meant to detract from Ian Green's Modula-2 discussions 
 where he has been discussing SORTS, but rather may shed more light on 
 the topic; at least from a novice sorter's viewpoint. 

 I would also like to apologize at the outset for the lack of references 
 to the results of the Forth Sort contest that was held on GEnie.  I 
 would appreciate a note from anyone who recalls the name(s) of the 
 files that pertain to that contest.  I have looked over the various 
 files related to sorts on the BCFB and found the descriptions of sorts 
 by Norm Arnold in SORT_WIN.ZIP most helpful. 

 To begin using and understanding SORTS I needed a generalized approach 
 that would allow applying the SORT methods to a variety of data 
 elements.  This requires the use of a few variables and deferred 
 execution words.  Vector variables were chosen for the deferred words. 
 There is an obvious performance penalty in using a generalized approach 
 such as this but I felt it would give me an opportunity to understand 
 the various SORT methods as well as allow experimentation on different 
 data elements. 

 The most striking, though simple, find was discovered through 
 experimenting with consecutive SORTS.  Data elements containing several 
 fields were first sorted on one field and then sorted on a second 
 field.  Of the three sorts tried (BUBBLE, SHELL, and QWIK) only the 
 BUBBLE Sort preserved the ordering of the first sort within the second 
 sort.  Unfortunately, the BUBBLE sort is by far the slowest of the 
 three, taking five times longer than the QWIK sort and twice as long as 
 the SHELL sort. 

 The next few messages contain the Forth code used.  This is essentially 
 Forth-83 Standard which has implications in the DO...+LOOP where a 
 negative integer is used.  A simple example of an array sort was used 
 to initially test the functioning of the sorts.  This is illustrated 
 with the code but requires a RANDOM number generator, several of which 
 are available on the board. 
 ---
  * Via Qwikmail 2.01

 NET/Mail : British Columbia Forth Board - Burnaby BC - (604)434-5886   
-----
This message came from GEnie via willett through a semi-automated process.
Report problems to: 'uunet!willett!dwp' or 'willett!dwp@gateway.sei.cmu.edu'

ForthNet@willett.UUCP (ForthNet articles from GEnie) (02/09/90)

 Date: 02-07-90 (17:50)              Number: 2870 (Echo)
   To: ALL                           Refer#: NONE
 From: ZAFAR ESSAK                     Read: (N/A)
 Subj: LEARN TO SORT                 Status: PUBLIC MESSAGE

 \ SORTS.FLY     Part 1 - Generalizing the Sort Function 

 (( 
 For a sort utility to have flexibility some words have deferred 
 actions.  Essentially, when sorting a collection of data elements, 
 two elements are compared at a time.  Depending on the results of the 
 comparison the placement of two elements in the data collection may 
 require that they be exchanged. 

 The comparison test itself may need to be modified to suit the data 
 elements.  Also the exchange routine will need to be modified for the 
 particular data collection. 

 The following assumes the data elements are fixed length and uses the 
 space at PAD when comparing and exchanging elements. 

 )) 

 ONLY FORTH ALSO DEFINITIONS   DECIMAL 

 VARIABLE DATA.MAX# ( --adr)     \ maximum number of data elements 

 VARIABLE DATA.WIDTH ( --adr)    \ width of each data element 

 VARIABLE ~DATA@ ( --adr)        \ Vector for method to fetch data 

 : DATA@ ( adr,i--) ~DATA@ @EXECUTE ; 

 VARIABLE ~DATA! ( --adr)        \ Vector for method to store data 

 : DATA! ( adr,i--) ~DATA! @EXECUTE ; 

 (( 
 The following word uses PAD as the buffer and accesses different 
 parts of the buffer for temporary storage of different data elements. 

 )) 

 : data.buf ( n--adr) DATA.WIDTH @ * PAD + ; 

 : EXCHANGE ( n1,n2--)   \ n1,n2 are indexes to the data elements 
     2DUP >R >R >R >R 
     1 data.buf  R> DATA@    2 data.buf  R> DATA@ 
     2 data.buf  R> DATA!    1 data.buf  R> DATA! ; 

 (( 
 In the following comparison of data elements the flag returned shall 
 be -1, 0, or 1 to indicate if the data at adr1 is less than, equal 
 to, or greater than the data at adr2, respectively. 

 )) 

 VARIABLE ~SORT? ( --adr)        \ Vector action for comparison test 

 : SORT? ( adr1,adr2--?) ~SORT? @EXECUTE ; 

 (( 
 Various SORT algorhythms require the following variables be set by 
 any applications that call the sort. 

             DATA.MAX# 
             DATA.WIDTH 
             ~DATA@ 
             ~DATA! 
             ~SORT? 

 )) 

 \ The following is a sort of an array of 2 byte data elements 

 VARIABLE startdata ( --adr) 

 : CELL@ ( adr,i--) 2* startdata @ + @ SWAP ! ; 

 : CELL! ( adr,i--) >R @ R> 2* startdata @ + ! ; 

 : CELL.COMPARE ( adr1,adr2--?) SWAP @ SWAP @ - ; 

 : CELL.SORT ( --) 
     2 DATA.WIDTH ! 
     ['] CELL@        ~DATA@ ! 
     ['] CELL!        ~DATA! ! 
     ['] CELL.COMPARE ~SORT? ! ; 

                                                       Part 2... 
 ---
  * Via Qwikmail 2.01

 NET/Mail : British Columbia Forth Board - Burnaby BC - (604)434-5886   
-----
This message came from GEnie via willett through a semi-automated process.
Report problems to: 'uunet!willett!dwp' or 'willett!dwp@gateway.sei.cmu.edu'

ForthNet@willett.UUCP (ForthNet articles from GEnie) (02/09/90)

 Date: 02-07-90 (17:50)              Number: 2871 (Echo)
   To: ALL                           Refer#: NONE
 From: ZAFAR ESSAK                     Read: (N/A)
 Subj: LEARN TO SORT                 Status: PUBLIC MESSAGE

 \ SORTS.FLY     Part 2 - Different Sort Methods 

 (( 
 The BUBBLE SORT compares the first element with the second element. 
 If necessary they are exchanged.  Then the second element is compared 
 with the third, etc.  Thus, the largest element will be moved to the 
 end of the list while most others only move down one position.  The 
 result is an unsorted list with a length one less than the previous 
 list.  The list size is decreased and the sort repeated until the 
 list size is zero. 

 Consecutive sorts will preserve the effects of prior sorts. 

 )) 

 : BUBBLE ( --) 
     1 DATA.MAX# @ 1- 
     ?DO I 0 
         ?DO I DUP 1+ 
             0 data.buf 1 data.buf 
             2OVER 2OVER 
             ROT DATA@ SWAP DATA@ SORT? 0> 
                 IF EXCHANGE ELSE 2DROP THEN 
             LOOP 
     -1 +LOOP ; 

 (( 
 The SHELL SORT is similar to the bubble sort except that instead of 
 comparing each element with the next element it is compared with an 
 element further down the list.  However, it does not preserve the 
 effects of prior sorts when consecutive sorts are performed. 
 When the sort is started the gap between the numbers being compared 
 is set equal to half the length of the list.  When the routine can 
 go through the list without trading the gap is halved and the 
 routine repeated.  When the gap equals zero the sort is finished. 

 )) 

 VARIABLE gap ( --adr) 
 VARIABLE traded ( --adr)    \ flag to indicate if elements exchanged 

 : SHL ( --) 
     gap @ 0= IF EXIT THEN 
     gap @ 2/ gap ! 
     1 DATA.MAX# @ 1- gap @ - 
         ?DO traded OFF I 0 
             ?DO I DUP gap @ + 
                 0 data.buf 1 data.buf 
                 2OVER 2OVER 
                 ROT DATA@ SWAP DATA@ SORT? 0> 
                     IF EXCHANGE traded ON   ELSE 2DROP  THEN 
                 LOOP 
             traded @ 0= ?LEAVE 
         -1 +LOOP 
     RECURSE ; 

 : SHELL.SORT ( --) DATA.MAX# @ gap ! SHL ; 

 (( 
 The QWIK SORT is much faster than either the Bubble or Shell sorts. 
 Furthermore, it does not preserve the effects of prior sorts when 
 consecutive sorts are performed. 

 The middle element of the unsorted list is used to divide the list 
 into two piles.  The highest pile is in turn divided into two and 
 this process repeated until the top pile is a collection of sorted 
 elements.  Then attention is focused on the preceeding pile to sort 
 it.  This is repeated until all the "piles" have been sorted. 

 )) 

 : QSORT| ( l,r--)                   \ QWIK SORT by Dan Phillips 
     2DUP < NOT IF 2DROP EXIT THEN 
     2DUP + 2/ 0 data.buf SWAP DATA@ 
     2DUP 
         BEGIN SWAP 
             BEGIN 1 data.buf OVER DATA@ 
                   1 data.buf 0 data.buf SORT? 0< 
             WHILE 1+ 
             REPEAT SWAP 
                 BEGIN 1 data.buf OVER DATA@ 
                       1 data.buf 0 data.buf SORT? 0> 
                 WHILE 1- 
                 REPEAT 2DUP > NOT 
                     IF 2DUP EXCHANGE SWAP 1+ SWAP 1- 
                     THEN 2DUP > 
         UNTIL -ROT SWAP 
     RECURSE RECURSE ; 

 : QSORT ( --) 0 DATA.MAX# @ 1-  QSORT| ;           \ Part 3... 
 ---
  * Via Qwikmail 2.01

 NET/Mail : British Columbia Forth Board - Burnaby BC - (604)434-5886   
-----
This message came from GEnie via willett through a semi-automated process.
Report problems to: 'uunet!willett!dwp' or 'willett!dwp@gateway.sei.cmu.edu'

ForthNet@willett.UUCP (ForthNet articles from GEnie) (02/09/90)

 Date: 02-07-90 (17:50)              Number: 2872 (Echo)
   To: ALL                           Refer#: NONE
 From: ZAFAR ESSAK                     Read: (N/A)
 Subj: LEARN TO SORT                 Status: PUBLIC MESSAGE

 \ SORTS.FLY     Part 3 - Testing the Sort Methods 

 \ Test the cell sort 

 INCLUDE \FORTH\RANDOM.TXT   \ Random number generator 

 100 DATA.MAX# ! 

 CELL.SORT               \ Initialize Sort Function Variables 

 CREATE testdata ( --adr) DATA.MAX# @ DATA.WIDTH @ * ALLOT 

 testdata startdata ! 

 : printdata ( --) 
     CR DATA.MAX# @ 0 
         ?DO ENOUGH? IF LEAVE THEN I 2* testdata + @ . LOOP ; 

 : BTEST ( --) 
     DATA.MAX# @ 0 ?DO random I 2* testdata + ! LOOP 
     printdata CR 
     ." Go " CR 
     BUBBLE 
     printdata CR ; 

 : STEST ( --) 
     DATA.MAX# @ 0 ?DO random I 2* testdata + ! LOOP 
     printdata CR 
     ." Go " CR 
     SHELL.SORT 
     printdata CR ; 

 : QTEST ( --) 
     DATA.MAX# @ 0 ?DO random I 2* testdata + ! LOOP 
     printdata CR 
     ." Go " CR 
     QSORT 
     printdata CR ; 

 (( 
 Test times: 
             BTEST = 9.76 secs on Turbo XT 
             STEST = 5.16 secs on Turbo XT 
             QTEST = 2.68 secs on Turbo XT 

 )) 

 \ SORTS.FLY   The End. 
 ---
  * Via Qwikmail 2.01

 NET/Mail : British Columbia Forth Board - Burnaby BC - (604)434-5886   
-----
This message came from GEnie via willett through a semi-automated process.
Report problems to: 'uunet!willett!dwp' or 'willett!dwp@gateway.sei.cmu.edu'

ForthNet@willett.UUCP (ForthNet articles from GEnie) (02/10/90)

 Date: 02-08-90 (09:25)              Number: 2874 (Echo)
   To: ZAFAR ESSAK                   Refer#: 2869
 From: IAN GREEN                       Read: 02-08-90 (09:25)
 Subj: LEARN TO SORT                 Status: PUBLIC MESSAGE

    Sorting is a bit more subtle than you seem to realize. In my comm 
 program, I use two sorts, one is the bubble sort where I sort the phone 
 book, and the other is a stack oriented partition sort.
    Granted the bubble sort is slow on unordered random data, data that 
 is already partially sorted can be sorted much faster with the bubble 
 sort because it is able to exploit the partial ordering. Stack oriented 
 sorts like the Quick Sort, cannot exploit this and thus are not all that
 suitable to partially ordered data like real data-bases.
    My phone book (in my comm program) is 500 entries, rather small 
 compared to major data-bases, but still representative. Since I modify 
 at most 2 or 3 entries, it makes sense to use the bubble sort since the 
 data is already sorted from the last modification. Because it is always 
 partially ordered at each modification, the logic of algorithm choice 
 becomes obvious.
    My apologies for not having a Forth example for you.

 Ian Green

 NET/Mail : British Columbia Forth Board - Burnaby BC - (604)434-5886   
-----
This message came from GEnie via willett through a semi-automated process.
Report problems to: 'uunet!willett!dwp' or 'willett!dwp@gateway.sei.cmu.edu'

ForthNet@willett.UUCP (ForthNet articles from GEnie) (02/10/90)

[I'm porting this message in case anyone on the c.l.f side of the world
 would like to have access to any of the files mentioned in this post.
 I can arrange to have them uploaded to simtel20 (takes approx. 1 week)
 if you drop me a note at one of the addresses at the end of this message.
 -Doug]

 Date: 02-08-90 (13:44)              Number: 2875 (Echo)
 From: JACK BROWN                      Read: 02-08-90 (09:25)
 Subj: LEARN TO SORT                 Status: PUBLIC MESSAGE

 >I would also like to apologize at the outset for the lack of references
 >to the results of the Forth Sort contest that was held on GEnie.  I 
 >would appreciate a note from anyone who recalls the name(s) of the 
 >files that pertain to that contest.  I have looked over the various 
 >files related to sorts on the BCFB and found the descriptions of sorts 
 >by Norm Arnold in SORT_WIN.ZIP most helpful. 
 > 
 Zafar...  I did a  ZIPpy dir scan on BCFB  "  ZIP  SORT  "  and came up
 with the following references to sorting Forth.

 DVD&KNKR.ZIP     5527  10-17-89  Entry in the Forth Dimensions sort 
 contest
 FIGSORT.ZIP      4470  11-09-89  Sort contest entry from H. Hansen
 RADXSORT.ZIP     2853  10-22-89  DK Elvey's Forth Radix sort. Contest 
 entry.
 SORTCONT.ZIP    10814  07-24-89  Instructions for FIG's  SORT contest by
                                  Dennis Ruffer.

 SORTDEMO.ZIP    12928  01-04-88  Multi-Forth SortDemo&Gloss--Five Diff 
 Sorts
 4THSORT.ZIP      2361  12-14-87  Some FORTH sorts (fig-FORTH)
 BATCHER.ZIP      1961  12-13-87  Batcher's Sort from Forth Dim (F83)
 SORT_WIN.ZIP     9318  09-27-88  Norm Arnold's Contest Winning Sorts.
 VISISORT.ZIP     9318  10-04-88  Visible sort routines - bubble, shell, 
 kwik
 QSORT312.ZIP    40960  02-14-88  Good Sort Package for DOS (used by 
 WORDNDX)
 SORT-OUT.ZIP   156155  08-02-89  Henning Hansen's wonderful sorting 
                    demo.  Written in LMI Forth.  Includes source for
                    for sorts and windowing package.

 SORTLM.ZIP       2255  12-15-87  Adaption of Will Baden's Quicksort & 
 SWORDS
 SORTZ80.ZIP      3211  12-15-87  Compact/fast sort for Z80 by S. Vance
 SORTURF.ZIP      1844  03-15-88  Sort and execution arrays for UR/Forth
 QSORT322.ZIP    41522  10-01-89  Great quick sort utility

 NET/Mail : British Columbia Forth Board - Burnaby BC - (604)434-5886   
-----
This message came from GEnie via willett through a semi-automated process.
Report problems to: 'uunet!willett!dwp' or 'willett!dwp@gateway.sei.cmu.edu'

ForthNet@willett.UUCP (ForthNet articles from GEnie) (02/11/90)

 Date: 02-09-90 (07:29)              Number: 2876 (Echo)
   To: IAN GREEN                     Refer#: 2874
 From: STEVE PALINCSAR                 Read: NO
 Subj: LEARN TO SORT                 Status: PUBLIC MESSAGE

 How does the selection sort compare to the bubble sort in your 
 application?  In my (*very*) limited experience I found the selection 
 sort to be considerably faster (between 2 & 4 times faster) in sorting 
 80 character long strings quantity 100-250.

 Selection sort is also, according to Knuth, a stable method, preserving 
 the original order for identical keys.  It's also extremely easy to code
 and understand -- far more so than the bubble sort.
-----
This message came from GEnie via willett through a semi-automated process.
Report problems to: 'uunet!willett!dwp' or 'willett!dwp@gateway.sei.cmu.edu'

ForthNet@willett.UUCP (ForthNet articles from GEnie) (02/11/90)

 Date: 02-09-90 (16:04)              Number: 2877 (Echo)
   To: IAN GREEN                     Refer#: 2874
 From: ZAFAR ESSAK                     Read: NO
 Subj: LEARN TO SORT                 Status: PUBLIC MESSAGE

 Ian, 

     Your assumption that the tests I did were on random lists is 
 absolutely correct.  Your comments on partly ordered lists resulting in 
 admirable performances for the BUBBLE Sort got me to review the code. 
 The following contains a minor revision to exit as soon as no 
 exchanging of items is required. 

 \ The BUBBLE Sort 

 VARIABLE gap ( --adr) 
 VARIABLE traded ( --adr)    \ flag to indicate if elements exchanged 

 : pair.sort ( n1,n2--)            \ : pair.sort ( n1,n2--) 
     0 data.buf 1 data.buf         \   2DUP  1 data.buf SWAP DATA@ 
     2OVER 2OVER                   \         0 data.buf SWAP DATA@ 
     ROT DATA@ SWAP DATA@ SORT? 0> \   0 data.buf 1 data.buf SORT? 0> 
         IF EXCHANGE traded ON     \         IF EXCHANGE traded ON 
         ELSE 2DROP THEN ;         \         ELSE 2DROP THEN ; 

 : BUB ( --) 
     1 DATA.MAX# @ gap @ - 
         ?DO traded OFF I 0 
             ?DO I DUP gap @ + pair.sort LOOP 
             traded @ 0= ?LEAVE 
         -1 +LOOP ; 

 : BUBBLE ( --) 1 gap ! BUB ; 

 \ The SHELL Sort 

 : SHL ( --) 
     gap @ DUP 0= IF DROP EXIT THEN 
         2/ gap ! BUB RECURSE ; 

 : SHELL.SORT ( --) DATA.MAX# @ gap ! SHL ; 

 This provides for some better factoring as well as performance. 

 I have also gone through an early (1986) issue of Forth Dimensions and 
 have added the BATCHER'S Sort to the set.  Here it is. 

 (( 
 BATCHER'S SORT is slightly slower, but more robust than Qwik sort 
 and has consistent sorting times.  Depending on the input data, in 
 some cases Batcher's sort may be faster than Qwik sort. 

 Unfortunately, it too does not preserve the effects of prior sorts 
 when consecutive sorts are performed. 

 Batcher's sort consists of three nested loops.  The innermost loop 
 compares pairs of keys that are completely independent.  Thus 
 parallel computing could be applied for very fast sorting. 

 From Forth Dimensions Nov/Dec 86 p 39 by John Konopka 

 The names of the variables were chosen to be consistent with the 
 description of the algorithm given by Knuth. 

 )) 

 CREATE 2** ( --adr) 
        1 ,    2 ,    4 ,     8 ,    16 ,    32 ,    64 ,  128 , 
      256 ,  512 , 1024 ,  2048 ,  4096 ,  8192 , 16384 , 

 : 2**N ( n -- 2**n ) 2* 2** + @ ; 

 VARIABLE DD ( --adr)    VARIABLE NN ( --adr)    VARIABLE PP ( --adr) 
 VARIABLE QQ ( --adr)    VARIABLE RR ( --adr)    VARIABLE TT ( --adr) 

 : SELECT-T ( --) 
     NN @ 15 0 DO DUP I 2**N <= IF DROP I LEAVE THEN LOOP 
     1- 14 MIN TT ! ; 

 : INNER-LOOP ( --) 
     NN @ DD @ - 0 
         ?DO I PP @ AND RR @ = 
             IF I DUP DD @ + pair.sort 
             THEN LOOP ; 

 : Q-TEST ( --) 
     QQ @ PP @ <> 
         IF  QQ @ PP @ - DD ! 
             QQ @ 2/ QQ ! 
             PP @ RR ! 0 
         THEN ; 

 : QRD-SET ( --) TT @ 2**N QQ !  RR OFF  PP @ DD ! ; 

 : BATCHER ( --) 
     DATA.MAX# @ NN ! 
     SELECT-T TT @ 2**N PP ! 
     BEGIN QRD-SET QQ @ 
         BEGIN INNER-LOOP Q-TEST 
         UNTIL PP @ 2/ DUP PP ! 0= 
     UNTIL ; 

 ---
  * Via Qwikmail 2.01

 NET/Mail : British Columbia Forth Board - Burnaby BC - (604)434-5886   
-----
This message came from GEnie via willett through a semi-automated process.
Report problems to: 'uunet!willett!dwp' or 'willett!dwp@gateway.sei.cmu.edu'

ForthNet@willett.UUCP (ForthNet articles from GEnie) (02/12/90)

 Date: 02-10-90 (08:33)              Number: 2880 (Echo)
   To: ZAFAR ESSAK                   Refer#: 2877
 From: IAN GREEN                       Read: NO
 Subj: LEARN TO SORT                 Status: PUBLIC MESSAGE

    Even though I am not much of a Forther, I am still willing to help 
 Forth programmer's make the best of what they are doing.
    The added exit in the program will indeed improve the performance 
 dramatically, especially in partially ordered data. If you saw my 
 earlier discourse on sorts in the Modula-2 conference, you may have 
 noticed how the use of a guard could improve the bubble sort even more. 
 The reason is simple, if the zero-eth element of the array is set to the
 minimum value of the data type in question, then one of the two compares
 can be effectively eliminated by virtue of the fact that the bubbling 
 process is now guaranteed to terminate at the top of the array.
    Later I plan to post so additional sorts in the Modula-2 conference 
 so keep your eyes open.

 Ian Green

 NET/Mail : British Columbia Forth Board - Burnaby BC - (604)434-5886   
-----
This message came from GEnie via willett through a semi-automated process.
Report problems to: 'uunet!willett!dwp' or 'willett!dwp@gateway.sei.cmu.edu'

ForthNet@willett.UUCP (ForthNet articles from GEnie) (02/12/90)

 Date: 02-10-90 (08:39)              Number: 2881 (Echo)
   To: STEVE PALINCSAR               Refer#: 2876
 From: IAN GREEN                       Read: NO
 Subj: LEARN TO SORT                 Status: PUBLIC MESSAGE

    In the real world, the selection sort is so inefficient it isn't 
 funny. Real databases are massive. Consider Brokerage XYZ whose has an
 army of analysists working on finding the next LBO candidate.
    If we needed to extract from the database some information and sort 
 it so that it could be readily browsed and furthur parsed, huge amounts 
 of sorting and selecting is required. It therefor behooves you to use 
 the fastest algorithms available in order to get so 'performance' out of
 the query lest the client get impatient. Combined with insider 
 information it is then practicle to make hundreds of millions of 
 dollars, all tax free (at least under present rules). You don't need 
 lots of money, all you need is smart software and a few well placed 
 friends.

 Ian Green

 NET/Mail : British Columbia Forth Board - Burnaby BC - (604)434-5886   
-----
This message came from GEnie via willett through a semi-automated process.
Report problems to: 'uunet!willett!dwp' or 'willett!dwp@gateway.sei.cmu.edu'

ForthNet@willett.UUCP (ForthNet articles from GEnie) (02/13/90)

 Date: 02-11-90 (08:41)              Number: 2884 (Echo)
   To: IAN GREEN                     Refer#: NONE
 From: STEVE PALINCSAR                 Read: NO
 Subj: SORTING ALGORITHMS]           Status: PUBLIC MESSAGE

 As a matter of fact, even as august an authority as Knuth says that for 
 some sorts of data, the selection sort happens to be the best one 
 available.  But besides that, your assumption that "the real world" 
 consists only of massive databases simply takes my breath away. 

 As it happens, the application that I translated KNuth into Forth for is
 a real-world application for a task I do at work.  Every month I do a 
 search on a Library of Congress bibliographic database on their inhouse 
 SCORPIO system and I download a list of their reports added since the 
 previous issue.  (That's usually between 60 and 100 or so items.)  The 
 entries are downloaded in report number order.  My program strips off 
 the unnecessary material from each citation, and puts the cites in 
 author title order.

 In fact, I think the requirement to order the data came about because 
 one of the editors of the publication secretly wished to eliminate the 
 column, and thought that the amount of manual effort required to do so 
 would cause me to simply abandon the effort.  Ha!  Using this program it
 takes me about 30 seconds to process the data.  As it so happens, she's 
 taking over production of the column, and was astounded at how easy it 
 was...

 Incidentally, if you'll stop to consider the situation, you'll see that 
 the bubble sort is less efficient than the selection sort - so by your 
 own criteria you oughtn't be using it!  But as we both know, there are 
 times in the real world that you need to sort relatively small numbers 
 of items, and in such cases more complex algorithms may be even slower
 than the simple ones.  Even Knuth says so...  ;-)
-----
This message came from GEnie via willett through a semi-automated process.
Report problems to: 'uunet!willett!dwp' or 'willett!dwp@gateway.sei.cmu.edu'

ForthNet@willett.UUCP (ForthNet articles from GEnie) (02/16/90)

 Date: 02-13-90 (14:57)              Number: 2901 (Echo)
   To: ZAFAR ESSAK                   Refer#: 2898
 From: STEVE PALINCSAR                 Read: NO
 Subj: SELECTION SORT                Status: PUBLIC MESSAGE

 To quote from Knuth (The Art of Computer Programming, vol. 3, 
 Sorting and Searching), p. 139:
 .
 "5.2.3  Sorting by Selection
 .
 An important famly of sorting techniques is based on the idea 
 of repeated selection.  The simplest selection method is 
 perhaps the following:
     i) Find the smallest key; transfer the corresponding 
 record to the output area; then replace the key by the value 
 "" (which is assumed to be higher than any actual key).
     ii) Repeat step (i).  This time the second smallest key 
 will be selected, since the smallest key has been replaced by 
 .
     iii) Continue repeating step (i) until N records have 
 been selected.
 .
 The above method involves N - 1 comparisons...  There is an 
 obvious way to improve upon the situation, avoiding the use 
 of : we can take the selcted value and move it into its 
 proper position by exchanging it with the record currently 
 occupying that position.  Then we need not consider that 
 position again in future selections.  This idea yields our 
 first selection sorting algorithm."
 .
 [He goes on to describe Algorithm S (Straight selection sort) 
 which I won't quote.]  He goes on, on p. 141, to  say:
 .
 "It is interesting to compare Algorithm S to the bubble 
 sort...  Bubble sorting does less comparisons than straight 
 selection and it may seem to be preferable; but in fact 
 Program 5.2.2B [the bubble sort] is more than twice as slow 
 as Progam S!  Bubble sorting is handicapped by the fact that 
 it does so many exchanges, while straight selection involves 
 very little data movement."
 .
 To quote from Robert Sedgewick's _Algorithms_ (p. 95):  
 "Despite its simplicity, selection sort has quite an 
 important application: it is the method of choice for sorting 
 files with very large recoords and small keys."  Here'ss his 
 code (inn Pascal):
    procedure selection;
       var i,j,min,t: integer;
       begin
       for i:=1 to N do
          begin
          min:=i;
          for j:=1+1 to N do
             if a[j]<a[min] then min:=j;
          t:=a[min]; a[min]:=a[i]; a[i]:=t
          end;
       end;
 .
 Sedgewick describes the algorithm as follows:  "find the smallest
 element in the array and exchange it with the element in the first
 position, then find the second smallest element and exchange it with the 
 element in the second position, continuing in this way until the entire
 array is sorted. ...This is among the simplest of sorting methods, and
 it will work very well for small files.  Its running time is
 proportional  to N2.  ...Still selection sort is quite attractive for
 sorting (say) a thousand 1000-word records on one-word keys."
 (p. 95 for all the Sedgewick quotes & code)
-----
This message came from GEnie via willett through a semi-automated process.
Report problems to: 'uunet!willett!dwp' or 'willett!dwp@gateway.sei.cmu.edu'

ForthNet@willett.UUCP (ForthNet articles from GEnie) (02/19/90)

 Date: 02-17-90 (13:52)              Number: 2918 (Echo)
   To: IAN GREEN                     Refer#: 2881
 From: DON MADSON                      Read: NO
 Subj: LEARN TO SORT                 Status: PUBLIC MESSAGE

 Except for a list that's already sorted the selection sort
 performs better than the bubble sort.

 The bubble sort (with a flag) only performs well on an ordered
 list. If the list is already ordered why sort it?  When a new
 record is added insert it where it belongs and the list is still
 sorted. If a record is deleted close the gap.

 I suppose one option is to include every possible sort in your
 program and before sorting your database perform an analysis of
 that data to determine which sorting procedure to call. Or ...
 just include one that performs well most of the time.

 There is a good chapter on sorting in Wirth's book "Algorithms +
 Data Structures = Programs."
 ---
  ~ EZ-Reader 1.14 ~ ><><><><><><><><><><><><><><<>><><>
-----
This message came from GEnie via willett through a semi-automated process.
Report problems to: 'uunet!willett!dwp' or 'willett!dwp@gateway.sei.cmu.edu'

ForthNet@willett.UUCP (ForthNet articles from GEnie) (02/22/90)

 Date: 02-19-90 (10:10)              Number: 2933 (Echo)
   To: DON MADSON                    Refer#: 2918
 From: IAN GREEN                       Read: NO
 Subj: LEARN TO SORT                 Status: PUBLIC MESSAGE

    Well, in my application where I used the bubble sort, I decided to 
 use that algorithm because of the data-structure I chose. It is simply 
 an array of records. Now since it is an array (to be sure a simple data 
 structure), insertion and deletion is trivial; to insert just add a 
 record to the first 'blank' entry, to delete, change an entry to 
 'blank'. Then after editing is completed the bubble sort quickly 
 'cleans' up the array as the bulk of it is still ordered. The blank 
 entry is designed to represent the highest possible key value so it 
 naturally moves to the end of the list where it belongs. Simple and 
 efficient.

 Ian Green

 NET/Mail : British Columbia Forth Board - Burnaby BC - (604)434-5886   
-----
This message came from GEnie via willett through a semi-automated process.
Report problems to: 'uunet!willett!dwp' or 'willett!dwp@gateway.sei.cmu.edu'

ForthNet@willett.UUCP (ForthNet articles from GEnie) (02/22/90)

 Date: 02-19-90 (22:24)              Number: 2934 (Echo)
   To: STEVE PALINCSAR               Refer#: NONE
 From: ZAFAR ESSAK                     Read: NO
 Subj: SELECTION SORT                Status: PUBLIC MESSAGE

 Thanks for the descriptions. 

 I have also been looking at the SORT_OUT.ZIP that contains the program 
 by Henning Hansen and Niels Jergen Christensen.  That is an awesome 
 collection of sorts complete with an executable program that 
 demonstrates a large number of sorts incluing a couple of their own. 
 Unfortunately, for novices of the sort like me, there are no 
 descriptive narratives of the different sorts. 

 The list of sorts contained in that file are: 

 \ Bubblesort 

 \ Shakersort 
 \ Shakersort with a flag 
 \ Shakersort with interval reduction 

 \ Shuttlesort - Sifting 

 \ Straight Insertionsort 
 \ Insertionsort, assymmetric search 
 \ Insertionsort with bisection search 

 \ Selectionsort 
 \ Selectionsort, stable 

 \ Stacksort (Selectionsort w. stack) 
 \ Stacksort, stable 

 \ Heapsort - Treesort 

 \ Shellsort, dim. incr. insertion 
 \ Shellsort with stack-seletion 

 \ Splicesort 
 \ Splicesort with insertion 
 \ Splicesort with selection 

 \ Batchersort 

 \ Mergesort, binary subdivision 
 \ Mergesort, natural 

 \ Quicksort, first pivot element 
 \ Quicksort, random pivot element 
 \ Quicksort, middle pivot element 
 \ Quicksort, median-of-three pivot 
 \ Quicksort-partition + other sort 

 It will be awhile before I have enough time to explore this variety of 
 sorts. 
 ---
  * Via Qwikmail 2.01

 NET/Mail : British Columbia Forth Board - Burnaby BC - (604)434-5886   
-----
This message came from GEnie via willett through a semi-automated process.
Report problems to: 'uunet!willett!dwp' or 'willett!dwp@gateway.sei.cmu.edu'

ForthNet@willett.UUCP (ForthNet articles from GEnie) (02/25/90)

 Date: 02-23-90 (06:44)              Number: 2950 (Echo)
   To: BILL MCCARTHY                 Refer#: 2943
 From: STEVE PALINCSAR                 Read: 02-23-90 (18:12)
 Subj: SELECTION SORT                Status: PUBLIC MESSAGE

 No, adding a few unordered records wasn't my application.  My stuff 
 comes down in item number order, which corresponds roughly to when the 
 records were created -- actually, when the bibliographers & analysts 
 applied for the number, I think, not when they got the job finished! -- 
 and I have to rearrange them into author title order.  Since the 
 selection sort handles my data sets in under 2 seconds, it's hard to see
 how _any_ improvement in sort methodology would produce a meaningful
 improvement in run time!

 OTOH, the insertion sort also got a very good write-up in both Knuth &
 Sedgewick, and I do intend one of these days to take a look at it.

 BTW, nice to hear from you again!  How's it going?
-----
This message came from GEnie via willett through a semi-automated process.
Report problems to: 'uunet!willett!dwp' or 'willett!dwp@gateway.sei.cmu.edu'

ForthNet@willett.UUCP (ForthNet articles from GEnie) (02/27/90)

 Date: 02-25-90 (09:15)              Number: 2965 (Echo)
   To: IAN GREEN                     Refer#: NONE
 From: STEVE PALINCSAR                 Read: NO
 Subj: DO LOOPS IN FORTH             Status: PUBLIC MESSAGE

 Ian, you're probably looking for something like this:
 : LOOP_DEMONSTRATION  ( n2 n1 -- )
 \ That is, both the inner and outer loop limits are on the stack
 \ before this word executes.  You could put them there by fetching
 \ from a couple of variables, or whatever other means you like.
    1 DO        ( Takes n1 from param stack, puts in on Rstack )
       1 DO     ( Takes n2 from param stack, puts it on Rstack )
           <SOMETHING>  ( whatever you want to execute )
       LOOP
    LOOP  ;

 The "FOR-NEXT" construct in Forth is written as DO - LOOP.  
-----
This message came from GEnie via willett through a semi-automated process.
Report problems to: 'uunet!willett!dwp' or 'willett!dwp@gateway.sei.cmu.edu'

ForthNet@willett.UUCP (ForthNet articles from GEnie) (02/28/90)

 Date: 02-26-90 (22:34)              Number: 2971 (Echo)
   To: STEVE PALINCSAR               Refer#: NONE
 From: BILL MCCARTHY                   Read: NO
 Subj: SHELL SORT                    Status: PUBLIC MESSAGE

 I uploaded HS-SHELL.ZIP which contains both a selection
 and shell sort in HS/Forth.

 On a 16 Mhz 386, I got the following timings (seconds)
 sorting random 40 character strings.

 Items   Selection  Shell

  100        .44      .11
  200       1.58      .21
  300       3.61      .43
  400       6.37      .65
  500       9.98      .82

 You mentioned trying insertion sort.  That method is
 awful except when adding to a sorted set of when the
 elements of the set are all near their final positions
 (such as a cleanup to a minimum partition quicksort).
-----
This message came from GEnie via willett through a semi-automated process.
Report problems to: 'uunet!willett!dwp' or 'willett!dwp@gateway.sei.cmu.edu'

ForthNet@willett.UUCP (ForthNet articles from GEnie) (03/06/90)

Category 3,  Topic 37
Message 31        Sun Mar 04, 1990
F.SERGEANT [Frank]           at 21:52 CST
 
 To Steve & Ian,

SP> : LOOP_DEMONSTRATION  ( n2 n1 -- ) SP> \ That is, both the inner and outer
loop limits are on the stack SP> \ before this word executes.  You could put
them there by fetching SP> \ from a couple of variables, or whatever other
means you like. SP>    1 DO        ( Takes n1 from param stack, puts in on
Rstack ) SP>       1 DO     ( Takes n2 from param stack, puts it on Rstack )
SP>           <SOMETHING>  ( whatever you want to execute ) SP>       LOOP SP>
LOOP  ;

  Nope, the above will not work (unless n1 is 1).  The 2nd & subsequent 
passes thru the outer loop will have no limit available on the stack  for the
inner loop.  You need a DUP after the 1st DO and a DROP after  the last LOOP.

 Also, I keep seeing these "1"s as beginning indexes for your loops in  these
messages.  Presumably you want to say something like  10 STARS  and have your
word STARS print ten of 'em.  If you define STARS with a  built-in index of 1,
you'd have to say 11 STARS to get 10 of 'em.  

   : STARS  ( #-of-stars-desired - )   0  DO   STAR   LOOP  ;
   or
   : STARS  ( 1-more-than-#-of-stars-desired  - )  1  DO STAR LOOP ;

 On the other hand, you might want to use a built-in index of 1 when  you plan
to use the index inside of the loop, eg for addressing a one  based array.  I
would think the more common situation would be to use a  zero based array, in
which case you would still use a starting index   of 1.  

 Any questions?  (By the way, I don't use DO LOOP anymore, just FOR  NEXT.)

  -- Frank
-----
This message came from GEnie via willett through a semi-automated process.
Report problems to: 'uunet!willett!dwp' or 'willett!dwp@gateway.sei.cmu.edu'

ForthNet@willett.UUCP (ForthNet articles from GEnie) (03/08/90)

 Date: 03-06-90 (18:12)              Number: 3005 (Echo)
   To: FRANK SERGEANT                Refer#: NONE
 From: IAN GREEN                       Read: NO
 Subj: ALL SORTS OF SORTS            Status: PUBLIC MESSAGE

    As you explained it, I am a bit confused. In Modula-2, I can use what
 is called a subrange to index arrays, consequentky I often use indicies 
 whick are more meaningful, such as the following:

    Sales: ARRAY [1970..1990] OF REAL;

 If I pass that array to what is called an open array procedure, 1970 
 becomes 0 (the arrays internally are zero indexed). Consequently I am 
 oriented to zero-based arrays, simply because I make such extensive use 
 of open-array parameters.

 Now you mentioned that you use FOR-NEXT over DO-LOOP so could you 
 illustrate examples of nested loops that use the control variables 
 inside, say in a shell-sort of an array.

 Ian Green

 NET/Mail : British Columbia Forth Board - Burnaby BC - (604)434-5886   
-----
This message came from GEnie via willett through a semi-automated process.
Report problems to: 'uunet!willett!dwp' or 'willett!dwp@gateway.sei.cmu.edu'

ForthNet@willett.UUCP (ForthNet articles from GEnie) (03/12/90)

Category 3,  Topic 37
Message 33        Sun Mar 11, 1990
F.SERGEANT [Frank]           at 10:41 CST
 
 Ian, you asked about nested FOR-NEXT loops.  Suppose we have a two 
dimensional array of quarterly net profit figures for the years 1970 thru 
1975.  Suppose further that we made very small profits (so they will  fit in
16-bits).  We'll hard-code the values to start with.

 CREATE  PROFITS
 ( year   1stQ   2ndQ   3rdQ  4thQ  )
 ( 1970)  10 ,   5 ,    3 ,   7 ,     ( each entry takes 2 bytes)
 ( 1971)   8 ,   9 ,   -3 ,  -4 ,     ( each year takes 8 bytes)
 ( 1972)   2 ,   3 ,    4 ,   8 ,
 ( 1973)   6 ,   6 ,    6 ,   6 , 
 ( 1974)   5 ,   3 ,    3 ,  -7 ,
 ( 1975)  12 ,  11 ,    9 ,  10 ,

 We can address the array various ways.  Suppose we want to say
    3rdQ 1972 PROFIT
 and thereby place the number 4 on the stack.  We could do it with the 
following definitions:

    1 CONSTANT 1stQ
    2 CONSTANT 2ndQ
    3 CONSTANT 3rdQ
    4 CONSTANT 4thQ 

    : PROFIT  ( quarter  year  -  n)
      1970 -  8 *  ( quarter  offset-to-start-of-the-year)
      1- SWAP 2* +    ( offset-to-quarter)
      PROFITS      ( offset-to-quarter  starting-addr-of-the-array)
      +  @    ;    ( profit-for-the-quarter)

  In memory, if you were to dump the contents of the array, on a PC  with
Intel-like byte order, you would see

  HEX PROFITS  DUMP  DUMP  DUMP    ( one way to do it in Pygmy)
 XXXX  0A 00  05 00  03 00  07 00    08 00 09 00 FD FF FC FF
 XXXX  02 00  03 00  04 00  08 00    06 00 06 00 06 00 06 00
 XXXX  05 00  03 00  03 00  F9 FF    0C 00 0B 00 09 00 0A 00

 (The XXXXs stand for the actual addresses where the array is stored.)

  Now let's step thru the entire array using FOR-NEXT loops.

  : PRINT-PROFITS  ( -)   
    CR  ." Quarterly Profits for 6 Years"  CR CR
    1970  ( starting-year)
    5 FOR ( we'll do this outer loop 6 times)
      1   ( starting-quarter)
      3 FOR ( we'll do this inner loop 4 times)
          ( year  quarter)   2DUP SWAP  ( y q q y)
          PROFIT  ( y q profit)  .  ( y q )
          ( year  old-quarter)  1+  ( year  new-quarter) 
      NEXT  ( old-year  old-quarter)  
      DROP  ( old-year)  1+  ( new-year)
      CR  ( so each year will get a separate line)
    NEXT
    DROP   (   )  ;

   Notice that I have used nested loops, but I haven't used the built  in
"control variables" named 'I' and perhaps 'J'.  But, I have used  control
variables that are kept on the stack.  

   The above should work in Pygmy ver 1.2 and with most Forths that  have FOR-
NEXT.  My latest thinking is that I prefer 5 FOR to do it 5  times, not 6, so
I have changed my copy of Pygmy to do it the new way.   Robert Berkey thinks
we should call the new word ?FOR.

   Pygmy has the word 'I' that returns the value of the built-in index  in the
current loop.  This index counts down.

    : TST1  5 FOR  I .  NEXT  ;   TST1  would print  5  4  3  2  1  0
    : TST2  5 FOR  I .  NEXT  ;   TST2  would print  4  3  2  1  0

    Also you can't directly get to the built-in index of outer loops  from
within inner loops (but, you can get it from the outer loop while  just in the
outer loop and save it on the stack for later use by the  inner loop).  This
facility could be built into Pygmy (named 'J' 'K'  etc) just as it is built in
to some  DO-LOOP implementations.  Because  of the downward counting nature,
you might not always want to use 'I'  in a FOR-NEXT loop.  

     Here is a re-write using DO-LOOP where J accesses the outer loop's 
index:

  : PRINT-PROFITS2 ( -)  
    CR  ." Quarterly Profits for 6 Years"  CR CR
    1976 1970 DO ( we'll do this outer loop 6 times)
      5 1 DO ( we'll do this inner loop 4 times)
          I J PROFIT  ( profit)  .  (   )
      LOOP  
      CR  ( so each year will get a separate line)
    LOOP   (   )  ;

     I've presented all of this in hopes of making it clear how it all works
rather than to convince you that FOR-NEXT is better than DO-LOOP. Even I have
to admit that the example with DO-LOOP looks more straight  forward.  I just
immediately switched to FOR-NEXT loops and never use  DO-LOOP loops any more. 
I'm pointing this out as a curiosity as I  don't fully understand why I've
done it, but am content with the  situation.  Naturally I haven't tested any
of the above examples.
  -- Frank
-----
This message came from GEnie via willett through a semi-automated process.
Report problems to: 'uunet!willett!dwp' or 'willett!dwp@gateway.sei.cmu.edu'

ForthNet@willett.UUCP (ForthNet articles from GEnie) (03/14/90)

 Date: 03-12-90 (10:12)              Number: 3020 (Echo)
   To: FRANK SERGEANT                Refer#: 3017
 From: IAN GREEN                       Read: NO
 Subj: ALL SORTS OF SORTS            Status: PUBLIC MESSAGE

    Well, you're latest examples have clarified one aspect of Forth; it 
 lacks standardization of even the most basic control structure. To this 
 end it appears as though translation of existing code may not be too 
 easy to achieve as the looping control structures do not behave in a 
 regular and predictable mannor. Oh well.

 Ian Green

 NET/Mail : British Columbia Forth Board - Burnaby BC - (604)434-5886   
-----
This message came from GEnie via willett through a semi-automated process.
Report problems to: 'uunet!willett!dwp' or 'willett!dwp@gateway.sei.cmu.edu'