[comp.lang.modula2] Sorts

BOTCHAIR@UOGUELPH.BITNET (Alex Bewley) (09/10/89)

    Can anybody out there send me any kind of sorting procedure.  I need one
  badly for a project I am working on.  All the books on sorting always seem
  to be out.

      Alex Bewley
      BOTCHAIR@UOGUELPH

MARKV@UKANVAX.BITNET ("MARK GOODERUM - UNIV. OF KANSAS ACS - MARKV@UKANVAX") (09/10/89)

Hmm, sorting can be very implementaion dependant.  Are you sorting numbers,
strings, etc.  Are they a linked list, an array, etc.  Are your sorting
keys unique or not, etc.  You get the idea.

However, give me one or more details and I might dig out a procedure or
algorithm around.  I have a couple of Quicksort routines lying around that
aren't too bad and will give you much better performance than the old
quick and dirty bubble sorts.  (Of course if your sorting LARGE amounts of
data I have a nice Heapsort routine I did last semester.)

Adios,

Mark Gooderum
MARKV@UKANVAX
markv@kuhub.cc.ukans.edu

ok@cs.mu.oz.au (Richard O'Keefe) (09/10/89)

In article <INFO-M2%89091001300983@UCF1VM>, MARKV@UKANVAX.BITNET ("MARK GOODERUM - UNIV. OF KANSAS ACS - MARKV@UKANVAX") writes:
> I have a couple of Quicksort routines lying around that
> aren't too bad and will give you much better performance than the old
> quick and dirty bubble sorts.

*WARNING*  If your comparisons are costly (e.g. comparing strings, or
almost _anything_ more than a single in-line numeric comparison),
Quicksort was known to do 40% more comparisons than Merge sort _when_
_it_ _was_ _invented_.  It was invented for a machine whose main memory
was something like 512 words (its disc was a massive 16k words), so that
minimal extra space was vital.  If you have room for one extra word per
record (and if you are using variable length records you are almost
certainly using a link word of some sort anyway) then the space advantage
vanishes.  Even with the median-of-three hack, the average case of
"Quick" sort is slower than the *worst* case of merge sort.  If you are
sorting numbers, radix sort is _grossly_ faster than "quick" sort.

I have posted some sorting routines in C to the original requester.
(I accidentally posted them to MARKV too, sorry 'bout that.)

Bubble sort?  Surely nobody has ever used bubble sort except as an
example in a textbook of how badly it is possible to do it.

heilpern@VAX1.ACS.UDEL.EDU (HEILPERN) (09/12/89)

As I am only in a first level modula course, I can't yet describe the
specifics, but here is the simple method of performing a bubble sort,
not the best sort in the world, but effective, especially as a learning
tool...

        for (I = 0; I < NUMITEMS-1; I++)
          for (J = I+1; J < NUMITEMS; J++)
            if (item[J] 'goesbefore' item[I] 'swapthem')

This is written similar to the C construct. I++ means 'I:=I+1', or INC(I).
Let me know if this is too vauge...
        Mark A. Heilpern
        --M.
        respond to heilpern@brl rather than as the mailer says!!!

lins@Apple.COM (Chuck Lins) (09/12/89)

I have a highly optimize d quicksort implementation that will appear in
"The Modula-2 Software Component Library" Volume 4. As long as you are sorting
arrays, it is independent of the data being sorted (since the caller provides
the routines necessary for comparison and swapping of items).

Link me if your're interested.

Chuck Lins


-- 
Chuck Lins                      | "Exit right to funway."
Internet:   lins@apple.com      | AppleLink: LINS
Snail Mail: 20525 Mariani Ave. M/S: 41-K Cupertino, CA 95014
"I like the future - I'm in it." - Firesign Theater

BOTCHAIR@UOGUELPH.BITNET (Alex Bewley) (09/12/89)

    Bubble sort is a little bit too slow for my application.  I need to sort
  a linked list in memory.  The algorithm I want must not use or
  require a FIXED array to sort, as the list is dynamic.
    Lists are not like arrays either, if you go swapping elements in the list
  pointers become mixed up and you end up trying to sort the interrupt table
  or screen (which isnt't very good).
    For the time being I have to use a fixed ARRAY OF POINTERS and then sort
  the array using a comparison routine that checks the information
  that the pointers are pointing to, which suprisingly works, but strikes me
  as being a rather ridiculous way to sort the list.  Besides, the array
  limits the amount of information I can store, which I don't want to do.

      Alex
      Just this guy...

MARKV@UKANVAX.BITNET ("MARK GOODERUM - UNIV. OF KANSAS ACS - MARKV@UKANVAX") (09/12/89)

>Bubble sort? Surely nobody has ever used bubble sort ...

Except maybe a non-computerese technical person hacking out quick programs.
My poor friend the ME major couldn't figure out why the simple little program
he wrote to sort 10000 or so lines of a parts list by differnet numbers took
so long... Oh well.  "The book showed it so it must be okay..."

But what sort to use as we all (should) know depends on many factors, which
is why answering the original question in anything mor than a vague reply
was difficult.  My personal favorite is Heapsort or Binary Insertion (what I
think you refer to as Merge) sorting.

Cordially,

Mark Gooderum
MARKV@UKANVAX

ODX@PSUVM.BITNET (Tim Larson) (09/12/89)

In article <INFO-M2%89091120313809@UCF1VM>, BOTCHAIR@UOGUELPH.BITNET (Alex
Bewley) says:
>
>    Bubble sort is a little bit too slow for my application.  I need to sort
>  a linked list in memory.  The algorithm I want must not use or
>  require a FIXED array to sort, as the list is dynamic.

Still not quite enough information to tell, but you may not need to sort at
all if it's a linked list!  That is, as you load each element in the list,
simply link it in in-order.  When the last element is added, the list is
sorted already.  I used this technique once in school, when a grade depended
on it (we were graded partially on how fast our program was in comparison to
the rest of the class).  In that case, my program outperformed all the
programs that performed sorts.  If the list is long, and you can afford a bit
more space, this can be made more efficient by using two links and a tree
structure.  I don't know if this applies to your situation, but maybe there
is a germ of an idea you can use here.

-Tim Larson
odx@psuvm.bitnet

GRANGERG@VTVM1.BITNET (Greg Granger) (09/12/89)

AB>From: Alex Bewley <BOTCHAIR@UOGUELPH>
AB>Bubble sort is a little bit too slow for my application.  I need to
AB>...
AB>pointers become mixed up and you end up trying to sort the interrupt
AB>...
AB>which I don't want to do.

Well, I was having the same problem a few months back.  I was using
TopSpeed Modula-2 to read a directory (queue the filenames of a dir.)
and wanted to sort the results.  TopSpeed provides a very useful
QuickSort and HeapSort with their 'standard' libraries, but these
routines quite reasonably expected a indexed list.  (Note that the
TopSpeed canned sorts require as arguments the list size, a Less than
procedure and a Swap procedure, therefore the use of ARRAYs is not
mandated, this is not true with all such canned sorts)
In any case, what I did was write a module that allowed GETs, PUTs
and SWAPs on a heap based indexed list.  This allows dynamic array
handling, which solved my problem.  I have been reworking this module
lately to include, unlike types (not `true' arrays), dropping
elements, deinit'ing (kill the array and recovering all heap space
used), etc... I would be happy to share my source code with you
with the following understanding:
(A) this is a module being actively tested (deinit'ing is at this
    time not functional)
(B) it's written in TopSpeed Modula-2 with absolutely no attention
    given to keeping with the Wirth defined standard (ie if TS gave
    me a special non-standard function that made things easier I
    used it).
(C) Please don't pass the source code around, while I believe that
    it will help you, it is not finished to the point that I would
    like to have it floating around with my name on it.
-----
To interested other readers.
  If other are interested in this code, let me know and I will
consider releasing it to the list when I have finished testing
and debugging.

Greg  (GRANGERG @ VTVM1)

lins@Apple.COM (Chuck Lins) (09/12/89)

In article <INFO-M2%89091120313809@UCF1VM> Modula2 List <INFO-M2%UCF1VM.bitnet@jade.berkeley.edu> writes:
>
>    Bubble sort is a little bit too slow for my application.  I need to sort
>  a linked list in memory.  The algorithm I want must not use or
>  require a FIXED array to sort, as the list is dynamic.
>    Lists are not like arrays either, if you go swapping elements in the list
>  pointers become mixed up and you end up trying to sort the interrupt table
>  or screen (which isnt't very good).
>    For the time being I have to use a fixed ARRAY OF POINTERS and then sort
>  the array using a comparison routine that checks the information
>  that the pointers are pointing to, which suprisingly works, but strikes me
>  as being a rather ridiculous way to sort the list.  Besides, the array
>  limits the amount of information I can store, which I don't want to do.
>

Alex,

There are several alternative approaches you can take:
1. use quicksort for lists. Originally developed by Dalia Motzkin, in an
article that appeared in Communications of the ACM I think in the mid-70's. A
Pascal implementation appears in G.H. Gonnet's book "Handbook of Algorithms
and Data Structures", Addison-Wesley, 1983.

2. Use a dynamically allocated array of pointers instead of a fixed size one.
An outline of how this could be done follows:

MDOULE SortPointers;

TYPE ptrData = POINTER TO Data;
TYPE arrayOfPtrs = ARRAY [0..100000] OF ptrData;
     (* or some other very large number for the maximum index *)

VAR  myArray : POINTER TO arrayOfPtrs;

BEGIN
  (* allocate an array of the size you need *)
  Allocate(myArray, SIZE(ptrData) * "number of pointers in the array");

  (* fill the array with the pointers *)

  (* sort it *)
  (* do whatever else you need to do with the sorted data *)

  (* get rid of the array of pointers when done *)
  Deallocate(myArray, SIZE(ptrData) * "number of pointers");
END SortPointers.

The only limitation depends on the compiler - will it allow a very large index
or does it limit the data space used for arrays (e.g., 32/64K limits on arrays
that appear in microcomputer Modula and Pascal compilers).

-- 
Chuck Lins                      | "Exit right to funway."
Internet:   lins@apple.com      | AppleLink: LINS
Snail Mail: 20525 Mariani Ave. M/S: 41-K Cupertino, CA 95014
"I like the future - I'm in it." - Firesign Theater

lins@Apple.COM (Chuck Lins) (09/12/89)

Binary Insertion sort and Merge sort are _NOT_ the same thing. They are
different algorithms. Wirth shows both algorithms in "Algorithms and Data
Structures". Merge sort requires 2N space (e.g., to sort 1000 element array
you need a 2000 element array). Binary Insertsion sort is an in-pace sort
needing only a single temporary.

HeapSort is a good algorithm that guarantees sorting in O(n log n) time
regardless of the input. Though, "most" of the time Quicksort will outperform
it. Quicksort's main problem is that is has a worst-case asymptotic complexity
of O(n**2) which typically occurs when the array is already sorted in either
ascending or descending sequence. Optimizations can reduce the chances of poor
performance in the worst case but cannot eliminate it.


-- 
Chuck Lins                      | "Exit right to funway."
Internet:   lins@apple.com      | AppleLink: LINS
Snail Mail: 20525 Mariani Ave. M/S: 41-K Cupertino, CA 95014
"I like the future - I'm in it." - Firesign Theater

ok@cs.mu.oz.au (Richard O'Keefe) (09/13/89)

In article <34666@apple.Apple.COM>, lins@Apple.COM (Chuck Lins) writes:
> Binary Insertion sort and Merge sort are _NOT_ the same thing. They are
> different algorithms. Wirth shows both algorithms in "Algorithms and Data
> Structures". Merge sort requires 2N space (e.g., to sort 1000 element array
> you need a 2000 element array). Binary Insertsion sort is an in-pace sort
> needing only a single temporary.

It is certainly true that Binary Insertion sort and Merge sort are not the
same.  It is seriously misleading to say that Merge sort requires 2N space.
The truth of the matter is that Merge sort requires N (pointers or integers)
*in addition to the data being sorted*, so that if you are sorting records
that contain 100N bytes Merge sort needs 102N bytes total.  Even more to
the point: if you want to sort a *LIST* then the pointer fields Merge sort
needs are already there, and Merge sort doesn't need any more.  (It does
need a stack with O(lg N) pointers, but so does "Quick" sort.)  If you have
N dynamically allocated objects, you are already paying N pointers to refer
to them; if you put those pointers in the objects themselves and link them
into a list, you (a) won't have trouble with fixed-size arrays of pointers
and (b) will be able to use Merge sort with no extra overheads.

Binary Insertion sort does O(N**2) moves, which means that it is seldom a
good idea.  Merge sort costs O(N.lg N) for moves/links as well as comparisons.

diamond@csl.sony.co.jp (Norman Diamond) (09/13/89)

In article <INFO-M2%89091000074411@UCF1VM> Modula2 List <INFO-M2%UCF1VM.bitnet@jade.berkeley.edu> writes:

>    Can anybody out there send me any kind of sorting procedure.  I need one
>  badly for a project I am working on.  All the books on sorting always seem
>  to be out.

If the library's copies are always checked out then buy one.  They're
too big to fit through the wires going to your terminal.  Anyway, you
might take a course some day in algorithms or databases.

--
-- 
Norman Diamond, Sony Corporation (diamond@ws.sony.junet)
  The above opinions are inherited by your machine's init process (pid 1),
  after being disowned and orphaned.  However, if you see this at Waterloo or
  Anterior, then their administrators must have approved of these opinions.

MARKV@UKANVAX ("MARK GOODERUM - UNIV. OF KANSAS ACS - MARKV@UKANVAX") (09/15/89)

Good points.  Here are some quick notes I looked up last night to add to
this discussion.

Quick sort has a worse case time of O(n^2) but an average case of
O(n log n).  Heapsort is both worst and average case O(n log n) times.
Quicksorts average times are usually a little better than Heapsort.

I should be noted that if often times if you take a set and randomly
distribute the order the elements, then you get a random distribution
that Quicksort can do in n log n quicker than most algorithms.

It should be noted that ANY algorithm for sorting an arbitrary set using
only s LessThan comparison can not be quicker than n log n time.

If something special is known about the keys, like they are all unique,
then times can be improved.  If the keys are all unique and can be reduced
to a contiguous, ordered sequence, then you can sort in O(n) time.

For now,

Mark