[comp.lang.c] Sorting Algorithm

wayned@wddami.spoami.com (Wayne Diener) (08/17/90)

[ I posted this to alt.sources and received a suggestion that
  I repost to comp.lang.c.  If you have trouble mailing to me
  you might try:

  To: isc-br!hawk!wddami!wayned@uunet.uu.net
]

I have been unable to find any references to the sorting algorithm
used in this program.  If anyone has seen it elsewhere, I would 
appreciate hearing about it.  None of my "guru" programmer friends,
my professors (I'm a hardware type who went back to college to learn
a little software) or a literature search have turned up an equivalent
so I'm posting it here to see if any one already knows of it.  If it's
original with me, I claim a copywrite on it and release it for use
by anyone.  I'm certain it can be improved - feel free to do so, but
send me a hard copy via US mail, OK?

I have done empirical comparisons of this sort against bubble,
insertion, selection and queuemerge sort.  It's a lot faster than
the first three but slower than the queuemerge sort, however, it
does the sort "in-place" (requires very little additional memory,
other than a few more variables).  I'm not really terribly proficient
at "Big O" analysis, but it LOOKS like it might be O(N*log(N))
instead of O(N^2).  Anyone want to analyse it?  

I 'trimmed' this file from the original form (massive documentation
for class requirements) to include only what I think is really 
useful.  I don't think I cut out anything important, but I can't
be 100% sure since I haven't re-compiled.  The sorting algorithm itself 
should be okay.  I compiled and ran the program using Turbo C, 
cc on a Sun 386i and under Ultrix, so it should work in most environments. 

The sorting is accomplished using a binary search of a linked list
to find the the proper insertion point (just running up and down
the pointers and dividing the list in half each time) and then 
moving the pointers to change an item's location.

Have fun, and not too many flames, OK? (Remember this was a class
assignment (for string manipulation actually) and I had to demonstrate
some concepts other than just the sorting.)


X-----------------------  CUT  ----------------------------------------X
/***********************************************************************
   A "Binary Insertion Sorting Technique for Linked-Lists"
   Wayne D. Diener
   South 3415 Myrtle
   Spokane, WA  99223
   (509) 535-4670

	This program reads a file of input words (as you might type
	them in with any programming editor), prints the word count
	and the list of words, sorts the list and then prints the
	list again.

    (Oh, yea...  Bi_In_Sort() is the actual function)


   *inp         FILE          Used to read the characters from the input
                              file.
   ch           char          Used for input to "hold" the screen long
                              enough to see the results.
   *mnlst       main_list    A pointer to the header for the list.

************************************************************************/

#include <stdio.h>
#include <fcntl.h>
typedef char data;

typedef struct node              /* Each node contains a character. */
{
   data character;
   struct node *next_letter;
   struct node *prev_letter;
} node;

typedef struct header            /* Each header starts a word. */
{
   int letter_count;
   struct node *word_head;
   struct node *word_tail;
   struct header *next_word;
   struct header *prev_word;
} header;

typedef struct main_list         /* This is the main list header. */
{
   int word_count;
   struct header *head_word;
   struct header *tail_word;
} main_list;


main(argc,argv)

   int argc;
   char *argv[];

{
   FILE       *inp,*fopen();
   int        ch;
   main_list  *mnlst;
   void       read_list();
   void       print_list();
   void       erase_list();
   void       Bi_In_Sort();
   int        compare_words();

   if (argc != 2)
   {
      printf("Error.  Useage:  %s filename",argv[0]);
      exit(1);
   }

   if ((inp = fopen(argv[1],"r")) == NULL)
   {
      printf("Could not open %s for input.",argv[1]);
      exit(1);
   }
   mnlst = (main_list *) malloc (sizeof(main_list));

   read_list(mnlst,inp);  /* Read the words into a list. */

   fclose(inp);     /* Close the input file. */

   printf(" Word count = %d\n",mnlst->word_count);

   printf("\n The input word list printed forward is:\n\n");
   print_list(mnlst->tail_word,0);  /*It's recursive so start at wrong end*/

   Bi_In_Sort(mnlst,compare_words);              /* Sort the list. */

   printf("\n The sorted word list is:\n\n");
   print_list(mnlst->tail_word,0);
   printf("\n\n\n Press return to continue.\n");
   scanf("%c",&ch);      /* Leave the screen up until a <cr> */
   erase_list(mnlst);    /* Clean up the memory. */

}

void
print_word(ptr)
   header *ptr;
/***********************************************************************
   print_word() - accepts a pointer to the header of a word it prints
   out all the characters contained in the list at that node and then
   prints a space character.

   variables used:
   name         type                     Description
   -------------------------------------------------------------------
   *p           node           Points at the characters to print.
   i            int            A loop control variable that counts letters.
***********************************************************************/
{
   node *p = ptr->word_head;
   int i = ptr->letter_count;
   while (i-- != 0)
   {

      printf("%c",p->character);
      p = p->prev_letter;
   }
   printf("%c",' ');     /* Put a space after it. */
}

void
print_list(point,dir)
   header *point;
   int   dir;
/***********************************************************************
   print_list() - accepts a pointer to one of the word nodes on the list
   and a variable that determines which direction to print (forward or
   reverse).  It then traverses the list of words recursively and prints
   the word contained at each node.

   variables used:
   name         type                     Description
   -------------------------------------------------------------------
   none
***********************************************************************/
                         /* If dir = 0 we'll print the list normally. */
                         /* If dir != 0 we'll print backward. */
{                        /* It works either direction. */
   void Print_word();

   if((dir ? point->prev_word : point->next_word) != NULL)
      print_list((dir ? point->prev_word : point->next_word),dir);
   print_word(point);
}

void
erase_list(list)
   main_list *list;
/***********************************************************************
   erase_list() -  accepts a pointer to the main word list.  It traverses
   the list erase each word list associated with the node, goes to the
   next node and erases the previous header.  Finally it erase the last
   word node and then the header for the main list.

   variables used:
   name         type                     Description
   -------------------------------------------------------------------
   *p            header     Points at the word node to be erased.
***********************************************************************/
{
   header  *p = list->head_word;
   void erase_word();
   while ( p != NULL)  /* p is not passed list->tail_word*/
   {
      erase_word(p);        /* Erase the word. */
      p = p->prev_word;     /* Go to the next word. */
      if (p != NULL)
         free(p->next_word);   /* Free the previous word header. */
   }
   free(list->tail_word);  /* Free the last word header. */
   free(list);             /* Free the list header. */
}

void
erase_word(word_node)
   header *word_node;       /* word_node is the header for the word. */
/***********************************************************************
   erase_word() -  Accepts a pointer to a word node.  It traverses the 
   list of character nodes and frees the memory associated with each
   character.

   variables used:
   name         type                     Description
   -------------------------------------------------------------------
   *p           header        A helper pointer.
   *q           header        The pointer used to free the memory.
   i            int           A loop counter used to count the letters.
***********************************************************************/
{
   node *p,*q = word_node->word_head;       /* p points at the letters. */
   int i;
   for (i=0;i < word_node->letter_count;i++)
   {
       p = q->prev_letter;   /* Save the next letter pointer. */
       free(q);              /* Free the letter. */
       q = p;                /* Point at the next letter. */
   }
}

int
compare_words(first,second)
   header   *first,*second;
/***********************************************************************
   compare_words() -  Accepts two pointer to word headers.  It compares
   the letters contained at each node of the word in succession.  If it
   encounters a letter in one word that is greater than the corrsponding
   letter in the other word, it returns the appropriate value.  If the end
   of either (or both) word(s) is reached a determination of the longer
   of the two words is attempted by comparing the lengths of the words.
   If the lengths are different, the function returns the appropriate
   value, if the word lengths are the same, it returns the "equal value".

   variables used:
   name         type                     Description
   -------------------------------------------------------------------
   *p           header        Points to the header of the first word
                              to compare.
   *q           header        Points to the header of the second word
                              to compare.

***********************************************************************/
/*  if first > second, return 0.  If second > first, return 2
if first = second, return 1 */

{
   node *p = first->word_head,*q = second->word_head;

   while ((p != NULL) && (q != NULL))  /* As long as letters are there. */
   {
      if ( p->character > q->character)        /* First > Second. */
         return(0);
      else if ( p->character < q->character)   /* Second > First. */
         return(2);
      else                                     /* Equal so far! */
      {
         p = p->prev_letter;     /* Go to the next letters. */
         q = q->prev_letter;
      }
   }
 /* To get here, one or both of the words is out of letters and they
 are equal to this point. */

   if (first->letter_count > second->letter_count)       /* First > */
      return(0);
   else if (first->letter_count < second->letter_count)  /* Second > */
      return(2);
   else return(1);          /* The words are equal. */
}

void
Bi_In_Sort(big_list,compare)
   main_list *big_list;
   int    (*compare)();

/***********************************************************************
   Bi_In_Sort() - Accepts a pointer to the header of a list to process.
   First, a sorted portion is created at the end of the list that is 2
   items long then a loop is entered that repeatedly takes the next item
   at the "head" of the list and uses a binary search of the items in the
   sorted portion to determine the correct location for the new item.
   The new item is then removed from the head of the list and inserted
   at the appropriate location.  This process is repeated until the last
   item has been processed.

   variables used:
   name         type                     Description
   -------------------------------------------------------------------
   test         int           Used as a flag to signal the instance
                              where the new item should be inserted
                              prior to the present smallest member of
                              the list.
   count        int           Used to keep track of the number of words
                              already in the sorted portion of the list.
   middle       int           Used as a counter control variable to
                              determine how far "up" or "down" the list
                              to travel during the binary search.
   i            int           The count control variable for list traversal.
   up           int           Used as a boolean control variable to determine
                              if the next movement on the list should be
                              "up" the list or "down" the list.
   *current     header        The pointer that is moved "up" and "down" the
                              list while searching for the proper insertion
                              location.
   *newitem     header        A pointer that is used as a "handle" during
                              the movement/insertion of the head of the list.
   *sortbound   header        A pointer that points to the "lowest" item of
                              the sorted portion of the list.

***********************************************************************/
{
   int      test,count=1,middle,i,up;
   header   *current,*newitem,*sortbound;
   void     insert();

   if (big_list->word_count > 1)  /* A one item list is already sorted. */
   {
      current = big_list->tail_word->next_word;
      if ( (*compare)(current,big_list->tail_word) == 0)
      {
         current->next_word->prev_word = big_list->tail_word;
         big_list->tail_word->next_word = current->next_word;
         current->next_word = big_list->tail_word;
         big_list->tail_word->prev_word = current;
         current->prev_word = NULL;
         big_list->tail_word = current;
      }
      /*   The sorted part is now two items long. */
      sortbound = big_list->tail_word->next_word;
      do
      {
         up = 1;                         /*"Outside loop" initializations.*/
         newitem = big_list->head_word;
         count++;
         middle = (count +1) / 2;
         current = sortbound;
         test = 0;
         do
         {
            for ( i=0; i < middle ; i++)
            /* Go to the appropriate "middle". */
            {                          /* Either up or down the list. */
               current = up ?   current->prev_word : current->next_word;
               if (current == sortbound)
               {
                  test = 1;          /* If we get to the sort boundary, */
                  break;             /* we have to quit. */
               }
            }
            if (((*compare)(newitem,current) == 0) && current != NULL)
            {
               if (current == big_list->tail_word)
               /* Place the item at the tail. */
               {
                  big_list->head_word = newitem->prev_word;
                  big_list->head_word->next_word = NULL; /* the new head */
                  newitem->prev_word = NULL; /* the new end of the list */
                  newitem->next_word = big_list->tail_word;
                  big_list->tail_word->prev_word = newitem;
                  big_list->tail_word = newitem;
                  break;
                  /* The sortbound stays in the same place. */
               }
               else
                  up = 1;     /* Otherwise we have to look further "up". */
            }
            /*  The case of inserting after the tail_word is finished. */
            else if(test)   /* To get here, newitem <= current */
                            /* and current = sortbound */
                            /* so we insert before current */
                            /* and move sortbound to the new item. */
            {
               if ( current == newitem->prev_word)
                  return;   /* This is the actual finishing point. */
               else
               {
                  insert(big_list,newitem,current);
                  sortbound = newitem;
                  break;
               }
            }
            /*  Inserting before sortbound is finished. */
            /*  To get here, newitem <= current and somewhere in the middle*/
            else if ( (*compare)(newitem,current->next_word) <= 1)
            {
               insert(big_list,newitem,current);
               break;
            }
            else
            /*  newitem is strictly less than current->next, so: */
            /*  go down the list.  */
               up = 0;
       middle /=2;
       if (middle == 0)
          middle++;
       }
       while (1);
    }
    while (sortbound != big_list->head_word);
    }
}

void
read_list(ml,inp)
   main_list *ml;
   FILE     *inp;
{
   node       *dat;
   header     *list;
   char       a, ch = 32;    /* This makes the initial while loop work. */
   void       print_word();

   while ((ch == 32) || (ch == 13) || (ch == 10))
      ch = getc(inp);  /* This halts formation of words from */
   ungetc(ch,inp);   /* leading spaces, etc. (no letters). */

   if((ch = getc(inp)) != EOF)
   {
      dat = (node *) malloc (sizeof(node));
      list = (header *) malloc (sizeof(header));
      ml->head_word = list;      /* Points to the head of the whole list. */
      ml->tail_word = list;      /* Points to the tail of the whole list. */
      ml->word_count = 1;        /* There's at least one word in the list. */

      list->letter_count = 1;    /* There's at least one letter in the word. */
      list->word_head = dat;     /* This points at the first letter. */
      list->word_tail = dat;     /* This points at the last letter. */
      list->next_word = NULL;    /* This is the only word so far. */
      list->prev_word = NULL;


      dat->character = ch;  /* This is the first letter of the first word. */
      dat->next_letter = NULL;     /* It's also the only letter right now. */
      dat->prev_letter = NULL;

      while(ch != EOF)
      {
         if((ch=getc(inp)) != EOF)
         {
            if((ch == 32) || (ch == 13) || (ch == 10))
            {                   /* New word.  Make a new word header node. */
                                /* The following while, ungetc and if(ch..) were also necessary
                                        if allowing <cr> and <lf> as word seperators. */
               while ((ch == 32) || (ch == 13) || (ch == 10))
                                                ch = getc(inp);  /* This halts formation of words from */
               ungetc(ch,inp);   /* extra spaces, etc. (no letters). */
                                 /* but put the last character back. */
               if (ch == EOF)    /* This protects agains a final word */
                                                break;         /* being generated at EOF. */
               list = (header *) malloc (sizeof(header));
               /* link it into the main_list. */
               list->prev_word = NULL;               /* It's the new end. */
               ml->tail_word->prev_word = list;
               list->next_word = ml->tail_word;
               ml->tail_word = list;    /* Adjust the end of the word list. */
               ml->word_count++;        /* Increment the word count. */
               list->letter_count = 0;  /* No letters in the word yet. */
            }
            else if((ch != 10) && (ch != 13))
            /* Disallow <cr> and <lf> as actual parts of the words. */
            {
               dat = (node *) malloc (sizeof(node));
               dat->prev_letter = NULL; /* This is the new end of the word*/
               dat->character = ch;     /* Place the letter in the node. */
               if (list->letter_count == 0)  /* No letters in the word yet*/
               {
                  list->word_head = dat;   /* Point at the end of the word*/
                  dat->next_letter = NULL; /* The first letter has no next*/
               }
               else
               {
                  list->word_tail->prev_letter = dat; /* Link to end word */
                  dat->next_letter = list->word_tail;
               }
               list->letter_count++;  /* Increment the letter count. */
               list->word_tail = dat; /* The new end of the word. */
               }
         }
      }
   }
}

void
insert(bg_lst,new,cur)      /* These are the common statements required*/
   main_list *bg_lst;       /* for any insertion prior to current pointer*/
   header    *new,*cur;
/***********************************************************************
   insert() - Accepts the pointers to the main list and the current
    and newitem and performs an isertion of the new item prior to the
    current location.

   variables used:
   name         type                     Description
   -------------------------------------------------------------------
   none
***********************************************************************/
{

   bg_lst->head_word = new->prev_word;
   bg_lst->head_word->next_word = NULL;
   new->next_word = cur->next_word;
   cur->next_word->prev_word = new;
   cur->next_word = new;
   new->prev_word = cur;
}

wayned@wddami.spoami.com (Wayne Diener) (08/23/90)

>In article <3612@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
>In article <wayned.0932@wddami.spoami.com>,
>wayned@wddami.spoami.com (Wayne Diener) writes:

[ text deleted ]

>The C code presented is inordinately complicated, which makes it hard to
>see what is going on.  Basically, methods of inserting something into a
>sequence using binary search fall foul of one of two problems:
>	- if you use an array, you can *get* anywhere fast,
>	  but it takes O(N) time to move other stuff out of the
>	  way to *insert* anything
>	- if you use a linked list, you can *insert* fast,
>	  but it takes O(N) to *get* anywhere by following links.
>(There is a compromise position:  use a binary tree.  Binary trees make
>an excellent representation of sequences, and handle insertion well.
>That would yield a variant of TreeSort.)

Sorry if the code seems complicated.  As I mentioned, I was a _rank_ amateur
at the time.  (Hopefully, I've moved up to _full_ amateur status by now.)
I find the code easy to understand, but perhaps that's because I wrote it.

>
>Wayne Diener's version of the algorithm runs into the second problem.
>To be sure, it is doing O(NlgN) *comparisons*, but in order to get anywhere
>it has to follow pointers O(N**2) times.  The bottle-neck is the code
>
>	for (i = 0; i < middle; i++) {
>	    current = up ? current->prev_word : current->next_word;
>	    if (current == sortbound) { test = 1; break; }
>	}
>
>So it is an O(N**2) member of the family of insertion sorts.

A couple of points here:

1)  The _worst_ case number of pointer movements for each search is:
    N/2 + N/4 + N/8 + .... = N, and it has to do it N times, yielding
    O(N**2).  However, I believe you'll find that the average case
    will yield O(K * log base 2 (N)) (where K is probably about 1.4)
    number of pointer movements per element.

2) Although it yields only a constant improvement (and doesn't change
   the Big O class), the time cost of a pointer movement is (as a 
   normal rule) _substantially_ less than the time cost of the comparison. 
   If the comparison is quite costly, then for reasonable numbers of N
   (say 10K to 50K max?) the efficiency of the sort would still be
   approximately O(N * Log(N)) _even IF the pointer movement is O(N**2)_.

 
--

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (08/24/90)

In article <wayned.0932@wddami.spoami.com>,
wayned@wddami.spoami.com (Wayne Diener) writes:
> I have been unable to find any references to the sorting algorithm
> used in this program.

It is a member of the _family_ of sorts known as insertion sorts
(as Wayne Diener acknowledges in the name).

> If it's original with (sic) me, I claim a copywrite on it and release
> it for use by anyone.

That's copyRIGHT, and you can't claim copyright on an algorithm,
only on a program (or, sigh, look-and-feel).  What you could try for
is a *patent*.  This is not only fairly obvious, it's Prior Art, but
that hasn't stopped some other software patents...

> I have done empirical comparisons of this sort against bubble,
> insertion, selection and queuemerge sort.  It's a lot faster than
> the first three but slower than the queuemerge sort, however, it
> does the sort "in-place" (requires very little additional memory,
> other than a few more variables).

List merge doesn't require any extra memory either, and it only requires
*half* the number of pointers per record that Bi_In_Sort requires.  If
your method isn't substantially faster than the qsort() that came with
your C compiler (which a simple list merge can beat handsomely) why bother?

The C code presented is inordinately complicated, which makes it hard to
see what is going on.  Basically, methods of inserting something into a
sequence using binary search fall foul of one of two problems:
	- if you use an array, you can *get* anywhere fast,
	  but it takes O(N) time to move other stuff out of the
	  way to *insert* anything
	- if you use a linked list, you can *insert* fast,
	  but it takes O(N) to *get* anywhere by following links.
(There is a compromise position:  use a binary tree.  Binary trees make
an excellent representation of sequences, and handle insertion well.
That would yield a variant of TreeSort.)

Wayne Diener's version of the algorithm runs into the second problem.
To be sure, it is doing O(NlgN) *comparisons*, but in order to get anywhere
it has to follow pointers O(N**2) times.  The bottle-neck is the code

	for (i = 0; i < middle; i++) {
	    current = up ? current->prev_word : current->next_word;
	    if (current == sortbound) { test = 1; break; }
	}

So it is an O(N**2) member of the family of insertion sorts.
-- 
The taxonomy of Pleistocene equids is in a state of confusion.

colin@array.UUCP (Colin Plumb) (08/25/90)

Binary insertion sorts are known.  They are close to
minimum-comparison, but are O(n^2).  Your linked list implementation
traverses, half of the sorted list to find the node to make the first
comparison with, a quarter for the seocnd comaprison, etc., all
traversals being linear.  The total is the length list, or twice the
amount of traversal for a linear insertion sort, but with many fewer
comparisons.

A more usual implementation uses an array, which requires constant
time to do each comparison, for a total of log N to find where the
node goes, but has linear insert time.

There are data structures that allow O(log N) insertion and O(log N)
retrieval of the k'th element, which would make this log N * log N
per node (for N * log N * log N total), but it's simpler just to
use heapsort or something.

From memory: (arrays are 0-based, bottom is a valid element, I'm sure
you can optimize the tail calls and things)

void siftdown(int array[], int top, int bottom)
{
	int next = 2*top+1;

	if (next > bottom)
		return;
	if (next < bottom && array[next+1] > array[next])
		next++;
	if (array[next] > array[top]) {
		int temp = array[next];
		array[next] = array[top];
		array[top] = temp;
		siftdown(array, next, bottom);
	}
}

void heapsort(int array[], int bottom)
{
	int i;
	for (i = bottom/2; i > 0; --i) {
		siftdown(array, i, bottom);
	}
	for (i = bottom; i > 0; --i) {
		int temp = array[i];
		array[i] = array[1];
		array[1] = temp;
		siftdown(array, 1, i-1);
	}
}
-- 
	-Colin

vu0310@bingvaxu.cc.binghamton.edu (R. Kym Horsell) (08/25/90)

In article <wayned.0936@wddami.spoami.com> wayned@wddami.spoami.com (Wayne Diener) writes:
\\\
>   If the comparison is quite costly, then for reasonable numbers of N
>   (say 10K to 50K max?) the efficiency of the sort would still be
>   approximately O(N * Log(N)) _even IF the pointer movement is O(N**2)_.

If *part* of an algorithm is O(n**2), and all other parts are ``<''
O(n**2) then the algorithm is O(n**2). The idea is that O signifies the
asymptotic complexity of the algorithm -- for sufficiently large
n the O(n**2) term will dominate other complexity terms regardless of 
how tiny its ``coefficient'' is wrt that of other parts of the complexity
expression.

If you want to talk an algorithm that operates with ``reasonable'' values 
of some parameter then it is a bit misleading to use the O notation.

-Kym Horsell.

chris@mimsy.umd.edu (Chris Torek) (08/25/90)

Just for fun, here is yet another linked list sort.  This attempts to
use the fewest instructions possible for those n-squared loops :-)

#include <stddef.h>

/*
 * Sort a linked list in which the `next' pointer is the first entry.
 * A pointer to the (new) list head is returned.
 *
 * lsort() performs a binary merge sort.  The length parameter is
 * optional; if 0, lsort runs a first pass over the list to find the
 * length.
 */
struct list {			/* pseudo */
	struct list *next;
};

void *
lsort(list, listlen, compare)
	void *list;
	int listlen;
	register int (*compare)();
{
	struct list *hd;
	register struct list *p, **xp, *a, *b;
	register int i, left, mergelen;
	register struct list **ea, **eb;

	hd = list;
	if (listlen == 0) {
		for (i = 0, p = hd; p; p = p->next)
			i++;
		listlen = i;
	}

	/* if list is empty, this loop does not run */
	for (mergelen = 1; mergelen < listlen; mergelen <<= 1) {
		/*
		 * Merge ceil(listlen/mergelen) lists, pairwise.
		 *
		 * On each trip through the loop below, we split the
		 * list headed by p into two sublists a and b of length
		 * mergelen or less, followed by a trailing part p.
		 * List a will always be complete, but list b may be
		 * short when we near the tail.  (It can even be empty;
		 * we handle this as a special case for speed reasons.)
		 * We then merge lists a and b, sticking each next
		 * element at *xp and tracking xp along.  Eventually
		 * either a or b runs out; we can then tack on what
		 * remains of the other.
		 */
		left = listlen;
		p = hd;
		xp = &hd;
		do {
			/*
			 * Make list a, length mergelen, and figure
			 * out how many are left after that.  If none
			 * or negative, list b will be empty; stop.
			 */
			i = mergelen;
			if ((left -= i) <= 0) {
				*xp = p;
				break;
			}
			for (a = p; --i > 0; p = p->next)
				/* void */;
			ea = &p->next, p = p->next, *ea = NULL;

			/* make list b, length min(mergelen,left) */
			i = mergelen;
			if ((left -= i) < 0)
				i += left;
			for (b = p; --i > 0; p = p->next)
				/* void */;
			eb = &p->next, p = p->next, *eb = NULL;

			/* tail in p, empty iff left<=0 */
			for (;;) {
				/* append from appropriate sublist */
				if ((*compare)((void *)a, (void *)b) <= 0) {
					*xp = a;
					xp = &a->next;
					if ((a = a->next) == NULL) {
						*xp = b;
						xp = eb;
						break;
					}
				} else {
					*xp = b;
					xp = &b->next;
					if ((b = b->next) == NULL) {
						*xp = a;
						xp = ea;
						break;
					}
				}
			}
		} while (left > 0);
	}
	return ((void *)hd);
}
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 405 2750)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (08/26/90)

In article <wayned.0936@wddami.spoami.com>, wayned@wddami.spoami.com (Wayne Diener) writes:
 [I wrote]
> >So it is an O(N**2) member of the family of insertion sorts.

> A couple of points here:

> 1)  The _worst_ case number of pointer movements for each search is:
>     N/2 + N/4 + N/8 + .... = N, and it has to do it N times, yielding
>     O(N**2).  However, I believe you'll find that the average case
>     will yield O(K * log base 2 (N)) (where K is probably about 1.4)
>     number of pointer movements per element.

What we're doing is inserting elements one-by-one into an ordered
collection.  Only the order matters, so we might as well use numbers.
Suppose at some time we have M elements in the ordered collection:
	[1|2|3|...|M]
The cursor will be pointing at the last element we inserted, and since
I'm assuming uniform random input, that is equally likely to be any of
the M items.  The next item will be 0.5, 1.5, 2.5, ... M.5 each with
equal likelihood, and the cursor will have to move to the appropriate
position.  The expected distance is something like

	sum{i=1..M} sum{j=0..M} abs(i-j)
	--------------------------------
	sum{i=1..M} sum{j=0..M} 1

Either there is a mistake in my algebra (which is *very* likely)
or this works out to an average movement of about M/3, for a total
of roughly (1/6)N**2 pointer movements, *average*.

> 2) Although it yields only a constant improvement (and doesn't change
>    the Big O class), the time cost of a pointer movement is (as a 
>    normal rule) _substantially_ less than the time cost of the comparison. 

Yes, of course.  That's why list merge outperforms quicksort.

>    If the comparison is quite costly, then for reasonable numbers of N
>    (say 10K to 50K max?) the efficiency of the sort would still be
>    approximately O(N * Log(N)) _even IF the pointer movement is O(N**2)_.

You can't call something O(NlgN) when you _know_ it contains an O(N**2)
component.  Why not stick with list merge, which not only has no O(N**2)
component, but has a smaller constant factor than quicksort?
-- 
The taxonomy of Pleistocene equids is in a state of confusion.