[net.math] YASMP

throopw@rtp47.UUCP (Wayne Throop) (06/28/85)

> 	Finally, radix sorting is O(n log n), at best.  The
> reason is that the keys must be at least log n bits long to
> represent n distinct keys.
> 			Richard Harter, SMDS Inc.
> 			(Running at g-rh@cca-unix)

Not quite.  Radix sorting is O(n * m), where n is the number of records,
and m is some multiplier derived from the length of the key.  However,
the multiplier m is NOT related to the number of records, but to the way
the instance of the radix sort algorithm chooses to view the keys.
While it is true that the keys must be at least log n bits long to get
unique keys, this does not limit the speed of the radix sort, since a)
more than one bit is processed "at a time", and b) the keys need not be
unique.

Again, *comparison sorting* is limited to O(n log n), because of the
fundamental nature of sorting by comparing keys.  Other forms of sorting
are *not* limited to O(n log n), but they *cannot* depend (totally) on
comparison.  Radix sorting does not in any way depend on comparing keys,
in the same sense that a person processing mail doesn't need to compare
the zip codes on one of the letters to the zip codes on the others.

[Also, read Knuth's "Sorting and Searching"... it contains
 all of the above, and a lot more.  It explicitly states these
 properties of radix sort.]

[As a trivial example to show that the m in the O(n * m) radix sort
 can be smaller than log n, with more than n possible keys, consider
 sorting keys which are integers from 0 to 3.  There are (log2 4) bits
 in these keys, but clearly a radix sort in this case still takes only n
 operations, not 2n.]
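As a concrete C sketch of that trivial example (the function name and
array sizes are my own), one pass tallies the keys and one pass rewrites
them in order, with no key-to-key comparison anywhere:

```c
#include <stddef.h>

/* One-pass distribution (counting) sort for keys in the range 0..3.
 * One pass tallies the keys, one pass rewrites them in ascending
 * order; no key is ever compared against another key. */
void sort03(int *keys, int n)
{
    int count[4] = {0, 0, 0, 0};
    int i, k, out = 0;

    for (i = 0; i < n; i++)         /* tally each key */
        count[keys[i]]++;

    for (k = 0; k < 4; k++)         /* rewrite in ascending order */
        for (i = 0; i < count[k]; i++)
            keys[out++] = k;
}
```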
-- 
Wayne Throop at Data General, RTP, NC
<the-known-world>!mcnc!rti-sel!rtp47!throopw

sjs@u1100s.UUCP (Stan Switzer) (06/28/85)

In article <80@rtp47.UUCP> throopw@rtp47.UUCP (Wayne Throop) writes:
> > 	Finally, radix sorting is O(n log n), at best.  The
> > reason is that the keys must be at least log n bits long to
> > represent n distinct keys.
> > 			Richard Harter, SMDS Inc.
> 
> Not quite.  Radix sorting is O(n * m), where n is the number of records,
> and m is some multiplier derived from the length of the key.  However,
> the multiplier m is NOT related to the number of records, but to the way
> the instance of the radix sort algorithm chooses to view the keys.
> While it is true that the keys must be at least log n bits long to get
> unique keys, this does not limit the speed of the radix sort, since a)
> more than one bit is processed "at a time", and b) the keys need not be
> unique.
> ....
> [As a trivial example to show that the m in the O(n * m) radix sort
>  can be smaller than log n, with more than n possible keys, consider
>  sorting keys which are integers from 0 to 3.  There are (log2 4) bits
>  in these keys, but clearly a radix sort in this case still takes only n
>  operations, not 2n.]
> -- 
> Wayne Throop at Data General, RTP, NC

Sorry Wayne, but this one is getting old.

1) How many bins?    Call it 'a'
2) How much duplication?   d = <unique keys>/n   ( n = #records )

Notice that:

     n log_a(dn)  =  (log_a 2) * n * (log_2 d + log_2 n)

But since we have 'a' and 'd' (and 2 :-) constant,
it is pretty clear that

    RADIX SORT is O(n log_a(dn)) == O(n log n)

Enough?

Stan Switzer  ihnp4!u1100s!sjs  "Now where DID I leave my fish?"

g-rh@cca.UUCP (Richard Harter) (06/30/85)

Me:

>> 	Finally, radix sorting is O(n log n), at best.  The
>> reason is that the keys must be at least log n bits long to
>> represent n distinct keys.

Wayne Throop:

>Not quite.  Radix sorting is O(n * m), where n is the number of records,
>and m is some multiplier derived from the length of the key.  However,
>the multiplier m is NOT related to the number of records, but to the way
>the instance of the radix sort algorithm chooses to view the keys.
>While it is true that the keys must be at least log n bits long to get
>unique keys, this does not limit the speed of the radix sort, since a)
>more than one bit is processed "at a time", and b) the keys need not be
>unique.
>
>Again, *comparison sorting* is limited to O(n log n), because of the
>fundamental nature of sorting by comparing keys.  Other forms of sorting
>are *not* limited to O(n log n), but they *cannot* depend (totally) on
>comparison.  Radix sorting does not in any way depend on comparing keys,
>in the same sense that a person processing mail doesn't need to compare
>the zip codes on one of the letters to the zip codes on the others.
>
>[Also, read Knuth's "Sorting and Searching"... it contains
> all of the above, and a lot more.  It explicitly states these
> properties of radix sort.]
>
>[As a trivial example to show that the m in the O(n * m) radix sort
> can be smaller than log n, with more than n possible keys, consider
> sorting keys which are integers from 0 to 3.  There are (log2 4) bits
> in these keys, but clearly a radix sort in this case still takes only n
> operations, not 2n.]

You are quite correct in pointing out that the coefficient of n in the
lower bound for the radix sort depends on the number of unique keys 
rather than the size of n.  However the point about processing more
than one bit at a time is not well taken.  Suppose that we have M
distinct keys and that m is the number of bits needed to express the
M keys.  m is bounded below by log2(M) but may be greater.  Suppose
that we process p bits at a time.  Then the sort requires O(n * m / p)
operations, which is still O(n * m).  [n and m are the variables for
the O operator; p is a constant.]  Changing the base of the log in
an O(n log n) algorithm does not change the order of the algorithm,
even though it may speed it up by a magnificent constant factor.

It is also, I think, a little misleading to say that radix sorting
does not use comparison.  In effect, each key in the list is compared
with a test value in each pass.  (This is concealed in the usual
low end to high end implementation of a radix sort; however it is
explicit if you use the equivalent high end to low end implementation
with sublists.)  Quicksort also compares each key against a test value;
the difference is that the quicksort test values are always members
of the set of keys.

The interesting thing about the radix sort is that it permits us to
do multiple comparisons in a single operation.  (This can be done
either sorting low end to high end or high end to low end.)  In
practice radix sorts win (where applicable) for this reason.  For
example, I can sort a list of 32 bit integers using 4 passes with
a radix sort which processes 8 bits at a time.  Using quicksort it
would take me 10 passes (on average) to sort 1024 keys.  Note also
that it would take 32 passes with the radix sort if I processed one
bit at a time.  The fundamental reason that one can 'batch' comparisons
in an operation is that the radix sort uses data-independent test
values rather than test values drawn from data as is done in
'comparison' sorts.  The data-independent test values used by the
radix sort are such that the intrinsic parallelism (at the bit
level within a word) of the CPU can be exploited.
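A C sketch of that scheme (my names; assumes 32-bit unsigned keys):
four counting passes, one per byte, however large n is:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* LSD radix sort of 32-bit keys, 8 bits per pass: exactly 4 passes
 * over the n records, regardless of n.  Each pass distributes the
 * keys into 256 bins using a data-independent test (the byte value),
 * never a key-to-key comparison. */
void radix_sort32(uint32_t *keys, int n)
{
    uint32_t *tmp = malloc(n * sizeof *tmp);
    int pass, i;

    for (pass = 0; pass < 4; pass++) {
        int shift = 8 * pass;
        int count[256] = {0};
        int offset[256];

        for (i = 0; i < n; i++)     /* tally this byte of each key */
            count[(keys[i] >> shift) & 0xff]++;

        offset[0] = 0;              /* first slot for each of the 256 bins */
        for (i = 1; i < 256; i++)
            offset[i] = offset[i - 1] + count[i - 1];

        for (i = 0; i < n; i++)     /* stable distribution into bins */
            tmp[offset[(keys[i] >> shift) & 0xff]++] = keys[i];

        memcpy(keys, tmp, n * sizeof *keys);
    }
    free(tmp);
}
```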

			Richard Harter
			g-rh@cca.UUCP

roy@gitpyr.UUCP (Roy Mongiovi) (06/30/85)

> Sorry Wayne, but this one is getting old.
> 
> 1) How many bins?    Call it 'a'
> 2) How much duplication?   d = <unique keys>/n   ( n = #records )

What does duplication have to do with anything?

> 
> Notice that:
> 
>      n log_a(dn)  =  (log_a 2) * n * (log_2 d + log_2 n)
> 
> But since we have 'a' and 'd' (and 2 :-) constant,
> it is pretty clear that
> 
>     RADIX SORT is O(n log_a(dn)) == O(n log n)
> 
> Enough?
> 
> Stan Switzer  ihnp4!u1100s!sjs  "Now where DID I leave my fish?"

Consider the lowly card sorter.  You put your stack of cards through the
machine once for each column in the key on which you want to sort, collect
the bins in order, and run the cards through again using the next column.

It takes 80 runs through the machine to sort a stack of cards on all columns.
It doesn't matter whether you have 10 or 1000000 cards.  They go through the
machine 80 times.

That's O(80 * n), since all "n" cards must go through the machine 80 times.

Radix sort is NOT O(n log n).
-- 
Roy J. Mongiovi.	Office of Computing Services.		User Services.
Georgia Institute of Technology.	Atlanta GA  30332.	(404) 894-6163
 ...!{akgua, allegra, amd, hplabs, ihnp4, masscomp, ut-ngp}!gatech!gitpyr!roy

			The Map is Not the Territory

chuck@dartvax.UUCP (Chuck Simmons) (07/02/85)

> > > 	Finally, radix sorting is O(n log n), at best.  The
> > > reason is that the keys must be at least log n bits long to
> > > represent n distinct keys.
> > > 			Richard Harter, SMDS Inc.
> > 
> > Not quite.  Radix sorting is O(n * m) ... the keys need not be unique.
> > Wayne Throop at Data General, RTP, NC
> 
> Sorry Wayne, but this one is getting old.
> 1) How many bins?    Call it 'a'
> 2) How much duplication?   d = <unique keys>/n   ( n = #records )
> Notice that:
>      n log_a(dn)  =  (log_a 2) * n * (log_2 d + log_2 n)
> But since we have 'a' and 'd' (and 2 :-) constant,
> it is pretty clear that
>     RADIX SORT is O(n log_a(dn)) == O(n log n)
> Enough?
> Stan Switzer  ihnp4!u1100s!sjs  "Now where DID I leave my fish?"

Oops!  'd' is not constant.  'd' is more or less inversely proportional
to 'n'.  Typically, when we use a radix sort, 'n' will be large compared
to the number of unique keys, 'd' will be small, and 'log d' will be
negative.
     For the cases when we will be interested in using a radix sort, we
will have the number of <unique keys> be about a**k for some positive integer
'k'.  In this case:

    n log_a(dn) = n log_a((a**k/n) * n) = n log_a(a**k) = nk

Now if we hold k constant and allow n to grow toward infinity, we have
a sorting algorithm which is O(n).  All this is pretty much as Wayne
implied.  

-- Chuck

wcs@ho95e.UUCP (x0705) (07/06/85)

From the ongoing discussions of RADIX sort:
> >> 	Finally, radix sorting is O(n log n), at best.
> 
> >Not quite.  Radix sorting is O(n * m), where n is the number of records,
> >and m is some muliplier derived from the length of the key.  ....

Let's compare apples with apples.  There are at least three important
measurements on input length: (For now assume we're sorting character strings)
	n - the number of records
	m - the "length" of a record, either average or max or whatever...
	N - the total amount of input data, in bytes or bits.
		N =~= n*m

A crudely written radix sort uses n * m steps, where m is the length of the
longest key.  This is how a mechanical card sorter works - sort the whole
batch on the mth column, then the m-1th, then... the 1st.  The total effort
is > N steps, possibly >>N if you have nasty data.

An efficiently written radix sort uses n * m steps where m is less than the
average key length.  Total work <=N steps, and does especially well on the
kind of data that trashes the crude version.  More later...

Radix sorts can be done in fewer steps if you use a bigger radix, i.e. use
2 or 4 bytes of the data instead of 1 at a time.  This uses more memory.

A comparison sort needs at least n log n comparisons, where each comparison
takes m steps, for a total of <= N log n work.  m is less than the average
key length, and is the "average" number of identical bytes between the two
words being compared.  The basic comparison algorithm looks like this:

	compare(string1, string2)
	char string1[MAX], string2[MAX];
	{
		int i;
		for (i=0 ; i<MAX ; i++)
		{
			if (string1[i] < string2[i]) return (LESSTHAN);
			if (string1[i] > string2[i]) return (GREATER);
		}
		return (EQUAL);
	}
Of course you would write better C code than that, and you can compare 2 or
4 bytes at once with no space penalty.
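For instance, the same byte-wise ordering can be had from memcmp, which
library implementations typically do several bytes at a time internally
(the -1/0/1 normalization is just my convention):

```c
#include <string.h>

/* Same comparison via memcmp; the byte-wise ordering is identical
 * to the explicit loop, but the library may compare a word at a
 * time under the hood. */
int compare(const char *string1, const char *string2, int max)
{
    int r = memcmp(string1, string2, max);
    return (r < 0) ? -1 : (r > 0) ? 1 : 0;
}
```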

So the radix sort takes m*n steps, while the comparison sort takes m*n*log n,
for slightly different m's.
The comparison sort requires about N + O(n)*POINTERSIZE space;
the radix sort can require   about N + O(m+n+BINSIZE)*POINTERSIZE space,
where BINSIZE is the number of bins for your radix (e.g. 256).

> >[Also, read Knuth's "Sorting and Searching"... 
Yes!

============
Here's a rough description of the efficient radix sort (written in
pseudo-C - please excuse any dangling pointers and handwaving )

struct LIST {	char *	string;
		struct LIST * next_guy;
		struct LIST * last_item_in_list;
	    };
SORT(input)	/* The main sort routine - calls the real sort1 routine */
	char **input;
{
	struct LIST * input_list, * output_list;
	make input into a LIST;
	output_list = sort1( input_list, 0)
	return ( make_output_list_into_a_char** );
}
struct LIST *
sort1( input, column )
	struct LIST * input;
	int column;
{
	struct LIST * answer,
		    * bucket[256],	/* one for each ASCII character */
		    * item,
		    * next;
	int i;

	for (i=0; i<256; i++)		/* start with empty buckets */
		bucket[i] = NULL;

	for (item=input; item!=NULL; item = next)
	{
		next = item->next_guy;	/* save before append unlinks it */
		append( bucket[item->string[column]], item);
		/* put the item in the bucket for the columnth character */
	}

	answer = NULL;
	for ( i=0; i<256; i++ )	/* for each bucket, do */
	{
		if (bucket[i]==NULL) continue;		/* empty bucket */
		if (bucket[i]->next_guy == NULL)	/* one item	*/
			append( answer, bucket[i] );
		else					/* many		*/
			append( answer, sort1( bucket[i], column+1 ) );
	}
	return (answer);
}

void append( list, item )
	struct LIST * list;
	struct LIST * item;
{
	Append item to end of list by magic in O(1) time;
	/* really should be named (nconc ) or (tconc ) if you like LISP	*/
}
-- 
Bill Stewart, AT&T Bell Labs, Holmdel NJ 1-201-949-0705 ihnp4!ho95c!wcs

g-rh@cca.UUCP (Richard Harter) (07/09/85)

One further note on Bill Stewart's nice article on radix/comparison
sorting:  Bill says that a comparison sort is m*n*log n where n is
the number of records and m is, roughly, the key length, and that
radix sorting is m*n.  (The m's are somewhat different in each case.)
Bill restricts his discussion to character strings.  He gives code for
the comparison step in comparison sorts and pseudocode for the radix
sort.

In principle the 'm' of the comparison sort can be reduced to O(1).
The reason is that in the early passes almost all comparisons differ
in the first byte (assuming reasonably random keys).  In later passes
the byte at which they differ is later than the first.  Within
a sublist, however, the high order bytes of the keys will all be the
same so the comparison loop can start at a later point.  (Of course,
this amounts to using radix sort techniques in a comparison sort.)
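A C sketch of the trick (illustrative; 'skip' would be maintained by
the surrounding sort as it partitions into sublists):

```c
#include <string.h>

/* Within a sublist whose keys are known to agree on their first
 * 'skip' bytes, the comparison can begin at byte 'skip' instead of
 * byte 0.  Assumes fixed-length keys of 'len' bytes. */
int compare_from(const char *a, const char *b, int skip, int len)
{
    return memcmp(a + skip, b + skip, len - skip);
}
```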

				Richard Harter
