[net.lang.c] Mathsort

greg@utcsri.UUCP (Gregory Smith) (04/14/86)

In article <2554@utcsri.UUCP> greg@utcsri.UUCP (Gregory Smith)[me] writes:
>Now for something completely different...
>From: g-rh@cca.UUCP (Richard Harter)
>>	Grumble, grumble.  Sorry, this is a real practical problem.
>>	Let me give some context.  Suppose you are doing a high speed
>>	sort of character strings.  Being a clever fellow you have
>>	read the glossed over parts of the literature and have seen
>>	that a math sort is O(n) rather than O(n log n).  You also
>>	notice that it is cheaper to compare 4 bytes at once with
>>	an 32-bit integer compare than to make 4 byte compares.
>
>I'm not sure on this - isn't it ridiculous to do a mathsort of 32-bit
>ints? In O(n) above ( for the mathsort) doesn't n become 2^32 rather
>than the list size? If so, it would of course be even sillier for
>larger ( i.e. reasonably-sized ) char strings.

Sorry for posting stuff in my sleep - I'm straight on this now. A mathsort
is really O(n+m) where n is the list size and m is the difference between the
largest and the smallest numbers in the list. (If you don't know the min and
max, then m is the largest possible range ). Mathsort is only useful when
m is considerably less than n - e.g. a list of 10K numbers, all in the range
1 to 100. ( n=10K,m=100). So when a mathsort is useful, it is O(n). Of
course, you also need a table of size m. In the above example, though, m
is on the order of 2^32 and thus the time would be O(m), since m would
dominate, and and the table would be ridiculously huge. Do you really
think the mathsort would be 'glossed over' as much as it is if it was capable
of sorting arbitrary 32-bit numbers in O(n) time?
-- 
"If you aren't making any mistakes, you aren't doing anything".
----------------------------------------------------------------------
Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg

g-rh@cca.UUCP (Richard Harter) (04/20/86)

In article <> greg@utcsri.UUCP (Gregory Smith) writes:
>
>Sorry for posting stuff in my sleep - I'm straight on this now. A mathsort
>is really O(n+m) where n is the list size and m is the difference between the
>largest and the smallest numbers in the list. (If you don't know the min and
>max, then m is the largest possible range ). Mathsort is only useful when
>m is considerably less than n - e.g. a list of 10K numbers, all in the range
>1 to 100. ( n=10K,m=100). So when a mathsort is useful, it is O(n). Of
>course, you also need a table of size m. In the above example, though, m
>is on the order of 2^32 and thus the time would be O(m), since m would
>dominate, and and the table would be ridiculously huge. Do you really
>think the mathsort would be 'glossed over' as much as it is if it was capable
>of sorting arbitrary 32-bit numbers in O(n) time?
>-- 

	I've already gone over this in another article.  However to deal
with the O(n+m) question:  you map the actual range, m, into a smaller
range by a transformation.  The number of operations is, indeed, O(n),
subject to restrictions on the distribution of keys.

	The subject of math sorts is glossed over.  I suspect that there
there are two reasons for this.  The first is that the theory of
comparison sorts is (a) elegant and (b) independent of the distribution
of key values.  This is a real factor -- many authors focus on material
where the underlying theory is interesting from a theoretical point of
view.  The second reason is that implementation of a good math sort is
tricky -- it is very easy to have performance blow up for a variety of
subtle reasons.

	On a more general note, it is naive to suppose that the most
commonly used techniques and algorithms are the best available.  There
are so many possible algorithms and so much that has been done that
everybody is drowned in information.

		Richard Harter, SMDS Inc.