[comp.lang.misc] Lying

jlg@lanl.gov (Jim Giles) (11/16/90)

A certain other contributor to this newsgroup claims to have _never_
asserted that sorting could be done in linear time _without_ mentioning
it was linear in the number of bytes and not the number of keys.
This was the _original_ remark that I referred to when I mentioned
the issue to begin with:

  > (Actually, you might speed the _test_ up by sorting the addresses of all
  > potentially aliased objects and then moving through the sorted list and
  > comparing adjacent items.  This is still _at_least_ N*log(N).

  Great, now you don't even know how to sort a collection of numbers in
  linear time.

This was a direct quote from an email message dated 23 Jul, 1990.
The person who sent the message challenged me to find any quote that
he made of the above variety.  I took that as implicit permission to
quote private mail communications over the net.

As usual, I don't expect _any_ apology (or indeed, any response at all)
from the other party to this travesty.  _I_ apologise to all the other
readers of the net for ever having responded to the person in question
at all.

J. Giles

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/16/90)

The saga continues: Jim finally finds a reference for Dan's supposed
statement that sorting is linear time (though it was in mail, not news).
But he fails to notice the word ``numbers''! Fans, keep the word
``numbers'' in your head. Remember ``numbers.'' Remember that machine
addresses are fixed length (typically length 4). Remember that sorting
numbers is not the same as sorting in general. Dan points out Jim's
mistake in excruciating detail and flames Jim for it. Life goes on.

In article <6097@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> A certain other contributor to this newsgroup claims to have _never_
> asserted that sorting could be done in linear time _without_ mentioning
> it was linear in the number of bytes and not the number of keys.

Not in news, anyway. But I'll accept your supposed example, then flame
you to death for your mistake.

>   > (Actually, you might speed the _test_ up by sorting the addresses of all
>   > potentially aliased objects and then moving through the sorted list and
>   > comparing adjacent items.  This is still _at_least_ N*log(N).
>   Great, now you don't even know how to sort a collection of numbers in
>   linear time.

Notice, everybody, the word ``numbers'' there. ``Sort a collection of
numbers.'' Notice that both Jim and I are talking about machine
addresses, which are numbers typically of length 2 or 4 bytes.

> This was a direct quote from an email message dated 23 Jul, 1990.

Correct.

> I took that as implicit permission to
> quote private mail communications over the net.

No problem. Fair use is rarely a problem in quoting short sections of
e-mail. Libel is.

> As usual, I don't expect _any_ apology (or indeed, any response at all)
> from the other party to this travesty.

I will not apologize, because you have not demonstrated me wrong.
Please, somebody with some intelligence, confirm that the word
``numbers'' does indeed appear in the above description? That I was
talking about sorting numbers, not about sorting in general? That my
statement is correct, and that sorting (fixed-length, as in machine)
numbers is, indeed, a linear problem?

Somebody who's been paying attention to this discussion, please confirm
that Jim has accused me of saying that sorting is linear time? Please
express an opinion on whether the above quote---which has to do with
sorting a collection of numbers, specifically machine addresses---is or
is not applicable to Jim's accusation that I have said sorting is linear
time.

Jim, I expect an apology from you. What you have just done in this
discussion is an example of what I mean by perversion. You take an issue
(whether my statements about sorting have or have not been correct). You
then change the issue into something else. By failing to quote enough of
previous articles, you introduce enough discontinuity that some readers
don't notice your perversion. And you continue to pervert what was said,
in order to cover your wounded ass.

I have not said that sorting is linear without qualification. I believe
that in news I have not said that sorting is linear without adding the
specific phrase ``in the number of bytes,'' though I am not so sure
about this. I am sure that my perceptions of radix sorting are correct.
I see no explanation for Jim's article but intentional perversion.

---Dan

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (11/16/90)

In article <6097@lanl.gov>, jlg@lanl.gov (Jim Giles) writes:
>G> (Actually, you might speed the _test_ up by sorting the addresses of all
>G> potentially aliased objects and then moving through the sorted list and
>G> comparing adjacent items.  This is still _at_least_ N*log(N).

 X> Great, now you don't even know how to sort a collection of numbers in
 X> linear time.

In all fairness, the quote from X here explicitly says "numbers",
G having said "addresses", and we may reasonably take it that this
means hardware addresses and hardware numbers.  (I understood X's
claim not to have omitted the qualification "in the number of bytes"
to refer to generic sorts that didn't know they had numbers to
work with.  Evidently I misunderstood.)

There is a new book
	Introduction to Algorithms
	Thomas H. Cormen, Charles E. Leiserson, & Ronald L. Rivest
	MIT Press, 1990
	ISBN 0-262-03141-8	(MIT Press, world-wide)
	ISBN 0-07-013143-0	(McGraw-Hill, USA)
I like it a lot.  Section 9.4 says on p180
	Bucket sort runs in linear time on the average.  ...
	bucket sort assumes that that the input is generated by
	a random process that distributes elements uniformly
	over the interval [0,1).
They explicitly use the formula O(n), where n is the number of keys.

Clearly addresses aren't random floating-point numbers, but
	(ptr - ptrmin)/scale
where double scale = ptrmax - ptrmin;
_is_ in the interval [0,1) and might be close enough to random.

This is not a defence of X.  I'm just pointing out to anyone who
might have missed it that there _is_ a method described in the
literature as sorting numbers in time linear in the number of keys.
-- 
The problem about real life is that moving one's knight to QB3
may always be replied to with a lob across the net.  --Alasdair Macintyre.

oz@yunexus.yorku.ca (Ozan Yigit) (11/17/90)

In article <4298@goanna.cs.rmit.oz.au> ok@goanna.cs.rmit.oz.au
(Richard A. O'Keefe) writes:

>There is a new book
>	Introduction to Algorithms
>	Thomas H. Cormen, Charles E. Leiserson, & Ronald L. Rivest
>	MIT Press, 1990
>	ISBN 0-262-03141-8	(MIT Press, world-wide)
>	ISBN 0-07-013143-0	(McGraw-Hill, USA)
>I like it a lot. ...

Yes, I think it is a very nice book also. Kinda big, heavy and expensive
but well worth it. 

The title of Chapter 9 is "Sorting in Linear Time".
Last paragraph of the intro reads:

	Sections 9.2, 9.3 and 9.4 examine three sorting algorithms --
	counting sort, radix sort, and bucket sort -- that run in linear
	time. Needless to say, these algorithms use operations other than
	comparisons to determine the sorted order.  Consequently, the O(n
	log n) lower bound does not apply to them.

I think Dan does seem to know what he is talking about, regardless of his
overall level of (ahem) obnoxiousness. :-)

oz
---
The king: If there's no meaning             internet:  oz@nexus.yorku.ca
in it, that saves a world of trouble        usenet: ....!utai!yunexus!oz
you know, as we needn't try to find any.    bitnet: oz@[yulibra|yuyetti]
Lewis Carroll (Alice in Wonderland)         Phonet: +1 416 7365257x33976

rh@smds.UUCP (Richard Harter) (11/17/90)

In article <4298@goanna.cs.rmit.oz.au>, ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:

> I like it a lot.  Section 9.4 says on p180
> 	Bucket sort runs in linear time on the average.  ...
> 	bucket sort assumes that that the input is generated by
> 	a random process that distributes elements uniformly
> 	over the interval [0,1).
> They explicitly use the formula O(n), where n is the number of keys.

The good news is that you don't need uniformity - almost any relatively
smooth distribution will do.  The bad news is that many real life
sorting problems are distinctly not smooth.

It should also be noted that bucket sorts, distribution sorts, math sorts,
et. al. are linear only within a range.  [If the file size is large enough
the implicit assumptions about short data access times fail.]
-- 
Richard Harter, Software Maintenance and Development Systems, Inc.
Net address: jjmhome!smds!rh Phone: 508-369-7398 
US Mail: SMDS Inc., PO Box 555, Concord MA 01742
This sentence no verb.  This sentence short.  This signature done.

dbc@bushido.uucp (Dave Caswell) (11/18/90)

.
.This is not a defence of X.  I'm just pointing out to anyone who
.might have missed it that there _is_ a method described in the
.literature as sorting numbers in time linear in the number of keys.


There isn't anything special about that book; many books have sections
about linear time sorts.  What's scary is that some people don't seem
to know it.  I feel some justification for my practice of discussing
sorting complexity during job interviews.


-- 
David Caswell                             dbc%bushido.uucp@umich.edu

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/20/90)

In article <246@smds.UUCP> rh@smds.UUCP (Richard Harter) writes:
> The good news is that you don't need uniformity - almost any relatively
> smooth distribution will do.  The bad news is that many real life
> sorting problems are distinctly not smooth.

The point of what I call ``adaptive distribution sort'' is that MSD
radix sorting can take advantage of all the same tricks as adaptive
quadrature. But any MSD radix sort is linear in the number of bytes.

> It should also be noted that bucket sorts, distribution sorts, math sorts,
> et. al. are linear only within a range.  [If the file size is large enough
> the implicit assumptions about short data access times fail.]

All immediately accessible storage, whether internal or external, is
accessible within a fixed time. Any computational model that accurately
reflects this fact will admit sorting linear in the number of bytes.

In practice the large constant factor makes record movement desirable,
so a properly designed merge-radix sort is one of the best choices for
external data sets.

---Dan

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (11/26/90)

In article <246@smds.UUCP>, rh@smds.UUCP (Richard Harter) writes:
> It should also be noted that bucket sorts, distribution sorts, math sorts,
> et. al. are linear only within a range.  [If the file size is large enough
> the implicit assumptions about short data access times fail.]

This applies to *all* "in-core" sorting methods.  It's not that the sorting
method breaks down, but that the "array access is constant time" assumption
breaks down.  That also applies to things like matrix multiplication!
One can use samples (as one does in linear-time selection algorithms) to
partition the collection into manageable pieces.

-- 
I am not now and never have been a member of Mensa.		-- Ariadne.