[comp.lang.c] sorting strings

jimmy@blake.acs.washington.edu (Jim Li) (11/17/89)

I want to sort 10000 records, each of 80 bytes long, in ascending order.
I put all the records in 'record' which was defined as follows:

char *record[10000];

What sorting algorithm shoud I use? Should I use the built-in 'qsort()'?
How about mergesort and heapsort?
(I want my program to run as FAST as possible!)

Thanks plenty.

Jim.
.

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (11/18/89)

In article <4496@blake.acs.washington.edu> jimmy@blake.acs.washington.edu (Jim Li) writes:

| What sorting algorithm shoud I use? Should I use the built-in 'qsort()'?
| How about mergesort and heapsort?
| (I want my program to run as FAST as possible!)

  One technique commonly used for sorting strings is to create a vector
of pointers to the individual strings and then sort the pointers. You
can then write or move the data is you must.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
"The world is filled with fools. They blindly follow their so-called
'reason' in the face of the church and common sense. Any fool can see
that the world is flat!" - anon

CMH117@PSUVM.BITNET (Charles Hannum) (11/18/89)

In article <1637@crdos1.crd.ge.COM>, davidsen@crdos1.crd.ge.COM (Wm E Davidsen
Jr) says:
>
>In article <4496@blake.acs.washington.edu> jimmy@blake.acs.washington.edu (Jim
>Li) writes:
>
>| What sorting algorithm shoud I use? Should I use the built-in 'qsort()'?
>| How about mergesort and heapsort?
>| (I want my program to run as FAST as possible!)
>
>  One technique commonly used for sorting strings is to create a vector
>of pointers to the individual strings and then sort the pointers. You
>can then write or move the data is you must.

This is commonly known as an "indexed sort," and speeds up sorting by only
moving pointers around, rather than the actual blocks of data.  However,
this does not answer his question about the actual sorting algorithm.  I
I will leave *that* for someone else to answer.

--
- Charles Martin Hannum II       "Klein bottle for sale ... inquire within."
    (and PROUD OF IT!!!)         "To life immortal!"
  c9h@psuecl.psu.edu             "No noozzzz izzz netzzzsnoozzzzz..."
  cmh117@psuvm.psu.edu           "Mem'ry, all alone in the moonlight ..."

jmik@topaz.rutgers.edu (Rev. Joseph Miklojcik) (11/21/89)

[ I haven't read too far ahead in this group, so this may be redundant
  information.  Bear with me.  ]

> I want to sort 10000 records, each of 80 bytes long, in ascending order.

If you trust qsort(), it's a clear win.  It is possible that qsort() right off
the shelf may not use an efficent partitioning method, and may recurse on
small sublists.  This could make it worse than the other D&C sorts you mention.

Even if you don't trust the c library qsort(), implementing your own is still
a good idea if you are after pure speed.  The quicksort algorithm only stacks
indicies (rather than mergesort which stacks sublists), which will cause
minimal paging.  And it's faster than anything we can draw a lower bound on.

If you do have problems with thrashing on whatever machine you are using, you
might consider heapsort, which is entirely in-place.  But I doubt that should
be necessary.

If you do end up implementing your own qsort, the textbook for my algorithm
class has a very comprehensive list of tricks you can use to make qsort work
better.  "Computer Algorithms; Introduction to Design and Analysis", 2nd
edition, Saara Baase, Addison Wesley, Reading MASS.

(hope this helps)

--joe			joe@dot.unipress.com