[comp.text.tex] sorter for large ascii files for msdos

spel@hippo.ru.ac.za (Dr. E.W. Lisse) (04/12/91)

Hi,

I need a SORT program that can handle a 20000-line ASCII file and is
reasonably fast. I do not have ftp access, nor can I receive
comp.ibm.binary (or whatever it is called), as we have a phone link to
the US and it carries too much traffic for the finance section :-)-O

I used Timo Salmi's TSCHEK recently and it is very nice. I read in the
commands from LATEX.TEX, deleted the internal commands (\??@????) and
added the rest to the word list. Unfortunately I have to resort to very
ugly tricks to get the stuff sorted properly (like reading it into
databases and stuff :-)-O).

Unfortunately my sorter sorts like this:

things
thing

instead of

thing
things

It is however quite fast: 56 seconds to sort 13218 records
(127288 bytes) with 232618 comparisons, at least half of that time
spent reading from and writing to the 20ms hard disk.
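
For illustration, the ordering wanted above ("thing" before "things")
can be sketched in Python, a language chosen here for brevity and not
one mentioned in this thread. The cause of the wrong order in the
sorter described above is not known; this sketch only assumes that the
end-of-word case is the one being mishandled. The names compare_words
and sort_words are hypothetical.

```python
from functools import cmp_to_key

def compare_words(a, b):
    """Return negative, zero, or positive, in the style of C's strcmp."""
    # Compare character by character over the common length.
    for ca, cb in zip(a, b):
        if ca != cb:
            return -1 if ca < cb else 1
    # No difference found: one word is a prefix of the other (or they
    # are equal), so the shorter word must come first.
    return len(a) - len(b)

def sort_words(words):
    # sorted() is stable and accepts a comparison function via cmp_to_key.
    return sorted(words, key=cmp_to_key(compare_words))

print(sort_words(["things", "thing", "the", "thin"]))
# prints ['the', 'thin', 'thing', 'things']
```

Python's built-in string comparison already orders a prefix before the
longer word, so the explicit function is only there to show the rule.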

SO, please send me the host, subdirectory and name of the ultimate
sorter, or email me an offer to email it to me. Sources would of
course be best (so I learn some :-)-O), Tube-c or tube-pascal, but
binaries will do as well.

regards, el

ps: followup can be directed to these newsgroups, as I DO read them
:-)-O


--
Dr. Eberhard W. Lisse       (spel@hippo.ru.ac.ZA)
Katatura State Hospital     (formerly extel@quagga.ru.ac.za)
Private Bag 13215           (Real Soon Now ...  el@lisse.NA)
Windhoek, Namibia           (no FTP yet. [This is Africa :-)-O])

spel@hippo.ru.ac.za (Dr. E.W. Lisse) (04/13/91)

In <spel.671470036@hippo> spel@hippo.ru.ac.za (Dr. E.W. Lisse) writes:

>Hi,

>I need a SORT program that can handle a 20000-line ASCII file and is
>reasonably fast. I do not have ftp access, nor can I receive
>comp.ibm.binary (or whatever it is called), as we have a phone link to
>the US and it carries too much traffic for the finance section :-)-O

Today I discovered I need one that can handle 50.000 lines (see below).

>I used Timo Salmi's TSCHEK recently and it is very nice. I read in the
>commands from LATEX.TEX, deleted the internal commands (\??@????) and
>added the rest to the word list. Unfortunately I have to resort to very
>ugly tricks to get the stuff sorted properly (like reading it into
>databases and stuff :-)-O).

I dug around in my ZIPs and found that other nice speller, MicroSpell
from MicroEmacs. I reread the docs and found a way of decompressing the
dictionary to ASCII (I always maintained it pays to read the manuals
:-)-O). Now I have merged the two dictionaries (13.000 and 48.000 words,
400KB).
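
As a rough sketch of such a merge, assuming both dictionaries are
plain word lists that are already sorted, the two can be combined in
one pass while dropping words that appear in both. This is Python, not
a tool from this thread, and merge_word_lists and the sample words are
made up for the example.

```python
import heapq

def merge_word_lists(first, second):
    """Merge two already-sorted word lists, dropping duplicates."""
    merged = []
    # heapq.merge walks both inputs in order without re-sorting them.
    for word in heapq.merge(first, second):
        # Duplicates arrive next to each other, so comparing with the
        # last word kept is enough to skip them.
        if not merged or merged[-1] != word:
            merged.append(word)
    return merged

small = ["latex", "thing", "things"]   # hypothetical sample words
large = ["apple", "thing", "zebra"]    # hypothetical sample words
print(merge_word_lists(small, large))
# prints ['apple', 'latex', 'thing', 'things', 'zebra']
```

If the inputs are not sorted to begin with, the result will not be
either, and a full sort of the combined list is needed instead.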

>Unfortunately my sorter sorts like this:

>things
>thing

>instead of

>thing
>things

>It is however quite fast: 56 seconds to sort 13218 records
>(127288 bytes) with 232618 comparisons, at least half of that time
>spent reading from and writing to the 20ms hard disk.

It barfs now on 61.000 words. (This is forgivable :-)-O)
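
Why it fails at that size is not stated. Assuming the limit is memory,
one common way around it is an external merge sort: sort the file in
pieces that fit in memory, write each piece to disk, then merge the
pieces. A minimal Python sketch under that assumption follows; the
function name, the chunk size and the ".chunk" suffix are all
hypothetical, and this is not the sorter asked for in this thread.

```python
import heapq
import itertools
import os
import tempfile

def read_words(f):
    """Yield the lines of an open file without their newlines."""
    for line in f:
        yield line.rstrip("\n")

def external_sort(in_path, out_path, chunk_lines=5000):
    """Sort a text file line by line while holding one chunk in memory."""
    chunk_paths = []
    with open(in_path) as src:
        while True:
            # Read at most chunk_lines lines from the input.
            chunk = list(itertools.islice(read_words(src), chunk_lines))
            if not chunk:
                break
            chunk.sort()
            # Write the sorted chunk to its own temporary file.
            fd, path = tempfile.mkstemp(suffix=".chunk", text=True)
            with os.fdopen(fd, "w") as tmp:
                tmp.writelines(word + "\n" for word in chunk)
            chunk_paths.append(path)

    # Merge the sorted chunk files; heapq.merge reads them lazily,
    # so only one line per chunk is in memory at a time.
    files = [open(path) for path in chunk_paths]
    try:
        with open(out_path, "w") as dst:
            for word in heapq.merge(*(read_words(f) for f in files)):
                dst.write(word + "\n")
    finally:
        for f in files:
            f.close()
        for path in chunk_paths:
            os.remove(path)
```

A call such as external_sort("WORDS.TXT", "SORTED.TXT") would be the
usage, with both file names invented for the example. With 61.000 lines
and the default chunk size this creates 13 temporary files.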

>SO, please send me the host, subdirectory and name of the ultimate
>sorter, or email me an offer to email it to me. Sources would of
>course be best (so I learn some :-)-O), Tube-c or tube-pascal, but
>binaries will do as well.

Yeah, give it to me :-)-O

I am willing to share the secret of how to uncompress the DICT.DCT for
those unwilling to read the manual :-)-O

>regards, el

>ps: followup can be directed to these newsgroups, as I DO read them
>:-)-O



spel@hippo.ru.ac.za (Dr. E.W. Lisse) (04/22/91)

In <1991Apr20.024135.20183@cside1.uucp> kuyper@cside1.uucp (Kuyper Hoffman) writes:

>For those of you following the saga in sanet.uniforum, please note
>my Distribution line!!

>In article <spel.671542996@hippo> spel@hippo.ru.ac.za (Dr. E.W. Lisse) writes:
>>Today I discovered I need one that can handle 50.000 lines (see below).
>>
>>I dug around in my ZIPs and found that other nice speller, MicroSpell
>>from MicroEmacs. I reread the docs and found a way of decompressing the
>>dictionary to ASCII (I always maintained it pays to read the manuals
>>:-)-O). Now I have merged the two dictionaries (13.000 and 48.000 words,
>>400KB).

>Out of interest, is this latter dictionary the same as the one
>Tanenbaum placed in comp.os.minix some years ago?  This was quite
>nice as it was a purely Public Domain dictionary.

No, it was Timo Salmi's and Dan Lawrence's. On the other hand I don't
read os.minix...

>>>It is however quite fast: 56 seconds to sort 13218 records
>>>(127288 bytes) with 232618 comparisons, at least half of that time
>>>spent reading from and writing to the 20ms hard disk.

>I once wrote a sorter using a T800 Transputer to sort a dictionary of
>more than 40 000 words.  As the transputer had 2MB of RAM I kept it
>all in main memory.  The sort took but a few seconds, getting the
>data across the bus to the Transputer memory though was something
>else entirely....

>Regards
>Kuyper
>-- 
>|      Kuyper Hoffman                   |  Life is just a bowl of All-Bran |
>|  kuyper@cside1.UUCP                   |  You wake up every morning       |
>|  ....!ddsw1!olsa99!oct1!cside1!kuyper |  And it's there....              |
>\--------------------------------------/ \--------------------------------/

ts@uwasa.fi (Timo Salmi) (04/24/91)

In article <spel.672319218@hippo> spel@hippo.ru.ac.za (Dr. E.W. Lisse) writes:
>>>:-)-O) Now I have merged the two dictionaries (13.000 and 48.000 words,
>>>400KB)
>
>>Out of interest, is this latter dictionary the same as the one
>>Tanenbaum placed in comp.os.minix some years ago?  This was quite
>>nice as it was a purely Public Domain dictionary.
>
>No, it was Timo Salmi's and Dan Lawrence's. On the other hand I don't
>read os.minix...
:

Would you kindly let me in on the secret :-).

...................................................................
Prof. Timo Salmi
Moderating at garbo.uwasa.fi anonymous ftp archives 128.214.12.37
School of Business Studies, University of Vaasa, SF-65101, Finland
Internet: ts@chyde.uwasa.fi Funet: gado::salmi Bitnet: salmi@finfun