[comp.unix.questions] ASCII & Binary sort

barnett@crdgw1.crd.ge.com (Bruce G. Barnett) (05/18/89)

In article <10289@smoke.BRL.MIL>, gwyn@smoke (Doug Gwyn) writes:

[ re sorting files with ascii and binary data ]

>but there is no widespread UNIX
>convention for binary file structures.

Because such a convention or program would be machine dependent,
and files sorted this way would only be meaningful to machines with
similar architectures.

I also assume that each sorting routine should be optimized
for each architecture, floating point representation, etc.

Let me pose a question. What is the best way to solve this problem?

1. Write a sort routine based on XDR (eXternal Data Representation).
	The binary file would then be portable to other machines.
	It wouldn't be very fast, I expect.

2. Write a sort routine that allows you to sort on binary fields.
	The data files would be non-portable. Special applications
	would have to read and write the binary files.

3. Store all information in ASCII format. Use standard UNIX utilities.

4. Write a dedicated C program. Since the application was custom,
   the sort would be too.

5. Use a real database package like Unify, Ingres, Oracle, etc.

6. ?

I personally would use 3 unless I had to make the program faster, and
then I would use 4. Using qsort(3) isn't very difficult for a C
programmer.

Does anyone have any experience with the disadvantages (speed, size) of using
a pure ASCII database?

--
Bruce G. Barnett	<barnett@crdgw1.ge.com>  a.k.a. <barnett@[192.35.44.4]>
			uunet!crdgw1.ge.com!barnett barnett@crdgw1.UUCP

abcscnge@csuna.csun.edu (Scott "The Pseudo-Hacker" Neugroschl) (05/20/89)

In article <415@crdgw1.crd.ge.com> barnett@crdgw1.crd.ge.com (Bruce G. Barnett) writes:
>Does anyone have any experience with the disadvantages (speed, size) of using
>a pure ASCII database?

Speed/size aren't that bad, IMHO.  We wrote a test driver for an embedded
system.  It used ASCII files, and speed/size were OK.

The REAL problem was that the users knew just enough to be dangerous --
they'd try to edit the data files with vi(1).


-- 
Scott "The Pseudo-Hacker" Neugroschl
UUCP:  ...!sm.unisys.com!csun!csuna.csun.edu!abcscnge
-- Beat me, Whip me, make me code in Ada
-- Disclaimers?  We don't need no stinking disclaimers!!!

barnett@crdgw1.crd.ge.com (Bruce G. Barnett) (05/23/89)

In article <1987@csuna.csun.edu>, abcscnge@csuna (Scott "The Pseudo-Hacker" Neugroschl) writes:
>
>Speed/size aren't that bad, IMHO.  We wrote a test driver for an embedded
>system.  It used ASCII files, and speed/size were OK.

What about files that approach database sizes (100,000+ records)?
It seems to me that when you are looking for all records where field
6 is between 9900 and 10700, converting an ASCII string into an int
must be more expensive in CPU time than just storing the data in int form.

--
Bruce G. Barnett	<barnett@crdgw1.ge.com>  a.k.a. <barnett@[192.35.44.4]>
			uunet!crdgw1.ge.com!barnett barnett@crdgw1.UUCP