[comp.sys.amiga] C standard across machines

v069qqqc@ubvmsd.cc.buffalo.edu (Michael Carrato) (10/29/90)

I am writing a general purpose database in C. I'm developing it for 
the company I work for so I'm doing it with the ibm XT's they have in mind.
However, I would like to eventually Amiga-tize it and release it as
(probably) shareware. I would therefore prefer to make the program (as 
well as the database format) as portable as possible.

The problem is, (correct me if I'm wrong) Intel machines are "little
endian" whereas Motorolas are "big endian". I assume C adopts one of
these standards to maintain portability of files between machines.
My question is, which is the C standard, and how do I ensure that a database
created on the ibm version will not be garbage on the Amiga version. 
(To add to the confusion, I'm doing the development on a sun... I have
no idea what the sparc standard is!)

As it stands now, I'm using fread() and fwrite() for my file i/o.
I KNOW this must be wrong, since it's simply a binary image transfer.
Is there any general way to read/write a block of data so that this 
mess can be resolved, or do I have to write code to output the data 
byte-by-byte?

Forgive me if I'm making a mountain out of a molehill, but I want to be 
sure before I go ahead with this thing..


As a side note, is there any demand for this type of program in the
Amiga community? I have seen few general purpose PD/Freeware/Shareware 
database programs out there, and I've been tempted to go out and buy 
SuperBase or some such commercial program. (Why haven't I? I'm broke! :-(..).
I would think there are others in my position... looking for a decent, cheap
database. Also, if you are interested, what features would you like to see?
I plan to intuitionalize it, and add an AREXX port.. (I don't have AREXX
yet but I'm working on it.. poor college student :-}). The program so far
is built to handle a large number of records fairly quickly, and I hope to 
eventually include a report generator and some other goodies.


Thanks for enduring all of this babble...

Mike Carrato
SUNY at Buffalo, senior in computer engineering.

mcmahan@netcom.UUCP (Dave Mc Mahan) (10/30/90)

 In a previous article, v069qqqc@ubvmsd.cc.buffalo.edu writes:
>
>I am writing a general purpose database in C. I'm developing it for 
>the company I work for so I'm doing it with the ibm XT's they have in mind.
>However, I would like to eventually Amiga-tize it and release it as
>(probably) shareware. I would therefore prefer to make the program (as 
>well as the database format) as portable as possible.
>
>The problem is, (correct me if I'm wrong) Intel machines are "little
>endian" whereas Motorolas are "big endian". I assume C adopts one of
>these standards to maintain portability of files between machines.
>My question is, which is the C standard, and how do I ensure that a database
>created on the ibm version will not be garbage on the Amiga version. 
>(To add to the confusion, I'm doing the development on a sun... I have
>no idea what the sparc standard is!)

You are correct in your observation.  Alas, there is no standard for the order
of bytes written to a file (that I have ever heard of or seen).  There really
aren't any guarantees that you will even be using the same number of bytes for
an integer on one machine that you use on another.  Some compilers like to
use 16 bit integers, some prefer 32 bit integers.  About all K&R will let you
count on is that a long integer is at least as big as a 'normal' integer.
There isn't really even a definite standard as to the number of bits per
character, although most machines that are capable use 8 bits per character.
Some older mainframes use 6 bits and some use 9.   'C' is able to accept code
that uses the ASCII character sequence as well as EBCDIC (although functions
like strcmp() have special versions for EBCDIC machines).  About the only
thing you can REALLY count on is that you can read a byte and write a byte.
If you want to be really portable, you have to pack and un-pack bytes as they
are read or written to disk.  There really isn't any absolute guarantee that
writing a binary 16 bit value on a 68000 CPU will write the most significant
byte first.

>As it stands now, I'm using fread() and fwrite() for my file i/o.
>I KNOW this must be wrong, since it's simply a binary image transfer.
>Is there any general way to read/write a block of data so that this 
>mess can be resolved, or do I have to write code to output the data 
>byte-by-byte?

To absolutely guarantee portability, you have to read and write on a byte by
byte basis.  It's a pain, but it's the only way I know of.  Of course, you
can always make some assumptions.......
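
As a rough illustration of the byte-by-byte approach, here is a minimal
sketch that packs 16-bit and 32-bit unsigned values into a byte buffer,
most significant byte first, and unpacks them again. The function names
and the choice of "most significant byte first" are assumptions made for
this example, not something fixed by C; any order works as long as the
reader and the writer agree on it.

```c
#include <assert.h>
#include <stddef.h>

/* Store the low 16 bits of v in buf[0..1], most significant byte first. */
void pack_u16(unsigned char *buf, unsigned int v)
{
    assert(buf != NULL);
    buf[0] = (unsigned char)((v >> 8) & 0xFFU);
    buf[1] = (unsigned char)(v & 0xFFU);
}

/* Rebuild a 16-bit value from buf[0..1]. */
unsigned int unpack_u16(const unsigned char *buf)
{
    assert(buf != NULL);
    return ((unsigned int)buf[0] << 8) | (unsigned int)buf[1];
}

/* Store the low 32 bits of v in buf[0..3], most significant byte first. */
void pack_u32(unsigned char *buf, unsigned long v)
{
    assert(buf != NULL);
    buf[0] = (unsigned char)((v >> 24) & 0xFFUL);
    buf[1] = (unsigned char)((v >> 16) & 0xFFUL);
    buf[2] = (unsigned char)((v >> 8) & 0xFFUL);
    buf[3] = (unsigned char)(v & 0xFFUL);
}

/* Rebuild a 32-bit value from buf[0..3]. */
unsigned long unpack_u32(const unsigned char *buf)
{
    assert(buf != NULL);
    return ((unsigned long)buf[0] << 24) | ((unsigned long)buf[1] << 16) |
           ((unsigned long)buf[2] << 8) | (unsigned long)buf[3];
}
```

Since the buffer is an array of unsigned char, it can then be handed to
fwrite() and fread() as-is; the shifts work on values rather than on
memory layout, so the result does not depend on the host's byte order.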

The other solution is to keep all your info as ASCII characters.  It tends
to make your data files much bigger and slow down access times, but last time
I checked, everyone still reads English from left to right, right?    :-)

>Mike Carrato
>SUNY at Buffalo, senior in computer engineering.

    -dave

etxtomp@eos.ericsson.se (Tommy Petersson) (10/30/90)

> In a previous article, v069qqqc@ubvmsd.cc.buffalo.edu writes:
>>
>>I am writing a general purpose database in C. I'm developing it for 
>>the company I work for so I'm doing it with the ibm XT's they have in mind.
>>However, I would like to eventually Amiga-tize it and release it as
>>(probably) shareware. I would therefore prefer to make the program (as 
>>well as the database format) as portable as possible.
>>
>>The problem is, (correct me if I'm wrong) Intel machines are "little
>>endian" whereas Motorolas are "big endian". I assume C adopts one of
>>these standards to maintain portability of files between machines.
>>My question is, which is the C standard, and how do I ensure that a database
>>created on the ibm version will not be garbage on the Amiga version. 
>>(To add to the confusion, I'm doing the development on a sun... I have
>>no idea what the sparc standard is!)

For portability of data files, NEVER use int, long... in your program.
Have a global header file that's included in all your programs define
portable types like BIT16, BIT32 and so on.
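
One way such a header might look is sketched below. It picks the types
from the ranges that <limits.h> reports and refuses to compile if no
suitable type exists. The unsigned choice and the check_portable_types()
helper are assumptions of this sketch. On compilers that support C99,
<stdint.h> provides uint16_t and uint32_t for the same purpose.

```c
#include <assert.h>
#include <limits.h>

/* Pick an unsigned type whose range is exactly 16 bits. */
#if USHRT_MAX == 0xFFFF
typedef unsigned short BIT16;
#elif UINT_MAX == 0xFFFF
typedef unsigned int BIT16;
#else
#error "no unsigned 16-bit type found for BIT16"
#endif

/* Pick an unsigned type whose range is exactly 32 bits. */
#if UINT_MAX == 0xFFFFFFFF
typedef unsigned int BIT32;
#elif ULONG_MAX == 0xFFFFFFFF
typedef unsigned long BIT32;
#else
#error "no unsigned 32-bit type found for BIT32"
#endif

/* Hypothetical start-up check that the chosen types also have the
   expected storage size, since the range alone does not prove it. */
void check_portable_types(void)
{
    assert(sizeof(BIT16) * CHAR_BIT == 16);
    assert(sizeof(BIT32) * CHAR_BIT == 32);
}
```

Note that fixed-size types only settle how many bytes a field takes.
The order of those bytes in the file still has to be handled separately.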

The byte-swap issue could be solved by something like this: Have a 16-bit
word in the beginning of a data file contain number 1. If you read this
data file on another machine and the word contains 256, the bytes are swapped.
You could have your read/write routines do different things depending on
this initial test (have a table where you install pointers to function A
or B...).
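
The marker test and the function table could be sketched roughly as
follows. All names here (marker_order, fix16_for, keep16, swap16) are
made up for the example. It assumes the marker has already been read
from the start of the file into a native 16-bit value, and it only
covers the plain byte-swap case.

```c
#include <assert.h>
#include <stddef.h>

/* A routine that turns a 16-bit value as read from the file into the
   host's own representation. */
typedef unsigned int (*fix16_fn)(unsigned int);

/* Function A: file and host agree, nothing to do. */
static unsigned int keep16(unsigned int v)
{
    return v & 0xFFFFU;
}

/* Function B: file was written with the opposite byte order. */
static unsigned int swap16(unsigned int v)
{
    return ((v & 0xFFU) << 8) | ((v >> 8) & 0xFFU);
}

/* The table of routines, indexed by the result of marker_order(). */
static const fix16_fn fix16_table[2] = { keep16, swap16 };

/* Classify the marker word: 0 = same order, 1 = swapped,
   -1 = not a file this scheme understands. */
int marker_order(unsigned int marker)
{
    if (marker == 1U)
        return 0;
    if (marker == 256U)
        return 1;
    return -1;
}

/* Look up the fix-up routine for a valid order (0 or 1). */
fix16_fn fix16_for(int order)
{
    assert(order == 0 || order == 1);
    return fix16_table[order];
}
```

A reader would call marker_order() once after opening the file, reject
the file on -1, and pass every 16-bit field through the routine that
fix16_for() returns.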

However, I don't think byte-swapping is the only problem. Some machines
also order the bits within a byte differently (MSB first versus LSB first),
so this number one can probably become either 1, 128, 256 or 32768... ASCII
strings will not get swapped, but integers do, so the read routines will
have to know what they are reading.

Tommy Petersson

peter@sugar.hackercorp.com (Peter da Silva) (10/30/90)

You have three choices:

	o Choose one end of the word, and write it out byte-by-byte.
	o Flip your words in memory, then use fwrite.
	o Use an intermediate transfer format.

I would suggest the third choice. And I'd suggest that if there's an existing
standard for whatever you're doing (bitmaps, say) use it. There is a whole
family of standards for things like video, audio, and music on the Amiga called
Interchange File Format. If there is no standard, use a text format unless
you're dealing with huge databases. This way you can edit your files to patch
up problems and for debugging using a regular text editor.
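
To give an idea of what a text format could look like, here is a small
sketch that turns one record into a line of text and back. The record
layout, the '|' separator, and the function names are all invented for
the example. It assumes the name is the last field, is not empty, and
contains no newline.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record, for illustration only. */
struct record {
    long id;
    int  qty;
    char name[32];
};

/* Format r as "id|qty|name\n" into line. Returns 0 on success,
   -1 if the name contains a newline or the line does not fit. */
int record_to_text(const struct record *r, char *line, size_t size)
{
    int n;

    assert(r != NULL && line != NULL);
    if (strchr(r->name, '\n') != NULL)
        return -1;
    n = snprintf(line, size, "%ld|%d|%s\n", r->id, r->qty, r->name);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/* Parse a line produced by record_to_text(). Returns 0 on success,
   -1 if the line does not have all three fields. */
int record_from_text(const char *line, struct record *r)
{
    assert(line != NULL && r != NULL);
    /* %31[^\n] reads up to 31 characters, leaving room for the '\0'. */
    if (sscanf(line, "%ld|%d|%31[^\n]", &r->id, &r->qty, r->name) != 3)
        return -1;
    return 0;
}
```

The resulting lines can be written with fputs() and read with fgets(),
and the file looks the same whatever the byte order or integer size of
the machine that wrote it.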

(Oh, I wish .info files were text!)
-- 
Peter da Silva.   `-_-'
<peter@sugar.hackercorp.com>.

sheley@convex.com (John "Dumptruck" Sheley) (11/01/90)

In <43072@eerie.acsu.Buffalo.EDU> v069qqqc@ubvmsd.cc.buffalo.edu (Michael Carrato) writes:

>I am writing a general purpose database in C. I'm developing it for 
>the company I work for so I'm doing it with the ibm XT's they have in mind.
>However, I would like to eventually Amiga-tize it and release it as
>(probably) shareware. I would therefore prefer to make the program (as 
>well as the database format) as portable as possible.
>
>The problem is, (correct me if I'm wrong) Intel machines are "little
>endian" whereas Motorolas are "big endian". I assume C adopts one of
>these standards to maintain portability of files between machines.
>My question is, which is the C standard, and how do I ensure that a database
>created on the ibm version will not be garbage on the Amiga version. 
>(To add to the confusion, I'm doing the development on a sun... I have
>no idea what the sparc standard is!)

  There is no C standard which declares that integer types will be stored in
memory `big endian' or `little endian' - that is left completely up to the
architecture you're running on.  C treats the basic types (char, int, short,
long, float, double) atomically, and cares not a bit how the machine stores
data as long as it retrieves the data the same way it stored the data.  This
causes nightmares (and rightly so) to people who do indiscriminate `union'ing
and try to port their code to different cpus.

>As it stands now, I'm using fread() and fwrite() for my file i/o.
>I KNOW this must be wrong, since it's simply a binary image transfer.
>Is there any general way to read/write a block of data so that this 
>mess can be resolved, or do I have to write code to output the data 
>byte-by-byte?

  This isn't really wrong - you're just screwed if you try to use the same
data file on an I*M and an Amiga.  There's really nothing you're missing.
The suggestions others have offered are about all you can do if you want
portability.  I could be mistaken, but I believe dbase stores its numbers
as ASCII.  Not a great example, but there you are.  Something else to
consider, though, is how you are going to run indexes to your database tables.
If you allow mixed composite keys (1 or more ASCII fields combined with 1 or
more numeric fields) to be used to build indexes, then it might be better
to store your numbers as ASCII.  The collating sequence of a group of keys
composed of ASCII data and pure binary numbers together will not be what
you want unless the numbers are unsigned.
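
To illustrate the collating point, the sketch below builds a composite
key from a text field and an unsigned number in such a way that a plain
memcmp() sorts the keys correctly. Storing the number most significant
byte first is an assumption added for this example (with the low byte
first, 256 would sort before 1), and the key layout and names are made
up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAME_LEN 8                 /* fixed width of the text part */
#define KEY_LEN  (NAME_LEN + 4)    /* text part plus a 32-bit number */

/* Build a key: name padded with spaces (truncated if too long),
   followed by the low 32 bits of num, most significant byte first. */
void make_key(unsigned char *key, const char *name, unsigned long num)
{
    size_t n;

    assert(key != NULL && name != NULL);
    n = strlen(name);
    if (n > NAME_LEN)
        n = NAME_LEN;
    memset(key, ' ', NAME_LEN);
    memcpy(key, name, n);
    key[NAME_LEN + 0] = (unsigned char)((num >> 24) & 0xFFUL);
    key[NAME_LEN + 1] = (unsigned char)((num >> 16) & 0xFFUL);
    key[NAME_LEN + 2] = (unsigned char)((num >> 8) & 0xFFUL);
    key[NAME_LEN + 3] = (unsigned char)(num & 0xFFUL);
}

/* Byte-wise comparison: negative, zero, or positive like memcmp(). */
int compare_keys(const unsigned char *a, const unsigned char *b)
{
    assert(a != NULL && b != NULL);
    return memcmp(a, b, KEY_LEN);
}
```

This only holds for unsigned numbers. A negative value in two's
complement has its top bit set and would sort after every positive one.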

  By the way, I'm thinking of doing something along those same lines, and
I'm having trouble deciding what mechanism to use for indexing.  Do you plan
to use ISAM, B-tree/balanced B-tree, or hashing?  I'm thinking about ISAM for
sorting speed, combined with extra pointers for B-tree kind of searches.  I'd
appreciate any thoughts.

>Mike Carrato
>SUNY at Buffalo, senior in computer engineering.

John Sheley
Convex Computer Corp.
sheley@convex.com

david@twg.com (David S. Herron) (11/03/90)

In article <43072@eerie.acsu.Buffalo.EDU> v069qqqc@ubvmsd.cc.buffalo.edu writes:
>I am writing a general purpose database in C. ...
>... I would therefore prefer to make the program (as 
>well as the database format) as portable as possible.
>
>The problem is, (correct me if I'm wrong) Intel machines are "little
>endian" whereas Motorolas are "big endian". I assume C adopts one of
>these standards to maintain portability of files between machines.
...

No, C does not adopt a standard -- whatever the byte order is on
the local system is what it is.

The networking (TCP/IP) people came across the same problem loooong
ago and came up with a Network Standard Byte Order.  Grep around
in /usr/include on your sun for "ntohl" to see how they're implemented.
I forget where this ordering is defined, probably in an RFC somewhere.
Probably it's that one which reprints the part of Gulliver's Travels
detailing his experiences with the Endians..?
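
For a file format, the same routines could be used along these lines.
This is only a sketch: it assumes a POSIX system where <arpa/inet.h>
declares htonl() and ntohl() and where <stdint.h> provides uint32_t
(older systems keep the declarations in other headers), and the
store_net32/load_net32 names are invented here.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Convert a host-order value to network byte order and copy the four
   resulting bytes into out[0..3], ready to be written to a file. */
void store_net32(unsigned char *out, uint32_t host_value)
{
    uint32_t net;

    assert(out != NULL);
    net = htonl(host_value);
    memcpy(out, &net, sizeof net);
}

/* Copy four bytes read from a file and convert them back from
   network byte order to the host's own order. */
uint32_t load_net32(const unsigned char *in)
{
    uint32_t net;

    assert(in != NULL);
    memcpy(&net, in, sizeof net);
    return ntohl(net);
}
```

Network byte order is most significant byte first, so on a big-endian
host the two conversions do nothing, while on a little-endian host they
swap the bytes.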


>As it stands now, I'm using fread() and fwrite() for my file i/o.
>I KNOW this must be wrong, since it's simply a binary image transfer.
>Is there any general way to read/write a block of data so that this 
>mess can be resolved, or do I have to write code to output the data 
>byte-by-byte?

No.. you can use f{read,write}() but before you fwrite() and after you
fread() you should go through the buffer & swap all the bytes around.

Usually this is implemented by having a layer of code through which
all accesses to the object are done.  (database in this case)  In this
layer you translate between internal & external representation appropriately
as data is going through this layer.
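
The swapping step inside such a layer might be sketched like this. The
function names are made up, and the sketch assumes the buffer holds
nothing but 4-byte integer fields. A real record with strings in it
would need to be swapped field by field instead.

```c
#include <assert.h>
#include <stddef.h>

/* Reverse the byte order of each of the count 4-byte fields that
   start at buf, in place. */
void swap32_fields(unsigned char *buf, size_t count)
{
    size_t i;
    unsigned char t;

    assert(buf != NULL || count == 0);
    for (i = 0; i < count; i++, buf += 4) {
        t = buf[0]; buf[0] = buf[3]; buf[3] = t;
        t = buf[1]; buf[1] = buf[2]; buf[2] = t;
    }
}

/* Return 1 if the host stores the least significant byte first,
   0 otherwise. */
int host_is_little_endian(void)
{
    unsigned int one = 1U;

    return *(unsigned char *)&one == 1;
}
```

If the file format is defined as most significant byte first, the layer
would call swap32_fields() before fwrite() and after fread() only when
host_is_little_endian() returns 1, and leave the buffer alone otherwise.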

-- 
<- David Herron, an MMDF & WIN/MHS guy, <david@twg.com>
<- Formerly: David Herron -- NonResident E-Mail Hack <david@ms.uky.edu>
<-
<- Use the force Wes!