eppstein@garfield (David Eppstein) (01/22/89)
I don't know why I'm contributing to this recurring flamefest, but here goes... Big endian lets you use integer comparison instructions to do string compares a word at a time. Little endian means you are stuck with a byte at a time. The arguments about how people expect to read things seem pretty bogus to me. One of the things computers are very good at doing is format conversion. -- David Eppstein eppstein@garfield.cs.columbia.edu Columbia U. Computer Science
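David's point can be sketched in portable C: load each 4-byte group as a big-endian word, and an ordinary unsigned word compare then orders the buffers exactly as a byte-by-byte memcmp would. (A minimal sketch; the helper and function names are mine, not from the thread.)

```c
#include <stddef.h>
#include <stdint.h>

/* Load 4 bytes as a big-endian 32-bit word. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] <<  8) |  (uint32_t)p[3];
}

/* Compare two buffers (n a multiple of 4) a word at a time.
   Big-endian byte order puts the most significant byte first,
   so the first differing byte decides the unsigned word compare. */
int cmp_words_be(const unsigned char *a, const unsigned char *b, size_t n)
{
    for (size_t i = 0; i < n; i += 4) {
        uint32_t wa = load_be32(a + i), wb = load_be32(b + i);
        if (wa != wb)
            return wa < wb ? -1 : 1;
    }
    return 0;
}
```

On a real big-endian machine load_be32 collapses to a plain aligned word load, which is the whole point; a little-endian machine would have to byte-swap first.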
stevev@tekchips.CRL.TEK.COM (Steve Vegdahl) (01/24/89)
In article <6133@columbia.edu>, eppstein@garfield (David Eppstein) writes: > Big endian lets you use integer comparison instructions to do string > compares a word at a time. Little endian means you are stuck with a > byte at a time. This depends on how strings are represented. If you take the view that "C does things the right way, the only way", then I would be inclined to agree that making the "wrong" endian choice would slow down string comparison *if* your compiler is smart enough to figure out the optimization, *or* if the string-comparison algorithm is coded in a non-portable way (namely, with an endian-ness assumption). But consider a representation of strings where the characters are laid out "backwards" in memory; a pointer to a string would contain the address of the string's highest-addressed byte, which is the first character of the string. Now, big endian and little endian find their roles reversed WRT the above optimization. > The arguments about how people expect to read things seem pretty bogus > to me. One of the things computers are very good at doing is format > conversion. I agree. I also believe that people (other than language implementors) should not be concerned with the details of how a string is represented in memory. They should only be reading high-level-language code. Steve Vegdahl Computer Research Lab Tektronix Labs Beaverton, Oregon
ok@aucsv.UUCP (Richard Okeefe) (01/25/89)
Before arguing about whether big-endian order or little-endian order is "more natural" for people, it's enlightening to consider the historical origin of the way we write numbers. We write from left to right, and put the most significant digit on the left. But we copied that method of writing numbers from >>Arabic<< mathematics, where the direction of writing was otherwise right to left. So in Arabic, you encountered the low digit first in your normal reading scan. This had the pleasant psychological advantage that when you added two numbers, you wrote the answer down in the order that you always wrote numbers, instead of starting from the opposite end. There was one famous mathematician this century who habitually wrote numbers least-significant-digit-first, apparently for this reason. So _both_ conventions are "natural" in human writing systems.
jimp@cognos.uucp (Jim Patterson) (01/26/89)
In article <6133@columbia.edu> eppstein@garfield (David Eppstein) writes: >Big endian lets you use integer comparison instructions to do string >compares a word at a time. Little endian means you are stuck with a >byte at a time. I'd like to see an algorithm that actually benefits from this. Consider... If you know how long the string is ahead of time, you can optimize the first (n / 4) int's (assuming 4-byte ints) whether or not it's big-endian. If it's big-endian, then the first non-equal match indicates the result. If it's little-endian, you have to switch to a byte-wise loop for the non-equal word, which means up to four bytes are checked twice. In the C library, this approach only applies to memcmp. You have to treat the last word specially if it's not an even multiple of the word size with the big-endian approach. Otherwise, you will get the wrong answer if the portion that shouldn't be matched is the only part that is different. The little-endian approach already checks any non-matching word, so if designed right this would not be a special case. If you want to do signed-byte comparisons, the big-endian word-oriented approach won't work (you will have to do it similarly to the little-endian approach). The reason is that the sign of the result will indicate the relative ordering of the int's, and not of the bytes that mismatch. If you don't know how long the string is (as for strcmp and strncmp), then you have to scan the string to find how long it is. For a word-oriented algorithm to be effective here, you need an algorithm which detects which byte of a word (if any) contains a NUL. I contend that there's no simple way to do this with integer instructions; it's more effective to use byte-oriented instructions. The only word-oriented approaches I can think of are along these lines. has_NUL = ! (i & 0xff && i & 0xff00 && i & 0xff0000 && i & 0xff000000); If you have to scan the string as bytes anyway, I think that it would be more efficient to compare them with the other string at the same time. Maybe someone has a better algorithm. This one is sure to be worse than using byte-oriented instructions, unless your machine just doesn't have any or they are woefully inadequate. A straight table-lookup is obviously out of the question. In summary, a big-endian machine might gain a slight advantage if it knows ahead of time how long the string is. Really, the only difference is that the big-endian machine can run along some fixed number of words, and return the result from the first non-equal match, whereas the little-endian algorithm has to double-check the unmatching word with byte instructions before it can return. For varying-length strings, I see no advantage to the word-oriented approach. If anyone has any good word-oriented implementations of memcmp or especially strcmp, I'd be interested in seeing them. -- Jim Patterson Cognos Incorporated UUCP:decvax!utzoo!dciem!nrcaer!cognos!jimp P.O. BOX 9707 PHONE:(613)738-1440 3755 Riverside Drive Ottawa, Ont K1G 3Z4
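Jim's candidate NUL test, written out as a function for concreteness (a quick sketch assuming 32-bit words; the function name is mine):

```c
#include <stdint.h>

/* 1 iff some byte of i is 0x00 -- Jim's formulation: the word
   contains a NUL exactly when one of the four byte fields
   masks to zero. */
int has_nul(uint32_t i)
{
    return !((i & 0xff) && (i & 0xff00) && (i & 0xff0000) && (i & 0xff000000));
}
```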
throopw@xyzzy.UUCP (Wayne A. Throop) (01/27/89)
> stevev@tekchips.CRL.TEK.COM (Steve Vegdahl) >> eppstein@garfield (David Eppstein) >> Big endian lets you use integer comparison instructions to do string >> compares a word at a time. > This depends on how strings are represented. [...] > consider a representation of strings where the characters are laid out > "backwards" in memory; [...] > Now, big endian and little endian find their roles reversed WRT the > above optimization. True, true... BUT this role reversal is by no means "free". Now the little-endian-but-backwards-strings machine cannot use the same routines to allocate, read, write, and otherwise treat strings and non-strings as an uninterpreted bucket of bits. All manipulators of pointers will have to "know" what they point at. There will be much "duplicated" code on this machine relative to a machine where all memory-chunks are addressed by their "least" (or greatest) component-address. And of course there will be all the attendant bugs that occur when a pointer of one kind is fed to a routine expecting the other kind. Not that these problems are unsolvable. It may (possibly) even be worthwhile to do things this way. But the big-endian way is still superior in this respect, and making this change is only trading one difficulty for another. (Of course, the little-endian way is superior in other respects.) -- There are two ways to write error-free programs; only the third one works. --- Alan J. Perlis -- Wayne Throop <the-known-world>!mcnc!rti!xyzzy!throopw
tim@crackle.amd.com (Tim Olson) (01/28/89)
In article <5124@aldebaran.UUCP> jimp@cognos.UUCP (Jim Patterson) writes: | In article <6133@columbia.edu> eppstein@garfield (David Eppstein) writes: | >Big endian lets you use integer comparison instructions to do string | >compares a word at a time. Little endian means you are stuck with a | >byte at a time. | | I'd like to see an algorithm that actually benefits from this. | Consider... | | If you don't know how long the string is (as for strcmp and strncmp), | then you have to scan the string to find how long it is. For a | word-oriented algorithm to be effective here, you need an algorithm | which detects which byte of a word (if any) contains a NUL. | | If anyone has any good word-oriented implementations of memcmp or | especially strcmp, I'd be interested in seeing them. A couple of years ago I posted essentially the string routines we use on the Am29000. These routines make use of: 1) comparisons a word at a time 2) the "cpbyte" instruction to detect a null byte in a word 3) the "extract" instruction (extract a 32-bit word from a 64-bit word pair at any bit boundary) to take care of misaligned operands These routines broke even with a standard byte-at-a-time hand-coded routine with 5-byte strings (including null) and were always better with longer strings. -- Tim Olson Advanced Micro Devices (tim@crackle.amd.com)
dik@cwi.nl (Dik T. Winter) (01/28/89)
In article <5124@aldebaran.UUCP> jimp@cognos.UUCP (Jim Patterson) writes:
> word-oriented algorithm to be effective here, you need an algorithm
> which detects which byte of a word (if any) contains a NUL.
>
> I contend that there's no simple way to do this with integer
> instructions; it's more effective to use byte-oriented instructions.
> The only word-oriented approaches I can think of are along these
> lines.
>
> has_NUL = ! (i & 0xff && i & 0xff00 && i & 0xff0000 && i & 0xff000000);

On a ones complement machine (because of end-around carry):

	has_NUL = (~i ^ (i - 0x01010101)) & 0x01010101

On a twos complement machine you have to look at the carry bit too. However you might also try:

	j = i - 0x01010101;
	has_NUL = ((~i ^ j) & 0x01010101) | (~j & i & 0x80000000)

modulo some typos of course. -- dik t. winter, cwi, amsterdam, nederland INTERNET : dik@cwi.nl BITNET/EARN: dik@mcvax
jangr@microsoft.UUCP (Jan Gray) (01/28/89)
In article <5124@aldebaran.UUCP> jimp@cognos.UUCP (Jim Patterson) writes: >If you don't know how long the string is (as for strcmp and strncmp), >then you have to scan the string to find how long it is. For a >word-oriented algorithm to be effective here, you need an algorithm >which detects which byte of a word (if any) contains a NUL. > >I contend that there's no simple way to do this with integer >instructions; it's more effective to use byte-oriented instructions. > > has_NUL = ! (i & 0xff && i & 0xff00 && i & 0xff0000 && i & 0xff000000); This went around comp.arch a while ago. has_NUL = (((i-0x01010101)&~i)&0x80808080) != 0, e.g. "test if there were any borrows as a result of the bytewise subtracts" Using this trick on the '386, strlen on long strings can be made about 30% faster than using the dedicated string instruction "rep scasb"! (Except this will cause many instruction fetches that will keep your bus busy.) The 80960 has the SCANBYTE instruction, and the 29000 has CPBYTE, for just this sort of thing. Hmm. "0x80808080". I knew the 8080 was good for something... :-) Jan Gray uunet!microsoft!jangr Microsoft Corp., Redmond Wash. 206-882-8080
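A sketch of Jan's borrow trick in C, plus an illustrative word-at-a-time strlen built on top of it (my code, not Jan's; it assumes 32-bit words and a buffer zero-padded to a word boundary so whole-word loads past the terminator stay in bounds):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero iff some byte of i is 0x00.  A byte-wise borrow out of
   (i - 0x01010101) can only start at a zero byte, and ~i masks off
   high bits that were merely set because a byte was >= 0x81. */
int word_has_nul(uint32_t i)
{
    return ((i - 0x01010101u) & ~i & 0x80808080u) != 0;
}

/* Scan words until one contains a NUL, then finish byte by byte.
   memcpy stands in for an aligned word load. */
size_t wstrlen(const char *s)
{
    size_t n = 0;
    for (;;) {
        uint32_t w;
        memcpy(&w, s + n, sizeof w);
        if (word_has_nul(w))
            break;
        n += 4;
    }
    while (s[n])
        n++;
    return n;
}
```

Note that word_has_nul is endian-independent; the endian argument in this thread is only about the *compare* step, not the NUL scan.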
bill@bilver.UUCP (bill vermillion) (01/29/89)
In article <186@aucsv.UUCP> ok@aucsv.UUCP (Richard Okeefe) writes: >Before arguing about whether big-endian order or little-endian >order is "more natural" for people, > >So _both_ conventions are "natural" in human writing systems. And mixed conventions are considered normal in spoken English. Consider that, for example, twenty-five or thirty-six would fit the "big-endian" definition, while the numbers thir-teen, four-teen, would be considered "little endian". To be consistent with the numbering scheme of 1 to 100, the numbers after nine should probably be teen-zero or ten-zero, followed by teen-one, teen-three, teen-four. Using the "y" ending would be confusing if we called them teenty-four or tenty-four. Too many sound-alikes for the twenties. It appears the natural order is dis-order -- Bill Vermillion - UUCP: {uiucuxc,hoptoad,petsd}!peora!rtmvax!bilver!bill : bill@bilver.UUCP
maa@nbires.nbi.com (Mark Armbrust) (01/31/89)
In article <7@microsoft.UUCP> jangr@microsoft.UUCP (Jan Gray) writes:
>In article <5124@aldebaran.UUCP> jimp@cognos.UUCP (Jim Patterson) writes:
>
>This went around comp.arch a while ago.
> has_NUL = (((i-0x01010101)&~i)&0x80808080) != 0,
>e.g. "test if there were any borrows as a result of the bytewise subtracts"
>
>Using this trick on the '386, strlen on long strings can be made about 30%
>faster than using the dedicated string instruction "rep scasb"!

@:	lodsd			5 clocks to execute
	mov	ebx, eax	2
	sub	eax, 01010101h	3
	not	ebx		2
	and	eax, ebx	3
	and	eax, 80808080h	3
	loopz	@		12	(30 clocks/19 bytes)

Seems to me that the following would be a bit faster:

@:	lodsd			5 clocks to execute
	mov	ebx, eax	2
	shr	ebx, 16		3
	and	ax, bx		3
	and	al, ah		3
	loopnz	@		12	(28 clocks/13 bytes)

In any case, I prefer strings stored with a leading count instead of trailing zeros--I've been thinking of writing an alternate library for C to be able to handle them. Some of the programming I've done could have benefited from this type of string. (So what if they're longer--memory is cheaper than execution speed.) Mark (I should know better by now than to post things when I have a nasty cold--there's prob'ly something wrong with the above and I'll be buried in mail :-( )
lexw@idca.tds.PHILIPS.nl (A.H.L. Wassenberg) (01/31/89)
In article <389@bilver.UUCP> bill@bilver.UUCP (bill vermillion) writes: > To be consistant with the numbering scheme of 1 to 100 the numbers > after nine should probably be teen-zero or ten-zero, followed by Do you think that is consistent? Is 20 in your language "two-zero"? Or "twen-zero"? I think "onety" would be more consistent (considering twenty, thirty, etc.). > teen-one, teen-three, teen-four. Using the "y" ending would be > confusing if we called then teenty-four of tenty-four. Too much > sound-alikes for the twentys. These would become onety-one, onety-two [ you forgot that one :-) ], onety-three, etc. All very consistent, and not more sound-alike than the other ....-ty's. __ / ) Lex Wassenberg / Philips Telecommunication & Data Systems B.V. / _ Apeldoorn, The Netherlands __/ /_\ \/ Internet: lexw@idca.tds.philips.nl (_/\___\___/\ UUCP: ..!mcvax!philapd!lexw
w-colinp@microsoft.UUCP (Colin Plumb) (02/01/89)
maa@nbires.UUCP (Mark Armbrust) wrote: > Seems to me that the following would be a bit faster: > > @: lodsd 5 clocks to execute > mov ebx, eax 2 > shr ebx, 16 3 > and ax, bx 3 > and al, ah 3 > loopnz @ 12 (28 clocks/13 bytes) But won't work. Consider "\1\2\4\8". BTW, Microsoft uses leading-count strings extensively. I'm not so sure about the 255-character limit the byte count imposes, but the arbitrary contents are nice, and people who've never used Real Software don't seem to notice. (Quick: who's run into Unix's 10K command-line limit?) -- -Colin (uunet!microsof!w-colinp)
maa@nbires.nbi.com (Mark Armbrust) (02/03/89)
In article <38@microsoft.UUCP> w-colinp@microsoft.uucp (Colin Plumb) writes: >maa@nbires.UUCP (Mark Armbrust) wrote: >> Seems to me that the following would be a bit faster: >> >> [some blatantly WRONG code deleted.] > >But won't work. Consider "\1\2\4\8". Like I said in the original posting: I should have learned by now not to post when the brain is misfiring due to 'flu. As for string length limits and count-preceded strings, I prefer using two one-word values: one that is fixed at allocation time and gives the maximum size of string that can be held, and one that gives the current size of the string. Mark
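Mark's two-word layout might look something like this in C (a hypothetical sketch; the type and function names are mine, not Mark's):

```c
#include <stdlib.h>
#include <string.h>

/* One word of fixed capacity, one word of current length, then the
   characters themselves -- no trailing NUL needed. */
typedef struct {
    size_t cap;    /* maximum length, fixed at allocation time */
    size_t len;    /* current length */
    char   data[]; /* cap bytes of storage */
} CountedStr;

CountedStr *cs_make(size_t cap)
{
    CountedStr *s = malloc(sizeof *s + cap);
    if (s) {
        s->cap = cap;
        s->len = 0;
    }
    return s;
}

/* Append up to n bytes, silently truncating at capacity;
   returns the new length. */
size_t cs_append(CountedStr *s, const char *p, size_t n)
{
    if (n > s->cap - s->len)
        n = s->cap - s->len;
    memcpy(s->data + s->len, p, n);
    return s->len += n;
}
```

With the length stored up front, strlen is free and comparisons can run word-at-a-time without any NUL scan, which is exactly the property the thread has been arguing about.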
peter@ficc.uu.net (Peter da Silva) (02/03/89)
In article <38@microsoft.UUCP>, w-colinp@microsoft.UUCP (Colin Plumb) writes: > (Quick: who's run into Unix's 10K command-line limit?) I don't know, but I run into UNIX's command-line and environment limit all the time. Last I checked this limit was 1K, but it's probably bigger these days. On our 286 boxes it's certainly nowhere near 10K. -- Peter da Silva, Xenix Support, Ferranti International Controls Corporation. Work: uunet.uu.net!ficc!peter, peter@ficc.uu.net, +1 713 274 5180. `-_-' Home: bigtex!texbell!sugar!peter, peter@sugar.uu.net. 'U` Opinions may not represent the policies of FICC or the Xenix Support group.
gillies@m.cs.uiuc.edu (02/07/89)
/* Written 12:25 am Feb 5, 1989 by PLS@cup.portal.com in m.cs.uiuc.edu:comp.arch */ This deserves a new subject. Since it was mentioned in the Endian Wars, does anyone know why C uses the null terminated string rather than an explicit length? ... - It removes a character from the character set, a source of many C bugs - All machines I know of that have character string instructions want the length of the string. This forces the string primitives to first scan for null, a time-wasting operation. /* End of text from m.cs.uiuc.edu:comp.arch */ First, let me say that string type is a religious issue. I once worked for a workstation vendor whose main workstation had THREE different types of strings. Each development group claimed THEIR strings ran the fastest on the hardware platform. Every package had about 25 string subroutines, including 5-10 for "converting" "inferior" formats into "ours". Second, I was once told that the following C code compiles into 1 instruction (or something amazingly short) on the PDP-11, C's mother machine: while (*p++ = *q++); This is perhaps part of the reason why strings were designed with null-termination. Don Gillies {uiucdcs!gillies} U of Illinois
firth@sei.cmu.edu (Robert Firth) (02/08/89)
In article <3300050@m.cs.uiuc.edu> gillies@m.cs.uiuc.edu writes:
>Second, I was once told that the following C code compiles into 1
>instruction (or something amazingly short) on the PDP-11, C's mother
>machine:
>
>while (*p++ = *q++);

It compiles into two instructions. If p and q are in registers R1 and R2 respectively, the code is

1$:	MOVB	(R2)+,(R1)+
	BNE	1$

The "=" maps onto the MOVB, the "++" maps onto the autoincrement address mode, the move sets the condition codes for the branch to test, and the move of the trailing NUL makes the test fail. This is a neat and beautiful idiom in PDP-11 Assembler. There is, however, one problem with the equivalent C code: it is incorrect. After termination of the loop, the variables p and q, though declared of type 'pointer-to-char', will hold values that do not point to declared or allocated objects of type 'char'. Should you ever have the misfortune to port this code to a machine with hardware segmentation, and automatic segment bounds checking as part of the address arithmetic, (or be a consultant involved in such a port), you will face this problem. Could someone inform me whether the current C standard has fixed this? The simplest answer I guess is to rule that the address of array[upb+1] must always be legal; in practice this means the implementation has to leave dead space at the end of each memory segment.
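For reference, here is the idiom wrapped in a compilable function; after the loop both pointers sit one past the NUL that was just copied, which is exactly the address Robert is worried about:

```c
/* Copy src (including its NUL) into dst with the classic idiom.
   Returns the final value of p: one past the copied terminator. */
char *copy(char *dst, const char *src)
{
    char *p = dst;
    const char *q = src;
    while ((*p++ = *q++))
        ;   /* loop body empty: the assignment does the work */
    return p;   /* == dst + strlen(src) + 1 */
}
```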
lamaster@ames.arc.nasa.gov (Hugh LaMaster) (02/09/89)
In article <186@aucsv.UUCP> ok@aucsv.UUCP (Richard Okeefe) writes: >So _both_ conventions are "natural" in human writing systems. That is absolutely true. Nevertheless, it is interesting to note that when we "Westerners" try to produce a consistent little endian machine, we always seem to fail. I thought that the ns32000 series had finally done it, but someone recently pointed out that in one small way it isn't quite. The fact is, after a typical "Western" education, it seems to be quite difficult to work with little endian numbers. Just look at the mess DEC made of the extensions to the PDP-11 formats when they produced the VAX. So, I still claim that it is easier for almost all Anglo/American/European folks to use big-endian numbers. For whatever it is worth. But since it only comes up when reading dumps, my real interest in the subject is VERY limited. I only want to point out that a) there is no big efficiency advantage to using little-endian formats, as some little endians have claimed (as far as I can see, all such claims made in this newsgroup have been refuted), and b) there ARE MANY advantages to having all machines the SAME. Now, DEC just turned down the chance to start evolving in the direction of common formats with their new RISC machine, so the best we can hope for now is a common interchange file format that all machines would create when data interchange is required. I suggest that the time is ripe for the development of such a standard. One small request - people who are working on creating such a standard please include 64 bit integer (and floating point, but that is usual) formats - there are quite a few uses for 64 bit integers, and some machines that really need access to integers at least 48 bits long. -- Hugh LaMaster, m/s 233-9, UUCP ames!lamaster NASA Ames Research Center ARPA lamaster@ames.arc.nasa.gov Moffett Field, CA 94035 Phone: (415)694-6117
tim@crackle.amd.com (Tim Olson) (02/09/89)
In article <8480@aw.sei.cmu.edu> firth@bd.sei.cmu.edu (Robert Firth) writes: | >while (*p++ = *q++); | | This is a neat and beautiful idiom in PDP-11 Assembler. There is, | however, one problem with the equivalent C code: it is incorrect. | After termination of the loop, the variables p and q, though declared | of type 'pointer-to-char', will hold values that do not point to | declared or allocated objects of type 'char'. Should you ever have | the misfortune to port this code to a machine with hardware segmentation, | and automatic segment bounds checking as part of the address arithmetic, | (or be a consultant involved in such a port), you will face this | problem. | | Could someone inform me whether the current C standard has fixed this? | The simplest answer I guess is to rule that the address of array[upb+1] | must always be legal; in practice this means the implementation has to | leave dead space at the end of each memory segment. That is exactly what is done in the current proposed ANSI C standard; the address is legal to compute, although dereferencing the address is undefined. Only a single byte of "dead space" is required for this. -- Tim Olson Advanced Micro Devices (tim@crackle.amd.com)
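The rule Tim describes is what licenses the usual end-pointer loop: the one-past-the-end address may be formed and compared, but never dereferenced. A small sketch:

```c
#include <stddef.h>

/* Sum n ints using a one-past-the-end sentinel pointer. */
int sum(const int *a, size_t n)
{
    const int *end = a + n;   /* legal to compute and compare */
    int total = 0;
    for (const int *p = a; p != end; ++p)
        total += *p;          /* *end itself is never touched */
    return total;
}
```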
ch@maths.tcd.ie (Charles Bryant) (02/09/89)
In article <389@bilver.UUCP> bill@bilver.UUCP (bill vermillion) writes: >In article <186@aucsv.UUCP> ok@aucsv.UUCP (Richard Okeefe) writes: >>Before arguing about whether big-endian order or little-endian >>order is "more natural" for people, >> >>So _both_ conventions are "natural" in human writing systems. > >And mixed conventions are considered normal in spoken English. > >Consider that, for example twenty-five or thirty-six would fit the >"big-endian" defintion, the numbers thir-teen, four-teen, would be >considered "little endian" "Twenty-five" etc. do not fit into _either_ big- or little-endian categories. Nor do most numbers in English, because it isn't a positional system. It is possible to speak it backwards without losing meaning: five [and] twenty. (five twenty is a time!). It is spoken with the most significant part first to allow an estimate of the size to be made easily, I suppose, and obviously if the number is spoken without qualifiers like "hundred" this is impossible. Numbers for computers are always (as far as I know) given as a fixed size object (in programs) or as a string of digits (most I/O), where it is either unnecessary or impossible to estimate the magnitude of a number without having it all. -- Charles Bryant. Working at Datacode Electronics Ltd.
gnb@melba.bby.oz (Gregory N. Bond) (02/09/89)
In article <8480@aw.sei.cmu.edu> firth@bd.sei.cmu.edu (Robert Firth) writes:
[ Re: while (*p++ = *q++); ]
.This is a neat and beautiful idiom in PDP-11 Assembler. There is,
.however, one problem with the equivalent C code: it is incorrect.
.After termination of the loop, the variables p and q, though declared
.of type 'pointer-to-char', will hold values that do not point to
.declared or allocated objects of type 'char'. Should you ever have
.the misfortune to port this code to a machine with hardware segmentation,
.and automatic segment bounds checking as part of the address arithmetic,
.(or be a consultant involved in such a port), you will face this
.problem.
.
.Could someone inform me whether the current C standard has fixed this?
.The simplest answer I guess is to rule that the address of array[upb+1]
.must always be legal; in practice this means the implementation has to
.leave dead space at the end of each memory segment.
This is only a problem if p or q are dereferenced after the loop. They
are (at least potentially) invalid addresses, but so is NULL. And if
it is legal for p to be NULL, it is legal for it to point nowhere. And
if it is dereferenced, SIGSEGV it, just as with NULL pointers. No need
to fix the ANSI doc, nor to allocate dead space. It's not incorrect
(IMHO!)
[ No, we don't have comp.lang.c in Australia. Sorry. ]
--
Gregory Bond, Burdett Buckeridge & Young Ltd, Melbourne, Australia
Internet: gnb@melba.bby.oz.au non-MX: gnb%melba.bby.oz@uunet.uu.net
Uucp: {uunet,mnetor,pyramid,ubc-vision,ukc,mcvax,...}!munnari!melba.bby.oz!gnb
wen-king@cit-vax.Caltech.Edu (King Su) (02/09/89)
In article <21557@ames.arc.nasa.gov> lamaster@ames.arc.nasa.gov (Hugh LaMaster) writes: >Now, DEC just turned down the chance to start evolving in the direction <of common formats with their new RISC machine, so, the best we can hope >for now is a common interchange file format that all machines would create <when data interchange is required. I suggest that the time is ripe for >the development of such a standard. One small request - people who are <working on creating such a standard please include 64 bit integer >(and floating point, but that is usual) formats - there are quite a few <uses for 64 bit integers, and some machines that really need access to >integers at least 48 bits long. You mean DEC has finally decided to go big-endian? That is news to me. The little-endian format is the current dominant format - remember all the IBM PC's and their clones. To evolve in the direction of a common format would mean to take the little-endian route. I would say that the day we have a common format will come a day after the US adopts the metric system. The SUN's XDR library has already provided us with a common interchange file format. It probably does not address 64 bit integers. -- /*------------------------------------------------------------------------*\ | Wen-King Su wen-king@vlsi.caltech.edu Caltech Corp of Cosmic Engineers | \*------------------------------------------------------------------------*/
firth@sei.cmu.edu (Robert Firth) (02/09/89)
In article <8480@aw.sei.cmu.edu> firth@bd.sei.cmu.edu (Robert Firth) writes: | Could someone inform me whether the current C standard has fixed this? | The simplest answer I guess is to rule that the address of array[upb+1] | must always be legal; in practice this means the implementation has to | leave dead space at the end of each memory segment. In article <24384@amdcad.AMD.COM> tim@amd.com (Tim Olson) writes: >That is exactly what is done in the current proposed ANSI C standard; >the address is legal to compute, although dereferencing the address is >undefined. Only a single byte of "dead space" is required for this. Thanks, Tim, and others who mailed me. There is a copy of the latest ANSI C in this building, but it seems to have wandered off, so I can't look this up for myself readily. However, is only a single byte required? Suppose you have an array of a struct; is it legal to compute the address of array[upb+1].component? If so, then you really do need to allocate a complete dead array element.
lamaster@ames.arc.nasa.gov (Hugh LaMaster) (02/10/89)
In article <9468@cit-vax.Caltech.Edu> wen-king@cit-vax.UUCP (Wen-King Su) writes: >In article <21557@ames.arc.nasa.gov> lamaster@ames.arc.nasa.gov (Hugh LaMaster) writes: >The little-endian format is the current dominant format - remember all >the IBM PC's and their clones. To evolve in the direction of a common I did not mean to imply that DEC has decided to go Big OR Little Endian. DEC's new machine is touted (I haven't seen an architecture manual, so I can't vouch for the accuracy of it) to have duplicated the stubbornly Middle Endian formats of the VAX. Now, why SHOULD DEC build a consistent (big or little endian) data format machine? I don't know, consistency is the bugaboo of small minds, after all. ( Quiz: Figure out, in your head, which byte (offset from the address in memory) of a DEC F format fp number contains the least significant bits of the fraction (multiple choice): a) byte 0 b) byte 1 c) byte 2 d) byte 3 e) I dunno, I never could figure it out. But real programmers don't use f.p. ) >The SUN's XDR library has already provided us with a common interchange >file format. It probably does not address 64 bit integers. I remember seeing a definition of XDR a couple of years ago - but in the context of RPC. Is there an XDR FILE format definition? NFS certainly doesn't translate to/from it. -- Hugh LaMaster, m/s 233-9, UUCP ames!lamaster NASA Ames Research Center ARPA lamaster@ames.arc.nasa.gov Moffett Field, CA 94035 Phone: (415)694-6117
tim@crackle.amd.com (Tim Olson) (02/10/89)
In article <8496@aw.sei.cmu.edu> firth@bd.sei.cmu.edu (Robert Firth) writes: | However, is only a single byte required? Suppose you have an array of | a struct; is it legal to compute the address of array[upb+1].component? | If so, then you really do need to allocate a complete dead array element. It is not legal to compute the address of array[upb+1].component (at least if component has a non-zero offset from the beginning of the structure). -- Tim Olson Advanced Micro Devices (tim@crackle.amd.com)
peter@ficc.uu.net (Peter da Silva) (02/10/89)
In article <3300050@m.cs.uiuc.edu>, gillies@m.cs.uiuc.edu writes:
> while (*p++ = *q++);

p, q in registers:

	tst	(rq)
	beq	pool
loop:	movb	(rq)+,(rp)+
	bne	loop
pool:

> len = *p++ = *q++;
> while(len-->0)
>	*p++ = *q++;

	movb	(rq)+,rtemp
	movb	rtemp,(rp)+
	beq	pool
loop:	movb	(rq)+,(rp)+
	sob	rtemp,loop	; not at all sure of the syntax here
pool:

Two instructions for the loop, either way. But the former is more likely to be implemented by a dumb compiler... what did Ritchie's compiler do for it with p and q in registers? -- Peter da Silva, Xenix Support, Ferranti International Controls Corporation. Work: uunet.uu.net!ficc!peter, peter@ficc.uu.net, +1 713 274 5180. `-_-' Home: bigtex!texbell!sugar!peter, peter@sugar.uu.net. 'U` Opinions may not represent the policies of FICC or the Xenix Support group.
guy@auspex.UUCP (Guy Harris) (02/10/89)
>The SUN's XDR library has already provided us with a common interchange >file format. It probably does not address 64 bit integers. Nope, it does. The datatype names are "hyper" and "unsigned hyper". XDR is big-endian, BTW. (So is the 68K; this may or may not be a coincidence :-).)
guy@auspex.UUCP (Guy Harris) (02/10/89)
>I remember seeing a definition of XDR a couple of years ago - but in the >context of RPC. XDR's spec tends to be bundled with RPC's spec, but XDR is a data representation format that can be used on a spinning oxide coated platter, or a strip of oxide-coated plastic, just as it can be used on a wire. The UNIX XDR from Sun implementation can stuff XDR'ed data into memory or pick it up from memory, or write it to a standard I/O stream or read it from a standard I/O stream. You can, in fact, define your own XDR stream implementation types, if the canned ones that come with the user-mode RPC library (standard I/O, memory, and "record stream" - can be used over a TCP connection, or into and out of a file, or....) won't do what you want. >Is there an XDR FILE format definition? Well, there are the formats generated and read by the standard I/O and "record stream" XDR stream types. I suspect the standard I/O format consists of a sequence of the encoded objects, shoved to the file as bytes in the format specified by the XDR spec. The "record stream" format is probably similar, but with some form of "record marks" in the stream (the documentation claims the record marking mechanism is described in "Advanced Topics", but it's not described there in the ONC/NFS Protocol Specifications and Services Manual). >NFS certainly doesn't translate to/from it. No, and I doubt it could do so, given that many file formats are not self-describing. If you want to maintain a file over NFS that's readable and writable by clients with different "native" data representations, you're probably best off using XDR in some form to write the data out (either XDR into memory and write/read and XDR from memory, or use XDR over standard I/O, or XDR over "record stream", or...).
new@udel.EDU (Darren New) (02/10/89)
In article <21609@ames.arc.nasa.gov> lamaster@ames.arc.nasa.gov (Hugh LaMaster) writes: >In article <9468@cit-vax.Caltech.Edu> wen-king@cit-vax.UUCP (Wen-King Su) writes: >>In article <21557@ames.arc.nasa.gov> lamaster@ames.arc.nasa.gov (Hugh LaMaster) writes: >>The SUN's XDR library has already provided us with a common interchange >>file format. It probably does not address 64 bit integers. > >I remember seeing a definition of XDR a couple of years ago - but in the >context of RPC. Is there an XDR FILE format definition? NFS certainly >doesn't translate to/from it. > If you are really looking for a COMMON interchange format, ASN.1 is the way to go. XDR is nice when you are working with C, but defining something like a font in a way that is machine independent requires much extra information in terms of order of bits in a byte and so on. XDR is also not self-delimiting, does not handle time or alternate character sets well (last I looked), and cannot be parsed without knowledge of the XDR functions used to encode the higher-level constructs. ASN.1, having been standardized by ISO (I can find the number if anyone wants it), is available world-wide. ASN.1 also addresses integers, strings, etc. as big as you want. Unfortunately, it does not at this time have a single, standard floating-point format, but this is easy to add on an application basis. This topic seems to be wandering from comp.arch somewhat... - Darren
lamaster@ames.arc.nasa.gov (Hugh LaMaster) (02/11/89)
In article <8475@louie.udel.EDU> new@udel.EDU (Darren New) writes: >If you are really looking for a COMMON interchange format, ASN.1 is the way to >go. XDR is nice when you are working with C, but defining something like a font >etc as big as you want. Unfortunately, it does not at this time have a single, >standard floating-point format, but this is easy to add on an application But floating point IS my problem. Yes, it IS easy to add on an application basis, just not on a hundreds-of-applications basis. (Text data is easy to convert on an application basis also between character sets of different types. But when you want to create text files that can be read by a large number of applications you are going to write in the future, you use ASCII. In the future, perhaps you will be able to use ASN.1 data streams.) It also can be expensive, even though it is "easy", when you need to translate large quantities of data between different data types. (End of sermon :-) ) Anyway, does ASN.1 define some kind of file structure? (Since this is USENET, we won't use the R-word (a logical r*c*rd for you old-timers over thirty)). Are the data types defined in the structure somewhere, so that a conversion program can figure out what it is converting from/to? Is there a well defined library with C and Fortran bindings that an applications programmer can use to read and write ASN.1 files with? Will the cost of using ASN.1 structured data approach zero as the structures become large (arrays of 10000 floating point numbers, for example)? If the answer is yes to all these questions, I would like to know more. -- Hugh LaMaster, m/s 233-9, UUCP ames!lamaster NASA Ames Research Center ARPA lamaster@ames.arc.nasa.gov Moffett Field, CA 94035 Phone: (415)694-6117