[comp.unix.wizards] Size of SysV "block"

dave@astra.necisa.oz (Dave Horsfall) (06/25/87)

Can anyone tell me in an unambiguous manner just how many bytes
are in the following System V (2.?) "blocks"?  Make any assumptions
you like, if it makes a difference.  I have yet to see a clear
reference on this matter.  (BSD people stop laughing, I'll bet you're
not much better off!)

	BUFSIZ
	ls -s
	du
	df
	tar -b
	cpio -B
	mkfs
	fsck
	fsdb

	and so forth - I'm sure I missed some ...
-- 
Dave Horsfall (VK2KFU)           TEL: +61 2 438-3544   FAX: +61 2 439-7036
NEC Information Systems Aust.    ACS: dave@astra.necisa.oz (also CSNET) 
3rd Floor, 99 Nicholson St      ARPA: dave%astra.necisa.oz@seismo.css.gov
St. Leonards  NSW  2064         UUCP: {enea,hplabs,mcvax,prlb2,seismo,ukc}!\
AUSTRALIA                             munnari!astra.necisa.oz!dave

mats@forbrk.UUCP (Mats Wichmann) (06/30/87)

In article <218@astra.necisa.oz> dave@astra.necisa.oz (Dave Horsfall) writes:
>Can anyone tell me in an unambiguous manner just how many bytes
>are in the following System V (2.?) "blocks"? 

>	ls -s   du   df   tar -b   cpio -B   mkfs   fsck   fsdb
All of the preceding are 512 byte "blocks" and refer to "disk" blocks;
it is left at 512 to avoid having to change things around on systems which
support different logical block sizes on different file systems and just
for general consistency (let's see, this is a Frozzboz 1000, the blocks must
be 875 bytes each, unlike the 1500, where they are 950 each...).

>	BUFSIZ
This is the stdio buffer size and varies from system to system, although
it seems to be 1024 for most V.2 implementations - should be the same size 
as the largest allowable file system logical block.

Note that there are programs, such as CPIO, which take an argument (-B
in the case of cpio) which seems to indicate that the block size is
changed; really this sets the "blocking factor" - how many blocks to
collect before doing the physical write/read. The number reported by
cpio when it finishes is still in terms of 512-byte blocks.
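
A minimal sketch of the arithmetic described above, assuming the 512-byte
reporting unit stated in this post. The blocking factor of 10 (giving
5120-byte records) is an illustrative assumption, and the function names are
hypothetical, not part of any utility.

```python
BLOCK = 512  # bytes per reporting "block", as stated in the post above


def blocks_reported(nbytes):
    """Number of 512-byte blocks needed to hold nbytes, rounded up."""
    return (nbytes + BLOCK - 1) // BLOCK


def record_size(blocking_factor):
    """Bytes moved per physical read/write for a given blocking factor."""
    return blocking_factor * BLOCK


# A 10000-byte archive is still counted in 512-byte units,
# whatever blocking factor was used to write it.
print(blocks_reported(10000))  # 20
print(record_size(10))         # 5120
```

The point of the sketch is that the blocking factor changes only the size of
each physical transfer; the count reported at the end stays in 512-byte units.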

Mats Wichmann

allbery@ncoast.UUCP (Brandon Allbery) (07/01/87)

As quoted from <218@astra.necisa.oz> by dave@astra.necisa.oz (Dave Horsfall):
+---------------
| Can anyone tell me in an unambiguous manner just how many bytes
| are in the following System V (2.?) "blocks"?  Make any assumptions
| you like, if it makes a difference.  I have yet to see a clear
| reference on this matter.  (BSD people stop laughing, I'll bet you're
| not much better off!)
+---------------

There is no single "block" size.  Instead, there are "tape blocks" -- usually
512 bytes for historical reasons (and thereby compatibility) and "disk blocks"
(which used to be 512 bytes but on most modern systems are 1024).  If you have
an improperly ported System V (or a clone System V that was never within 10
miles of AT&T since it was V7 compatible), you may have 512-byte blocks no
matter what, or some/all of the programs may not have been changed to reflect
the new block size.  Worse, System V can handle both kinds of filesystems,
so you may have a partition with 512-byte blocks and one with 1024-byte
blocks....

In general, tape-oriented utilities (tar, dd, cpio) use tape blocks (512
bytes) and disk utilities (including stdio) use 1024-byte disk blocks even
on 512-byte block file systems.  (Low-level utilities like mkfs and fsck/fsdb
will use the actual block size of the file system.)

++Brandon
-- 
     ---- Moderator for comp.sources.misc and comp.binaries.ibm.pc ----
Brandon S. Allbery	<BACKBONE>!cbosgd!ncoast!allbery
aXcess Company		{ames,mit-eddie,harvard,talcott}!necntc!ncoast!allbery
6615 Center St. #A1-105	{well,sun,pyramid,ihnp4}!hoptoad!ncoast!allbery
Mentor, OH 44060-4101	necntc!ncoast!allbery@harvard.HARVARD.EDU (Internet)
+01 216 974 9210	ncoast!allbery@CWRU.EDU (CSnet)
			Brandon Allbery on 157/504 (Fidonet/Matrix/whatever)

daryl@ihlpe.UUCP (07/09/87)

In article <348@forbrk.UUCP>, mats@forbrk.UUCP (Mats Wichmann) writes:
> In article <218@astra.necisa.oz> dave@astra.necisa.oz (Dave Horsfall) writes:
> >Can anyone tell me in an unambiguous manner just how many bytes
> >are in the following System V (2.?) "blocks"? 
> 
> >	ls -s   du   df   tar -b   cpio -B   mkfs   fsck   fsdb
> All of the preceding are 512 byte "blocks" and refer to "disk" blocks;
> >	BUFSIZ
> This is the stdio buffer size and varies from system to system, although
> it seems to be 1024 for most V.2 implementations - should be the same size 

UTS block sizes on our machines differ from "normal" VAX/sun/most things.

To this day I am not sure why we cannot use either bytes, Kbytes, Mbytes, etc.
instead of blocks.


Daryl Monge				UUCP:	...!ihnp4!ihcae!dlm
AT&T					CIS:	72717,65
Bell Labs, Naperville, Ill		AT&T	312-979-3603

gwyn@brl-smoke.ARPA (Doug Gwyn ) (07/10/87)

In article <1852@ihlpe.ATT.COM> daryl@ihlpe.ATT.COM (Daryl Monge) writes:
>To this day I am not sure why we cannot use either bytes, Kbytes, Mbytes, etc.
>instead of blocks.

How big is a "byte"?  (No, it's not necessarily 8 bits!)
How about sizing things in terms of number of bits, which is a
universal measure.

rjd@tiger.UUCP (07/13/87)

>>To this day I am not sure why we cannot use either bytes, Kbytes, Mbytes, etc.
>>instead of blocks.
> 
> How big is a "byte"?  (No, it's not necessarily 8 bits!)
> How about sizing things in terms of number of bits, which is a
> universal measure.

   O.K., I'll byte.  (oops, pun initially unintended.)   A byte IS eight bits!!!
Maybe you are thinking of a word??  And a nibble is four bits, and a gulp is
sixteen bits (or was this a mouthful?), etc....

Randy

guy%gorodish@Sun.COM (Guy Harris) (07/14/87)

>    O.K., I'll byte.  (oops, pun initially unintended.)   A byte IS eight
> bits!!!

Don't say that within earshot of any PDP-10 aficionados....
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com

roy@phri.UUCP (Roy Smith) (07/15/87)

In article <142700010@tiger.UUCP> rjd@tiger.UUCP writes:
> O.K., I'll byte.  (oops, pun initially unintended.)   A byte IS eight bits!!!
> Maybe you are thinking of a word??  And a nibble is four bits, and a gulp is
> sixteen bits (or was this a mouthful?), etc....

	No, no, no, a thousand times NO!  A byte is NOT NECESSARILY 8 bits!
Granted, on most of the popular machines you are likely to see today (Vax,
PDP-11, 680x0, 320xx, 80x86, Pyramid, etc.), a byte is 8 bits, but that
doesn't mean it has to be.  A byte is simply some collection of contiguous
bits taken as a unit.  Often a byte is that number of bits which most
comfortably holds a single character in the machine's native character
code, but not always.  Often the number of bits in a byte is dictated by
the underlying machine architecture, but that's not a hard and fast rule
either.  I could write a program on a Vax to read a file in 7-bit bytes if
I wanted to.  In fact, if I wanted to read DEC-10 tapes I would have to
write just such a program (and I once did).
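
As an illustration of reading a file in 7-bit bytes on an 8-bit machine, here
is a minimal Python sketch. It is not the program mentioned above; the
most-significant-bit-first ordering and the name unpack_bytes are assumptions
made for the example, and real tape formats may differ.

```python
def unpack_bytes(data, width):
    """Split a bytes object into width-bit fields, most significant bit first.

    Trailing bits that do not fill a whole field are dropped.
    """
    nbits = len(data) * 8
    value = int.from_bytes(data, "big")   # treat the input as one long bit string
    mask = (1 << width) - 1
    fields = []
    for i in range(nbits // width):
        shift = nbits - (i + 1) * width   # distance of field i from the low end
        fields.append((value >> shift) & mask)
    return fields


# Two octets (16 bits) hold two 7-bit fields, with 2 bits left over.
print(unpack_bytes(bytes([0x83, 0x04]), 7))  # [65, 65]
```

The same function with width=8 returns the octets unchanged, which is one way
to see that the 8-bit case is just one choice among many.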

	On a DEC-10/20, for example, a byte can reasonably be anything from
1 (0?) to 36 (35?) bits; 6, 7, and 9 bit bytes are all quite common and if
anything, I would say an 8-bit byte on a DEC-10/20 is a mite strange.  I'm
not sure byte even has a real meaning on a machine like a Cray.
-- 
Roy Smith, {allegra,cmcl2,philabs}!phri!roy
System Administrator, Public Health Research Institute
455 First Avenue, New York, NY 10016

malcolm@spar.SPAR.SLB.COM (Malcolm Slaney) (07/15/87)

In article <142700010@tiger.UUCP> rjd@tiger.UUCP writes:
>
>> How big is a "byte"?  (No, it's not necessarily 8 bits!)
>
>   O.K., I'll byte.  (oops, pun initially unintended.)   A byte IS eight 
>   bits!!!

You and I both know this....but tell that to the Common Lisp people.

In "Common Lisp, The Language" by Guy Steele, 1984.  (page 225)

	Several functions are provided for dealing with an arbitrary-
	width field of contiguous bits appearing anywhere in an integer.
	Such a contiguous set of bits is called a "byte".  Here the
	term "byte" does not imply some fixed number of bits (such as
	eight) rather a field of arbitrary and user-specifiable width.

ARGGHHHHH.....Talk about making it difficult to move software between 
a Symbolics machine (which is where the screwy standard came from, I
think) and a Unix machine.
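
For readers who have not met this usage, here is a minimal Python sketch of
what such an arbitrary-width "byte" amounts to. The names ldb and dpb are
borrowed from Common Lisp's load-byte and deposit-byte operations, but the
argument order here is an assumption chosen for the example and does not
reproduce the Lisp interface.

```python
def ldb(size, position, integer):
    """Extract the size-bit field that starts at bit `position` (0 = low bit)."""
    return (integer >> position) & ((1 << size) - 1)


def dpb(newbyte, size, position, integer):
    """Return integer with the size-bit field at `position` replaced by newbyte."""
    mask = ((1 << size) - 1) << position
    return (integer & ~mask) | ((newbyte << position) & mask)


print(hex(ldb(4, 4, 0xAB)))       # 0xa
print(hex(dpb(0xF, 4, 0, 0xA0)))  # 0xaf
```

Nothing in the sketch depends on the field being 8 bits wide, which is the
sense of "byte" the quoted passage describes. It assumes non-negative integers.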

							Malcolm

melohn%sluggo@Sun.COM (Bill Melohn) (07/15/87)

In article <23436@sun.uucp> guy%gorodish@Sun.COM (Guy Harris) writes:
>>    O.K., I'll byte.  (oops, pun initially unintended.)   A byte IS eight
>> bits!!!
>
>Don't say that within earshot of any PDP-10 aficionados....

Yes, you've clearly bit off more than you can chew. An Octet (as described
in the TCP/IP RFCs) IS 8 bits; bytes are arbitrary both in size and order
within the large variety of machine architectures.

davidsen@steinmetz.steinmetz.UUCP (William E. Davidsen Jr) (07/15/87)

In article <2792@phri.UUCP> roy@phri.UUCP (Roy Smith) writes:
>In article <142700010@tiger.UUCP> rjd@tiger.UUCP writes:
>> O.K., I'll byte.  (oops, pun initially unintended.)   A byte IS eight bits!!!
>> Maybe you are thinking of a word??  And a nibble is four bits, and a gulp is
>> sixteen bits (or was this a mouthful?), etc....
>
Let me clarify this:
  8 bits is a byte
  4 bits is a nybble
  2 bits is a tayste (actually 2 bits is a quarter)

36 bit machines usually support at least 6 and 9 bit bytes in hardware,
although I'm sure someone will write and tell me that their machine is
not only obsolete but brain-damaged as well and doesn't have any
hardware bytes.

36 bit machines were a great idea which fell by the wayside... the extra
bit in the byte allows many extended character sets (ASCII + 384
others), the short is +/-262144, large enough for many applications, and
the long is +/-64*10^9, which will hold almost any real world value.
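
The figures in the paragraph above can be checked with a few lines of Python.
This is only a sketch: it assumes two's-complement integers (some 36-bit
machines used ones' complement, which gives up one value at the negative end),
and the helper names are hypothetical.

```python
def code_points(bits):
    """Number of distinct values representable in `bits` bits."""
    return 1 << bits


def signed_range(bits):
    """(minimum, maximum) of a two's-complement integer of `bits` bits."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


print(code_points(9) - 128)   # 384 codes beyond 7-bit ASCII
print(signed_range(18))       # (-131072, 131071)
print(signed_range(36))       # (-34359738368, 34359738367)
print(code_points(18))        # 262144
print(code_points(36))        # 68719476736, i.e. 64 * 2**30
```

Under these assumptions the 384 figure matches exactly, while 262144 and
64*10^9 look like the total counts 2^18 and 2^36 rather than the signed
limits, which would be about half as large.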

When most of our applications were moved from a Honeywell to vaxen and
an IBM, we did a lot of conversion to long, double, and real*8, because
the number of significant digits dropped to <1.

    ================================================================
    |   Please direct any followup discussion of architecture to   |
    |   comp.arch not wizards!					   |
    ================================================================

-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {chinet | philabs | sesimo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

jhh@ihlpl.ATT.COM (Haller) (07/16/87)

In article <2792@phri.UUCP>, roy@phri.UUCP (Roy Smith) writes:

> 	No, no, no, a thousand times NO!  A byte is NOT NECESSARILY 8 bits!
> 
> 	On a DEC-10/20, for example, a byte can reasonably be anything from
> 1 (0?) to 36 (35?) bits; 6, 7, and 9 bit bytes are all quite common and if
> anything, I would say an 8-bit byte on a DEC-10/20 is a mite strange.  I'm
> not sure byte even has a real meaning on a machine like a Cray.
> Roy Smith, {allegra,cmcl2,philabs}!phri!roy

This is why the standards organizations use the term octet rather than
byte.  Almost all data networks, and certainly all of the protocol
information (headers, etc) are octet aligned, making life very
difficult for those manufacturers with "weird" machines.  Unfortunately,
mega-octets and giga-octets don't have quite as nice a ring as
megabyte and gigabyte.

John Haller

Isaac_K_Rabinovitch@cup.portal.com (07/16/87)

>>    O.K., I'll byte.  (oops, pun initially unintended.)   A byte IS eight
>> bits!!!
>
>Don't say that within earshot of any PDP-10 aficionados....
>        Guy Harris
>        {ihnp4, decvax, seismo, decwrl, ...}!sun!guy
>        guy@sun.com

No, the basic unit on a PDP 10 is not a "byte" it's a "word".  "Word"
was universal nomenclature for unit of data before IBM introduced the 360,
the first byte-oriented machine.

An old IBMer once told me that "byte" was an Olde English word meaning
"syllable".  Never been able to confirm this.

devine@vianet.UUCP (Bob Devine) (07/16/87)

In article <2792@phri.UUCP> roy@phri.UUCP (Roy Smith) writes:
> Maybe you are thinking of a word??  And a nibble is four bits, and a gulp is
> sixteen bits (or was this a mouthful?), etc....

In article <6705@steinmetz.steinmetz.UUCP>, davidsen@steinmetz.steinmetz.UUCP (William E. Davidsen Jr) writes:
>   8 bits is a byte
>   4 bits is a nybble
>   2 bits is a tayste (actually 2 bits is a quarter)

  This is a reposting of the results of a question I asked last year.  I had
asked what to call a grouping of bits.  It all started because I originally
thought that "crayte" would be a marvelous name for a 64-bit group.

Bob Devine

+++++++++++++++++++++++++++++++++++++++++++++++++++++++
     1 bit  == bit[?], byt[1], singlet[4]
     2 bits == quarter[0], dibit[4], doublet[4]
     4 bits == nybble[?], nibble[4], quadlet[4]
     8 bits == byte[?], octlet[4]
    16 bits == gulp[2], dysh[3], hexlet[4], playte[5], gulp[6],
	       snack[7,8], chomp[9]
    32 bits == box[2], coarse[3], triclet[4], plattyr[5], mouthful[6],
               meal[7,8], snarf[9]
    64 bits == crayte[0], meel[3], sexlet[4], feast[7,8], gobble[9]
     * bits == buffet[3]

Contributors:
    [?] unknown
    [0] vianet!devine (Bob Devine)
    [1] uiucdcs!mcewan (Scott McEwan)
    [2] ima!haddock!karl  (Karl W. Z. Heuer)
    [3] iuvax!bobmon  (Robert Montante)
    [4] ccvaxa!aglew (Andy "Krazy" Glew)
    [5] sphinx!eric (Eric M. Nelson)
    [6] reed!jeanne (Jeanne A. E. DeVoto)
    [7] uu.warwick.ac.uk!kay (Kay Dekker)
    [8] necis!schuldy  (Mark Schuldenfrei)
    [9] decuac!bagwill (Bob Bagwill)

gwyn@brl-smoke.ARPA (Doug Gwyn ) (07/16/87)

In article <142700010@tiger.UUCP> rjd@tiger.UUCP writes:
>A byte IS eight bits!!!

No, that started with IBM's System/360 and gained further support from
the PDP-11.  Before that time, and since, many architectures have either
had other fundamental address unit sizes (e.g. 6 or 9 bits) or have
supported variable-sized bytes (e.g. CDC).  An 8-bit byte is simply not
a suitable unit of measure for systems whose fundamental memory unit
size is not an integral multiple of 8 bits.

guy%gorodish@Sun.COM (Guy Harris) (07/17/87)

> No, the basic unit on a PDP 10 is not a "byte" it's a "word".

I didn't say a byte was the *basic* unit of memory on the 10.  It
most definitely *did* have the notion of a "byte" in the instruction
set, however (consider the Load Byte, Store Byte, Increment Byte
Pointer, etc. instructions).  Byte pointers indicated the size of the
byte, so there was no single byte size in the hardware; I think the
original software packed 5 7-bit bytes in a word, with one bit left
over.
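
A minimal Python sketch of the packing just described: five 7-bit character
codes in one 36-bit word, with one bit left over. Placing the characters at
the high end and leaving the low-order bit unused is an assumption made for
the example, as are the function names.

```python
def pack_word(chars):
    """Pack up to five 7-bit character codes into one 36-bit word."""
    if len(chars) > 5:
        raise ValueError("at most five characters fit in one word")
    word = 0
    for i in range(5):
        code = ord(chars[i]) if i < len(chars) else 0  # pad short input with NUL
        if code > 0x7F:
            raise ValueError("not a 7-bit character")
        word = (word << 7) | code
    return word << 1  # 5 * 7 = 35 bits used; the low-order bit stays zero


def unpack_word(word):
    """Recover the five 7-bit characters from a packed 36-bit word."""
    return "".join(chr((word >> (1 + 7 * (4 - i))) & 0x7F) for i in range(5))


w = pack_word("HELLO")
print(w < (1 << 36))   # True
print(unpack_word(w))  # HELLO
```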

> "Word" was universal nomenclature for unit of data before IBM introduced
> the 360, the first byte-oriented machine.

Not quite.  The IBM 7030 or "Stretch" supported bit addressing; it
used an 8-bit byte to store characters.  I don't know if they used
the term "byte", but it definitely supported access to bytes.  (And,
if you don't want to consider character-oriented machines like the
14xx series to be "byte-oriented", it's still byte-oriented; Stretch
was not one of those machines.)  I suspect there were other machines
of the general flavor of the 360 out before the 360, as well.
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com

dgk@ulysses.homer.nj.att.com (David Korn[eww]) (07/17/87)

I believe that in the mid-seventies the CDC-STAR used the term
sword (super-word) to refer to a 512-bit quantum. 

At the time I remember that a 1024 bit word was going to be called a pen
for obvious reasons.  I have not heard these terms used since.  Maybe
Multi-flow has words for them as well.

David Korn
{ihnp4|allegra}ulysses!dgk

henry@utzoo.UUCP (Henry Spencer) (07/17/87)

> 	On a DEC-10/20, for example, a byte can reasonably be anything from
> 1 (0?) to 36 (35?) bits; 6, 7, and 9 bit bytes are all quite common...

Another example worth mentioning is the BBN C/70 and its kin, which have
10-bit bytes as I recall.  This isn't quite the same situation as the
DEC-20, which has 36-bit words and a rather fuzzy notion of (sort of)
bitfields within them; on the C/70, the division of memory into bytes is
just as fixed as it is on (say) a VAX, but the bytes are 10 bits wide,
no more, no less.  There are also machines with 9-bit bytes, although one
seldom sees them in the Unix world.

And then there's the PDP-8, where you get your choice of 12-bit bytes (ugh)
or 6-bit bytes (ARGH)...
-- 
Support sustained spaceflight: fight |  Henry Spencer @ U of Toronto Zoology
the soi-disant "Planetary Society"!  | {allegra,ihnp4,decvax,utai}!utzoo!henry

rjd@tiger.UUCP (07/18/87)

>> O.K., I'll byte.  (oops, pun initially unintended.)   A byte IS eight bits!!!
>> Maybe you are thinking of a word??  And a nibble is four bits, and a gulp is
>> sixteen bits (or was this a mouthful?), etc....
> 
> 	No, no, no, a thousand times NO!  A byte is NOT NECESSARILY 8 bits!
> .... more on this....

  You sound convincing, and I would like to think that you were right, but I
still have my doubts.  The way you are describing a byte:

"....A byte is simply some collection of contiguous bits taken as a unit. Often
a byte is that number of bits which most comfortably holds a single character
in the machine's native character code, but not always.  Often the number of
bits in a byte is dictated by the underlying machine architecture, but that's
not a hard and fast rule either."

  This is a word!!  On the machines I most commonly work on, even at the
hardware design level, the word size is 32 bits (true 32-bit), and memory
sizes are specified in bytes - 8-bit bytes!!  The machine uses ASCII,
as do most except IBM, and ASCII is based on seven bits.  So there would
be no reason to use a byte meaning 8-bits unless it WAS so.

  I HAVE AN IDEA!!! Let's look it up........ (turning pages on my Webster's):

byte - n. [arbitrary formation, < BITE ] a string of binary digits, usually
    eight, operated on as a basic unit by a digital computer.

word - ...... 8. an ordered combination of characters carrying at least one
    meaning that is stored in one location in a computer and that is regarded
    as a unit when stored or transferred by the computer's circuits.

  I guess you are right, yet I think that common usage dictates a byte be eight
bits.  A very good point you have brought up, though, as I thought I KNEW a byte
to ONLY be eight bits, and there seems to be a point of ambiguity here....

Randy

bzs@bu-cs.BU.EDU (Barry Shein) (07/20/87)

Some more suggestions:

In honor of the page size of a Vax:

	512 bits == nanopage

1024 should be called a Kbit, there's just no choice, sorry, not funny.

	-Barry Shein, Boston University

roy@phri.UUCP (Roy Smith) (07/20/87)

	Somebody, somewhere, some time ago, in some article wrote:

> How about sizing things in terms of number of bits, which is a
> universal measure.

	This unleashed a torrent of silly and not-so-silly articles on the
definition of a byte and cute names for N-bit chunks (to which I confess
contributing), but nobody really addressed this guy's question.  So...

	Yes, clearly bits is a more precise unit of information than bytes.
The problem with reporting file sizes in bits is that most of the time it's
not what people want to know.  What they really want to know is how many
characters long the file is (notice I said characters, not bytes).  If I
took a text file on a vax that was N characters long and moved it to a
DEC-20, it would still be N characters long.  Maybe my Vax uses 8 bits per
character and your DEC-20 uses 7-1/5 bits per character, but I don't want
to know about that (usually).

	On Unix, ls would show a byte count and on TOPS-20, DIR would show
a word count.  These numeric values would be different, but the number of
characters wouldn't have changed.  Well, maybe that's a bad example because
TOPS-20 would turn all the newlines into carriage-return/newline pairs, but
you get the idea.
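
To put numbers on the comparison above, here is a small Python sketch. It
assumes one character per 8-bit byte on the Vax and five characters per 36-bit
word on the DEC-20 (the 7-1/5 bits per character mentioned earlier), and it
ignores the newline conversion. The function names are hypothetical.

```python
def vax_bytes(nchars):
    """Byte count for nchars characters at one 8-bit byte each."""
    return nchars


def dec20_words(nchars):
    """36-bit words needed for nchars characters at five per word, rounded up."""
    return (nchars + 4) // 5


n = 1000
print(vax_bytes(n))         # 1000 bytes
print(dec20_words(n))       # 200 words
print(vax_bytes(n) * 8)     # 8000 bits
print(dec20_words(n) * 36)  # 7200 bits
```

The character count is the only figure that comes out the same on both
machines; the byte, word, and bit counts all differ.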
-- 
Roy Smith, {allegra,cmcl2,philabs}!phri!roy
System Administrator, Public Health Research Institute
455 First Avenue, New York, NY 10016

Isaac_K_Rabinovitch@cup.portal.com (07/20/87)

According to the Oxford English Dictionary Supplement, the "byte" is simply
and purely an IBM invention.  So it means whatever IBM says it does.

dhesi@bsu-cs.UUCP (Rahul Dhesi) (07/20/87)

In article <2792@phri.UUCP> roy@phri.UUCP (Roy Smith) writes:
>In article <142700010@tiger.UUCP> rjd@tiger.UUCP writes:
>> O.K., I'll byte.  (oops, pun initially unintended.)   A byte IS eight bits!!!
>> Maybe you are thinking of a word??  And a nibble is four bits, and a gulp is
>> sixteen bits (or was this a mouthful?), etc....
>
>	No, no, no, a thousand times NO!  A byte is NOT NECESSARILY 8 bits!
>Granted, on most of the popular machines you are likely to see today (Vax,
>PDP-11, 680x0, 320xx, 80x86, Pyramid, etc.), a byte is 8 bits, but that
>doesn't mean it has to be.  A byte is simply some collection of contiguous
>bits taken as a unit. 

On modern machines, a byte is 8 bits.  

On obsolete hardware a byte can be of arbitrary size.

Since we are now in the 1980s going on to the 1990s, I think it's about
time we streamlined our terminology to reflect the times.

A byte is therefore exactly 8 bits.  No more and no less.  Opinions to
the contrary belong in the 1960s.  Let them lie there and die there.

It's time to upgrade from your 12-bit PDP-8 or your 60-bit CDC or your 
36-bit DEC-20 to a new architecture.

In his book "Reliable Data Structures in C", Thomas Plum gives portable
implementations of the memxxx functions (e.g.  memset(), memcpy()).  He
does not feel the need to point out that these are portable only if the
machine's word will hold exactly an integral number of chars.

If you are moving with the times, welcome aboard.  If you have your
feet firmly planted in the 1960s, lots of luck;  you will need it.
-- 
Rahul Dhesi         UUCP:  {ihnp4,seismo}!{iuvax,pur-ee}!bsu-cs!dhesi

daryl@ihlpe.ATT.COM (Daryl Monge) (07/21/87)

I think that everyone is getting side tracked on the original issue; that is
what is a block as reported by many UNIX utilities?

Clearly "block" is not useful, especially as file systems get more complex
and the notion of a "block" gets confused.

However, "bit" is useless in terms of user friendliness.  Imagine:
-rwxr-x---   1 daryl    daryl     3102120bits Feb  6 22:40 gmacs

The number of bits in a byte is not relevant.  Perhaps we should use the
word (:-) "character", since at least to me that has some real world meaning.
ex:
/e31          (/dev/dsk/36bs2):     12632K characters   33572 unique files

Comments?


Daryl Monge				UUCP:	...!ihnp4!ihcae!daryl
AT&T					CIS:	72717,65
Bell Labs, Naperville, Ill		AT&T	312-979-3603

beattie@netxcom.UUCP (Brian Beattie) (07/21/87)

As I recall the term "byte" was coined by IBM for the 1401.  The 1401
had a variable length word delimited by a "word mark".  A "byte" was the
smallest addressable object.  Through usage it has come to mean the smallest
object larger than 1 bit but less than a "word" that can be manipulated
by the CPU (a word being the "natural object" of the CPU).

PS.
The 1401 had an 8-bit byte.
-- 
-----------------------------------------------------------------------
Brian Beattie			| Phone: (703)749-2365
NetExpress Communications, Inc.	| uucp: seismo!sundc!netxcom!beattie
1953 Gallows Road, Suite 300	|
Vienna,VA 22180			|

gwyn@brl-smoke.ARPA (Doug Gwyn ) (07/21/87)

In article <857@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
>A byte is therefore exactly 8 bits.  No more and no less.  Opinions to
>the contrary belong in the 1960s.  Let them lie there and die there.

And people who believe that 8 bits is sufficient to encode a
character are either naive or stupid.

beede@hubcap.UUCP (Mike Beede) (07/22/87)

in article <857@bsu-cs.UUCP>, dhesi@bsu-cs.UUCP (Rahul Dhesi) says:
> 
> [ much deleted ]
>
> Since we are now in the 1980s going on to the 1990s, I think it's about
> time we streamlined our terminology to reflect the times.
> 
> A byte is therefore exactly 8 bits.  No more and no less.  Opinions to
> the contrary belong in the 1960s.  Let them lie there and die there.
> 
> [ more deleted ]

I suppose that you've allowed for all possible increases in character
set size, possibly including fonts encoded on a per-character basis?
And any advances in technology, too?

While we're at it, let's standardize on whatever machine you like,
with all the ``modern'' features, and get rid of all these other nasty
architectures with their own idiosyncratic features.

Oh well, all right   :-> / 2.

Seriously--different machines serve different purposes, and so are designed
differently.  That is why it is foolish to freeze some design parameter
arbitrarily.  I don't see that there is, for instance, a clear argument
against 36 bit words and 9 bit bytes as opposed to 32 bit words and 8 bit
bytes, especially if your application works well with 9 bit quantities.
-- 
Mike Beede                      
Computer Science Dept.          UUCP: . . . !gatech!hubcap!beede
Clemson University              INET: beede@hubcap.clemson.edu
Clemson SC 29631-1906           YOUR DIME: (803)656-{2845,3444}

seifert@doghouse.gwd.tek.com (Snoopy) (07/22/87)

In article <9815@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes:

>1024 should be called a Kbit, there's just no choice, sorry, not funny.

The plural of kbit is of course kibitz, as in "How much netnews came
in last night?"  "Lots of kibitz."   :-)

Snoopy
tektronix!doghouse.gwd!snoopy
snoopy@doghouse.gwd.tek.com

"And it's a middle-endian machine with trinary logic."
"They would do that!"

greywolf@unisoft.UUCP (The Grey Wolf @ ext 165) (07/22/87)

In article <6144@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>In article <857@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
>>A byte is therefore exactly 8 bits.  No more and no less.  Opinions to
>>the contrary belong in the 1960s.  Let them lie there and die there.
>
>And people who believe that 8 bits is sufficient to encode a
>character are either naive or stupid.

And people who believe that there is a nice, neat way to implement an alter-
nate solution throughout the computer industry are foolish, buttheaded and/or
not very capable of mental throughput.  [ or they own a Cray, where everything
is a double (64 bits) anyway...]

	What is the problem here?  I see nothing wrong with eight bits for a
character.  Can you come up with anything better?  What's the matter?
Are escape sequences for special characters too much for you to handle?
Gimme a break.

			Disgusted that this discussion is even *happening*,

				Roan Anderson

			...sun!  \
			...ucbvax!>unisoft!  \
			...dual! /	      \
					       >greywolf
			..sun!island!unicom!  /
			..ucbvax!well!unicom!/

gwyn@brl-smoke.ARPA (Doug Gwyn ) (07/23/87)

In article <463@unisoft.UUCP> greywolf@unisoft.UUCP (The Grey Wolf @ ext 165) writes:
>I see nothing wrong [with] eight bits for a character.

I take it you don't pay much attention to the rest of the world, then.

jhh@ihlpl.ATT.COM (Haller) (07/23/87)

In article <326@hubcap.UUCP>, beede@hubcap.UUCP (Mike Beede) writes:
> Seriously--different machines serve different purposes, and so are designed
> differently.  That is why it is foolish to freeze some design parameter
> arbitrarily.  I don't see that there is, for instance, a clear argument
> against 36 bit words and 9 bit bytes as opposed to 32 bit words and 8 bit
> bytes, especially if your application works well with 9 bit quantities.

The clear argument against 36 bit words and 9 bit bytes is data
communications.  Like it or not, data communications have virtually
standardized the 8 bit byte.  Just try to generate TCP/IP headers
on a 9 bit machine, packing all of the data contiguously.  Oh,
you want to use a communications processor to do that?  How
many bits is its byte?

Look at some of the higher level ISO protocols, and you will
find that the basic unit of data is, surprise, surprise, an octet.
Oh, sure, there is support for arbitrary bit strings, but
even they are padded to octet boundaries.

Back in the 60's, and possibly up to the mid 70's, when 7 track
mag tape wasn't considered hopelessly obsolete, there was room
for argument on what size provided the best 'byte'.  However,
the de facto standard is an 8 bit byte, which is becoming more
and more institutionalized as time progresses.

Given that a byte is an important measure, byte addressability
becomes important in hardware architectures.  Given that
our machines operate with binary logic, word sizes are going
to be powers of two bytes long, just so that byte addresses
can be easily converted into word addresses, which is typically
related by a power of two to the memory and bus architecture.
Look at the Harris/6 if you want to see what kind of
contortions were necessary to provide byte addressability
with a 24 bit word size.
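
The address arithmetic behind this argument can be sketched in a few lines of
Python. This illustrates only the general point and is not a description of
how the Harris/6 or any other machine actually did it; the function names are
hypothetical.

```python
def word_address_pow2(byte_addr, log2_bytes_per_word):
    """Split a byte address into (word, offset) using only a shift and a mask."""
    word = byte_addr >> log2_bytes_per_word
    offset = byte_addr & ((1 << log2_bytes_per_word) - 1)
    return word, offset


def word_address_general(byte_addr, bytes_per_word):
    """Split a byte address into (word, offset) when a division is required."""
    return divmod(byte_addr, bytes_per_word)


# 32-bit words hold 4 bytes: the word address is just the high bits.
print(word_address_pow2(13, 2))     # (3, 1)
# 24-bit words hold 3 bytes: no bit slice of the address gives the word.
print(word_address_general(13, 3))  # (4, 1)
```

In hardware a shift and mask are just wiring, whereas division by three needs
real logic, which is one way to read the "contortions" mentioned above.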

In summary, I agree that while there was no good technical reason
to have an eight bit byte originally, anyone designing a new
computer that does not have an eight bit byte will be doomed
to market failure.  If Univac's 1100 series had taken off better than
IBM's machines, I would probably be saying that six bit bytes
are the wave of the future.  That is not the case.

John Haller

louie@sayshell.umd.edu (Louis A. Mamakos) (07/24/87)

In article <2425@ihlpl.ATT.COM> jhh@ihlpl.ATT.COM (Haller) writes:
>In article <326@hubcap.UUCP>, beede@hubcap.UUCP (Mike Beede) writes:
>> Seriously--different machines serve different purposes, and so are designed
>> differently.  That is why it is foolish to freeze some design parameter
>> arbitrarily.  I don't see that there is, for instance, a clear argument
>> against 36 bit words and 9 bit bytes as opposed to 32 bit words and 8 bit
>> bytes, especially if your application works well with 9 bit quantities.
>
>The clear argument against 36 bit words and 9 bit bytes is data
>communications.  Like it or not, data communications have virtually
>standardized the 8 bit byte.  Just try to generate TCP/IP headers
>on a 9 bit machine, packing all of the data contiguously.  Oh,
>you want to use a communications processor to do that?  How
>many bits is its byte?

Well, gee whiz;  I've done just that. Having 9 bits/byte doesn't make this
task any more difficult than having byteswapped (little-endian) hardware.  The
1100 communications hardware (serial communications lines, byte/block
multiplexor channels, etc) just uses the lower 8 bits of each byte to put on
the wire.  When the packet arrives, I do the equivalent of ntohl() and ntohs()
on the appropriate header fields so that I can do arithmetic, etc on them.  On
a VAX, these operations byteswap;  on the 1100 they simply squeeze the 9th bit
out of each byte.  Of course, in the actual implementation it's not called
ntohl() and it's written in PLUS, not C.
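
A minimal Python sketch of the "squeeze the 9th bit out" operation described
above. It is not the PLUS code from the 1100 package; it assumes a 36-bit word
holding four 9-bit bytes with the network octet in the low 8 bits of each, and
the function names are hypothetical.

```python
def squeeze_word(word36):
    """Turn four 9-bit bytes (one octet each) into a 32-bit integer."""
    value = 0
    for i in range(4):
        quarter = (word36 >> (9 * (3 - i))) & 0x1FF  # i-th 9-bit byte, high end first
        value = (value << 8) | (quarter & 0xFF)      # keep only the low 8 bits
    return value


def spread_word(value32):
    """Inverse: place each octet of a 32-bit integer in its own 9-bit byte."""
    word = 0
    for i in range(4):
        octet = (value32 >> (8 * (3 - i))) & 0xFF
        word = (word << 9) | octet                   # 9th bit of each byte is zero
    return word


print(hex(squeeze_word(spread_word(0xDEADBEEF))))  # 0xdeadbeef
```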


Also, having an 1100 makes calculating the 1's complement checksum easy, as
it's a ones-complement machine.
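
For comparison, here is a sketch of the same checksum in Python, where the
end-around carry has to be done by hand because the arithmetic is not ones'
complement. It follows the usual 16-bit ones'-complement sum used by the
TCP/IP protocols; the function name is an assumption for the example.

```python
def internet_checksum(data):
    """16-bit ones'-complement checksum over a bytes object."""
    if len(data) % 2:
        data += b"\x00"                               # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]         # next 16-bit word, big-endian
        total = (total & 0xFFFF) + (total >> 16)      # fold the carry back in
    return ~total & 0xFFFF


sample = bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7])
print(hex(internet_checksum(sample)))  # 0x220d
```

On a ones'-complement machine the folding step would presumably come for free
from the add instruction, which is the convenience referred to above.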

Having 9-bit bytes comes in handy.  You can leave "cookies" in a character
string that are unique from any possible ASCII character value.

>Given that a byte is an important measure, byte addressability
>becomes important in hardware architectures.  Given that
>our machines operate with binary logic, word sizes are going
>to be powers of two bytes long, just so that byte addresses
>can be easily converted into word addresses, which is typically
>related by a power of two to the memory and bus architecture.
>Look at the Harris/6 if you want to see what kind of
>contortions were necessary to provide byte addressability
>with a 24 bit word size.
>
>In summary, I agree that while there was no good technical reason
>to have an eight bit byte originally, anyone designing a new
>computer that does not have an eight bit byte will be doomed
>to market failure.  If Univac's 1100 series had taken off better than
>IBM's machines, I would probably be saying that six bit bytes
>are the wave of the future.  That is not the case.
>

Actually, it would be nice to have 9 bit bytes.  Granted, there are many times
that I wish I had byte addressability, but the PLUS compiler can
generate some (rather clever) code to do it for me.  Actually, with PLUS I
can have arrays of arbitrarily long entities, as someone in a previous message
wanted.

Another thing to consider is that on a machine like the 1100 with 36 bit words,
you have more precision available both in the (single/double) integer and 
floating point formats.  This apparently matters to some folks around here.


>John Haller

BTW, if anyone is interested in obtaining the TCP/IP package that was written
for the 1100 at the University of Maryland, drop me some mail.


Louis A. Mamakos  WA3YMH    Internet: louie@TRANTOR.UMD.EDU
University of Maryland, Computer Science Center - Systems Programming

dg@wrs.UUCP (David Goodenough) (07/25/87)

In article <6144@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>In article <857@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
>>A byte is therefore exactly 8 bits.  No more and no less.  Opinions to
>>the contrary belong in the 1960s.  Let them lie there and die there.
>
>And people who believe that 8 bits is sufficiently to encode a
>character are either naive or stupid.

Well I've never yet had a problem communicating with any machine that uses
ASCII (American *STANDARD* Code for Information Interchange), and it's my
(possibly deluded :-) belief that there are a lot of machines out there that
do like I do and use 8 bit bytes for holding characters. Let's see - there
are Z80's (and maybe a couple of dozen other 8 bit micros), 8086 family,
ns32000 family, pdp-11, vax, 68000 family, Z8000, amd2900 family, etc.
etc. etc. Then we start looking at uarts and other communication devices-
we have the Z80 DART/SIO, 8080 devices, the 6502 ACIA, plus the countless
others that are not attached to any architecture. I don't know about the
rest of the world, but it looks to me as if 8 bit chars are here to stay.
(Just out of idle curiosity what size did you have in mind for a character,
and WHY?)
--
		dg@wrs.UUCP - David Goodenough

					+---+
					| +-+-+
					+-+-+ |
					  +---+

chris@mimsy.UUCP (Chris Torek) (07/26/87)

>In article <6144@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn) writes:
>>And people who believe that 8 bits is sufficiently to encode a
>>character are either naive or stupid.

In article <274@wrs.UUCP> dg@wrs.UUCP (David Goodenough) replies:
>Well I've never yet had a problem communicating with any machine that uses
>ASCII (American *STANDARD* Code for Information Interchange), ...
	--------
>(Just out of idle curiosity what size did you have in mind for a character,
>and WHY?)

No wonder people get the idea that Americans are parochial.  Americans
*are* parochial!  :-)

How many languages do you speak---or rather, how many do you *write*?
How many can you write while staying with 7-bit ASCII?

ISO Latin-1 helps; the `extra' characters allow me to write in
Deutsch (if I could) or Francois (look, there is one of those
missing letters already) or Espanol (there goes another one), but
does not do much for Hebrew (lost a bunch!) or Russian or (more
troublesome) Japanese or Chinese.  16 bits seems to work for Japanese
Kanji, but is, at least technically, not enough for Chinese (80,000+
symbols!).  Moreover, there are people who want all of these
simultaneously.

I think 32 bits should suffice.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain:	chris@mimsy.umd.edu	Path:	seismo!mimsy!chris

mats@forbrk.UUCP (Mats Wichmann) (07/28/87)

>> 	On a DEC-10/20, for example, a byte can reasonably be anything from
>> 1 (0?) to 36 (35?) bits; 6, 7, and 9 bit bytes are all quite common...
>Another example worth mentioning is the BBN C/70 and its kin, which have
>10-bit bytes as I recall....There are also machines with 9-bit bytes, 
> although one seldom sees them in the Unix world.
>
>And then there's the PDP-8, where you get your choice of 12-bit bytes (ugh)
>or 6-bit bytes (ARGH)...

Old programmer #1: You think you had it tough? When I were learning to 
    program, all I had were bits. I had to tie them together with string 
    anytime I wanted to do something.

Old Programmer #2: Bits? You had bits? You had it easy! All we had were.....

    ...	...

Okay, guys, enough already. Please?

-mats

karl@haddock.ISC.COM (Karl Heuer) (07/29/87)

In article <857@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
>On modern machines, a byte is 8 bits.  [Anything else is archaic.] ...

>In his book "Reliable Data Structures in C", Thomas Plum gives portable
>implementations of the memxxx functions (e.g.  memset(), memcpy()).  He
>does not feel the need to point out that these are portable only if the
>machine's word will hold exactly an integral number of chars.

He doesn't need that restriction because the C language has already imposed
it.  But this has nothing to do with 8-bit bytes!  On a 36-bit machine, a byte
(in the C sense) *cannot* be 8 bits.  If Plum's implementation is portable, it
will still work on such a machine, with 9- or even 12- or 36-bit bytes.
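
To make the point concrete, here is a minimal sketch of a byte-at-a-time copy
that never mentions the number 8.  It is not Plum's code (which isn't
reproduced here); the names plain_memcpy() and bits_in() are invented for
illustration.  The only thing it relies on is that every object is a whole
number of unsigned chars, whatever CHAR_BIT happens to be.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for memcpy(): copies n bytes (in the C sense),
 * one unsigned char at a time.  Nothing here depends on CHAR_BIT == 8. */
void *plain_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));

    while (n-- > 0)
        *d++ = *s++;
    return dst;
}

/* Hypothetical helper: storage bits in nbytes C bytes on this machine.
 * On a 36-bit machine with 9-bit chars, bits_in(4) would be 36. */
size_t bits_in(size_t nbytes)
{
    return nbytes * CHAR_BIT;
}
```

Code that really does need 8-bit quantities should say so explicitly (by
masking with 0xFF, say) rather than assuming a char is exactly that wide.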

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

larry@kitty.UUCP (Larry Lippman) (07/29/87)

In article <358@forbrk.UUCP>, mats@forbrk.UUCP (Mats Wichmann) writes:
> >And then there's the PDP-8, where you get your choice of 12-bit bytes (ugh)
> >or 6-bit bytes (ARGH)...
> 
> Old programmer #1: You think you had it tough? When I were learning to 
>     program, all I had were bits. I had to tie them together with string 
>     anytime I wanted to do something.
> 
> Old Programmer #2: Bits? You had bits? You had it easy! All we had were.....

	operational amplifiers, 10-turn pots, patch cords, and a null meter!

	Sorry, but I had to finish this.  Anyone remember ANALOG computers
(especially those "personal" desktop versions made by EAI)?

<>  Larry Lippman @ Recognition Research Corp., Clarence, New York
<>  UUCP:  {allegra|ames|boulder|decvax|rocksanne|watmath}!sunybcs!kitty!larry
<>  VOICE: 716/688-1231        {hplabs|ihnp4|mtune|seismo|utzoo}!/
<>  FAX:   716/741-9635 {G1,G2,G3 modes}    "Have you hugged your cat today?" 

gwyn@brl-smoke.ARPA (Doug Gwyn ) (07/30/87)

Re: memcpy() etc.

It also is not yet settled whether the mem* functions are to handle (char)s
or some other type of object representing bytes, e.g. (short char)s.  At
the moment the C language does not distinguish between a byte and a char,
although it makes no presumption about the size of a char except that it
must be at least 8 bits (it can be larger).  The multiple-byte character
issue has not yet been decided, unless it happened at the Paris X3J11
meeting in June which I had to miss.

steve@nuchat.UUCP (Steve Nuchia) (07/30/87)

In article <1879@ihlpe.ATT.COM>, daryl@ihlpe.ATT.COM (Daryl Monge) writes:
> However, "bit" is useless in terms of user friendliness.  Imagine:
> -rwxr-x---   1 daryl    daryl     3102120bits Feb  6 22:40 gmacs
> word (:-) "character", since at least to me that has some real world meaning.
> ex:
> /e31          (/dev/dsk/36bs2):     12632K characters   33572 unique files
> 
> Comments?
> Daryl Monge

How bout:

drwxrwxrwx 11 foo baz    10 entries frotz     some_dir
---x--x--t  1 bin bin    `size myprog`        myprog    (use your copious
                                                         imagination)
-rw-rw-rw-  2 joe dbase  47 Srecords          dbase4    (S = Sagan = 1
                                                         billion billions)

There are many metrics that have meaning.  Which to use for a particular
file depends on what you want to do with the file, which I should point
out is not constant for a given file, and on a multiuser machine may even
be more than one thing at a time.  The guy trying to find a place to put
the turkey wants to know how much medium it occupies while the owner wants
to know how many words are in his file so he'll know when he can turn it in.

	Hey! it's just a comment... he asked for it... it's his fault!
	Steve Nuchia

allbery@ncoast.UUCP (Brandon Allbery) (07/31/87)

As quoted from <274@wrs.UUCP> by dg@wrs.UUCP (David Goodenough):
+---------------
| >And people who believe that 8 bits is sufficiently to encode a
| >character are either naive or stupid.
| 
| Well I've never yet had a problem communicating with any machine that uses
| ASCII (American *STANDARD* Code for Information Interchange), and it's my
>...
| others that are not attached to any architecture. I don't know about the
| rest of the world, but it looks to me as if 8 bit chars are here to stay.
| (Just out of idle curiosity what size did you have in mind for a character,
| and WHY?)
+---------------

The key words are in here:  *AMERICAN* Standard Code... and "I don't know about
the rest of the world...".  Kanji (for example) doesn't fit in 8 bits.  Is the
U.S. of A. the only country allowed to use computers?
-- 
 Brandon S. Allbery, moderator of comp.sources.misc and comp.binaries.ibm.pc
  {{harvard,mit-eddie}!necntc,well!hoptoad,sun!cwruecmp!hal}!ncoast!allbery
ARPA: necntc!ncoast!allbery@harvard.harvard.edu  Fido: 157/502  MCI: BALLBERY
   <<ncoast Public Access UNIX: +1 216 781 6201 24hrs. 300/1200/2400 baud>>

mark@ems.MN.ORG (Mark H. Colburn) (07/31/87)

In article <6156@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>In article <463@unisoft.UUCP> greywolf@unisoft.UUCP (The Grey Wolf @ ext 165) writes:
>>I see nothing wrong [with] eight bits for a character.
>I take it you don't pay much attention to the rest of the world, then.

	Often times I have seen a lot of flaming with absolutely no explanation
as to why the original poster was wrong.  This is one of those cases.  Rather
than say that an opinion is wrong, it would help to explain why it is wrong,
so that the original poster (hopefully) learns by his mistakes. 

	Doug is right of course.  There is a need for more than eight bits
for representing characters in other languages.  The most glaring example is
Kanji, where there are literally thousands of characters in the set.
Obviously, it would be very difficult to express that in 8 bits :-).

	Other less obvious examples would be German, Norwegian, French and
Greek.  All of these languages, and others as well, make use of letters that
plain ASCII lacks.  For example, an a, o or u with an umlaut in German; a c
with a cedilla, or a vowel with a circumflex (^), accent grave (`), or accent
aigu (') in French; the ae combination in Norwegian; or the entire Greek
alphabet.

	None of these characters is in the standard 7-bit ASCII character
set.  Many of them are handled by 8-bit extensions to ASCII or by some other
character set standard; however, 8 bits is not enough for some of the
glyph-oriented alphabets.

	If you would like more information on this topic, there have been a
number of good papers written and given at USENIX, as well as appearing in many
of the trade journals.  In addition, it is addressed in the proposed POSIX 
standard.


-- 
Mark H. Colburn    DOMAIN: mark@ems.MN.ORG 
EMS/McGraw-Hill      UUCP: ihnp4!meccts!ems!mark      AT&T: (612) 829-8200

henry@utzoo.UUCP (Henry Spencer) (08/05/87)

>	operational amplifiers, 10-turn pots, patch cords, and a null meter!
>
>	Sorry, but I had to finish this...

Ah, but you didn't finish it.  You forgot the tube tester!  (Yes, Virginia,
people did build op-amps out of vacuum tubes...)
-- 
Support sustained spaceflight: fight |  Henry Spencer @ U of Toronto Zoology
the soi-disant "Planetary Society"!  | {allegra,ihnp4,decvax,utai}!utzoo!henry