[net.lang] I/O operations in programming languages

condict@csd1.UUCP (Michael Condict) (08/25/83)

Recently I was prompted to reconsider a problem of programming language design
that I was working on several years ago.  Simply put, it concerns the abysmally
inelegant manner in which I/O is performed in even the fanciest, most
new-fangled languages, such as PROLOG and modern LISP dialects.  No matter
how applicative or logically-based the language is, the best that designers
seem able to come up with for I/O primitives amounts to procedure calls with
side effects that get or put the next element of a sequence of values.  The
user is forced to view input and output in a procedural manner and is forced
to view a file as a sequence of objects (in many languages the objects must
be scalars, such as bytes or integers), rather than as, say, an array of
records each one of which contains a mixture of character string and
numeric fields.

Wirth knew the way out of this mess when he designed Pascal way back in the
early 70's, I think.  Instead of making I/O operations available only through
procedures with side effects, he put a FILE type into the language so that
files could, to some extent, be treated as variables.  This gave files access
to the entire world of Pascal data structures, greatly increasing the elegance
with which operations on complicated data bases could be performed.  There
are a few substantial problems and limitations associated with Pascal I/O,
however, not all of which are attributable to its design as stated in Jensen &
Wirth.  The most serious of these is in fact a matter of interpretation by
compiler writers, caused by Wirth's unfortunate choice of the reserved word
FILE (which shows that he may have tacitly agreed with the compiler writers'
subsequent interpretation, even though he was not willing to endorse it in
the Revised Report).

To make a long story short, there is little justification
beyond compatibility with other implementations for requiring that the
FILE data type be allocated on a mass-storage device with a name and lifetime
external to the program and for prohibiting any other data type from being
used this way.  It would have been far better, I think, had Wirth chosen the
name SEQUENCE instead of FILE, and not just because a legalistic reading of
the Revised Report shows that the FILE data type has no particular connection
with operating system files (beyond the connection through the PROGRAM header,
which is further indication that Wirth would take the implementor's side
against me).  Several more substantial arguments can be made for treating
FILE's as nothing more than sequence, or string, variables, and for allowing
arbitrary Pascal variables to be connected to external files.  First there
is the often stated complaint that Pascal has only sequential I/O (this because
a FILE is a sequence), a problem that is remedied in incompatible ways by
making extensions to the sequence operations GET and PUT, to allow, e.g., the
specification of an index.  If arrays could be connected to external files
in the same way that FILE variables are forced to be, Pascal WOULD HAVE
random-access I/O.  Conversely, many users want the ability to manipulate
sequences of objects, such as variable-length character strings, stored in
"fast" program variables, without having to put up with all the junk and
inefficiency associated with operating system files.  Proposals for a STRING
data type of various sorts abound in the Pascal literature, even though the FILE
data type (with some restrictions removed) is the data type they want and it
is already there.  In fact, although it needs such library routines as
INDEX and SUBSTR to make it convenient, it is quite a general character-string
mechanism, since it includes conversion between numbers and characters in
its built-in set of operations (through the use of READ/WRITE).

Let me make it clear that I am no longer interested in convincing Pascal
purists that there is anything wrong with their wonderful and exquisitely
perfect language.  I merely use it to frame a proposal for a better
solution of the I/O problem (and most would agree that it is a serious problem)
in the design of programming languages.  We will never achieve portability in
the I/O operations of real-world programs until the languages they are
written in can support at least a fragment of the complexity (and versatility)
of I/O operations that are allowed by modern operating systems.  Otherwise
each language implementor will continue to insert incompatible hooks into
each compiler to allow access to these operations, resulting in a perpetual
and unnecessary proliferation of language dialects.

To close a somewhat windy and highly opinionated article, I put forth a
question to this readership:  Tell me why there must be a separate set of
constructs in a programming language for performing I/O, rather than just
one construct that associates a variable with an external file.  In what way
are the data structures of a language like C, Pascal or Ada inadequate to the
task of representing and manipulating data on mass-storage devices?

					Michael Condict
					Courant Inst., New York Univ.
					251 Mercer St.
					New York, NY   10012

					...!cmcl2!csd1!condict

rehmi@umcp-cs.UUCP (08/26/83)

1) Don't you *dare* mention C within ten words of Pascal, Ada,
   or any other such structured, verbose crocks.

2) I have rarely had any difficulty in porting programs written in
   C across any forms of systems, in the way of i/o, and in fact in
   any other usual ways... The general problem seems to be foolish
   assumptions about the size, order, and meaning in the way
   variables are stored. But this usually occurs in systems-type
   hacks only, because at writing time you might not think it would
   be ported about. "Typical user" code (the latest aaai buzzword!)
   moves with practically no difficulty, easiness being proportional
   to the user's naivete about the underlying machine. Once you
   see the full system below you, you have to restrain yourself from
   bumming every microsecond you can.
   
3) Along with C, you practically always get a Un*x or Un*x-like
   system interface, and always a standard C library. Part of Un*x's
   win comes from the fact that it imposes no structure at all on
   files, and treats them as a seekable byte-stream. The structure
   that results is entirely up to the user and highly flexible. And
   since that structure (if any) is defined in the user's code, that
   means that it should port trivially. And it does. As I said, I
   shift lots of code among about 4 or 5 C's/Unixen all the time,
   and only once did the i/o part break in the port. This was
   because the i/o was being done on the raw disk file.

4) Which brings up another point. For the most part, you can treat
   terminals, printers, and all devices, even the network, as files.
   Of course, you get to send control messages to these special
   device files, with a mechanism separate from the actual i/o, but
   of a similar form. Fred Blonder around here even wrote a short
   (about 20 lines, I believe, when it came down to it)
   pseudo-device which would duplicate a file descriptor you already
   have open, when it was opened. The uses of this are usually when
   a program wants a file name, but you want to feed it input from
   a pipe. [He's umcp-cs!fred, if you want it].

5) Pascal does not map nicely onto any machine I have seen it
   implemented on. In all my C's except the 6502 version, C hits it
   off with the underlying machine's architecture and instruction
   set very nicely. C produces tight, fast code.
   
I too once thought Pascal was the world's greatest language, and
thought that C was a hairy, unusable, unreadable language. But many,
if not all of Pascal's constructs are subsumed by C. This I found
out the hard way when my Lisp interpreter was due in 1.5 weeks. All
the poking I had done in Pascal (on bsd Unix, fairly friendly)
didn't work and was so verbose it was confusing to read. Verbose,
constrained, deterministic (Pascal is just a syntax table), and
very unfriendly are the best descriptors of this language. Who gives
a damn whether the array starts at 0, -5, 4 or 100? I am least
confused by having everything start at 0. And what if I want to get
the raw bytes to/from a file? In C, I just slurp some into a buffer
(whose address I can know and twiddle if I so choose). But in
Pascal, it becomes a trick to figure out how *this particular*
implementation will let you do that, if at all. It usually won't.

As for Lisp i/o, it isn't the greatest, but the Franz way of things
seems fine for the moment.

Go away, pascal!

		Not afraid to have an opinion, and back it up,

					-rehmi
-- 

By the fork, spoon, and exec of The Basfour.

Arpa:   rehmi.umcp-cs@udel-relay
Uucp:...!harpo!seismo!umcp-cs!rehmi
     ...!{allegra,brl-bmd}!umcp-cs!rehmi

wilner@pegasus.UUCP (08/26/83)

Michael Condict asks why there must be a separate set of constructs
for performing IO.  There need not be, but the answer is not to
mash files into arrays.  Devices exhibit a richer set of properties
than arrays (e.g., access time, cylinder organization, sequential
transfer).  More often than not, applications need some control over
these properties in order to run efficiently.

It seems highly desirable to have one basic declaration for what we
now call strings, arrays, files, pipes, and sequences.  Let's not
forget shell variables and command-line arguments while we're at it.
The invention that we'd all like is a unifying data type into which
we can read, write, index, search and do all the common things in a
common way, yet which continues to have time-dependent behavior and
to exhibit behavior that arises from the geometry of the hardware
out of which it is constructed.  As long as we can additionally
express such things as "block size" and number of buffers for these
new-fangled strings, we don't need files explicitly.  Finally, if
one's system has a swift implementation of these beasts that doesn't
use disk for the little ones and does use disk for the big ones,
everyone should be happy.

Languages are loath to admit time- and space-dependent behavior into
their data types, but if we are to unify arrays and files, some
compromise is necessary.

alan@allegra.UUCP (08/26/83)

In response to Rehmi...


	Don't you *dare* mention C within ten words of Pascal, Ada,
	or any other such structured, verbose crocks.


What's your problem, anyway?  Are you upset because we haven't all
accepted C as the One True Language?  In the future, why don't you
save such stupidity for net.religion.c, and stop bothering the rest
of us?


	Part of Un*x's win comes from the fact that it imposes no
	structure at all on files, and treats them as a seekable
	byte-stream.


That's very interesting.  The discussion was about I/O operations in
programming languages, not in operating systems.  There is no I/O in
C, so your article seems pretty pointless.

Give me a break...


	Alan Driscoll
	Bell Labs, Murray Hill

laura@utcsstat.UUCP (Laura Creighton) (08/28/83)

Keep the IO *out* of the language if you want to write portable
code. No matter how wonderful your ideal concept of what bits,
bytes, and datastreams ought to be, no matter how aesthetically
glorious your opinion of what a file is -- there are going to be
a lot of folk out there who have equally glorious ideas that are
out-and-out incompatible with yours.

And a lot of these folk build hardware. 

Laura Creighton
utzoo!utcsstat!laura

gumby@mit-eddie.UUCP (David Vinayak Wallace) (08/29/83)

Well, one good thing about c is that IO isn't really part of the language at
all; it's all done via functions in libraries.

Modern languages are starting to use streams (e.g. Common Lisp, Ada).
Streams are generic objects all of which do a certain number of standard
operations and which can be extended for special purposes.  Then you can
have a special stream which reads or writes special structures for some
operation.  Then other programmers can access these streams without knowing
their structure.  The problem with embedding IO in the language definition
(i.e. BASIC) is that each programmer must grok the file (and device) format.

Device IO probably can't be improved.  But maybe we can just flush files, or
at least produce a better model?

david

condict@csd1.UUCP (Michael Condict) (08/30/83)

Now I am genuinely confused.  I've heard these statements before that C does
not have I/O built into it and how wonderful this is, and I've never been
able to understand them.  People claim it improves portability, so I guess
that's why they find it wonderful, and who can disparage the goal of
portability?  The only problem I have with these claims is that I can't see
any reasonable view of C or programming languages in general under which they
are not obviously false!  Let me elaborate (and you can't stop me from there
anyway, can you?):

DEFINITION 1. Input/Output (I/O).  The means by which data is transferred
between an executing computer program and the rest of the world.

THEOREM 1.  A programming language without any way of doing I/O is of purely
theoretical interest.

PROOF.  Directly from the definition of I/O, since anything that is computed
by a program in the language cannot ever be discovered except by redoing the
calculations by hand, which clearly is not practical.

THEOREM 2.  C has I/O.

PROOF.  We could prove this by contradiction of theorem 1, using the fact that
people really do execute C programs on computers, so it is apparently not just
of theoretical interest.  A more direct approach is to note that to whatever
extent C is a language defined independently of any particular implementation,
it is a language that has several I/O constructs.  I'm not familiar with all of
them but I know that printf is one of them [K&R 1978].  Conversely, to whatever
extent C is a family of languages, say, a different one for each C compiler,
it is a safe bet that each member of the family has I/O.  For instance, on
any Unix implementation of C, the primitive I/O constructs are read,
write, open, creat and a few others.  Printf, scanf, getc and so on are all
definable in terms of these system calls.

Now let's look at portability:

THEOREM 3.  Let X be a language without I/O.  If any particular compiler for X
is to be useful, a set of I/O constructs must be added to the language processed
by the compiler. That is, the compiler must process a language different from X.

PROOF.  Directly from Theorem 1.  The fact that the only addition to X consists
of adding some library routines, rather than new syntax, is not relevant --
that these routines are guaranteed by the local reference manual to be available
makes them part of the local language.

THEOREM 4.  Let X be a language without I/O.  Complete, portable programs that
are useful cannot be written in X.

PROOF.  By theorem 3, we know that every particular implementation of X will
have its own set of I/O constructs.  And to whatever extent each implementation
of X has the same set of I/O constructs, X is not a language without I/O, (or
at least the informally defined extension to X that everybody is implementing
and using is not a language without I/O).  So we are forced to conclude that
there are several different ways of doing I/O depending on which compiler
is in use.  Hence, useful X programs, since they perform I/O, must be changed
when moved from one compiler to the next.

All facetiousness aside, would somebody please show me where my reasoning
has gone astray here?  My best guess as to why people say that C has no I/O is
that all its I/O is performed with function calls, but I can't see how there
being no separate syntax for the I/O constructs matters one hill of beans.
We certainly don't say that FORTRAN has addition but does not have the absolute
value operation simply because the latter is a function call while the former
uses infix notation; or do we?  As far as the portability issue goes, we've
already had one response arguing from bitter experience against the
portability of C, so I don't feel quite as badly admitting that I haven't the
foggiest idea how you support this view (and usually I can anticipate the
other side of a debate to some extent).  Laura, can you help me out here, since
you are one of the people who subscribe to these views?

REFERENCE.
Kernighan B. & Ritchie D, "The C Programming Language", Prentice-Hall, 1978.

Michael Condict		...!cmcl2!csd1!condict
Courant Inst., N.Y.U.
251 Mercer St.
New York, N.Y.  10012

robertd@tektronix.UUCP (Bob Dietrich) (08/30/83)

I thought I'd reply to Michael Condict's (csd1!condict) discussion of I/O in
programming languages. There seems to have been at least one other reply on
the net from someone named umcp-cs!rehmi, but it was quite garbled. Something
about hacking in a higher-level assembly language called C, and the fact that
it's rarely found anywhere but in a Unix environment.

I agree with most of Michael's comments, especially the fact that sequences 
were misnamed files. I have also fought this battle long and hard and loudly.
(Perhaps this is a very good caution to be careful in picking terminology!)
As far as Pascal goes, nothing is stopping implementors from providing random
access to mass-storage via Pascal's random access abstraction, the array.
The appearance of a variable in the program heading means only that it is
bound to an object outside of the program.  The mechanism and the allowed
types are determined by the implementation, not the language.  Also, the
ANSI-IEEE Joint Pascal Committee has a proposal for using random access
files. (By the way, if you want to attend the next meeting in October, see
net.lang.pascal or contact me).

A few comments seem in order relating to programming language design and
implementation. First, language implementations are usually done quite
separately from the file system. This means that the language implementor
must map the abstraction of the language onto some predefined primitives
provided by the file system. The trouble is that the necessary primitives may
not be there, or else do a poor job of supporting the abstraction. The result
is that it is useless to put a feature in a language that depends on file
system support that is not universal, because that feature will simply not
be implemented. The dictatorial approach (as in the DoD and Ada) doesn't
work either. If keyed-file access were added to Pascal, would that mean
your home computer (with wonderful cassette tape) couldn't run Pascal? I'm
not arguing for the lowest common denominator in languages; it's just very
hard to know where to draw the line.

My second observation is that few programming languages reinvent the wheel;
instead, a few corners are added in the hope that enough will make the
wheel roll more smoothly. Thus the design is influenced heavily by others that
have gone before (learning from mistakes is fine; keeping them is not) and by
the environment the designer is currently working in. Pascal, although not
as strongly influenced as FORTRAN was by the IBM 709 architecture, seems to
have been influenced by the CDC Cyber operating system SCOPE. This is at least
true in the use of the program heading for communicating information between
the program and the outside world. CDC FORTRAN (and other languages?) has long
provided this facility, but only for files (in the operating system sense).
Pascal was also designed to provide this facility, but as Michael pointed out,
was fortunately not limited to files in the program heading. Some evidence for
Pascal being influenced by the CDC environment is the fact that unlike
regular parameters, program parameters are simply mentioned in the heading and
defined (given a type) later on in the program block. Most implementations,
if they pay attention to the program heading at all, only allow files because
1) that's the way the first implementation, Pascal 6000, did it; 2) the P2/P4
portable Pascal compilers that fathered most Pascals in the world did it that
way; and 3) most operating systems either don't know what a command line is or
make it difficult to parse.

My third comment relates to implementors (I am both a user and implementor of
Pascal).

  Note: It's been suggested to me that the remainder of this text borders on
        flaming.

Several things usually affect the way someone implements a language,
often in such a way that the result is an extended subset of the language
instead of the whole thing. For instance, 1) The implementor is rushed to get
"something working", so the simple things like expressions are done first, and
the parts that look harder are left to be studied and implemented "later".
"Later" often never comes. 2) The implementor is invariably under some sort of
deadline, so that the parts left to be implemented near the end of the schedule
are implemented in a rush, and perhaps with less care. 3) Implementors, like
most people, are uncomfortable with things they don't understand. Such things
don't get implemented. 4) A particular feature in language X is "just like in
language Y", and so that feature will be easy to implement. Right and wrong.
Yes, in isolation the feature is just like another language's feature, but
in relation to the rest of the language the impact is quite different (usually
a disaster). This is especially true of grafting features from one language
onto another. 5) The simple question is rarely asked: "Is the abstraction
I'm looking for already in the language (perhaps in a form I'm not familiar
with) ?" 6) The "creeping feature creature" grabs hold of the implementor
("If I add this feature, then I can eliminate 20 whole statements from my
5000 line compiler!!") 7) Some implementors are just plain lazy.

Lest I seem to be merely maligning implementors, let me say a few words in
their defense. Often an implementor is between a rock and a hard place. The
implementor must take a language that the user expects to be clairvoyant and
implement it in a relatively hostile environment. Usually the operating system
has long since been frozen and the machine instruction set was designed by
an electrical engineer who says "Huh??" when you use the term "software".
An added benefit that we are fortunately moving away from is that the compiler
has been specified (by someone else) to run in 2K bytes of memory and compile
20,000 line programs at 10,000 lines a minute on an 8080 with floppies. (Oh,
and by the way, can you have it done next month?) Given the environment that
implementors have had to work with, it's a wonder we have languages at all.

                                              Bob Dietrich
                                              Tektronix, Inc.
                                              (503) 629-1727
{ucb or dec}vax!tektronix!robertd             uucp address
robertd@tektronix                             csnet address
robertd.tektronix@rand-relay                  arpa address

alan@allegra.UUCP (08/30/83)

	All facetiousness aside, would somebody please show me where
	my reasoning has gone astray here?  My best guess as to why
	people say that C has no I/O is that all its I/O is performed
	with function calls, but I can't see how there being no separate
	syntax for the I/O constructs matters one hill of beans.  We
	certainly don't say that FORTRAN has addition but does not have
	the absolute value operation simply because the latter is a
	function call while the former uses infix notation; or do we?  


In Fortran, the absolute value function is part of the definition of the
language.  It is a built-in function.  If your Fortran compiler does not
supply it, then it's not a correct and complete compiler.  So it's fair
to say that Fortran has an absolute value operation.

In C, on the other hand, no I/O operations exist in the definition of the
language.  From "The C Programming Language" by K & R:


	Finally, C itself provides no input-output facilities: there
	are no READ or WRITE statements, and no wired-in file access
	methods.


So, your C library (not your C compiler) may include "printf", but it
doesn't have to.  With or without "printf", it's still C.

Do you now understand why people say C has no I/O?


	Alan Driscoll
	Bell Labs, Murray Hill

hal@cornell.UUCP (Hal Perkins) (08/31/83)

Something's not quite right here.  The claim is that the C language
contains I/O operations because any language that can be used to write
useful, portable programs must contain I/O operations.  I don't agree,
and I don't think the argument holds together.

First, I think it is reasonably obvious that the C programming language
itself does not contain I/O commands in the sense that FORTRAN, COBOL,
PL/I or Pascal do.  In these languages, I/O operations have special
syntactic forms and semantics that, in most cases, could not be defined
by procedures or functions written in the language.  For example, the
Pascal read, readln, write, and writeln operations can accept any number
of parameters of any of the basic scalar types, and a special notation is
available to specify the format of numbers printed by write and writeln.
These procedures absolutely cannot be defined using the standard Pascal
language; they have to be specified in the language definition or they
couldn't be there in their present form.

In C and Ada, among other examples, there are no special forms for I/O
commands.  So I claim that these languages do not contain I/O.  How then
is it possible to use these languages to write useful programs?  The
answer is that there are standard libraries or packages that provide I/O
operations.  Furthermore, virtually every implementation of these languages
includes an implementation of the standard I/O packages.  For C, this
is customarily the standard set of I/O routines described in Kernighan &
Ritchie.  For Ada, DoD has decreed that all certified Ada implementations
must provide certain standard I/O packages.

So it seems to me there is no problem with the statement that C does not
contain any I/O.  The incorrect statement is the one that claims that any
useful programming language must contain I/O--otherwise programs cannot
communicate with the world.  The correct version is that either the language
must contain I/O statements OR ELSE a standard package of I/O routines must
be available with all implementations of the language.  This is one of the
reasons that ALGOL 60 never caught on.  ALGOL 60 contains no I/O.  Nothing
wrong with that--C has demonstrated that a language without I/O statements
is perfectly usable.  But the ALGOL 60 community never defined a single,
universally available, set of I/O procedures, so there was no way to write
a portable program that communicated with the outside world.  (Actually,
attempts were made to specify a standard ALGOL 60 I/O package, but these
were never widely adopted.  I suspect that it is at least as difficult to
standardize the I/O package definitions as it is to standardize language
definitions.  In the case of C, the Unix environment has been so closely
associated with the language that it became the I/O standard by default.)

To close this excessively long note, it seems to me that languages
should not contain I/O commands because this helps to keep the language
definition short and comprehensible.  But it is equally important to
define a basic set of I/O operations that are universally available with
every implementation.  Defining a good basic set of I/O operations is a
difficult design problem.  Separating this problem from the basic language
can only help; it reduces the number of things that must be designed
at the same time.

Hal Perkins                         UUCP: {decvax|vax135|...}!cornell!hal
Cornell Computer Science            ARPA: hal@cornell  BITNET: hal@crnlcs

condict@csd1.UUCP (Michael Condict) (08/31/83)

Boy, this is getting rough -- people are starting to talk about breaking each
other with clubs!  Not to bring down such mayhem on my own poor head, I would
just like to reiterate a point that was also made by at least one of my allies.
Perhaps I can avoid all attempts at cuteness this time and confine myself to
the facts (but I doubt it).  The issue that seems to be most widely disagreed
upon is whether or not C has I/O.  When I referred to K&R I did not mean to
plant myself in the camp of the faithful Bible thumpers who take everything
found there as divinely inspired, hence unarguable.  I only meant to point to
the source most influential in the design of C I/O as it exists today.

I don't particularly find it important whether the book says that I/O is
part of C or not or shows printf in the syntax summary.  In fact, there
happens to be absolutely no mention of I/O, even of its omission, in the
C Reference Manual (the "official" language-defining portion of the C book).
All of this is irrelevant, because a language is defined by its use, not its
official definition -- this is true of English and is just as true of program-
ming languages (even Ada). One person called this the "language together with
its environment", but I don't see how to make such a distinction.  The language
consists of those sequences of symbols you can write down with reasonable
confidence about their meaning.  For a programming language with dozens of
different compilers running on thousands of different computers, each with
its own support library, there is no hard and fast definition of what is in
the language and what is not.  All the same it is just as clear that printf is
in the C language as, say, the union type, since more implementations
probably provide the former than the latter.

Anyway, let's not argue about trivial semantic issues like this.  I'll
agree to say that C has no I/O if you'll agree that there is a certain extended
language, let's call it C+, that everyone uses instead of C and that does have
I/O.  See, we're both right!

What I'd really like to see is a resumption of the discussion about
improving the way I/O is handled in programming languages; that is, not whether
C has I/O (every implementation will), but what is the best way to connect
a high-level language, especially one oriented towards very abstract program-
ming, such as SETL, LISP or PROLOG, to the grungy, incredibly over-designed and
horribly complex I/O monster found on typical operating systems.  One popular
point of view seems to be that the official language definition should avoid
the I/O issue entirely, presumably allowing individual implementations to
define this necessary portion of the language by default.  The claim is that
this leads to portability of programs.  I'm still waiting for an explanation
of the reasoning behind this conclusion, which I am at a loss to reproduce.
Even if you convince me that this is the right way to go, a possibility I have
not ruled out at all, we don't seem to have made any progress on the original
problem, which is how to define the I/O portion of a programming language.
You've just told me WHO should do it (the compiler writers, rather than the
language designer) rather than HOW it should be done.


Michael Condict		...!cmcl2!csd1!condict
Courant Inst., N.Y.U.

hal@cornell.UUCP (Hal Perkins) (08/31/83)

Bob Dietrich's article arrived while I was posting my previous note.
I'd like to add one thing to what he and Mike Condict have both discussed.

From a logical standpoint, it is very clean to allow Pascal program
parameters to be of any data type.  One can get random-access files
this way without adding any cruft to the language.  But this
is a bitch to implement on most conventional operating systems.
The representation of random-access files (arrays) on disks and arrays
in main storage, and the operations needed to access elements of these
two different kinds of arrays, are vastly different in almost every
conventional system (including Unix).  So if you try to allow some
program variables and parameters to be associated with disk files and
others to be associated with objects in central storage, you must use
different mechanisms to access the elements of the different sorts of
variables.  This is moderately unpleasant to compile and generate code
for, but for simple variables it's not too bad.  Just keep something in
the symbol table to indicate whether the data is in main store or in the 
file system, and generate the correct code to access it directly or else
call the operating system.

But.... what about parameters to procedures?  Suppose I have two objects
of the same logical data type, one of which is in main store and the
other of which is a program parameter.  And I have a procedure with a
parameter of this type.  What sort of code do I compile in the procedure
to access the parameter?  You really have to use something like the
thunks used to implement ALGOL 60 call-by-name.  And this is a
bad efficiency penalty for access to actual parameters that are
objects in main storage and not in the file system.  Given this
situation, most implementors are not going to go to the trouble of
supporting uniform notations in the language for objects both in main
storage and on the disk, since the compiled code would be too slow for
the "real world", where performance counts.  (I plead guilty to this.
I did a lot of the work on the interface between the Pascal 8000 compiler
and IBM's OS/MVS/CMS systems.  I wanted to implement random files as 
array program parameters, but the implications for code generated for all
array accesses were terrible.)

As long as there are two separate address spaces for main storage and the
filing system as provided on most machines, this problem won't go away.  This
seems to me to be one of the best arguments for the one-level-store concept
(as in Multics, et seq.), since a one-level store would allow the use of
large variables and data structures in exactly the same way as smaller
data objects (as suggested by Condict) and make it possible to eliminate
the extra baggage needed for files.  A decent abstract data type facility
would allow the definition of trees, indexed data structures, and other
things typically found in file systems.

However, I'm pretty pessimistic that this could be done on most conventional
architectures since it seems that hardware support would be needed for a
decent segmented memory with a very large address space to make the whole
thing work.  I don't see how one can get away from the concept of files and
variables as different sorts of gadgets on most existing machines and
operating systems.

Cheers

Hal Perkins                         UUCP: {decvax|vax135|...}!cornell!hal
Cornell Computer Science            ARPA: hal@cornell  BITNET:  hal@crnlcs

mrm@datagen.UUCP (08/31/83)

C does not have I/O built into the language per se, since the compiler
proper does not recognize calls to printf et al.  Rather it interprets
them as normal function calls (and/or macro replacements).  The user is
free to write his/her own `printf' that does input instead of output or
what have you.  Also, C stdio is not completely portable, since a header
file could change a declaration from `int' to `long' on any given machine,
and the format strings must then be changed manually from "%d" to "%ld".
Printing a `long' with "%d" would produce incorrect results on a 16 bit
machine, whereas on a 32 bit machine it would happen to work.  Since the
I/O is not built in, the compiler cannot tell the I/O system the type of
the variables through dope vectors, and the poor user is left to manage
as best s/he can.  This discussion brings to mind a good case for
bringing back the old "%r" format.

For those of you who don't know, %r interpreted the next argument as a
nested format; thus you could make one macro be the standard format for
any given type, which you might print out in several different places.  For
example:

	#ifdef	PDP11

	typedef	long	type;	/* 16-bit ints, so use a long */
	#define	FORMAT	"%ld"

	#endif

	#ifdef	VAX

	typedef	int	type;	/* 32-bit ints are enough */
	#define	FORMAT	"%d"

	#endif


	type a;

	...

	printf( "a is %r\n", FORMAT, a );	/* %r expands FORMAT in place */

(allegra,ittvax)!datagen!mrm

mark@cbosgd.UUCP (08/31/83)

In reply to Michael Condict's "proofs" that C has I/O:

I challenge you to find any reference to printf or any other I/O
operation in the C Reference Manual.  (That's the appendix to the
C book.)  The C language simply does not have I/O
as a defined part of the language.

There is something called the "standard I/O library" which nearly
every implementation of C has.  This includes fopen, <stdio.h>,
printf, and a couple dozen other routines.  There are implementations
of C that don't include this (Idris comes to mind, as well as V6,
which had a different "portable C library" that flopped), but the
current de facto standard is that you get a C compiler AND stdio.
(In fact, most implementations go one step further and provide
something called "the C library" which was originally supposed to
be UNIX specific.  This includes string operations, routines like
open, read, write, and lseek that deal with small integer file
descriptors, and command line parsing implementing the UNIX argc/argv
notion and < and > redirection.)

This does not make them part of the language.  It just means that in
order to use C, you need both a compiler AND a library.  This is in
contrast to FORTRAN, where I/O and a set of built in functions are
defined in the manual to be part of the language.

Claiming that if you don't have I/O in the language you have a language
of only theoretical interest is silly.  It assumes you can't put
tools together.  Kind of like saying that since Goodyear makes tires,
if Goodyear doesn't make things to put the tires on (e.g. cars) then
Goodyear products are of only theoretical interest.

perl@rdin.UUCP (Robert Perlberg) (08/31/83)

In response to umcp-cs!rehmi's article on C portability, you  are
missing  one  very  important point.  You only refer to porting C
from UNIX to UNIX.  THAT's why the I/O always works.  If you  are
only  concerned with porting from UNIX to UNIX, ANY language will
port with no I/O problems.  It has nothing  to  do  with  C.   In
fact,  C  I/O is so heavily based on UNIX I/O concepts that it is
perhaps the HARDEST language to port  to  a  DIFFERENT  operating
system.   I had to use C under CP/M.  CP/M does not keep track of
the size of a file in bytes; only in blocks.  If you had to  port
most  UNIX  utilities  to CP/M, it would be nearly impossible.  I
had to write my programs to use header records  to  indicate  the
length of the file since the OS wouldn't do it for me.

I partly agree with your first statement:

"Don't you *dare* mention C within ten words of Pascal, Ada,  or
any other such structured, verbose crocks."

Take out the word "crocks" and I couldn't agree  more.   C  isn't
worthy  of  being  associated  with  languages  that  were  truly
designed for portability and logical correctness.

With regard to portability being proportional to one's  knowledge
of  the  machine,  you're right; but in the wrong direction.  The
more you know about the differences between  machines,  the  more
careful  you  will be about exploiting a non-portable feature.  I
am constantly having to patch up code submitted by hackers who do
whatever  seems  to  work  on  their  machine  because they don't
understand that you can't pass just any old variable  type  to  a
function.

And as far as C being hairy and unreadable, that depends  on  how
you code.  Most C code I see outside my company looks like it was
written by a first year BASIC hacker.  There's no law  that  says
that  you have to fit as many functions into one statement as the
compiler can (or often can't) deal with.

I agree that Pascal is not the answer, but  whoever  it  is  that
maintains  the C standard doesn't seem to be aware that there's a
question!  Why can't C have dynamically dimensioned  arrays,  run
time  error  messages,  array  bounds  checking,  and a few other
little bits of  civilization  that  every  other  language  since
FORTRAN has had?  I've known people to spend DAYS trying to debug
a program before they discover that a variable is being passed to
a function that's expecting a different type.

Just like V'GER, the C standard MUST evolve.  It  maddens  me  to
see  people  trying to keep C from getting better in the interest
of standardization.  If you  want  stunted  standardization,  use
FORTRAN.  Let's let C stand for Civilized.

Are you listening, Dennis?

Robert Perlberg
Resource Dynamics Inc.
New York
philabs!rdin!perl

laura@utcsstat.UUCP (Laura Creighton) (09/01/83)

So sorry, but I misunderstood the original article.  I THOUGHT that
what was being discussed was an ideal language for 'local area
networks' (magic phrase; fill in with whatever is current wherever
you are to describe same).  This was not the case, and I am sorry to
have misunderstood.

I got this horrible mental picture in which a "file" was a 'data packet' (another
magic word; your word may be different), and the whole language had to
include every device driver known to man (and whatever other strange
creatures build devices).

Given that the language designer does a good job, and defines the behavior
of all the IO in excruciating detail -- so that anyone who conforms to the
standard must by definition be compatible -- I do not think that it matters
whether the IO is in or out of the language.  In fact, if it is in the
language, people like Whitesmiths cannot pull the sleazy trick of writing an
incompatible compiler which strictly speaking IS compatible, because it is
only <stdio.h> (<std.h>) which is different.  Whitesmiths can get away with it,
sure, but porting things to places which have a Whitesmiths C compiler is
a real pain...

(Actually, I am told that Whitesmiths now has a compatible <stdio.h>.
This is news to me, and I cannot personally vouch for its accuracy, but
the person who just told me is not known for making wildly inaccurate
statements.  If this is the case then I say "It's about time!")

laura creighton
utzoo!utcsstat!laura

mjl@ritcv.UUCP (Mike Lutz) (09/02/83)

Here I go off on a vacation for a couple of days and things really
light up (or heat up).  Anyhow, having perused the 30-odd articles,
some odder than others, on the subject of I/O, C, and UTOPIA84 (the
language that solves everybody's problems), I will add my two cents.

It is a red herring to ask whether or not C has I/O -- of course it
does.  However, the real issue (and one which few have addressed
directly) is the manner in which I/O is specified.  As Dennis Ritchie
says, C separates the language issues from the particular form in which
I/O is expressed in the language.  In this way, new I/O packages can be
tried out without jeopardizing existing programs that use previous
packages.  What is more, as technology progresses, new I/O paradigms
can be included without sending all the compiler writers back to their
terminals to hack at LEX, YACC, and the code generators.

An example of a new paradigm is the curses library, providing a screen
oriented I/O interface:  a mode of I/O that was unknown (or at least
unusual) when C and Pascal were being developed.  Whereas the addition
of this new paradigm to C is relatively easy (just write new
functions), in Pascal it's a real pain.  The root cause of the problem
was hubris on the part of Pascal's designers, who decided in 1970 that
they "knew" how I/O should be modeled once and forever, and who instead
created an iron box which has imprisoned a generation of programmers.  I
bet dollars to donuts that a similar fate awaits those who "know" how
I/O should be implemented in 1983, and are willing to embed their ideas
in special linguistic forms.

So, though there definitely is I/O in C, the crucial point is the
manner in which it is incorporated.  The library function approach is a
source of power and flexibility, and my plea would be to build on this
protean base. The alternative is a procrustean I/O model (such as found
in most other languages).

Mike Lutz
{allegra,seismo}!rochester!ritcv!mjl

rehmi@umcp-cs.UUCP (09/02/83)

From: perl@rdin.UUCP

	You only refer to porting C from UNIX to UNIX.

I should have been clearer - some ports were from C w/Un*x to standalone
C with a library to do unix-like i/o.

	THAT's why the I/O always works.  If you are only concerned with
	porting from UNIX to UNIX, ANY language will port with no I/O
	problems.  It has nothing to do with C.  In fact, C I/O is so
	heavily based on UNIX I/O concepts that it is the HARDEST language
	to port to a DIFFERENT operating system.  I had to use C under CP/M.
	CP/M does not keep track of the size of a file in bytes; only in
	blocks.  If you had to port most UNIX utilities to CP/M, it would be
	nearly impossible.  I had to write my programs to use header records
	to indicate the length of the file since the OS wouldn't do it for me.

But C does not have I/O! Look, under vanilla bsd/bell you have at least
three immediately obvious choices for i/o. The first is directly with system
calls, e.g. read() and write(). The second is with the buffered i/o library,
e.g. fread() and fwrite(). The third is with the stdio library, like scanf()
and printf(). Even the system calls are in a library (the calling sequence).

	I agree that Pascal is not the answer, but whoever it is that
	maintains the C standard doesn't seem to be aware that there's a
	question! Why can't C have dynamically dimensioned arrays, run
	time error messages, array bounds checking, and a few other
	little bits of civilization that every other language since
	FORTRAN has had? I've known people to spend DAYS trying to debug
	a program before they discover that a variable is being passed to
	a function that's expecting a different type.

Because they're slow (in reference to both "Why can't..." and "I've
known...").  Besides, there are often libraries that make error reporting
easy.  And what do you think lint is for?  Decoration?  It does a pretty
complete rundown over the code in question, including type checking,
unreferenced variables, etc...  There is this library function called
"malloc" which will give you all the space you want, so you can have dynamic
allocation.  Array bounds checking is slow.  Besides, what is the point of
starting your array at element -5 and running up to 27?  It is slow, and adds
to the confusion...

	Just like V'GER, the C standard MUST evolve. It maddens me to
	see people trying to keep C from getting better in the interest
	of standardization. If you want stunted standardization, use
	FORTRAN. Let's let C stand for Civilized.
	
The C standard is spartan yet rich. It is concise and straightforward. You
can ignore the machine it is running on, or write the machine's os in it.
The language itself is a win. The libraries... well, all things must change,
hopefully for the better.

					-rehmi
-- 

By the fork, spoon, and exec of The Basfour.

Arpa:   rehmi.umcp-cs@udel-relay
Uucp:...!harpo!seismo!umcp-cs!rehmi
     ...!{allegra,brl-bmd}!umcp-cs!rehmi

condict@csd1.UUCP (Michael Condict) (09/02/83)

I hope that Mike Lutz has cleared the air about the issue of whether or not
C has I/O.  I feel a little guilty about prolonging the discussion, which is
of course a "red herring".  In my defense, I was trying to indicate the
undesirability of defining the boundaries of a programming language as "the
set of strings given meaning by the compiler alone, rather than any set of
object libraries, whether standard or local".  I was attempting to use the
Socratic method of posing questions that point out serious difficulties with
widely held beliefs.  Well, it didn't work.  I guess Socrates was a little
better at it than I am.  So now I'll have to resort to the direct approach.
Why do I say this definition is undesirable?

Other than the fact that my C processor may not involve a separate compiler,
run-time system and object libraries, the problem that I was trying to expose
is evidenced by the early responses to my first notes.  People were expounding
the virtues of C not having any I/O, without any recognition at all of the
fact that this is only due to a technicality in their definition of what is in
the C language.  Most of these people would be quick to admit that there is
a fairly well defined set of C statements (almost all of them function calls)
that one can supply to almost any C processor to cause I/O to happen in a
predictable fashion.  Thus any advantages accruing to the C I/O paradigm are
not due to the fact that I/O is not defined anywhere in the collection of
documents, compilers and portable/standard object libraries that could be called
"informal C + environment".  Rather, as Dennis Ritchie and Mike Lutz point out,
these advantages must arise because of *where* I/O is defined in C, namely
in the object libraries, rather than the compiler.  Now to the main point of
this note, the issue of whether such advantages actually do exist and why.

The claim appears to be that I/O is one portion of a programming language that
should not have separate syntax associated with it and be processed by the
compiler.  Of course, no one would argue that it should not have separate
semantics associated with it, otherwise it would not exist!  So we must
conclude that the only way this can happen is to steal from one of the other
constructs that do have syntax, in this case the function call.  That is, we
can arrange that certain function calls do I/O, instead of satisfying the
otherwise inviolable rule that the function call can be replaced by C code
that does the same thing.  This is perfectly acceptable, given that we
believe that I/O does not deserve its own syntax.  In fact, my proposal for
merging variables with files is almost the same approach.  I want to steal
from the syntax of variable declarations, using ones that were prefixed
with the "slow" attribute to perform I/O, at least to files (if not devices).
The reason I chose this approach is that, clearly, file I/O is a *data*
operation, not a matter of logic and control structures.  It therefore seems
most natural to use the data-oriented variable mechanisms of the programming
language to model I/O, rather than using the function call, which is
control-oriented.

The argument for using function calls was based on the belief that I/O is
somehow less well understood or more rapidly evolving than other portions of
programming languages and so, since compilers are more difficult to change than
object libraries, we should put the definition in the library rather than
in the syntax tables.  Furthermore, users can avoid the effects of changes
to libraries, presumably by refusing to use the new ones and keeping copies
of the old.  Okay, I'm willing to agree that these are real advantages.  What
hasn't been adequately explained to me is the belief that I/O is a less stable
programming language feature than, say, abstract data types or parallel
processing.  And where do we draw the line on this language design methodology?

The limiting case of the belief that the language should be defined by functions
is of course LISP, where it is possible to believe that all but about a
dozen functions are defined purely in LISP itself, including arithmetic
and all control structures besides COND.  I don't think that many adherents
of the manner in which I/O is defined in C would want to give up infix
arithmetic expressions for "add(x,y)", so I am interested in just what they
believe are the sufficiently stable set of constructs worthy of inclusion in
the "language" definition.  I feel that I/O operations on simple
character-stream devices and on files are, by definition, as stable and well
understood as operations on main-storage variables, since the only technical
semantic difference is the code that must be generated to do these operations.
This is why I propose using the same syntax.  Of course there will always be
people who want to hook up strange and complex equipment to computers,
such as automobile engines, Star Wars defense systems, marital aids and
Things Not Yet Invented That We Can Only Dimly Comprehend.
I never intended to imply that the control of such devices should in
any way be built into a compiler, interpreter, or even a standard object
library.  But, Mein Gott, aren't programming languages mature enough yet to
understand the concept of getting the nth record from a random-access file?
They can learn these concepts from their creators or they can pick them up
"in the streets", where each industrial implementor, seeing an opportunity
to use the reputation of the language to market its own products, will teach
the language how to connect itself to its own I/O system, usually of
questionable virtue, and will then begin selling this "improved" language to
all takers.  Is this what we want for our loved ones?


Michael Condict		...!cmcl2!csd1!condict
Courant Inst., N.Y.U.

kurt@fluke.UUCP (Kurt Guntheroth) (09/03/83)

I/O in C/UNIX

When one talks about a language used almost exclusively on a single
operating system (C/UNIX), and one which has a standard I/O package, you
might as well consider that I/O is built into the language.

I/O as part of the language

I have seen languages which define I/O as part of the language, and they are
usually ugly.  I am thinking in particular of the proposed ANSI BASIC, which
has so many ways of doing the same thing to a file that it is impossible to
remember which functions work in which modes and with what results.

The real problem is that people have not spent enough time deciding what
operations are good things for operating systems to support.  In general
there is not a good model of a file that is usable by enough people.  UNIX
and C have chosen a model that is at least consistent (a seekable byte
stream), but a little too simple for many people and also a little non-portable
to systems with different size 'bytes'.

It would be nice if as much attention were devoted to formalisms for I/O and
operating systems as has been devoted to formalisms for programming language
constructs.  Unfortunately, there really are people in academia who treat I/O as
unnecessarily dirty and 'real-world' and pursue issues that can be theorized
about with more detachment.

barmar@mit-eddie.UUCP (Barry Margolin) (09/04/83)

Well, it is now my turn to be a twit and join this stupid debate.
Someone made the point that leaving I/O out of the language definition
permits you to experiment with new I/O schemes.  This is just plain
wrong.  Just because a language has I/O routines doesn't mean that you
have to use them.  As long as the language has subroutines you can
implement new I/O architectures such as curses.

For example, I am a Multics system programmer, and as some of you may
know, Multics is a PL/I system.  PL/I has I/O in the language, but
Multics programming standards specifically say "do not use PL/I I/O, use
the standard Multics subroutines".  I also know a number of Unix system
programmers who never use the stdio routines; they always use the
simpler, lower-level Unix routines.

Here are my thoughts on the pros and cons of providing I/O in the
language:
1) portability: I don't think it matters.  Someone will have to write
some standard I/O routines, and it doesn't matter to me whether it is
the compiler writer or the person who ports the library.  In many cases
the I/O primitives might have to be written in assembler, so it would
probably be better if the compiler-backend-writer were implementing it, to
reduce the number of people who must deal with the machine language.

2) syntax: building I/O into the language has the feature that the I/O
routines have access to information about the data that is not generally
available at runtime.  Most language-I/O designs are generic, meaning
that they do the appropriate thing with all the language-defined data
types.  In PL/I I can just say "put list (foo)", and the value of the
variable foo will be printed in an appropriate form.  On the other hand,
with the move these days towards user-defined types and various forms of
packaging (Zetalisp flavors, Smalltalk classes, CLU clusters, Modula
modules, and Ada packages) this has become less attractive.  Another
nice thing that can be done in PL/I is "put data (LIST-OF-VARIABLES)";
this is like "put list (LIST-OF-VARIABLES)" except it also outputs the
variable names with each value, a very useful debugging tool (especially
if you leave out the LIST-OF-VARIABLES, in which case it dumps ALL the
variables).  Of course, this is not necessary if your system has other
good debugging utilities, but it is a good example of something that
cannot be done by a subroutine in any language I am familiar with.
-- 
			Barry Margolin
			ARPA: barmar@MIT-Multics
			UUCP: ..!genrad!mit-eddie!barmar

nishri@utcsstat.UUCP (Alex Nishri) (09/12/83)

The 'file' and 'array' are abstract concepts.  Although these
concepts once had a basis in physical media they no longer have to. 
The compiler writer and the operating system writer can implement the
abstract concepts in any way they choose.  For example, many UNIX systems
implement some larger arrays as buffered random access files (known to
implementers as 'virtual storage').

I think the notion of a 'slow variable' is really the same as the
better defined concept of 'file.'  And just as the notion of passing
files between subroutines is recognized in many language definitions,
so the 'slow variable' could be passed as a parameter.  In fact, I
think we could implement 'slow variables' in terms of 'files' with a
few macro preprocessor instructions.

Alex Nishri
University of Toronto
 ... utcsstat!nishri