[net.lang] Back to I/O operations in programming languages

condict@csd1.UUCP (Michael Condict) (08/31/83)

Thank you, Hal, for a note that addresses the original question of whether it
is desirable to merge files with variables.  (I'm sorry, everybody, that it
took me so long to understand that C doesn't have I/O.  I won't ever say it
does again, I promise.  I understand the difference now between the Language
and the Library Routines, may they never be confused.  Amen.  I still don't
understand why ``main(){printf("abc");}'' is in the set of strings that, when
supplied to any C processor, produce a predictable result, whereas
``main(){glerp("abc");}'' is not, since both strings - or neither? - are in the
C Language.  But don't you worry, I'll figure it out; I'm pretty smart.)

Hal points out a serious difficulty with the original proposal, which he
actually considered installing in Pascal 8000, namely the adverse impact
it would have on the efficiency of access to variables that are in main
storage.  I certainly agree that we don't want to do anything that would
impinge on the efficiency of non-I/O operations.  For those who missed the
note, the point was that when compiling a procedure that takes an array
parameter, and especially when compiling it separately from the rest of the
program, we don't know whether it will be passed a main-storage array or
an array that is really a file.  So apparently we have to code up this
information with the actual parameter and check it with every access of the
parameter inside the procedure.

I can think of one cheater's way out of this.  Simply declare that the "slow"
attribute (which says that a variable is to be stored on a file) makes a var
incompatible with non-slow vars that are otherwise declared identically,
at least for the purposes of actual/formal parameter compatibility.  This is
in fact exactly what is done for the "packed" attribute, as I recall.  Correct
me if I'm wrong, Hal, but isn't it true that a packed array may not be passed
to a procedure that expects an unpacked array?  Or is it that an element of
a packed array may not be passed?  In any case, there is precedent for this
approach, and it doesn't by any means give up all the advantages of the original
idea.  We can still compare or copy any two files with a single operation
instead of a loop, we can still declare files to have a structure that is
appropriate to the problem, and we can change our program to have it stop using
a temporary file (when we move to a virtual memory machine or one with a
larger address space) by removing a single word from a variable declaration.
In fact the only thing we lose is the ability to read an entire file into a
main-storage variable with one assignment statement.  And this restriction
need only be imposed across procedure boundaries because, in other contexts,
when we encounter the statement A := B;, we know whether one or both
of A and B are "slow" variables and can generate the appropriate code.  It
is clear that when neither one is "slow" we can generate the exact same
code we do now, so the presence of this possibility does not adversely
affect the efficiency of unrelated parts of the language.
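That last point can be sketched concretely.  Here is a hypothetical model in
Python (the Var/assign names are invented for illustration and are not part of
any proposal in this thread): at "A := B" the compiler statically knows which
operands are "slow", so only assignments involving a slow variable pay for a
buffered file transfer, and the fast/fast case is the same plain copy as today.

```python
# Hypothetical sketch: the two kinds of code a compiler might generate
# for "A := B", dispatched on the compile-time "slow" attribute.
import io

BLOCK = 4096  # transfer unit for the slow (file-backed) path

class Var:
    """A variable that is either in main storage or on a file ("slow")."""
    def __init__(self, data=b"", slow=False):
        self.slow = slow
        self.store = io.BytesIO(data) if slow else bytearray(data)

    def contents(self):
        if self.slow:
            self.store.seek(0)
            return self.store.read()
        return bytes(self.store)

def assign(a, b):
    """The code generated for 'a := b'."""
    data = b.contents()
    if not a.slow and not b.slow:
        a.store[:] = data          # the exact same in-storage copy as now
    elif a.slow:
        a.store = io.BytesIO()     # rewrite the file block by block
        src = io.BytesIO(data)
        while True:
            block = src.read(BLOCK)
            if not block:
                break
            a.store.write(block)
    else:
        a.store = bytearray(data)  # fast := slow: read the whole file in
```

Since the dispatch happens at compile time, programs that never declare a slow
variable never execute (or even link) the buffered path.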

Another issue I would like to see discussed is how to do I/O in a language
that is totally applicative (or functional, if you prefer to restrict your
applicative programs to second-order combinators, as Backus proposes).  Has
anyone written a beautiful, utterly
applicative program in LISP only to discover that, in order to get the input
into it or the results printed, it is necessary to call READ or PRINT, procedures
with side effects?  Must we resort to the heavy artillery of lazy evaluation
(with its implications for the entire language) in order to achieve the
relatively modest goal of applicative I/O?  To phrase the question in concrete
terms, here is what I want in LISP: I want there to be (at least) two
predefined variables, INPUT and OUTPUT that are bound to standard input and
output.  These are, in the simplest case, ordinary list variables to the
naked eye.  However, because they are bound to the terminal, accessing elements
of these lists may cause I/O to happen.  Specifically, whenever CAR/CDR
combinations are used to access an element of the INPUT variable that has
not yet been accessed (is a virgin, let us say), then this element, preceded
by any virgin elements before it, must immediately be read from standard input.
By this means, the program sees input as consisting of a pre-existing sequence
of LISP values, not unlike the case in PASCAL.  Totally applicative programs
that input values are now possible.  The problem is how to avoid slowing down
the execution of CAR and CDR for values other than the input list.
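The INPUT semantics above can be modelled with lazy cons cells.  A hypothetical
sketch in Python (the Cons name and the iterator standing in for standard input
are inventions for illustration): a cell reads its element from the source the
first time CAR or CDR touches it, and taking CDR forces the head first, so any
virgin elements before the one accessed are read in order.

```python
# Hypothetical sketch: INPUT as a lazy cons list.  The actual READ happens
# only when a virgin cell is first forced by car() or cdr().
class Cons:
    def __init__(self, source):
        self._source = source   # iterator standing in for standard input
        self._head = None
        self._tail = None
        self._forced = False

    def _force(self):
        if not self._forced:
            self._head = next(self._source)  # the real READ happens here
            self._forced = True

    def car(self):
        self._force()
        return self._head

    def cdr(self):
        self._force()           # reading the tail forces the head first
        if self._tail is None:
            self._tail = Cons(self._source)
        return self._tail
```

To the program this is an ordinary list; the cost question raised above is that
every CAR/CDR on ordinary lists must somehow avoid the _forced check.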

For the OUTPUT variable, things are more complicated.  I cannot see, in detail,
a reasonable way to associate actual output with this variable.  Let me just
say that whenever execution of the (applicative) program reaches a point where
we know the first N elements of the list that is to be the value of this
variable, we should make sure that all N of these elements are immediately
output, if they have not yet been.  The problems with this statement are
several.  First, what if our applicative language has non-applicative features
(so that "real" programmers will use it)?  Then, presumably, the value of the
OUTPUT variable, after being almost completely constructed, can be changed
radically using, say, RPLACA and RPLACD.  So when do we know the final value
of the first N elements of OUTPUT?  Must we wait till the end of the program
before outputting anything?  The second problem is more fundamental.  Even if
the language is totally applicative, what is the nature and scope of the
OUTPUT variable?  If it is global, we cannot change or define its value by
any function calls, and if it is a local variable of some function or some
construct of the form

	let OUTPUT be x in e(OUTPUT)

then which function or construct does it belong to and why?  We could say
that whenever a variable with the name OUTPUT is bound anywhere in the
program, that value is immediately printed, but this is the SNOBOL approach
to output, is incompatible with the spirit of applicative programming and
is not analogous to the treatment proposed for the INPUT variable.
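The flush-the-known-prefix half of the proposal can at least be sketched.  A
hypothetical Python model (the OutputStream name and its methods are invented
here, not a worked-out design): the moment the next element of OUTPUT becomes
determined, it is written, rather than held until the end of the program.

```python
# Hypothetical sketch: OUTPUT as a list whose determined prefix is
# printed immediately.  The write function stands in for the terminal.
class OutputStream:
    def __init__(self, write):
        self._write = write
        self._elems = []      # the value of OUTPUT built so far
        self._flushed = 0     # how many elements have already been printed

    def determine_next(self, value):
        """The program has just fixed the next element of OUTPUT."""
        self._elems.append(value)
        while self._flushed < len(self._elems):
            self._write(self._elems[self._flushed])
            self._flushed += 1

    def value(self):
        return list(self._elems)
```

Note that once an element is flushed, an RPLACA-style destructive update of the
prefix can change the value of OUTPUT but can no longer reach the terminal,
which is exactly the first difficulty raised above.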

Another approach is to be dogmatic and say that an applicative program consists
entirely of the evaluation of a (large) expression, so clearly, the only
output consists of printing the value of this expression.  This is not
practical at all for programs that must interact with a user, unless we output
portions of the value of the expression while it is still being computed.
Even so, it does not address the question of how to interface applicative
programs to the real world, in which the execution of a program might be
required to change the contents of several pre-existing files, or make new
ones.  For these cases, I think we need a way of connecting variables to files,
but the same problems described above for the OUTPUT variable apply.

Surely you applicativists out there have some way out of this mess, don't you?


Michael Condict		...!cmcl2!csd1!condict
Courant Inst., N.Y.U.

hal@cornell.UUCP (Hal Perkins) (09/01/83)

I've got two comments on Mike's latest note.

1)  The proposed "slow" property for data types--similar to Pascal's
"packed".  The packed keyword in Pascal is one of the things that didn't
really work right.  The intent of "packed" was to indicate to the
implementation that storage for the data type can be economized, even if
this requires extra time to access components of objects of the type.
Furthermore, this should not affect the logical correctness of the program.
(If I remember correctly, the Axiomatic Definition of Pascal by Hoare and
Wirth doesn't even mention packed, except to say it has no effect on the
meaning of the program.)

But things didn't turn out this way, for exactly the same reasons that
I couldn't mix arrays in the OS/360 file system with arrays in main storage
efficiently.  Procedure parameters must be treated differently if they
are packed or unpacked structures or components of these.  And there are
a bunch of ad-hoc restrictions dealing with packed variables in Pascal.
For instance, components of packed objects can't be passed as variable
parameters because call-by-reference couldn't be implemented correctly
unless the procedure were prepared to deal with a part-word address on
word machines.  And strings are always packed arrays of characters, which
can't be mixed with just simple arrays of characters.

So yes, it seems to me that one could add a "slow" attribute that would
be similar to packed, and would have to have a similar set of ad hoc
restrictions on how it could be used and how slow variables could be
mixed with fast ones, passed as parameters, etc.  From a practical
standpoint, this would be a definite improvement on the usual baroque
set of file-system functions needed to access external data.

But in the long run, what is really needed is to understand how to deal
with multiple concrete representations of the same abstract type in a
way that allows one to program in terms of the abstract type without
being bitten when the concrete representation is changed.  "Packed",
"slow", and other similar things should just affect the pragmatics of
the program (time and space required to execute), and a correct program
should remain correct when the representation is changed.  This is hard,
and I haven't seen any nice solutions yet.  Ada does a little better
than most languages.  Derived types can be used to get different concrete
representations for the same type, and there are games that can be played
with generics, and it is possible to convert between related types.  But
it is still not possible to write a logically correct abstract program
and then tinker with the representations without having to touch the
abstract program sometimes.  If anyone knows of a language that does this
cleanly, please let me know, as it would save me a good bit of research work.
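The goal can be stated as a small sketch.  Here is a hypothetical Python model
(all names invented for illustration, not taken from Ada or Pascal): two
concrete representations of the same abstract sequence type, one in main
storage and one file-backed, behind an identical set of operations, so the
"abstract program" is untouched when the representation is swapped.

```python
# Hypothetical sketch: one abstract type, two concrete representations.
import io

class MemorySeq:
    """Fast representation: elements in main storage."""
    def __init__(self, items):
        self._items = list(items)
    def get(self, i):
        return self._items[i]
    def length(self):
        return len(self._items)

class FileSeq:
    """Slow representation: one element per line of a (simulated) file."""
    def __init__(self, items):
        self._file = io.StringIO("".join("%d\n" % x for x in items))
    def get(self, i):
        self._file.seek(0)
        for n, line in enumerate(self._file):
            if n == i:
                return int(line)
        raise IndexError(i)
    def length(self):
        self._file.seek(0)
        return sum(1 for _ in self._file)

def total(seq):
    # The "abstract program": it never mentions the representation.
    return sum(seq.get(i) for i in range(seq.length()))
```

The hard part, as the note says, is getting a language to enforce that only the
pragmatics differ between the two, rather than relying on programmer discipline.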


2) Applicative languages.  One way to clean up the problem of I/O is to
remember that the I/O in typical languages and systems is used for two
completely different purposes.  First, there is data that is read from
and written to the outside world--things read from keyboards, printed
on paper, signals used to control chemical plants, etc.  These all have
the characteristic that they are external to the computer system and
are generally either sources or sinks of data.  I doubt there will be
any purely applicative way to deal with these--streams seem to be the
simplest natural model.

The other major use for typical I/O is the one that mucks up clean
programs the worst, and that is to use I/O instead of having a
sufficiently large virtual store.  This is also accompanied by using
the same I/O libraries to implement various abstract data types.  (One
of the many reasons that the I/O documentation for many systems is so
unpleasant and hard to understand--OS/360 is probably the worst--is that
these two logical functions (shoveling data to and from the disk and
organizing the disk data into some sort of data structure) are hopelessly
mashed together into the same unmanageable package.)

To simplify this mess, one can postulate a decent virtual storage system
to take care of the primary<->secondary storage transfers and then deal
with the logical organization of the data separately.  We are still left
with streams that communicate with the outside world, but we've gotten rid
of a lot (in fact most) of the logical complexity of typical file systems.
Maybe this would help.

Then again, networks would seem to mess this up.  Suppose you have a small
computer paging over the network.  Is the network a stream--a source and
sink for pages--or is it somehow part of the sufficiently large virtual
store that is supposed to make some of the I/O go away?  It is probably
both when viewed from different levels.


Hal Perkins                         UUCP: {decvax|vax135|...}!cornell!hal
Cornell Computer Science            ARPA: hal@cornell  BITNET:  hal@crnlcs