condict@csd1.UUCP (Michael Condict) (08/25/83)
Recently I was prompted to reconsider a problem of programming language design that I was working on several years ago. Simply put, it concerns the abysmally inelegant manner in which I/O is performed in even the fanciest, most new-fangled languages, such as PROLOG and modern LISP dialects. No matter how applicative or logically based the language is, the best that designers seem able to come up with for I/O primitives amounts to procedure calls with side effects that get or put the next element of a sequence of values. The user is forced to view input and output in a procedural manner, and is forced to view a file as a sequence of objects (in many languages the objects must be scalars, such as bytes or integers), rather than as, say, an array of records, each of which contains a mixture of character-string and numeric fields.

Wirth knew the way out of this mess when he designed Pascal, back in the early 70's I think. Instead of making I/O operations available only through procedures with side effects, he put a FILE type into the language, so that files could, to some extent, be treated as variables. This gave files access to the entire world of Pascal data structures, greatly increasing the elegance with which operations on complicated data bases could be performed.

There are a few substantial problems and limitations associated with Pascal I/O, however, not all of which are attributable to its design as stated in Jensen & Wirth. The most serious of these is in fact a matter of interpretation by compiler writers, caused by Wirth's unfortunate choice of the reserved word FILE (which shows that he may have tacitly agreed with the compiler writers' subsequent interpretation, even though he was not willing to endorse it in the Revised Report).
To make a long story short, there is little justification, beyond compatibility with other implementations, for requiring that the FILE data type be allocated on a mass-storage device with a name and lifetime external to the program, and for prohibiting any other data type from being used this way. It would have been far better, I think, had Wirth chosen the name SEQUENCE instead of FILE, and not just because a legalistic reading of the Revised Report shows that the FILE data type has no particular connection with operating system files (beyond the connection through the PROGRAM header, which is further indication that Wirth would take the implementor's side against me).

Several more substantial arguments can be made for treating FILEs as nothing more than sequence, or string, variables, and for allowing arbitrary Pascal variables to be connected to external files. First there is the often-stated complaint that Pascal has only sequential I/O (this because a FILE is a sequence), a problem that is remedied in incompatible ways by making extensions to the sequence operations GET and PUT to allow, e.g., the specification of an index. If arrays could be connected to external files in the same way that FILE variables are forced to be, Pascal WOULD HAVE random-access I/O. Conversely, many users want the ability to manipulate sequences of objects, such as variable-length character strings, stored in "fast" program variables, without having to put up with all the junk and inefficiency associated with operating system files. Proposals for a STRING data type of various sorts abound in the Pascal literature, even though the FILE data type (with some restrictions removed) is the data type they want, and it is already there.
In fact, although it needs such library routines as INDEX and SUBSTR to make it convenient, it is quite a general character-string mechanism, since it includes conversion between numbers and characters in its built-in set of operations (through the use of READ/WRITE).

Let me make it clear that I am no longer interested in convincing Pascal purists that there is anything wrong with their wonderful and exquisitely perfect language. I merely use it to motivate a proposal for a better solution of the I/O problem (and most would agree that it is a serious problem) in the design of programming languages. We will never achieve portability in the I/O operations of real-world programs until the languages they are written in can support at least a fragment of the complexity (and versatility) of the I/O operations that are allowed by modern operating systems. Otherwise each language implementor will continue to insert incompatible hooks into each compiler to allow access to these operations, resulting in a perpetual and unnecessary proliferation of language dialects.

To close a somewhat windy and highly opinionated article, I put forth a question to this readership: Tell me why there must be a separate set of constructs in a programming language for performing I/O, rather than just one construct that associates a variable with an external file. In what way are the data structures of a language like C, Pascal or Ada inadequate to the task of representing and manipulating data on mass-storage devices?

Michael Condict
Courant Inst., New York Univ.
251 Mercer St.
New York, NY 10012
...!cmcl2!csd1!condict
rehmi@umcp-cs.UUCP (08/26/83)
1) Don't you *dare* mention C within ten words of Pascal, Ada, or any other such structured, verbose crocks.

2) I have rarely had any difficulty in porting programs written in C across any forms of systems, in the way of i/o, and in fact in any other usual ways... The general problem seems to be foolish assumptions about the size, order, and meaning in the way variables are stored. But this usually occurs in systems-type hacks only, because at writing time you might not think it would be ported about. "Typical user" code (the latest aaai buzzword!) moves with practically no difficulty, easiness being proportional to the user's naivete about the underlying machine. Once you see the full system below you, you have to restrain yourself from bumming every microsecond you can.

3) Along with C, you practically always get a Un*x or Un*x-like system interface, and always a standard C library. Part of Un*x's win comes from the fact that it imposes no structure at all on files, and treats them as a seekable byte-stream. The structure that results is entirely up to the user and highly flexible. And since that structure (if any) is defined in the user's code, that means that it should port trivially. And it does. As I said, I shift lots of code among about 4 or 5 C's/Unixen all the time, and only once did the i/o part break in the port. This was because the i/o was being done on the raw disk file.

4) Which brings up another point. For the most part, you can treat terminals, printers, and all devices, even the network, as files. Of course, you get to send control messages to these special device files, with a mechanism separate from the actual i/o, but of a similar form. Fred Blonder around here even wrote a short (about 20 lines, I believe, when it came down to it) pseudo-device which would duplicate a file descriptor you already have open, when it was opened. The uses of this are usually when a program wants a file name, but you want to feed it input from a pipe.
[He's umcp-cs!fred, if you want it].

5) Pascal does not map nicely onto any machine I have seen it implemented on. In all my C's except the 6502 version, C hits it off with the underlying machine's architecture and instruction set very nicely. C produces tight, fast code. I too once thought Pascal was the world's greatest language, and thought that C was a hairy, unusable, unreadable language. But many, if not all, of Pascal's constructs are subsumed by C. This I found out the hard way when my Lisp interpreter was due in 1.5 weeks. All the poking I had done in Pascal (on bsd Unix, fairly friendly) didn't work and was so verbose it was confusing to read. Verbose, constrained, deterministic (Pascal is just a syntax table), and very unfriendly are the best descriptors of this language. Who gives a damn whether the array starts at 0, -5, 4 or 100? I am least confused by having everything start at 0. And what if I want to get the raw bytes to/from a file? In C, I just slurp some into a buffer (whose address I can know and twiddle if I so choose). But in Pascal, it becomes a trick to figure out how *this particular* implementation will let you do that, if at all. It usually won't.

As for Lisp i/o, it isn't the greatest, but the Franz way of things seems fine for the moment. Go away, pascal!

Not afraid to have an opinion, and back it up,
	-rehmi
--
By the fork, spoon, and exec of The Basfour.
Arpa: rehmi.umcp-cs@udel-relay
Uucp: ...!harpo!seismo!umcp-cs!rehmi
      ...!{allegra,brl-bmd}!umcp-cs!rehmi
wilner@pegasus.UUCP (08/26/83)
Michael Condict asks why there must be a separate set of constructs for performing IO. There need not be, but the answer is not to mash files into arrays. Devices exhibit a richer set of properties than arrays (e.g., access time, cylinder organization, sequential transfer). More often than not, applications need some control over these properties in order to run efficiently.

It seems highly desirable to have one basic declaration for what we now call strings, arrays, files, pipes, and sequences. Let's not forget shell variables and command-line arguments while we're at it. The invention that we'd all like is a unifying data type into which we can read, write, index, search and do all the common things in a common way, yet which continues to have time-dependent behavior and to exhibit behavior that arises from the geometry of the hardware out of which it is constructed. As long as we can additionally express such things as "block size" and number of buffers for these new-fangled strings, we don't need files explicitly. Finally, if one's system has a swift implementation of these beasts that doesn't use disk for the little ones and does use disk for the big ones, everyone should be happy.

Languages are loath to admit time- and space-dependent behavior into their data types, but if we are to unify arrays and files, some compromise is necessary.
alan@allegra.UUCP (08/26/83)
In response to Rehmi...

	Don't you *dare* mention C within ten words of Pascal, Ada,
	or any other such structured, verbose crocks.

What's your problem, anyway? Are you upset because we haven't all accepted C as the One True Language? In the future, why don't you save such stupidity for net.religion.c, and stop bothering the rest of us?

	Part of Un*x's win comes from the fact that it imposes no
	structure at all on files, and treats them as a seekable
	byte-stream.

That's very interesting. The discussion was about I/O operations in programming languages, not in operating systems. There is no I/O in C, so your article seems pretty pointless. Give me a break...

Alan Driscoll
Bell Labs, Murray Hill
laura@utcsstat.UUCP (Laura Creighton) (08/28/83)
Keep the IO *out* of the language if you want to write portable code. No matter how wonderful your ideal concept of what bits, bytes, and datastreams ought to be, no matter how aesthetically glorious your opinion of what a file is -- there are going to be a lot of folk out there who have equally glorious ideas that are out-and-out incompatible with yours. And a lot of these folk build hardware.

Laura Creighton
utzoo!utcsstat!laura
gumby@mit-eddie.UUCP (David Vinayak Wallace) (08/29/83)
Well, one good thing about C is that IO isn't really part of the language at all; it's all done via functions in libraries.

Modern languages are starting to use streams (e.g. Common Lisp, Ada). Streams are generic objects, all of which support a certain number of standard operations and which can be extended for special purposes. Then you can have a special stream which reads or writes special structures for some operation, and other programmers can access these streams without knowing their structure. The problem with embedding IO in the language definition (as in BASIC) is that each programmer must grok the file (and device) format.

Device IO probably can't be improved. But maybe we can just flush files, or at least produce a better model?

david
condict@csd1.UUCP (Michael Condict) (08/30/83)
Now I am genuinely confused. I've heard these statements before that C does not have I/O built into it and how wonderful this is, and I've never been able to understand them. People claim it improves portability, so I guess that's why they find it wonderful, and who can disparage the goal of portability? The only problem I have with these claims is that I can't see any reasonable view of C, or of programming languages in general, under which they are not obviously false! Let me elaborate (and you can't stop me from there anyway, can you?):

DEFINITION 1. Input/Output (I/O). The means by which data is transferred between an executing computer program and the rest of the world.

THEOREM 1. A programming language without any way of doing I/O is of purely theoretical interest.

PROOF. Directly from the definition of I/O, since anything that is computed by a program in the language cannot ever be discovered except by redoing the calculations by hand, which clearly is not practical.

THEOREM 2. C has I/O.

PROOF. We could prove this by contradiction of Theorem 1, using the fact that people really do execute C programs on computers, so it is apparently not just of theoretical interest. A more direct approach is to note that to whatever extent C is a language defined independently of any particular implementation, it is a language that has several I/O constructs. I'm not familiar with all of them, but I know that printf is one of them [K&R 1978]. Conversely, to whatever extent C is a family of languages, say, a different one for each C compiler, it is a safe bet that each member of the family has I/O. For instance, on any Unix implementation of C, the primitive I/O constructs are read, write, open, creat and a few others. Printf, scanf, getc and so on are all definable in terms of these system calls.

Now let's look at portability:

THEOREM 3. Let X be a language without I/O.
If any particular compiler for X is to be useful, a set of I/O constructs must be added to the language processed by the compiler. That is, the compiler must process a language different from X.

PROOF. Directly from Theorem 1. The fact that the only addition to X consists of adding some library routines, rather than new syntax, is not relevant -- that these routines are guaranteed by the local reference manual to be available makes them part of the local language.

THEOREM 4. Let X be a language without I/O. Complete, portable programs that are useful cannot be written in X.

PROOF. By Theorem 3, we know that every particular implementation of X will have its own set of I/O constructs. And to whatever extent each implementation of X has the same set of I/O constructs, X is not a language without I/O (or at least the informally defined extension to X that everybody is implementing and using is not a language without I/O). So we are forced to conclude that there are several different ways of doing I/O, depending on which compiler is in use. Hence useful X programs, since they perform I/O, must be changed when moved from one compiler to the next.

All facetiousness aside, would somebody please show me where my reasoning has gone astray here? My best guess as to why people say that C has no I/O is that all its I/O is performed with function calls, but I can't see how there being no separate syntax for the I/O constructs matters one hill of beans. We certainly don't say that FORTRAN has addition but does not have the absolute value operation simply because the latter is a function call while the former uses infix notation; or do we? As far as the portability issue goes, we've already had one response arguing from bitter experience against the portability of C, so I don't feel quite as bad admitting that I haven't the foggiest idea how you support this view (and usually I can anticipate the other side of a debate to some extent).
Laura, can you help me out here, since you are one of the people who subscribe to these views?

REFERENCE. Kernighan B. & Ritchie D., "The C Programming Language", Prentice-Hall, 1978.

Michael Condict
...!cmcl2!csd1!condict
Courant Inst., N.Y.U.
251 Mercer St.
New York, N.Y. 10012
robertd@tektronix.UUCP (Bob Dietrich) (08/30/83)
I thought I'd reply to Michael Condict's (csd1!condict) discussion of I/O in programming languages. There seems to have been at least one other reply on the net, from someone named umcp-cs!rehmi, but it was quite garbled. Something about hacking in a higher-level assembly language called C, and the fact that it's rarely found anywhere but in a Unix environment.

I agree with most of Michael's comments, especially the observation that sequences were misnamed files. I have also fought this battle long and hard and loudly. (Perhaps this is a very good caution to be careful in picking terminology!) As far as Pascal goes, nothing is stopping implementors from providing random access to mass storage via Pascal's random-access abstraction, the array. The appearance of a variable in the program heading means only that it is bound to an object outside of the program. The mechanism and the allowed types are determined by the implementation, not the language. Also, the ANSI-IEEE Joint Pascal Committee has a proposal for using random access files. (By the way, if you want to attend the next meeting in October, see net.lang.pascal or contact me.)

A few comments seem in order relating to programming language design and implementation. First, language implementations are usually done quite separately from the file system. This means that the language implementor must map the abstraction of the language onto some predefined primitives provided by the file system. The trouble is that the necessary primitives may not be there, or else do a poor job of supporting the abstraction. The result is that it is useless to put a feature in a language that depends on file system support that is not universal, because that feature will simply not be implemented. The dictatorial approach (as in the DoD and Ada) doesn't work either. If keyed-file access were added to Pascal, would that mean your home computer (with wonderful cassette tape) couldn't run Pascal?
I'm not arguing for the lowest common denominator in languages; it's just very hard to know where to draw the line.

My second observation is that few programming languages reinvent the wheel; instead, a few corners are added in the hope that enough will make the wheel roll more smoothly. Thus the design is influenced heavily by others that have gone before (learning from mistakes is fine; keeping them is not) and by the environment the designer is currently working in. Pascal, although not as strongly influenced as FORTRAN was by the IBM 709 architecture, seems to have been influenced by the CDC Cyber operating system SCOPE. This is at least true in the use of the program heading for communicating information between the program and the outside world. CDC FORTRAN (and other languages?) has long provided this facility, but only for files (in the operating system sense). Pascal was also designed to provide this facility but, as Michael pointed out, was fortunately not limited to files in the program heading. Some evidence for Pascal being influenced by the CDC environment is the fact that, unlike regular parameters, program parameters are simply mentioned in the heading and defined (given a type) later on in the program block. Most implementations, if they pay attention to the program heading at all, only allow files because 1) that's the way the first implementation, Pascal 6000, did it; 2) the P2/P4 portable Pascal compilers that fathered most Pascals in the world did it that way; and 3) most operating systems either don't know what a command line is or make it difficult to parse.

My third comment relates to implementors (I am both a user and an implementor of Pascal). Note: It's been suggested to me that the remainder of this text borders on flaming. Several things usually affect the way someone implements a language, often in such a way that the result is an extended subset of the language instead of the whole thing.
For instance:

1) The implementor is rushed to get "something working", so the simple things like expressions are done first, and the parts that look harder are left to be studied and implemented "later". "Later" often never comes.

2) The implementor is invariably under some sort of deadline, so that the parts left to be implemented near the end of the schedule are implemented in a rush, and perhaps with less care.

3) Implementors, like most people, are uncomfortable with things they don't understand. Such things don't get implemented.

4) A particular feature in language X is "just like in language Y", and so that feature will be easy to implement. Right and wrong. Yes, in isolation the feature is just like another language's feature, but in relation to the rest of the language the impact is quite different (usually a disaster). This is especially true of grafting features from one language onto another.

5) The simple question is rarely asked: "Is the abstraction I'm looking for already in the language (perhaps in a form I'm not familiar with)?"

6) The "creeping feature creature" grabs hold of the implementor ("If I add this feature, then I can eliminate 20 whole statements from my 5000 line compiler!!").

7) Some implementors are just plain lazy.

Lest I seem to be merely maligning implementors, let me say a few words in their defense. Often an implementor is between a rock and a hard place. The implementor must take a language that the user expects to be clairvoyant and implement it in a relatively hostile environment. Usually the operating system has long since been frozen, and the machine instruction set was designed by an electrical engineer who says "Huh??" when you use the term "software". An added benefit that we are fortunately moving away from is that the compiler has been specified (by someone else) to run in 2K bytes of memory and compile 20,000 line programs at 10,000 lines a minute on an 8080 with floppies.
(Oh, and by the way, can you have it done next month?)

Given the environment that implementors have had to work with, it's a wonder we have languages at all.

Bob Dietrich
Tektronix, Inc.
(503) 629-1727
uucp address:   {ucb or dec}vax!tektronix!robertd
csnet address:  robertd@tektronix
arpa address:   robertd.tektronix@rand-relay
alan@allegra.UUCP (08/30/83)
	All facetiousness aside, would somebody please show me where
	my reasoning has gone astray here? My best guess as to why
	people say that C has no I/O is that all its I/O is performed
	with function calls, but I can't see how there being no
	separate syntax for the I/O constructs matters one hill of
	beans. We certainly don't say that FORTRAN has addition but
	does not have the absolute value operation simply because the
	latter is a function call while the former uses infix
	notation; or do we?

In Fortran, the absolute value function is part of the definition of the language. It is a built-in function. If your Fortran compiler does not supply it, then it's not a correct and complete compiler. So it's fair to say that Fortran has an absolute value operation. In C, on the other hand, no I/O operations exist in the definition of the language. From "The C Programming Language" by K & R:

	Finally, C itself provides no input-output facilities: there
	are no READ or WRITE statements, and no wired-in file access
	methods.

So, your C library (not your C compiler) may include "printf", but it doesn't have to. With or without "printf", it's still C. Do you now understand why people say C has no I/O?

Alan Driscoll
Bell Labs, Murray Hill
hal@cornell.UUCP (Hal Perkins) (08/31/83)
Something's not quite right here. The claim is that the C language contains I/O operations because any language that can be used to write useful, portable programs must contain I/O operations. I don't agree, and I don't think the argument holds together.

First, I think it is reasonably obvious that the C programming language itself does not contain I/O commands in the sense that FORTRAN, COBOL, PL/I or Pascal do. In these languages, I/O operations have special syntactic forms and semantics that, in most cases, could not be defined by procedures or functions written in the language. For example, the Pascal read, readln, write, and writeln operations can accept any number of parameters of any of the basic scalar types, and a special notation is available to specify the format of numbers printed by write and writeln. These procedures absolutely cannot be defined using the standard Pascal language; they have to be specified in the language definition or they couldn't be there in their present form. In C and Ada, among other examples, there are no special forms for I/O commands. So I claim that these languages do not contain I/O.

How then is it possible to use these languages to write useful programs? The answer is that there are standard libraries or packages that provide I/O operations. Furthermore, virtually every implementation of these languages includes an implementation of the standard I/O packages. For C, this is customarily the standard set of I/O routines described in Kernighan & Ritchie. For Ada, DoD has decreed that all certified Ada implementations must provide certain standard I/O packages.

So it seems to me there is no problem with the statement that C does not contain any I/O. The incorrect statement is the one that claims that any useful programming language must contain I/O--otherwise programs cannot communicate with the world.
The correct version is that either the language must contain I/O statements OR ELSE a standard package of I/O routines must be available with all implementations of the language.

This is one of the reasons that ALGOL 60 never caught on. ALGOL 60 contains no I/O. Nothing wrong with that--C has demonstrated that a language without I/O statements is perfectly usable. But the ALGOL 60 community never defined a single, universally available set of I/O procedures, so there was no way to write a portable program that communicated with the outside world. (Actually, attempts were made to specify a standard ALGOL 60 I/O package, but these were never widely adopted. I suspect that it is at least as difficult to standardize the I/O package definitions as it is to standardize language definitions. In the case of C, the Unix environment has been so closely associated with the language that it became the I/O standard by default.)

To close this excessively long note, it seems to me that languages should not contain I/O commands, because this helps to keep the language definition short and comprehensible. But it is equally important to define a basic set of I/O operations that are universally available with every implementation. Defining a good basic set of I/O operations is a difficult design problem. Separating this problem from the basic language can only help; it reduces the number of things that must be designed at the same time.

Hal Perkins                  UUCP:   {decvax|vax135|...}!cornell!hal
Cornell Computer Science     ARPA:   hal@cornell
                             BITNET: hal@crnlcs
condict@csd1.UUCP (Michael Condict) (08/31/83)
Boy, this is getting rough -- people are starting to talk about breaking each other with clubs! Not to bring down such mayhem on my own poor head, I would just like to reiterate a point that was also made by at least one of my allies. Perhaps I can avoid all attempts at cuteness this time and confine myself to the facts (but I doubt it).

The issue that seems to be most widely disagreed upon is whether or not C has I/O. When I referred to K&R I did not mean to plant myself in the camp of the faithful Bible thumpers who take everything found there as divinely inspired, hence unarguable. I only meant to point to the source most influential in the design of C I/O as it exists today. I don't particularly find it important whether the book says that I/O is part of C, or shows printf in the syntax summary. In fact, there happens to be absolutely no mention of I/O, even of its omission, in the C Reference Manual (the "official" language-defining portion of the C book). All of this is irrelevant, because a language is defined by its use, not its official definition -- this is true of English and is just as true of programming languages (even Ada). One person called this the "language together with its environment", but I don't see how to make such a distinction. The language consists of those sequences of symbols you can write down with reasonable confidence about their meaning. For a programming language with dozens of different compilers running on thousands of different computers, each with its own support library, there is no hard and fast definition of what is in the language and what is not. All the same, it is just as clear that printf is in the C language as, say, the union type, since more implementations probably provide the former than the latter. Anyway, let's not argue about trivial semantic issues like this.
I'll agree to say that C has no I/O if you'll agree that there is a certain extended language, let's call it C+, that everyone uses instead of C and that does have I/O. See, we're both right!

What I'd really like to see is a resumption of the discussion about improving the way I/O is handled in programming languages; that is, not whether C has I/O (every implementation will), but what is the best way to connect a high-level language, especially one oriented towards very abstract programming, such as SETL, LISP or PROLOG, to the grungy, incredibly over-designed and horribly complex I/O monster found on typical operating systems. One popular point of view seems to be that the official language definition should avoid the I/O issue entirely, presumably allowing individual implementations to define this necessary portion of the language by default. The claim is that this leads to portability of programs. I'm still waiting for an explanation of the reasoning behind this conclusion, which I am at a loss to reproduce. Even if you convince me that this is the right way to go, a possibility I have not ruled out at all, we don't seem to have made any progress on the original problem, which is how to define the I/O portion of a programming language. You've just told me WHO should do it (the compiler writers, rather than the language designer) rather than HOW it should be done.

Michael Condict
...!cmcl2!csd1!condict
Courant Inst., N.Y.U.
hal@cornell.UUCP (Hal Perkins) (08/31/83)
Bob Dietrich's article arrived while I was posting my previous note. I'd like to add one thing to what he and Mike Condict have both discussed.

From a logical standpoint, it is very clean to allow Pascal program parameters to be of any data type. One can get random-access files this way without adding any cruft to the language. But this is a bitch to implement on most conventional operating systems. The representation of random-access files (arrays) on disk and arrays in main storage, and the operations needed to access elements of these two different kinds of arrays, are vastly different in almost every conventional system (including Unix). So if you try to allow some program variables and parameters to be associated with disk files and others to be associated with objects in central storage, you must use different mechanisms to access the elements of the different sorts of variables.

This is moderately unpleasant to compile and generate code for, but for simple variables it's not too bad. Just keep something in the symbol table to indicate whether the data is in main store or in the file system, and generate the correct code to access it directly or else call the operating system. But.... what about parameters to procedures? Suppose I have two objects of the same logical data type, one of which is in main store and the other of which is a program parameter. And I have a procedure with a parameter of this type. What sort of code do I compile in the procedure to access the parameter? You really have to use something like the thunks used to implement ALGOL 60 call-by-name. And this is a bad efficiency penalty for access to actual parameters that are objects in main storage and not in the file system. Given this situation, most implementors are not going to go to the trouble of supporting uniform notations in the language for objects both in main storage and on the disk, since the compiled code would be too slow for the "real world", where performance counts.
(I plead guilty to this. I did a lot of the work on the interface between the Pascal 8000 compiler and IBM's OS/MVS/CMS systems. I wanted to implement random files as array program parameters, but the implications for code generated for all array accesses were terrible.) As long as there are two separate address spaces for main storage and the filing system as provided on most machines, this problem won't go away. This seems to me to be one of the best arguments for the one-level-store concept (as in Multics, et seq.), since a one-level store would allow the use of large variables and data structures in exactly the same way as smaller data objects (as suggested by Condict) and make it possible to eliminate the extra baggage needed for files. A decent abstract data type facility would allow the definition of trees, indexed data structures, and other things typically found in file systems. However, I'm pretty pessimistic that this could be done on most conventional architectures, since it seems that hardware support would be needed for a decent segmented memory with a very large address space to make the whole thing work. I don't see how one can get away from the concept of files and variables as different sorts of gadgets on most existing machines and operating systems. Cheers Hal Perkins UUCP: {decvax|vax135|...}!cornell!hal Cornell Computer Science ARPA: hal@cornell BITNET: hal@crnlcs
mrm@datagen.UUCP (08/31/83)
C does not have I/O built into the language per se, since the compiler proper does not recognize calls to printf et al. Rather it interprets them as normal function calls (and/or macro replacements). The user is free to write his/her own `printf' that does input instead of output or what have you. Also, C stdio is not completely portable, since a header file could change a declaration from `int' to `long' on any given machine, and the format strings must then be changed manually from "%d" to "%ld". On a 16 bit machine this would produce incorrect results, whereas on a 32 bit machine it would produce correct results. Since the I/O is not built in, the compiler cannot tell the I/O system the type of the variables through dope vectors, and the poor user is left to manage as best s/he can. This discussion also brings to mind that this is a good case for bringing back the old "%r" format. For those of you who don't know, %r interpreted the next argument as a nested format, thus you could make one macro be the standard format for any given type, which you might print out in several different places. For example:

	#ifdef PDP11
	typedef long type;
	#define FORMAT "%ld"
	#endif
	#ifdef VAX
	typedef int type;
	#define FORMAT "%d"
	#endif

	type a;
	...
	printf( "a is %r\n", FORMAT, a );

(allegra,ittvax)!datagen!mrm
mark@cbosgd.UUCP (08/31/83)
In reply to Michael Condict's "proofs" that C has I/O: I challenge you to find any reference to printf or any other I/O operation in the C Reference Manual. (That's the appendix to the C book, for example.) The C language simply does not have I/O as a defined part of the language. There is something called the "standard I/O library" which nearly every implementation of C has. This includes fopen, <stdio.h>, printf, and a couple dozen other routines. There are implementations of C that don't include this (Idris comes to mind, as well as V6, which had a different "portable C library" that flopped), but the current de facto standard is that you get a C compiler AND stdio. (In fact, most implementations go one step further and provide something called "the C library" which was originally supposed to be UNIX specific. This includes string operations, routines like open, read, write, and lseek that deal with small integer file descriptors, and command line parsing implementing the UNIX argc/argv notion and < and > redirection.) This does not make them part of the language. It just means that in order to use C, you need both a compiler AND a library. This is in contrast to FORTRAN, where I/O and a set of built in functions are defined in the manual to be part of the language. Claiming that if you don't have I/O in the language you have a language of only theoretical interest is silly. It assumes you can't put tools together. Kind of like saying that since Goodyear makes tires, if Goodyear doesn't make things to put the tires on (e.g. cars) then Goodyear products are of only theoretical interest.
perl@rdin.UUCP (Robert Perlberg) (08/31/83)
In response to umcp-cs!rehmi's article on C portability, you are missing one very important point. You only refer to porting C from UNIX to UNIX. THAT's why the I/O always works. If you are only concerned with porting from UNIX to UNIX, ANY language will port with no I/O problems. It has nothing to do with C. In fact, C I/O is so heavily based on UNIX I/O concepts that it is perhaps the HARDEST language to port to a DIFFERENT operating system. I had to use C under CP/M. CP/M does not keep track of the size of a file in bytes; only in blocks. If you had to port most UNIX utilities to CP/M, it would be nearly impossible. I had to write my programs to use header records to indicate the length of the file since the OS wouldn't do it for me. I partly agree with your first statement: "Don't you *dare* mention C within ten words of Pascal, Ada, or any other such structured, verbose crocks." Take out the word "crocks" and I couldn't agree more. C isn't worthy of being associated with languages that were truly designed for portability and logical correctness. With regard to portability being proportional to one's knowledge of the machine, you're right; but in the wrong direction. The more you know about the differences between machines, the more careful you will be about exploiting a non-portable feature. I am constantly having to patch up code submitted by hackers who do whatever seems to work on their machine because they don't understand that you can't pass just any old variable type to a function. And as far as C being hairy and unreadable, that depends on how you code. Most C code I see outside my company looks like it was written by a first year BASIC hacker. There's no law that says that you have to fit as many functions into one statement as the compiler can (or often can't) deal with. I agree that Pascal is not the answer, but whoever it is that maintains the C standard doesn't seem to be aware that there's a question! 
Why can't C have dynamically dimensioned arrays, run time error messages, array bounds checking, and a few other little bits of civilization that every other language since FORTRAN has had? I've known people to spend DAYS trying to debug a program before they discover that a variable is being passed to a function that's expecting a different type. Just like V'GER, the C standard MUST evolve. It maddens me to see people trying to keep C from getting better in the interest of standardization. If you want stunted standardization, use FORTRAN. Let's let C stand for Civilized. Are you listening, Dennis? Robert Perlberg Resource Dynamics Inc. New York philabs!rdin!perl
laura@utcsstat.UUCP (Laura Creighton) (09/01/83)
So sorry, but I misunderstood the original article. I THOUGHT that what was being discussed was an ideal language to do 'local area networks' (magic phrase, fill in with whatever is current wherever you are to describe same). This was not the case, I am sorry to have misunderstood. I got this horrible concept of where a "file" was a 'data packet' (another magic word, your word may be different), and the whole language had to comprise every device driver known to man (and whatever other strange creatures build devices). Given that the language designer does a good job, and defines the behavior of all the IO in excruciating detail -- so that anyone who conforms to the standard must by definition be compatible -- I do not think that it matters whether the IO is in or out of the language. In fact, if it is in the language, people like Whitesmiths cannot pull a sleazy trick of writing an incompatible compiler which strictly speaking IS compatible, for it is only <stdio.h> (<std.h>) which is different. Whitesmiths can get away with it, sure, but porting things to places which have a Whitesmith's C compiler is a real pain... (Actually, I am told that Whitesmith's now has a compatible <stdio.h>. This is news to me, and I cannot personally vouch for its accuracy, but the person who just told me is not known for making wildly inaccurate statements. If this is the case then I say "It's about time!") laura creighton utzoo!utcsstat!laura
mjl@ritcv.UUCP (Mike Lutz) (09/02/83)
Here I go off on a vacation for a couple of days and things really light up (or heat up). Anyhow, having perused the 30-odd articles, some odder than others, on the subject of I/O, C, and UTOPIA84 (the language that solves everybody's problems), I will add my two cents. It is a red herring to ask whether or not C has I/O -- of course it does. However, the real issue (and one which few have addressed directly) is the manner in which I/O is specified. As Dennis Ritchie says, C separates the language issues from the particular form in which I/O is expressed in the language. In this way, new I/O packages can be tried out without jeopardizing existing programs that use previous packages. What is more, as technology progresses, new I/O paradigms can be included without sending all the compiler writers back to their terminals to hack at LEX, YACC, and the code generators. An example of a new paradigm is the curses library, providing a screen oriented I/O interface: a mode of I/O that was unknown (or at least unusual) when C and Pascal were being developed. Whereas the addition of this new paradigm to C is relatively easy (just write new functions), in Pascal it's a real pain. The root cause of the problem was hubris on the part of Pascal's designers, who decided in 1970 that they "knew" how I/O should be modeled once and forever, and who instead created an iron box which has imprisoned a generation of programmers. I bet dollars to donuts that a similar fate awaits those who "know" how I/O should be implemented in 1983, and are willing to embed their ideas in special linguistic forms. So, though there definitely is I/O in C, the crucial point is the manner in which it is incorporated. The library function approach is a source of power and flexibility, and my plea would be to build on this protean base. The alternative is a procrustean I/O model (such as found in most other languages). Mike Lutz {allegra,seismo}!rochester!ritcv!mjl
rehmi@umcp-cs.UUCP (09/02/83)
From: perl@rdin.UUCP
> You only refer to porting C from UNIX to UNIX.

I should have been clearer - some ports were from C w/Un*x to standalone C with a library to do unix-like i/o.

> THAT's why the I/O always works. If you are only concerned with porting from UNIX to UNIX, ANY language will port with no I/O problems. It has nothing to do with C. In fact, C I/O is so heavily based on UNIX I/O concepts that it is the HARDEST language to port to a DIFFERENT operating system. I had to use C under CP/M. CP/M does not keep track of the size of a file in bytes; only in blocks. If you had to port most UNIX utilities to CP/M, it would be nearly impossible. I had to write my programs to use header records to indicate the length of the file since the OS wouldn't do it for me.

But C does not have I/O! Look, under vanilla bsd/bell you have at least three immediately obvious choices for i/o. The first is directly with system calls, e.g. read() and write(). The second is with the buffered i/o library, e.g. fread() and fwrite(). The third is with the stdio library, like scanf() and printf(). Even the system calls are in a library (the calling sequence).

> I agree that Pascal is not the answer, but whoever it is that maintains the C standard doesn't seem to be aware that there's a question! Why can't C have dynamically dimensioned arrays, run time error messages, array bounds checking, and a few other little bits of civilization that every other language since FORTRAN has had? I've known people to spend DAYS trying to debug a program before they discover that a variable is being passed to a function that's expecting a different type.

Because they're slow (in reference to both "Why can't..." and "I've known..."). Besides, there are often libraries for making error messages easy. And what do you think lint is for? Decoration? It does a pretty complete rundown over the code in question, including type checking, unreferenced variables, etc...
There is this library function called "malloc" which will give you all the space you want, so you can have dynamic allocation. Array bounds checking is slow. Besides, what is the point of starting your array at element -5 and running up to 27? It is slow, and adds to the confusion...

> Just like V'GER, the C standard MUST evolve. It maddens me to see people trying to keep C from getting better in the interest of standardization. If you want stunted standardization, use FORTRAN. Let's let C stand for Civilized.

The C standard is spartan yet rich. It is concise and straightforward. You can ignore the machine it is running on, or write the machine's os in it. The language itself is a win. The libraries... well, all things must change, hopefully for the better. -rehmi -- By the fork, spoon, and exec of The Basfour. Arpa: rehmi.umcp-cs@udel-relay Uucp:...!harpo!seismo!umcp-cs!rehmi ...!{allegra,brl-bmd}!umcp-cs!rehmi
condict@csd1.UUCP (Michael Condict) (09/02/83)
I hope that Mike Lutz has cleared the air about the issue of whether or not C has I/O. I feel a little guilty about prolonging the discussion, which is of course a "red herring". In my defense, I was trying to indicate the undesirability of defining the boundaries of a programming language as "the set of strings given meaning by the compiler alone, rather than any set of object libraries, whether standard or local". I was attempting to use the Socratic method of posing questions that point out serious difficulties with widely held beliefs. Well it didn't work. I guess Socrates was a little better at it than I am. So now I'll have to resort to the direct approach. Why do I say this definition is undesirable? Other than the fact that my C processor may not involve a separate compiler, run-time system and object libraries, the problem that I was trying to expose is evidenced by the early responses to my first notes. People were expounding the virtues of C not having any I/O, without any recognition at all of the fact that this is only due to a technicality in their definition of what is in the C language. Most of these people would be quick to admit that there is a fairly well defined set of C statements (almost all of them function calls) that one can supply to almost any C processor to cause I/O to happen in a predictable fashion. Thus any advantages accruing to the C I/O paradigm are not due to the fact that I/O is not defined anywhere in the collection of documents, compilers and portable/standard object libraries that could be called "informal C + environment". Rather, as Dennis Ritchie and Mike Lutz point out these advantages must occur because of *where* I/O is defined in C, namely in the object libraries, rather than the compiler. Now to the main point of this note, the issue of whether such advantages actually do exist and why. 
The claim appears to be that I/O is one portion of a programming language that should not have separate syntax associated with it and be processed by the compiler. Of course, no one would argue that it should not have separate semantics associated with it, otherwise it would not exist! So we must conclude that the only way this can happen is to steal from one of the other constructs that do have syntax, in this case the function call. That is, we can arrange that certain function calls do I/O, instead of satisfying the otherwise inviolable rule that the function call can be replaced by C code that does the same thing. This is perfectly acceptable, given that we believe that I/O does not deserve its own syntax. In fact, my proposal for merging variables with files is almost the same approach. I want to steal from the syntax of variable declarations, using ones that were prefixed with the "slow" attribute to perform I/O, at least to files (if not devices). The reason I chose this approach is that, clearly, file I/O is a *data* operation, not a matter of logic and control structures. It therefore seems most natural to use the data-oriented variable mechanisms of the programming language to model I/O, rather than using the function call, which is control-oriented. The argument for using function calls was based on the belief that I/O is somehow less well understood or more rapidly evolving than other portions of programming languages and so, since compilers are more difficult to change than object libraries, we should put the definition in the library rather than in the syntax tables. Furthermore, users can avoid the effects of changes to libraries, presumably by refusing to use the new ones and keeping copies of the old. Okay, I'm willing to agree that these are real advantages. What hasn't been adequately explained to me is the belief that I/O is a less stable programming language feature than, say, abstract data types or parallel processing. 
And where do we draw the line on this language design methodology? The limiting case of the belief that the language should be defined by functions is of course LISP, where it is possible to believe that all but about a dozen functions are defined purely in LISP itself, including arithmetic and all control structures besides COND. I don't think that many adherents of the manner in which I/O is defined in C would want to give up infix arithmetic expressions for "add(x,y)", so I am interested in just what they believe are the sufficiently stable set of constructs worthy of inclusion in the "language" definition. I feel that I/O operations on simple character-stream devices and on files are, by definition, as stable and well understood as operations on main-storage variables, since the only technical semantic difference is the code that must be generated to do these operations. This is why I propose using the same syntax. Of course there will always be people who want to hook up strange and complex equipment to computers, such as automobile engines, Star Wars defense systems, marital aids and Things Not Yet Invented That We Can Only Dimly Comprehend. I never intended to imply that the control of such devices should in any way be built into a compiler, interpreter, or even a standard object library. But, Mein Gott, aren't programming languages mature enough yet to understand the concept of getting the nth record from a random-access file? They can learn these concepts from their creators or they can pick them up "in the streets", where each industrial implementor, seeing an opportunity to use the reputation of the language to market its own products, will teach the language how to connect itself to its own I/O system, usually of questionable virtue, and will then begin selling this "improved" language to all takers. Is this what we want for our loved ones? Michael Condict ...!cmcl2!csd1!condict Courant Inst., N.Y.U.
kurt@fluke.UUCP (Kurt Guntheroth) (09/03/83)
I/O in C/UNIX

When one talks about a language used almost exclusively on a single operating system (C/UNIX), and which has a standard I/O package, one might as well consider that I/O is built into the language.

I/O as part of the language

I have seen languages which defined I/O as part of the language and they are usually ugly. I am thinking in particular of the proposed ANSI BASIC, which has so many ways of doing the same thing to a file that it is impossible to remember what functions work in what modes and with what results. The real problem is that people have not spent enough time deciding what operations are good things for operating systems to support. In general there is not a good model of a file that is usable by enough people. UNIX and C have chosen a model that is at least consistent (a seekable byte stream), but a little simple for many people and also a little non-portable to systems with different size 'bytes'. It would be nice if as much attention were devoted to formalisms for I/O and operating systems as has been devoted to formalisms for programming language constructs. Unfortunately, there really are people in academia who treat I/O as unnecessarily dirty and 'real-world' and pursue issues that can be theorized about with more detachment.
barmar@mit-eddie.UUCP (Barry Margolin) (09/04/83)
Well, it is now my turn to be a twit and join this stupid debate. Someone made the point that leaving I/O out of the language definition permits you to experiment with new I/O schemes. This is just plain wrong. Just because a language has I/O routines doesn't mean that you have to use them. As long as the language has subroutines you can implement new I/O architectures such as curses. For example, I am a Multics system programmer, and as some of you may know, Multics is a PL/I system. PL/I has I/O in the language, but Multics programming standards specifically say "do not use PL/I I/O, use the standard Multics subroutines". I also know a number of Unix system programmers who never use the low-level stdio routines; they always use the simpler Unix routines. Here are my thoughts on the pros and cons of providing I/O in the language: 1) portability: I don't think it matters. Someone will have to write some standard I/O routines, and it doesn't matter to me whether it is the compiler writer or the person who ports the library. In many cases the I/O primitives might have to be written in assembler, so it would probably be better if the compiler-backend-writer were implementing it, to reduce the number of people who must deal with the machine language. 2) syntax: building I/O into the language has the feature that the I/O routines have access to information about the data that is not generally available at runtime. Most language-I/O designs are generic, meaning that they do the appropriate thing with all the language-defined data types. In PL/I I can just say "put list (foo)", and the value of the variable foo will be printed in an appropriate form. On the other hand, with the move these days towards user-defined types and various forms of packaging (Zetalisp flavors, Smalltalk classes, CLU clusters, Modula modules, and Ada packages) this has become less attractive. 
Another nice thing that can be done in PL/I is "put data (LIST-OF-VARIABLES)"; this is like "put list (LIST-OF-VARIABLES)" except it also outputs the variable names with each value, a very useful debugging tool (especially if you leave out the LIST-OF-VARIABLES, in which case it dumps ALL the variables). Of course, this is not necessary if your system has other good debugging utilities, but it is a good example of something that cannot be done by a subroutine in any language I am familiar with. -- Barry Margolin ARPA: barmar@MIT-Multics UUCP: ..!genrad!mit-eddie!barmar
nishri@utcsstat.UUCP (Alex Nishri) (09/12/83)
The 'file' and 'array' are abstract concepts. Although these concepts once had a basis in physical media they no longer have to. The compiler writer and the operating system writer can implement the abstract concepts in any way they choose. For example, many UNIX systems implement some larger arrays as buffered random access files (known by implementers as 'virtual storage.') I think the notion of a 'slow variable' is really the same as the better defined concept of 'file.' And just as the notion of passing files between subroutines is recognized in many language definitions, so I think the 'slow variable' could be passed as a parameter. In fact, I think we could implement 'slow variables' in terms of 'files' with a few macro preprocessor instructions. Alex Nishri University of Toronto ... utcsstat!nishri