osd7@homxa.UUCP (Orlando Sotomayor-Diaz) (03/14/85)
From: Orlando Sotomayor-Diaz (The Moderator) <cbosgd!std-c>

mod.std.c Digest          Wed, 13 Mar 85          Volume 4 : Issue 15

Today's Topics:
        Can Reiser CPP be ISO-Standard?
        European language support in ISO Standard C
        ftell
----------------------------------------------------------------------

Date: Wed, 13 Mar 85 04:09:56 pst
From: sun!gnu (John Gilmore)
Subject: Can Reiser CPP be ISO-Standard?
To: std-c@cbosgd.ATT.UUCP

Suppose for example that Sun decided that we would like to supply an ISO-Standard C compiler with the new string concatenation stuff, but have an extension that lets our old Reiser-CPP dependencies work. Is this a contradiction in terms? How can we word the standard to permit this as an extension? If we can't, it's gonna be one hell of a flag day for all our customers...

------------------------------

Date: Wed, 13 Mar 85 04:09:56 pst
From: sun!gnu (John Gilmore)
Subject: European language support in ISO Standard C
To: std-c@cbosgd.ATT.UUCP

(While people have been talking about ANSI Standard C, I assume there is also a parallel ISO effort which is working from the same drafts, as was true of the draft APL standard. Presumably this is where the issue arises.)

> From: utzoo!henry (Henry Spencer @ U of Toronto Zoology)
> Subject: trigraphs
> ...many European countries need to use those codes for
> other things, because they have more than 26 letters in their alphabets!
> These people have terrible trouble with Unix and C as they now stand.
> ... My personal view is that the
> occurrence of trigraph escapes in the same file as non-ISO characters
> (i.e., stuff written both ways) should be cause for an error message.
> This would at least simplify conversion. The idea of making trigraphs
> available only via a compiler option also deserves consideration.

The whole idea has not been clearly thought out.
If your C source is written in Swedish, the compiler had better interpret the byte 0x5C in its input as an ALPHABETIC (capital O umlaut) and not as the reverse virgule (also known as "backslash"). If your C source is written in English, it had better do the reverse. Depending on which country you are in, different bytes are alphabetic or symbolic -- and the symbols vary widely.

In other words, this "compiler option" does not make an "optional feature" available; it's a switch that controls the interpretation of every byte of the input file. This switch can be set to exactly one of N values, based on the character set of the source file. The draft standard doesn't specify the full behavior of the compiler for all N values of this switch, or even how many values there are, yet it sounds like the committee intends that support for C source written in all N languages will be a required part of all ISO Standard compilers.

If the countries involved were not interested in using the \{}etc bytes as alphabetics, we wouldn't need to embed the trigraphs in the compiler -- their sources would avoid these bytes (except in strings), and a simple local sed script inside 'cc' could convert an ugly-looking but clearly nonalphabetic ??/ (used for editing) into the USASCII byte 0x5C the compiler expects.

The graphic representation of the \{}etc bytes in strings, of course, would depend on the output device they were written to at execution time; EXCEPT, of course, that the compiler puts a special interpretation on ONE byte value (besides the quote used to begin the string). That value is 0x5C, '\', and perhaps we should standardize a way to change that character via a pragma, because with that one change to standard C, this scheme should work for European languages.
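The "simple local sed script inside 'cc'" described above is a purely textual pre-pass, so it can be sketched without touching the compiler proper. The C functions below are a minimal, hypothetical version of such a pass (the names untrigraph and trigraph_target are made up for illustration). They assume the nine trigraphs that eventually went into ANSI C (1989), which may not match the set in the 1985 draft under discussion.

```c
/* Return the ASCII character that a trigraph ending in c stands for,
   or 0 if "question, question, c" is not a trigraph.
   Assumed table: the nine trigraphs of ANSI C (1989). */
static char trigraph_target(char c)
{
    switch (c) {
    case '=':  return '#';
    case '(':  return '[';
    case '/':  return '\\';   /* the backslash case discussed above */
    case ')':  return ']';
    case '\'': return '^';
    case '<':  return '{';
    case '!':  return '|';
    case '>':  return '}';
    case '-':  return '~';
    default:   return 0;
    }
}

/* Copy the NUL-terminated string src to dst, replacing every trigraph
   with its single ASCII character. The output is never longer than the
   input, so dst needs room for strlen(src) + 1 bytes. Returns dst. */
static char *untrigraph(char *dst, const char *src)
{
    char *out = dst;

    while (*src != '\0') {
        /* src[2] is safe to read here: src[1] is a '?', not the NUL. */
        if (src[0] == '?' && src[1] == '?') {
            char t = trigraph_target(src[2]);
            if (t != 0) {
                *out++ = t;
                src += 3;
                continue;
            }
        }
        *out++ = *src++;   /* ordinary byte: copy unchanged */
    }
    *out = '\0';
    return dst;
}
```

A 'cc' wrapper in the spirit of the post would run each source line through untrigraph() before handing the result to the real compiler. No trigraph can span a newline, so line-at-a-time processing is safe as long as each whole line fits in the buffer. One practical wrinkle: typing the three characters of a trigraph into a C string literal gets them translated by any trigraph-aware compiler, so test input is best written with the escape ?\? in place of the second question mark.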
If the countries involved ARE interested in using \{}etc as alphabetics in C identifiers and such (a great idea -- they can spell all their words now!), the language that results is not compatible with ANY of the current Unix C (and CP/M C, and Mac C, and DEC C, and MSDOS C) source files. I don't see how the resulting language can be part of the ISO C Standard. It can't be intermixed with normal C expressions, functions, or include files. You can link a "normalC" program with a "funnyC" program, but then again you can link it with a "Fortran" program too.

People who want to write in a European-alphabet-capable computer language are free to define one. C is unfortunately not it. [As an aside, I might suggest that the prolific European language designers stop inventing languages, e.g. Pascal, that use up their own national character positions!] I'm not trying to be tough on Europe -- indeed, I'm working to get better European support in Sun products -- but we can't close the barn door after all the characters have been stolen.

------------------------------

Date: Tue, 12 Mar 85 00:02:40 PST
From: Richard Mathews <ucbvax!lcc.rich-wiz@UCLA-LOCUS.ARPA>
Subject: ftell
To: cbosgd!std-c@BERKELEY

On a UNIX-like system with 4K blocks (as IX370 is supposed to have), a file may contain more than 2^30 blocks, or about 2^42 bytes = 4.4e12 bytes. This by far exceeds the 4.3e9 bytes accessible from an unsigned 32-bit long. On the other hand, the suggestion of using a structure to be returned by ftell() would break a large amount of existing code.

A provision, however, should be made for these larger machines. C should not restrict itself to minicomputers. Any method devised should provide a consistent interface for lseek(), tell(), fseek(), and ftell(). Has the committee considered this? Can anyone think of a clean way around it which is compatible with the old system calls?

Richard M. Mathews
lcc!richard@ucla-cs
{ucivax,trwrb}!lcc!richard
{ihnp4,randvax,sdcrdcf,ucbvax,trwspp}!ucla-cs!lcc!richard

------------------------------

End of mod.std.c Digest - Wed, 13 Mar 85 23:38:07 EST
******************************
USENET -> posting only through cbosgd!std-c.
ARPA -> ... through cbosgd!std-c@BERKELEY.ARPA (NOT to INFO-C)
In all cases, you may also reply to the author(s) above.
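Richard Mathews's closing question (a way around the ftell() size limit that stays compatible with the old calls) is, with hindsight, roughly what ANSI C (1989) answered: ftell() and fseek() kept their long offsets, and fgetpos()/fsetpos() were added to save and restore a position in an opaque fpos_t object, which an implementation is free to make wider than a long. (POSIX later took a different route, fseeko() and ftello() with an off_t offset.) The sketch below shows only the fgetpos()/fsetpos() pattern; the helper reread_second_line() and its temporary-file setup are hypothetical, chosen just to make the example self-contained.

```c
#include <stdio.h>

/* Hypothetical demo: write two lines to a temporary file, remember where
   the second line starts with fgetpos(), read past it, then jump back
   with fsetpos() and read it again. fpos_t is opaque, so it can be wider
   than a long without breaking code that still uses ftell()/fseek().
   On success buf holds the second line and 1 is returned; else 0. */
static int reread_second_line(char *buf, int size)
{
    FILE *fp = tmpfile();
    fpos_t mark;
    int ok = 0;

    if (fp == NULL)
        return 0;

    fputs("first line\nsecond line\n", fp);
    rewind(fp);   /* a positioning call is required between write and read */

    if (fgets(buf, size, fp) != NULL       /* consume the first line      */
        && fgetpos(fp, &mark) == 0         /* remember start of line two  */
        && fgets(buf, size, fp) != NULL    /* read line two once          */
        && fsetpos(fp, &mark) == 0         /* seek back to the saved mark */
        && fgets(buf, size, fp) != NULL)   /* and read line two again     */
        ok = 1;

    fclose(fp);
    return ok;
}
```

The design point is that callers never look inside an fpos_t; they only hand back what fgetpos() gave them, which is what lets the representation grow without breaking existing source.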