mash@mips.UUCP (John Mashey) (12/21/87)
In article <261@ivory.SanDiego.NCR.COM> jan@ivory.UUCP (Jan Stubbs) writes:
.....
>Personally, I can't imagine any convenience a null terminated string would have
>over a string preceded by its length. The 'C' convention forces any string
>operation to examine the whole string, where alternative schemes would not.
>Even a machine with byte addressability only has to test each byte for zero
>as it goes. A string preceded by its length could be easily added or
>subtracted from another string as well, an operation present in some
>dialects of Pascal.

Dmr hasn't commented yet, so he may not. As I recall, either from old
memos or discussions, here is some of the reasoning:

1) C doesn't have a string data type at all, on purpose. If necessary,
one can always do a macro package, or preprocessor, to implement the type
on top of the existing facilities (done many times, with different choices
of implementation.)

2) As recounted by Kernighan and Plauger in their tales of converting
Software Tools to Pascal, the fixed-length strings implicitly required by
the need to use many different Pascal implementations were fairly painful.

3) If you build in a string data-type, you usually end up with:
	length string   OR
	current-length max-length string   OR
	length pointer-to-string   OR
	current-length max-length pointer-to-string
and you've definitely made this decision for the language user, as the
decision percolates around thru calling conventions, storage allocation, etc.

4) Note that the choice of 1 of the 4 above versus C's choice can
interact very strongly with architectural features. You can't win, but
some of these mesh horridly with some computer architectures' string
features, and hence are hard to use anyway in a portable way.

5) C originally had the philosophy that execution time would more-or-less
reflect the code-size, specifically, that simple-looking statements
wouldn't surprise you by swelling gigantically. [This has somewhat been
violated, lately, by structure-assignments].

6) C's early emphasis on general-purpose systems programming argued for
representations that didn't clash with realities of things like I/O
devices. For example, when you read a "string" from a TTY it doesn't
naturally arrive with a length in front of it.

7) I've written code in languages (like PL/I) that had strings, and
while it helped some of the time, in other cases, the C code, especially
for parsing strings, doing I/O with them, or splitting them into substrings,
or passing pointers to suffixes, was actually more natural, and often far
more efficient. Consider the act of scanning a string, removing the prefix,
and passing a pointer to the suffix. With a built-in string type, this may
well cause materialization of a copy of the entire suffix, just in order to
create another string descriptor for it [especially if what you use is
{length, string}].

8) This probably belongs in comp.lang.c, although there are architectural
implications: for example, those who ported UNIX onto word-addressed
machines with special string instructions have had a bunch of fun with this.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
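
[Illustration of points 3 and 7; not part of the original post.] Here is a
rough C sketch of the four layouts listed in point 3, and of the
"remove the prefix, pass the suffix" case from point 7. Every name below
(the str_* structs, struct lstr, lstr_from, skip_prefix_c,
skip_prefix_lstr) is hypothetical, the 16-byte inline capacity is
arbitrary, and the flexible array member assumes a C99 compiler, which
postdates the post. Treat it as one possible rendering, not as how any
particular language implements strings.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical renderings of the four layouts named in point 3.
   The inline capacity of 16 is an arbitrary choice for illustration. */
struct str_inline     { size_t len;      char data[16]; }; /* length string */
struct str_inline_max { size_t len, max; char data[16]; }; /* current-length max-length string */
struct str_desc       { size_t len;      char *data; };    /* length pointer-to-string */
struct str_desc_max   { size_t len, max; char *data; };    /* current-length max-length pointer-to-string */

/* C convention: the suffix is just a pointer into the same bytes.
   Nothing is allocated and nothing is copied. */
const char *skip_prefix_c(const char *s, char delim)
{
    while (*s != '\0' && *s != delim)
        s++;
    return (*s == delim) ? s + 1 : s;
}

/* {length, string} convention: the length sits directly in front of the
   bytes, so a suffix needs its own header and therefore its own copy. */
struct lstr { size_t len; char data[]; };

/* Build a length-prefixed string from n bytes; returns NULL on failure. */
struct lstr *lstr_from(const char *bytes, size_t n)
{
    struct lstr *p = malloc(sizeof *p + n);
    if (p == NULL)
        return NULL;
    p->len = n;
    memcpy(p->data, bytes, n);
    return p;
}

/* Return a newly allocated copy of everything after the first delim
   (an empty string if delim is absent). The caller frees the result. */
struct lstr *skip_prefix_lstr(const struct lstr *s, char delim)
{
    size_t i = 0;
    while (i < s->len && s->data[i] != delim)
        i++;
    if (i < s->len)
        i++;                      /* step over the delimiter itself */
    return lstr_from(s->data + i, s->len - i);
}
```

Given "usr/lib/cc" and '/', skip_prefix_c returns a pointer four bytes into
the original array, while skip_prefix_lstr has to allocate and fill a new
6-byte string. The pointer-to-string layouts could share storage for the
suffix instead of copying, which fits the remark that the inline
{length, string} form is the worst case here.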