henry@utzoo.uucp (Henry Spencer) (05/11/89)
In article <10235@socslgw.csl.sony.JUNET> diamond@csl.sony.junet (Norman Diamond) writes:
>Yeah, I think so. Strings, for example. Cobol, PL/I, Algol,
>Fortran-77, Snobol, etc., have string types and say what kind of
>operations can be done on strings. C says that a string is terminated
>with a '\0' byte. Instead of assigning a null string to a target,
>C programmers assign a '\0' byte, so the implementation of C library
>routines can never be speeded up. For other languages, improvements
>are often made to implementations.

Improvements to C library routines are quite possible. Like all such,
cleverness is sometimes required. One convention is not intrinsically
worse than the other.
-- 
Mars in 1980s: USSR, 2 tries,   |     Henry Spencer at U of Toronto Zoology
2 failures; USA, 0 tries.       | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
diamond@diamond.csl.sony.junet (Norman Diamond) (05/12/89)
I wrote:
>>C says that a string is terminated
>>with a '\0' byte. Instead of assigning a null string to a target,
>>C programmers assign a '\0' byte, so the implementation of C library
>>routines can never be speeded up. For other languages, improvements
>>are often made to implementations.

In article <1989May11.155935.22324@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>Improvements to C library routines are quite possible. Like all such,
>cleverness is sometimes required. One convention is not intrinsically
>worse than the other.

How do you improve a C library routine to look in a string descriptor
to just grab the current length of the string? In other languages,
libraries can do that. It kind of seems to me that if a C library does
that, I can watch it break my legal C program. On the other hand,
a correct strlen() function has to scan every byte of (for example)
my 300K array. Or my 509-byte array, maybe 510-byte array, but several
thousand times. It seems intrinsically worse to me.
-- 
Norman Diamond, Sony Computer Science Lab (diamond%csl.sony.co.jp@relay.cs.net)
  The above opinions are my own.   |  Why are programmers criticized for
  If they're also your opinions,   |  re-implementing the wheel, when car
  you're infringing my copyright.  |  manufacturers are praised for it?
turner@sdti.SDTI.COM (Prescott K. Turner) (05/12/89)
In article <1989May11.155935.22324@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>In article <10235@socslgw.csl.sony.JUNET> diamond@csl.sony.junet (Norman Diamond) writes:
>>Strings, for example.
>> ...
>>C says that a string is terminated with a '\0' byte.
>> ... so the implementation of C library routines can never be speeded up.
>>For other languages, improvements are often made to implementations.
>
>Improvements to C library routines are quite possible. Like all such,
>cleverness is sometimes required. One convention is not intrinsically
>worse than the other.

Diamond is right. C is worse because it specifies not just the operations on
string types and their meaning, but the representation of strings. As Paul
Abrahams put it in SIGPLAN Notices 23:10, "Some Sad Remarks About String
Handling in C", "C strings are not first class objects." He gives details of
how this prevents the clever from succeeding.
-- 
Prescott K. Turner, Jr.
Software Development Technologies, Inc.
P.O. Box 366, Sudbury, MA 01776 USA        (508) 443-5779
UUCP: ...{harvard,mit-eddie}!sdti!turner   Internet: turner@sdti.sdti.com
bph@buengc.BU.EDU (Blair P. Houghton) (05/14/89)
In article <10245@socslgw.csl.sony.JUNET> diamond@diamond.csl.sony.junet (Norman Diamond) writes:
>In article <1989May11.155935.22324@utzoo.uucp> henry@utzoo.uucp (Henry
>Spencer) writes:
>>Improvements to C library routines are quite possible.
>>One convention is not intrinsically worse than the other.
>
>How do you improve a C library routine to look in a string descriptor
>to just grab the current length of the string? In other languages,
>libraries can do that. [...] On the other hand,
>a correct strlen() function has to scan every byte of (for example)
>my 300K array. Or my 509-byte array, maybe 510-byte array, but several
>thousand times. It seems intrinsically worse to me.

Slower, yes. Worse, no.

If it's that way, then use the string functions so they only have to do
it once, and store the knowledge as structs which include the string and
a length counter, and then use your own library to implement string
functions that know about your string-data structure.

However, a strlen() that doesn't count the chars can be fooled, and any
data structure you devise can be inadequate by someone else's standards,
while not retaining the elegant completeness of just storing the data
as is.

				--Blair
				  "Or am I overlooking something
				   obvious...again?"
henry@utzoo.uucp (Henry Spencer) (05/14/89)
In article <10245@socslgw.csl.sony.JUNET> diamond@csl.sony.junet (Norman Diamond) writes:
>>Improvements to C library routines are quite possible. Like all such,
>>cleverness is sometimes required. One convention is not intrinsically
>>worse than the other.
>
>How do you improve a C library routine to look in a string descriptor
>to just grab the current length of the string?

You can't, any more than you can improve the equivalent in some other
languages to get the length of a trailing substring without having to
go back to the beginning and then subtract. The data structures do
constrain your ability to improve the functions. That doesn't mean you
can't make improvements. (If you're going to tell me that other languages
can change the underlying implementation, note that they *have* to use
a length-count implementation if the language semantics require that '\0'
be a valid string character, unless still worse convolutions are used.)

>... On the other hand,
>a correct strlen() function has to scan every byte of (for example)
>my 300K array...

Nonsense, it only has to scan the words in that array that comprise the
actual text of your string... which normally is measured in bytes, not
hundreds of Kbytes. It doesn't have to do it a byte at a time, by the
way, even on machines with no special string-scan facilities -- you just
have to be clever.

>Or my 509-byte array, maybe 510-byte array, but several thousand times...

If you are applying strlen to the same string thousands of times, your
code is badly written, period. I recommend re-reading that gem of a
paper, "News Need Not Be Slow", co-written by yours truly, in the Winter 87
Usenix proceedings, for sage words of advice on avoiding inefficiency. :-)
Nobody ever said that strlen was *always* the right way to get string
lengths.

> It seems intrinsically worse to me.

That depends on what you are doing. In certain ways it is, given that a
length-count implementation has more information immediately available.
In other ways it isn't, because that semi-redundant information has to
be updated whenever the string is modified, and that has a non-zero cost.
-- 
Mars in 1980s: USSR, 2 tries,   |     Henry Spencer at U of Toronto Zoology
2 failures; USA, 0 tries.       | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
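
[Illustration, not part of Spencer's post.] The remark about not scanning "a byte at a time" can be sketched roughly as below. This is only one way to do it, under assumptions: the names has_zero_byte and bounded_len are made up for the example, a 4-byte word is assumed, and the buffer's capacity is passed in so that no word is ever read past the end of the array. It is not claimed to be how any particular C library implements strlen().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if any of the four bytes in w is zero.
 * (Well-known bit trick; hypothetical helper name.) */
static int has_zero_byte(uint32_t w)
{
    return ((w - 0x01010101u) & ~w & 0x80808080u) != 0;
}

/* Length of the string stored in buf, where buf is known to hold
 * cap bytes.  Returns cap if no '\0' is found within the buffer.
 * Whole words are examined first; the last few bytes are then
 * checked one at a time. */
size_t bounded_len(const char *buf, size_t cap)
{
    size_t i = 0;

    while (i + 4 <= cap) {
        uint32_t w;
        memcpy(&w, buf + i, 4);   /* load 4 bytes without alignment worries */
        if (has_zero_byte(w))
            break;                /* terminator is somewhere in this word */
        i += 4;
    }
    while (i < cap && buf[i] != '\0')
        i++;
    return i;
}
```

For a 16-byte array holding "hello world", the word loop stops at offset 8 and the byte loop finishes at 11. The scan still costs time proportional to the text actually stored, which is the point being made above.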
gwyn@smoke.BRL.MIL (Doug Gwyn) (05/14/89)
In article <10245@socslgw.csl.sony.JUNET> diamond@csl.sony.junet (Norman Diamond) writes:
>It [null-terminated strings] seems intrinsically worse to me.

I thought it was well known that each method (associated count vs.
terminator) has advantages in some contexts and disadvantages in
others. Many interesting operations on strings are faster with null
terminator values than when an associated count must be tested. The
main drawback to terminated strings is that the terminator value
cannot be contained within the string.

If you want counted strings, C makes it relatively easy to provide
them for yourself.
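
[Illustration, not part of Gwyn's post.] One plausible reading of "faster with null terminator values" is a loop in which the byte just moved doubles as the loop test, so no separate counter has to be kept or compared. The sketch below assumes that reading; copy_z is a made-up name, and whether it actually beats a counted copy depends on the machine and compiler.

```c
/* Copy the terminated string src into dst (which must be large enough).
 * The only test per iteration is on the byte that was just stored.
 * Returns a pointer to the '\0' written at the end of dst. */
char *copy_z(char *dst, const char *src)
{
    while ((*dst = *src++) != '\0')
        dst++;
    return dst;
}
```

The returned pointer gives the caller the end of the result for free, so a following append does not need to rescan from the start.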
chris@mimsy.UUCP (Chris Torek) (05/14/89)
In article <456@sdti.SDTI.COM> turner@sdti.SDTI.COM (Prescott K. Turner) writes:
>Diamond is right. C is worse because it specifies not just the operations on
>string types and their meaning, but the representation of strings. As Paul
>Abrahams put it in SIGPLAN Notices 23:10, "Some Sad Remarks About String
>Handling in C", "C strings are not first class objects." He gives details of
>how this prevents the clever from succeeding.

It is true that strings---or rather, string constants; C does not have
strings as a basic data type: they are merely a convention, which some
programs (Emacs, e.g.) avoid---are second class objects. This is
because a double-quoted string constant creates an unnamed array, and
C's arrays are second-class. But this only prevents cleverness in a
weak sense. If you prefer counted strings, you can create them:

	struct cstr { int len; char *data; };
	#define CSTR(s) { sizeof(s) - 1, s }

	struct cstr hello = CSTR("hello world");

It is true that the compiler and run-time system cannot arbitrarily
choose some alternative representation for C's strings, but neither
can they choose other representations for any other form of array.
The language is at least consistent.

Incidentally, the average string (in the mythical average C program)
is shorter than the average Dhrystone string.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris
mat@mole-end.UUCP (Mark A Terribile) (05/14/89)
> >>C says that a string is terminated with a '\0' byte. ... C programmers
> >>assign a '\0' byte, so the implementation of C library routines can never
> >>be speeded up. For other languages, [implementations are often improved].

> >Improvements to C library routines are quite possible. Like all such,
> >cleverness is sometimes required. ...

> How do you improve a C library routine to look in a string descriptor to
> just grab the current length of the string? In other languages, libraries
> can do that. ... Or the compilers can emit code ...

> ... It kind of seems to me that if a C library does that, [ it will ] break
> my legal C program. On the other hand, [ strlen() ] has to scan every byte
> of ... my 300K ... or my 509-byte array ... several thousand times. It seems
> intrinsically worse to me.

Well, in C you are stuck. At the risk of being told to go to my own group,
this is the point where you should switch to C++ and define a string type
that uses whatever you have available in your particular environment. Just
derive it (compatibly) from an existing string type so that if you have to
run in a pure-C++ environment, you have a fallback.

Oh, and where a compiler would emit code, you can make C++ use inlines.

Of course, I could ask you to show me a machine on which the FORTRAN compiler
has access to the internal implementation of COBOL, or on which COBOL can be
made to use the FORTRAN complex arithmetic algorithms. We could go on.
-- 
(This man's opinions are his own.)
From mole-end				Mark Terribile
bph@buengc.BU.EDU (Blair P. Houghton) (05/14/89)
In article <456@sdti.SDTI.COM> turner@sdti.UUCP (0006-Prescott K. Turner, Jr.) writes:
>In article <1989May11.155935.22324@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>>In article <10235@socslgw.csl.sony.JUNET> diamond@csl.sony.junet (Norman Diamond) writes:
[...it's all in the ref's. I'm after this guy:...]
>
>Diamond is right. C is worse because it specifies not just the operations on
>string types and their meaning, but the representation of strings. As Paul
>Abrahams put it in SIGPLAN Notices 23:10, "Some Sad Remarks About String
>Handling in C", "C strings are not first class objects." He gives details of
>how this prevents the clever from succeeding.

Those ACM SIG* types are usually pretty clever, but if one can't come up
with the obvious (see below), then this one wonders where that one learned
what data were, and whether the full meaning of 'clever' doesn't apply...

	typedef struct {
		char *characters;
		int length;
	} string;

	#define assign_string(s,a)	s.characters=a;\
					s.length=strlen(s.characters);

	main()
	{
		string lunchbox;

		assign_string(lunchbox,"Batman");

		/* Now all you need do is refer to lunchbox.length
		 * when you need the length of the string
		 * stored in lunchbox.characters...
		 */
		printf("%d\n",lunchbox.length);

		/*
		 * If you think this is less efficient
		 * computationally than the stuff your
		 * 'intelligent' languages do with string
		 * data, then you're sadly mistaken...
		 *
		 * If using lunchbox.length doesn't appeal to you,
		 * try:
		 */
		printf("%d\n",stringlength(lunchbox));

		/*
		 * where you've done
		 *
		 *	#define stringlength(x) x.length
		 *
		 * somewhere above the call to stringlength.
		 */
	}

				--Blair
				  "Too damn easy. I have _got_ to be
				   missing some undercurrent in this
				   stream of cruft..."
peter@ficc.uu.net (Peter da Silva) (05/15/89)
C strings have a major disadvantage that's got nothing to do with
performance. You can't just stick arbitrary binary data in a string and
expect it to work. If there is a null anywhere in that data it's going to
cut you. Strings in variant-record form, with a length and data, or dope
vectors, with a length and a pointer, are just plain more versatile than C
strings.

Luckily, however, 'C' is not tied to its runtime library. It's possible to
not only use a completely different kind of string in the language, but to
mix the two.

It's a pity X3J11 didn't see fit to standardise a 'length' escape like the
common "\p" (for 'pascal') on the Mac. Maybe "\l".
-- 
Peter da Silva, Xenix Support, Ferranti International Controls Corporation.
Business: uunet.uu.net!ficc!peter, peter@ficc.uu.net, +1 713 274 5180.
Personal: ...!texbell!sugar!peter, peter@sugar.hackercorp.com.
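
[Illustration, not part of da Silva's post.] The embedded-null problem and the "length and a pointer" alternative can be sketched as follows. The names struct blob and blob_equal are invented for this example; the sketch only assumes the standard memcmp() and strlen().

```c
#include <stddef.h>
#include <string.h>

/* A "dope vector" style descriptor: a length and a pointer. */
struct blob {
    size_t len;
    const unsigned char *data;
};

/* Compare two blobs byte for byte, including any embedded '\0'. */
int blob_equal(struct blob a, struct blob b)
{
    return a.len == b.len && memcmp(a.data, b.data, a.len) == 0;
}
```

Given the five bytes 'a', 'b', 0, 'c', 'd', strlen() reports 2 because it stops at the first zero byte, while a struct blob built with the array's size keeps all 5. The two representations can be mixed freely in one program, as the post says.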
peter@ficc.uu.net (Peter da Silva) (05/15/89)
In article <1989May13.211218.24251@utzoo.uucp>, henry@utzoo.uucp (Henry Spencer) writes:
> You can't, any more than you can improve the equivalent in some other
> languages to get the length of a trailing substring without having to
> go back to the beginning and then subtract.

Not if the string is stored as a length-and-start-address.
-- 
Peter da Silva, Xenix Support, Ferranti International Controls Corporation.
Business: uunet.uu.net!ficc!peter, peter@ficc.uu.net, +1 713 274 5180.
Personal: ...!texbell!sugar!peter, peter@sugar.hackercorp.com.
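
[Illustration, not part of da Silva's post.] With a length-and-start-address representation, a trailing substring is just a new descriptor, and its length comes from one subtraction on the descriptor rather than from rescanning the text. A minimal sketch, with struct slice and slice_tail as made-up names:

```c
#include <stddef.h>

/* Length-and-start-address string descriptor. */
struct slice {
    const char *start;
    size_t len;
};

/* Trailing substring of s beginning at offset off.
 * off is clamped to s.len, so the result is never out of range. */
struct slice slice_tail(struct slice s, size_t off)
{
    struct slice t;

    if (off > s.len)
        off = s.len;
    t.start = s.start + off;
    t.len = s.len - off;
    return t;
}
```

Taking the tail of an 11-character "hello world" at offset 6 yields a descriptor of length 5 starting at 'w', without touching the characters at all.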
diamond@csl.sony.co.jp (Norman Diamond) (05/15/89)
I wrote:
>>It [null-terminated strings] seems intrinsically worse to me.

Doug Gwyn replied:
>I thought it was well known that each method (associated count vs.
>terminator) has advantages in some contexts and disadvantages in
>others. Many interesting operations on strings are faster with null
>terminator values than when an associated count must be tested. The
>main drawback to terminated strings is that the terminator value
>cannot be contained within the string.

Fine. So in a language which does not specify how strings are
implemented, an implementation could be improved by using BOTH a count
and a null terminator. This still is not possible in C.

>If you want counted strings, C makes it relatively easy to provide
>them for yourself.

Yes. You throw away the C library (which I understand is part of the
proposed ANSI standard) and the language's definition of how strings
are represented, you define your own representation of strings, and
you implement your own library. This is perfectly fine. A strictly
conforming program is not required to use every feature or every
mis-feature of the standard; a program is allowed to be more strict.

Good luck porting other people's strictly conforming programs though.
They might use C strings.

Good luck persuading someone else to port your programs.
-- 
Norman Diamond, Sony Computer Science Lab (diamond%csl.sony.co.jp@relay.cs.net)
  The above opinions are my own.   |  Why are programmers criticized for
  If they're also your opinions,   |  re-implementing the wheel, when car
  you're infringing my copyright.  |  manufacturers are praised for it?
gwyn@smoke.BRL.MIL (Doug Gwyn) (05/16/89)
In article <10250@socslgw.csl.sony.JUNET> diamond@csl.sony.co.jp.csl.sony.co.jp (Norman Diamond) writes:
>Doug Gwyn replied:
>>If you want counted strings, C makes it relatively easy to provide
>>them for yourself.
>Good luck porting other people's strictly conforming programs though.
>They might use C strings.
>Good luck persuading someone else to port your programs.

I don't understand your comment. Of course the C compiler and library
continue to support null-terminated strings. Defining your own
counted-string data type and functions doesn't affect that at all.
Furthermore there is no reason your counted-string implementation
should be other than perfectly portable.
diamond@diamond.csl.sony.junet (Norman Diamond) (05/16/89)
In article <190@mole-end.UUCP> mat@mole-end.UUCP (Mark A Terribile) writes:
>Well, in C you are stuck. At the risk of being told to go to my own group,
>this is the point where you should switch to C++ and define a string type
>that uses whatever you have available in your particular environment.

Of course. In fact, this is why string handling seems to be a popular
topic in C++ library writing. We agree completely.

>Of course, I could ask you to show me a machine on which the FORTRAN compiler
>has access to the internal implementation of COBOL, or on which COBOL can be
>made to use the FORTRAN complex arithmetic algorithms. We could go on.

I'm not sure why you ask this question. The answer is VMS. DEC
required all of their language developers to conform to implementations
specified by the operating system. This is exactly where they ran into
problems with C. This conversation has now revolved back to its point
of beginning....
-- 
Norman Diamond, Sony Computer Science Lab (diamond%csl.sony.co.jp@relay.cs.net)
  The above opinions are my own.   |  Why are programmers criticized for
  If they're also your opinions,   |  re-implementing the wheel, when car
  you're infringing my copyright.  |  manufacturers are praised for it?
bengsig@oracle.nl (Bjorn Engsig) (05/16/89)
In some article, Doug Gwyn wrote that \0 terminated strings and strings
with associated length both have advantages and disadvantages. He also wrote
>>If you want counted strings, C makes it relatively easy to provide
>>them for yourself.

In article <10250@socslgw.csl.sony.JUNET> diamond@csl.sony.co.jp.csl.sony.co.jp (Norman Diamond) writes:
>
>Yes. You throw away the C library (which I understand is part of the
>proposed ANSI standard) and the language's definition of how strings
>are represented, you define your own representation of strings, and
>you implement your own library. [deletions]
>

Now come on, why should you not write your own handling of something (e.g.
strings) if this can speed up your program? You would still do it in ANSI C;
if you provide an interface to the outside world, you would either tell
others about your interface or convert to the 'normal' representation.
This is no big deal.

>Good luck porting other people's strictly conforming programs though.
>They might use C strings.

So what? This is what your ANSI C compiler knows about.

>
>Good luck persuading someone else to port your programs.

Well, our software is ported to very many Unix and non-Unix platforms, and
we do a lot of speed improvements using our own internal representation of
various types of variables. In the very rare (measured in CPU cycles)
cases where we interface to the outside world, we convert between internal
and external representation.
-- 
Bjorn Engsig, ORACLE Europe         \ /    "Hofstadter's Law: It always takes
Path:   mcvax!orcenl!bengsig         X      longer than you expect, even if you
Domain: bengsig@oracle.nl           / \     take into account Hofstadter's Law"
les@chinet.chi.il.us (Leslie Mikesell) (05/17/89)
In article <10250@socslgw.csl.sony.JUNET> diamond@csl.sony.co.jp.csl.sony.co.jp (Norman Diamond) writes:
>>If you want counted strings, C makes it relatively easy to provide
>>them for yourself.

>Yes. You throw away the C library (which I understand is part of the
>proposed ANSI standard) and the language's definition of how strings
>are represented, you define your own representation of strings, and
>you implement your own library. This is perfectly fine. A strictly
>conforming program is not required to use every feature or every
>mis-feature of the standard; a program is allowed to be more strict.

Why throw anything away? You can store the lengths elsewhere and still
use the null-terminated representation that the library routines want
to see.

	struct eg {
		unsigned int str_len;
		char *the_str;
	};

This way you only need to store the length when you think it will be
useful later. The only problem occurs if you need to store a '\0' value
as part of the character array, and then it is only a problem if you want
to use the library string routines on that array.

>Good luck porting other people's strictly conforming programs though.
>They might use C strings.

As well they should... The only real problem I see with the C library
routines is that they generally don't return a length or pointer to the
last character even though it would be trivial to do so. Thus if you want
the length you have to make another function call (at least strlen doesn't
return the same pointer you gave it).

Les Mikesell
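
[Illustration, not part of Mikesell's post.] Keeping a count alongside a string that is still null-terminated might look like the sketch below. The names struct cz and cz_append are hypothetical, and the cap field is an addition of this example (the struct in the post holds only the length and the pointer). The append routine returns the new length, which is the sort of result the post wishes the library routines handed back.

```c
#include <stddef.h>
#include <string.h>

struct cz {
    size_t len;   /* cached length, not counting the '\0' */
    size_t cap;   /* bytes available in buf, including room for '\0' */
    char *buf;    /* always null-terminated, so library routines work */
};

/* Append the C string src to s.
 * Returns the new length, or (size_t)-1 if src does not fit,
 * in which case s is left unchanged. */
size_t cz_append(struct cz *s, const char *src)
{
    size_t n = strlen(src);

    if (n >= s->cap - s->len)             /* need n bytes plus the '\0' */
        return (size_t)-1;
    memcpy(s->buf + s->len, src, n + 1);  /* copies the terminator too */
    s->len += n;
    return s->len;
}
```

Because buf stays terminated, s->buf can still be passed to strcmp(), printf("%s") and the rest; the cached len simply saves the rescan. The cost Spencer mentions is visible here too: every modification has to keep len up to date.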
wht@tridom.uucp (Warren Tucker) (05/17/89)
This might be a little long, but maybe it'll help reduce some further dialog (or provoke more :-|). I have needed nice formal string-descriptor based strings _RARELY_ and this is a sort of tool box I pick stuff out of. Hope it helps. #!/bin/sh # shar: Shell Archiver (v1.22) # # Run the following text with /bin/sh to create: # esd2.h # esd2util.c # sed 's/^X//' << 'SHAR_EOF' > esd2.h && X/* CHK=0x1793 */ X/*+----------------------------------------------------------------------- X esd2.h -- support header for users of esd2util.c X ...!gatech!emory!tridom!wht X------------------------------------------------------------------------*/ X/*+:EDITS:*/ X/*:10-31-1988-16:37-wht-esd2 adapted from esd.h/esdutil.c */ X Xtypedef struct esd X{ X char *pb; /* full pointer to esd strings */ X short cb; /* count of bytes */ X short maxcb; /* maximum bytes allowed */ X short index; /* next character of significance */ X short old_index; /* last token (backup or error reporting) */ X} ESD; X Xtypedef struct keyword_table_type /* table terminated with null key_word */ X{ X char *key_word; /* key word */ X int key_token; /* token returned on match */ X} KEYTAB; X X/* vi: set tabstop=4 shiftwidth=4: */ SHAR_EOF chmod 0644 esd2.h || echo "restore of esd2.h fails" sed 's/^X//' << 'SHAR_EOF' > esd2util.c && X/* CHK=0x2302 */ X/*+---------------------------------------------------------------- X esd2util.c X ...!gatech!emory!tridom!wht X X Defined functions: X append_zstr_to_esd(tesd,zstr) X esdstrindex(esd1,esd2,index1_flag,index2_flag) X fgetesd(tesd,fileptr) X fputesd(tesd,fileptr,index_flag,nl_flag) X free_esd(tesd) X get_alpha_zstr(param,strbuf,strbuf_maxcb) X get_alphanum_zstr(param,strbuf,strbuf_maxcb) X get_numeric_value(param,value) X get_numeric_zstr(param,strbuf,strbuf_maxcb) X get_word_zstr(param,strbuf,strbuf_maxcb) X init_esd(tesd,cptr,maxcb) X keyword_lookup(ktable,param) X make_esd(maxcb) X null_terminate_esd(tesd) X skip_cmd_break(tesd) X skip_cmd_char(param,skipchar) X 
skip_comma(param) X skip_ld_break(zstr) X skip_paren(param,fLeft) X strindex(str1,str2) X strip_trailing_spaces_esd(ztext) X X-----------------------------------------------------------------*/ X/*+:EDITS:*/ X/*:10-31-1988-16:37-wht-esd2 adapted from esd.h/esdutil.c */ X/*:04-18-1988-18:19-wht-more routines */ X/*:01-28-1987-12:30-wht-add get_word_zstr */ X/*:01-28-1987-12:00-wht-include MSC 4.0 / MSDOS compatibility */ X/*:01-16-1986-01:00-WHT-Creation of edits (version beta 1.01) */ X X#include <stdio.h> X#include <ctype.h> X#include "esd2.h" X X#if XENIX | MSDOS X#include <string.h> X#else Xextern char *index(); X#endif X X/*+------------------------------------------------------------------------- X void null_terminate_esd(&esd) X puts null at 'cb' position of string (standard esd always X has one more byte in buffer than maxcb says) X--------------------------------------------------------------------------*/ Xvoid Xnull_terminate_esd(tesd) Xregister ESD *tesd; X{ X tesd->pb[tesd->cb] = 0; X} /* end of null_terminate_esd */ X X/*+----------------------------------------------------------------------- X void init_esd(tesd,cptr,maxcb) init an esd X------------------------------------------------------------------------*/ Xvoid init_esd(tesd,cptr,maxcb) Xregister ESD *tesd; Xchar *cptr; Xregister int maxcb; X{ X tesd->pb = cptr; /* pointer to string */ X tesd->maxcb = maxcb; /* max characters in buffer */ X tesd->cb = 0; /* current count == 0 */ X tesd->index = 0; /* parse index to first position */ X tesd->old_index = 0; /* parse index to first position */ X *tesd->pb = 0; /* start with null terminated string */ X X} /* end of init_esd */ X X/*+----------------------------------------------------------------------- X esdptr = make_esd(maxcb) allocate an esd and buffer X------------------------------------------------------------------------*/ XESD * Xmake_esd(maxcb) Xregister int maxcb; /* desired maxcb */ X{ X register ESD *tesd; X register int actual_cb; X X 
/* we get an extra character to ensure room for null past maxcb */ X actual_cb = maxcb + sizeof(ESD) + 1; X if(actual_cb & 1) /* even allocation */ X ++actual_cb; X if((tesd = (ESD *)malloc( (unsigned)actual_cb )) == NULL) X return((ESD *)0); /* return NULL if failure */ X X init_esd(tesd,(char *)(tesd + 1),maxcb); X return(tesd); X X} /* end of make_esd */ X X/*+----------------------------------------------------------------------- X free_esd(esdptr) X------------------------------------------------------------------------*/ Xvoid free_esd(tesd) Xregister ESD *tesd; X{ X tesd->maxcb = 0; X tesd->cb = 0; X free((char *)tesd); X} X X/*+---------------------------------------------------------------- X strindex: string index function X X Returns position of 'str2' in 'str1' if found X If 'str2' is null, then 0 is returned (null matches anything) X Returns -1 if not found X-----------------------------------------------------------------*/ Xint Xstrindex(str1,str2) Xchar *str1; /* the (target) string to search */ Xchar *str2; /* the (comparand) string to search for */ X{ X register int istr1 = 0; X register int lstr2 = strlen(str2); X register char *mstr = str1; /* the (target) string to search */ X X if(*str2 == 0) /* null string matches anything */ X return(0); X X while(*mstr) X { X if(*mstr == *str2) X { /* we have a first char match... does rest of string match? */ X if(!strncmp(mstr,str2,lstr2)) X return(istr1); /* if so, return match position */ X } X mstr++; X istr1++; X } X X return(-1); /* if we exhaust target string, flunk */ X X} /* end of strindex */ X X/*+------------------------------------------------------------------------- X esdstrindex(esd1,esd2,index1_flag,index2_flag) X X Call strindex with esd1->pb and esd2->pb. 
X If index1_flag != 0, esd1->pb + esd1->index passed X If index2_flag != 0, esd2->pb + esd2->index passed X--------------------------------------------------------------------------*/ Xesdstrindex(esd1,esd2,index1_flag,index2_flag) Xregister ESD *esd1; Xregister ESD *esd2; Xregister int index1_flag; Xregister int index2_flag; X{ X return(strindex((index1_flag) ? esd1->pb : esd1->pb + esd1->index, X (index2_flag) ? esd2->pb : esd2->pb + esd2->index)); X X} /* end of esdstrindex */ X X/*+---------------------------------------------------------------- X keyword_lookup(ktable,param) X X Lookup string in keyword_table struct array X Returns table->key_token if 'param' found in X 'table', else -1 X X Beware substrings. "type","typedef" will both match "type" X Ordering of table can help this. X-----------------------------------------------------------------*/ Xkeyword_lookup(ktable,param) Xregister KEYTAB *ktable; Xregister char *param; X{ X register int plen = strlen(param); X X while(ktable->key_word) X { X if(!strncmp(ktable->key_word,param,plen)) X return(ktable->key_token); X ++ktable; X } /* end of while */ X X return(-1); /* search failed */ X X} /* end of keyword_lookup */ X X/*+---------------------------------------------------------------- X skip_cmd_break(tesd) X X Finds next non-break or end of command line text X 'tesd' is an esd with valid 'index' field X Returns 0 index field points to non-break character X -1 end of line found X-----------------------------------------------------------------*/ Xint Xskip_cmd_break(tesd) Xregister ESD *tesd; X{ X register int cb = tesd->cb; X register int index = tesd->index; X register char *pb = tesd->pb + index; X X while(index < cb) X { X if(*pb++ != 0x20) X break; X index++; X } X tesd->old_index = tesd->index = index; X if(index >= cb) X return(-1); X else X return(0); X X} /* end of skip_cmd_break */ X X/*+------------------------------------------------------------------------- X erc = 
skip_cmd_char(param,skipchar) X--------------------------------------------------------------------------*/ Xint Xskip_cmd_char(param,skipchar) Xregister ESD *param; Xregister char skipchar; X{ X register int erc; X X if(erc = skip_cmd_break(param)) X return(erc); X X if(param->pb[param->index] == skipchar) X { X ++param->index; X return(0); X } X X return(-1); X X} /* end of skip_cmd_char */ X X/*+------------------------------------------------------------------------- X erc = skip_comma(param) X--------------------------------------------------------------------------*/ Xint Xskip_comma(param) Xregister ESD *param; X{ X register int erc; X X if(erc = skip_cmd_break(param)) X return(erc); X X if(param->pb[param->index] == ',') X { X ++param->index; X return(0); X } X X return(-1); X X} /* end of skip_comma */ X X/*+------------------------------------------------------------------------- X erc = skip_paren(fparam,LEFT or RIGHT) X--------------------------------------------------------------------------*/ Xint Xskip_paren(param,fLeft) Xregister ESD *param; Xint fLeft; /* if =LEFT , skip left paren, else skip right */ X{ X register int erc; X X if(erc = skip_cmd_break(param)) X return(erc); X X if(fLeft) X { X if(param->pb[param->index++] == 0x28) /* 0x28 == open parenthesis */ X return(0); X else X { X --param->index; X return(-1); X } X } X else X { X if(param->pb[param->index++] == 0x29) /* 0x29 == close parenthesis */ X return(0); X else X { X --param->index; X return(-1); X } X } X X} /* end of skip_paren */ X X/*+---------------------------------------------------------------- X get_alpha_zstr(&esd,&strbuf,strbuf_maxcb) X converts next alphabetic string token to upper case and places it X into the null-terminated 'strbuf' string. 
returns 0 or -1 X or skip_cmd_break error codes X-----------------------------------------------------------------*/ Xint Xget_alpha_zstr(param,strbuf,strbuf_maxcb) Xregister ESD *param; Xregister char *strbuf; Xregister int strbuf_maxcb; X{ X register int izstr; X register int schar; X register char *param_ptr = param->pb; X X if(izstr = skip_cmd_break(param)) X return(izstr); X izstr = 0; X while( (izstr < strbuf_maxcb-1) && (param->index < param->cb) ) X { X schar = param_ptr[param->index]; X if( !isalpha(schar) ) X break; X param->index++; X strbuf[izstr++] = to_upper(schar); X } X X strbuf[izstr] = 0; /* terminate the string for "C" anal retentives */ X if(izstr) X return(0); X else /* decide whether to return badparam or noparam err */ X return(-1); X X} /* end of get_alpha_zstr */ X X/*+---------------------------------------------------------------- X get_alphanum_zstr(&esd,&strbuf,strbuf_maxcb) X converts next alphabetic string token to upper case and places it X into the null-terminated 'strbuf' string. 
returns 0 or -1 X or skip_cmd_break error codes X-----------------------------------------------------------------*/ Xint Xget_alphanum_zstr(param,strbuf,strbuf_maxcb) Xregister ESD *param; Xregister char *strbuf; Xregister int strbuf_maxcb; X{ X register int izstr = 0; X register int schar; X X if(izstr = skip_cmd_break(param)) X return(izstr); X X while( (izstr < strbuf_maxcb-1) && (param->index < param->cb) ) X { X schar = param->pb[param->index++]; X if( isalnum(schar) ) X strbuf[izstr++]=to_upper(schar); X else X { X --param->index; X break; X } X } X X strbuf[izstr]=0; /* terminate the string for "C" anal retentives */ X if(strlen(strbuf)) X return(0); X else /* decide whether to return badparam or noparam err */ X return(-1); X X} /* end of get_alphanum_zstr */ X X/*+---------------------------------------------------------------- X get_numeric_zstr(&esd,&strbuf,strbuf_maxcb) X gets next numeric string token places it X into the null-terminated 'strbuf' string. returns 0 or -1 X or skip_cmd_break error codes X-----------------------------------------------------------------*/ Xint Xget_numeric_zstr(param,strbuf,strbuf_maxcb) Xregister ESD *param; Xregister char *strbuf; Xregister int strbuf_maxcb; X{ X register int izstr; X register int schar; X X if(izstr = skip_cmd_break(param)) X return(izstr); X X while( (izstr < strbuf_maxcb-1) && (param->index < param->cb) ) X { X schar = param->pb[param->index++]; X if( isdigit(schar) ) X strbuf[izstr++]=schar; X else X { X --param->index; X break; X } X } X X strbuf[izstr]=0; /* terminate the string for "C" anal retentives */ X X if(strlen(strbuf)) X return(0); X else /* decide whether to return badparam or noparam err */ X { X return(skip_cmd_break(param)); X } X X} /* end of get_numeric_zstr */ X X/*+---------------------------------------------------------------- X get_word_zstr(&esd,&strbuf,strbuf_maxcb) X gets next word (continuous string of characters X without spacesor tabs ) X returns 0 or -1 or 
skip_cmd_break error codes
X-----------------------------------------------------------------*/
Xint
Xget_word_zstr(param,strbuf,strbuf_maxcb)
Xregister ESD *param;
Xregister char *strbuf;
Xregister int strbuf_maxcb;
X{
X    register int izstr;
X    register int schar;
X
X    if(izstr = skip_cmd_break(param))
X        return(izstr);
X
X    while( (izstr < strbuf_maxcb-1) && (param->index < param->cb) )
X    {
X        schar = param->pb[param->index++];
X        if( (schar > 0x20) && (schar <= 0x7e))
X            strbuf[izstr++]=schar;
X        else
X        {
X            --param->index;
X            break;
X        }
X    }
X
X    strbuf[izstr]=0; /* terminate the string for "C" anal retentives */
X
X    if(strlen(strbuf))
X        return(0);
X    else /* decide whether to return badparam or noparam err */
X    {
X        return(skip_cmd_break(param));
X    }
X
X} /* end of get_word_zstr */
X
X/*+-----------------------------------------------------------------------
X    get_numeric_value(param,&long_var)
X------------------------------------------------------------------------*/
Xget_numeric_value(param,value)
Xregister ESD *param;
Xregister long *value;
X{
X    register int erc;
X    char buf[32];
X
X    if(erc = get_numeric_zstr(param,buf,sizeof(buf)))
X        return(erc);
X    sscanf(buf,"%ld",value);
X    return(0);
X
X} /* end of get_numeric_value */
X
X/*+-------------------------------------------------------------------------
X    strip_trailing_spaces_esd(tesd)
X--------------------------------------------------------------------------*/
Xvoid
Xstrip_trailing_spaces_esd(ztext)
Xregister ESD *ztext;
X{
X    while(ztext->cb && (ztext->pb[ztext->cb-1] == 0x20))
X        ztext->cb--;
X} /* end of strip_trailing_spaces_esd */
X
X/*+-------------------------------------------------------------------------
X    fgetesd(&esd,fileptr)
X
X    stdio read from FILE *fileptr into esd
X    returns -1 on stdio error, -2 on line too long, 0 on success
X    returns tesd->cb set up not including trailing nl, tesd->index == 0
X--------------------------------------------------------------------------*/
Xint fgetesd(tesd,fileptr)
Xregister
ESD *tesd;
Xregister FILE *fileptr;
X{
X    register char *cptr;
X
X    if(fgets(tesd->pb,tesd->maxcb,fileptr) == NULL)
X        return(-1);
X#if XENIX | MSDOS
X    if((cptr = strchr(tesd->pb,0x0A)) == NULL)
X        return(-2);
X#else
X    if((cptr = index(tesd->pb,0x0A)) == NULL)
X        return(-2);
X#endif
X    tesd->cb = (int)(cptr - tesd->pb);
X    null_terminate_esd(tesd);
X    tesd->index = 0;
X    tesd->old_index = 0;
X    return(0);
X
X} /* end of fgetesd */
X
X/*+-------------------------------------------------------------------------
X    fputesd(&esd,fileptr,index_flag,nl_flag)
X
X    write esd contents to stdio FILE *fileptr
X    if index_flag is true, write from tesd->index thru end of esd
X    otherwise, from start of esd
X    if nl_flag is true, append nl to write, else just esd contents
X    returns -1 on stdio error, 0 on success
X--------------------------------------------------------------------------*/
Xint fputesd(tesd,fileptr,index_flag,nl_flag)
Xregister ESD *tesd;
Xregister FILE *fileptr;
Xint index_flag;
Xint nl_flag;
X{
X    register char *cptr;
X    register int write_length;
X
X    if(index_flag)
X    {
X        cptr = &tesd->pb[tesd->index];
X        write_length = tesd->cb - tesd->index;
X    }
X    else
X    {
X        cptr = tesd->pb;
X        write_length = tesd->cb;
X    }
X
X    if(write_length)
X        if(fwrite(cptr,write_length,1,fileptr) == 0)
X            return(-1);
X
X    if(nl_flag)
X        if(fputc(0x0A,fileptr) == EOF) /* fputc returns EOF on error */
X            return(-1);
X
X    return(0);
X} /* end of fputesd */
X
X/*+-------------------------------------------------------------------------
X    cptr = skip_ld_break(cptr)
X    Skip leading spaces and tabs
X--------------------------------------------------------------------------*/
Xchar *skip_ld_break(zstr)
Xregister char *zstr;
X{
X    while((*zstr == 0x20) || (*zstr == 0x09))
X        zstr++;
X    return(zstr);
X} /* end of skip_ld_break */
X
X/*+-----------------------------------------------------------------
X    append_zstr_to_esd
X------------------------------------------------------------------*/
Xappend_zstr_to_esd(tesd,zstr)
XESD *tesd;
Xchar *zstr;
X{
X    register
int zstrlen = strlen(zstr);
X
X    if(zstrlen > (tesd->maxcb - tesd->cb))
X        zstrlen = tesd->maxcb - tesd->cb;
X
X    if(zstrlen)
X    {
X        strncpy(tesd->pb + tesd->cb,zstr,zstrlen);
X        tesd->cb += zstrlen;
X    }
X}
X/* end of esd2util.c */
X
X/* vi: set tabstop=4 shiftwidth=4: */
SHAR_EOF
chmod 0644 esd2util.c || echo "restore of esd2util.c fails"
exit 0
-- 
-------------------------------------------------------------------
Warren Tucker, Tridom Corporation       ...!gatech!emory!tridom!wht
Sforzando (It., sfohr-tsahn'-doh).  A direction to perform the tone
or chord with special stress, or marked and sudden emphasis.
dhesi@bsu-cs.bsu.edu (Rahul Dhesi) (05/17/89)
In article <10255@socslgw.csl.sony.JUNET> diamond@csl.sony.junet (Norman Diamond) writes:
>The answer is VMS.  DEC required all of their language developers to
>conform to implementations [of strings] specified by the operating
>system.  This is exactly where they ran into problems with C.

Here are excerpts of code adapted from one of the VAX/VMS Pascal
manuals.  I have omitted some declarations and error-checking.  This
code reads a record from a channel that has been (in the original code)
assigned to a mailbox.

     var
        ... other stuff ...
        read_input : varying [30] of char;   (* where input will go *)
     begin
        ... other code ...
        sys_stat := $qiow (chan := channel, func := io$_readvblk,
                           iosb := io_statblk,
                           p1 := read_input.body,
                           p2 := size (read_input.body) );
        read_input.length := io_statblk.count;
        ... other stuff ...
     end.

Note that we could not simply pass our variable-length string variable
read_input to QIOW.  Instead, we had to separately pass the address of
the data area of the string (called read_input.body) and its maximum
size.  Then when input was complete, we had to copy the byte count from
the status block field io_statblk.count into the length field of
read_input.length.
-- 
Rahul Dhesi <dhesi@bsu-cs.bsu.edu>
UUCP:    ...!{iuvax,pur-ee}!bsu-cs!dhesi
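
[Editor's sketch] The pattern described in the post above (hand the raw routine the data area and its maximum size, then copy the returned byte count into the string's length field) can be illustrated in C. This is only a rough analogue under stated assumptions: struct varying30, struct io_status, raw_read and read_varying are hypothetical names invented here, and raw_read copies from an in-memory source rather than calling any real VMS service.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a Pascal "varying [30] of char". */
struct varying30 {
    unsigned short length;  /* current length in bytes */
    char body[30];          /* data area, not '\0'-terminated */
};

/* Hypothetical status block; only the byte count is modeled. */
struct io_status {
    unsigned short count;
};

/*
 * Hypothetical raw read: it sees only a data address (p1) and a
 * maximum size (p2), and reports the byte count through the status
 * block.  In this sketch it copies from an in-memory source.
 */
int raw_read(const char *src, size_t src_len,
             struct io_status *iosb, char *p1, size_t p2)
{
    size_t n = src_len < p2 ? src_len : p2;

    memcpy(p1, src, n);
    iosb->count = (unsigned short)n;
    return 1;               /* nonzero means success in this sketch */
}

/* Caller glue: pass body and its size, then copy the count back. */
int read_varying(struct varying30 *v, const char *src, size_t src_len)
{
    struct io_status iosb;
    int status = raw_read(src, src_len, &iosb, v->body, sizeof v->body);

    if (status) {
        v->length = iosb.count;
        assert(v->length <= sizeof v->body);
    }
    return status;
}
```

The glue function is where the two conventions meet: the raw routine never learns that body belongs to a counted string, so the caller has to keep the length field in step by hand.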
diamond@diamond.csl.sony.junet (Norman Diamond) (05/17/89)
Doug Gwyn:
>>>If you want counted strings, C makes it relatively easy to provide
>>>them for yourself.

me:
>>Good luck porting other people's strictly conforming programs though.
>>They might use C strings.
>>Good luck persuading someone else to port your programs.

Doug Gwyn:
>I don't understand your comment.  Of course the C compiler and library
>continue to support null-terminated strings.  Defining your own
>counted-string data type and functions doesn't affect that at all.
>Furthermore there is no reason your counted-string implementation
>should be other than perfectly portable.

Yes, just like the C compiler continues to support { and }, but you can do:

    #define BEGIN {
    #define END }

or (sorry Bjarne but it's true)

    #define Case break; case

and still be perfectly portable.  Everyone will hate you.  A lot of C
programmers, such as for example Doug Gwyn, expect standard facilities
to be used.
-- 
Norman Diamond, Sony Computer Science Lab (diamond%csl.sony.co.jp@relay.cs.net)
  The above opinions are my own.   |  Why are programmers criticized for
  If they're also your opinions,   |  re-implementing the wheel, when car
  you're infringing my copyright.  |  manufacturers are praised for it?
tneff@bfmny0.UUCP (Tom Neff) (05/17/89)
The major problem with using anything other than \0-terminated
strings in C is that you give up the easy ability to define
string constants a la "/etc/passwd".  Standard C compilers will
create a \0-terminated string for these, regardless of what
your home-made string utilities prefer.
-- 
Tom Neff                              UUCP:     ...!uunet!bfmny0!tneff
    "Truisms aren't everything."      Internet: tneff@bfmny0.UU.NET
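
[Editor's sketch] One way to soften the string-constant problem raised above is to let sizeof supply the count for a literal at compile time. The following is a minimal sketch, not anything proposed in the thread: struct cstr, CSTR_LIT, cstr_len and cstr_check are hypothetical names, and the macro works only for literals and char arrays, not for char pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counted-string type; not part of any standard library. */
struct cstr {
    size_t len;       /* bytes, excluding the terminating '\0' */
    const char *ptr;  /* still '\0'-terminated when built from a literal */
};

/*
 * sizeof applied to a string literal counts the terminating '\0',
 * so subtract one.  Passing a char pointer here would silently give
 * the wrong length.
 */
#define CSTR_LIT(s) { sizeof(s) - 1, s }

struct cstr passwd_path = CSTR_LIT("/etc/passwd");

/* Length lookup without scanning the bytes. */
size_t cstr_len(const struct cstr *s)
{
    return s->len;
}

/* For a literal, the stored count and the terminator agree. */
void cstr_check(const struct cstr *s)
{
    assert(strlen(s->ptr) == s->len);
}
```

The literal itself is still \0-terminated, as the post says; the macro only records the length alongside it, so the same object can be handed to either kind of routine.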
darin@laic.UUCP (Darin Johnson) (05/18/89)
>>Yes.  You throw away the C library (which I understand is part of the
>>proposed ANSI standard) and the language's definition of how strings
>>are represented, you define your own representation of strings, and
>>you implement your own library.
>
>Why throw anything away?  You can store the lengths elsewhere and
>still use the null-terminated representation that the library
>routines want to see.
>    struct eg { unsigned int str_len;
>                char *the_str;
>              };

This is (almost) exactly what I do in VMS.  I prefer using null
terminated strings, because this is what I am used to, and what the
library routines expect.  However, VMS functions that deal with strings
need a string descriptor (which works fine in stuff like pascal, because
the string type is built in).  It includes a pointer, a string length, a
type (lots of different kinds of descriptors), and something else that I
forget.  A C header defines a macro $DESCRIPTOR to statically create
these descriptors.  For constant strings, you just do

    $DESCRIPTOR(strd, "constant");

For dynamic strings, I do:

    char str[512];
    $DESCRIPTOR(strd, str);

Then I can use str as normal.  When I need to pass this to/from a
routine, I adjust the string count, or append the null (which is easy,
since the count is returned in a lot of calls).  It is a bit harder for
automatic variables, allocated strings, etc.  I wrote a simple library
for this purpose once, but it never really caught on for me.  VMS has
some nice string manipulation routines but I never use these either,
since I prefer to use the C library routines (the VMS routines can
append a string, allocating new memory if necessary, etc.).

So it's not so bad using both formats at once, with only a minimal
overhead needed to convert when you need to.
-- 
Darin Johnson (leadsv!laic!darin@pyramid.pyramid.com)
    We now return you to your regularly scheduled program.
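
[Editor's sketch] The two-format approach described above (a descriptor that points at an ordinary char buffer, with the count adjusted or the null appended at the boundary) might look roughly like this in portable C. Everything here is an assumption made for illustration: struct str_desc holds only a length and a pointer and is not the real VMS descriptor layout, STR_DESC merely imitates the spirit of $DESCRIPTOR, and fill_counted stands in for any routine that returns a byte count without writing a terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor: length plus pointer only. */
struct str_desc {
    unsigned short len;
    char *ptr;
};

/* Hypothetical macro: describe a char array, leaving room for '\0'. */
#define STR_DESC(name, array) \
    struct str_desc name = { sizeof(array) - 1, (array) }

/*
 * Stand-in for a routine that fills the buffer and reports a byte
 * count through the descriptor, without writing a '\0'.
 */
void fill_counted(struct str_desc *d, const char *src)
{
    size_t n = strlen(src);

    if (n > d->len)
        n = d->len;
    memcpy(d->ptr, src, n);
    d->len = (unsigned short)n;
}

/* Back to the C convention: append the null using the count. */
char *desc_to_zstr(const struct str_desc *d)
{
    assert(d->ptr != NULL);
    d->ptr[d->len] = '\0';
    return d->ptr;
}

/* The other direction: refresh the count from the terminator. */
void zstr_to_desc(struct str_desc *d)
{
    d->len = (unsigned short)strlen(d->ptr);
}
```

Each conversion costs either one store or one strlen call, which is the "minimal overhead" the post refers to. Initializing the descriptor from an automatic array, as STR_DESC does inside a function, relies on C99 or later.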
diamond@diamond.csl.sony.junet (Norman Diamond) (05/18/89)
In article <7228@bsu-cs.bsu.edu> dhesi@bsu-cs.bsu.edu (Rahul Dhesi) writes:
>     var
>        read_input : varying [30] of char;   (* where input will go *)
>        sys_stat := $qiow (chan := channel, func := io$_readvblk,
>                           iosb := io_statblk,
>                           p1 := read_input.body,
>                           p2 := size (read_input.body) );
>        read_input.length := io_statblk.count;

>Note that we could not simply pass our variable-length string variable
>read_input to QIOW.  Instead, we had to separately pass the address of
>the data area of the string (called read_input.body) and its maximum
>size.  Then when input was complete, we had to copy the byte count from
>the status block field io_statblk.count into the length field of
>read_input.length.

This "varying" structure was invented (or re-invented by DEC) long after
the QIOW system call was defined.  Perhaps a new system call should also
have been defined to replace QIOW?

OK, it is necessary to clarify my statement.  DEC required their
language implementors, with one exception, to conform to certain
storage and descriptor standards that were specified by the operating
system.  Therefore, with one exception, hacks are not needed to share
data among several languages, when the languages all have syntactic
constructs for the data.

The exception:  assembly language.
-- 
Norman Diamond, Sony Computer Science Lab (diamond%csl.sony.co.jp@relay.cs.net)
  The above opinions are my own.   |  Why are programmers criticized for
  If they're also your opinions,   |  re-implementing the wheel, when car
  you're infringing my copyright.  |  manufacturers are praised for it?
phil@ux1.cso.uiuc.edu (05/18/89)
> The major problem with using anything other than \0-terminated
> strings in C is that you give up the easy ability to define
> string constants a la "/etc/passwd".  Standard C compilers will
> create a \0-terminated string for these, regardless of what
> your home-made string utilities prefer.

If you wanted to redefine how strings worked as a part of the language
or as a special implementation, then the constants would of course be
defined that same way.

"/etc/passwd" is, of course, NOT a string, but a constant address of
array of char.  That is part of the origins of C.  A language extension
could create a string primitive type, and the compiler would have to
build "/etc/passwd" as (string) or as (char *) as appropriate to the
type of usage.

--phil howard--
chris@mimsy.UUCP (Chris Torek) (05/18/89)
In article <558@laic.UUCP> darin@laic.UUCP (Darin Johnson) writes:
>... VMS functions that deal with strings need a string descriptor
>(which works fine in stuff like pascal, because the string type is
>built in).

Pascal does not have a string type.  VMS Pascal has an extension that
provides a string type.  There is a difference.

(Aside to Norman Diamond: this is not the only case.  DEC managed some
of their inter-language compatibility by extending the languages in
question.)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu     Path:   uunet!mimsy!chris
karl@haddock.ima.isc.com (Karl Heuer) (05/21/89)
In article <31787@sri-unix.SRI.COM> diamond@diamond.csl.sony.junet (Norman Diamond) writes:
>[in two separate articles, Doug Gwyn writes:]
>>If you want counted strings, C makes it relatively easy to provide them for
>>yourself...  Furthermore there is no reason your counted-string
>>implementation should be other than perfectly portable.
>
>Yes, just like [you can use Silly Macros to make C look like Algol]
>and still be perfectly portable.  Everyone will hate you.

Not analogous.  Yes, I would curse your grave if you used the SHELLGOL
macros in a program I had to maintain, but no, I would not object to the
use of a struct {size_t; char *} to represent text in cases where the
usual model is inappropriate.

(Note that last phrase: "in cases where the usual model is inappropriate".
For most purposes, \0-termination works just fine.)

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
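
[Editor's sketch] One case where the usual model is arguably inappropriate is text that may contain '\0' bytes as data. The sketch below fills in the struct {size_t; char *} shape mentioned above with hypothetical field and function names (len, ptr, text_equal, record); it is an illustration of the idea, not code from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counted text value; field names are hypothetical. */
struct text {
    size_t len;
    const char *ptr;
};

/* Compare by count and content; '\0' bytes are ordinary data here. */
int text_equal(struct text a, struct text b)
{
    return a.len == b.len && memcmp(a.ptr, b.ptr, a.len) == 0;
}

/* A record with an embedded '\0': "key", '\0', "value". */
static const char raw[] = "key\0value";

/* Wrap the record; strlen stops at the first '\0', the count does not. */
struct text record(void)
{
    struct text t;

    t.len = sizeof raw - 1;  /* 9 bytes, excluding the final terminator */
    t.ptr = raw;
    assert(strlen(raw) < t.len);
    return t;
}
```

For ordinary text without embedded nulls, the same data could be passed straight to the C library, which is consistent with the remark that \0-termination works fine for most purposes.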