[comp.lang.c] strings

henry@utzoo.uucp (Henry Spencer) (05/11/89)

In article <10235@socslgw.csl.sony.JUNET> diamond@csl.sony.junet (Norman Diamond) writes:
>Yeah, I think so.  Strings, for example.  Cobol, PL/I, Algol,
>Fortran-77, Snobol, etc., have string types and say what kind of
>operations can be done on strings.  C says that a string is terminated
>with a '\0' byte.  Instead of assigning a null string to a target,
>C programmers assign a '\0' byte, so the implementation of C library
>routines can never be speeded up.  For other languages, improvements
>are often made to implementations.

Improvements to C library routines are quite possible.  Like all such,
cleverness is sometimes required.  One convention is not intrinsically
worse than the other.
-- 
Mars in 1980s:  USSR, 2 tries, |     Henry Spencer at U of Toronto Zoology
2 failures; USA, 0 tries.      | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

diamond@diamond.csl.sony.junet (Norman Diamond) (05/12/89)

I wrote:

>>C says that a string is terminated
>>with a '\0' byte.  Instead of assigning a null string to a target,
>>C programmers assign a '\0' byte, so the implementation of C library
>>routines can never be speeded up.  For other languages, improvements
>>are often made to implementations.

In article <1989May11.155935.22324@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:

>Improvements to C library routines are quite possible.  Like all such,
>cleverness is sometimes required.  One convention is not intrinsically
>worse than the other.

How do you improve a C library routine to look in a string descriptor
to just grab the current length of the string?  In other languages,
libraries can do that.  It kind of seems to me that if a C library
does that, I can watch it break my legal C program.  On the other hand,
a correct strlen() function has to scan every byte of (for example)
my 300K array.  Or my 509-byte array, maybe 510-byte array, but several
thousand times.  It seems intrinsically worse to me.

--
Norman Diamond, Sony Computer Science Lab (diamond%csl.sony.co.jp@relay.cs.net)
  The above opinions are my own.   |  Why are programmers criticized for
  If they're also your opinions,   |  re-implementing the wheel, when car
  you're infringing my copyright.  |  manufacturers are praised for it?

turner@sdti.SDTI.COM (Prescott K. Turner) (05/12/89)

In article <1989May11.155935.22324@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>In article <10235@socslgw.csl.sony.JUNET> diamond@csl.sony.junet (Norman Diamond) writes:
>>Strings, for example.
>> ...
>>C says that a string is terminated with a '\0' byte. 
>> ... so the implementation of C library routines can never be speeded up.
>>For other languages, improvements are often made to implementations.
>
>Improvements to C library routines are quite possible.  Like all such,
>cleverness is sometimes required.  One convention is not intrinsically
>worse than the other.

Diamond is right.  C is worse because it specifies not just the operations on
string types and their meaning, but the representation of strings.  As Paul
Abrahams put it in SIGPLAN Notices 23:10, "Some Sad Remarks About String
Handling in C", "C strings are not first class objects."  He gives details of
how this prevents the clever from succeeding.
--
Prescott K. Turner, Jr.
Software Development Technologies, Inc.
P.O. Box 366, Sudbury, MA 01776 USA         (508) 443-5779
UUCP: ...{harvard,mit-eddie}!sdti!turner    Internet: turner@sdti.sdti.com

bph@buengc.BU.EDU (Blair P. Houghton) (05/14/89)

In article <10245@socslgw.csl.sony.JUNET> diamond@diamond.csl.sony.junet
(Norman Diamond) writes:
>In article <1989May11.155935.22324@utzoo.uucp> henry@utzoo.uucp (Henry
>Spencer) writes:
>>Improvements to C library routines are quite possible.
>>One convention is not intrinsically worse than the other.
>
>How do you improve a C library routine to look in a string descriptor
>to just grab the current length of the string?  In other languages,
>libraries can do that.  [...] On the other hand,
>a correct strlen() function has to scan every byte of (for example)
>my 300K array.  Or my 509-byte array, maybe 510-byte array, but several
>thousand times.  It seems intrinsically worse to me.

Slower, yes.  Worse, no.

If it's that way, then use the string functions so they only
have to do it once, and store the knowledge as structs which
include the string and a length counter, and then use your own
library to implement string functions that know about your
string-data structure.

However, a strlen() that doesn't count the char's can be fooled,
and any data structure you devise can be inadequate by someone
else's standards, while not retaining the elegant completeness
of just storing the data as is.
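
The "can be fooled" case can be made concrete with a minimal sketch. The
struct and both helper names here are hypothetical, made up for the
illustration. A length is cached next to an ordinary char buffer, and
anything that writes a '\0' through the raw pointer leaves the cache wrong.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type for illustration: a buffer plus a cached length. */
struct counted {
    char   *data;   /* NUL-terminated text */
    size_t  len;    /* cached strlen(data); valid only if nobody bypasses us */
};

/* Record the text and scan it exactly once. */
void counted_set(struct counted *c, char *text)
{
    assert(c != NULL && text != NULL);
    c->data = text;
    c->len = strlen(text);
}

/* Returns 1 if the cached length no longer matches the bytes, else 0. */
int counted_is_stale(const struct counted *c)
{
    assert(c != NULL && c->data != NULL);
    return strlen(c->data) != c->len;
}
```

Checking for staleness costs a full scan, which is exactly the work the
cached count was meant to avoid.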

				--Blair
				  "Or am I overlooking something
				   obvious...again?"

henry@utzoo.uucp (Henry Spencer) (05/14/89)

In article <10245@socslgw.csl.sony.JUNET> diamond@csl.sony.junet (Norman Diamond) writes:
>>Improvements to C library routines are quite possible.  Like all such,
>>cleverness is sometimes required.  One convention is not intrinsically
>>worse than the other.
>
>How do you improve a C library routine to look in a string descriptor
>to just grab the current length of the string? 

You can't, any more than you can improve the equivalent in some other
languages to get the length of a trailing substring without having to
go back to the beginning and then subtract.  The data structures do
constrain your ability to improve the functions.  That doesn't mean
you can't make improvements.

(If you're going to tell me that other languages can change the underlying
implementation, note that they *have* to use a length-count implementation
if the language semantics require that '\0' be a valid string character,
unless still worse convolutions are used.)

>... On the other hand,
>a correct strlen() function has to scan every byte of (for example)
>my 300K array...

Nonsense, it only has to scan the words in that array that comprise the
actual text of your string... which normally is measured in bytes, not
hundreds of Kbytes.  It doesn't have to do it a byte at a time, by the way,
even on machines with no special string-scan facilities -- you just have
to be clever.
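
One well-known form of that cleverness is to test a whole word for a zero
byte at once. The sketch below is an illustration under stated assumptions,
not the code of any particular library. The name wordwise_strlen is
hypothetical, a 32-bit word is assumed, and the aligned word load may touch
up to three bytes past the terminator. ISO C does not bless that over-read,
so a real library does it with knowledge of the machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name; a sketch of a word-at-a-time length scan. */
size_t wordwise_strlen(const char *s)
{
    const char *p = s;

    assert(s != NULL);

    /* Step one byte at a time until p sits on a 4-byte boundary. */
    while (((uintptr_t)p & (sizeof(uint32_t) - 1)) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Examine four bytes per iteration.  The expression below is nonzero
     * exactly when at least one byte of w is zero. */
    for (;;) {
        uint32_t w;

        memcpy(&w, p, sizeof w);    /* aligned load; may read past the '\0' */
        if (((w - 0x01010101u) & ~w & 0x80808080u) != 0)
            break;
        p += sizeof w;
    }

    /* The terminator is somewhere in this word; find it bytewise. */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

The test does not depend on byte order, since it only asks whether any byte
in the word is zero and leaves locating it to the final loop.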

>Or my 509-byte array, maybe 510-byte array, but several thousand times...

If you are applying strlen to the same string thousands of times, your
code is badly written, period.  I recommend re-reading that gem of a paper,
"News Need Not Be Slow", co-written by yours truly, in the Winter 87
Usenix proceedings, for sage words of advice on avoiding inefficiency. :-)
Nobody ever said that strlen was *always* the right way to get string
lengths.
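
The simplest version of that advice is to scan for the length once and keep
it in a local. A minimal sketch, with a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count occurrences of ch in s. */
size_t count_char(const char *s, char ch)
{
    size_t n, i, hits = 0;

    assert(s != NULL);
    n = strlen(s);              /* one scan, result kept in a local */
    for (i = 0; i < n; i++)     /* not: i < strlen(s), a fresh call each pass */
        if (s[i] == ch)
            hits++;
    return hits;
}
```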

> It seems intrinsically worse to me.

That depends on what you are doing.  In certain ways it is, given that
a length-count implementation has more information immediately available.
In other ways it isn't, because that semi-redundant information has to
be updated whenever the string is modified, and that has a non-zero cost.

gwyn@smoke.BRL.MIL (Doug Gwyn) (05/14/89)

In article <10245@socslgw.csl.sony.JUNET> diamond@csl.sony.junet (Norman Diamond) writes:
>It [null-terminated strings] seems intrinsically worse to me.

I thought it was well known that each method (associated count vs.
terminator) has advantages in some contexts and disadvantages in
others.  Many interesting operations on strings are faster with null
terminator values than when an associated count must be tested.  The
main drawback to terminated strings is that the terminator value
cannot be contained within the string.

If you want counted strings, C makes it relatively easy to provide
them for yourself.

chris@mimsy.UUCP (Chris Torek) (05/14/89)

In article <456@sdti.SDTI.COM> turner@sdti.SDTI.COM (Prescott K. Turner)
writes:
>Diamond is right.  C is worse because it specifies not just the operations on
>string types and their meaning, but the representation of strings.  As Paul
>Abrahams put it in SIGPLAN Notices 23:10, "Some Sad Remarks About String
>Handling in C", "C strings are not first class objects."  He gives details of
>how this prevents the clever from succeeding.

It is true that strings---or rather, string constants; C does not have
strings as a basic data type: they are merely a convention, which some
programs (Emacs, e.g.) avoid---are second class objects.  This is because
a double-quoted string constant creates an unnamed array, and C's arrays
are second-class.  But this only prevents cleverness in a weak sense.
If you prefer counted strings, you can create them:

	struct cstr {
		int	len;
		char	*data;
	};
	#define CSTR(s) { sizeof(s) - 1, s }

	struct cstr hello = CSTR("hello world");

It is true that the compiler and run-time system cannot arbitrarily
choose some alternative representation for C's strings, but neither
can they choose other representations for any other form of array.
The language is at least consistent.

Incidentally, the average string (in the mythical average C program)
is shorter than the average Dhrystone string.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

mat@mole-end.UUCP (Mark A Terribile) (05/14/89)

> >>C says that a string is terminated with a '\0' byte.  ...  C programmers
> >>assign a '\0' byte, so the implementation of C library routines can never
> >>be speeded up.  For other languages, [implementations are often improved].
 
> >Improvements to C library routines are quite possible.  Like all such,
> >cleverness is sometimes required.  ...
 
> How do you improve a C library routine to look in a string descriptor to
> just grab the current length of the string?  In other languages, libraries
> can do that.  ...

Or the compilers can emit code ...

> ...  It kind of seems to me that if a C library does that, [ it will ] break
> my legal C program.  On the other hand, [ strlen() ] has to scan every byte
> of ... my 300K ... or my 509-byte array ... several thousand times.  It seems
> intrinsically worse to me.

Well, in C you are stuck.  At the risk of being told to go to my own group,
this is the point where you should switch to C++ and define a string type
that uses whatever you have available in your particular environment.  Just
derive it (compatibly) from an existing string type so that if you have to
run in a pure-C++ environment, you have a fallback.   Oh, and where a compiler
would emit code, you can make C++ use inlines.

Of course, I could ask you to show me a machine on which the FORTRAN compiler
has access to the internal implementation of COBOL, or on which COBOL can be
made to use the FORTRAN complex arithmetic algorithms.  We could go on.
-- 

 (This man's opinions are his own.)
 From mole-end				Mark Terribile

bph@buengc.BU.EDU (Blair P. Houghton) (05/14/89)

In article <456@sdti.SDTI.COM> turner@sdti.UUCP (0006-Prescott K. Turner, Jr.) writes:
>In article <1989May11.155935.22324@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>>In article <10235@socslgw.csl.sony.JUNET> diamond@csl.sony.junet (Norman Diamond) writes:
[...it's all in the ref's.  I'm after this guy:...]
>
>Diamond is right.  C is worse because it specifies not just the operations on
>string types and their meaning, but the representation of strings.  As Paul
>Abrahams put it in SIGPLAN Notices 23:10, "Some Sad Remarks About String
>Handling in C", "C strings are not first class objects."  He gives details of
>how this prevents the clever from succeeding.

Those ACM SIG* types are usually pretty clever, but if one can't come up
with the obvious (see below), then this one wonders where that one learned
what data were, and whether the full meaning of 'clever' doesn't apply...

	#include <stdio.h>
	#include <string.h>

	typedef struct {
		char *characters;
		int length;
	} string;

	#define assign_string(s,a) do { (s).characters=(a);\
				   (s).length=strlen((s).characters); } while (0)
	#define stringlength(x) ((x).length)

	main()
	{
		string lunchbox;

		assign_string(lunchbox,"Batman");

		/* Now all you need do is refer to lunchbox.length
		 *	when you need the length of the string
		 *	stored in lunchbox.characters...
		 */
		
		printf("%d\n",lunchbox.length);

		/*
		 *	If you think this is less efficient
		 *	computationally than the stuff your
		 *	'intelligent' languages do with string
		 *	data, then you're sadly mistaken...
		 *
		 * If using lunchbox.length doesn't appeal to you,
		 *	try:
		 */

		printf("%d\n",stringlength(lunchbox));

		/*
		 *	where you've done
		 *	
		 *	#define stringlength(x) x.length
		 *
		 *	somewhere above the call to stringlength.
		 */
	}

				--Blair
				  "Too damn easy.  I have _got_
				   to be missing some undercurrent
				   in this stream of cruft..."

peter@ficc.uu.net (Peter da Silva) (05/15/89)

C strings have a major disadvantage that's got nothing to do with performance.
You can't just stick arbitrary binary data in a string and expect it to work.
If there is a null anywhere in that data it's going to cut you.

Strings in variant-record form, with a length and data, or dope vectors, with
a length and a pointer, are just plain more versatile than C strings.
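
A minimal sketch of the difference, assuming a hypothetical dope-vector
struct and helper (none of these names come from any library). The length
travels with the pointer, so a null inside the data is just another byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical dope-vector style string: a length and a pointer. */
struct dope {
    size_t      len;
    const char *data;
};

/* Build from a string literal.  sizeof counts every byte of the literal,
 * embedded NULs included, plus the final terminator, hence the - 1.
 * Only meaningful for literals and arrays, not for plain pointers. */
#define DOPE_LIT(s) { sizeof(s) - 1, (s) }

/* Compare two dope strings byte for byte, NULs included. */
int dope_equal(const struct dope *a, const struct dope *b)
{
    assert(a != NULL && b != NULL);
    return a->len == b->len && memcmp(a->data, b->data, a->len) == 0;
}
```

With the five bytes 'a' 'b' '\0' 'c' 'd', strlen() reports 2 and strcmp()
cannot see past the null, while the len field holds 5 and dope_equal()
compares all five bytes.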

Luckily, however, 'C' is not tied to its runtime library. It's possible to
not only use a completely different kind of string in the language, but to
mix the two. It's a pity X3J11 didn't see fit to standardise a 'length' escape
like the common "\p" (for 'pascal') on the Mac. Maybe "\l".
-- 
Peter da Silva, Xenix Support, Ferranti International Controls Corporation.

Business: uunet.uu.net!ficc!peter, peter@ficc.uu.net, +1 713 274 5180.
Personal: ...!texbell!sugar!peter, peter@sugar.hackercorp.com.

peter@ficc.uu.net (Peter da Silva) (05/15/89)

In article <1989May13.211218.24251@utzoo.uucp>, henry@utzoo.uucp (Henry Spencer) writes:
> You can't, any more than you can improve the equivalent in some other
> languages to get the length of a trailing substring without having to
> go back to the beginning and then subtract.

Not if the string is stored as a length-and-start-address.
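
A minimal sketch of why, with a hypothetical struct and function name. When
the descriptor holds a start address and a length, a trailing substring is
two arithmetic operations, with no scan and no copy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical length-and-start-address string. */
struct slice {
    const char *start;
    size_t      len;
};

/* Trailing substring beginning at offset from.  The result shares
 * storage with s, so it is only valid while s's bytes are. */
struct slice slice_tail(struct slice s, size_t from)
{
    struct slice t;

    assert(from <= s.len);
    t.start = s.start + from;
    t.len = s.len - from;
    return t;
}
```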

diamond@csl.sony.co.jp (Norman Diamond) (05/15/89)

I wrote:

>>It [null-terminated strings] seems intrinsically worse to me.

Doug Gwyn replied:

>I thought it was well known that each method (associated count vs.
>terminator) has advantages in some contexts and disadvantages in
>others.  Many interesting operations on strings are faster with null
>terminator values than when an associated count must be tested.  The
>main drawback to terminated strings is that the terminator value
>cannot be contained within the string.

Fine.  So in a language which does not specify how strings are
implemented, an implementation could be improved by using BOTH
a count and a null terminator.  This still is not possible in C.

>If you want counted strings, C makes it relatively easy to provide
>them for yourself.

Yes.  You throw away the C library (which I understand is part of the
proposed ANSI standard) and the language's definition of how strings
are represented, you define your own representation of strings, and
you implement your own library.  This is perfectly fine.  A strictly
conforming program is not required to use every feature or every
mis-feature of the standard; a program is allowed to be more strict.

Good luck porting other people's strictly conforming programs though.
They might use C strings.

Good luck persuading someone else to port your programs.


gwyn@smoke.BRL.MIL (Doug Gwyn) (05/16/89)

In article <10250@socslgw.csl.sony.JUNET> diamond@csl.sony.co.jp.csl.sony.co.jp (Norman Diamond) writes:
>Doug Gwyn replied:
>>If you want counted strings, C makes it relatively easy to provide
>>them for yourself.
>Good luck porting other people's strictly conforming programs though.
>They might use C strings.
>Good luck persuading someone else to port your programs.

I don't understand your comment.  Of course the C compiler and library
continue to support null-terminated strings.  Defining your own
counted-string data type and functions doesn't affect that at all.
Furthermore there is no reason your counted-string implementation
should be other than perfectly portable.

diamond@diamond.csl.sony.junet (Norman Diamond) (05/16/89)

In article <190@mole-end.UUCP> mat@mole-end.UUCP (Mark A Terribile) writes:

>Well, in C you are stuck.  At the risk of being told to go to my own group,
>this is the point where you should switch to C++ and define a string type
>that uses whatever you have available in your particular environment.

Of course.  In fact, this is why string handling seems to be a popular
topic in C++ library writing.  We agree completely.

>Of course, I could ask you to show me a machine on which the FORTRAN compiler
>has access to the internal implementation of COBOL, or on which COBOL can be
>made to use the FORTRAN complex arithmetic algorithms.  We could go on.

I'm not sure why you ask this question.  The answer is VMS.  DEC
required all of their language developers to conform to implementations
specified by the operating system.  This is exactly where they ran into
problems with C.  This conversation has now revolved back to its point
of beginning....


bengsig@oracle.nl (Bjorn Engsig) (05/16/89)

In some article, Doug Gwyn wrote that \0 terminated strings and strings
with associated length both have advantages and disadvantages.  He also wrote
>>If you want counted strings, C makes it relatively easy to provide
>>them for yourself.

In article <10250@socslgw.csl.sony.JUNET> diamond@csl.sony.co.jp.csl.sony.co.jp (Norman Diamond) writes:
>
>Yes.  You throw away the C library (which I understand is part of the
>proposed ANSI standard) and the language's definition of how strings
>are represented, you define your own representation of strings, and
>you implement your own library.  [deletions]
>
Now come on, why should you not write your own handling of something (e.g.
strings) if this can speed up your program.  You would still do it in 
ANSI C; if you provide an interface to the outside world, you would either 
tell others about your interface or convert to the 'normal' representation,
this is no big deal.

>Good luck porting other people's strictly conforming programs though.
>They might use C strings.
So what?  This is what your ANSI C compiler knows about.
>
>Good luck persuading someone else to port your programs.
Well, our software is ported to very many Unix and non-Unix platforms, and
we do a lot of speed improvements using our own internal representation of
various types of variables.  In the very rare (measured in CPU cycles) 
cases, where we interface to the outside world, we convert between internal
and external representation.
-- 
Bjorn Engsig, ORACLE Europe         \ /    "Hofstadter's Law:  It always takes
Path:   mcvax!orcenl!bengsig         X      longer than you expect, even if you
Domain: bengsig@oracle.nl           / \     take into account Hofstadter's Law"

les@chinet.chi.il.us (Leslie Mikesell) (05/17/89)

In article <10250@socslgw.csl.sony.JUNET> diamond@csl.sony.co.jp.csl.sony.co.jp (Norman Diamond) writes:

>>If you want counted strings, C makes it relatively easy to provide
>>them for yourself.

>Yes.  You throw away the C library (which I understand is part of the
>proposed ANSI standard) and the language's definition of how strings
>are represented, you define your own representation of strings, and
>you implement your own library.  This is perfectly fine.  A strictly
>conforming program is not required to use every feature or every
>mis-feature of the standard; a program is allowed to be more strict.

Why throw anything away?  You can store the lengths elsewhere and
still use the null-terminated representation that the library
routines want to see.
 struct eg { unsigned int str_len;
             char *the_str;
            };
This way you only need to store the length when you think it will be
useful later.  The only problem occurs if you need to store a '\0'
value as part of the character array, and then it is only a problem if
you want to use the library string routines on that array.
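
A minimal sketch of how that plays out, keeping the struct above and adding
a hypothetical eg_append() helper with an assumed capacity argument. The
stored length says where to write, and the buffer stays null-terminated so
strcmp() and friends still work on it.

```c
#include <assert.h>
#include <string.h>

/* Same shape as the struct above; the helper name is hypothetical. */
struct eg {
    unsigned int str_len;   /* length, not counting the '\0' */
    char        *the_str;   /* still NUL-terminated for the library */
};

/* Append src to e, whose buffer holds cap bytes in total.
 * Assumes e->str_len < cap on entry.
 * Returns 0 on success, -1 if the result plus its '\0' would not fit. */
int eg_append(struct eg *e, unsigned int cap, const char *src)
{
    size_t n;

    assert(e != NULL && e->the_str != NULL && src != NULL);
    n = strlen(src);
    if (n >= cap - e->str_len)      /* need room for n bytes and the '\0' */
        return -1;
    memcpy(e->the_str + e->str_len, src, n + 1);   /* copies the '\0' too */
    e->str_len += (unsigned int)n;
    return 0;
}
```

The destination is never rescanned: only the incoming piece is measured.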

>Good luck porting other people's strictly conforming programs though.
>They might use C strings.
 
As well they should...  
The only real problem I see with the C library routines is that they
generally don't return a length or pointer to the last character even
though it would be trivial to do so.  Thus if you want the length you
have to make another function call (at least strlen doesn't return
the same pointer you gave it).
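
A minimal sketch of such a routine, under a hypothetical name. It copies
like strcpy() but returns a pointer to the '\0' it just wrote, so the
caller gets the end of the string for free.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical routine: copy src to dst, return the address of the
 * terminating '\0' in dst rather than dst itself. */
char *copy_end(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    while ((*dst = *src) != '\0') {
        dst++;
        src++;
    }
    return dst;
}
```

Chained calls such as end = copy_end(end, piece) then append without a
separate strlen() on the growing buffer, and end - buf is the length.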

Les Mikesell

wht@tridom.uucp (Warren Tucker) (05/17/89)

This might be a little long, but maybe it'll help reduce
some further dialog (or provoke more :-|).  I have needed
nice formal string-descriptor based strings _RARELY_ and
this is a sort of tool box I pick stuff out of.  Hope it
helps.

#!/bin/sh
# shar:	Shell Archiver  (v1.22)
#
#	Run the following text with /bin/sh to create:
#	  esd2.h
#	  esd2util.c
#
sed 's/^X//' << 'SHAR_EOF' > esd2.h &&
X/* CHK=0x1793 */
X/*+-----------------------------------------------------------------------
X	esd2.h -- support header for users of esd2util.c
X	...!gatech!emory!tridom!wht
X------------------------------------------------------------------------*/
X/*+:EDITS:*/
X/*:10-31-1988-16:37-wht-esd2 adapted from esd.h/esdutil.c */
X
Xtypedef struct esd
X{
X	char	*pb;		/* full pointer to esd strings */
X	short	cb;			/* count of bytes */
X	short	maxcb;		/* maximum bytes allowed */
X	short	index;		/* next character of significance */
X	short	old_index;	/* last token (backup or error reporting) */
X}	ESD;
X
Xtypedef struct keyword_table_type	/* table terminated with null key_word */
X{
X    char    *key_word;          /* key word */
X    int     key_token;          /* token returned on match */
X} KEYTAB;
X
X/* vi: set tabstop=4 shiftwidth=4: */
SHAR_EOF
chmod 0644 esd2.h || echo "restore of esd2.h fails"
sed 's/^X//' << 'SHAR_EOF' > esd2util.c &&
X/* CHK=0x2302 */
X/*+----------------------------------------------------------------
X	esd2util.c
X	...!gatech!emory!tridom!wht
X
X  Defined functions:
X	append_zstr_to_esd(tesd,zstr)
X	esdstrindex(esd1,esd2,index1_flag,index2_flag)
X	fgetesd(tesd,fileptr)
X	fputesd(tesd,fileptr,index_flag,nl_flag)
X	free_esd(tesd)
X	get_alpha_zstr(param,strbuf,strbuf_maxcb)
X	get_alphanum_zstr(param,strbuf,strbuf_maxcb)
X	get_numeric_value(param,value)
X	get_numeric_zstr(param,strbuf,strbuf_maxcb)
X	get_word_zstr(param,strbuf,strbuf_maxcb)
X	init_esd(tesd,cptr,maxcb)
X	keyword_lookup(ktable,param)
X	make_esd(maxcb)
X	null_terminate_esd(tesd)
X	skip_cmd_break(tesd)
X	skip_cmd_char(param,skipchar)
X	skip_comma(param)
X	skip_ld_break(zstr)
X	skip_paren(param,fLeft)
X	strindex(str1,str2)
X	strip_trailing_spaces_esd(ztext)
X
X-----------------------------------------------------------------*/
X/*+:EDITS:*/
X/*:10-31-1988-16:37-wht-esd2 adapted from esd.h/esdutil.c */
X/*:04-18-1988-18:19-wht-more routines */
X/*:01-28-1987-12:30-wht-add get_word_zstr */
X/*:01-28-1987-12:00-wht-include MSC 4.0 / MSDOS compatibility */
X/*:01-16-1986-01:00-WHT-Creation of edits (version beta 1.01) */
X
X#include <stdio.h>
X#include <ctype.h>
X#include "esd2.h"
X
X#if XENIX | MSDOS
X#include <string.h>
X#else
Xextern char	*index();
X#endif
X
X/*+-------------------------------------------------------------------------
X    void null_terminate_esd(&esd)
X    puts null at 'cb' position of string (standard esd always
X    has one more byte in buffer than maxcb says)
X--------------------------------------------------------------------------*/
Xvoid
Xnull_terminate_esd(tesd)
Xregister ESD *tesd;
X{
X	tesd->pb[tesd->cb] = 0;
X}   /* end of null_terminate_esd */
X
X/*+-----------------------------------------------------------------------
X	void init_esd(tesd,cptr,maxcb)  init an esd 
X------------------------------------------------------------------------*/
Xvoid init_esd(tesd,cptr,maxcb)
Xregister ESD *tesd;
Xchar	*cptr;
Xregister int maxcb;
X{
X	tesd->pb = cptr;			/* pointer to string */
X	tesd->maxcb = maxcb;		/* max characters in buffer */
X	tesd->cb = 0;				/* current count == 0 */
X	tesd->index = 0;			/* parse index to first position */
X	tesd->old_index = 0;		/* parse index to first position */
X	*tesd->pb = 0;				/* start with null terminated string */
X
X}	/* end of init_esd */
X
X/*+-----------------------------------------------------------------------
X	esdptr = make_esd(maxcb)	allocate an esd and buffer
X------------------------------------------------------------------------*/
XESD	*
Xmake_esd(maxcb)
Xregister int maxcb;		/* desired maxcb */
X{
X	register ESD *tesd;
X	register int actual_cb;
X
X	/* we get an extra character to ensure room for null past maxcb */
X	actual_cb = maxcb + sizeof(ESD) + 1;
X	if(actual_cb & 1)		/* even allocation */
X		++actual_cb;
X	if((tesd = (ESD *)malloc( (unsigned)actual_cb )) == NULL)
X		return((ESD *)0); 	/* return NULL if failure */
X
X	init_esd(tesd,(char *)(tesd + 1),maxcb);
X	return(tesd);
X
X}	/* end of make_esd */
X
X/*+-----------------------------------------------------------------------
X	free_esd(esdptr)
X------------------------------------------------------------------------*/
Xvoid free_esd(tesd)
Xregister ESD *tesd;
X{
X	tesd->maxcb = 0;
X	tesd->cb = 0;
X	free((char *)tesd);
X}
X
X/*+----------------------------------------------------------------
X    strindex:  string index function
X
X  Returns position of 'str2' in 'str1' if found
X  If 'str2' is null, then 0 is returned (null matches anything)
X  Returns -1 if not found
X-----------------------------------------------------------------*/
Xint
Xstrindex(str1,str2)
Xchar *str1;	/* the (target) string to search */
Xchar *str2;	/* the (comparand) string to search for */
X{
X	register int istr1 = 0;
X	register int lstr2 = strlen(str2);
X	register char *mstr = str1;	/* the (target) string to search */
X
X	if(*str2 == 0)			/* null string matches anything */
X		return(0);
X
X	while(*mstr)
X	{
X		if(*mstr == *str2)
X		{ /* we have a first char match... does rest of string match? */
X			if(!strncmp(mstr,str2,lstr2))
X				return(istr1);		/* if so, return match position */
X		}
X		mstr++;
X		istr1++;
X	}
X
X	return(-1);		/* if we exhaust target string, flunk */
X
X}   /* end of strindex */
X
X/*+-------------------------------------------------------------------------
X	esdstrindex(esd1,esd2,index1_flag,index2_flag)
X
X  Call strindex with esd1->pb and esd2->pb.
X  If index1_flag != 0, esd1->pb + esd1->index passed
X  If index2_flag != 0, esd2->pb + esd2->index passed
X--------------------------------------------------------------------------*/
Xesdstrindex(esd1,esd2,index1_flag,index2_flag)
Xregister ESD *esd1;
Xregister ESD *esd2;
Xregister int index1_flag;
Xregister int index2_flag;
X{
X	return(strindex((index1_flag) ? esd1->pb + esd1->index : esd1->pb,
X	    (index2_flag) ? esd2->pb + esd2->index : esd2->pb));
X
X}	/* end of esdstrindex */
X
X/*+----------------------------------------------------------------
X    keyword_lookup(ktable,param)
X
X  Lookup string in keyword_table struct array
X  Returns table->key_token if 'param' found in
X  'table', else -1
X
X  Beware substrings.  "type","typedef" will both match "type"
X  Ordering of table can help this.
X-----------------------------------------------------------------*/
Xkeyword_lookup(ktable,param)
Xregister KEYTAB  *ktable;
Xregister char *param;
X{
X	register int plen = strlen(param);
X
X	while(ktable->key_word)
X	{
X		if(!strncmp(ktable->key_word,param,plen))
X			return(ktable->key_token);
X		++ktable;
X	}   /* end of while */
X
X	return(-1);     /* search failed */
X
X}   /* end of keyword_lookup */
X
X/*+----------------------------------------------------------------
X    skip_cmd_break(tesd)
X
X  Finds next non-break or end of command line text
X  'tesd' is an esd with valid 'index' field
X  Returns   0		index field points to non-break character
X            -1		end of line found
X-----------------------------------------------------------------*/
Xint
Xskip_cmd_break(tesd)
Xregister ESD *tesd;
X{
X	register int cb = tesd->cb;
X	register int index = tesd->index;
X	register char *pb = tesd->pb + index;
X
X	while(index < cb)
X	{
X		if(*pb++ != 0x20)
X			break;
X		index++;
X	}
X	tesd->old_index = tesd->index = index;
X	if(index >= cb)
X		return(-1);
X	else
X		return(0);
X
X}   /* end of skip_cmd_break */
X
X/*+-------------------------------------------------------------------------
X    erc = skip_cmd_char(param,skipchar)
X--------------------------------------------------------------------------*/
Xint
Xskip_cmd_char(param,skipchar)
Xregister ESD *param;
Xregister char skipchar;
X{
X	register int erc;
X
X	if(erc = skip_cmd_break(param))
X		return(erc);
X
X	if(param->pb[param->index] == skipchar)
X	{
X		++param->index;
X		return(0);
X	}
X
X	return(-1);
X
X}   /* end of skip_cmd_char */
X
X/*+-------------------------------------------------------------------------
X    erc = skip_comma(param)
X--------------------------------------------------------------------------*/
Xint
Xskip_comma(param)
Xregister ESD *param;
X{
X	register int erc;
X
X	if(erc = skip_cmd_break(param))
X		return(erc);
X
X	if(param->pb[param->index] == ',')
X	{
X		++param->index;
X		return(0);
X	}
X
X	return(-1);
X
X}   /* end of skip_comma */
X
X/*+-------------------------------------------------------------------------
X    erc = skip_paren(fparam,LEFT or RIGHT)
X--------------------------------------------------------------------------*/
Xint
Xskip_paren(param,fLeft)
Xregister ESD *param;
Xint fLeft;          /* if =LEFT , skip left paren, else skip right */
X{
X	register int erc;
X
X	if(erc = skip_cmd_break(param))
X		return(erc);
X
X	if(fLeft)
X	{
X		if(param->pb[param->index++] == 0x28)   /* 0x28 == open parenthesis */
X			return(0);
X		else
X		{
X			--param->index;
X			return(-1);
X		}
X	}
X	else
X	{
X		if(param->pb[param->index++] == 0x29)   /* 0x29 == close parenthesis */
X			return(0);
X		else
X		{
X			--param->index;
X			return(-1);
X		}
X	}
X
X}   /* end of skip_paren */
X
X/*+----------------------------------------------------------------
X    get_alpha_zstr(&esd,&strbuf,strbuf_maxcb)
X    converts next alphabetic string token to upper case and places it
X    into the null-terminated 'strbuf' string.  returns 0 or -1
X    or skip_cmd_break error codes
X-----------------------------------------------------------------*/
Xint
Xget_alpha_zstr(param,strbuf,strbuf_maxcb)
Xregister ESD *param;
Xregister char *strbuf;
Xregister int strbuf_maxcb;
X{
X	register int izstr;
X	register int schar;
X	register char *param_ptr = param->pb;
X
X	if(izstr = skip_cmd_break(param))
X		return(izstr);
X	izstr = 0;
X	while( (izstr < strbuf_maxcb-1) && (param->index < param->cb) )
X	{
X		schar = param_ptr[param->index];
X		if( !isalpha(schar) )
X			break;
X		param->index++;
X		strbuf[izstr++] = toupper(schar);
X	}
X
X	strbuf[izstr] = 0;		/* terminate the string for "C" anal retentives */
X	if(izstr)
X		return(0);
X	else /* decide whether to return badparam or noparam err */
X		return(-1);
X
X}   /* end of get_alpha_zstr */
X
X/*+----------------------------------------------------------------
X    get_alphanum_zstr(&esd,&strbuf,strbuf_maxcb)
X    converts next alphanumeric string token to upper case and places it
X    into the null-terminated 'strbuf' string.  returns 0 or -1 
X    or skip_cmd_break error codes
X-----------------------------------------------------------------*/
Xint
Xget_alphanum_zstr(param,strbuf,strbuf_maxcb)
Xregister ESD *param;
Xregister char *strbuf;
Xregister int strbuf_maxcb;
X{
X	register int izstr = 0;
X	register int schar;
X
X	if(izstr = skip_cmd_break(param))
X		return(izstr);
X
X	while( (izstr < strbuf_maxcb-1) && (param->index < param->cb) )
X	{
X		schar = param->pb[param->index++];
X		if( isalnum(schar) )
X			strbuf[izstr++]=to_upper(schar);
X		else
X		{
X			--param->index;
X			break;
X		}
X	}
X
X	strbuf[izstr]=0;         /* terminate the string for "C" anal retentives */
X	if(strlen(strbuf))
X		return(0);
X	else /* decide whether to return badparam or noparam err */
X		return(-1);
X
X}   /* end of get_alphanum_zstr */
X
X/*+----------------------------------------------------------------
X    get_numeric_zstr(&esd,&strbuf,strbuf_maxcb)
X    gets next numeric string token and places it
X    into the null-terminated 'strbuf' string.  returns 0 or -1 
X    or skip_cmd_break error codes
X-----------------------------------------------------------------*/
Xint
Xget_numeric_zstr(param,strbuf,strbuf_maxcb)
Xregister ESD *param;
Xregister char *strbuf;
Xregister int strbuf_maxcb;
X{
X	register int izstr;
X	register int schar;
X
X	if(izstr = skip_cmd_break(param))
X		return(izstr);
X
X	while( (izstr < strbuf_maxcb-1) && (param->index < param->cb) )
X	{
X		schar = param->pb[param->index++];
X		if( isdigit(schar) )
X			strbuf[izstr++]=schar;
X		else
X		{
X			--param->index;
X			break;
X		}
X	}
X
X	strbuf[izstr]=0;         /* terminate the string for "C" anal retentives */
X
X	if(strlen(strbuf))
X		return(0);
X	else /* decide whether to return badparam or noparam err */
X	{
X		return(skip_cmd_break(param));
X	}
X
X}   /* end of get_numeric_zstr */
X
X/*+----------------------------------------------------------------
X    get_word_zstr(&esd,&strbuf,strbuf_maxcb)
X    gets next word (continuous string of characters
X    without spaces or tabs)
X    returns 0 or -1 or skip_cmd_break error codes
X-----------------------------------------------------------------*/
Xint
Xget_word_zstr(param,strbuf,strbuf_maxcb)
Xregister ESD *param;
Xregister char *strbuf;
Xregister int strbuf_maxcb;
X{
X	register int izstr;
X	register int schar;
X
X	if(izstr = skip_cmd_break(param))
X		return(izstr);
X
X	while( (izstr < strbuf_maxcb-1) && (param->index < param->cb) )
X	{
X		schar = param->pb[param->index++];
X		if( (schar > 0x20) && (schar <= 0x7e))
X			strbuf[izstr++]=schar;
X		else
X		{
X			--param->index;
X			break;
X		}
X	}
X
X	strbuf[izstr]=0;         /* terminate the string for "C" anal retentives */
X
X	if(strlen(strbuf))
X		return(0);
X	else /* decide whether to return badparam or noparam err */
X	{
X		return(skip_cmd_break(param));
X	}
X
X}   /* end of get_word_zstr */
X
X/*+-----------------------------------------------------------------------
X	get_numeric_value(param,&long_var)
X------------------------------------------------------------------------*/
Xget_numeric_value(param,value)
Xregister ESD *param;
Xregister long *value;
X{
X	register int erc;
X	char	buf[32];
X
X	if(erc = get_numeric_zstr(param,buf,sizeof(buf)))
X		return(erc);
X	sscanf(buf,"%ld",value);
X	return(0);
X
X}	/* end of get_numeric_value */
X
X/*+-------------------------------------------------------------------------
X    strip_trailing_spaces_esd(tesd)
X--------------------------------------------------------------------------*/
Xvoid
Xstrip_trailing_spaces_esd(ztext)
Xregister ESD *ztext;
X{
X	while(ztext->cb && (ztext->pb[ztext->cb-1] == 0x20))
X		ztext->cb--;
X}   /* end of strip_trailing_spaces_esd */
X
X/*+-------------------------------------------------------------------------
X	fgetesd(&esd,fileptr)
X
X  stdio read from FILE *fileptr into esd
X  returns -1 on stdio error, -2 on line too long, 0 on success
X  returns tesd->cb set up not including trailing nl, tesd->index == 0
X--------------------------------------------------------------------------*/
Xint fgetesd(tesd,fileptr)
Xregister ESD *tesd;
Xregister FILE *fileptr;
X{
X	register char *cptr;
X
X	if(fgets(tesd->pb,tesd->maxcb,fileptr) == NULL)
X		return(-1);
X#if XENIX | MSDOS
X	if((cptr = strchr(tesd->pb,0x0A)) == NULL)
X		return(-2);
X#else
X	if((cptr = index(tesd->pb,0x0A)) == NULL)
X		return(-2);
X#endif
X	tesd->cb = (int)(cptr - tesd->pb);
X	null_terminate_esd(tesd);
X	tesd->index = 0;
X	tesd->old_index = 0;
X	return(0);
X
X}	/* end of fgetesd */
X
X/*+-------------------------------------------------------------------------
X	fputesd(&esd,fileptr,index_flag,nl_flag)
X
X  write esd contents to stdio FILE *fileptr
X  if index_flag is true, write from tesd->index thru end of esd
X  otherwise, from start of esd
X  if nl_flag is true, append nl to write, else just esd contents
X  returns -1 on stdio error, 0 on success
X--------------------------------------------------------------------------*/
Xint fputesd(tesd,fileptr,index_flag,nl_flag)
Xregister ESD *tesd;
Xregister FILE *fileptr;
Xint index_flag;
Xint nl_flag;
X{
X	register char *cptr;
X	register int write_length;
X
X	if(index_flag)
X	{
X		cptr = &tesd->pb[tesd->index];
X		write_length = tesd->cb - tesd->index;
X	}
X	else
X	{
X		cptr = tesd->pb;
X		write_length = tesd->cb;
X	}
X
X	if(write_length)
X		if(fwrite(cptr,write_length,1,fileptr) == 0)
X			return(-1);
X
X	if(nl_flag)
X		if(fputc(0x0A,fileptr) == EOF)
X			return(-1);
X
X	return(0);
X}	/* end of fputesd */
X
X/*+-------------------------------------------------------------------------
X	cptr = skip_ld_break(cptr)
X  Skip leading spaces and tabs
X--------------------------------------------------------------------------*/
Xchar *skip_ld_break(zstr)
Xregister char *zstr;
X{
X	while((*zstr == 0x20) || (*zstr == 0x09))
X		zstr++;
X	return(zstr);
X}	/* end of skip_ld_break */
X
X/*+-----------------------------------------------------------------
X	append_zstr_to_esd
X------------------------------------------------------------------*/
Xappend_zstr_to_esd(tesd,zstr)
XESD		*tesd;
Xchar	*zstr;
X{
X	register int zstrlen = strlen(zstr);
X
X	if(zstrlen > (tesd->maxcb - tesd->cb))
X		zstrlen = tesd->maxcb - tesd->cb;
X
X	if(zstrlen)
X	{
X		strncpy(tesd->pb + tesd->cb,zstr,zstrlen);
X		tesd->cb += zstrlen;
X	}
X}
X/* end of esd2util.c */
X
X/* vi: set tabstop=4 shiftwidth=4: */
SHAR_EOF
chmod 0644 esd2util.c || echo "restore of esd2util.c fails"
exit 0
-- 
-------------------------------------------------------------------
Warren Tucker, Tridom Corporation       ...!gatech!emory!tridom!wht 
Sforzando (It., sfohr-tsahn'-doh).  A direction to perform the tone
or chord with special stress, or marked and sudden emphasis.

dhesi@bsu-cs.bsu.edu (Rahul Dhesi) (05/17/89)

In article <10255@socslgw.csl.sony.JUNET> diamond@csl.sony.junet (Norman
Diamond) writes:
     The answer is VMS.  DEC required all of their language developers
     to conform to implementations [of strings] specified by the
     operating system.  This is exactly where they ran into problems
     with C.

Here are excerpts of code adapted from one of the VAX/VMS Pascal
manuals.  I have omitted some declarations and error-checking.  This
code reads a record from a channel that has been (in the original code)
assigned to a mailbox.

     var
	... other stuff ...
        read_input : varying [30] of char;  (* where input will go *)
     
     begin
     ... other code ...
     sys_stat := $qiow (chan := channel, func := io$_readvblk, 
                        iosb := io_statblk,
                        p1 := read_input.body,
                        p2 := size (read_input.body) );
     read_input.length := io_statblk.count;
     ... other stuff ...
     end.

Note that we could not simply pass our variable-length string variable
read_input to QIOW.  Instead, we had to separately pass the address of
the data area of the string (called read_input.body) and its maximum
size.  Then when input was complete, we had to copy the byte count from
the status block field io_statblk.count into the length field,
read_input.length.
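
For comparison, the same pattern can be sketched in C.  This is only an
illustration: struct varying30, struct io_status, raw_read() and
read_varying() are made-up names, and raw_read() merely stands in for a
raw read service; it is not the QIOW interface.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical C analogue of a VMS Pascal "varying [30] of char". */
struct varying30 {
    unsigned short length;   /* current length in bytes */
    char body[30];           /* data area, not NUL-terminated */
};

/* Hypothetical status block: only the byte count matters here. */
struct io_status {
    unsigned short count;
};

/*
 * Stand-in for a raw read service (NOT the real QIOW interface):
 * copies at most bufsize bytes of src into buf and reports the
 * number of bytes transferred through the status block.
 */
static void raw_read(const char *src, size_t srclen,
                     char *buf, size_t bufsize, struct io_status *iosb)
{
    size_t n = srclen < bufsize ? srclen : bufsize;

    memcpy(buf, src, n);
    iosb->count = (unsigned short)n;
}

/* Fill a varying string the way the Pascal excerpt does. */
static void read_varying(struct varying30 *v, const char *src, size_t srclen)
{
    struct io_status iosb;

    /* pass the data area and its maximum size, not the string itself */
    raw_read(src, srclen, v->body, sizeof v->body, &iosb);
    /* the caller, not the service, updates the length field */
    v->length = iosb.count;
}
```

The service only ever sees a bare buffer and a size; the counted-string
bookkeeping stays entirely on the caller's side.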
-- 
Rahul Dhesi <dhesi@bsu-cs.bsu.edu>
UUCP:    ...!{iuvax,pur-ee}!bsu-cs!dhesi

diamond@diamond.csl.sony.junet (Norman Diamond) (05/17/89)

Doug Gwyn:

>>>If you want counted strings, C makes it relatively easy to provide
>>>them for yourself.

me:

>>Good luck porting other people's strictly conforming programs though.
>>They might use C strings.
>>Good luck persuading someone else to port your programs.

Doug Gwyn:

>I don't understand your comment.  Of course the C compiler and library
>continue to support null-terminated strings.  Defining your own
>counted-string data type and functions doesn't affect that at all.
>Furthermore there is no reason your counted-string implementation
>should be other than perfectly portable.

Yes, just like the C compiler continues to support { and }, but you
can do:
  #define BEGIN {
  #define END   }
or (sorry Bjarne but it's true)
  #define Case  break; case
and still be perfectly portable.  Everyone will hate you.  A lot of
C programmers, such as Doug Gwyn, expect standard facilities
to be used.

--
Norman Diamond, Sony Computer Science Lab (diamond%csl.sony.co.jp@relay.cs.net)
  The above opinions are my own.   |  Why are programmers criticized for
  If they're also your opinions,   |  re-implementing the wheel, when car
  you're infringing my copyright.  |  manufacturers are praised for it?

tneff@bfmny0.UUCP (Tom Neff) (05/17/89)

The major problem with using anything other than \0-terminated
strings in C is that you give up the easy ability to define
string constants a la "/etc/passwd".  Standard C compilers will
create a \0-terminated string for these, regardless of what
your home-made string utilities prefer.
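
One possible workaround, sketched under assumptions: since sizeof on a
string literal counts the terminating \0, a macro can build a counted
string from a literal at compile time.  struct cstr and CSTR_LIT are
hypothetical names, not part of any standard library.

```c
#include <stddef.h>

/* Hypothetical counted-string type; not part of any standard library. */
struct cstr {
    size_t len;        /* length in bytes, excluding the trailing '\0' */
    const char *ptr;   /* points at the literal's storage */
};

/*
 * CSTR_LIT is meant for string literals (or char arrays) only:
 * sizeof counts the terminating '\0', so subtract one.  Given a
 * plain char pointer it would silently compute the wrong length.
 */
#define CSTR_LIT(s) { sizeof(s) - 1, s }

static const struct cstr passwd_path = CSTR_LIT("/etc/passwd");
```

The literal is still \0-terminated underneath, so passwd_path.ptr can be
handed to fopen() as usual, while passwd_path.len is available without a
call to strlen().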
-- 
Tom Neff				UUCP:     ...!uunet!bfmny0!tneff
    "Truisms aren't everything."	Internet: tneff@bfmny0.UU.NET

darin@laic.UUCP (Darin Johnson) (05/18/89)

>>Yes.  You throw away the C library (which I understand is part of the
>>proposed ANSI standard) and the language's definition of how strings
>>are represented, you define your own representation of strings, and
>>you implement your own library.
>
>Why throw anything away?  You can store the lengths elsewhere and
>still use the null-terminated representation that the library
>routines want to see.
> struct eg { unsigned int str_len;
>             char *the_str;
>            };

This is (almost) exactly what I do in VMS.  I prefer using null
terminated strings, because this is what I am used to, and what the
library routines expect.  However, VMS functions that deal with strings
need a string descriptor (which works fine in stuff like Pascal, because
the string type is built in).  It includes a pointer, a string length,
a type (lots of different kinds of descriptors), and something else
that I forget.

A C header defines a macro $DESCRIPTOR to statically create these
descriptors.  For constant strings, you just do
$DESCRIPTOR(strd, "constant");  For dynamic strings, I do:

  char str[512];
  $DESCRIPTOR(strd, str);

Then I can use str as normal.  When I need to pass this to/from a
routine, I adjust the string count, or append the null (which is easy,
since the count is returned in a lot of calls).  It is a bit harder for
automatic variables, allocated strings, etc.  I wrote a simple library
for this purpose once, but it never really caught on for me.  VMS has
some nice string manipulation routines but I never use these either, since
I prefer to use the C library routines (the VMS routines can append a
string, allocating new memory if necessary, etc.).

So it's not so bad using both formats at once, with only a minimal
overhead needed to convert when you need to.
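
A rough sketch of that conversion in portable C follows.  The names
struct str_desc, desc_sync_len() and desc_terminate() are invented for
illustration; the real VMS descriptor has more fields and is not
reproduced here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor; the real VMS descriptor has more fields. */
struct str_desc {
    size_t len;    /* bytes currently in use */
    size_t max;    /* capacity of buf, including room for '\0' */
    char *buf;     /* storage shared with ordinary C string code */
};

/* After a C library call has changed buf, recompute the count. */
static void desc_sync_len(struct str_desc *d)
{
    d->len = strlen(d->buf);
}

/* After a counted routine has set len, append the '\0' that the
 * C library routines expect.  Returns -1 if there is no room. */
static int desc_terminate(struct str_desc *d)
{
    if (d->len >= d->max)
        return -1;
    d->buf[d->len] = '\0';
    return 0;
}
```

With "char str[512];" and a descriptor initialized as
{ 0, sizeof str, str }, one would call desc_sync_len() after something
like strcpy(str, ...), and desc_terminate() after a routine that reports
a byte count.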

-- 
Darin Johnson (leadsv!laic!darin@pyramid.pyramid.com)
	We now return you to your regularly scheduled program.

diamond@diamond.csl.sony.junet (Norman Diamond) (05/18/89)

In article <7228@bsu-cs.bsu.edu> dhesi@bsu-cs.bsu.edu (Rahul Dhesi) writes:

>     var
>        read_input : varying [30] of char;  (* where input will go *)

>     sys_stat := $qiow (chan := channel, func := io$_readvblk, 
>                        iosb := io_statblk,
>                        p1 := read_input.body,
>                        p2 := size (read_input.body) );
>     read_input.length := io_statblk.count;

>Note that we could not simply pass our variable-length string variable
>read_input to QIOW.  Instead, we had to separately pass the address of
>the data area of the string (called read_input.body) and its maximum
>size.  Then when input was complete, we had to copy the byte count from
>the status block field io_statblk.count into the length field,
>read_input.length.

This "varying" structure was invented (or re-invented by DEC) long
after the QIOW system call was defined.  Perhaps a new system call
should also have been defined to replace QIOW?

OK, it is necessary to clarify my statement.  DEC required their
language implementors, with one exception, to conform to certain
storage and descriptor standards that were specified by the operating
system.  Therefore, with one exception, hacks are not needed to
share data among several languages, when the languages all have
syntactic constructs for the data.  The exception:  assembly language.

--
Norman Diamond, Sony Computer Science Lab (diamond%csl.sony.co.jp@relay.cs.net)
  The above opinions are my own.   |  Why are programmers criticized for
  If they're also your opinions,   |  re-implementing the wheel, when car
  you're infringing my copyright.  |  manufacturers are praised for it?

phil@ux1.cso.uiuc.edu (05/18/89)

> The major problem with using anything other than \0-terminated
> strings in C is that you give up the easy ability to define
> string constants a la "/etc/passwd".  Standard C compilers will
> create a \0-terminated string for these, regardless of what
> your home-made string utilities prefer.

If you wanted to redefine how strings worked as a part of the language
or as a special implementation, then the constants would of course be
defined that same way.  "/etc/passwd" is, of course, NOT a string, but
the constant address of an array of char.  That is part of the origins of C.
A language extension could create a string primitive type, and the
compiler would have to build "/etc/passwd" as (string) or as (char *)
as appropriate to the type of usage.

--phil howard--

chris@mimsy.UUCP (Chris Torek) (05/18/89)

In article <558@laic.UUCP> darin@laic.UUCP (Darin Johnson) writes:
>... VMS functions that deal with strings need a string descriptor
>(which works fine in stuff like pascal, because the string type is
>built in).

Pascal does not have a string type.  VMS Pascal has an extension that
provides a string type.  There is a difference.  (Aside to Norman
Diamond: this is not the only case.  DEC managed some of their inter-
language compatibility by extending the languages in question.)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

karl@haddock.ima.isc.com (Karl Heuer) (05/21/89)

In article <31787@sri-unix.SRI.COM> diamond@diamond.csl.sony.junet (Norman Diamond) writes:
>[in two separate articles, Doug Gwyn writes:]
>>If you want counted strings, C makes it relatively easy to provide them for
>>yourself...  Furthermore there is no reason your counted-string
>>implementation should be other than perfectly portable.
>
>Yes, just like [you can use Silly Macros to make C look like Algol]
>and still be perfectly portable.  Everyone will hate you.

Not analogous.  Yes, I would curse your grave if you used the SHELLGOL macros
in a program I had to maintain, but no, I would not object to the use of a
struct {size_t; char *} to represent text in cases where the usual model is
inappropriate.

(Note that last phrase: "in cases where the usual model is inappropriate".
For most purposes, \0-termination works just fine.)
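
One case where the usual model arguably does not fit is text that may
contain \0 bytes.  A minimal sketch, assuming a hypothetical struct text
and text_equal() along the lines of the struct mentioned above:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted text type: a length plus a pointer. */
struct text {
    size_t len;        /* number of bytes, which may include '\0' */
    const char *ptr;
};

/* Compare by length and contents; embedded '\0' bytes are ordinary data. */
static int text_equal(struct text a, struct text b)
{
    return a.len == b.len && memcmp(a.ptr, b.ptr, a.len) == 0;
}
```

Two three-byte values holding "a\0b" and "a\0c" compare unequal here,
whereas strcmp() stops at the first \0 and reports them as equal.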

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint