[comp.lang.c] Finding Available Length Of Strings...

gt4512c@prism.gatech.EDU (BRADBERRY,JOHN L) (11/09/90)

When passing strings to other functions, what is the BEST way to find
the bytes remaining in the formal string parameter (to prevent
overwriting the end while in the function)?? Does it involve using the
current starting address of the string parameter and calculating
(somehow) the DEFINED end??

Thanks for any help here...

gt4512c@prism.gatech.EDU (BRADBERRY,JOHN L) (11/09/90)

Just a note of clarification here...I am talking about a character array
and I am looking for a solution (not the obvious '...add another length
parameter')...I would like the function to be able to 'figure it out!'

Thanks again!
  
-- 
John L. Bradberry        |Georgia Tech Research Inst|uucp:..!prism!gt4512c
Scientific Concepts Inc. |Microwaves and Antenna Lab|Int : gt4512c@prism
2359 Windy Hill Rd. 201-J|404 528-5325 (GTRI)       |GTRI:jbrad@msd.gatech.
Marietta, Ga. 30067      |404 438-4181 (SCI)        |'...is this thing on..?'   

karl@ima.isc.com (Karl Heuer) (11/10/90)

In article <16758@hydra.gatech.EDU> gt4512c@prism.gatech.EDU (BRADBERRY,JOHN L) writes:
>Just a note of clarification here...I am talking about a character array
>and I am looking for a solution (not the obvious '...add another length
>parameter')...I would like the function to be able to 'figure it out!'

Here are the options that spring to mind:
(a) Pass a length parameter.
(b) Pass a pointer to the end of the string.
(c) Implement a string structure that does one of the above for you, e.g.
    typedef struct { char *start; char *current; char *end; } string_t;
(d) Use only implementations that support the "Read Operator's Mind" syscall.
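
A minimal sketch of option (c), assuming the caller wraps an existing array. The helper names string_wrap, string_room and string_append are hypothetical and are not part of the original post; only the struct layout comes from it.

```c
#include <stddef.h>
#include <string.h>

/* The structure from option (c): start and end delimit the array,
   current marks where the next character will be written. */
typedef struct { char *start; char *current; char *end; } string_t;

/* Hypothetical helper: wrap an existing array of `size` bytes (size >= 1). */
void string_wrap(string_t *sp, char *buf, size_t size)
{
    sp->start = buf;
    sp->current = buf;
    sp->end = buf + size;      /* one past the last byte of the array */
    buf[0] = '\0';
}

/* Bytes still free, keeping one in reserve for the terminating '\0'. */
size_t string_room(const string_t *sp)
{
    return (size_t)(sp->end - sp->current) - 1;
}

/* Hypothetical helper: append src only if it fits.
   Returns 0 on success, -1 if src would run past the end. */
int string_append(string_t *sp, const char *src)
{
    size_t n = strlen(src);

    if (n > string_room(sp))
        return -1;                     /* would overwrite the end: refuse */
    memcpy(sp->current, src, n + 1);   /* copies the '\0' too */
    sp->current += n;
    return 0;
}
```

With this arrangement the called function still receives extra information (the end pointer); it is merely bundled with the string instead of passed as a separate argument.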

Next question?

Karl W. Z. Heuer (karl@ima.isc.com or uunet!ima!karl), The Walking Lint

gwyn@smoke.brl.mil (Doug Gwyn) (11/10/90)

In article <16752@hydra.gatech.EDU> gt4512c@prism.gatech.EDU (BRADBERRY,JOHN L) writes:
>When passing strings to other functions, what is the BEST way to find
>the bytes remaining in the formal string parameter (to prevent
>overwriting the end while in the function)?? Does it involve using the
>current starting address of the string parameter and calculating
>(somehow) the DEFINED end??

What on earth do you mean?  You don't pass strings to C functions; the
best you can do is pass pointers to arrays containing chars.  There is
no way to determine in the called function where the end of the array
allocation might be, given merely a pointer into it.

Conventionally, C programming relies heavily on 0-terminated char arrays
to represent character strings, but the 0 terminator value does not
normally indicate anything about the valid extent of the array within
which it lies.  (For string LITERALS it does, but you can't count on
being able to write into a string literal.  Some systems put them into
read-only memory.)
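
A small sketch of the distinction drawn above, with hypothetical function names: the 0 terminator gives the length of the string, while the size of the array is visible only where the array itself is declared.

```c
#include <stddef.h>
#include <string.h>

/* Inside the function only a pointer is visible: strlen() finds the
   0 terminator, but nothing here reveals how large the array is. */
size_t visible_length(const char *p)
{
    return strlen(p);
}

/* The caller, which can see the array declaration, is the only place
   where sizeof yields the allocation size. */
size_t caller_capacity(void)
{
    char buf[32] = "hi";       /* 2 characters stored, 32 bytes allocated */

    (void)visible_length(buf); /* the callee sees length 2, not 32 */
    return sizeof buf;
}
```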

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (11/10/90)

In article <16758@hydra.gatech.EDU>, gt4512c@prism.gatech.EDU (BRADBERRY,JOHN L) writes:
> Just a note of clarification here...I am talking about a character array
> and I am looking for a solution (not the obvious '...add another length
> parameter')...I would like the function to be able to 'figure it out!'

When you pass an array to a function, the function gets a pointer.
Even ANSI C has no way of declaring a function argument that really
_is_ an array.  The syntax "SomeType an_arg[]" declares a pointer.
That pointer looks just like any other pointer into that array.
The function can't even find the _beginning_ of the array, let alone
the end.  (Not portably, at any rate.)
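
To illustrate (the function name chars_from is made up for this sketch): a parameter written with array syntax behaves as a pointer, so it can be incremented, and the function cannot tell whether it was handed the start of an array or some position inside one.

```c
#include <stddef.h>

/* Declared with array syntax, but an_arg is a char *: it can be
   incremented, which a true array object could not be. */
size_t chars_from(char an_arg[])
{
    size_t n = 0;

    while (*an_arg != '\0') {
        an_arg++;
        n++;
    }
    return n;
}
```

Given char word[16] = "hello", the calls chars_from(word) and chars_from(word + 2) look identical from inside the function; neither reveals that the array has 16 bytes.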

If you want your function to receive things like one-dimensional arrays
in Smalltalk or Lisp or Pop or Algol 68 or PL/I, then you'll have to
implement that data structure yourself using C primitives.

-- 
The problem about real life is that moving one's knight to QB3
may always be replied to with a lob across the net.  --Alasdair Macintyre.

fraser@bilby.cs.uwa.oz.au (Fraser Wilson) (11/10/90)

In <16752@hydra.gatech.EDU> gt4512c@prism.gatech.EDU (BRADBERRY,JOHN L) writes:

>When passing strings to other functions, what is the BEST way to find
>the bytes remaining in the formal string parameter (to prevent
>overwriting the end while in the function)?? Does it involve using the
>current starting address of the string parameter and calculating
>(somehow) the DEFINED end??

The language supports no feature like this.  All the function knows is
that it has a pointer to char.  If you want to know how much space is
available, you have to do it yourself.

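eg
	#include <stdio.h>

	#define LAST_CHARACTER	'\1'	/* sentinel stored in the last byte */
	char s[32];

	/* Count the bytes before the sentinel.  This only works while
	   nothing overwrites the sentinel or stores '\1' before it. */
	int space(char *s)
	{
	int i;
		for(i=0;s[i]!=LAST_CHARACTER;i++)
			;
		return i;
	}

	int main(void)
	{
		s[31]=LAST_CHARACTER;
		printf("%d\n",space(s));
		return 0;
	}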

>Thanks for any help here...

You're welcome.  Hope it does.

Fraser.

hagins@gamecock.rtp.dg.com (Jody Hagins) (11/11/90)

In article <16758@hydra.gatech.EDU>, gt4512c@prism.gatech.EDU (BRADBERRY,JOHN L) writes:
|> Just a note of clarification here...I am talking about a character array
|> and I am looking for a solution (not the obvious '...add another length
|> parameter')...I would like the function to be able to 'figure it out!'


There are several ways to do this.

1.

typedef struct
    {
    int		size;	/* Alternatively, this could be the last address */
    char *	s;
    } string_t;


void strinit(string_t *sp, char *cp, int size)
{
    sp->s = cp;
    sp->size = size;
}


main()
{
    char	some_var[SOME_SIZE];
    string_t	string;

    strinit(&string, some_var, SOME_SIZE);

    ...
}

Whenever you want to use the C string functions, send string.s.
If you want to use your own, send &string and access both start address
and the length.
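
As a sketch of "your own" function in this scheme, here is a bounded copy that consults the stored size. The name string_set is hypothetical; the struct and strinit() are repeated from above so the fragment stands alone.

```c
#include <stddef.h>
#include <string.h>

typedef struct
    {
    int     size;   /* total bytes in the array, including room for '\0' */
    char *  s;
    } string_t;

void strinit(string_t *sp, char *cp, int size)
{
    sp->s = cp;
    sp->size = size;
}

/* Hypothetical helper: copy src into the wrapped array, refusing
   anything that would not fit.  Returns 0 on success, -1 if too long. */
int string_set(string_t *sp, const char *src)
{
    size_t need = strlen(src) + 1;      /* characters plus '\0' */

    if (sp->size <= 0 || need > (size_t)sp->size)
        return -1;
    memcpy(sp->s, src, need);
    return 0;
}
```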


2.

#define DEFINED_EOS	((char)1)	/* Any char you can guarantee not in a string */
#define FILL_CHAR	'\0'		/* Any char except DEFINED_EOS */

char *strinit(char *s, int size)
{
    int		i;

    for(i=0; i<size; i++)
        s[i] = FILL_CHAR;
    s[size] = DEFINED_EOS;
    return(s);
}

int defined_strlen(char *s)
{
    int		i=0;

    while (*s++ != DEFINED_EOS)
        i++;
    return(i);
}

main()
{
    char	s[SOME_SIZE+1];		/* For this implementation, you will need one
					   extra byte to store the defined-end-of-string
					   location.  Otherwise, it could get overwritten
					   by '\0'. */

    strinit(s, SOME_SIZE);

    ...

}

Now, the C routines work, and you can get the defined length of any
string by calling defined_strlen(s).
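
A self-contained sketch of the sentinel scheme, assuming the value 1 never appears in the data. The initializer is renamed sentinel_init here to keep it apart from strinit() in method 1, and defined_room is a hypothetical addition that answers the "bytes remaining" question.

```c
#include <stddef.h>
#include <string.h>

#define DEFINED_EOS ((char)1)   /* must never occur inside a stored string */
#define FILL_CHAR   '\0'

/* s must have size + 1 bytes: size usable bytes plus the sentinel. */
char *sentinel_init(char *s, int size)
{
    memset(s, FILL_CHAR, (size_t)size);
    s[size] = DEFINED_EOS;
    return s;
}

/* Distance from s to the sentinel, i.e. the defined size from s onward. */
int defined_strlen(const char *s)
{
    int i = 0;

    while (*s++ != DEFINED_EOS)
        i++;
    return i;
}

/* Hypothetical helper: characters that can still be appended after the
   current contents, keeping one usable byte for '\0'. */
int defined_room(const char *s)
{
    return defined_strlen(s) - (int)strlen(s) - 1;
}
```

Because defined_strlen() simply scans forward, it also works on a pointer into the middle of the array. The cost is a scan of the whole array on every call, and the scheme fails silently if anything ever overwrites the sentinel.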



Hope this helps.

Jody Hagins
hagins@gamecock.rtp.dg.com

gt4512c@prism.gatech.EDU (BRADBERRY,JOHN L) (11/11/90)

The 'curt' response here is unnecessary! The question was raised 
only because this can be done in many other languages (easily!) I was
simply looking for a few creative ideas if possible in C. If it's not
possible (the current average response) a simple 'it can't be done in C'
would suffice...Maturity seems elusive for some...


bhoughto@cmdnfs.intel.com (Blair P. Houghton) (11/11/90)

After much email, John's got the picture. (Hint.)

In article <14411@smoke.brl.mil> gwyn@smoke.brl.mil (Doug Gwyn) writes:
>In article <16752@hydra.gatech.EDU> gt4512c@prism.gatech.EDU (BRADBERRY,JOHN L) writes:
>>When passing strings to other functions, what is the BEST way to find
>>the bytes remaining in the formal string parameter (to prevent over-
>
>What on earth do you mean?  You don't pass strings to C functions; the
>best you can do is pass pointers to arrays containing chars.  There is
>no way to determine in the called function where the end of the array
>allocation might be, given merely a pointer into it.

"pointers to arrays containing chars"?

    typedef char A[SIZE];
    ...
	A *foo;
	...
	func(foo);


    func( A *bar )	/* bar points to array of chars */
    {
	printf( "%d\n", (int) sizeof (*bar) );	/* sizeof is not an int; cast for %d */
    }

Note, however, that foo may not point to an arbitrary
string (unless you're careful to put all your strings in
these arrays).  Note also that we're not really using an
arbitrary string, but only one that's stored in an
array SIZE of char.  Note also that, since it is a pointer
to an array rather than a pointer to char, you have to get
the chars through `*foo[i]' or `**foo' rather than `*foo'
or `foo[i]'.

The most important thing to note is that `func()' *knows*
the size of the array.  You have compiled it in.  This
defeats certain purposes.

The usual convention for "passing strings" is to pass a
pointer to the first character of the array rather than a
pointer to the array.  This depends on the fact that you've
stored the string as chars in contiguous memory locations
and on the assumption that the last character of the string
is a '\0'.  This is also how strings are handled in all of
the library functions and system calls you're likely to
encounter.  It provides flexibility; otherwise, all strings
would have a minimum storage allocation and a maximum
length (although, technically, with the current convention
the minimum usable is one char ('\0'), and certain features
of systemic i/o often lead one to limit the SIZE
consistently to BUFSIZ...)).

The only reason you should want the size of the array a
string is stored in is if you are at some point more
interested in the array than the string (as is fgets(),
for example, which of course asks for a pointer to the
first character's location and for the size).
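
A sketch of the fgets() convention mentioned above, where the caller supplies both the pointer and the array size. The wrapper name read_line is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper: read one line into buf, which has `size` bytes.
   fgets() stores at most size - 1 characters plus '\0', so it cannot
   run past the end of the array.  Returns the stored length, or -1
   at end of file or on error. */
int read_line(FILE *fp, char *buf, int size)
{
    size_t len;

    if (fgets(buf, size, fp) == NULL)
        return -1;
    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n')
        buf[--len] = '\0';     /* drop the newline if the whole line fit */
    return (int)len;
}
```

A typical call is read_line(stdin, line, (int)sizeof line), made where line is still visible as an array. If the input line is longer than the buffer, the remainder is returned by the next call.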

				--Blair
				  "Did I hear an 'oops'?"

bhoughto@cmdnfs.intel.com (Blair P. Houghton) (11/13/90)

In article <921@inews.intel.com> bhoughto@cmdnfs.intel.com (Blair P. Houghton) writes:
>"pointers to arrays containing chars"?
>    typedef char A[SIZE];
>    A *foo;
>
>Note also that, since it is a pointer
>to an array rather than a pointer to char, you have to get
>the chars through `*foo[i]' or `**foo' rather than `*foo'
>or `foo[i]'.

I'm sorry, this is wrong.  The `[]' have a higher precedence
than the `*', so it would have to be `(*foo)[]' for the subscripted
version, if `foo' were a pointer to a pointer.

In order to use `foo' (as anything other than NULL)
you have to provide an array of the proper type, properly
allocated, and then assign the "address" of that array to `foo'. 
But, in the act of assigning that address, you turn the
array into the pointer to its first element.  There is in
fact no way to get the address of an array without actually
getting the address of the first element.  This still does not
make the two of them compatible types[*].  However, it means
that `**foo' is not correct.

`*foo' is the proper pointer dereference to get `array[0]',
`foo[i]' is the proper subscripted version, even though
`foo' is a proper pointer-to-array.

[*]  The assignment `foo = &bar' is right, the assignment
`foo = bar' is incorrect (gcc -ansi just produces a
warning, but, if you believe in the Pointer Fairy (as you
and Doug know I do :-), you believe that all pointers are the
same, anyway, so `foo = bar' works even though it's hubris;
but don't trust it).
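
For reference, a sketch of how an ISO C compiler treats the `A *foo' declaration discussed in these two posts. It is not taken from either post; SIZE is given an arbitrary value and the function names are hypothetical. In ISO C, `&text' yields a pointer to the whole array, `(*foo)[i]' subscripts that array, and `**foo' is its first character.

```c
#include <stddef.h>

#define SIZE 16                /* arbitrary value for the sketch */
typedef char A[SIZE];

/* bar points to a whole array of SIZE chars, so sizeof *bar is SIZE,
   fixed at compile time. */
size_t array_size(A *bar)
{
    return sizeof *bar;
}

/* (*bar)[i] subscripts the array that bar points to.  The parentheses
   are needed because [] binds more tightly than unary *. */
char char_at(A *bar, size_t i)
{
    return (*bar)[i];
}

/* **bar is the first character: *bar is the array, which converts to a
   pointer to its first element, and the second * dereferences that. */
char first_char(A *bar)
{
    return **bar;
}
```

Given A text = "pointer" and A *foo = &text, the pointer foo holds the same address as &text[0] but keeps its distinct pointer-to-array type, which is why sizeof *foo is SIZE rather than 1.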

				--Blair
				  "The PF still owes me a quarter.
				   She's probably just waiting for those
				   pointer-subtraction semantics I promised
				   I'd post... :-/"

weimer@ssd.kodak.com (Gary Weimer) (11/13/90)

In article <1990Nov09.183957.15122@dirtydog.ima.isc.com> karl@ima.isc.com (Karl Heuer) writes:
>In article <16758@hydra.gatech.EDU> gt4512c@prism.gatech.EDU (BRADBERRY,JOHN L) writes:
>>Just a note of clarification here...I am talking about a character array
>>and I am looking for a solution (not the obvious '...add another length
>>parameter')...I would like the function to be able to 'figure it out!'
>
>Here are the options that spring to mind:
>(a) Pass a length parameter.
>(b) Pass a pointer to the end of the string.
>(c) Implement a string structure that does one of the above for you, e.g.
>    typedef struct { char *start; char *current; char *end; } string_t;
>(d) Use only implementations that support the "Read Operator's Mind" syscall.

(e) Use the "standard" C workaround for this problem.

As other people have pointed out, this question looks like it was posed
by a C crossover from another language, so why don't we tell them what C
can do, instead of what it can't.

NOTE: for both solutions given below, don't forget to count the extra
space (byte, or whatever you want to call it) required by the end-of-string
character (\0).

EASY SOLUTION:

If all the strings you will be using are less than some number N (and
you have enough memory), then create a constant:

    #define MAX_LEN        N

where N can be any number greater than 0 (I like 255 for most cases). Now
define all your character arrays as:

    char name[MAX_LEN];

when performing loops, range checking, etc., use MAX_LEN 
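
A minimal sketch of such a range-checked loop, assuming N is 255. The helper name copy_name is hypothetical.

```c
#include <stddef.h>

#define MAX_LEN 255            /* size of every character array, '\0' included */

/* Hypothetical helper: copy src into an array of MAX_LEN bytes,
   truncating if necessary.  Returns the number of characters stored. */
size_t copy_name(char *name, const char *src)
{
    size_t i;

    for (i = 0; i < MAX_LEN - 1 && src[i] != '\0'; i++)
        name[i] = src[i];
    name[i] = '\0';            /* i is at most MAX_LEN - 1, so this fits */
    return i;
}
```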

ROBUST SOLUTION:

If you don't have a maximum length, or can't afford to waste memory, use
character pointers and malloc() memory as it is needed. This will allow
you to continue using the C string library; however, functions like strcat()
should probably be avoided (unless you malloc'd enough space for this). An
example (note I didn't say good) of a strcat() replacement is:

    char *mystrcat(char *s, char *t)
    {
        char *str;

        str = (char *) malloc(strlen(s) + strlen(t) + 1);
        if (str == NULL)     /* malloc() failed; s is left untouched */
            return(NULL);

        strcpy(str, s);
        strcat(str, t);
        /* NOTE: these next 2 statements disallow passing an */
        /*       array of char as s (use strcat() for this)  */
        free(s);
        s = str;

        return(s);
    }
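
An alternative sketch that swaps in realloc() in place of the malloc()/copy/free() sequence. The name append_alloc is hypothetical, and the same restriction applies: s must be NULL or a pointer obtained from malloc(), never an array of char.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant: s must be NULL or a pointer obtained from
   malloc().  Returns the (possibly moved) string, or NULL on failure,
   in which case s is still valid and unchanged. */
char *append_alloc(char *s, const char *t)
{
    size_t slen = (s != NULL) ? strlen(s) : 0;
    char *str = realloc(s, slen + strlen(t) + 1);

    if (str == NULL)
        return NULL;
    strcpy(str + slen, t);     /* write t over the old terminator */
    return str;
}
```

The caller has to keep using the returned pointer, since the old one may no longer be valid after a successful call.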

With this solution, you will still want one string of some maximum size
to read in strings of unknown length. This could then be copied to a
string of the appropriate size (strdup() might be a good method):

    char *strdup(char *s) /* no, not a C library function */
    {
        char *str;
        str = (char *) malloc(strlen(s) + 1);
        strcpy(str, s);
        return(str);     /* notice that s is unchanged, and could */
                         /* have been declared: char s[MAX_LEN]   */
    }

(OH BOY, now I get to see how many people think this is stupid...)

Gary Weimer

gt4512c@prism.gatech.EDU (BRADBERRY,JOHN L) (11/14/90)

In article < 34449 weimer@ssd.kodak.com> (Gary Weimer) writes:

>>>In article <16758@hydra.gatech.EDU> gt4512c@prism.gatech.EDU
>>>(BRADBERRY,JOHN L) writes:
.
.
.
>>>Here are the options that spring to mind:
>>>(a) Pass a length parameter.
>>>(b) Pass a pointer to the end of the string.
>>>(c) Implement a string structure that does one of the above for
>>>you, e.g.
>>>    typedef struct { char *start; char *current; char *end; }
>>>string_t;
>>>(d) Use only implementations that support the "Read Operator's
>>>Mind" syscall.
>>
>>(e) Use the "standard" C workaround for this problem.

>
>As other people have pointed out, this question looks like it was
>posed by a C crossover from another language, so why don't we
>tell them what C can do, instead of what it can't.
>

Actually, in the graphics and signal processing area, I frequently
have to port (rewrite) thousands of lines of code from other
languages to C. In the process, I find it quite interesting to
attempt where practical (possible) to duplicate some features so
that the code algorithms appear as similar as possible. C makes
this possible more often than not. 

The original post was in no way a criticism of C (I think the
language is tremendous!), but a question of how something might be
done! To date I've gotten close to 100 very creative 'workarounds'
which is the next best thing to an exact solution. For that I am
very thankful because I'm sure few of us would like to
'intentionally' recreate the wheel...

hamish@mate.sybase.com (Just Another Deckchair on the Titanic) (11/15/90)

In article <931@inews.intel.com> bhoughto@cmdnfs.intel.com (Blair P. Houghton) writes:
>
<				--Blair
>				  "The PF still owes me a quarter.
<				   She's probably just waiting for those
>				   pointer-subtraction semantics I promised
<				   I'd post... :-/"

Ummm, Blair, the pointer fairy keeps telling me it was pointer
*addition* you owe her for. My how the humble are risen....

This is your past tapping you on the shoulder...

	Hamish
----------------------------------------------------------------------------
Hamish Reid           Sybase Inc, 6475 Christie Ave, Emeryville CA 94608 USA
+1 415 596-3917       hamish@sybase.com       ...!{mtxinu,sun}!sybase!hamish

msb@sq.sq.com (Mark Brader) (11/17/90)

> EASY SOLUTION:
> If all the strings you will be using are less than some number N (and
> you have enough memory), then create a constant:
>     #define MAX_LEN        N
> ... Now define all your character arrays as:
>     char name[MAX_LEN];
> when performing loops, range checking, etc., use MAX_LEN 

It generally seems to me to produce clearer code if the constant that
one defines specifies, not the length of the buffer (as above), but
the maximum length of the string contained in it.  That is:

	char name[MAX_LEN+1];		/* +1 for '\0' */

If you use this declaration style routinely, you get rid of a lot
of -1's scattered through the code wherever there are loops and limit
checks; and if you do make an off-by-one error, it tends to fail safe.
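
A short sketch of a limit check written in this style. The value 8 and the helper name store_name are arbitrary choices for the example.

```c
#include <stddef.h>

#define MAX_LEN 8              /* longest string, not counting '\0' */

/* Hypothetical helper: copy at most MAX_LEN characters into an array
   declared as char name[MAX_LEN+1]. */
size_t store_name(char *name, const char *src)
{
    size_t i;

    for (i = 0; i < MAX_LEN && src[i] != '\0'; i++)   /* no -1 needed */
        name[i] = src[i];
    name[i] = '\0';            /* index MAX_LEN at most: the +1 byte */
    return i;
}
```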

It is conceded that the preference for this is somewhat a matter of
opinion, and followups merely to agree or disagree with this opinion
are dissuaded.  Likewise for the presence or absence of the comment.
-- 
Mark Brader			"It's simply a matter of style, and while there
SoftQuad Inc., Toronto		 are many wrong styles, there really isn't any
utzoo!sq!msb, msb@sq.com	 one right style."	-- Ray Butterworth

This article is in the public domain.

bhoughto@cmdnfs.intel.com (Blair P. Houghton) (11/19/90)

In article <11757@sybase.sybase.com> hamish@mate.sybase.com (Just Another Deckchair on the Titanic) writes:
>Ummm, Blair, the pointer fairy keeps telling me it was pointer
>*addition* you owe her for. My how the humble are risen....

Addition, subtraction, who can tell the diff? :-)

				--Blair
				  "Sedition, Abstraction, who
				   can tell a Wizard?"

boyd@necisa.ho.necisa.oz (Boyd Roberts) (11/20/90)

In article <1990Nov17.070228.29295@sq.sq.com> msb@sq.sq.com (Mark Brader) writes:
>It generally seems to me to produce clearer code if the constant that
>one defines specifies, not the length of the buffer (as above), but
>the maximum length of the string contained in it.  That is:
>
>	char name[MAX_LEN+1];		/* +1 for '\0' */

Ever written any Pascal?  Therein lies madness.

All these fixed length character strings are a poor man's solution.
Dynamic structures aren't that hard to code up, and once you have
the right library routines it's trivial to use them with future code.
Code them once, use them everywhere.
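
A minimal sketch of what such a routine set might look like. The dynstr type and its functions are hypothetical, and the starting size of 16 with doubling is just one possible growth policy.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string: data always holds a '\0'-terminated
   string of `len` characters in an allocation of `cap` bytes. */
typedef struct { char *data; size_t len; size_t cap; } dynstr;

/* Start empty; returns 0 on success, -1 if malloc() fails. */
int dynstr_init(dynstr *d)
{
    d->cap = 16;
    d->len = 0;
    d->data = malloc(d->cap);
    if (d->data == NULL)
        return -1;
    d->data[0] = '\0';
    return 0;
}

/* Append src, doubling the allocation until it fits. */
int dynstr_append(dynstr *d, const char *src)
{
    size_t n = strlen(src);
    size_t cap = d->cap;
    char *p;

    while (d->len + n + 1 > cap)
        cap *= 2;
    if (cap != d->cap) {
        p = realloc(d->data, cap);
        if (p == NULL)
            return -1;         /* d is unchanged and still usable */
        d->data = p;
        d->cap = cap;
    }
    memcpy(d->data + d->len, src, n + 1);
    d->len += n;
    return 0;
}

void dynstr_free(dynstr *d)
{
    free(d->data);
    d->data = NULL;
    d->len = d->cap = 0;
}
```

d.data can be handed to any library function that only reads the string; anything that lengthens it has to go through dynstr_append().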

Only today, did my mail user agent trash the RFC 822 address parser
because it [the parser*] decided that lines _never_ exceeded 256
characters.  My user agent says lines are as long as you've got virtual
memory for.  So it's time to persuade the parser about dynamics.

Should be easy.


Boyd Roberts			boyd@necisa.ho.necisa.oz.au

``When the going gets weird, the weird turn pro...''

* Snarfed off the net at some stage.

martin@mwtech.UUCP (Martin Weitzel) (11/24/90)

In article <1946@necisa.ho.necisa.oz> boyd@necisa.ho.necisa.oz (Boyd Roberts) writes:
[in a thread dealing with the C representation for character strings]
>
>All these fixed length character strings are a poor man's solution.
>Dynamic structures aren't that hard to code up, and once you have
>the right library routines it's trivial to use them with future code.
>Code them once, use them everywhere.

I can second this. I once tried to enhance C's elegant and space
efficient character string representation with dynamic space
allocation. I ended up in a similar trick as the one often used
in malloc:

Character strings of varying length were represented by a pointer
to the first byte and terminated by a '\0'-character - exactly the
way C does this normally. So, a program could normally use "char *" variables
for such strings and hand them to all functions with only read-access
in the normal way.
	                                    +----------+
	for "read only" access              |   '\0'   |
	use normally, for write             | ........ |
	access use special functions        | 3rd char |
	+--------+                          | 2nd char |
	| char * -------------------------> | 1st char |
	+--------+                          +----------+
	    ^                               |  length  |
	    |                               +----------+
	    +--------------------------------- char ** |
	            back-pointer            +----------+

Additionally, for such varying strings there was a special setup
operation for the said "char *" variable, which allocated space for
the string itself and additionally, at some address immediately *below*
the string, space for a length field and a back-pointer to the
variable. After such an initialization, things looked basically as
above.

There were some special functions that - when changing the string -
looked at the length field and reallocated the space if necessary.
The only thing which had to be done carefully (besides only using
the special functions to change them and not forgetting to free them
when the pointer goes out of scope) was not to set up pointers into
them, as the reallocation could only change the one reference which
it knew from the back-pointer. This meant that these strings should
never be passed to functions as arguments, and then changed more
than once from within the function ...

(Well, no problem without solution: One could either pass pointers
to them or "swap them out and in" from a local instance.)
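
A rough sketch of the layout described above, assuming a C99 compiler for the flexible array member. All names (vstr, vstr_setup, vstr_assign, vstr_release) are hypothetical and this is a reconstruction of the idea, not the original code.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header stored immediately below the characters. */
typedef struct {
    char **owner;      /* back-pointer to the one char * variable */
    size_t length;     /* bytes allocated for text, including '\0' */
    char text[];       /* the ordinary '\0'-terminated string (C99) */
} vstr;

/* Step back from the character pointer to the header below it. */
vstr *vstr_header(char *s)
{
    return (vstr *)(s - offsetof(vstr, text));
}

/* Set up *var as a varying string with room for `length` bytes. */
int vstr_setup(char **var, size_t length)
{
    vstr *v;

    if (length == 0)
        length = 1;            /* always room for the '\0' */
    v = malloc(offsetof(vstr, text) + length);
    if (v == NULL)
        return -1;
    v->owner = var;
    v->length = length;
    v->text[0] = '\0';
    *var = v->text;
    return 0;
}

/* Replace the contents, reallocating when src does not fit.  The
   back-pointer lets the function fix up the owning variable. */
int vstr_assign(char *s, const char *src)
{
    vstr *v = vstr_header(s);
    size_t need = strlen(src) + 1;

    if (need > v->length) {
        vstr *bigger = realloc(v, offsetof(vstr, text) + need);
        if (bigger == NULL)
            return -1;
        v = bigger;
        v->length = need;
        *v->owner = v->text;   /* the only reference that gets updated */
    }
    memcpy(v->text, src, need);
    return 0;
}

void vstr_release(char **var)
{
    free(vstr_header(*var));
    *var = NULL;
}
```

After vstr_setup(&name, 4), a call such as vstr_assign(name, "a much longer string") may move the storage; name itself is corrected through the back-pointer, but any copy of the old pointer is left dangling, which is the hazard described above.
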
-- 
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83