[comp.lang.c] best way to return

joe@gistdev.UUCP (06/20/89)

Here is a question I haven't seen recently, and I'd like to get opinions from
the collective wisdom of the group.  Suppose I am writing a function that is
going to construct a character string, and is going to return a pointer to
that string.  What is the best way to do this so that your pointer is sure
to be valid when used?  I have seen several approaches to this problem:

    .	Have the caller pass a (char *) and let the caller worry about
	allocating whatever space is needed.

    .	Have the routine malloc() space, and let the caller free() it when
	done with the returned pointer.

    .	Have the routine allocate the buffer pointed to by the returned
	(char *) as a static.

    .	Assume it's the caller's problem to strcpy() (or other such) from the
	pointer before something else can use the space.

    .	Don't worry about it at all -- nothing is going to trash your memory
	at the pointed-to address before you can actually use it.

I'm sure there are other approaches, but these were the ones I could think of
off the top of my head.  In general, how _should_ this be done to be safest?

-------------------------------------------------------------------------------
Joe Brownlee       | Captain, please -- not in front of the Klingons...
GIST, Inc.         |                                -- Mr. Spock, Star Trek V
1800 Woodfield Dr. | Pay attention to what I say, and you might start a trend.
Savoy, IL 61874	   | ARPANET: joe%gistdev@uxc.cso.uiuc.edu
(217) 352-1165	   | UUCP   : {uunet,pur-ee,convex}!uiucuxc!gistdev!joe
-------------------------------------------------------------------------------

maart@cs.vu.nl (Maarten Litmaath) (06/23/89)

joe@gistdev.UUCP writes:
\...  Suppose I am writing a function that is
\going to construct a character string, and is going to return a pointer to
\that string.  What is the best way to do this so that your pointer is sure
\to be valid when used?  I have seen several approaches to this problem:
\
\    .	Have the caller pass a (char *) and let the caller worry about
\	allocating whatever space is needed.

That's the way, I tell thee! But who am I, since this macro business?

\    .	Have the routine malloc() space, and let the caller free() it when
\	done with the returned pointer.

In general you want to deal with the memory all on the same level.
It simplifies administration.

\    .	Have the routine allocate the buffer pointed to by the returned
\	(char *) as a static.

In general: NO! Consider routines like getpwent(): if you want to keep the
info, you have to copy it yourself, doubling the work. I say: if the caller
wants a static buffer, let HIM do the arrangements. He's quite competent.

\    .	Assume it's the caller's problem to strcpy() (or other such) from the
\	pointer before something else can use the space.

That's precisely what you want to avoid: HOW can you be SURE some other
(low-level) routine doesn't invoke the function too, thereby destroying YOUR
data? Consider something like printf() invoking malloc(). (I KNOW this isn't
a very good example.)

\    .	Don't worry about it at all -- nothing is going to trash your memory
\	at the pointed-to address before you can actually use it.

Huh?
-- 
"I HATE arbitrary limits, especially when |Maarten Litmaath @ VU Amsterdam:
   they're small."  (Stephen Savitzky)    |maart@cs.vu.nl, mcvax!botter!maart

mpl@cbnewsl.ATT.COM (michael.p.lindner) (06/23/89)

In article <7800013@gistdev>, joe@gistdev.UUCP writes:
> Here is a question I haven't seen recently, and I'd like to get opinions from
> the collective wisdom of the group.  Suppose I am writing a function that is
> going to construct a character string, and is going to return a pointer to
> that string.  What is the best way to do this so that your pointer is sure
> to be valid when used?  I have seen several approaches to this problem:
	I don't know if I qualify as collective wisdom, but here's my opinion.

>     .	Have the caller pass a (char *) and let the caller worry about
> 	allocating whatever space is needed.
Bad.  In general, the caller knows little about the expected size, so must
pass a large array.  Also, if the thing overflows, the callee has no way of
knowing unless you add args to describe the size, which is ugly.  Even then,
there is no sane thing which can be done on overflow, since the callee doesn't
know where the array is coming from.

>     .	Have the routine malloc() space, and let the caller free() it when
> 	done with the returned pointer.
Bad.  Lots of times the caller doesn't need the space malloc'd, and it's
a big pain to remember what to free (so much so that many many people will
forget to free it).

>     .	Have the routine allocate the buffer pointed to by the returned
> 	(char *) as a static.
AND
>     .	Assume it's the caller's problem to strcpy() (or other such) from the
> 	pointer before something else can use the space.
if they need to.  I usually do this, where I can.

Of course, my opinions are my own...

Mike Lindner
attunix!mpl
AT&T Bell Laboratories
190 River Rd.
Summit, NJ 07901

chris@mimsy.UUCP (Chris Torek) (06/23/89)

In article <7800013@gistdev> joe@gistdev.UUCP writes:
>... Suppose I am writing a function that is going to construct a
>character string, and is going to return a pointer to that string.
>What is the best way to do this so that your pointer is sure
>to be valid when used?

What you are asking is not `How does one go about returning an object
of type pointer-to-char?', but rather `Where should one allocate space
for the characters?'.  This question does not have a single best answer;
there is not sufficient information here to choose one.  Most of the
approaches you listed are reasonable in some contexts, although if I
am right in my interpretation of this one:

>    .	Don't worry about it at all -- nothing is going to trash your memory
>	at the pointed-to address before you can actually use it.

it is a bad idea.  (My interpretation is that you mean something like

	char *fn() {
		char buf[SIZE];		/* but this is automatic storage */
		...
		return (buf);
	}

This approach is particularly dangerous precisely because it often
works.)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

bill@twwells.com (T. William Wells) (06/23/89)

In article <2793@solo8.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
: joe@gistdev.UUCP writes:
: \...  Suppose I am writing a function that is
: \going to construct a character string, and is going to return a pointer to
: \that string.  What is the best way to do this so that your pointer is sure
: \to be valid when used?  I have seen several approaches to this problem:
: \
: \    .        Have the caller pass a (char *) and let the caller worry about
: \     allocating whatever space is needed.
:
: That's the way, I tell thee! But who am I, since this macro business?
:
: \    .        Have the routine malloc() space, and let the caller free() it when
: \     done with the returned pointer.
:
: In general you want to deal with the memory all on the same level.
: It simplifies administration.

No. In this kind of thing, it makes life much more complex. The
fundamental problem is this: if the caller makes the allocation
decisions, the caller may well be wrong. That involves complex error
recovery, or it is equivalent to fixed buffer sizes (as far as the
called routine is concerned).

The caller can not do the allocation, not if you want good code; the
called function must do the allocation. Let's consider what this
means. A not atypical function might be one that reads a string from
a file and returns the string in a buffer.

The simple method looks like this:

char *                          /* it gets allocated by mygets */
mygets(stream)
FILE    *stream;
{
}

	ptr = mygets(stdin);
	...
	free(ptr);

But this has several drawbacks: one is that the caller may well fail
to free the pointer, causing allocated memory to grow overmuch;
another is the excessive number of malloc and free calls it requires;
a third is that the caller can't do things like extend the string
without always doing yet another malloc, even though mygets may well
have allocated more space than needed.
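
The body of mygets is left empty in the post. One way it might be filled
in is sketched below. The doubling growth policy, the starting size of 16,
and the dropping of the trailing newline are all assumptions made for
illustration, not details from the post.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read one line from stream into malloc'd storage; the caller must free() it.
 * Returns NULL at EOF with nothing read, or on allocation failure.
 * The trailing newline is dropped (an assumption; the post does not say). */
char *mygets(FILE *stream)
{
    size_t cap = 16, len = 0;   /* starting size is arbitrary */
    char *buf, *tmp;
    int c;

    assert(stream != NULL);
    buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    while ((c = getc(stream)) != EOF && c != '\n') {
        if (len + 1 >= cap) {           /* keep room for the terminating NUL */
            cap *= 2;
            tmp = realloc(buf, cap);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && len == 0) {         /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

Every successful call costs at least one malloc and one free, which is the
overhead the drawbacks above refer to.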

A better method would be something like:

typedef struct XSTRING {
	char    *_xs_string;    /* pointer to the string */
	size_t  _xs_length;     /* bytes in the string */
	size_t  _xs_alloc;      /* allocated length, may be > _xs_length */
	int     _xs_ahint;      /* suggests method of extending the string */
} XSTRING;

(Why the underscores? So that macros like xs_string can be written to
access the structure members.)

int                             /* error result */
xs_gets(stream, xstring)
FILE    *stream;
XSTRING *xstring;

The caller would create an XSTRING by calling an xs_new function and
do all his string work with it. Repeated calls to xs_gets can use the
same XSTRING; there is no requirement to free the string after each
use. When one is done, one would call a dispose function for the
XSTRING; it would be responsible for getting rid of the XSTRING and
the associated string.

To make this really valuable, one should also have xs_* functions to
provide the functionality of the other functions in the C library
which return variable length strings.
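
A minimal sketch of how xs_new, xs_gets and a dispose function could fit
together follows. Only the XSTRING layout and the xs_gets signature come
from the post; the name xs_dispose, the XS_* result codes, the starting
size, and the doubling policy are hypothetical choices for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct XSTRING {
    char   *_xs_string;   /* pointer to the string */
    size_t  _xs_length;   /* bytes in the string */
    size_t  _xs_alloc;    /* allocated length, may be > _xs_length */
    int     _xs_ahint;    /* growth hint; unused in this sketch */
} XSTRING;

#define xs_string(xs) ((xs)->_xs_string)
#define xs_length(xs) ((xs)->_xs_length)

enum { XS_OK = 0, XS_EOF = 1, XS_NOMEM = 2 };   /* hypothetical codes */

/* Create an empty XSTRING; returns NULL if out of memory. */
XSTRING *xs_new(void)
{
    XSTRING *xs = malloc(sizeof *xs);

    if (xs == NULL)
        return NULL;
    xs->_xs_alloc = 32;                 /* arbitrary starting size */
    xs->_xs_string = malloc(xs->_xs_alloc);
    if (xs->_xs_string == NULL) {
        free(xs);
        return NULL;
    }
    xs->_xs_string[0] = '\0';
    xs->_xs_length = 0;
    xs->_xs_ahint = 0;
    return xs;
}

/* Release the XSTRING and the string it owns. */
void xs_dispose(XSTRING *xs)
{
    if (xs != NULL) {
        free(xs->_xs_string);
        free(xs);
    }
}

/* Read one line (newline dropped) into xstring, reusing its storage. */
int xs_gets(FILE *stream, XSTRING *xstring)
{
    size_t len = 0;
    int c, status = XS_OK;

    assert(stream != NULL && xstring != NULL);
    while ((c = getc(stream)) != EOF && c != '\n') {
        if (len + 1 >= xstring->_xs_alloc) {    /* room for the NUL */
            size_t newsize = xstring->_xs_alloc * 2;
            char *tmp = realloc(xstring->_xs_string, newsize);
            if (tmp == NULL) {
                status = XS_NOMEM;
                break;
            }
            xstring->_xs_string = tmp;
            xstring->_xs_alloc = newsize;
        }
        xstring->_xs_string[len++] = (char)c;
    }
    xstring->_xs_string[len] = '\0';
    xstring->_xs_length = len;
    if (status == XS_OK && c == EOF && len == 0)
        status = XS_EOF;
    return status;
}
```

Because the buffer persists across calls, a loop over a file only touches
the allocator when a line is longer than any seen so far.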

---
Bill                    { uunet | novavax | ankh | sunvice } !twwells!bill
bill@twwells.com

steve@umigw.MIAMI.EDU (steve emmerson) (06/23/89)

In article <18234@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>  This question does not have a single best answer;
>there is not sufficient information here to choose one.

Chris is right: without some criteria, it is difficult to choose one
method over another.

I believe you did, however, mention a safety criterion.  In that case,
the _safest_ method is probably to allocate the memory in the called
routine as you can be certain of valid storage.  As someone pointed out,
however, this can lead to allocation and deallocation calls at
different levels.  It can also lead to clutter, if you forget to
deallocate.

A close second in safety, and one which reminds the user to deallocate,
is to have the caller instantiate the buffer.

Both these methods have wide usage.  It's your call.
-- 
Steve Emmerson                     Inet: steve@umigw.miami.edu [128.116.10.1]
SPAN: miami::emmerson (host 3074::)      emmerson%miami.span@star.stanford.edu
UUCP: ...!ncar!umigw!steve               emmerson%miami.span@vlsi.jpl.nasa.gov
"Computers are like God in the Old Testament: lots of rules and no mercy"

joe@gistdev.UUCP (06/23/89)

Thanks for all the responses to my posting on the best way to return (char *)
from a function!  By the way, I was asked about the ridiculous "don't worry
about the pointer being valid" option.  I simply provided that for contrast,
but I _have_ seen that type of thing in some bad programs.  Anyway, if anyone
else would like to contribute their $0.02 worth, send e-mail or reply, and I
will summarize if there is interest.

-------------------------------------------------------------------------------
Joe Brownlee       | Captain, please -- not in front of the Klingons.
GIST, Inc.         |                                -- Mr. Spock, Star Trek V
1800 Woodfield Dr. | Pay attention to what I say, and you might start a trend.
Savoy, IL 61874	   | ARPANET: joe%gistdev@uxc.cso.uiuc.edu
(217) 352-1165	   | UUCP   : {uunet,pur-ee,convex}!uiucuxc!gistdev!joe
-------------------------------------------------------------------------------

henry@utzoo.uucp (Henry Spencer) (06/24/89)

In article <2793@solo8.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
>\    .	Have the routine malloc() space, and let the caller free() it when
>\	done with the returned pointer.
>
>In general you want to deal with the memory all on the same level.
>It simplifies administration.

Untidy though this approach is, it's often the best -- it alone avoids
setting arbitrary bounds on the size of the returned value.  (Of course,
there are situations where the size of the returned value is inherently
bounded...)  The penalties are some loss of efficiency -- malloc and
free take time -- and a management hassle.

If you want to combine high speed and unbounded returned values and are
willing to commit unspeakable acts to do it :-), have the caller pass in
a buffer (and its size!) which is *usually* big enough, and have the
function return either that buffer or (if it's not large enough) malloced
memory.  This avoids the malloc overhead most of the time and still lets
values be of unlimited size.  It's definitely a nuisance to manage, though.
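
The hybrid scheme might look roughly like the sketch below. The function
repeat_char and the helper demo_length are hypothetical, invented only to
show the ownership rule: the caller frees the result if, and only if, it
is not the buffer that was passed in.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Build a string of n copies of c. Uses buf when n + 1 <= bufsize,
 * otherwise returns malloc'd memory (NULL if that fails).
 * The caller must free() the result only when it differs from buf. */
char *repeat_char(char c, size_t n, char *buf, size_t bufsize)
{
    char *out = buf;

    assert(buf != NULL || bufsize == 0);
    if (n + 1 > bufsize) {              /* caller's buffer is too small */
        out = malloc(n + 1);
        if (out == NULL)
            return NULL;
    }
    memset(out, c, n);
    out[n] = '\0';
    return out;
}

/* Typical call site: a small automatic buffer covers the common case. */
size_t demo_length(size_t n)
{
    char small[16];
    char *s = repeat_char('-', n, small, sizeof small);
    size_t len = (s != NULL) ? strlen(s) : 0;

    if (s != small)
        free(s);                        /* free(NULL) is harmless */
    return len;
}
```

The comparison against the original buffer at every call site is the
management nuisance mentioned above.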
-- 
NASA is to spaceflight as the  |     Henry Spencer at U of Toronto Zoology
US government is to freedom.   | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

hascall@atanasoff.cs.iastate.edu (John Hascall) (06/24/89)

In article <894@cbnewsl.ATT.COM> mpl@cbnewsl.ATT.COM (michael.p.lindner) writes:
>In article <7800013@gistdev>, joe@gistdev.UUCP writes:
>> Here is a question I haven't seen recently, and I'd like to get opinions from
>> the collective wisdom of the group.  Suppose I am writing a function that is
>> going to construct a character string, and is going to return a pointer to
>> that string.  What is the best way to do this so that your pointer is sure
>> to be valid when used?  I have seen several approaches to this problem:
 
>>     .	Have the routine allocate the buffer pointed to by the returned
>> 	(char *) as a static.
>AND
>>     .	Assume it's the caller's problem to strcpy() (or other such) from the
>> 	pointer before something else can use the space.
>if they need to.  I usually do this, where I can.
 
   Yuck.  Routines which are not re-entrant are IMHO a "bad thing".

   Too often you want to do some thing like:

	 foo( bar(3), bar(4) );

   and if bar uses static storage for its result you are scr*wed.

   Unfortunately, the solution looks a bit untidy:

	 foo( bar(ptr1, 3), bar(ptr2, 4) );
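
   A small sketch of the difference, with hypothetical bodies for bar
   (the post gives none, and the 32-byte size is an assumption): the
   static version hands back the same address on every call, so the
   second result overwrites the first.

```c
#include <assert.h>
#include <stdio.h>

/* Static-buffer version: every call returns the same address. */
char *bar_static(int n)
{
    static char buf[32];

    sprintf(buf, "item %d", n);
    return buf;
}

/* Caller-buffer version: each result lives in storage the caller owns.
 * The caller must supply at least 32 bytes (an assumption of this sketch). */
char *bar(char *buf, int n)
{
    assert(buf != NULL);
    sprintf(buf, "item %d", n);
    return buf;
}
```

   With bar_static, both arguments of a two-call expression point at one
   buffer holding whichever result was written last; with bar, each
   argument keeps its own text.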


 John Hascall  /  ISU Comp Center  /  Ames, IA

kevin@claris.com (Kevin Watts) (06/25/89)

From article <1989Jun23.170749.23253@utzoo.uucp>, by henry@utzoo.uucp (Henry Spencer):
> If you want to combine high speed and unbounded returned values and are
> willing to commit unspeakable acts to do it :-), have the caller pass in
> a buffer (and its size!) which is *usually* big enough, and have the
> function return either that buffer or (if it's not large enough) malloced
> memory.  This avoids the malloc overhead most of the time and still lets
> values be of unlimited size.  It's definitely a nuisance to manage, though.

A similar approach, used by Apple in their upcoming GS/OS, is as follows:
On input, pass a buffer with its length (including the space for the length)
in the first word (one could use a long instead).  If the buffer is large
enough, the routine uses it, otherwise the routine returns an error code and
puts the size of the buffer it needs into the _second_ word of the buffer.
The caller is supposed to allocate a buffer of that size and call the routine
again, thus:
	char buffer[20];
	char *new_buffer;
	int size;

	*(int *) buffer = 20;
	if (the_routine(buffer) == ERROR) {
		size = *(int *)(buffer+2);	/* second word; assumes 2-byte int */
		new_buffer = malloc(size); /* should check for error here */
		*(int *) new_buffer = size;
		the_routine(new_buffer);
	}
I'm not convinced that this is a clean solution, but it does keep the memory
allocation all at the same level and will work for any size (up to 64K in this
case).


My preference is to use C++ which allows safe allocation and deallocation at
different levels, but this is the wrong group for that.

The other alternative which comes to mind is to use a garbage collector.
-- 
 Kevin Watts        ! Any opinions expressed here are my own, and are not
 Claris Corporation ! necessarily shared by anyone else.  Unless they are
 kevin@claris.com   ! patently absurd, in which case they're not mine either.

gwyn@smoke.BRL.MIL (Doug Gwyn) (06/25/89)

In article <10345@claris.com> kevin@claris.com (Kevin Watts) writes:
>	char buffer[20];
>	*(int *) buffer = 20;

Of course, this fails on a word-oriented architecture when the buffer
happens not to be properly aligned.

diamond@diamond.csl.sony.junet (Norman Diamond) (06/26/89)

Someone:

>>>    .	Have the routine malloc() space, and let the caller free() it when
>>>	done with the returned pointer.

In article <2793@solo8.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:

>>In general you want to deal with the memory all on the same level.
>>It simplifies administration.

In article <1989Jun23.170749.23253@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:

>Untidy though this approach is, it's often the best -- it alone avoids
>setting arbitrary bounds on the size of the returned value.

It's often necessary.  But there ARE other solutions.

>If you want to combine high speed and unbounded returned values and are
>willing to commit unspeakable acts to do it :-), have the caller pass in
>a buffer (and its size!) which is *usually* big enough, and have the
>function return either that buffer or (if it's not large enough) malloced
>memory.

That's not so unspeakable.  Compare it with one more solution, where the
caller remains responsible for the memory AND there are no arbitrary
bounds.  Have the function return either that buffer or (if it's not
large enough) a size of buffer that must be supplied by the caller next
time in order to get the complete result.  Or even more unspeakable,
simply advise the caller that the buffer wasn't big enough, but the
caller should keep the partial result obtained because it won't be
included in the result of the next call, and the total necessary length
isn't even known yet.  Now, would any Unix I/O routine be so unspeakable?
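
The first of these schemes, where the function reports the size it needs
and the caller retries, can be sketched as follows. The function join_path
is a hypothetical example, not something from the thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Join dir and file with '/'. Always returns the number of bytes the full
 * result needs, including the NUL. The result is written only when
 * bufsize is at least that large; otherwise buf is left untouched. */
size_t join_path(const char *dir, const char *file, char *buf, size_t bufsize)
{
    size_t dlen, flen, need;

    assert(dir != NULL && file != NULL);
    dlen = strlen(dir);
    flen = strlen(file);
    need = dlen + 1 + flen + 1;         /* dir, '/', file, NUL */
    if (buf != NULL && bufsize >= need) {
        memcpy(buf, dir, dlen);
        buf[dlen] = '/';
        memcpy(buf + dlen + 1, file, flen + 1);   /* copies the NUL too */
    }
    return need;
}
```

A caller compares the return value with the size it passed; if the value
is larger, it allocates that many bytes and calls again. All allocation
stays on the caller's side and no bound is built into the function.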

--
Norman Diamond, Sony Computer Science Lab (diamond%csl.sony.jp@relay.cs.net)
 The above opinions are claimed by your machine's init process (pid 1), after
 being disowned and orphaned.  However, if you see this at Waterloo, Stanford,
 or Anterior, then their administrators must have approved of these opinions.

ftw@masscomp.UUCP (Farrell Woods) (06/26/89)

In article <7800013@gistdev> joe@gistdev.UUCP writes:

>What is the best way to [construct a string and return a pointer to it] so
>that your pointer is sure
>to be valid when used?  I have seen several approaches to this problem:

>    .	Have the caller pass a (char *) and let the caller worry about
>	allocating whatever space is needed.

You will have to guess how much space you need for the string before you
call the function to build it.  This is not much different from...

>    .	Have the routine allocate the buffer pointed to by the returned
>	(char *) as a static.

...except here you make your guess at compile-time instead of run-time.
Both of these could violate Henry Spencer's Fifth Commandment.

>    .	Have the routine malloc() space, and let the caller free() it when
>	done with the returned pointer.

I assumed that the knowledge of how much space the string requires is
contained within the function building the string.  If the knowledge is more
global, then the caller could take care of malloc/free instead of the callee
doing the malloc and the caller doing the free.  I guess it depends on
whether or not you want to break the malloc/free across two functions.

>    .	Assume it's the caller's problem to strcpy() (or other such) from the
>	pointer before something else can use the space.

Bad assumption.  You might get away with it sometimes, but then it also might
crash in some unfathomable way.

>    .	Don't worry about it at all -- nothing is going to trash your memory
>	at the pointed-to address before you can actually use it.

This is simply asking for trouble.  Can you say "interrupt"?
-- 
Farrell T. Woods				Voice:  (508) 392-2471
Concurrent Computer Corporation			Domain: ftw@masscomp.com
1 Technology Way				uucp:   {backbones}!masscomp!ftw
Westford, MA 01886				OS/2:   Half an operating system

awd@dbase.UUCP (Alastair Dallas) (06/28/89)

I have two favorite "places to store the characters" from the list
given by the original poster.

1) Letting the caller allocate storage and pass a pointer is a sure winner.
	void func(char *)

2) Let the function malloc() storage which the caller then frees.
	char *func()

Other options are just too dangerous on a large multi-programmer project,
in my opinion.  There are some pitfalls even with these two.  Many 
functions are written such that a pointer is expected (#1), but a 0
passed in its place implies method #2.  Thus, #1 becomes:

	char *func(char *)

where the original argument is returned if != 0.  The problem here
is that some callers will malloc() and others will pass 0 and both
results should be freed.  However, still other callers will pass
static and automatic arrays, which must not be freed.  And, finally,
how can the function protect itself from "bad" (e.g. <0) pointers?
At least method #2 keeps the function in control.
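
The combined convention might be sketched like this, using a hypothetical
copy_upper function. The sketch also shows where the ambiguity comes from:
nothing in the returned pointer says whether it may be freed.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Copy src to dst in upper case. If dst is NULL the result is malloc'd
 * and the caller must free() it; otherwise dst must hold strlen(src) + 1
 * bytes, and dst itself is returned. Returns NULL if malloc fails. */
char *copy_upper(const char *src, char *dst)
{
    size_t i, n;

    assert(src != NULL);
    n = strlen(src);
    if (dst == NULL) {
        dst = malloc(n + 1);
        if (dst == NULL)
            return NULL;
    }
    for (i = 0; i < n; i++)
        dst[i] = (char)toupper((unsigned char)src[i]);
    dst[n] = '\0';
    return dst;
}
```

Whether the result should be freed depends entirely on what the caller
passed in, so that knowledge has to travel with the pointer by hand.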

/alastair/

scs@envy.pika.mit.edu (Steve Summit) (07/02/89)

In article <7800013@gistdev> joe@gistdev.UUCP lists several
common ways of implementing routines which return strings.
One technique I haven't seen mentioned is to implement a routine
which returns a pointer to not one, but to one of several static
areas.

A perfect example is a routine which generates a string
representation of an internal encoding, for use when generating
human-readable printouts.  For instance, in a C language
processor, an error must be generated when two types are
incompatible for a binary or casting operator.  I have used
code like the following:

	extern char *printtype(struct type *);
	extern void error(char *, ...);

	/* cast rhs to type of lhs, before assignment */

	if(rhs->type incompatible with lhs->type) {
		error("can't assign %s to %s", printtype(rhs->type),
							printtype(lhs->type));
		return ERROR;
	}

	cast(rhs, lhs->type);

	/* now do assignment... */

where printtype takes an internal structure describing a C type
and returns a string like "pointer to function returning int".
The code for printtype() looks something like

	#define NRETBUF 3
	#define MAXLEN 100

	char *
	printtype(type)
	struct type *type;
	{
	static char retbufs[NRETBUF][MAXLEN];
	static int retbufi = 0;
	char *retbuf;

	retbuf = retbufs[retbufi];

	/* build descriptive string in retbuf */

	retbufi = (retbufi + 1) % NRETBUF;

	return retbuf;
	}

I agree with previous posters that general-purpose routines
should dynamically allocate the returned buffer, deallocation
difficulties notwithstanding.  For little special-purpose utility
routines, however, such as the output format helper above, any
handling of return buffer allocation by the caller would be
inconvenient (and might therefore make it less likely that good
error messages would be generated).  The multiple return buffer
trick is useful in these cases, since they are often the ones
(i.e. multiple calls within one printf) where the overwriting of
a single buffer would be a problem.  (Circumlocutions like

	error("can't assign %s ", printtype(rhs->type));
	error(" to %s", printtype(lhs->type));

are ugly, and don't work well if the error routine adds file or
line number information with each invocation.)

                                            Steve Summit
                                            scs@adam.pika.mit.edu

scs@envy.pika.mit.edu (Steve Summit) (07/02/89)

In article <10345@claris.com> kevin@claris.com (Kevin Watts) writes:
>A similar approach, used by Apple in their upcoming GS/OS is as follows:
>On input, pass a buffer with its length (including the space for the length)
>in the first word (one could use a long instead).  If the buffer is large
>enough, the routine uses it, otherwise the routine returns an error code and
>puts the size of the buffer it needs into the _second_ word of the buffer.

Yuck.  This sort of type punning within arrays always seems to me
to be a holdover from assembly language days, before we had
record structures.  If you want an aggregate consisting of a
buffer size, a required length, and a buffer, use a real
structure; don't try to "assemble one out of spare parts."  (I am
aware that implementing variable-sized structures can be tricky,
perhaps requiring an extra level of indirection, which is
probably one reason punned array techniques remain popular.)
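
A sketch of the same protocol with a real structure follows. The struct
outbuf layout, the ROUTINE_* codes, and the body of the_routine are all
hypothetical; the pointer member is the extra level of indirection
mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct outbuf {
    size_t size;     /* bytes available at data, set by the caller */
    size_t needed;   /* set by the callee: bytes the full result requires */
    char  *data;     /* the buffer itself, reached through a pointer */
};

#define ROUTINE_OK     0
#define ROUTINE_ERROR  1

/* Hypothetical stand-in for the_routine: produce a fixed message. */
int the_routine(struct outbuf *ob)
{
    static const char msg[] = "a result longer than twenty bytes";

    assert(ob != NULL && ob->data != NULL);
    ob->needed = sizeof msg;            /* includes the NUL */
    if (ob->size < ob->needed)
        return ROUTINE_ERROR;           /* caller should retry with more */
    memcpy(ob->data, msg, sizeof msg);
    return ROUTINE_OK;
}
```

No casts are needed, and the alignment question never arises because the
sizes are ordinary members rather than words overlaid on a char array.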

                                            Steve Summit
                                            scs@adam.pika.mit.edu

bet@orion.mc.duke.edu (Bennett Todd) (07/12/89)

As has been said, in general the best mechanism to use depends on other
features of the problem at hand. Here's a specific solution that I liked
for a particular problem.

I wanted to be able to easily loop along a line at a time through a file,
without having to worry about maximum line lengths. So I wrote a routine
getline(3b):

	char *getline(char *, FILE *);

which might be used like this:

	char *line = NULL;
	...
	while (line = getline(line, fp)) {
		/* process the line */
	}

getline(3b) allocates the line buffer if it is passed in NULL for a
buffer pointer; on EOF it frees the buffer and returns NULL. It actually
malloc's a 2^n byte long buffer, stores n in the first element, and the
string starting in the second element, and returns a pointer to the
second element. That way it knows the length, and can realloc as
necessary to avoid overfilling the buffer.
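
A rough reconstruction of that layout is sketched below; it is not the
original source. It is named line_get to avoid clashing with the POSIX
getline(), and the initial exponent of 4 and the dropping of the newline
are assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read the next line from fp, reusing or growing the buffer behind line.
 * Pass NULL on the first call. Returns NULL (after freeing the buffer)
 * at EOF or on allocation failure. The byte just before the returned
 * pointer holds n, where the whole block is 2^n bytes long. */
char *line_get(char *line, FILE *fp)
{
    unsigned char *block, *tmp;
    size_t cap, len = 0;
    int c;

    assert(fp != NULL);
    if (line == NULL) {
        block = malloc((size_t)1 << 4);
        if (block == NULL)
            return NULL;
        block[0] = 4;                   /* 2^4 = 16 bytes to start */
    } else {
        block = (unsigned char *)line - 1;
    }
    cap = (size_t)1 << block[0];

    while ((c = getc(fp)) != EOF && c != '\n') {
        if (len + 3 > cap) {            /* header + chars + this one + NUL */
            tmp = realloc(block, cap * 2);
            if (tmp == NULL) {
                free(block);
                return NULL;
            }
            block = tmp;
            block[0]++;                 /* block is now 2^(n+1) bytes */
            cap *= 2;
        }
        block[1 + len++] = (unsigned char)c;
    }
    if (c == EOF && len == 0) {         /* end of file: release the buffer */
        free(block);
        return NULL;
    }
    block[1 + len] = '\0';
    return (char *)block + 1;
}
```

Since the header is a single char, the returned pointer needs no special
alignment. One weakness of the sketch is that an allocation failure and
EOF both come back as NULL.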

-Bennett
bet@orion.mc.duke.edu