joe@gistdev.UUCP (06/20/89)
Here is a question I haven't seen recently, and I'd like to get opinions from
the collective wisdom of the group. Suppose I am writing a function that is
going to construct a character string, and is going to return a pointer to
that string. What is the best way to do this so that your pointer is sure
to be valid when used? I have seen several approaches to this problem:

 . Have the caller pass a (char *) and let the caller worry about
   allocating whatever space is needed.

 . Have the routine malloc() space, and let the caller free() it when
   done with the returned pointer.

 . Have the routine allocate the buffer pointed to by the returned
   (char *) as a static.

 . Assume it's the caller's problem to strcpy() (or other such) from the
   pointer before something else can use the space.

 . Don't worry about it at all -- nothing is going to trash your memory
   at the pointed-to address before you can actually use it.

I'm sure there are other approaches, but these were the ones I could think
of off the top of my head. In general, how _should_ this be done to be
safest?
-------------------------------------------------------------------------------
Joe Brownlee       | Captain, please -- not in front of the Klingons...
GIST, Inc.         |      -- Mr. Spock, Star Trek V
1800 Woodfield Dr. | Pay attention to what I say, and you might start a trend.
Savoy, IL 61874    | ARPANET: joe%gistdev@uxc.cso.uiuc.edu
(217) 352-1165     | UUCP   : {uunet,pur-ee,convex}!uiucuxc!gistdev!joe
-------------------------------------------------------------------------------
maart@cs.vu.nl (Maarten Litmaath) (06/23/89)
joe@gistdev.UUCP writes:
\... Suppose I am writing a function that is
\going to construct a character string, and is going to return a pointer to
\that string. What is the best way to do this so that your pointer is sure
\to be valid when used? I have seen several approaches to this problem:
\
\ . Have the caller pass a (char *) and let the caller worry about
\ allocating whatever space is needed.
That's the way, I tell thee! But who am I, since this macro business?
\ . Have the routine malloc() space, and let the caller free() it when
\ done with the returned pointer.
In general you want to deal with the memory all on the same level.
It simplifies administration.
\ . Have the routine allocate the buffer pointed to by the returned
\ (char *) as a static.
In general: NO! Consider routines like getpwent(): if you want to keep the
info, you have to copy it yourself, doubling the work. I say: if the caller
wants a static buffer, let HIM do the arrangements. He's quite competent.
\ . Assume it's the caller's problem to strcpy() (or other such) from the
\ pointer before something else can use the space.
That's precisely what you want to avoid: HOW can you be SURE some other
(low-level) routine doesn't invoke the function too, thereby destroying YOUR
data? Consider something like printf() invoking malloc(). (I KNOW this isn't
a very good example.)
\ . Don't worry about it at all -- nothing is going to trash your memory
\ at the pointed-to address before you can actually use it.
Huh?
--
"I HATE arbitrary limits, especially when |Maarten Litmaath @ VU Amsterdam:
they're small." (Stephen Savitzky) |maart@cs.vu.nl, mcvax!botter!maart
mpl@cbnewsl.ATT.COM (michael.p.lindner) (06/23/89)
In article <7800013@gistdev>, joe@gistdev.UUCP writes:
> Here is a question I haven't seen recently, and I'd like to get opinions from
> the collective wisdom of the group. Suppose I am writing a function that is
> going to construct a character string, and is going to return a pointer to
> that string. What is the best way to do this so that your pointer is sure
> to be valid when used? I have seen several approaches to this problem:

I don't know if I qualify as collective wisdom, but here's my opinion.

> . Have the caller pass a (char *) and let the caller worry about
> allocating whatever space is needed.

Bad. In general, the caller knows little about the expected size, so must
pass a large array. Also, if the thing overflows, the callee has no way of
knowing unless you add args to describe the size, which is ugly. Even then,
there is no sane thing which can be done on overflow, since the callee
doesn't know where the array is coming from.

> . Have the routine malloc() space, and let the caller free() it when
> done with the returned pointer.

Bad. Lots of times the caller doesn't need the space malloc'd, and it's a
big pain to remember what to free (so much so that many many people will
forget to free it).

> . Have the routine allocate the buffer pointed to by the returned
> (char *) as a static.
AND
> . Assume it's the caller's problem to strcpy() (or other such) from the
> pointer before something else can use the space.
if they need to. I usually do this, where I can.

Of course, my opinions are my own...
Mike Lindner
attunix!mpl
AT&T Bell Laboratories
190 River Rd.
Summit, NJ 07901
chris@mimsy.UUCP (Chris Torek) (06/23/89)
In article <7800013@gistdev> joe@gistdev.UUCP writes:
>... Suppose I am writing a function that is going to construct a
>character string, and is going to return a pointer to that string.
>What is the best way to do this so that your pointer is sure
>to be valid when used?

What you are asking is not `How does one go about returning an object
of type pointer-to-char?', but rather `Where should one allocate space
for the characters?'. This question does not have a single best answer;
there is not sufficient information here to choose one. Most of the
approaches you listed are reasonable in some contexts, although if I am
right in my interpretation of this one:

> . Don't worry about it at all -- nothing is going to trash your memory
> at the pointed-to address before you can actually use it.

it is a bad idea. (My interpretation is that you mean something like

    char *fn() {
        char buf[SIZE];     /* but this is automatic storage */
        ...
        return (buf);
    }

This approach is particularly dangerous precisely because it often works.)
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
bill@twwells.com (T. William Wells) (06/23/89)
In article <2793@solo8.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
: joe@gistdev.UUCP writes:
: \... Suppose I am writing a function that is
: \going to construct a character string, and is going to return a pointer to
: \that string. What is the best way to do this so that your pointer is sure
: \to be valid when used? I have seen several approaches to this problem:
: \
: \ . Have the caller pass a (char *) and let the caller worry about
: \ allocating whatever space is needed.
:
: That's the way, I tell thee! But who am I, since this macro business?
:
: \ . Have the routine malloc() space, and let the caller free() it when
: \ done with the returned pointer.
:
: In general you want to deal with the memory all on the same level.
: It simplifies administration.

No. In this kind of thing, it makes life much more complex.

The fundamental problem is this: if the caller makes the allocation
decisions, the caller may well be wrong. That involves complex error
recovery, or it is equivalent to fixed buffer sizes (as far as the
called routine is concerned). The caller cannot do the allocation,
not if you want good code; the called function must do the allocation.

Let's consider what this means. A not atypical function might be one
that reads a string from a file and returns the string in a buffer.
The simple method looks like this:

    char *                  /* it gets allocated by mygets */
    mygets(stream)
    FILE    *stream;
    {
    }

        ptr = mygets(stdin);
        ...
        free(ptr);

But this has several drawbacks. One is that the caller may well fail
to free the pointer, causing allocated memory to grow overmuch.
Another is the excessive number of malloc and free calls it requires.
A third is that the caller can't do things like extend the string
without always doing yet another malloc, even though mygets may well
have allocated more space than needed.
A better method would be something like:

    typedef struct XSTRING {
        char    *_xs_string;    /* pointer to the string */
        size_t  _xs_length;     /* bytes in the string */
        size_t  _xs_alloc;      /* allocated length, may be > _xs_length */
        int     _xs_ahint;      /* suggests method of extending the string */
    } XSTRING;

(Why the underscores? So that macros like xs_string can be written to
access the structure members.)

    int                     /* error result */
    xs_gets(stream, xstring)
    FILE    *stream;
    XSTRING *xstring;

The caller would create an XSTRING by calling an xs_new function and
do all his string work with it. Repeated calls to xs_gets can use the
same XSTRING; there is no requirement to free the string after each
use. When one is done, one would call a dispose function for the
XSTRING; it would be responsible for getting rid of the XSTRING and
the associated string.

To make this really valuable, one should also have xs_* functions to
provide the functionality of the other functions in the C library
which return variable length strings.
---
Bill { uunet | novavax | ankh | sunvice } !twwells!bill
bill@twwells.com
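
The post gives only the XSTRING layout and the xs_gets signature. What
follows is a minimal sketch of how the pieces might fit together, under
stated assumptions: the bodies of xs_new, xs_gets and xs_dispose, the
starting size of 16, the XS_NOMEM code and the xs_length macro are all
hypothetical, ANSI prototypes are used instead of the K&R style above,
and the _xs_ahint member is left out for brevity.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct XSTRING {
    char   *_xs_string;   /* pointer to the string */
    size_t  _xs_length;   /* bytes in the string, not counting the '\0' */
    size_t  _xs_alloc;    /* allocated length, may be > _xs_length */
} XSTRING;

#define xs_string(xs) ((xs)->_xs_string)
#define xs_length(xs) ((xs)->_xs_length)

#define XS_NOMEM 1        /* hypothetical error code, distinct from EOF */

/* Create an empty XSTRING; returns NULL if memory runs out. */
XSTRING *xs_new(void)
{
    XSTRING *xs = malloc(sizeof *xs);

    if (xs == NULL)
        return NULL;
    xs->_xs_alloc = 16;                     /* arbitrary starting size */
    xs->_xs_string = malloc(xs->_xs_alloc);
    if (xs->_xs_string == NULL) {
        free(xs);
        return NULL;
    }
    xs->_xs_string[0] = '\0';
    xs->_xs_length = 0;
    return xs;
}

/* Get rid of the XSTRING and the string it owns. */
void xs_dispose(XSTRING *xs)
{
    if (xs != NULL) {
        free(xs->_xs_string);
        free(xs);
    }
}

/* Read one line (newline dropped) into xs, reusing its buffer.
 * Returns 0 on success, EOF if nothing was left to read,
 * XS_NOMEM if the buffer could not be extended. */
int xs_gets(FILE *stream, XSTRING *xs)
{
    int c;

    assert(stream != NULL && xs != NULL);
    xs->_xs_length = 0;
    while ((c = getc(stream)) != EOF && c != '\n') {
        if (xs->_xs_length + 2 > xs->_xs_alloc) {   /* room for c and '\0' */
            size_t newsize = xs->_xs_alloc * 2;
            char *p = realloc(xs->_xs_string, newsize);

            if (p == NULL) {
                xs->_xs_string[xs->_xs_length] = '\0';
                return XS_NOMEM;
            }
            xs->_xs_string = p;
            xs->_xs_alloc = newsize;
        }
        xs->_xs_string[xs->_xs_length++] = (char)c;
    }
    xs->_xs_string[xs->_xs_length] = '\0';
    if (c == EOF && xs->_xs_length == 0)
        return EOF;
    return 0;
}
```

A caller would make one XSTRING with xs_new, call xs_gets in a loop, and
call xs_dispose once at the end, so the buffer is only reallocated when
a line is longer than any seen so far.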
steve@umigw.MIAMI.EDU (steve emmerson) (06/23/89)
In article <18234@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
> This question does not have a single best answer;
>there is not sufficient information here to choose one.

Chris is right: without some criteria, it is difficult to choose one
method over another.

I believe you did, however, mention a safety criterion. In that case,
the _safest_ method is probably to allocate the memory in the called
routine as you can be certain of valid storage. As someone pointed out,
however, this can lead to allocation and deallocation calls at different
levels. It can also lead to clutter, if you forget to deallocate.

A close second in safety, and one which reminds the user to deallocate,
is to have the caller instantiate the buffer.

Both these methods have wide usage. It's your call.
--
Steve Emmerson                      Inet: steve@umigw.miami.edu [128.116.10.1]
SPAN: miami::emmerson (host 3074::)       emmerson%miami.span@star.stanford.edu
UUCP: ...!ncar!umigw!steve                emmerson%miami.span@vlsi.jpl.nasa.gov
"Computers are like God in the Old Testament: lots of rules and no mercy"
joe@gistdev.UUCP (06/23/89)
Thanks for all the responses to my posting on the best way to return (char *)
from a function! By the way, I was asked about the ridiculous "don't worry
about the pointer being valid" option. I simply provided that for contrast,
but I _have_ seen that type of thing in some bad programs.

Anyway, if anyone else would like to contribute their $0.02 worth, send
e-mail or reply, and I will summarize if there is interest.
-------------------------------------------------------------------------------
Joe Brownlee       | Captain, please -- not in front of the Klingons.
GIST, Inc.         |      -- Mr. Spock, Star Trek V
1800 Woodfield Dr. | Pay attention to what I say, and you might start a trend.
Savoy, IL 61874    | ARPANET: joe%gistdev@uxc.cso.uiuc.edu
(217) 352-1165     | UUCP   : {uunet,pur-ee,convex}!uiucuxc!gistdev!joe
-------------------------------------------------------------------------------
henry@utzoo.uucp (Henry Spencer) (06/24/89)
In article <2793@solo8.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
>\ . Have the routine malloc() space, and let the caller free() it when
>\ done with the returned pointer.
>
>In general you want to deal with the memory all on the same level.
>It simplifies administration.

Untidy though this approach is, it's often the best -- it alone avoids
setting arbitrary bounds on the size of the returned value. (Of course,
there are situations where the size of the returned value is inherently
bounded...) The penalties are some loss of efficiency -- malloc and free
take time -- and a management hassle.

If you want to combine high speed and unbounded returned values and are
willing to commit unspeakable acts to do it :-), have the caller pass in
a buffer (and its size!) which is *usually* big enough, and have the
function return either that buffer or (if it's not large enough) malloced
memory. This avoids the malloc overhead most of the time and still lets
values be of unlimited size. It's definitely a nuisance to manage, though.
--
NASA is to spaceflight as the  | Henry Spencer at U of Toronto Zoology
US government is to freedom.   | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
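
The "usually big enough buffer, malloc otherwise" scheme described above
can be sketched as follows. This is only an illustration under stated
assumptions: join_words and its job (joining two words with a space) are
hypothetical stand-ins for whatever string the real function would build.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Build "a b" in buf if it fits there, otherwise in malloc()ed memory.
 * Returns the storage actually used, or NULL if malloc() fails.
 * The caller must free() the result only when it differs from buf. */
char *join_words(const char *a, const char *b, char *buf, size_t bufsize)
{
    size_t need;
    char *out;

    assert(a != NULL && b != NULL);
    need = strlen(a) + 1 + strlen(b) + 1;   /* a, space, b, '\0' */
    out = buf;
    if (buf == NULL || need > bufsize) {
        out = malloc(need);                 /* the rare slow path */
        if (out == NULL)
            return NULL;
    }
    strcpy(out, a);
    strcat(out, " ");
    strcat(out, b);
    return out;
}
```

The management nuisance shows up at the call site: after using the
result r, the caller has to write something like "if (r != buf) free(r);"
every time.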
hascall@atanasoff.cs.iastate.edu (John Hascall) (06/24/89)
In article <894@cbnewsl.ATT.COM> mpl@cbnewsl.ATT.COM (michael.p.lindner) writes:
>In article <7800013@gistdev>, joe@gistdev.UUCP writes:
>> Here is a question I haven't seen recently, and I'd like to get opinions from
>> the collective wisdom of the group. Suppose I am writing a function that is
>> going to construct a character string, and is going to return a pointer to
>> that string. What is the best way to do this so that your pointer is sure
>> to be valid when used? I have seen several approaches to this problem:
>> . Have the routine allocate the buffer pointed to by the returned
>> (char *) as a static.
>AND
>> . Assume it's the caller's problem to strcpy() (or other such) from the
>> pointer before something else can use the space.
>if they need to. I usually do this, where I can.

Yuck. Routines which are not re-entrant are IMHO a "bad thing". Too often
you want to do something like:

    foo( bar(3), bar(4) );

and if bar uses static storage for its result you are scr*wed.
Unfortunately, the solution looks a bit untidy:

    foo( bar(ptr1, 3), bar(ptr2, 4) );

John Hascall / ISU Comp Center / Ames, IA
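
To make the difference concrete, here is a small sketch of both
variants. The names bar_static and bar, the "item %d" text, and the
extra size parameter are assumptions added for illustration; the post
itself only shows the call shapes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Non-reentrant: every call overwrites the same static buffer,
 * so two results cannot be alive at the same time. */
char *bar_static(int n)
{
    static char buf[32];

    sprintf(buf, "item %d", n);
    return buf;
}

/* Reentrant: the caller owns the storage, so results cannot collide.
 * The size parameter is an addition so that snprintf() can truncate
 * instead of overflowing. */
char *bar(char *buf, size_t size, int n)
{
    assert(buf != NULL && size > 0);
    snprintf(buf, size, "item %d", n);
    return buf;
}
```

With bar_static, both arguments in a call such as
foo(bar_static(3), bar_static(4)) are the same pointer, so foo sees the
same text twice. With bar and two separate buffers each argument keeps
its own text.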
kevin@claris.com (Kevin Watts) (06/25/89)
From article <1989Jun23.170749.23253@utzoo.uucp>, by henry@utzoo.uucp (Henry Spencer):
> If you want to combine high speed and unbounded returned values and are
> willing to commit unspeakable acts to do it :-), have the caller pass in
> a buffer (and its size!) which is *usually* big enough, and have the
> function return either that buffer or (if it's not large enough) malloced
> memory. This avoids the malloc overhead most of the time and still lets
> values be of unlimited size. It's definitely a nuisance to manage, though.

A similar approach, used by Apple in their upcoming GS/OS is as follows:
On input, pass a buffer with its length (including the space for the length)
in the first word (one could use a long instead). If the buffer is large
enough, the routine uses it, otherwise the routine returns an error code and
puts the size of the buffer it needs into the _second_ word of the buffer.
The caller is supposed to allocate a buffer of that size and call the routine
again, thus:

    char buffer[20];
    char *new_buffer;
    int size;

    *(int *) buffer = 20;
    if (the_routine(buffer) == ERROR) {
        size = *(int *)(buffer + sizeof(int));  /* needed size: second word */
        new_buffer = malloc(size);  /* should check for error here */
        *(int *) new_buffer = size; /* length goes in the new buffer */
        the_routine(new_buffer);
    }

I'm not convinced that this is a clean solution, but it does keep the memory
allocation all at the same level and will work for any size (up to 64K in this
case). My preference is to use C++ which allows safe allocation and
deallocation at different levels, but this is the wrong group for that.
The other alternative which comes to mind is to use a garbage collector.
--
Kevin Watts        ! Any opinions expressed here are my own, and are not
Claris Corporation ! necessarily shared by anyone else. Unless they are
kevin@claris.com   ! patently absurd, in which case they're not mine either.
gwyn@smoke.BRL.MIL (Doug Gwyn) (06/25/89)
In article <10345@claris.com> kevin@claris.com (Kevin Watts) writes:
> char buffer[20];
> *(int *) buffer = 20;

Of course, this fails on a word-oriented architecture when the buffer
happens not to be properly aligned.
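
One common way around the alignment problem is to copy the int in and
out of the char array with memcpy() rather than casting the pointer. The
sketch below assumes nothing about the real GS/OS interface; put_length
and get_length are hypothetical helper names.

```c
#include <assert.h>
#include <string.h>

/* Store an int at buf without assuming buf is aligned for int. */
void put_length(char *buf, int len)
{
    assert(buf != NULL);
    memcpy(buf, &len, sizeof len);
}

/* Fetch an int previously stored with put_length(). */
int get_length(const char *buf)
{
    int len;

    assert(buf != NULL);
    memcpy(&len, buf, sizeof len);
    return len;
}
```

Because only bytes are copied, this works even when buf points at an
odd address inside a larger array.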
diamond@diamond.csl.sony.junet (Norman Diamond) (06/26/89)
Someone:
>>> . Have the routine malloc() space, and let the caller free() it when
>>> done with the returned pointer.

In article <2793@solo8.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
>>In general you want to deal with the memory all on the same level.
>>It simplifies administration.

In article <1989Jun23.170749.23253@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>Untidy though this approach is, it's often the best -- it alone avoids
>setting arbitrary bounds on the size of the returned value.

It's often necessary. But there ARE other solutions.

>If you want to combine high speed and unbounded returned values and are
>willing to commit unspeakable acts to do it :-), have the caller pass in
>a buffer (and its size!) which is *usually* big enough, and have the
>function return either that buffer or (if it's not large enough) malloced
>memory.

That's not so unspeakable. Compare it with one more solution, where the
caller remains responsible for the memory AND there are no arbitrary
bounds. Have the function return either that buffer or (if it's not
large enough) a size of buffer that must be supplied by the caller next
time in order to get the complete result.

Or even more unspeakable, simply advise the caller that the buffer
wasn't big enough, but the caller should keep the partial result
obtained because it won't be included in the result of the next call,
and the total necessary length isn't even known yet.

Now, would any Unix I/O routine be so unspeakable?
--
Norman Diamond, Sony Computer Science Lab (diamond%csl.sony.jp@relay.cs.net)
The above opinions are claimed by your machine's init process (pid 1), after
being disowned and orphaned. However, if you see this at Waterloo, Stanford,
or Anterior, then their administrators must have approved of these opinions.
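
The first of these schemes (tell the caller how big a buffer to bring
next time) can be sketched with the C99 snprintf(), which reports the
length it would have written. This is an illustration only: format_pair
and its "name=value" output are hypothetical, and reporting the size
through the return value is just one possible convention.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format "name=value" into buf, truncating if buf is too small.
 * Returns the number of bytes needed for the complete result,
 * including the '\0', or 0 on a formatting error. The result in buf
 * is complete only when the return value is <= bufsize. */
size_t format_pair(char *buf, size_t bufsize, const char *name, int value)
{
    int n;

    assert(name != NULL);
    assert(buf != NULL || bufsize == 0);
    n = snprintf(buf, bufsize, "%s=%d", name, value);
    if (n < 0)
        return 0;
    return (size_t)n + 1;
}
```

The caller tries with whatever buffer is handy, and if the returned size
is larger than the buffer, supplies one of that size and calls again.
All allocation decisions stay with the caller.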
ftw@masscomp.UUCP (Farrell Woods) (06/26/89)
In article <7800013@gistdev> joe@gistdev.UUCP writes:
>What is the best way to [construct a string and return a pointer to it] so
>that your pointer is sure
>to be valid when used? I have seen several approaches to this problem:

> . Have the caller pass a (char *) and let the caller worry about
> allocating whatever space is needed.

You will have to guess how much space you need for the string before you
call the function to build it. This is not much different from...

> . Have the routine allocate the buffer pointed to by the returned
> (char *) as a static.

...except here you make your guess at compile-time instead of run-time.
Both of these could violate Henry Spencer's Fifth Commandment.

> . Have the routine malloc() space, and let the caller free() it when
> done with the returned pointer.

I assumed that the knowledge of how much space the string requires is
contained within the function building the string. If the knowledge is
more global, then the caller could take care of the malloc/free instead
of the callee doing the malloc and the caller doing the free. I guess it
depends on whether or not you want to break the malloc/free across two
functions.

> . Assume it's the caller's problem to strcpy() (or other such) from the
> pointer before something else can use the space.

Bad assumption. You might get away with it sometimes, but then it also
might crash in some unfathomable way.

> . Don't worry about it at all -- nothing is going to trash your memory
> at the pointed-to address before you can actually use it.

This is simply asking for trouble. Can you say "interrupt"?
--
Farrell T. Woods                        Voice:  (508) 392-2471
Concurrent Computer Corporation         Domain: ftw@masscomp.com
1 Technology Way                        uucp:   {backbones}!masscomp!ftw
Westford, MA 01886                      OS/2:   Half an operating system
awd@dbase.UUCP (Alastair Dallas) (06/28/89)
I have two favorite "places to store the characters" from the list given
by the original poster.

1) Letting the caller allocate storage and pass a pointer is a sure winner.

    void func(char *)

2) Let the function malloc() storage which the caller then frees.

    char *func()

Other options are just too dangerous on a large multi-programmer project,
in my opinion.

There are some pitfalls even with these two. Many functions are written
such that a pointer is expected (#1), but a 0 passed in its place implies
method #2. Thus, #1 becomes:

    char *func(char *)

where the original argument is returned if != 0. The problem here is that
some callers will malloc() and others will pass 0 and both results should
be freed. However, still other callers will pass static and automatic
arrays, which must not be freed. And, finally, how can the function
protect itself from "bad" (e.g. <0) pointers? At least method #2 keeps
the function in control.

/alastair/
scs@envy.pika.mit.edu (Steve Summit) (07/02/89)
In article <7800013@gistdev> joe@gistdev.UUCP lists several common
ways of implementing routines which return strings.

One technique I haven't seen mentioned is to implement a routine
which returns a pointer to not one, but to one of several static
areas. A perfect example is a routine which generates a string
representation of an internal encoding, for use when generating
human-readable printouts. For instance, in a C language
processor, an error must be generated when two types are
incompatible for a binary or casting operator. I have used code
like the following:

    extern char *printtype(struct type *);
    extern void error(char *, ...);

    /* cast rhs to type of lhs, before assignment */

    if(rhs->type incompatible with lhs->type)
        {
        error("can't assign %s to %s",
            printtype(rhs->type), printtype(lhs->type));
        return ERROR;
        }

    cast(rhs, lhs->type);

    /* now do assignment... */

where printtype takes an internal structure describing a C type
and returns a string like "pointer to function returning int".
The code for printtype() looks something like

    #define NRETBUF 3
    #define MAXLEN 100

    char *
    printtype(type)
    struct type *type;
    {
    static char retbufs[NRETBUF][MAXLEN];
    static int retbufi = 0;
    char *retbuf;

    retbuf = retbufs[retbufi];

    /* build descriptive string in retbuf */

    retbufi = (retbufi + 1) % NRETBUF;

    return retbuf;
    }

I agree with previous posters that general-purpose routines
should dynamically allocate the returned buffer, deallocation
difficulties notwithstanding. For little special-purpose utility
routines, however, such as the output format helper above, any
handling of return buffer allocation by the caller would be
inconvenient (and might therefore make it less likely that good
error messages would be generated). The multiple return buffer
trick is useful in these cases, since they are often the ones
(i.e. multiple calls within one printf) where the overwriting of
a single buffer would be a problem.

(Circumlocutions like

    error("can't assign %s ", printtype(rhs->type));
    error(" to %s", printtype(lhs->type));

are ugly, and don't work well if the error routine adds file or
line number information with each invocation.)

Steve Summit
scs@adam.pika.mit.edu
scs@envy.pika.mit.edu (Steve Summit) (07/02/89)
In article <10345@claris.com> kevin@claris.com (Kevin Watts) writes:
>A similar approach, used by Apple in their upcoming GS/OS is as follows:
>On input, pass a buffer with its length (including the space for the length)
>in the first word (one could use a long instead). If the buffer is large
>enough, the routine uses it, otherwise the routine returns an error code and
>puts the size of the buffer it needs into the _second_ word of the buffer.

Yuck. This sort of type punning within arrays always seems to me
to be a holdover from assembly language days, before we had
record structures. If you want an aggregate consisting of a
buffer size, a required length, and a buffer, use a real
structure; don't try to "assemble one out of spare parts."

(I am aware that implementing variable-sized structures can be
tricky, perhaps requiring an extra level of indirection, which is
probably one reason punned array techniques remain popular.)

Steve Summit
scs@adam.pika.mit.edu
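
As a rough illustration of the "real structure" alternative, the sketch
below uses the extra level of indirection mentioned above (a pointer to
caller-owned storage). Every name in it (struct outbuf, fill_outbuf,
OB_OK, OB_TOOSMALL) is hypothetical, and the routine simply copies a
message so that there is something to test.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct outbuf {
    size_t size;      /* bytes available at data, set by the caller */
    size_t needed;    /* bytes the routine needed, set on return */
    char  *data;      /* caller-owned storage */
};

#define OB_OK       0
#define OB_TOOSMALL 1

/* Copy msg into ob->data if it fits. The required size is always
 * reported in ob->needed, so the caller can retry with more room. */
int fill_outbuf(struct outbuf *ob, const char *msg)
{
    assert(ob != NULL && msg != NULL);
    ob->needed = strlen(msg) + 1;
    if (ob->data == NULL || ob->needed > ob->size)
        return OB_TOOSMALL;
    memcpy(ob->data, msg, ob->needed);
    return OB_OK;
}
```

The sizes are ordinary typed members, so nothing depends on word size or
on the alignment of a char array.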
bet@orion.mc.duke.edu (Bennett Todd) (07/12/89)
As has been said, in general the best mechanism to use depends on other
features of the problem at hand. Here's a specific solution that I liked
for a particular problem.
I wanted to be able to easily loop through a file a line at a time,
without having to worry about maximum line lengths. So I wrote a routine
getline(3b):
    char *getline(char *, FILE *);
which might be used like this:
    char *line = NULL;
    ...
    while ((line = getline(line, fp)) != NULL) {
        /* process the line */
    }
getline(3b) allocates the line buffer if it is passed in NULL for a
buffer pointer; on EOF it frees the buffer and returns NULL. It actually
malloc's a 2^n byte long buffer, stores n in the first element, and the
string starting in the second element, and returns a pointer to the
second element. That way it knows the length, and can realloc as
necessary to avoid overfilling the buffer.
-Bennett
bet@orion.mc.duke.edu
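
The source of getline(3b) is not shown in the post, so the following is
only a sketch of the scheme as described: a 2^n byte buffer with n kept
in the first element and the string starting at the second. The name
line_read (chosen to avoid clashing with the POSIX getline), the
starting size, dropping the newline, and freeing the buffer on
allocation failure are all assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define START_LOG2 5    /* first buffer is 2^5 = 32 bytes; arbitrary */

/* line must be NULL or a pointer previously returned by line_read().
 * Returns the next line (newline dropped), or NULL at EOF or when
 * memory runs out; in both NULL cases the buffer has been freed. */
char *line_read(char *line, FILE *fp)
{
    char *base;
    size_t cap, len = 0;
    int c;

    assert(fp != NULL);
    if (line == NULL) {
        base = malloc((size_t)1 << START_LOG2);
        if (base == NULL)
            return NULL;
        base[0] = START_LOG2;           /* hidden size byte holds n */
    } else {
        base = line - 1;                /* step back to the size byte */
    }
    cap = (size_t)1 << base[0];

    while ((c = getc(fp)) != EOF && c != '\n') {
        if (len + 3 > cap) {            /* size byte + text + c + '\0' */
            char *p = realloc(base, cap * 2);

            if (p == NULL) {
                free(base);
                return NULL;
            }
            base = p;
            base[0]++;
            cap *= 2;
        }
        base[1 + len++] = (char)c;
    }
    if (c == EOF && len == 0) {
        free(base);
        return NULL;
    }
    base[1 + len] = '\0';
    return base + 1;
}
```

Used in the loop shown above, the same buffer is carried from call to
call and only grows when a longer line turns up. A caller that stops
before EOF would have to free the hidden block itself, for example with
free(line - 1), which is one of the costs of this layout.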