chad@lakesys.UUCP (D. Chadwick Gibbons) (02/04/89)
I believe my understanding is mistaken when it comes to the interpretation of string constants and the effect the standard library functions can have on them. Insofar as I have been told, strings can not be modified - note, the type

	char *blah = "this is a string";

not the everyday normal strings we use. This would appear that if you attempted to modify their contents, you would either get a core dump of some various flavor, or the program would ignore your request.

In general, the function of strcat is defined as

	char *strcat(s, ct)
	char *s, *ct;

where s is the original string you wish to add to, and ct is the string you wish to append - as I'm sure you really didn't know that :) With that definition, consider the following artificial sequence:

	char *blah = "meow";
	char *tmp;

	tmp = strcpy(blah, "grr, snarl, hiss");

I would think since the string 'blah' is considered to be nonmodifiable that it would not be changed, but the result would be placed into tmp. However, on different systems, this provides different results:

	SCO XENIX/286 2.2.2	core dumps on next access of any type to 'blah'
	SCO XENIX/386 2.3.2	gives various warning messages but treats 'blah' like a normal string
	BSD 4.2			does random things
	AT&T System V r3	refuses to work on Thursdays, but acts like XENIX/386 on others

Apparently, either the effect of strings is not yet defined in these implementations, or, more likely, what I was taught is incorrect. Enlightenment is welcomed.
-- 
D. Chadwick Gibbons, chad@lakesys.lakesys.com, ...!uunet!marque!lakesys!chad
gwyn@smoke.BRL.MIL (Doug Gwyn ) (02/04/89)
In article <345@lakesys.UUCP> chad@lakesys.UUCP (D. Chadwick Gibbons) writes:
>Insofar as I have been told, strings can not be modified ...

That depends on the implementation. Some permit it. However, you cannot portably count on being able to modify a string literal.

>	char *blah = "meow";
>	char *tmp;
>	tmp = strcpy(blah, "grr, snarl, hiss");
>I would think since the string 'blah' is considered to be nonmodifiable that
>it would not be changed, but the result would be placed into tmp.

No, check the definition of strcpy(). You're attempting to modify a string literal. strcpy() is not obliged to second-guess your intentions and somehow save your ass. In fact in most implementations it isn't able to efficiently ascertain that you're misusing it until it makes the actual write attempt, at which point it's already too late.
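To make the portable alternative concrete, here is a minimal sketch, assuming an ANSI C compiler. The helper name `capitalized_meow` is made up for illustration. It relies on the difference between a pointer initialized from a literal (which points at the literal itself) and an array initialized from a literal (which is a separate, writable copy).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: writes "Meow" into out, which must hold 5 bytes. */
void capitalized_meow(char *out)
{
    /* An array initialized from a literal is a private, writable copy;
       a char * initialized from a literal points at the literal itself. */
    char word[] = "meow";

    assert(sizeof word == 5);   /* 4 letters + '\0' */
    word[0] = 'M';              /* fine: modifies the array, not the literal */
    memcpy(out, word, sizeof word);
}
```

Had `word` been declared as `char *word = "meow";`, the assignment to `word[0]` would be an attempt to modify the literal, with the unpredictable results described in this thread.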
gandalf@csli.STANFORD.EDU (Juergen Wagner) (02/05/89)
In article <345@lakesys.UUCP> chad@lakesys.UUCP (D. Chadwick Gibbons) writes:
>...
> Insofar as I have been told, strings can not
>be modified - note, the type char *blah = "this is a string"; not the everyday
>normal strings we use.  This would appear that if you attempted to modify
>their contents, you would either get a core dump of some various flavor, or
>the program would ignore your request.

Actually, the effect you get depends on the system you're trying this on. If your machine puts the string into text space, together with your code (HP-UX does that), you lose when you try to change the string. Other systems usually have a loader option to specify this behavior. For the sake of portability, assume constant strings are read-only.

>...[TFM quote deleted]
>
>	char *blah = "meow";
>	char *tmp;
>
>	tmp = strcpy(blah, "grr, snarl, hiss");

Ok. Now let's see what this piece of code does. 'blah' is a pointer to char. It is initialized to point at the first character of the char vector
	< 'm', 'e', 'o', 'w', '\0' >
which occupies five bytes. 'tmp' is just another pointer to char but uninitialized. The strcpy statement copies a string of length (humph!) 16 + 1 (for the zero byte) into consecutive byte locations from the point 'blah' points to on.

Hmmm.... your compiler allocated five bytes for the string but you are now using 17 for the new string. Strcpy will just overwrite whatever follows the string. If that happens to be another statically allocated string, it will show changed contents. If that happens to be some data space, variables seem to change values. If that happens to be just beyond the allocated memory page, you get some kind of error (segmentation fault et al.). If your compiler happily put the string in the midst of code, you will either overwrite code or get some error like segmentation fault (text space is Read-Only). If the string was allocated on the stack, your return address might be f***ed up. If....

As you can see, there is a vast number of alternatives, and you tried some of them. As a rule of thumb, I suggest checking calls to destructive functions like strcat, strcpy, et al. very carefully. Sometimes, they cause errors by overwriting pieces of memory used in completely different portions of your program, and the stuff becomes hard to debug. Allocate all the memory you need, and don't try to overwrite static strings.
-- 
Juergen Wagner	gandalf@csli.stanford.edu	wagner@arisia.xerox.com
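One way to follow the "allocate all the memory you need" advice is sketched below. It assumes a hosted implementation with malloc(). The name `copy_string` is hypothetical and not a standard library function. The size is computed from the source string, so the destination can never be too small.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd copy of src, or NULL on failure.
   The caller owns the result and must free() it. */
char *copy_string(const char *src)
{
    size_t need;
    char *dst;

    assert(src != NULL);
    need = strlen(src) + 1;      /* +1 for the terminating zero byte */
    dst = malloc(need);
    if (dst != NULL)
        memcpy(dst, src, need);  /* copies the characters and the '\0' */
    return dst;
}
```

With this, the original fragment would become something like `blah = copy_string("grr, snarl, hiss");`, after freeing any earlier copy, so the 17 bytes land in storage that was actually reserved for them.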
jeenglis@nunki.usc.edu (Joe English) (02/06/89)
chad@lakesys.UUCP writes:
>	char *blah = "meow";
>	char *tmp;
>
>	tmp = strcpy(blah, "grr, snarl, hiss");
>
>I would think since the string 'blah' is considered to be nonmodifiable that
>it would not be changed, but the result would be placed into tmp.
>[...]
>Apparently, either the effect of strings is not yet defined in these
>implementations, or, more likely, what I was taught is incorrect.

What you were taught is incorrect.

The type "char *" means, "pointer to char." A char * can point to either a single character or an array of characters (or NULL or a garbage value.) Since strings are stored as arrays of characters, "char *" is the type used to reference them; but you still get pointer semantics, not string semantics as in other languages. The str... functions give some string manipulation functionality, but you still have to allocate space for the strings themselves.

For example, strcat(char *s1,char *s2) places a copy of the string pointed to by s2 immediately after the string pointed to by s1, where the end of each string is determined by a '\0' character value. If s1 doesn't point to an area of memory large enough to hold both strings, you have problems.

Another note: the return value of strcat, strcpy, etc., is for the most part useless. strcat(s1,s2) returns s1 (which the caller presumably already knows); it does *not* make a new string.

So in your example above, blah points to an array 5 characters long which is initialized to {'m','e','o','w','\0' }. Since the array is only 5 characters long, any attempt to write data past its end (as the call to strcpy() does) is going to cause undefined, usually harmful behaviour.

Hope this helps,

--Joe English
  jeenglis@nunki.usc.edu
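Since strcat() never makes a new string, a caller who wants one has to build it. The following is a rough sketch under the assumption that malloc() is available. `concat_new` is an invented name, not part of the standard library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated string holding s1
   followed by s2, or NULL if allocation fails. Caller must free() it. */
char *concat_new(const char *s1, const char *s2)
{
    size_t n1, n2;
    char *result;

    assert(s1 != NULL && s2 != NULL);
    n1 = strlen(s1);
    n2 = strlen(s2);
    result = malloc(n1 + n2 + 1);       /* room for both strings + '\0' */
    if (result == NULL)
        return NULL;
    memcpy(result, s1, n1);             /* s1 without its '\0' */
    memcpy(result + n1, s2, n2 + 1);    /* s2 including its '\0' */
    return result;
}
```

By contrast, `strcat(buf, "!")` on a sufficiently large writable `buf` appends in place and returns `buf` itself.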
norm@oglvee.UUCP (Norman Joseph) (02/06/89)
From article <7429@csli.STANFORD.EDU>, by gandalf@csli.STANFORD.EDU (Juergen Wagner):
> In article <345@lakesys.UUCP> chad@lakesys.UUCP (D. Chadwick Gibbons) writes:
>>
>>	char *blah = "meow";
>>	char *tmp;
>>
>>	tmp = strcpy(blah, "grr, snarl, hiss");
>
> and the stuff becomes hard to debug. Allocate all the memory you need, and
> don't try to overwrite static strings.

I can see that the above strcpy() will overwrite something somewhere since strlen( "meow" ) < strlen( "grr, snarl, hiss" ). But what if the code looked like this (ignoring `tmp' for this example):

	char *blah = "meow\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0";

	strcpy( blah, "grr, snarl, hiss" );

assuming that you could write to the space into which `blah' pointed?
-- 
Norm Joseph - Oglevee Computer System, Inc.
UUCP: ...!{pitt,cgh}!amanue!oglvee!norm
"Mate, that parrot wouldn't *VROOM* if you put four million volts through it!"
karl@haddock.ima.isc.com (Karl Heuer) (02/09/89)
In article <466@oglvee.UUCP> norm@oglvee.UUCP (Norman Joseph) writes:
>[The previous example overflows,] but what if the code looked like this:
>	char *blah = "meow\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0";
>	strcpy( blah, "grr, snarl, hiss" );
>assuming that you could write to the space into which `blah' pointed?

You'd better also assume that string literals are not shared. Even so, you may be in for a surprise when you execute this code fragment the second time, and find that blah[0]=='g' immediately after the initialization to (apparently) "meow".

This kludge is confusing and unportable. Don't use it.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
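For contrast, an automatic array does not have the second-execution surprise, because its initializer is applied afresh on every entry to the function. The sketch below assumes an ANSI C compiler (which permits initialized automatic arrays). The function name `first_char_after_init` is made up for the demonstration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical demo: returns the first character seen right after
   initialization, then overwrites the buffer. */
char first_char_after_init(void)
{
    /* An automatic array is re-initialized on every call:
       "meow" followed by zero bytes, 17 bytes in all. */
    char blah[17] = "meow";
    char first = blah[0];

    strcpy(blah, "grr, snarl, hiss");   /* 16 characters + '\0' fit exactly */
    assert(blah[0] == 'g');
    return first;
}
```

Calling this twice yields 'm' both times, since the overwrite affects only that call's copy of the array and never the literal.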
henry@utzoo.uucp (Henry Spencer) (02/09/89)
In article <345@lakesys.UUCP> chad@lakesys.UUCP (D. Chadwick Gibbons) writes:
>... Insofar as I have been told, strings can not
>be modified - note, the type char *blah = "this is a string"; not the everyday
>normal strings we use.  This would appear that if you attempted to modify
>their contents, you would either get a core dump of some various flavor, or
>the program would ignore your request...

Not quite; the situation is that either of those things, or something much more bizarre, can happen. Note, "can", not "will". Civilized/portable programs should never attempt to modify a string literal. The effects of trying to modify one are entirely unpredictable.
-- 
Allegedly heard aboard Mir: "A     |     Henry Spencer at U of Toronto Zoology
toast to comrade Van Allen!!"      | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
evil@arcturus.UUCP (Wade Guthrie) (02/11/89)
In article <11711@haddock.ima.isc.com>, karl@haddock.ima.isc.com (Karl Heuer) writes:
> In article <466@oglvee.UUCP> norm@oglvee.UUCP (Norman Joseph) writes:
> >[The previous example overflows,] but what if the code looked like this:
> >	char *blah = "meow\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0";
> >	strcpy( blah, "grr, snarl, hiss" );
[. . .]
> This kludge is confusing and unportable.  Don't use it.

Just thought I might leap into the fray:

One aspect of the portability issue is, unless I misremember, that the compiler may place string literals in protected memory (if that exists on your system), causing an exception and subsequent *BOMB* upon attempted modification.

Wade Guthrie
evil@arcturus.UUCP
Rockwell International
Anaheim, CA

(Rockwell doesn't necessarily believe / stand by what I'm saying; how could they when *I* don't even know what I'm talking about???)
lupton@uhccux.uhcc.hawaii.edu (Robert Lupton) (02/16/89)
Rumour has it that sscanf modifies strings passed as a first argument on at least some machines (e.g. some suns?). Well, it doesn't actually modify the contents, but the compiler doesn't know that. Does anyone have any information? Robert
guy@auspex.UUCP (Guy Harris) (02/18/89)
>Rumour has it that sscanf modifies strings passed as a first argument
>on at least some machines (e.g. some suns?).

"Some" Suns? Yeesh, "_doscan" isn't one of the machine-dependent modules; the same source is used on *all* Suns. In fact, the same source is used on a bunch of non-Sun machines as well; the SunOS 3.2-3.5 version is based on the S5R2 version, the SunOS 4.0 version is based on the S5R3 version, and the version in SunOS releases prior to 3.2 is based on the 4.2BSD version, which is probably based on the V7 version. The bug exists in S5 releases from AT&T, as well as 4.xBSD.

The problem is that "*scanf" - or, to be precise, "_doscan" and the routines it calls, which are the "guts" of the "scanf" routines in many implementations - uses "ungetc". All very well and good when you're doing I/O to a file; "ungetc" stuffs the ungotten character back into the I/O buffer. However, the way "sprintf" and "sscanf" work in many (most?) UNIX C implementations is that they turn the string in question into a "funny" I/O buffer; most "ungetc" implementations don't understand this, and try to stuff the character back into the "buffer" anyway, which means they try to modify the string.

>Well, it doesn't actually modify the contents,

Which, in this particular case, is, I think, true; the character being stuffed back is a character that's just been "read" from the string.

>but the compiler doesn't know that.

It's not the compiler that has to know that; it's "ungetc". In "comp.bugs.4bsd" this very "sscanf" bug is being discussed; one suggested fix is to have "ungetc" check whether the character it's stuffing back into the buffer is the one that is in the buffer and, if so, just back up the buffer pointer and count.
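The idea behind that suggested fix can be sketched with a toy string reader. This is not the real stdio source: `struct strsrc`, `src_init`, `src_getc` and `src_ungetc` are invented for illustration and the real FILE layout differs. As a simplification that is my assumption rather than part of the proposed fix, this version refuses any pushback that would require a store, instead of falling back to writing.

```c
#include <assert.h>
#include <stdio.h>   /* EOF */

/* Hypothetical read-only string source; not the real stdio FILE layout. */
struct strsrc {
    const char *base;   /* start of the string */
    const char *cur;    /* next character to read */
};

void src_init(struct strsrc *s, const char *str)
{
    assert(s != NULL && str != NULL);
    s->base = str;
    s->cur = str;
}

/* Return the next character, or EOF at the terminating '\0'. */
int src_getc(struct strsrc *s)
{
    if (*s->cur == '\0')
        return EOF;
    return (unsigned char)*s->cur++;
}

/* Push back c without writing: succeed only if c is the character
   just read, in which case backing up the pointer is enough. */
int src_ungetc(int c, struct strsrc *s)
{
    if (c == EOF || s->cur == s->base)
        return EOF;
    if ((unsigned char)s->cur[-1] != (unsigned char)c)
        return EOF;             /* would require a store; refuse */
    s->cur--;
    return (unsigned char)c;
}
```

Because nothing is ever stored through `cur`, this reader can safely be pointed at a string literal.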
decot@hpisod2.HP.COM (Dave Decot) (02/21/89)
Note, however, that:

	static char blah[20] = "meow";
	char *tmp;

	tmp = strcpy(blah, "grr, snarl, hiss");

works nicely, because enough space is allocated to hold the longer value, and the space is guaranteed to be writable.

Dave
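When the destination is a real array like this, the "enough space" condition can be checked rather than assumed. Here is a small sketch. `copy_checked` is a hypothetical helper, not a library routine, and the -1 error return is just one possible convention.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst only if it fits in dstsize
   bytes, including the '\0'. Returns 0 on success, -1 if it does not fit
   (in which case dst is left untouched). */
int copy_checked(char *dst, size_t dstsize, const char *src)
{
    size_t need;

    assert(dst != NULL && src != NULL);
    need = strlen(src) + 1;
    if (need > dstsize)
        return -1;
    memcpy(dst, src, need);
    return 0;
}
```

A call such as `copy_checked(blah, sizeof blah, "grr, snarl, hiss")` succeeds for the 20-byte array above, since 17 bytes are needed. Note that `sizeof` gives the array size only when applied to the array itself, not to a pointer to it.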