ADLER1%BRANDEIS.BITNET@wiscvm.wisc.EDU (08/20/87)
I was trying to write a C program that would read MIX commands from
stdin. I also wanted to be able to verify that the string opcode
was actually internally equal to the string LDA in case the MIX
command was LDA 2000,2(0:3) <CR>. After some experimentation I
arrived at the following code. It works, but I am somewhat dismayed
by the expression (*opcode == *"LDA") . It just looks so peculiar.
Is it really OK?
#include <stdio.h>
main()
{
    char opcode[4];
    int address, index, left, right;

    printf("Type assembly language statement:\n\n");
    scanf("%s %d,%d(%d:%d)", opcode, &address, &index, &left, &right);
    printf("Opcode\t=%s\n", opcode);
    printf("Address\t=%d\n", address);
    printf("Index\t=%d\n", index);
    printf("Field\t= (%d:%d)\n", left, right);
    if (*opcode == *"LDA") printf("Gotcha!\n");
    else printf("No match...\n");
}
ADLER1@BRANDEIS.BITNET
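
The parsing step in the program above can be sketched a little more defensively. This is only an illustration under stated assumptions: the names mix_stmt and parse_mix are hypothetical, sscanf() on an already-read line stands in for scanf() on stdin, and the %3s width is chosen to fit the poster's char opcode[4]. It also checks the conversion count, which the original does not.

```c
#include <stdio.h>

/* Hypothetical container for one parsed statement (not from the post). */
struct mix_stmt {
    char opcode[4];                 /* three letters plus the terminating NUL */
    int address, index, left, right;
};

/* Hypothetical helper: parse a line such as "LDA 2000,2(0:3)".
 * Returns 1 if all five fields were converted, 0 otherwise. */
int parse_mix(const char *line, struct mix_stmt *out)
{
    /* %3s stores at most 3 characters plus NUL, so opcode[4] cannot overflow. */
    int n = sscanf(line, "%3s %d,%d(%d:%d)",
                   out->opcode, &out->address, &out->index,
                   &out->left, &out->right);

    return n == 5;                  /* sscanf returns the number of conversions */
}
```

A caller would read a line first (for example with fgets) and then test the result of parse_mix() before trusting any of the fields.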
chris@mimsy.UUCP (Chris Torek) (08/22/87)
In article <8877@brl-adm.ARPA> ADLER1%BRANDEIS.BITNET@wiscvm.wisc.EDU writes:
>I was trying to write a C program that would read MIX commands from
>stdin. I also wanted to be able to verify that the string opcode
>was actually internally equal to the string LDA....
>    char opcode[4];
>    int address, index, left, right;
>
>    printf("Type assembly language statement:\n\n");
>    scanf("%s %d,%d(%d:%d)",opcode, &address, &index, &left, &right);
>    if (*opcode == *"LDA") printf("Gotcha!\n");
>    else printf("No match...\n");

No doubt this has already been answered in mail directed to
adler1@brandeis.bitnet, but I want to expand on this a bit.

Aside from the missing test for scanf's return value, this code can
be called correct: there is nothing a typechecker like lint could
diagnose, for instance. Yet it does not do what was desired.

To compare the characters in `opcode' with the string "LDA" for
equality, one should use

    if (strcmp(opcode, "LDA") == 0)

which is such a common idiom that old-time C programmers understand
it at a glance. It seems to come late to neophyte programmers,
though, and it seems reasonable to ask why.

Perhaps it is because other languages provide string comparison within
the language itself:

    if opcode stringequal "LDA" then ...

or

    if opcode = "LDA" then ...

A straightforward (but wrong) translation yields

    if (opcode == "LDA") ...

which is syntactically and semantically valid, but is always false
(or usually false in some compilers, and certainly false in this
case). Programming by patching (a technique familiar to mathematicians
as well, in the form known as `proof by patching': `oops, well for
case 2, change the original equation to . . .') leads to

    if (*opcode == *"LDA")

which works for some test cases, since it compares opcode[0] with 'L'.
I have even seen something like

    if (*opcode == *"LDA" && *(opcode + 1) == *("LDA" + 1) &&
        *(opcode + 2) == *("LDA" + 2))

which works for even more test cases, but is still wrong as well as
wasteful (at least in compilers for which "LDA"=="LDA" is false).

Eventually it seems to dawn upon these programmers that

    "LDA"

generates an anonymous character array holding the letters L, D,
A, and NUL (\0) and evaluates to the address of this array. Then
the purpose of strcmp() becomes clear, and they live happily ever
after :-).

All I want to know is this: Why does it take so long for some
programmers to see this, and how can we speed up the process?
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain: chris@mimsy.umd.edu    Path: seismo!mimsy!chris
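
To make the difference between these comparisons concrete, here is a minimal sketch. The three function names are illustrative and do not come from the thread; each one wraps one of the expressions discussed above so that its behaviour can be checked on inputs such as "LDX" and "LDAX".

```c
#include <string.h>

/* Compares only opcode[0] with 'L'; every later character is ignored. */
int first_char_matches(const char *opcode)
{
    return *opcode == *"LDA";
}

/* The three-character patch: it never looks at the terminating NUL,
 * so a longer string that merely starts with "LDA" also matches. */
int first_three_match(const char *opcode)
{
    return *opcode == *"LDA" &&
           *(opcode + 1) == *("LDA" + 1) &&
           *(opcode + 2) == *("LDA" + 2);
}

/* Compares every character up to and including the terminating NUL. */
int whole_string_matches(const char *opcode)
{
    return strcmp(opcode, "LDA") == 0;
}
```

Only the strcmp() version rejects both "LDX" and "LDAX". The expression opcode == "LDA" is left out on purpose: it compares two addresses, not characters, so its result says nothing about the contents of the array.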
flaps@utcsri.UUCP (08/22/87)
ADLER1@BRANDEIS.BITNET writes:
(char opcode[something];)
>    if (*opcode == *"LDA") printf("Gotcha!\n");

This compares the first letter of opcode with the first letter of "LDA".
Not what you want. Strings are not fundamental types in C. You need a
library function to compare them.

    if(strcmp(opcode,"LDA") == 0) printf...

ajr <flaps@csri.toronto.edu> (also flaps at utorgpu on bitnet)
"Your donation will be used to torture animals in useless experiments."
gwyn@brl-smoke.UUCP (08/22/87)
In article <8088@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>All I want to know is this: Why does it take so long for some
>programmers to see this, and how can we speed up the process?

Seems to me the issue is more basic -- that people are trying to
GUESS how things work rather than study a good text (of which there
are several, Tom Plum's among them) to KNOW how they work.

If this assessment is correct, then the issue is really: How do we
encourage the development of more precise thinking rather than fuzzy,
approximate thinking? This is probably something best attempted while
the very young are still developing their characteristic methods of
thought; remedial action at an advanced age is much more difficult.
It's hard enough anyway, given the dominant state of our culture.
barts@tekchips.UUCP (08/23/87)
In article <8088@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
]In article <8877@brl-adm.ARPA> ADLER1%BRANDEIS.BITNET@wiscvm.wisc.EDU writes:
]>I was trying to write a C program that would read MIX commands from
]>stdin. I also wanted to be able to verify that the string opcode
]>was actually internally equal to the string LDA....
]>    if (*opcode == *"LDA") printf("Gotcha!\n");
]>    else printf("No match...\n");
]No doubt this has already been answered in mail directed to
]adler1@brandeis.bitnet, but I want to expand on this a bit.
] [...]
]To compare the characters in `opcode' with the string "LDA" for
]equality, one should use
]    if (strcmp(opcode, "LDA") == 0)
]which is such a common idiom that old-time C programmers understand
]it at a glance. It seems to come late to neophyte programmers,
]though, and it seems reasonable to ask why.
]
]Perhaps it is because other languages provide string comparison within
]the language itself:
] [...examples deleted...]
]Eventually it seems to dawn upon these programmers that
]    "LDA"
]generates an anonymous character array holding the letters L, D,
]A, and NUL (\0) and evaluates to the address of this array. Then
]the purpose of strcmp() becomes clear, and they live happily ever
]after :-).
]
]All I want to know is this: Why does it take so long for some
]programmers to see this, and how can we speed up the process?

I got my first C experience about 3 years ago when I was handed a code
fragment containing all sorts of marvelous UN*X ioctl() and
fork()/wait() calls and told to turn it into an interactive
editor/parser. Since then I have (hopefully) improved in my
understanding of C and my programming style, but my introduction to C
is recent enough that I can comment on strcmp().

The single greatest problem I had in learning to use strcmp() is its
return of 0 on "equality" of the strings. I was expecting a
boolean-valued comparison, and this apparent sense-reversal (false on
equality) threw more monkey wrenches into my early programs than I
would ever have believed. Perhaps inexperienced programmers resort to
trying direct comparisons a la

    string1 == string2

or

    *string1 == *string2

after a few failures of

    if (strcmp(string1,string2)) printf("They match!\n");

to do what they expect.

I'm not sure how to speed up learning the right way to use strcmp().
Maybe inexperienced programmers should be encouraged to use something
like

    #define streq(s1,s2) (strcmp(s1,s2) == 0)

until they get used to non-boolean-valued comparisons. I suppose,
however, that it could be argued that this will only delay
understanding strcmp(), but at least the novice will have a "function"
that does what built-in equivalency tests in other languages already
do.

]--
]In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
]Domain: chris@mimsy.umd.edu    Path: seismo!mimsy!chris

--
Bart Schaefer
Oregon Graduate Center    ...!tektronix!ogcvax!schaefer
Guest at Tekchips         ...!tektronix!tekchips!barts
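
A small sketch of the streq() idea, next to the sense-reversed test it is meant to prevent. The macro here differs slightly from the one in the post: the arguments are parenthesized, which is a common precaution and an addition of this sketch. The helper names is_lda and naive_test are hypothetical.

```c
#include <string.h>

/* Boolean-valued wrapper: nonzero exactly when the two strings are equal. */
#define streq(s1, s2) (strcmp((s1), (s2)) == 0)

/* Reads the way a newcomer expects: true on a match. */
int is_lda(const char *opcode)
{
    return streq(opcode, "LDA");
}

/* The pitfall: strcmp() returns 0 on a match, so using its result
 * directly as a truth value is nonzero when the strings DIFFER. */
int naive_test(const char *opcode)
{
    return strcmp(opcode, "LDA") != 0;
}
```

With this in place, if (streq(string1, string2)) does what the failed if (strcmp(string1, string2)) attempt was meant to do.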
jay@splut.UUCP (Jay Maynard) (08/24/87)
In article <8088@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
> In article <8877@brl-adm.ARPA> ADLER1%BRANDEIS.BITNET@wiscvm.wisc.EDU writes:
> >I was trying to write a C program that would read MIX commands from
> >stdin. I also wanted to be able to verify that the string opcode
> >was actually internally equal to the string LDA....
> >    if (*opcode == *"LDA") printf("Gotcha!\n");
> >    else printf("No match...\n");
[description of learning process deleted]
> Eventually it seems to dawn upon these programmers that
>
>     "LDA"
>
> generates an anonymous character array holding the letters L, D,
> A, and NUL (\0) and evaluates to the address of this array. Then
> the purpose of strcmp() becomes clear, and they live happily ever
> after :-).
>
> All I want to know is this: Why does it take so long for some
> programmers to see this, and how can we speed up the process?

Because most other languages, and all of the other languages that a
programmer new to C is likely to know, handle strings intrinsically.

C is the only major language that doesn't know itself what to do with
strings, but instead forces programmers to kludge around with pointers and
function calls instead of allowing precisely the construct described above.
This is the source of most of C's crypticness (crypticity? naaaaaah.) to the
inexperienced programmer.

About the only way I can think of to speed up the process is to add string
intrinsics to C. (asbestos suit on)

--
Jay Maynard, K5ZC...>splut!< | uucp: hoptoad!academ!uhnix1!nuchat!splut!jay
"Don't ask ME about Unix...  | (or sun!housun!nuchat)       CI$: 71036,1603
I speak SNA!"                | internet: beats me         GEnie: JAYMAYNARD
The opinions herein are shared by neither of my cats, much less anyone else.
peter@sugar.UUCP (Peter da Silva) (08/24/87)
> Because most other languages, and all of the other languages that a
> programmer new to C is likely to know, handle strings intrinsically.

Pascal.

Pascal doesn't even have a "variable length packed byte array" type. In
fact it *can't* have one unless you extend it. I know you love Turbo,
but it ain't Jensen & Wirth compatible.

As for your Volvo/68000 comment. What do *you* do on the 80x86 that
doesn't cause you to painfully code around segments? Use Turbo & never
go over 64K?
--
-- Peter da Silva `-_-' ...!seismo!soma!uhnix1!sugar!peter
--                  U   <--- not a copyrighted cartoon :->
gwyn@brl-smoke.ARPA (Doug Gwyn ) (08/24/87)
In article <1623@tekchips.TEK.COM> barts@tekchips.UUCP (Bart Schaefer) writes:
>I suppose, however, that it could be argued that this will only delay
>understanding strcmp(), but at least the novice will have a "function"
>that does what built-in equivalency tests in other languages already do.

Perhaps it would help if they were told that strcmp() does NOT test for
string equality; rather, it compares the lexical ordering of two strings.
This makes it useful sometimes for the function used with qsort(). The
test for exact match is simply a common special case.

People often have the same problem understanding the function of the
UNIX "cat" utility; they think of it as "printing a file", but that
is just a special case of its general use as a file concatenator.

This attempt to achieve maximal generality is characteristic of UNIX,
at least as it was originally developed, and is one of the first
things that a person learning to program in C or on UNIX should learn.
Kernighan & Plauger's "Software Tools" is a good introduction;
Kernighan & Pike's "The UNIX Programming Environment" also teaches
this point.
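
The qsort() remark can be sketched as follows. This is an illustration rather than anything posted in the thread, and cmp_str and sort_opcodes are hypothetical names. Note that strcmp() cannot be handed to qsort() directly when the array holds char pointers, because qsort() passes pointers to the elements; a thin wrapper does the extra dereference.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() passes pointers to the array elements. The elements here are
 * of type const char *, so each argument really points at a string pointer. */
static int cmp_str(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;

    /* Negative, zero, or positive: exactly the ordering qsort() wants. */
    return strcmp(*sa, *sb);
}

/* Sort an array of n string pointers into strcmp() order. */
void sort_opcodes(const char **ops, size_t n)
{
    qsort(ops, n, sizeof ops[0], cmp_str);
}
```

Seen this way, the == 0 test for equality is just the middle case of a three-way result.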
billc@trsvax.UUCP (08/25/87)
>/* Written 10:55 pm Aug 19, 1987 by wiscvm.wisc.EDU!ADLER1%BRANDEIS.*/
>/* ---------- "*\"LDA\" ok?" ---------- */
>I was trying to write a C program that would read MIX commands from
>stdin. I also wanted to be able to verify that the string opcode
>was actually internally equal to the string LDA in case the MIX
>command was LDA 2000,2(0:3) <CR>. After some experimentation I
>arrived at the following code. It works, but I am somewhat dismayed
>by the expression (*opcode == *"LDA") . It just looks so peculiar.
>Is it really OK?

NO!!! What you're doing here is simply comparing the first character from
each string. Instead, use something like this:

    strupr (opcode);  /* convert any lower case chars to upper case */
    if (! strcmp (opcode, "LDA"))
        printf ("Got match.\n");
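
One caveat about the snippet above: strupr() is a vendor extension, not part of the standard C library, so it may be missing on a given system. A minimal portable stand-in can be sketched with toupper(); the names upcase and is_lda are hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical stand-in for strupr(): upper-case a string in place
 * and return it. The cast to unsigned char keeps toupper() well
 * defined even where plain char is signed. */
char *upcase(char *s)
{
    char *p;

    for (p = s; *p != '\0'; p++)
        *p = (char)toupper((unsigned char)*p);
    return s;
}

/* Case-insensitive test for the LDA opcode; note that it modifies
 * its argument, just as the strupr() version does. */
int is_lda(char *opcode)
{
    return strcmp(upcase(opcode), "LDA") == 0;
}
```

Because the string is rewritten in place, the argument must be a writable array, not a string literal.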
jay@splut.UUCP (Jay Maynard) (08/25/87)
In article <560@sugar.UUCP>, peter@sugar.UUCP (Peter da Silva) writes:
> > Because most other languages, and all of the other languages that a
> > programmer new to C is likely to know, handle strings intrinsically.
>
> Pascal doesn't even have a "variable length packed byte array" type. In
> fact it *can't* have one unless you extend it. I know you love Turbo,
> but it ain't Jensen & Wirth compatible.

Turbo isn't the only Pascal that handles strings...in fact, how many
strictly-J&W-compatible commercial Pascals do you know of? How many
non-J&Ws?

> As for your Volvo/68000 comment. What do *you* do on the 80x86 that
> doesn't cause you to painfully code around segments? Use Turbo & never
> go over 64K?

I use linked lists allocated off the heap, where appropriate...or some
similar technique. Generally, it can be dealt with through appropriate
choice of algorithm (have we seen that discussion before...?) I've never
done anything that required a single data element >64K, but such
applications are fairly exotic.

> -- Peter da Silva `-_-' ...!seismo!soma!uhnix1!sugar!peter
> --                  U   <--- not a copyrighted cartoon :->

Yeah, I know...bleh.

--
Jay Maynard, K5ZC...>splut!< | uucp: hoptoad!academ!uhnix1!nuchat!splut!jay
"Don't ask ME about Unix...  | (or sun!housun!nuchat)       CI$: 71036,1603
I speak SNA!"                | internet: beats me         GEnie: JAYMAYNARD
The opinions herein are shared by neither of my cats, much less anyone else.
peter@sugar.UUCP (Peter da Silva) (08/25/87)
In article <92@splut.UUCP>, jay@splut.UUCP (Jay Maynard) writes:
> In article <560@sugar.UUCP>, peter@sugar.UUCP (Peter da Silva) writes:
> > > Because most other languages, and all of the other languages that a
> > > programmer new to C is likely to know, handle strings intrinsically.
> >
> > Pascal doesn't even have a "variable length packed byte array" type. In
> > fact it *can't* have one unless you extend it. I know you love Turbo,
> > but it ain't Jensen & Wirth compatible.
>
> Turbo isn't the only Pascal that handles strings...in fact, how many
> strictly-J&W-compatible commercial Pascals do you know of? How many
> non-J&Ws?

I learned Pascal using a J&W compiler. We're talking about "other
languages that a programmer new to 'C' is likely to know" here... not
some weird variant of Pascal that isn't even a proper superset of J&W
(as UCSD, for example, is).

Before you come back with some variant of "Turbo is becoming (or even
is) a standard", let me remind you that UCSD once had the same cachet.
And of course Turbo strings aren't the same as UCSD strings aren't the
same as Pascal/2 strings...

How about Fortran pre-F77? How about assembler? How about PL/M?

Also, many of the languages that do have strings don't give you much
more than the equivalent of "strcpy", "strcmp", "strncpy", and so on.
For example, Fortran 77. About the only place you can do more with
strings than copying bytes into preallocated data is in I/O statements.
I'll take *printf over Fortran formatted I/O any day.
--
-- Peter da Silva `-_-' ...!seismo!soma!uhnix1!sugar!peter
--                  U   <--- not a copyrighted cartoon :->
rbutterworth@orchid.UUCP (08/26/87)
In article <6332@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
> Perhaps it would help if they were told that strcmp() does NOT test for
> string equality; rather, it compares the lexical ordering of two strings.
> This makes it useful sometimes for the function used with qsort(). The
> test for exact match is simply a common special case.

The biggest problem I've found with my own and others' understanding
of strcmp() is its name. Until you get used to it, "!strcmp()",
"strcmp() != 0", and other such usages are quite non-obvious. In most
cases the function is simply used as a true or false test, and it
isn't obvious that a true comparison should mean that the strings are
different.

If it were named, say, strdif(), then something like
"if (!strdif(a,b)) ..." or "if (strdif(a,b))" would be much more
readable for the beginner. That is, truth indicates that the strings
were different, something that even beginners should be able to
understand, as opposed to truth indicating that the strings were
comparable, a concept that isn't all that obvious to me even after
years of use.

The word "compare" says what you want to do with the arguments, the
word "difference" says what result you want. It's much easier to
think of this particular function in terms of what it returns rather
than in terms of what it does with its arguments. On the other hand,
functions such as printf() are appropriately named according to what
they do to their arguments, not the value they return, and so there
is much less confusion.

Of course there isn't much we can do about it now, but this is
something that should be considered when making up names for new
functions. We speak of "evolving" languages, but somehow I think
that if Darwin had had to contend with the concept of "backward
compatibility" he would have given up.
arnold@emory.uucp (Arnold D. Robbins {EUCC}) (08/26/87)
There has been considerable discussion about C's strings and the fact that
the lack of string operands is a hindrance. Several years ago I suggested
a string operator, but I got little response. Here's my idea again.

Add a new symbol for use in comparison, assignment, and argument declarations
and function calls that pass arrays *by value*, say "`". It would be
used analogously to * in pointer declarations/use.

Array comparison:
    if (`x == `y)
    if (`x == `"LDA")

Function declaration:
    int foo (char `arg);    /* requires dope vector */
    x = foo (`x);
    char (`junk[5])();      /* function returning array! (length 5) */

Array assignment:
    `x = `y;

There would have to be a number of new rules relating to arrays
of the same type but of different length, and using arrays of different
types. In particular, it would probably be necessary to special case
array of char so that even if two arrays are of different length, all
operations would work as if the str* functions had been called, i.e.
terminating on a 0 byte.

The advantage of this proposal is that it adds something many
people feel has long been missing (array operations, passing arrays by
value), but without overloading an existing operator or breaking any
current code.

The disadvantages are that function calls would now require
the use of dope vectors, and assignments and comparisons would be
compound operations (i.e. a hidden loop); so what looks like a simple,
quick operation (like comparing two integers) could be a very long,
slow operation. Function call/return times also could increase.

Well, so much for throwing out ideas. Any comments?
--
Arnold Robbins
ARPA, CSNET: arnold@emory.ARPA    BITNET: arnold@emory
UUCP: { decvax, gatech, sun!sunatl }!emory!arnold
ONE-OF-THESE-DAYS: arnold@emory.mathcs.emory.edu
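
The "hidden loop" cost mentioned among the disadvantages can be made visible by writing out, by hand, roughly what a compiler would have to generate for the proposed array comparison and assignment. This is only a sketch for int arrays of a known length n; the function names are hypothetical, and nothing here reflects how an actual implementation of the proposal would work.

```c
#include <stddef.h>

/* Roughly what the proposed array comparison would expand to:
 * an element-by-element loop that stops at the first mismatch. */
int int_array_equal(const int *x, const int *y, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (x[i] != y[i])
            return 0;
    return 1;
}

/* Roughly what the proposed array assignment would expand to:
 * one copy per element, so the cost grows with the array length. */
void int_array_assign(int *x, const int *y, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        x[i] = y[i];
}
```

Both functions need the length n from somewhere, which is the role the dope vector would play in a function call.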
edw@ius1.cs.cmu.edu (Eddie Wyatt) (08/28/87)
In article <2211@emory.uucp>, arnold@emory.uucp (Arnold D. Robbins {EUCC}) writes:
> There has been considerable discussion about C's strings and the fact that
> the lack of string operands is a hindrance. Several years ago I suggested
> a string operator, but I got little response. Here's my idea again.
>
> Add a new symbol for use in comparison, assignment, and argument declarations
> and functions calls that pass arrays *by value*, say "`". It would be
> used analogously to * in pointer declarations/use.
>
> Array comparison:
>     if (`x == `y)
>     if (`x == `"LDA")
>
> Function declaration:
>     int foo (char `arg);    /* requires dope vector */
>     x = foo (`x);
>     char (`junk[5])();      /* function returning array! (length 5) */
>
> Array assignment:
>     `x = `y;

The problem with applying these operations to arrays in general is
that the size of an array may be (is usually) unknown to the compiler.

    x = (int *) malloc(sizeof(int)*4000);
    `y = `x;

How many bytes should be copied???? Can't know unless the compiler
understands the semantics of the first statement. I'm sure you
can start to imagine all the possible bad situations.

You may restrict the ` operator to arrays with known bounds (i.e. an array
declaration for the variables involved is within scope: int x[3], y[3],
not int *x, y[3]). But if this restriction is made then the facility
becomes of very little use, because a lot of code assumes unbounded
arrays and hence could not take advantage of this construct.

>
> There would have to be a number of new rules relating to arrays
> of the same type but of different length, and using arrays of different
> types. In particular, it would probably be necessary to special case
> array of char so that even if two arrays are of different length, all
> operations would work as if the str* functions had been called, i.e.
> terminating on a 0 byte.
>
> The advantages of this proposal is that it adds something many
> people feel has long been missing (array operations, passing arrays by
>                                                      ^^^^^^^^^^^^^^^^^
> value), but without overloading an existing operator or breaking any
> ^^^^^
> current code.

No, what is missing from the language is the "concept" of input
and output parameters. The user of the language should be insulated
from the actual way parameters are passed around. It should be
up to the compiler to determine whether an input parameter should be
passed by reference or passed by value, based on whichever method is faster
for the particular parameter. The problem with passing arrays by value
is that the dimensions of the array are again generally unknown.

> The disadvantages are that function calls would now require
> the use of dope vectors, and assigments and comparisons would be
> compound operations (i.e. a hidden loop); so what looks like a simple,
> quick operation (like comparing two integers) could be a very long,
> slow operation. Function call/return times also could increase.
>
> Well, so much for throwing out ideas. Any comments?
> --
> Arnold Robbins
> ARPA, CSNET: arnold@emory.ARPA    BITNET: arnold@emory
> UUCP: { decvax, gatech, sun!sunatl }!emory!arnold
> ONE-OF-THESE-DAYS: arnold@emory.mathcs.emory.edu
--

Eddie Wyatt

e-mail: edw@ius1.cs.cmu.edu
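
The point about unknown bounds shows up directly in what sizeof reports. The sketch below is an illustration, not code from the thread, and both function names are hypothetical: for a declared array the compiler knows the full size, while for a pointer it knows only the size of the pointer itself, whatever amount of storage it happens to point at.

```c
#include <stddef.h>

/* With an array declaration in scope, the compiler knows the whole size. */
size_t bytes_known_for_array(void)
{
    int y[3];

    return sizeof y;        /* 3 * sizeof(int) */
}

/* With only a pointer, sizeof yields the size of the pointer itself;
 * the 4000 ints it might point at after malloc() are invisible here. */
size_t bytes_known_for_pointer(void)
{
    int *x = 0;

    return sizeof x;        /* sizeof(int *) */
}
```

This is the information gap a dope vector would have to fill for the pointer case.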
jpn@teddy.UUCP (John P. Nelson) (08/28/87)
>C is the only major language that doesn't know itself what to do with
>strings, but instead forces programmers to kludge around with pointers and
>function calls instead of allowing precisely the construct described above.

What about Pascal? I mean ISO standard Pascal, not some nonstandard
extension. Text manipulation in standard Pascal is an order of magnitude
more painful than in C.

How about Fortran IV? I know, the 77 standard includes a character type,
but before that, strings were pretty painful. Not all Fortran compilers
are up to the 77 standard yet. Even the '77 standard leaves something
to be desired when it comes to text work.
peter@sugar.UUCP (08/29/87)
#define EQUAL 0

    if(strcmp() == EQUAL)
    if(strcmp() > EQUAL)
    if(strcmp() <= EQUAL)

etc...

It even makes sense:

    SUI #23
    JZ match

/ :->
--
-- Peter da Silva `-_-' ...!seismo!soma!uhnix1!sugar!peter
--                  U   <--- not a copyrighted cartoon :->