[comp.lang.c] *"LDA" ok?

ADLER1%BRANDEIS.BITNET@wiscvm.wisc.EDU (08/20/87)

I was trying to write a C program that would read MIX commands from
stdin. I also wanted to be able to verify that the string opcode
was actually internally equal to the string LDA in case the MIX
command was  LDA 2000,2(0:3)  <CR>. After some experimentation I
arrived at the following code. It works, but I am somewhat dismayed
by the expression (*opcode == *"LDA") . It just looks so peculiar.
Is it really OK?

#include <stdio.h>

main()
{
        char opcode[4];
        int  address, index, left, right ;

        printf("Type assembly language statement:\n\n");
        scanf("%s %d,%d(%d:%d)",opcode, &address, &index, &left, &right);
        printf("Opcode\t=%s\n",opcode);
        printf("Address\t=%d\n",address);
        printf("Index\t=%d\n",index);
        printf("Field\t= (%d:%d)\n",left,right);

        if (*opcode == *"LDA") printf("Gotcha!\n");
        else printf("No match...\n");
}

ADLER1@BRANDEIS.BITNET

chris@mimsy.UUCP (Chris Torek) (08/22/87)

In article <8877@brl-adm.ARPA> ADLER1%BRANDEIS.BITNET@wiscvm.wisc.EDU writes:
>I was trying to write a C program that would read MIX commands from
>stdin. I also wanted to be able to verify that the string opcode
>was actually internally equal to the string LDA....

>        char opcode[4];
>        int  address, index, left, right;
>
>        printf("Type assembly language statement:\n\n");
>        scanf("%s %d,%d(%d:%d)",opcode, &address, &index, &left, &right);

>        if (*opcode == *"LDA") printf("Gotcha!\n");
>        else printf("No match...\n");

No doubt this has already been answered in mail directed to
adler1@brandeis.bitnet, but I want to expand on this a bit.  Aside
from the missing test for scanf's return value, this code can be
called correct: there is nothing a typechecker like lint could
diagnose, for instance.  Yet it does not do what was desired.
To compare the characters in `opcode' with the string "LDA" for
equality, one should use

	if (strcmp(opcode, "LDA") == 0)

which is such a common idiom that old-time C programmers understand
it at a glance.  It seems to come late to neophyte programmers,
though, and it seems reasonable to ask why.
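
A minimal sketch of the two points above, the missing test of scanf's
return value and the strcmp() idiom. It uses sscanf() on a fixed line
rather than scanf() on stdin so it can be tried on sample strings. The
name is_lda_statement is hypothetical, and the %3s width limit is an
addition not discussed in the thread; it keeps a long word from
overrunning opcode[4].

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not from the thread: parses one MIX statement of
 * the form "LDA 2000,2(0:3)".  Returns 1 only if all five fields were
 * converted and the opcode is exactly "LDA", otherwise 0. */
int is_lda_statement(const char *line)
{
    char opcode[4];                 /* three letters plus the NUL */
    int address, index, left, right;

    /* %3s stops after three characters, so opcode[] cannot overflow. */
    if (sscanf(line, "%3s %d,%d(%d:%d)",
               opcode, &address, &index, &left, &right) != 5)
        return 0;                   /* malformed or incomplete input */

    /* Compare the whole string, not just its first character. */
    return strcmp(opcode, "LDA") == 0;
}
```

With this version "LDX 2000,2(0:3)" is rejected, where the original
test on the first character alone would have accepted it.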

Perhaps it is because other languages provide string comparison within
the language itself:

	if opcode stringequal "LDA" then ...

or

	if opcode = "LDA" then ...

A straightforward (but wrong) translation yields

	if (opcode == "LDA") ...

which is syntactically and semantically valid, but is always false
(or usually false in some compilers, and certainly false in this case.)
Programming by patching (a technique familiar to mathematicians as
well, in the form known as `proof by patching': `oops, well for case
2, change the original equation to . . .') leads to

	if (*opcode == *"LDA")

which works for some test cases, since it compares opcode[0] with
'L'.  I have even seen something like

	if (*opcode == *"LDA" &&
	    *(opcode + 1) == *("LDA" + 1) &&
	    *(opcode + 2) == *("LDA" + 2))

which works for even more test cases, but is still wrong as well
as wasteful (at least in compilers for which "LDA"=="LDA" is false).

Eventually it seems to dawn upon these programmers that

	"LDA"

generates an anonymous character array holding the letters L, D,
A, and NUL (\0) and evaluates to the address of this array.  Then
the purpose of strcmp() becomes clear, and they live happily ever
after :-).
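
The distinction can be sketched in a few lines. The function names
below are made up for illustration; the point is only that == on
pointers compares addresses, * picks out a single character, and
strcmp() walks both arrays up to the NUL.

```c
#include <string.h>

/* What *opcode == *"LDA" really tests: only the first characters. */
int first_char_equal(const char *opcode)
{
    return *opcode == *"LDA";       /* same as opcode[0] == 'L' */
}

/* What was wanted: every character, up to and including the NUL. */
int is_lda(const char *opcode)
{
    return strcmp(opcode, "LDA") == 0;
}

/* Two distinct arrays with identical contents.  Their addresses differ,
 * so == on the pointers is false even though strcmp() reports a match. */
int same_address(void)
{
    char a[] = "LDA";
    char b[] = "LDA";
    const char *p = a;
    const char *q = b;

    return p == q;                  /* 0: two different objects */
}
```

Here first_char_equal("LDX") is true while is_lda("LDX") is false,
which is the sense in which the original test "works for some test
cases".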

All I want to know is this:  Why does it take so long for some
programmers to see this, and how can we speed up the process?
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain:	chris@mimsy.umd.edu	Path:	seismo!mimsy!chris

flaps@utcsri.UUCP (08/22/87)

ADLER1@BRANDEIS.BITNET writes:
    (char opcode[something];)
>   if (*opcode == *"LDA") printf("Gotcha!\n");

This compares the first letter of opcode with the first letter of "LDA".
Not what you want.

Strings are not fundamental types in C.  You need a library function to compare
them.

    if(strcmp(opcode,"LDA") == 0)
	printf...

ajr <flaps@csri.toronto.edu> (also flaps at utorgpu on bitnet)

"Your donation will be used to torture animals in useless experiments."

gwyn@brl-smoke.UUCP (08/22/87)

In article <8088@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>All I want to know is this:  Why does it take so long for some
>programmers to see this, and how can we speed up the process?

Seems to me the issue is more basic -- that people are trying to
GUESS how things work rather than study a good text (of which
there are several, Tom Plum's among them) to KNOW how they work.
If this assessment is correct, then the issue is really:  How do
we encourage the development of more precise thinking rather
than fuzzy, approximate thinking?  This is probably something
best attempted while the very young are still developing their
characteristic methods of thought; remedial action at an
advanced age is much more difficult.  It's hard enough anyway,
given the dominant state of our culture.

barts@tekchips.UUCP (08/23/87)

In article <8088@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
]In article <8877@brl-adm.ARPA> ADLER1%BRANDEIS.BITNET@wiscvm.wisc.EDU writes:
]>I was trying to write a C program that would read MIX commands from
]>stdin. I also wanted to be able to verify that the string opcode
]>was actually internally equal to the string LDA....
]>        if (*opcode == *"LDA") printf("Gotcha!\n");
]>        else printf("No match...\n");
]No doubt this has already been answered in mail directed to
]adler1@brandeis.bitnet, but I want to expand on this a bit.
] [...]
]To compare the characters in `opcode' with the string "LDA" for
]equality, one should use
]	if (strcmp(opcode, "LDA") == 0)
]which is such a common idiom that old-time C programmers understand
]it at a glance.  It seems to come late to neophyte programmers,
]though, and it seems reasonable to ask why.
]
]Perhaps it is because other languages provide string comparison within
]the language itself:
] [...examples deleted...]
]Eventually it seems to dawn upon these programmers that
]	"LDA"
]generates an anonymous character array holding the letters L, D,
]A, and NUL (\0) and evaluates to the address of this array.  Then
]the purpose of strcmp() becomes clear, and they live happily ever
]after :-).
]
]All I want to know is this:  Why does it take so long for some
]programmers to see this, and how can we speed up the process?

I got my first C experience about 3 years ago when I was handed a code fragment
containing all sorts of marvelous UN*X ioctl() and fork()/wait() calls and told
to turn it into an interactive editor/parser.  Since then I have (hopefully)
improved in my understanding of C and my programming style, but my introduction
to C is recent enough that I can comment on strcmp().

The single greatest problem I had in learning to use strcmp() is its return of
0 on "equality" of the strings.  I was expecting a boolean-valued comparison,
and this apparent sense-reversal (false on equality) threw more monkey wrenches
into my early programs than I would ever have believed.

Perhaps inexperienced programmers resort to trying direct comparisons ala
	string1 == string2	or	*string1 == *string2
after a few failures of
	if (strcmp(string1,string2)) printf("They match!\n");
to do what they expect.

I'm not sure how to speed up learning the right way to use strcmp().  Maybe
inexperienced programmers should be encouraged to use something like
	#define streq(s1,s2)	(strcmp(s1,s2) == 0)
until they get used to non-boolean-valued comparisons.  I suppose, however,
that it could be argued that this will only delay understanding strcmp(), but
at least the novice will have a "function" that does what built-in equivalency
tests in other languages already do.
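
A small sketch of the streq() suggestion next to the mistake it is
meant to prevent. The macro is the one given above with its arguments
parenthesized; naive_match and opcode_code are hypothetical names, and
the codes 1, 2, 3 are arbitrary.

```c
#include <string.h>

/* The macro from the post, with its arguments parenthesized. */
#define streq(s1, s2)   (strcmp((s1), (s2)) == 0)

/* The mistake described above: strcmp() returns 0 on a match, so using
 * its result directly as a truth value is true when the strings DIFFER. */
int naive_match(const char *a, const char *b)
{
    return strcmp(a, b) != 0;       /* what "if (strcmp(a, b))" tests */
}

/* Hypothetical dispatcher: maps a mnemonic to an arbitrary small code,
 * or -1 if the mnemonic is not recognized. */
int opcode_code(const char *opcode)
{
    if (streq(opcode, "LDA")) return 1;
    if (streq(opcode, "STA")) return 2;
    if (streq(opcode, "ADD")) return 3;
    return -1;
}
```

The macro reads the way the novice expects, at the cost of hiding the
three-way result that strcmp() actually returns.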

]-- 
]In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
]Domain:	chris@mimsy.umd.edu	Path:	seismo!mimsy!chris
-- 
Bart Schaefer
Oregon Graduate Center				...!tektronix!ogcvax!schaefer
Guest at Tekchips				...!tektronix!tekchips!barts

jay@splut.UUCP (Jay Maynard) (08/24/87)

In article <8088@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
> In article <8877@brl-adm.ARPA> ADLER1%BRANDEIS.BITNET@wiscvm.wisc.EDU writes:
> >I was trying to write a C program that would read MIX commands from
> >stdin. I also wanted to be able to verify that the string opcode
> >was actually internally equal to the string LDA....
> >        if (*opcode == *"LDA") printf("Gotcha!\n");
> >        else printf("No match...\n");

 [description of learning process deleted]

> Eventually it seems to dawn upon these programmers that
> 
> 	"LDA"
> 
> generates an anonymous character array holding the letters L, D,
> A, and NUL (\0) and evaluates to the address of this array.  Then
> the purpose of strcmp() becomes clear, and they live happily ever
> after :-).
> 
> All I want to know is this:  Why does it take so long for some
> programmers to see this, and how can we speed up the process?

Because most other languages, and all of the other languages that a
programmer new to C is likely to know, handle strings intrinsically.

C is the only major language that doesn't know itself what to do with
strings, but instead forces programmers to kludge around with pointers and
function calls instead of allowing precisely the construct described above.

This is the source of most of C's crypticness (crypticity? naaaaaah.) to the
inexperienced programmer.

About the only way I can think of to speed up the process is to add string
intrinsics to C. (asbestos suit on)

-- 
Jay Maynard, K5ZC...>splut!< | uucp: hoptoad!academ!uhnix1!nuchat!splut!jay
"Don't ask ME about Unix...  | (or sun!housun!nuchat)       CI$: 71036,1603
I speak SNA!"                | internet: beats me         GEnie: JAYMAYNARD
The opinions herein are shared by neither of my cats, much less anyone else.

peter@sugar.UUCP (Peter da Silva) (08/24/87)

> Because most other languages, and all of the other languages that a
> programmer new to C is likely to know, handle strings intrinsically.

Pascal.

Pascal doesn't even have a "variable length packed byte array" type. In
fact it *can't* have one unless you extend it. I know you love Turbo,
but it ain't Jensen & Wirth compatible.

As for your Volvo/68000 comment. What do *you* do on the 80x86 that
doesn't cause you to painfully code around segments? Use Turbo & never
go over 64K?
-- 
-- Peter da Silva `-_-' ...!seismo!soma!uhnix1!sugar!peter
--                  U   <--- not a copyrighted cartoon :->

gwyn@brl-smoke.ARPA (Doug Gwyn ) (08/24/87)

In article <1623@tekchips.TEK.COM> barts@tekchips.UUCP (Bart Schaefer) writes:
>I suppose, however, that it could be argued that this will only delay
>understanding strcmp(), but at least the novice will have a "function"
>that does what built-in equivalency tests in other languages already do.

Perhaps it would help if they were told that strcmp() does NOT test for
string equality; rather, it compares the lexical ordering of two strings.
This makes it useful sometimes for the function used with qsort().  The
test for exact match is simply a common special case.
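
As a sketch of the qsort() use mentioned above: qsort() passes the
comparison function pointers to the array elements, so for an array of
char * a thin wrapper around strcmp() is needed. The names
compare_strings and sort_mnemonics are hypothetical. The ordering is
whatever strcmp() defines, i.e. by character code.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() hands over pointers to the elements.  Each element here is a
 * char *, so each argument is really a pointer to a char *. */
static int compare_strings(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;

    /* Negative, zero, or positive: exactly the answer qsort() wants. */
    return strcmp(*sa, *sb);
}

/* Hypothetical helper: sorts n mnemonics into strcmp() order. */
void sort_mnemonics(const char **names, size_t n)
{
    qsort(names, n, sizeof names[0], compare_strings);
}
```

The equality test is then just the case where the comparison result
happens to be zero.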

People often have the same problem understanding the function of the
UNIX "cat" utility; they think of it as "printing a file", but that is
just a special case of its general use as a file concatenator.  This
attempt to achieve maximal generality is characteristic of UNIX, at least
as it was originally developed, and is one of the first things that a
person learning to program in C or on UNIX should learn.  Kernighan &
Plauger's "Software Tools" is a good introduction; Kernighan & Pike's
"The UNIX Programming Environment" also teaches this point.

billc@trsvax.UUCP (08/25/87)

>/* Written 10:55 pm  Aug 19, 1987 by wiscvm.wisc.EDU!ADLER1%BRANDEIS.*/
>/* ---------- "*\"LDA\" ok?" ---------- */
>I was trying to write a C program that would read MIX commands from
>stdin. I also wanted to be able to verify that the string opcode
>was actually internally equal to the string LDA in case the MIX
>command was  LDA 2000,2(0:3)  <CR>. After some experimentation I
>arrived at the following code. It works, but I am somewhat dismayed
>by the expression (*opcode == *"LDA") . It just looks so peculiar.
>Is it really OK?

NO!!!

What you're doing here is simply comparing the first character from each
string.  Instead, use something like this:

	strupr (opcode); /* convert any lower case chars to upper case */
	if (! strcmp (opcode, "LDA")) printf ("Got match.\n");
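
Note that strupr() is a vendor extension found in some compiler
libraries, not part of the standard C library. Where it is missing,
something like the following sketch does the same job with toupper();
the names upcase and is_lda_ignoring_case are hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical stand-in for strupr(): converts s to upper case in place
 * and returns s. */
char *upcase(char *s)
{
    char *p;

    for (p = s; *p != '\0'; p++)
        *p = (char)toupper((unsigned char)*p);
    return s;
}

/* Matches "LDA", "lda", "Lda", ...  The buffer is modified as a side
 * effect, as it would be with strupr(). */
int is_lda_ignoring_case(char *opcode)
{
    return strcmp(upcase(opcode), "LDA") == 0;
}
```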

jay@splut.UUCP (Jay Maynard) (08/25/87)

In article <560@sugar.UUCP>, peter@sugar.UUCP (Peter da Silva) writes:
> > Because most other languages, and all of the other languages that a
> > programmer new to C is likely to know, handle strings intrinsically.
> 
> Pascal doesn't even have a "variable length packed byte array" type. In
> fact it *can't* have one unless you extend it. I know you love Turbo,
> but it ain't Jensen & Wirth compatible.

Turbo isn't the only Pascal that handles strings...in fact, how many
strictly-J&W-compatible commercial Pascals do you know of? How many
non-J&Ws?

> As for your Volvo/68000 comment. What do *you* do on the 80x86 that
> doesn't cause you to painfully code around segments? Use Turbo & never
> go over 64K?

I use linked lists allocated off the heap, where appropriate...or some
similar technique. Generally, it can be dealt with through appropriate
choice of algorithm (have we seen that discussion before...?) I've never
done anything that required a single data element >64K, but such
applications are fairly exotic.

> -- Peter da Silva `-_-' ...!seismo!soma!uhnix1!sugar!peter
> --                  U   <--- not a copyrighted cartoon :->

Yeah, I know...bleh.

-- 
Jay Maynard, K5ZC...>splut!< | uucp: hoptoad!academ!uhnix1!nuchat!splut!jay
"Don't ask ME about Unix...  | (or sun!housun!nuchat)       CI$: 71036,1603
I speak SNA!"                | internet: beats me         GEnie: JAYMAYNARD
The opinions herein are shared by neither of my cats, much less anyone else.

peter@sugar.UUCP (Peter da Silva) (08/25/87)

In article <92@splut.UUCP>, jay@splut.UUCP (Jay Maynard) writes:
> In article <560@sugar.UUCP>, peter@sugar.UUCP (Peter da Silva) writes:
> > > Because most other languages, and all of the other languages that a
> > > programmer new to C is likely to know, handle strings intrinsically.
> > 
> > Pascal doesn't even have a "variable length packed byte array" type. In
> > fact it *can't* have one unless you extend it. I know you love Turbo,
> > but it ain't Jensen & Wirth compatible.
> 
> Turbo isn't the only Pascal that handles strings...in fact, how many
> strictly-J&W-compatible commercial Pascals do you know of? How many
> non-J&Ws?

I learned Pascal using a J&W compiler. We're talking about "other languages
that a programmer new to 'C' is likely to know" here... not some weird
variant of Pascal that isn't even a proper superset of J&W (as UCSD,
for example, is). Before you come back with some variant of "Turbo is
becoming (or even is) a standard", let me remind you that UCSD once had the
same cachet.

And of course Turbo strings aren't the same as UCSD strings aren't the same
as Pascal/2 strings...

How about Fortran pre-F77?

How about assembler?

How about PL/M?

Also, many of the languages that do have strings don't give you much more than
the equivalent of "strcpy", "strcmp", "strncpy", and so on. For example,
Fortran 77. About the only place you can do more with strings than copying
bytes into preallocated data is in I/O statements. I'll take *printf over
Fortran formatted I/O any day.
-- 
-- Peter da Silva `-_-' ...!seismo!soma!uhnix1!sugar!peter
--                  U   <--- not a copyrighted cartoon :->

rbutterworth@orchid.UUCP (08/26/87)

In article <6332@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
> Perhaps it would help if they were told that strcmp() does NOT test for
> string equality; rather, it compares the lexical ordering of two strings.
> This makes it useful sometimes for the function used with qsort().  The
> test for exact match is simply a common special case.

The biggest problem I've found with my own and others' understanding
of strcmp() is its name.  Until you get used to it, "!strcmp()",
"strcmp() != 0", and other such usages are quite non-obvious.

In most cases the function is simply used as a true or false test,
and it isn't obvious that a true comparison should mean that the
strings are different.

If it were named say strdif(), then something like
"if (!strdif(a,b)) ..." or "if (strdif(a,b))" would be much more
readable for the beginner.  i.e. the truth indicates that the
strings were different, something that even beginners should be
able to understand, as opposed to the truth indicating that the
strings were comparable, a concept that isn't all that obvious
to me even after years of use.

The word "compare" says what you want to do with the arguments,
the word "difference" says what result you want.  It's much easier
to think of this particular function in terms of what it returns
rather than in terms of what it does with its arguments.  On the
other hand, functions such as printf() are appropriately named
according to what they do to their arguments, not the value
they return, and so there is much less confusion. 

Of course there isn't much we can do about it now, but this is
something that should be considered when making up names for new
functions.

We speak of "evolving" languages, but somehow I think that if
Darwin had had to contend with the concept of "backward compatibility"
he would have given up.

arnold@emory.uucp (Arnold D. Robbins {EUCC}) (08/26/87)

There has been considerable discussion about C's strings and the fact that
the lack of string operands is a hindrance. Several years ago I suggested
a string operator, but I got little response. Here's my idea again.

Add a new symbol for use in comparison, assignment, and argument declarations
and function calls that pass arrays *by value*, say "`". It would be
used analogously to * in pointer declarations/use.

Array comparison:
	if (`x == `y)
	if (`x == `"LDA")

Function declaration:
	int foo (char `arg);	/* requires dope vector */
	x = foo (`x);
	char (`junk[5])();	/* function returning array! (length 5) */

Array assignment:
	`x = `y;

	There would have to be a number of new rules relating to arrays
of the same type but of different length, and using arrays of different
types. In particular, it would probably be necessary to special case
array of char so that even if two arrays are of different length, all
operations would work as if the str* functions had been called, i.e.
terminating on a 0 byte.

	The advantage of this proposal is that it adds something many
people feel has long been missing (array operations, passing arrays by
value), but without overloading an existing operator or breaking any
current code.
	The disadvantages are that function calls would now require
the use of dope vectors, and assignments and comparisons would be
compound operations (i.e. a hidden loop); so what looks like a simple,
quick operation (like comparing two integers) could be a very long,
slow operation. Function call/return times also could increase.
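
For comparison, here is a sketch of how far existing C already goes
for arrays of fixed, known length: wrapping the array in a struct
gives assignment and pass-by-value without new syntax, though
comparison still needs a function call. The type and function names
are hypothetical, and this is not the ` operator proposed above.

```c
#include <string.h>

/* Hypothetical wrapper type: a fixed-size array inside a struct. */
struct opcode {
    char text[4];
};

/* The struct, and so the array inside it, is passed and returned by
 * value: the change made here does not touch the caller's copy. */
struct opcode with_first_letter(struct opcode op, char c)
{
    op.text[0] = c;
    return op;
}

/* There is no == for structs, so comparison remains an explicit call. */
int opcode_equal(struct opcode a, struct opcode b)
{
    return strcmp(a.text, b.text) == 0;
}
```

The copies are still hidden loops, which is the cost named in the
disadvantages above.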

	Well, so much for throwing out ideas. Any comments?
-- 
Arnold Robbins
ARPA, CSNET:	arnold@emory.ARPA	BITNET: arnold@emory
UUCP:	{ decvax, gatech, sun!sunatl }!emory!arnold
ONE-OF-THESE-DAYS:	arnold@emory.mathcs.emory.edu

edw@ius1.cs.cmu.edu (Eddie Wyatt) (08/28/87)

In article <2211@emory.uucp>, arnold@emory.uucp (Arnold D. Robbins {EUCC}) writes:
> There has been considerable discussion about C's strings and the fact that
> the lack of string operands is a hindrance. Several years ago I suggested
> a string operator, but I got little response. Here's my idea again.
> 
> Add a new symbol for use in comparison, assignment, and argument declarations
> and function calls that pass arrays *by value*, say "`". It would be
> used analogously to * in pointer declarations/use.
> 
> Array comparison:
> 	if (`x == `y)
> 	if (`x == `"LDA")
> 
> Function declaration:
> 	int foo (char `arg);	/* requires dope vector */
> 	x = foo (`x);
> 	char (`junk[5])();	/* function returning array! (length 5) */
> 
> Array assignment:
> 	`x = `y;

   The problem with applying these operations to arrays in general is that the
size of an array may be (is usually) unknown to the compiler.

     x = (int *) malloc(sizeof(int)*4000);

    `y = `x;

    How many bytes should be copied????  Can't know unless the compiler
understands the semantics of the first statement.  I'm sure you can
start to imagine all the possible bad situations.

   You may restrict the ` operator to arrays with known bounds
(i.e. an array declaration for the variables involved is within
scope - int x[3], y[3], not int *x, y[3]).  But if this restriction
is made, the facility becomes of very little use, because a lot of
code assumes unbounded arrays and hence could not take advantage
of this construct.
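
The difference between the two kinds of declaration can be seen with
sizeof, in a sketch like the following (function names hypothetical).
For a declared array the compiler knows the byte count; for a pointer
it knows only the size of the pointer, whatever malloc() returned.

```c
#include <stddef.h>

/* A declared array: the bound is part of the type, so the compiler
 * knows how many bytes a copy would involve. */
size_t declared_array_bytes(void)
{
    int y[3];

    return sizeof y;                /* 3 * sizeof(int) */
}

/* A pointer: sizeof gives the size of the pointer itself and says
 * nothing about how much storage it points at. */
size_t pointer_bytes(const int *x)
{
    return sizeof x;                /* sizeof(int *) */
}
```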

> 
> 	There would have to be a number of new rules relating to arrays
> of the same type but of different length, and using arrays of different
> types. In particular, it would probably be necessary to special case
> array of char so that even if two arrays are of different length, all
> operations would work as if the str* functions had been called, i.e.
> terminating on a 0 byte.
> 
> 	The advantage of this proposal is that it adds something many
> people feel has long been missing (array operations, passing arrays by
							^^^^^^^^^^^^^^^
> value), but without overloading an existing operator or breaking any
 ^^^^^
> current code.

    No, what is missing from the language is the "concept" of input and
output parameters.  The user of the language should be insulated from
the actual way parameters are passed around.  It should be up to the
compiler to determine whether an input parameter should be passed by
reference or passed by value, based on whichever method is faster
for the particular parameter.

  The problem with passing arrays by value is that the dimensions of
the array are again generally unknown.

> 	The disadvantages are that function calls would now require
> the use of dope vectors, and assignments and comparisons would be
> compound operations (i.e. a hidden loop); so what looks like a simple,
> quick operation (like comparing two integers) could be a very long,
> slow operation. Function call/return times also could increase.
> 
> 	Well, so much for throwing out ideas. Any comments?
> -- 
> Arnold Robbins
> ARPA, CSNET:	arnold@emory.ARPA	BITNET: arnold@emory
> UUCP:	{ decvax, gatech, sun!sunatl }!emory!arnold
> ONE-OF-THESE-DAYS:	arnold@emory.mathcs.emory.edu

-- 

					Eddie Wyatt

e-mail: edw@ius1.cs.cmu.edu

jpn@teddy.UUCP (John P. Nelson) (08/28/87)

>C is the only major language that doesn't know itself what to do with
>strings, but instead forces programmers to kludge around with pointers and
>function calls instead of allowing precisely the construct described above.

What about Pascal?  I mean ISO standard Pascal, not some nonstandard
extension.  Text manipulation in standard pascal is an order of magnitude
more painful than in C.

How about fortran IV?  I know, the 77 standard includes a character type,
but before that, strings were pretty painful.  Not all fortran compilers
are up to the 77 standard, yet.  Even the '77 standard leaves something
to be desired when it comes to text work.

peter@sugar.UUCP (08/29/87)

#define EQUAL 0

	if(strcmp() == EQUAL)
	if(strcmp() > EQUAL)
	if(strcmp() <= EQUAL)
	etc...

It even makes sense:

	SUI	#23
	JZ	match		/ :->
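
Filled out with real arguments, the EQUAL idiom might look like the
sketch below. The function name relation and its return strings are
hypothetical; the comparisons are the ones listed above.

```c
#include <string.h>

#define EQUAL 0

/* Hypothetical helper: says how a sorts relative to b, reading the
 * result of strcmp() the same way a subtract-and-test would be read. */
const char *relation(const char *a, const char *b)
{
    int r = strcmp(a, b);

    if (r == EQUAL)
        return "same";
    if (r > EQUAL)
        return "after";
    return "before";
}
```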
-- 
-- Peter da Silva `-_-' ...!seismo!soma!uhnix1!sugar!peter
--                  U   <--- not a copyrighted cartoon :->