jimli@blake.acs.washington.edu (Jimmy Li) (11/16/89)
I wanted to know what is the best way to compare two strings? For example, I have the following decl. char array[10000][200]; What is the fastest way to sort this array? (I heard that 'strcmp' is slow if the strings are long. Why?) Thanks a lot.
gwyn@smoke.BRL.MIL (Doug Gwyn) (11/16/89)
In article <4463@blake.acs.washington.edu> jimli@blake.acs.washington.edu (Jimmy Li) writes: >char array[10000][200]; >What is the fastest way to sort this array? Dump it into an external file and invoke the system sort utility on the file. You could also use qsort(), but this is such a ridiculously large amount of storage to statically allocate that you should be thinking of alternatives. >(I heard that 'strcmp' is slow if the strings are long. Why?) No, strcmp() is not particularly slow unless the strings typically differ only after the Nth character where N >> 1. Many of us use a macro like the following for simple string equality testing, to save the function-call overhead for typical strcmp() implementations: #define StrEq( a, b ) (*(a) == *(b) && strcmp( a, b ) == 0) /* UNSAFE */
CMH117@PSUVM.BITNET (Charles Hannum) (11/16/89)
1) Unless you are dealing with very similar strings, there is nothing wrong with strcmp(). 2) Now, if your question is actually how to sort the array, then look up the qsort() function in your favorite C reference. -- - Charles Martin Hannum II "Klein bottle for sale... Inquire within." (and PROUD OF IT!!!) "To life immortal!" c9h@psuecl.psu.edu "No noozzzz izzz netzzznoozzzzz..." cmh117@psuvm.psu.edu "Memories, all alone in the moonlight ..."
dan@charyb.COM (Dan Mick) (11/16/89)
In article <11605@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes: >#define StrEq( a, b ) (*(a) == *(b) && strcmp( a, b ) == 0) /* UNSAFE */ Why the UNSAFE comment? This looks like utterly standard C to me... -- .sig files are idiotic and wasteful.
bengsig@oracle.nl (Bjorn Engsig) (11/16/89)
Article <11605@smoke.BRL.MIL> by gwyn@brl.arpa (Doug Gwyn) says: |[ you can use this macro to ] |save the function-call overhead for typical strcmp() implementations: | |#define StrEq( a, b ) (*(a) == *(b) && strcmp( a, b ) == 0) /* UNSAFE */ Is it unsafe because of sideeffects of evaluating a and b twice, or is there something else? -- Bjorn Engsig, Domain: bengsig@oracle.nl, Path: uunet!mcsun!orcenl!bengsig USA Domain: bengsig@nlsun1.oracle.com
aic@mentor.cc.purdue.edu (George A. Basar) (11/16/89)
In article <308@charyb.COM>, dan@charyb.COM (Dan Mick) writes: >In article <11605@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes: >>#define StrEq( a, b ) (*(a) == *(b) && strcmp( a, b ) == 0) /* UNSAFE */ > > Why the UNSAFE comment? This looks like utterly standard C to me... > -- It is not unsafe, it is just that he was looking for a a way to not perform the strcmp if the first chars were unequal. He stated it was unsafe(performance-wise) since the order of evaluation is suspect. So the strcmp may be performed before the test for *(a)==*(b). A more 'safe' StrEq is #define StrEq(a,b) ((*(a) == *(b))?strcmp(a,b):*(a)-*(b)) This will guarantee order of evaluation, but with the added overhead of the terniary operator. Actually, the difference is only 3 instructions in favor of my version(22 to 19). * George A. Basar (317)742-8799 (home) * aic@mentor.cc.purdue.edu BASAR@PURCCVM.BITNET | General Consultant (317)494-1787 (work) | Purdue University Computing Center Discalimer: My opinions are just that, mine.
will@charyb.COM (Will Crowder) (11/17/89)
In article <308@charyb.COM> dan@charyb.UUCP (Dan Mick) writes: >In article <11605@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes: >>#define StrEq( a, b ) (*(a) == *(b) && strcmp( a, b ) == 0) >>/* UNSAFE */ > >Why the UNSAFE comment? This looks like utterly standard C to me... This macro is "unsafe" because the macro arguments a and b are used twice, hence possible side effect difficulties... Will
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (11/17/89)
In article <308@charyb.COM> dan@charyb.UUCP (Dan Mick) writes: | In article <11605@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes: | >#define StrEq( a, b ) (*(a) == *(b) && strcmp( a, b ) == 0) /* UNSAFE */ | | Why the UNSAFE comment? This looks like utterly standard C to me... I can only speculate, but strcmp can give memory faults if the string is not properly terminated. There may be other reasons I'm missing, I lost a day's news and don't have the original. -- bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen) "The world is filled with fools. They blindly follow their so-called 'reason' in the face of the church and common sense. Any fool can see that the world is flat!" - anon
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (11/17/89)
In article <5205@mentor.cc.purdue.edu> aic@mentor.cc.purdue.edu (George A. Basar) writes: | It is not unsafe, it is just that he was looking for a a way to not | perform the strcmp if the first chars were unequal. He stated it was | unsafe(performance-wise) since the order of evaluation is suspect. So the ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | strcmp may be performed before the test for *(a)==*(b). Not in C, it's not. && is evaluated left to right. -- bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen) "The world is filled with fools. They blindly follow their so-called 'reason' in the face of the church and common sense. Any fool can see that the world is flat!" - anon
aitken@hati.cs.cornell.edu (William E. Aitken) (11/17/89)
In article <1632@crdos1.crd.ge.COM> davidsen@crdos1.UUCP (bill davidsen) writes: >In article <308@charyb.COM> dan@charyb.UUCP (Dan Mick) writes: >| In article <11605@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes: >| >#define StrEq( a, b ) (*(a) == *(b) && strcmp( a, b ) == 0) /* UNSAFE */ >| >| Why the UNSAFE comment? This looks like utterly standard C to me... > Consider ``StrEq(p++, q++);'' for example. William E. Aitken <aitken@cs.cornell.edu> | On Earth it is considered {uw-beaver,rochester,decvax}!cornell!aitken | ill mannered to kill your Home: (607) 273-7810 Away : (607) 255-9206 | friends while commiting suicide ============================================*============= Avon ==============
aic@mentor.cc.purdue.edu (George A. Basar) (11/17/89)
In article <5205@mentor.cc.purdue.edu>, aic@mentor.cc.purdue.edu I write: > It is not unsafe, it is just that he was looking for a a way to not > perform the strcmp if the first chars were unequal. He stated it was > unsafe(performance-wise) since the order of evaluation is suspect. So the I blew it, apologies to all. No excuse given. This is not unsafe for the application specified, though, which was a fixed array of strings(aasuming a calling convention of StrEq(a[i],b[i])). There are many ways to optimize a solution to a specific problem that would fail in other cases. Once again, sorry. Revoke my cc rights and make me use fortran :-). * George A. Basar (317)742-8799 (home) * aic@mentor.cc.purdue.edu BASAR@PURCCVM.BITNET | General Consultant (317)494-1787 (work) | Purdue University Computing Center Discalimer: My opinions are just that, mine.
ok@mudla.cs.mu.OZ.AU (Richard O'Keefe) (11/17/89)
In article <1632@crdos1.crd.ge.COM>, davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes: : In article <308@charyb.COM> dan@charyb.UUCP (Dan Mick) writes: : | In article <11605@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes: : | >#define StrEq( a, b ) (*(a) == *(b) && strcmp( a, b ) == 0) /* UNSAFE */ : | : | Why the UNSAFE comment? This looks like utterly standard C to me... : : I can only speculate, but strcmp can give memory faults if the string : is not properly terminated. There may be other reasons I'm missing, I : lost a day's news and don't have the original. I thought "unsafe" was common C jargon for a macro that looks like a function but may evaluate one or more arguments more than once. This is such a macro: if a or b contains a function call or an assignment you may be in trouble.
gwyn@smoke.BRL.MIL (Doug Gwyn) (11/17/89)
In article <548.nlhp3@oracle.nl> bengsig@oracle.nl (Bjorn Engsig) writes: >|#define StrEq( a, b ) (*(a) == *(b) && strcmp( a, b ) == 0) /* UNSAFE */ >Is it unsafe because of side effects of evaluating a and b twice, ... Yes, that's what the comment means. In the file I extracted this from there was an explanation..
gwyn@smoke.BRL.MIL (Doug Gwyn) (11/17/89)
In article <5205@mentor.cc.purdue.edu> aic@mentor.cc.purdue.edu (George A. Basar) writes: >unsafe ... since the order of evaluation is suspect. Wrong.
gwyn@smoke.BRL.MIL (Doug Gwyn) (11/17/89)
In article <1632@crdos1.crd.ge.COM> davidsen@crdos1.UUCP (bill davidsen) writes: > I can only speculate, but strcmp can give memory faults if the string >is not properly terminated. It also could fail if fed a banana cream pie. Why speculate when you don't know?
sullivan@aqdata.uucp (Michael T. Sullivan) (11/17/89)
From article <309@charyb.COM>, by will@charyb.COM (Will Crowder): > In article <308@charyb.COM> dan@charyb.UUCP (Dan Mick) writes: > >>In article <11605@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes: >>>#define StrEq( a, b ) (*(a) == *(b) && strcmp( a, b ) == 0) >>>/* UNSAFE */ >> >>Why the UNSAFE comment? This looks like utterly standard C to me... > > This macro is "unsafe" because the macro arguments a and b are used twice, > hence possible side effect difficulties... I once read (Indian Hills Style Guide???) that if a macro doesn't behave EXACTLY like a function (i.e. evaluates parameters once, doesn't assign to arguments, etc.) that it should be named in all caps. Thus, a person using the macro will know to be careful with it. Applies here, eh? -- Michael Sullivan uunet!jarthur.uucp!aqdata!sullivan aQdata, Inc. San Dimas, CA
peter@ficc.uu.net (Peter da Silva) (11/18/89)
> >>#define StrEq( a, b ) (*(a) == *(b) && strcmp( a, b ) == 0) /* UNSAFE */ > > Why the UNSAFE comment? This looks like utterly standard C to me... Because !a! and !b! may be evaluated multiple times. What if you do something like !StrEq(*++argv, "poodle-hair");!? If !*++argv=='p'!, then it will evaluate !++argv! again. > #define StrEq(a,b) ((*(a) == *(b))?strcmp(a,b):*(a)-*(b)) This has the advantage that it will return the same result as strcmp, but is otherwise as unsafe (albeit a little more predictable, since it will always evaluate both arguments twice). -- `-_-' Peter da Silva <peter@ficc.uu.net> <peter@sugar.hackercorp.com>. 'U` -------------- +1 713 274 5180. "vi is bad because it didn't work after I put jelly in my keyboard." -- Jeffrey W Percival (jwp@larry.sal.wisc.edu)
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (11/18/89)
In article <34344@cornell.UUCP> aitken@cs.cornell.edu (William E. Aitken) writes: | >| Why the UNSAFE comment? This looks like utterly standard C to me... | > | Consider ``StrEq(p++, q++);'' for example. I think a number of people, myself included, were looking for something specific to the macro at hand. Any macro which evaluates arguments more than once must be used with caution, but then so must malloc and free, and most of us don't label each use as unsafe. I confess that I assumed the object of the warning was a bit more subtle. -- bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen) "The world is filled with fools. They blindly follow their so-called 'reason' in the face of the church and common sense. Any fool can see that the world is flat!" - anon
dan@charyb.COM (Dan Mick) (11/18/89)
In article <5205@mentor.cc.purdue.edu> aic@mentor.cc.purdue.edu (George A. Basar) writes: | |In article <308@charyb.COM>, dan@charyb.COM (Dan Mick) writes: |>In article <11605@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes: |>>#define StrEq( a, b ) (*(a) == *(b) && strcmp( a, b ) == 0) /* UNSAFE */ |> |> Why the UNSAFE comment? This looks like utterly standard C to me... |> -- | | It is not unsafe, it is just that he was looking for a a way to not |perform the strcmp if the first chars were unequal. He stated it was |unsafe(performance-wise) since the order of evaluation is suspect. So the |strcmp may be performed before the test for *(a)==*(b). | A more 'safe' StrEq is | |#define StrEq(a,b) ((*(a) == *(b))?strcmp(a,b):*(a)-*(b)) | | This will guarantee order of evaluation, but with the added overhead of |the terniary operator. Actually, the difference is only 3 instructions in |favor of my version(22 to 19). | What? First of all, I'd think Doug knows whether or not he meant "unsafe"; after all, the comment is pretty unambiguous. Secondly, how can there be any other evaluation order, considering the precedence and the short-circuit behavior of &&? This looks suspiciously like "Here, here, I've got an answer" but it's *wrong* and *unhelpful*. Sigh. -- .sig files are idiotic and wasteful.
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (11/18/89)
In article <11623@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes: | In article <1632@crdos1.crd.ge.COM> davidsen@crdos1.UUCP (bill davidsen) writes: | > I can only speculate, but strcmp can give memory faults if the string | >is not properly terminated. | | It also could fail if fed a banana cream pie. | | Why speculate when you don't know? Good grief! I tried to be helpful and offered one case in which the macro could be "unsafe," and you have to make a nasty comment. I didn't even comment on how uninformative the comment really was, or anything. Improperly terminated strings are fairly common, while pies are a feature of C--. -- bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen) "The world is filled with fools. They blindly follow their so-called 'reason' in the face of the church and common sense. Any fool can see that the world is flat!" - anon
gwyn@smoke.BRL.MIL (Doug Gwyn) (11/18/89)
In article <1989Nov17.155143.2758@aqdata.uucp> sullivan@aqdata.uucp (Michael T. Sullivan) writes:
-I once read (Indian Hills Style Guide???) that if a macro doesn't behave
-EXACTLY like a function (i.e. evaluates parameters once, doesn't assign
-to arguments, etc.) that it should be named in all caps. Thus, a person
-using the macro will know to be careful with it. Applies here, eh?
I guess if you believe in the Indian Hills Style Guide it might.
robert@isgtec.UUCP (Robert Osborne) (11/18/89)
In article <11605@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes: >In article <4463@blake.acs.washington.edu> jimli@blake.acs.washington.edu (Jimmy Li) writes: >>char array[10000][200]; >>What is the fastest way to sort this array? > >Dump it into an external file and invoke the system sort utility on the file. This is a REALLY bad way of sorting an array this big. Write ~2M to disk, perform a system call (a fork of at least 2M) which reads ~2M from disk, sorts it, and writes ~2M back to disk, then read ~2M from disk. He asked for the FASTEST way, this shouldn't even have be given as an option. >You could also use qsort(), [...] ^^^^^^^ Probably your best bet! Use Doug's macro (below) for the compare function. Note the problem some qsort()'s have with "nearly sorted" data. If you REALLY need speed implement a sort of your own, BUT only if qsort() isn't fast enough (it should be). >but this is such a ridiculously large amount >of storage to statically allocate that you should be thinking of alternatives. Maybe all his program *does* is work with 10000 character strings of length 200 :-). Why calloc this space as the first line of your main? >#define StrEq( a, b ) (*(a) == *(b) && strcmp( a, b ) == 0) /* UNSAFE */ This is a great macro mind if I use it? But I see it has started another bout of comp.lang.c.morons, shame on you Doug :-) Rob. -- Robert A. Osborne | I take full responsibility for the opin... ...uunet!mnetor!lsuc!isgtec!robert | <HEY BUD, get off the box, it's my turn!> ...utzoo!lsuc!isgtec!robert | oops, sorry about that.
bph@buengc.BU.EDU (Blair P. Houghton) (11/18/89)
In article <5221@mentor.cc.purdue.edu> aic@mentor.cc.purdue.edu (George A. Basar) writes: > > I blew it, apologies to all. No excuse given. This is not unsafe for the >application specified, though, which was a fixed array of strings(aasuming >a calling convention of StrEq(a[i],b[i])). In which case one of the more common calls people would try would be of the form StrEq( a[i++], b[j++] ) Thus allowing the macro to blow up. So it really still is a hazardous piece of programming, but very useful if you're careful to remember its characteristics. --Blair "Moral: Don't judge your own code. That's what we're here for. :-)"
sullivan@aqdata.uucp (Michael T. Sullivan) (11/18/89)
From article <11635@smoke.BRL.MIL>, by gwyn@smoke.BRL.MIL (Doug Gwyn): > > I guess if you believe in the Indian Hills Style Guide it might. First off, I'm not sure that's where it came from. Secondly, it still sounds like a pretty good suggestion to me. StrEq doesn't set off warning flags that it is even a macro, much less a macro with a warning comment in some include file. STREQ, however, tells me right away not to go passing (a++, b++) to it. A helpful suggestion is a helpful suggestion _whatever the source_. -- Michael Sullivan uunet!jarthur.uucp!aqdata!sullivan aQdata, Inc. San Dimas, CA
ok@mudla.cs.mu.OZ.AU (Richard O'Keefe) (11/18/89)
In article <1989Nov17.234542.3556@aqdata.uucp>, sullivan@aqdata.uucp (Michael T. Sullivan) writes:
: From article <11635@smoke.BRL.MIL>, by gwyn@smoke.BRL.MIL (Doug Gwyn):
: > I guess if you believe in the Indian Hills Style Guide it might.
: Secondly, it still sounds like a pretty good suggestion to me.
: StrEq doesn't set off warning flags that it is even a macro,
: much less a macro with a warning comment in some include file.
Anyone who relies on unsafe macro names being written in capitals is
asking for trouble. Common practice has been to use all caps even for
#defined constants, e.g.
#define NULL 0
#define EOF (-1)
In a great many important cases, all caps has NOT meant "unsafe", it has
just meant "macro". Further, there are some well-established unsafe
macros whose names don't contain any capitals at all, obscure things like
getc
putc
putchar
It doesn't seem like a good idea to place any reliance on a convention which
hasn't been followed in the old C library itself.
ok@mudla.cs.mu.OZ.AU (Richard O'Keefe) (11/18/89)
In article <214@isgtec.UUCP>, robert@isgtec.UUCP (Robert Osborne) writes: > In article <11605@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes: > >In article <4463@blake.acs.washington.edu> > > jimli@blake.acs.washington.edu (Jimmy Li) writes: > >>char array[10000][200]; > >>What is the fastest way to sort this array? > > > >Dump it into an external file and invoke the system sort utility on the file. > This is a REALLY bad way of sorting an array this big. ... He asked > for the FASTEST way, this shouldn't even have be given as an option. Oh ah, tried it, have you? In fact it is not a REALLY bad way of sorting. Depending on whether you have virtual memory, how much of it you have, what the page replacement algorithm is, how much real memory you have, &c &c sorting using an external file with a system sort utility may end up doing FEWER I/O operations than an internal sort. In fact in UNIX systems sort(1) uses an algorithm (merging) which is intrinsically faster than qsort(3)'s "quicker-sort" algorithm, but alas, the point at which it switches over to merging depends on memory size &c &c &c. A general point about questions like this: any answer we give is likely to be an answer to the wrong question. I find myself wondering why anyone wants to sort char array[10000][200]. Why exactly those number? Are the records really fixed length? How does anyone come by 200 characters without any substructure? If there is substructure, why not declare the array more explicitly and exploit the substructure in the sorting algorithm? For example, suppose that the records contain a field which is, for argument's sake, a date YYMMDD. By setting up a table of 366 buckets (one for each bucket) the records could be distributed among the buckets in O(N) time -- where N is the number of records -- and then the buckets could be sorted using a list merge on the rest of the fields. This would reduce the problem from O(N.lgN) cost to O(N.lg(N/365)), about a factor of 2.3 faster. Without knowing what Jimmy Li is really doing, we might give bad advice. Indeed, the very best way of ensuring that a table is sorted is to generate it in sorted order in the first place; without knowing what he is up to I can't tell whether the "null algorithm" can be used for his problem or not. There is another extremely important reason why Doug Gwyn's suggestion is a REALLY good one even if it proves slower in some cases. That is that the system sort utility provides you with a mini-language for specifying comparisons which can make it a lot easier to get your sort right.
sullivan@aqdata.uucp (Michael T. Sullivan) (11/19/89)
From article <2741@munnari.oz.au>, by ok@mudla.cs.mu.OZ.AU (Richard O'Keefe): > > Anyone who relies on unsafe macro names being written in capitals is > asking for trouble. I'm not relying on anything. All I said was that I thought it was a good idea. Better than "relying" on /*DANGEROUS*/ in an include file. That's all. -- Michael Sullivan uunet!jarthur.uucp!aqdata!sullivan aQdata, Inc. San Dimas, CA
gwyn@smoke.BRL.MIL (Doug Gwyn) (11/19/89)
In article <2742@munnari.oz.au> ok@mudla.cs.mu.OZ.AU (Richard O'Keefe) writes: >> In article <11605@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes: >> >Dump it into an external file and invoke the system sort utility on the file. >A general point about questions like this: any answer we give is likely >to be an answer to the wrong question. I find myself wondering why anyone >wants to sort char array[10000][200]. That was the main point that I wanted to call attention to. >There is another extremely important reason why Doug Gwyn's suggestion is >a REALLY good one even if it proves slower in some cases. That is that >the system sort utility provides you with a mini-language for specifying >comparisons which can make it a lot easier to get your sort right. That may or may not be important, depending on the application. One thing one should always try to keep in mind is that for problems like sorting, much effort has already been invested in finding good solutions, and it is likely to be counterproductive to spend one's own limited time reinventing wheels poorly when good wheels are sitting on the shelf. qsort() probably suffices for this case, system("sort ...") for complex key matching requirements (too bad a comparable binary-record sort utility isn't a standard UNIX utility), and if none of the existing tools has the right properties only THEN would one program his own sorting algorithm.
mcdonald@uxe.cso.uiuc.edu (11/19/89)
>I wanted to know what is the best way to compare two strings? >For example, I have the following decl. >char array[10000][200]; >What is the fastest way to sort this array? (I heard that 'strcmp' is slow >if the strings are long. Why?) Someone suggested writing to disk and using an external sort program. That seems a little extreme, especially since the vast majority of routines don't HAVE an external sort routine. The C "qsort" is surely worth a try, otherwise we really should suggest books for him to read. I don't understand the flames about the "unsafe" StrEq macro. Look folks, lots of things are unsafe - motorcycles, steaks, C. For safety you need a toy language. A simple suggestion - is it REALLY too hard to follow? - is to RTFM, which should warn about unsafe uses of functions or macros - or to RTF actual declaration of the macro itself. The question is moot if he calls "qsort", as it requires a function, not a macro. Doug McDonald
lazer@mnetor.UUCP (Lazer Danzinger) (11/20/89)
In article <309@charyb.COM> will@charyb.UUCP (Will Crowder) writes: +In article <308@charyb.COM> dan@charyb.UUCP (Dan Mick) writes: + +>In article <11605@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes: +>>#define StrEq( a, b ) (*(a) == *(b) && strcmp( a, b ) == 0) +>>/* UNSAFE */ +> +>Why the UNSAFE comment? This looks like utterly standard C to me... + +This macro is "unsafe" because the macro arguments a and b are used twice, +hence possible side effect difficulties... + I see no possibility of side effects. What is unsafe is the referencing of the location where a and b point to, without first checking that they are not null pointers.
cpcahil@virtech.uucp (Conor P. Cahill) (11/21/89)
In article <5139@mnetor.UUCP>, lazer@mnetor.UUCP (Lazer Danzinger) writes: > +This macro is "unsafe" because the macro arguments a and b are used twice, > +hence possible side effect difficulties... > I see no possibility of side effects. What is unsafe is the > referencing of the location where a and b point to, without > first checking that they are not null pointers. wrongo. you (the programmer) must do that before calling StrEqu. This macro was designed to be a replacement for using strcmp directly and therefore has the same requirements as the strcmp function itself. Calling strcmp with a null pointer will result in a core dump on lots of machines and undefined behavior on all others. The side effects have been addressed in lots of other posts in this thread. -- +-----------------------------------------------------------------------+ | Conor P. Cahill uunet!virtech!cpcahil 703-430-9247 ! | Virtual Technologies Inc., P. O. Box 876, Sterling, VA 22170 | +-----------------------------------------------------------------------+
lee@sq.sq.com (Liam R. E. Quin) (11/21/89)
Some general thoughts on STREQ, sorting, arrays and cranberry sauce...
Doug Gwyn mentions
>>> #define StrEq(a,b) (*(a) == *(b) && strcmp(a, b) == 0) /*UNSAFE*/
and comments it as UNSAFE...
I first saw this as a Henry Spencer-ism, and hence generally write
#define STREQ(henry, utzoo) (*(henry) == *(utzoo) && !strcmp(henry, utzoo))
(this has the same side-effect problems, but is a little more fun).
For the original question, about sorting a large "in-core" array,
the 4.3 BSD qsort routine is publicly available by uucp and ftp, and was
on the Tahoe release. I mention this because it has much better behaviour
in some cases, tending not to recurse as deeply as the older ones.
strcmp() is usually quite fast, but if you are calling it a lot of times
and this is a problem, at the loss of some generality and flexibility I
suggest you take the BSD qsort() (or write your own) and WIRE IN the
compare function to be the STREQ macro.
Otherwise, STREQ does not help at all, as you have to give qsort(3) a
function, not a macro.
You might also want to think about using a different sort algorithm
altogether. One that sorts one part of the large array and then
goes on to look at another may take more swaps/compares, but will
exhibit greater locality of reference -- in other words, will cause
fewer page faults! This might be *much* more significant!
Of course, if the strings themselves came from malloc individually,
you can't predict where they are (in general) and this won't work, but I
seem to remember seeing that you had an array
char WhateverItWasCalled[1000][200];
implying a thousand strings each of exactly 200 characters.
If the strings are in practice generally shorter, you may get a big win
by calling malloc for each of them, possibly generating them in sorted
order (for example using a tree or linked list).
That way, you don't use up so much storage, and things will probably
work out faster in the end. 1,000 calls to malloc doesn't seem to take
very long -- especially if you have the newer System V malloc(3X).
Hope this helps.
Lee (I lied about the cranberry sauce)
--
Liam R. Quin, Unixsys (UK) Ltd [note: not an employee of "sq" - a visitor!]
lee@sq.com (Whilst visiting Canada from England)
lee@anduk.co.uk (Upon my return to England at Christmas)
matsl@nada.kth.se (Mats Luthman) (11/21/89)
In article <5139@mnetor.UUCP> lazer@mnetor.UUCP (Lazer Danzinger) writes: >+>In article <11605@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes: >+>>#define StrEq( a, b ) (*(a) == *(b) && strcmp( a, b ) == 0) >+>>/* UNSAFE */ >+ > I see no possibility of side effects. What is unsafe is the > referencing of the location where a and b point to, without > first checking that they are not null pointers. Try StrEq(foo--, bar--)
decot@hpisod2.HP.COM (Dave Decot) (11/29/89)
> >You could also use qsort(), [...] > ^^^^^^^ > Probably your best bet! Use Doug's macro (below) for the compare > function. Good luck passing the address of a macro to qsort()... Dave