rbj@cmr.icst.nbs.gov (Root Boy Jim) (06/17/88)
? > While we're on the subject of efficiency, cmp is coded wrong. It should
? > first stat the two files to be compared. If the character count is different,
? > so are the files. And files tend to be different more often than the same.
?
? This is *only* useful if the "-s" flag is being used, and it's not
? reading from stdin. How often does that happen? Isn't common usage
? "cmp foo bar", not "cmp -s foo bar"?
?
? --keith

Well, you have a good point, and you force me to restate my case. Mostly what I want to know is whether they are the same or different, not where, and I want it to speak if they differ and be silent if they are identical. Thus, I want the equivalent of a command like so:

    alias compare 'cmp -s \!^ \!$ || echo \!^ and \!$ differ'

Of course I am free to implement such a command if I wish. And oftentimes it would be much faster than vanilla cmp. Once again we have proved that the conclusions we reach depend on the premises we take for granted.

This whole subject started from context diffs, so since I have your ear bent I will bend it a bit more. Actually, this has to do with recursive diffs.

1) Symlinks. Suppose I have two trees, say /usr/include and /usr/outclude, but each has a symlink sys pointing to the same place. Do I really want to follow them just to say the subtrees are identical? We can't use -h (how did -h get to stand for symlink anyway?), but maybe we can define YAO to {not,} follow symlinks.

2) Suppose they point different places? Follow or not, depending on option.

3) This sets us up for removing identical files in the first tree. I wrote a (n UGLY) script which does this; it is nontrivial and slow. Of course it would be aided and abetted by an "if -l file echo file is symlink" switch in csh. Maybe "if -h file..." :-)

4) Diff -r prints "Binary files X and Y differ", but always does a diff on source files. Often I would prefer "Source files X and Y differ". YAO, -q sez "just tell me they're different". More elegantly/foolishly, -p prog sez which prog to run instead of diff. I prefer -q.

5) Kudos for carefully distinguishing output messages by types: "Only in ...", "Files ... are identical", "Binary ... differ", etc. This allows me to "diff -r | grep ^X" where X is one of B, F, O. The addition of "Source files ... differ" would also start with a unique letter. A tiny nit: perhaps the "Only" messages should say "Old file: $1/name" and "New file: $2/name". I'm not entirely satisfied with those messages, but you get the idea.

6) Another symlink glitch: ls -F prints symlinks to directories with a trailing '/', so there is no easy way to distinguish them from real directories. How about a trailing '\' instead? More abstruse is printing a symlink that points to nowhere with a trailing '?', but I don't really care about that and it is extra work to do. The former idea I have come to cherish since I hacked it in, tho.

7) I don't ask for much, do I :-?

(Root Boy) Jim Cottrell <rbj@icst-cmr.arpa>
National Bureau of Standards
Flamer's Hotline: (301) 975-5688
The opinions expressed are solely my own
and do not reflect NBS policy or agreement
Careful with that VAX Eugene!
I'm having a BIG BANG THEORY!!
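A minimal C sketch of the stat-first idea, shaped as the silent-unless-different check Jim's `compare` alias wants. The function name compare_files() and its return codes (0 identical, 1 different, -1 error) are illustrative assumptions, not the interface of any real cmp. The size shortcut only fires when both paths are regular files, which is Keith's caveat in code form: it cannot help when one side is stdin or a pipe, and it only pays off when nobody needs the offset of the first difference. The same-device/same-inode check (two names, possibly via symlinks, for one file) is an extra not proposed in the thread.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper, not a real cmp interface.
 * Returns 0 if the files are identical, 1 if they differ, -1 on error. */
int compare_files(const char *path1, const char *path2)
{
    struct stat s1, s2;
    FILE *f1, *f2;
    int c1, c2, result = 0;

    if (stat(path1, &s1) != 0 || stat(path2, &s2) != 0)
        return -1;

    /* Same device and inode: both names refer to one file (an added shortcut). */
    if (s1.st_dev == s2.st_dev && s1.st_ino == s2.st_ino)
        return 0;

    /* The cheap test: regular files of different sizes cannot match.
     * Skipped for devices and FIFOs, where st_size tells us nothing. */
    if (S_ISREG(s1.st_mode) && S_ISREG(s2.st_mode) && s1.st_size != s2.st_size)
        return 1;

    if ((f1 = fopen(path1, "rb")) == NULL)
        return -1;
    if ((f2 = fopen(path2, "rb")) == NULL) {
        fclose(f1);
        return -1;
    }

    /* Sizes match (or are unknown): scan until the first mismatch or EOF. */
    do {
        c1 = getc(f1);
        c2 = getc(f2);
        if (c1 != c2) {
            result = 1;
            break;
        }
    } while (c1 != EOF);

    /* A read error is reported as an error, not as "identical". */
    if (ferror(f1) || ferror(f2))
        result = -1;
    fclose(f1);
    fclose(f2);
    return result;
}
```

A caller mimicking the alias would print "X and Y differ" only when compare_files() returns 1 and stay silent on 0; mismatched sizes are settled by two stat() calls without opening either file.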
andrew@alice.UUCP (06/17/88)
on most implementations, cmp has two getchars (or getc's) in the inner loop. we got a factor of five improvement by reading in blocks and using memcmp.
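A rough C sketch of the block-at-a-time approach described above: fill two buffers with fread() and let memcmp() decide whether they match, instead of calling getc() twice per byte. The helper name block_cmp() and the 8192-byte buffer are assumptions; the post gives neither the buffer size nor the code behind the factor-of-five figure.

```c
#include <stdio.h>
#include <string.h>

#define BLKSIZE 8192  /* assumed buffer size, not taken from the post */

/* Hypothetical helper. Returns 0 if both streams hold identical bytes,
 * 1 if they differ, -1 on a read error. Static buffers: not reentrant. */
int block_cmp(FILE *f1, FILE *f2)
{
    static unsigned char b1[BLKSIZE], b2[BLKSIZE];
    size_t n1, n2;

    for (;;) {
        /* fread() keeps reading until the buffer is full, EOF, or an error,
         * so a short count means end of input (errors are checked next). */
        n1 = fread(b1, 1, sizeof b1, f1);
        n2 = fread(b2, 1, sizeof b2, f2);
        if (ferror(f1) || ferror(f2))
            return -1;

        /* Different lengths or different bytes: the inputs differ. */
        if (n1 != n2 || memcmp(b1, b2, n1) != 0)
            return 1;

        /* Equal short reads on both sides: both hit EOF together. */
        if (n1 < sizeof b1)
            return 0;
    }
}
```

memcmp() only says that a block differs, not where. A cmp that still reports the byte and line of the first difference could rescan just the mismatching block byte by byte, so the common case of identical blocks stays on the fast path.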
jaw@eos.UUCP (James A. Woods) (06/22/88)
From article <7993@alice.UUCP>, by andrew@alice.UUCP:
> on most implementations, cmp has two getchars (or getc's) in the inner loop.
> we got a factor of five improvement by reading in blocks and using
> memcmp.

the 'cmp' on the cray two here is one of the rare unix commands which is vectorized, though some poor soul had to code the loop as a fortran subroutine (no vector C here). it doesn't count lines (nobody cares, though, and i don't believe line counting is inherently nonvectorizable).

you might think a regular byte-oriented 'cmp' would be i/o bound on such a beast -- not true by a long shot; incidentally, this was one motivation for my development of boyer/moore/gosper 'egrep' two years back.