jaw@eos.UUCP (James A. Woods) (07/26/88)
# wild card (n.) 1. A symbol which defaults to anything and is therefore mandatory at moments of doubt. See DEFAULT. 2. A tab card randomly inserted in a pack to enliven the action. See also LUDDITE. -- Stan Kelly-Bootle, The Devil's DP Dictionary I write this note as a veteran of the "GREP WARS." To those users of my venerable 2-year-old fast [ef]?grep from comp.sources, this message is a teaser for yet another grep replacement. You may be aware of Andrew Hume's laudable work unifying the various Bell Labs grep versions. It is especially nice to see AT&T adapt the speedup ideas concerning hybridization of the Boyer/Moore/Gosper algorithm with regular expression search, as manifested in the comp.sources package. Hume has a forthcoming paper in "Software Practice and Experience" detailing a bit of his rewrite, which I had the pleasure of formally reviewing. (The SP&E paper, however, discusses I/O tweaks more than it does algorithms.) If you follow the GNU project, you may also be aware of Mike Haertel's valiant generic grep/egrep clone (with added value of sorts, particularly in the way of a line context option). It is free of AT&T license restrictions. My old public-domain effort is also unrestricted, except for the odd choice of calling blackbox AT&T binaries from certain weird corners of the program. This has presented difficulties for GNU, now remedied. Without going into myriad detail at this point, I, armed with invaluable assistance from Arthur David Olson, have sped up Haertel's automaton to incorporate BMG for a much broader class of expressions than "old" fast egrep. We've tested the GNU/Haertel code (plus our additions) on everything from Scott Anderson's infamous "Khadaffi" matcher to the 121 egrep regression microtests from Henry Spencer. I am tracking the official GNU distribution with the supplement until such time as the improvements are folded in (pending legal signoff, new thoughts from Haertel, etc.) Why all this hash? If you are an AT&T loyalist, little of this matters as long as you can wait for some kind vendor to port grep from Unix V9 to System 5 rel4 on the 3B series to your favorite desktop or supercomputer. This could be well over a year. (Witness the two-year hiatus from BSD 4.3 code to SunOS 4.0.) Better yet would be access from the AT&T Toolchest, which would be glad to nickel-and-dime you with yet another license agreement. Thus GNU egrep, with fast algorithms. Stay tuned ... James A. Woods NASA Ames Research Center ames!jaw jaw@ames.arc.nasa.gov P.S. State-of-the-art GNU 'fgrep', with source, must wait until one of you intrepid souls implement it. See the 1980 Commentz-Walter paper "A String Matching Algorithm Fast on the Average", in "Automata, Languages, and Programming", Springer Verlag Lecture Notes in Computer Science #71, Goos & Hartmanis, eds., Graz, Austria, July 1979. The general Commentz-Walter method is 4-5X faster than Aho 'fgrep', so it deserves re-implementation for GNU.