udi@cs.arizona.edu (Udi Manber) (06/17/91)
We are proud to announce the release of version 1.0 of agrep - a new tool for fast text searching with errors. agrep is similar to egrep (or grep or fgrep), but it is much more general. It is based on an entirely different algorithm.

The three most significant features of agrep that are not supported by the grep family are

1) the ability to search for approximate patterns; for example, "agrep -2 homogenos foo" will find homogeneous as well as any other word that can be obtained from homogenos with at most 2 substitutions, insertions, or deletions.

2) agrep is record oriented rather than just line oriented; a record is by default a line, but it can be user defined; for example, "agrep -d '^From ' 'pizza' mbox" outputs all mail messages that contain the keyword "pizza". Another example: "agrep -d '$$' pattern foo" will output all paragraphs (separated by an empty line) that contain pattern.

3) multiple patterns with AND (or OR) logic queries. For example, "agrep -d '^From ' 'burger,pizza' mbox" outputs all mail messages containing at least one of the two keywords (, stands for OR). "agrep -d '^From ' 'good;pizza' mbox" outputs all mail messages containing both keywords.

Putting these options together one can ask queries like

    agrep -d '$$' -2 '<CACM>;TheAuthor;Curriculum;<198[5-9]>' bib

which outputs all paragraphs referencing articles in CACM between 1985 and 1989 by TheAuthor dealing with curriculum. Two errors are allowed (e.g., one in TheAuthor and one in Curriculum, or two in one of them), but they cannot be in either CACM or the year (the <> brackets forbid errors in the pattern between them).

Other features include searching for regular expressions (with or without errors), unlimited wild cards, limiting the errors to only insertions or only substitutions or any combination, allowing each deletion, for example, to be counted as, say, 2 substitutions or 3 insertions, restricting parts of the query to be exact and parts to be approximate, and many more.
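To make "at most 2 substitutions, insertions, or deletions" concrete, here is a minimal Python sketch of approximate matching over records. It uses the textbook dynamic-programming method for approximate substring search, which is not claimed to be the algorithm agrep uses (the technical report in the distribution describes that). The names approx_contains and search_records are hypothetical, and the record delimiter is treated as a literal string rather than a pattern such as '^From '.

```python
def approx_contains(pattern, text, k):
    """Return True if some substring of text is within k edits of pattern.

    An edit is one substitution, insertion, or deletion, each costing 1.
    Illustrative dynamic-programming sketch, not agrep's own algorithm.
    """
    m = len(pattern)
    # col[i] = fewest edits needed to turn pattern[:i] into some substring
    # of the text ending at the current position. Before any text is read,
    # the only option is to delete all i pattern characters.
    col = list(range(m + 1))
    if col[m] <= k:
        return True
    for ch in text:
        prev_diag = col[0]  # value of col[i-1] from the previous position
        col[0] = 0          # a match may start at any text position
        for i in range(1, m + 1):
            old = col[i]
            cost = 0 if pattern[i - 1] == ch else 1
            col[i] = min(prev_diag + cost,  # match or substitution
                         old + 1,           # extra character in the text
                         col[i - 1] + 1)    # pattern character missing
            prev_diag = old
        if col[m] <= k:
            return True
    return False


def search_records(text, pattern, k, delimiter="\n"):
    """Split text into records on a literal delimiter (assumed here to be
    a plain string) and return the records that approximately contain
    pattern with at most k errors."""
    return [rec for rec in text.split(delimiter)
            if rec and approx_contains(pattern, rec, k)]


if __name__ == "__main__":
    sample = ("the mixture is homogeneous\nstill on topic\n\n"
              "nothing relevant here")
    # Paragraph records, loosely analogous to: agrep -d '$$' -2 homogenos foo
    for record in search_records(sample, "homogenos", 2, delimiter="\n\n"):
        print(record)
```

With k set to 0 the same function reduces to an exact substring test, so a call such as search_records(mail_text, "pizza", 0) behaves like a plain line-oriented search over whatever text is passed in.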
agrep is available by anonymous ftp from cs.arizona.edu (IP 192.12.69.5) as agrep/agrep.tar.Z (or in uncompressed form as agrep/agrep.tar). The tar file contains the source code (in C), man pages (agrep.1), and a postscript file (agrep.ps) of a technical report (TR #91-11) describing the design and implementation of agrep.

This is the first version of agrep. There may be some bugs, especially with complicated patterns and combinations of options. Please mail bug reports (or any other comments) to sw@cs.arizona.edu or to udi@cs.arizona.edu. We would appreciate it if users notified us (at the addresses above) of any extensions, improvements, or interesting uses of this software.

June 16, 1991.
gt0178a@prism.gatech.EDU (Jim Burns) (06/18/91)
in article <4328@optima.cs.arizona.edu>, udi@cs.arizona.edu (Udi Manber) says:
> We are proud to announce the release of version 1.0 of agrep - a new tool
> for fast text searching with errors.

You mean grep isn't buggy enough for you as is? Seriously, sounds interesting.
--
BURNS,JIM (returned student)
Georgia Institute of Technology, 30178 Georgia Tech Station,
Atlanta Georgia, 30332 | Internet: gt0178a@prism.gatech.edu
uucp: ...!{decvax,hplabs,ncar,purdue,rutgers}!gatech!prism!gt0178a