johnh@nottingham.cs.ucla.edu (John Heidemann) (11/29/90)
Following is version 1.1 of lookbibtex, a Perl script which allows
keyword searches of bibtex databases, showing the entire database
entry when a match is found.

This version corrects a few bugs in the original version.
Multi-line fields with curly braces are now supported,
and umlauts are correctly handled.

   - John Heidemann

---------------+--------------------------------------------------------------
John Heidemann | Fortune of the week: Any sufficiently advanced technology
UCLA CSD       | is indistinguishable from a rigged demo.
---------------+--------------------------------------------------------------

#! /bin/sh
# This is a shell archive, meaning:
# 1. Remove everything above the #! /bin/sh line.
# 2. Save the resulting text in a file.
# 3. Execute the file with /bin/sh (not csh) to create the files:
#    README
#    lookbibtex.1
#    lookbibtex
# This archive created: Wed Nov 28 17:44:04 1990
export PATH; PATH=/bin:$PATH
if test -f 'README'
then
    echo shar: will not over-write existing file "'README'"
else
cat << \SHAR_EOF > 'README'
lookbibtex 1.1
--------------

To install lookbibtex, change the #! line of lookbibtex to the
path for Perl.  You may also wish to change the default database;
if so edit the $defaultfile variable.

Lookbibtex is released under the GNU General Public License,
Version 1 (Feb 89).  A copy of the GPL should be included with
your Perl distribution.

This is version 1.1; it fixes three problems of release 1.0.
Closing curly brackets not in the first column were missed in 1.0.
Umlauts and multi-line fields with curly brackets are now handled
correctly.

Perl is a very practical language from Larry Wall.  It may be obtained
by anonymous ftp from jpl-devvax.jpl.nasa.gov in pub/perl.3.0,
or by e-mail to Larry Wall <lwall@jpl-devvax.jpl.nasa.gov>.

Any comments are welcome.
-John Heidemann <johnh@cs.ucla.edu>
 28 November 1990
SHAR_EOF
fi # end of overwriting check
if test -f 'lookbibtex.1'
then
    echo shar: will not over-write existing file "'lookbibtex.1'"
else
cat << \SHAR_EOF > 'lookbibtex.1'
.\" lookbibtex.1
.TH LOOKBIBTEX 1 "19 November 1990"
.SH NAME
lookbibtex \- find references in a bibtex database
.SH SYNOPSIS
.B lookbibtex
[ -k
.I keyword
] [
.I bibfile.bib
]
.I regexp
.SH DESCRIPTION
lookbibtex searches through a bibtex
.I bibfile.bib
database, printing entries that match
.I regexp.
See
.BR bibtex (1)
for a description of the bibtex database.
.I Regexp
is a Perl regular expression.  See
.BR perl (1)
for an explanation of differences between perl and
standard regular expressions.

Searches can be limited to particular bibtex fields with the
-k option.

To do "and" searches on two fields use shell pipes,
reading the output of one search as the bibliography of
the second.  For example, to find what someone named Kafka
wrote about emacs keyboard layout, do:
.IP
.B lookbibtex -k author kafka | lookbibtex -k title - meta
.LP
To do "or" searches, use regular expressions.
For example, you could be concerned with only careful
authors, like Knuth and Kafka:
.IP
.B lookbibtex -k author 'knuth|kafka'
.LP
More sophisticated searches can be achieved by combining these
techniques.
.SH AUTHOR
John Heidemann <johnh@cs.ucla.edu>
.SH SEE ALSO
.BR bibtex (1),
.BR perl (1)
.SH BUGS
lookbibtex is written in Perl, so it will not run on machines
which do not have perl installed (although this is arguably a bug
of the person too lazy to install such a useful tool).

Multiple keywords on one line will be missed.

This program does not implement a full bibtex parser, just a good
approximation.  Because of this, it may fail on unusual
bibtex files.  In particular, the @ that begins a bibtex
entry, and the } which ends it must be the first non-whitespace
character on their line, or they will be missed.
Also, multi-line fields surrounded with double quotes or
curly brackets are handled correctly, but ugly combinations
of quotes and brackets and backslashed versions of the same
may fail.

Umlaut accents are removed from the search string.

I make no predictions as to how this program will react to bibtex
files with syntax errors.
SHAR_EOF
fi # end of overwriting check
if test -f 'lookbibtex'
then
    echo shar: will not over-write existing file "'lookbibtex'"
else
cat << \SHAR_EOF > 'lookbibtex'
#!/usr/local/bin/perl
#
# lookbibtex 1.1
# Look into a bib file.
#
# Copyright (C) 1990 by John Heidemann
# This is distributed under the GNU General Public License, Version 1 (Feb 89).
# See the Perl documentation for a copy of that license.
#
# 4-Oct-90 it is hacked together.
# 19-Nov-90 Now it remembers "'s and joins such lines.
#    It also removes nasty characters like {} from the search string.
# 20-Nov-90 Umlaut accents handled correctly.
# 28-Nov-90 A simple heuristic to handle multi-line fields with {}'s is added.
#    In addition, we compress all whitespace to single spaces in the
#    searched version.
#
# This program relies on the convention that the closing } of a
# bib entry is the only } in the first non-whitespace column,
# and that the opening @ is also there.
#

$* = 1;    # make searches on vars with embedded newlines work

#
# customize this to whatever is right locally
#
$defaultfile = "/u/s9/u/ficus/DOC/ficus.bib";
$badkeys = "string";    # keys to ignore

#
# do argument processing
#
if ($#ARGV >= 1 && $ARGV[0] eq "-k") {
    $keyword = $ARGV[1];
    shift (@ARGV);
    shift (@ARGV);
};
if ($#ARGV == 0) {
    $file = $defaultfile;
    $pattern = $ARGV[0];
} elsif ($#ARGV == 1) {
    $file = $ARGV[0];
    $pattern = $ARGV[1];
} else {
    die ("Usage: lookbibtex [-k keyword] [bibfile.bib] regexp\n" .
        "    Keyword restricts the regexp search to that bibtex " .
        "field name (author, etc.)\n" .
        "    Default bibfile is $defaultfile, - indicates stdin.\n" .
        "    Regexp is a Perl regexp.\n");
};

#
# handle the keyword by modifying the pattern
#
if (defined($keyword)) {
    $pattern = "^\\s*${keyword}\\s*=.*${pattern}";
    # print "pattern is $pattern\n";
};

#
# looking for beginning of bib entry is state 1, in bib is state 2
#
$state = 1;

#
# Certain keys we really want to ignore because
# they're not bib entries.  They're listed here.
#
@badkeys = split(/,/, $badkeys);
foreach $i (@badkeys) {
    $badkeys{$i} = "bad";    # just make them defined
};

#
# To do searches right, we have to make everything
# for a field on one line.
# This routine does that, and also gets rid of {}'s
# which tend to get in the way for searches.  In the
# same vein, it collapses all whitespace to single spaces.
#
# To know when to join lines, we use two simple heuristics:
# if there is an odd number of "'s on a line, we must enter or exit
# multi-line mode.  If there are more {'s than }'s, we must enter,
# and if there are more }'s than {'s we must exit (anything on
# the first line is ignored).
#
sub printtosearch {
    local ($print) = @_;
    local ($search, $mode) = ("", 1);
    local ($opencurley, $closecurley) = (0,0);

    @lines = split(/\n/, $print);
    # the { that opens the entry itself must not be counted below
    $lines[0] =~ s/[{]/ /;
    foreach $ln (@lines) {
        # remove and count curly brackets
        $opencurley = ($ln =~ s/[{]//g);
        $closecurley = ($ln =~ s/[}]//g);
        if ($opencurley-$closecurley < 0) {
            $mode = 1;
        } elsif ($opencurley-$closecurley > 0) {
            $mode = 0;
        } else {
            # remove umlauts so quote handling works,
            # and then change modes if required.
            $ln =~ s/\\"//g;
            $mode = !$mode if (($ln =~ tr/"/"/) % 2 == 1);
        };
        $search .= $ln;
        $search .= "\n" if ($mode);
    };
    $search =~ s/[ \t]+/ /g;
    return $search;
}

open (INF, "<$file") || die ("cannot open bibfile $file");
while (<INF>) {
    # print "line ", $i++, " state=$state: $_\n";
    if ($state == 1) {
        if (/^[ \t]*@(\w+)/) {    # beginning of entry
            # copy $1 before lowercasing it, so that $key holds the
            # entry type itself (case insensitive keywords)
            ($key = $1) =~ tr/A-Z/a-z/;
            if (! defined($badkeys{$key})) {
                $state = 2;
                $bibentry = $_;
            };
        };
    } elsif ($state == 2) {
        $bibentry .= $_;
        if (/^[ \t]*}/) {    # ending
            $searchentry = &printtosearch($bibentry);
            print "$bibentry\n" if ($searchentry =~ /$pattern/i);
            $state = 1;
        }
    } else {
        die ("state problem, $state\n");
    };
}
SHAR_EOF
chmod +x 'lookbibtex'
fi # end of overwriting check
#    End of shell archive
exit 0
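
The line-joining heuristic described in the comments above printtosearch can be tried on its own. What follows is only a sketch, not part of the archive: the name flatten_entry and the sample entry are made up for illustration, and it assumes a Perl 5 interpreter (my, use strict, and the /m modifier standing in for $* = 1), none of which the posted script uses.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-alone version of the printtosearch heuristic:
# put each bibtex field on one line, drop {}'s, collapse whitespace.
sub flatten_entry {
    my ($entry) = @_;
    my @lines = split /\n/, $entry;
    # The { that opens the entry itself is not a field brace.
    $lines[0] =~ s/[{]/ / if @lines;

    my ($search, $complete) = ('', 1);   # $complete: the current field has ended
    for my $ln (@lines) {
        my $open  = ($ln =~ tr/{//d);    # count and delete {
        my $close = ($ln =~ tr/}//d);    # count and delete }
        if ($close > $open) {
            $complete = 1;               # more }'s than {'s: field ends here
        } elsif ($open > $close) {
            $complete = 0;               # more {'s than }'s: field continues
        } else {
            $ln =~ s/\\"//g;             # drop umlaut accents (\") first
            # an odd number of "'s toggles multi-line mode
            $complete = !$complete if ($ln =~ tr/"//) % 2 == 1;
        }
        $search .= $ln;
        $search .= "\n" if $complete;
    }
    $search =~ s/[ \t]+/ /g;
    return $search;
}

# Made-up sample entry with a quoted and a braced multi-line field.
my $sample = <<'END';
@article{kafka90,
  author = "Franz
            Kafka",
  title = {Meta keys and
           {E}macs},
  note = "G\"odel"
}
END

my $flat = flatten_entry($sample);
print $flat;

# A -k style search: the field name is anchored at the start of a line.
print "author matches\n" if $flat =~ /^\s*author\s*=.*kafka/im;
```

After flattening, the author and title fields each sit on a single line, which is what lets a pattern of the form ^\s*keyword\s*=.*regexp find text that was originally split across lines.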