[comp.lang.perl] bibtex keyword search program

johnh@nottingham.cs.ucla.edu (John Heidemann) (11/20/90)

Following is a Perl script which allows keyword searches
of bibtex databases, showing the entire database entry
when a match is found.

A man page and installation instructions are included.
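
The installation step described in the bundled README (pointing the
#! line of lookbibtex at the local Perl) might be scripted roughly as
below.  This is only a sketch: the function name set_perl_path is made
up for illustration and is not part of the archive, and it assumes a
POSIX sh with sed and mv available.

```shell
# Hypothetical helper, not part of the archive: rewrite the first line
# of a script so that it names the given interpreter.
# usage: set_perl_path SCRIPT /path/to/perl
set_perl_path() {
	script=$1
	perl=$2
	tmp="$script.tmp.$$"
	# Replace line 1 only; "|" is the sed delimiter because paths contain "/".
	sed "1s|^#!.*|#!$perl|" "$script" > "$tmp" || return 1
	# mv drops the old file's mode, so restore the execute bit afterwards.
	mv "$tmp" "$script" && chmod +x "$script"
}
```

For instance, "set_perl_path lookbibtex /usr/bin/perl" (the path is
just an example) after unpacking.  A first line that does not begin
with #! is left untouched.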


   -John Heidemann


---------------+--------------------------------------------------------------
John Heidemann | What, your editor plays Tetris with its built-in LISP?  When
UCLA           | _I_ was a boy, we played by tossing tape write rings about...
---------------+--------------------------------------------------------------

----- cut here -----
#! /bin/sh
# This is a shell archive, meaning:
# 1. Remove everything above the #! /bin/sh line.
# 2. Save the resulting text in a file.
# 3. Execute the file with /bin/sh (not csh) to create the files:
#	README
#	lookbibtex.1
#	lookbibtex
# This archive created: Mon Nov 19 20:26:27 1990
export PATH; PATH=/bin:$PATH
if test -f 'README'
then
	echo shar: will not over-write existing file "'README'"
else
cat << \SHAR_EOF > 'README'

To install lookbibtex, change the #! line of lookbibtex to
the path for Perl.  You may also wish to change
the default database; if so, edit the $defaultfile variable.

Lookbibtex is released under the GNU General Public License, Version 1 (Feb 89).
A copy of the GPL should be included with your Perl distribution.

Any comments are welcome.

   -John Heidemann
    <johnh@cs.ucla.edu>



 
SHAR_EOF
fi # end of overwriting check
if test -f 'lookbibtex.1'
then
	echo shar: will not over-write existing file "'lookbibtex.1'"
else
cat << \SHAR_EOF > 'lookbibtex.1'
.\" lookbibtex.1
.TH LOOKBIBTEX 1 "19 November 1990"

.SH NAME
lookbibtex \- find references in a bibtex database

.SH SYNOPSIS
.B lookbibtex
[ -k
.I keyword
]
[
.I bibfile.bib
]
.I regexp

.SH DESCRIPTION
lookbibtex searches through a bibtex 
.I bibfile.bib
database, printing entries that match
.I regexp.
See 
.BR bibtex (1)
for a description of the bibtex database.
.I Regexp
is a Perl regular expression.
See
.BR perl (1)
for an explanation of differences between perl and standard regular
expressions.

Searches can be limited to particular bibtex fields
with the -k option.

To do "and" searches on two fields use shell pipes, reading
the output of one search as the bibliography of the second.
For example, to find what someone named Kafka wrote
about emacs keyboard layout, do:
.IP
.B
lookbibtex -k author kafka | lookbibtex -k title - meta
.LP

To do "or" searches, use regular expressions.
For example, you could be concerned with
only careful authors, like Knuth and Kafka:
.IP
.B
lookbibtex -k author 'knuth|kafka'
.LP

More sophisticated searches can be achieved by combining
these techniques.

.SH AUTHOR
John Heidemann <johnh@cs.ucla.edu>

.SH SEE ALSO
.BR bibtex (1),
.BR perl (1)

.SH BUGS
lookbibtex is written in Perl, so it will not run on machines
which do not have perl installed (although this is arguably
a bug of the person too lazy to install such a useful tool).

When several fields share one line, only the first can be matched with -k.

The @ that begins a bibtex entry and the } that ends it
must each be in the first column of their line, or they
will be missed.

SHAR_EOF
fi # end of overwriting check
if test -f 'lookbibtex'
then
	echo shar: will not over-write existing file "'lookbibtex'"
else
cat << \SHAR_EOF > 'lookbibtex'
#!/usr/local/bin/perl

#
# Look into a bib file.
# Copyright (C) 1990 by John Heidemann
# This is distributed under the GNU General Public License, Version 1 (Feb 89).
# See the Perl documentation for a copy of that license.
#
#  4-Oct-90 it is hacked together.
# 19-Nov-90 Now it remembers "'s and joins such lines.
#	It also removes nasty characters like {} from the search string.
#
# This program relies on the convention that the closing } of a 
# bib entry is the only } in the left-most column,
# and that the opening @ is also in the first column.
#



$* = 1;   # make searches on vars with imbedded newlines work

#
# customize this to whatever is right locally
#
$defaultfile = "/u/s9/u/ficus/DOC/ficus.bib";
$badkeys = "string";    # keys to ignore


#
# do argument processing
#

if ($#ARGV >= 1 && $ARGV[0] eq "-k") {
	$keyword = $ARGV[1];
	shift (@ARGV);
	shift (@ARGV);
};

if ($#ARGV == 0) {
	$file = $defaultfile;
	$pattern = $ARGV[0];
} elsif ($#ARGV == 1) {
	$file = $ARGV[0];
	$pattern = $ARGV[1];
} else {
	die ("Usage: lookbibtex [-k keyword] [bibfile.bib] regexp\n" .
		"   Keyword restricts the regexp search to that bibtex " .
			"field name (author, etc.)\n" .
		"   Default bibfile is $defaultfile, - indicates stdin.\n" .
		"   Regexp is a Perl regexp.\n");
};

#
# handle the keyword by modifying the pattern
#
if (defined($keyword)) {
	$pattern = "^\\s*${keyword}\\s*=.*${pattern}";
#	print "pattern is $pattern\n";
};



#
# looking for beginning of bib entry is state 1, in bib is state 2
#
$state = 1;


#
# Certain keys we really want to ignore because
# they're not bib entries.  They're listed here.
#
@badkeys = split(/,/, $badkeys);
foreach $i (@badkeys) {
	$badkeys{$i} = "bad";   # just make them defined
};



#
# To do searches right, we have to make everything
# for a field on one line.
#    This routine does that, and also gets rid of {}'s
# which tend to interfere with searches.
#
sub printtosearch {
	local ($print) = @_;
	local ($search, $mode) = ("", 1);

	@lines = split(/\n/, $print);
	$lines[0] =~ s/{/ { /;   # separate the entry type from its citation key
	foreach $ln (@lines) {
		$ln =~ s/[{}]//g;   # remove curly brackets
		$mode = !$mode  if (($ln =~ tr/"/"/) % 2 == 1);
		$search .= $ln;
		$search .= "\n"  if ($mode);
	};
	return $search;
}




open (INF, "<$file") || die ("cannot open bibfile $file: $!\n");

while (<INF>) {
#	print "line ", $i++, " state=$state: $_\n";

	if ($state == 1) {
		if (/^@(\w+)/) {   # beginning of entry
			($key = $1) =~ tr/A-Z/a-z/;   # lowercase copy: case insensitive keywords
			if (! defined($badkeys{$key})) {
				$state = 2;
				$bibentry = $_;
			};
		};
	} elsif ($state == 2) {
		$bibentry .= $_;
		if (/^}/) {   # ending
			$searchentry = &printtosearch($bibentry);
			print "$bibentry\n"  if ($searchentry =~ /$pattern/i);
			$state = 1;
		}
	} else {
		die ("state problem, $state\n");
	};
}


SHAR_EOF
chmod +x 'lookbibtex'
fi # end of overwriting check
#	End of shell archive
exit 0
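
A note for anyone whose entries are not being found: as the comment at
the top of the script says, it relies on each entry's opening @ and
closing } sitting in the first column.  A rough way to spot lines that
may break this convention is sketched below.  The function name
check_bib_columns is invented here and is not part of the archive, and
an indented } that merely closes a multi-line field is reported too,
so treat the output as a hint rather than a verdict.

```shell
# Hypothetical helper, not part of the archive: list lines whose first
# non-blank character is @ or } but which do not start in column one.
# usage: check_bib_columns FILE.bib
check_bib_columns() {
	# -n prefixes each reported line with its line number.
	grep -n '^[[:space:]][[:space:]]*[@}]' "$1"
}
```

As with grep itself, the exit status is 0 when suspicious lines were
printed and 1 when none were found.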