[comp.text.tex] lookbibtex, version 1.1

johnh@nottingham.cs.ucla.edu (John Heidemann) (11/29/90)
Following is version 1.1 of lookbibtex,
a Perl script which allows keyword searches
of bibtex databases, showing the entire database entry
when a match is found.

This version corrects a few bugs in the original
version.  Multi-line fields with curly braces
are now supported, and umlauts are correctly handled.
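
To unpack the archive below by hand, the steps can be sketched
roughly as follows.  This is only an illustration: article.txt is
a made-up name for the saved posting, and a tiny stand-in article
is built first so the commands can be tried safely.

```shell
# Sketch only: article.txt is a hypothetical name for the saved posting.
# A tiny stand-in article is built here so the example runs on its own.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' 'Subject: demo' 'some header text' \
    '#! /bin/sh' 'echo unpacked > demo.out' > article.txt

# Step 1: keep the shell line and everything after it, dropping the header.
sed -n '/^#! \/bin\/sh/,$p' article.txt > article.shar

# Step 3: run the archive with /bin/sh (not csh) to create the files.
sh article.shar
cat demo.out
```

With the real posting, the same sed and sh commands would be run
on the saved article instead of the stand-in.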

   - John Heidemann


---------------+--------------------------------------------------------------
John Heidemann | Fortune of the week:  Any sufficiently advanced technology
UCLA CSD       |    is indistinguishable from a rigged demo.
---------------+--------------------------------------------------------------

#! /bin/sh
# This is a shell archive, meaning:
# 1. Remove everything above the #! /bin/sh line.
# 2. Save the resulting text in a file.
# 3. Execute the file with /bin/sh (not csh) to create the files:
#	README
#	lookbibtex.1
#	lookbibtex
# This archive created: Wed Nov 28 17:44:04 1990
export PATH; PATH=/bin:$PATH
if test -f 'README'
then
	echo shar: will not over-write existing file "'README'"
else
cat << \SHAR_EOF > 'README'
lookbibtex 1.1
--------------

To install lookbibtex, change the #! line of lookbibtex to
the path for Perl.  You may also wish to change
the default database; if so, edit the $defaultfile variable.

Lookbibtex is released under the GNU General Public License, Version 1 (Feb 89).
A copy of the GPL should be included with your Perl distribution.

This is version 1.1; it fixes three problems in release 1.0.  Closing
right curly brackets not in the first column were missed in 1.0.
Umlauts and multi-line fields with curly brackets are now handled
correctly.

Perl is a very practical language from Larry Wall.  It may be obtained by
anonymous ftp from jpl-devvax.jpl.nasa.gov in pub/perl.3.0, or by
e-mail to Larry Wall <lwall@jpl-devvax.jpl.nasa.gov>.

Any comments are welcome.

   -John Heidemann
    <johnh@cs.ucla.edu>
    28 November 1990




 
SHAR_EOF
fi # end of overwriting check
if test -f 'lookbibtex.1'
then
	echo shar: will not over-write existing file "'lookbibtex.1'"
else
cat << \SHAR_EOF > 'lookbibtex.1'
.\" lookbibtex.1
.TH LOOKBIBTEX 1 "19 November 1990"

.SH NAME
lookbibtex \- find references in a bibtex database

.SH SYNOPSIS
.B lookbibtex
[ -k
.I keyword
]
[
.I bibfile.bib
]
.I regexp

.SH DESCRIPTION
lookbibtex searches through a bibtex 
.I bibfile.bib
database, printing entries that match
.I regexp.
See 
.BR bibtex (1)
for a description of the bibtex database.
.I Regexp
is a Perl regular expression.
See
.BR perl (1)
for an explanation of differences between Perl and standard regular
expressions.

Searches can be limited to particular bibtex fields
with the -k option.

To do "and" searches on two fields use shell pipes, reading
the output of one search as the bibliography of the second.
For example, to find what someone named Kafka wrote
about emacs keyboard layout, do:
.IP
.B
lookbibtex -k author kafka | lookbibtex -k title - meta
.LP

To do "or" searches, use regular expressions.
For example, you could be concerned with
only careful authors, like Knuth and Kafka:
.IP
.B
lookbibtex -k author 'knuth|kafka'
.LP

More sophisticated searches can be achieved by combining
these techniques.

.SH AUTHOR
John Heidemann <johnh@cs.ucla.edu>

.SH SEE ALSO
.BR bibtex (1),
.BR perl (1)

.SH BUGS
lookbibtex is written in Perl, so it will not run on machines
which do not have Perl installed (although this is arguably
a bug of the person too lazy to install such a useful tool).

Multiple fields on one line will be missed: with -k, only a field
that begins its line is matched.

This program does not implement a full bibtex parser,
just a good approximation.  Because of this, it may
fail on unusual bibtex files.  In particular,
the @ that begins a bibtex entry and the } that ends it
must each be the first non-whitespace character on their line, or they
will be missed.
Also, multi-line fields surrounded with double quotes or
curly brackets
are handled correctly,
but ugly combinations of quotes and brackets and backslashed
versions of the same may fail.
Umlaut accents are removed from the text that is searched,
so the regexp should normally omit them.

I make no predictions as to how this program will react 
to bibtex files with syntax errors.


SHAR_EOF
fi # end of overwriting check
if test -f 'lookbibtex'
then
	echo shar: will not over-write existing file "'lookbibtex'"
else
cat << \SHAR_EOF > 'lookbibtex'
#!/usr/local/bin/perl

#
# lookbibtex 1.1
# Look in to a bib file.
#
# Copyright (C) 1990 by John Heidemann
# This is distributed under the GNU General Public License, Version 1 (Feb 89).
# See the Perl documentation for a copy of that license.
#
#  4-Oct-90 it is hacked together.
# 19-Nov-90 Now it remembers "'s and joins such lines.
#	It also removes nasty characters like {} from the search string.
# 20-Nov-90 Umlaut accents handled correctly.
# 28-Nov-90 A simple heuristic to handle multi-line fields with {}'s is added.
#       In addition, we compress all whitespace to single spaces in the
#	searched version.
#
# This program relies on the convention that the closing } of a 
# bib entry is the only } in the first non-whitespace column,
# and that the opening @ is also there.
#



$* = 1;   # make searches on vars with imbedded newlines work

#
# customize this to whatever is right locally
#
$defaultfile = "/u/s9/u/ficus/DOC/ficus.bib";
$badkeys = "string";    # keys to ignore


#
# do argument processing
#

if ($#ARGV >= 1 && $ARGV[0] eq "-k") {
	$keyword = $ARGV[1];
	shift (@ARGV);
	shift (@ARGV);
};

if ($#ARGV == 0) {
	$file = $defaultfile;
	$pattern = $ARGV[0];
} elsif ($#ARGV == 1) {
	$file = $ARGV[0];
	$pattern = $ARGV[1];
} else {
	die ("Usage: lookbibtex [-k keyword] [bibfile.bib] regexp\n" .
		"   Keyword restricts the regexp search to that bibtex " .
			"field name (author, etc.)\n" .
		"   Default bibfile is $defaultfile, - indicates stdin.\n" .
		"   Regexp is a Perl regexp.\n");
};

#
# handle the keyword by modifying the pattern
#
if (defined($keyword)) {
	$pattern = "^\\s*${keyword}\\s*=.*${pattern}";
#	print "pattern is $pattern\n";
};



#
# looking for beginning of bib entry is state 1, in bib is state 2
#
$state = 1;


#
# Certain keys we really want to ignore because
# they're not bib entries.  They're listed here.
#
@badkeys = split(/,/, $badkeys);
foreach $i (@badkeys) {
	$badkeys{$i} = "bad";   # just make them defined
};



#
# To do searches right, we have to make everything
# for a field on one line.
#    This routine does that, and also gets rid of {}'s
# which tend to get in the way for searches.  In the
# same vein, it collapses all whitespace to single spaces.
#
# To know when to join lines, we use two simple heuristics:
# if there is an odd number of "'s on a line, we must enter or exit
# multi-line mode.  If there are more {'s than }'s, we must enter,
# and if there are more }'s than {'s we must exit (the first { on
# the first line is ignored).
#

sub printtosearch {
	local ($print) = @_;
	local ($search, $mode) = ("", 1);
	local ($opencurley, $closecurley) = (0,0);

	@lines = split(/\n/, $print);
	$lines[0] =~ s/[{]/ /;   # drop the entry's opening { so it is not counted
	foreach $ln (@lines) {
		# remove and count curly brackets
		$opencurley = ($ln =~ s/[{]//g);
		$closecurley = ($ln =~ s/[}]//g);
		if ($opencurley-$closecurley < 0) {
			$mode = 1;
		} elsif ($opencurley-$closecurley > 0) {
			$mode = 0;
		} else {
			# remove umlauts so quote handling works,
			# and then change modes if required.
			$ln =~ s/\\"//g;
			$mode = !$mode  if (($ln =~ tr/"/"/) % 2 == 1);
		};
		$search .= $ln;
		$search .= "\n"  if ($mode);
	};
	$search =~ s/[ \t]+/ /g;
	return $search;
}




open (INF, "<$file") || die ("cannot open bibfile $file");

while (<INF>) {
#	print "line ", $i++, " state=$state: $_\n";

	if ($state == 1) {
		if (/^[ \t]*@(\w+)/) {   # beginning of entry
			($key = $1) =~ tr/A-Z/a-z/;   # case insensitive keywords
			if (! defined($badkeys{$key})) {
				$state = 2;
				$bibentry = $_;
			};
		};
	} elsif ($state == 2) {
		$bibentry .= $_;
		if (/^[ \t]*}/) {   # ending
			$searchentry = &printtosearch($bibentry);
			print "$bibentry\n"  if ($searchentry =~ /$pattern/i);
			$state = 1;
		}
	} else {
		die ("state problem, $state\n");
	};
}


SHAR_EOF
chmod +x 'lookbibtex'
fi # end of overwriting check
#	End of shell archive
exit 0