[net.bugs.4bsd] Bug in Refer

solomon@crystal.ARPA (05/30/84)

The following bug concerns the version of refer delivered with 4.2bsd,
but probably applies to lots of versions of UNIX.  I can't understand
why nobody fixed it before.  By the way, tracking down the bug was made
more difficult by bugs in the debugging code included in the sources
(defining D1 gives lots of useful tracing, but causes a core dump in gch(),
and defining TF almost, but not quite, prevents the temp file from being
unlinked), and reporting it was made more difficult by the fact that the
default references in /usr/dict/papers are nearly useless, since many
of the references there have duplicate copies, thus making it impossible
to refer to them unambiguously.

Subject: refer with the -e and -s flags doesn't work

Index:	usr.bin/refer/refer2.c and usr.bin/refer/refer5.c 4.2BSD

Description:
	When refer is called with the -s flag (sort references) and the -e flag
	(delay the bibliography to the end) and the document contains multiple
	references to the same document, the bibliography may not be completely
	sorted, and some of the citations in the text may be listed as [0].
Repeat-By:
	refer -e -s test
	Where "test" is the following file:
		Here are some references.
		to yacc
		.[
		yacc
		.]
		a forward
		.[
		foreword
		.]
		another yacc
		.[
		yacc
		.]
		and a preface to unix
		.[
		preface
		.]


		references:
		.[
		$LIST$
		.]
Fix:
	On the first pass through the document, when a citation is processed, a
	"signal" (such as [1]) is written to the output file and a line consisting
	of a sort key followed by a copy of the cited reference is written to a temp
	file.  After the first pass, the temp file is sorted and re-read (to
	establish a mapping from old to new reference numbers) and a second pass
	through the document fixes the signals.  If a duplicate citation is
	discovered, the cited reference is not written to the temp file, but
	the sort key is written, thus screwing up the format of the temp file.
	Unfortunately, the same procedure (putsig()) is used to transfer
	the signal to the output file (which must be done on every citation)
	and to write the sort key to the temp file (which must not be done
	for duplicates).  One fix is to add an argument to putsig() to indicate
	whether the sort key should be written.
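
	The two-pass renumbering described above can be sketched roughly as
	follows.  This is an illustration only, not the refer source: the
	names renumber(), struct rec and MAXREF are made up, an in-memory
	array stands in for the temp file, and the citation string itself is
	used as the sort key.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAXREF 64	/* hypothetical limit, for the sketch only */

/* One "temp file" record: a sort key plus the number given on pass one. */
struct rec {
	const char *key;
	int oldnum;
};

static int bykey(const void *a, const void *b)
{
	return strcmp(((const struct rec *)a)->key,
		      ((const struct rec *)b)->key);
}

/*
 * cites[i] is the sort key of the i-th citation in the text; sig[i]
 * receives the number finally printed for it.  Returns the number of
 * distinct references.
 */
int renumber(const char *cites[], int ncites, int sig[])
{
	struct rec tmp[MAXREF];
	int newnum[MAXREF + 1];
	int nrec = 0, i, j;

	assert(ncites <= MAXREF);

	/* Pass one: provisional numbers in order of first appearance. */
	for (i = 0; i < ncites; i++) {
		for (j = 0; j < nrec; j++)
			if (strcmp(tmp[j].key, cites[i]) == 0)
				break;
		if (j == nrec) {	/* new reference: exactly one record */
			tmp[nrec].key = cites[i];
			tmp[nrec].oldnum = nrec + 1;
			nrec++;
		}
		sig[i] = tmp[j].oldnum;	/* every citation gets a signal */
	}

	/* Sort the records, then re-read them to map old numbers to new. */
	qsort(tmp, nrec, sizeof tmp[0], bykey);
	for (j = 0; j < nrec; j++)
		newnum[tmp[j].oldnum] = j + 1;

	/* Pass two: fix up the signals. */
	for (i = 0; i < ncites; i++)
		sig[i] = newnum[sig[i]];
	return nrec;
}
```

	With the four citations yacc, foreword, yacc, preface, this sketch
	yields the signals 3, 1, 3, 2.  The point is that a duplicate adds no
	record of its own; a stray extra line for the second yacc would throw
	off the old-to-new mapping.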

	The following fix has not been exhaustively tested for all combinations
	of options, but at least it works both with and without the -l flag.

*** refer2.c.old	Tue May 29 15:48:58 1984
--- refer2.c	Tue May 29 15:52:29 1984
***************
*** 112,118
  						nf = tabs(flds, one);
  						nf += tabs(flds+nf, dbuff);
  						assert(nf < NFLD);
! 						putsig(nf,flds,nr,line1,line);
  					}
  					return;
  				}

--- 112,118 -----
  						nf = tabs(flds, one);
  						nf += tabs(flds+nf, dbuff);
  						assert(nf < NFLD);
! 						putsig(nf,flds,nr,line1,line,0);
  					}
  					return;
  				}
***************
*** 129,135
  	if (sort)
  		putkey(nf, flds, refnum, keystr);
  	if (bare < 2)
! 		putsig(nf, flds, refnum, line1, line);
  	else
  		flout();
  	putref(nf, flds);

--- 129,135 -----
  	if (sort)
  		putkey(nf, flds, refnum, keystr);
  	if (bare < 2)
! 		putsig(nf, flds, refnum, line1, line, 1);
  	else
  		flout();
  	putref(nf, flds);

*** refer5.c.old	Tue May 29 15:49:17 1984

solomon@crystal.ARPA (06/05/84)

I've had some reports that my bug report was truncated somewhere not
too far downstream from here.  I've repeated it below.  The last line
in this note should be 'LAST LINE'.
-------------
The following bug concerns the version of refer delivered with 4.2bsd,
but probably applies to lots of versions of UNIX.  I can't understand
why nobody fixed it before.  By the way, tracking down the bug was made
more difficult by bugs in the debugging code included in the sources
(defining D1 gives lots of useful tracing, but causes a core dump in gch(),
and defining TF almost, but not quite, prevents the temp file from being
unlinked), and reporting it was made more difficult by the fact that the
default references in /usr/dict/papers are nearly useless, since many
of the references there have duplicate copies, thus making it impossible
to refer to them unambiguously.

Subject: refer with the -e and -s flags doesn't work

Index:	usr.bin/refer/refer2.c and usr.bin/refer/refer5.c 4.2BSD

Description:
	When refer is called with the -s flag (sort references) and the -e flag
	(delay the bibliography to the end) and the document contains multiple
	references to the same document, the bibliography may not be completely
	sorted, and some of the citations in the text may be listed as [0].
Repeat-By:
	refer -e -s test
	Where "test" is the following file:
		Here are some references.
		to yacc
		.[
		yacc
		.]
		a forward
		.[
		foreword
		.]
		another yacc
		.[
		yacc
		.]
		and a preface to unix
		.[
		preface
		.]


		references:
		.[
		$LIST$
		.]
Fix:
	On the first pass through the document, when a citation is processed, a
	"signal" (such as [1]) is written to the output file and a line consisting
	of a sort key followed by a copy of the cited reference is written to a temp
	file.  After the first pass, the temp file is sorted and re-read (to
	establish a mapping from old to new reference numbers) and a second pass
	through the document fixes the signals.  If a duplicate citation is
	discovered, the cited reference is not written to the temp file, but
	the sort key is written, thus screwing up the format of the temp file.
	Unfortunately, the same procedure (putsig()) is used to transfer
	the signal to the output file (which must be done on every citation)
	and to write the sort key to the temp file (which must not be done
	for duplicates).  One fix is to add an argument to putsig() to indicate
	whether the sort key should be written.

	The following fix has not been exhaustively tested for all combinations
	of options, but at least it works both with and without the -l flag.
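
	The idea of the extra argument, stripped of everything else putsig()
	does, is roughly the following.  This is a simplified sketch and not
	the actual code: emit_citation() and its parameters are hypothetical
	names, and the line formats are invented for illustration.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for putsig(): the signal always goes to the
 * output stream, but the index line is written only when toindex is
 * nonzero.  Returns the number of index lines written (0 or 1).
 */
int emit_citation(FILE *out, FILE *index, int refnum,
		  const char *key, int toindex)
{
	assert(out != NULL && index != NULL);

	fprintf(out, "[%d]\n", refnum);	/* needed on every citation */
	if (!toindex)
		return 0;	/* duplicate: leave the temp file alone */
	fprintf(index, "%s\t%d\n", key, refnum);
	return 1;
}
```

	The caller that handles a duplicate citation passes 0 and the caller
	that handles a new reference passes 1, which is the same split the
	two call sites in refer2.c make in the diff that follows.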

*** refer2.c.old	Tue May 29 15:48:58 1984
--- refer2.c	Tue May 29 15:52:29 1984
***************
*** 112,118
  						nf = tabs(flds, one);
  						nf += tabs(flds+nf, dbuff);
  						assert(nf < NFLD);
! 						putsig(nf,flds,nr,line1,line);
  					}
  					return;
  				}

--- 112,118 -----
  						nf = tabs(flds, one);
  						nf += tabs(flds+nf, dbuff);
  						assert(nf < NFLD);
! 						putsig(nf,flds,nr,line1,line,0);
  					}
  					return;
  				}
***************
*** 129,135
  	if (sort)
  		putkey(nf, flds, refnum, keystr);
  	if (bare < 2)
! 		putsig(nf, flds, refnum, line1, line);
  	else
  		flout();
  	putref(nf, flds);

--- 129,135 -----
  	if (sort)
  		putkey(nf, flds, refnum, keystr);
  	if (bare < 2)
! 		putsig(nf, flds, refnum, line1, line, 1);
  	else
  		flout();
  	putref(nf, flds);

*** refer5.c.old	Tue May 29 15:49:17 1984
--- refer5.c	Tue May 29 15:52:37 1984
***************
*** 15,21
  static char stbuff[50];
  static int  prevsig;
  
! putsig (nf, flds, nref, nstline, endline)	/* choose signal style */
  char *flds[], *nstline, *endline;
  {
  	char t[100], t1[100], t2[100], format[10], *sd, *stline;

--- 15,21 -----
  static char stbuff[50];
  static int  prevsig;
  
! putsig (nf, flds, nref, nstline, endline, toindex)	/* choose signal style */
  char *flds[], *nstline, *endline;
  {
  	char t[100], t1[100], t2[100], format[10], *sd, *stline;
***************
*** 138,144
  		}
  	}
  	if (bare < 2)
! 		if (nf > 0)
  			fprintf(fo,".ds [F %s%c",t,sep);
  	if (bare > 0)
  		flout();

--- 138,144 -----
  		}
  	}
  	if (bare < 2)
! 		if (nf > 0 && toindex)
  			fprintf(fo,".ds [F %s%c",t,sep);
  	if (bare > 0)
  		flout();
-- 
	Marvin Solomon
	Computer Sciences Department
	University of Wisconsin, Madison WI
	solomon@uwisc
	...{ihnp4,seismo,ucbvax}!uwvax!solomon
LAST LINE

abc@brl-tgr.ARPA (Brint Cooper ) (09/22/85)

The version of "lookbib" which we run at BRL will not handle
surnames of 2 characters.  Thus, the user cannot find
references in the database to authors with names like
Wu, Nu, Yu, Ng, etc.

Is this a known bug?

Can it be fixed?

If it helps, the following lines appear at the top of most of the
source files:

#ifndef lint
static char *sccsid = "@(#)what4.c	4.1 (Berkeley) 5/6/83";
#endif

Thanks for any help.

Brint
	 ARPA:  abc@brl.arpa
	 UUCP:  ...{decvax,cbosgd}!brl-bmd!abc

Offc:    301 278-6883    AV:  298-6883     FTS: 939-6883