[alt.sources] man page cross-referencer

tchrist@convexe.uucp (Tom Christiansen) (10/27/89)

Ok, enough toy programs.  Here's one that's actually pretty useful, at
least if you're into vendor bashing. :-)  (Except for my employer, whom
I've already notified.)  It checks in the man pages for SEE ALSO sections
that reference other man pages, and makes sure that the references are
valid.  For example, here's the start of the gripes it gets off the Tahoe
tape.  The list goes on for a *long* time.

    addbib.1: indxbib(1) missing
    ar.1: arcv(8) missing
    biff.1: comsat(8c) really in comsat(8)
    binmail.1: Mail(1) missing
    binmail.1: uucp(1c) really in uucp(1)
    binmail.1: uux(1c) really in uux(1)
    chgrp.1: chown(1) missing
    cp.1: rcp(1c) really in rcp(1)
    csh.1: setrlimit(2) missing
    dbx.1: mod(l) missing
    f77.1: intro(3f) really in intro(3)
    finger.1: chfn(1) missing
    fp.1: lisp(s) missing
    fp.1: liszt(s) missing
    graph.1: spline(1g) really in spline(1)
    graph.1: plot(1g) really in plot(1)
    hostid.1: sethostid(2) missing
    hostname.1: sethostname(2) missing
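
To make the check concrete, here is a minimal sketch in current Perl (not the perl2/perl3 dialect of the script itself) of how one SEE ALSO line can be turned into candidate file names.  The helper names parse_refs and ref_path are made up for illustration and do not appear in cfman; the regular expressions mirror the ones the script uses.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of cfman): split one SEE ALSO line into
# [name, extension] pairs, e.g. "uucp(1C)" becomes ["uucp", "1c"].
sub parse_refs {
    my ($line) = @_;
    $line =~ s/\\f[RIPB]//g;    # drop troff font changes such as \fI
    $line =~ s/\\.//g;          # drop other two-character troff escapes
    my @refs;
    foreach my $entry (split /,/, $line) {
        next unless $entry =~ /([A-Za-z][A-Za-z0-9._\-]*)\s*\(([^)]+)\)/;
        my ($name, $ext) = ($1, lc $2);
        next if $ext =~ / /;    # "(see below)" is prose, not a section
        push @refs, [$name, $ext];
    }
    return @refs;
}

# Hypothetical helper: the file a reference should live in under one
# man tree.  The section directory comes from the first character of
# the extension, so "1c" lives in man1.
sub ref_path {
    my ($mandir, $name, $ext) = @_;
    my $sect = substr($ext, 0, 1);
    return "$mandir/man$sect/$name.$ext";
}

foreach my $r (parse_refs('\fIuucp\fR(1C), uux(1C), comsat(8C)')) {
    print ref_path('/usr/man', @$r), "\n";
}
```

If the file named by ref_path does not exist, the script goes on to look for the page under the bare section number ("really in") before giving up and reporting it missing.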


If there are any bugs to be worked out, let's do so, then I'll forward
this to one of the moderated groups.  It may revolutionize the internal
consistency of UNIX man pages. :-)

The usage message is thus:
    usage: cfman [debug=1] [manpath=path] [sections]

If you don't set manpath on the command line, you get $MANPATH
(colon-delimited) if it is set in your environment, otherwise "/usr/man".
The debug flag lets you see what it's doing and when.  Enjoy.  This works
with both perl2 and perl3.  (Forgive me the indentation.)
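
The fallback order just described can be sketched as follows, again in current Perl rather than perl2/perl3.  The helpers parse_args and man_dirs are hypothetical names, not taken from the script, and splitting a command-line manpath on colons is an assumption carried over from how $MANPATH is treated.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: separate name=value settings from section arguments.
sub parse_args {
    my @args = @_;
    my (%opt, @sections);
    foreach my $arg (@args) {
        if ($arg =~ /^(debug|manpath)=(.*)$/) {
            $opt{$1} = $2;
        } else {
            push @sections, $arg;
        }
    }
    return (\%opt, \@sections);
}

# Hypothetical helper: command-line manpath first, then $MANPATH,
# then "/usr/man".  Assumes both path forms are colon-delimited.
sub man_dirs {
    my ($opt, $env) = @_;
    my $path = $opt->{manpath};
    $path = $env->{MANPATH} unless defined $path && length $path;
    $path = '/usr/man'      unless defined $path && length $path;
    return split /:/, $path;
}

my ($opt, $sections) = parse_args(@ARGV);
print "scanning $_\n" foreach man_dirs($opt, \%ENV);
```

Run with something like "manpath=/usr/man:/usr/local/man 1 8", this would scan both trees and leave 1 and 8 as the sections to check.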

--tom

#!/usr/local/bin/perl
#
# cfman: man page cross-referencer
# author: Tom Christiansen <tchrist@convex.com>


line:       while (<page>) {
		next line unless /^\.SH\s+"*SEE ALSO"*/;
xref:           while (<page>) {
		    last line if /^\.SH/;	# next section header ends SEE ALSO
		    next xref unless /\(/;
		    next xref if /^\.PP/;
		    chop;
		    s/\\f[RIPB]//g;
		    s/\\.//g;
entry:              foreach $entry ( split(/,/) ) {
			next entry unless $entry =~ /\(.*\)/;
			$pname = ''; $pext = '';
			# set these only on a successful match, so a failed
			# match can't leave stale values behind
			if ($entry =~ /([A-Za-z][A-Za-z0-9._\-]*)\s*\(([^)]+)\).*$/) {
			    ($pname, $pext)=($1,$2);
			}
			next entry if !$pname || !$pext || $pext =~ / /;
			$pext =~ y/A-Z/a-z/;
			($psect = $pext) =~ s/^(.).*/$1/;
			$fullpath = "$mandir/man$psect/$pname.$pext";
			next entry if -e $fullpath;
			if ($page =~ /^\w*\.l$/ && $psect =~ /[18]/) {
			    $fullpath = $mandir ."/manl/" . $pname . ".l";
			    next entry if -e $fullpath; # hack for manl idiocy
			}
			printf "%s: %s(%s)", $page, $pname, $pext;
			$fullpath = "$mandir/man$psect/$pname.$psect";
			if (-e $fullpath) {
			    $psect =~ s/[^.]*\.//;
			    printf " really in %s(%s)\n", $pname, $psect;
			    do flush();
			    next entry;
			}
			if ( $psect eq '8' ) {	# string compare: $psect may be a letter
			    $psect = 1;
			    $fullpath = $mandir . "/man1/" . $pname . ".1";
			    if ( -e $fullpath ) {   
				$psect =~ s/[^.]*\.//;
				printf " really in %s(%s)\n", $pname, $psect;
				do flush();
				next entry;
			    }
			}
			$fullpath = "$mandir/man$psect/$pname.$psect";
			if ( (@name = <${fullpath}*>)  && $name[$[] !~ /\*$/ ) {
			    $name = $name[$[]; # so .3c has precedence over .3f
			    ( $psect = $name ) =~ s/^.*\.([^.]*)$/$1/;
			    printf " really in %s(%s)\n", $pname, $psect;
			    do flush();
			    next entry;
			}
			printf " missing\n";
			do flush();
		    }
		}
	    }
	    close page;
	}
    }
}

sub flush {
    $| = 1; print ''; $| = 0;
}



    Tom Christiansen                       {uunet,uiucdcs,sun}!convex!tchrist 
    Convex Computer Corporation                            tchrist@convex.COM
		 "EMACS belongs in <sys/errno.h>: Editor too big!"