[alt.sources] cfman: man-page cross-referencer

tchrist@convexe.uucp (Tom Christiansen) (11/22/89)

Here's the latest incarnation of the cfman program that 
I sent out several weeks ago.  The original posting was
munged, and some people reported that the patch never made
it by them.

This version has been largely rewritten to be more 
maintainable and to fix many of the problems people 
reported to me.  It comes with a man page and requires
perl 3.0pl1 to run.
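
If you haven't unpacked one of these before, something along these
lines should do it.  This is only a sketch: unshar_article is a made-up
helper name, not anything shipped in the archive, and it assumes you
have saved the whole article to a file.  It throws away everything
above the #!/bin/sh line and feeds the rest to sh.

```shell
#!/bin/sh
# unshar_article FILE
#   Hypothetical helper, not part of the posting.
#   Skips the news header and covering note, then hands everything
#   from the first "#!/bin/sh" line onward to the shell, which runs
#   the archive and writes out the files it contains.
unshar_article() {
    sed -n '/^#!\/bin\/sh/,$p' "$1" | sh
}
```

Running it on the saved article (say, unshar_article cfman.art, where
the file name is just an example) should leave cfman.8l and cfman.pl
in the current directory.  Expect sh to grumble about the signature
lines at the bottom; trim them first if that bothers you.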

--tom

#!/bin/sh
#    This is a shell archive.
#    Run the following text with /bin/sh to extract.

echo x cfman.8l
sed -e 's/^X//' << \EOFMARK > cfman.8l
X.TH CFMAN 8L
X.de Sh
X.br
X.PP
X.ne 4
X.ti -.5i
X\fB\\$1\fR
X.PP
X..
X.de LB		\" little and bold
X.ft B
X.if !"\\$1"" \&\s-1\\$1\s+1 \\$2 \\$3 \\$4 \\$5 \\$6
X.ft R
X..
X.de Sp
X.if t .sp .5v
X.if n .sp
X..
X.ds lq \&"\"
X.ds rq \&"\"
X.if t \
X.       ds lq ``
X.if t \
X.       ds rq ''
X.de Q
X\*(lq\\$1\*(rq\\$2
X..
X.Sh NAME
Xcfman \- cross-reference man pages for internal consistency
X.Sh SYNOPSIS
X.B cfman
X[
X.B \-d
Xlevel
X] 
X[
X.B \-s
Xsections
X] 
X[
X.B \-p
Xmanpath
X] 
X[
X.B \-x
Xxrefpath
X] 
X[ pattern | pathname ] ...
X.br 
X.Sh DESCRIPTION
X.I 
XCfman 
Xis a 
X.I perl
Xprogram that checks that man page sources 
Xare mutually consistent in their 
X.LB "SEE ALSO"
Xreferences.
XIt will also report any 
X.LB ".TH"
Xline that claims the
Xman page is in a different place than 
X.I cfman
Xfound it.
X.PP
XWhen supplied with no arguments, 
X.I cfman
Xwill check all files (matching *.*) it finds in each man directory in 
Xyour colon-delimited 
X.LB "$MANPATH"
Xenvariable if set, or in 
X.I /usr/man
Xotherwise.  It first verifies that the 
X.LB ".TH"
Xsays
Xthe man page is really where it should be, e.g. if the
Xline is 
X.br
X.in +.5i
X.nf
X\f(TA
X\&.TH\ \ WIDGET\ \ 4
X.in -.5i
X\fR
X.fi
X.br
Xthen \fIwidget.4\fR should be the filename currently 
Xbeing examined.  All upper-case will map to all lower-case,
Xbut mixed case will be preserved for compatibility with
Xthe 
X.LB X11
Xman pages.
X.PP
X.I Cfman
Xthen skips ahead to the 
X.LB "SEE ALSO"
Xsection and retrieves
Xall comma-delimited entries of the general 
Xform \fIpagename(section)\fP.  It first looks in the file
X\&../man\fIsection/pagename.section\fP.  If this fails
Xand the current file ended in one of \fB[npl]\fP, but the
X.I section
Xreferenced is either 
X\fB1\fP or \fB8\fP, then it will check in the current file's own
X\fB[npl]\fP section.
XFailing this, 
X.I cfman
Xchecks to see whether the referenced man page has been 
Xinstalled stripped of its subsection, e.g. \fIuucp\fP(1c)
Xhas found its way into \fIuucp\fP(1).  It then checks
Xto see whether something in section \fB1\fP has been mis-installed
Xin section \fB8\fP, or vice versa, or whether something belonging
Xin either one has been mis-installed in section \fBl\fP.
XIf all else fails, 
X.I cfman
Xwill guess that a man page is referenced without its
Xproper subsection, as in a reference to \fIrcp(1)\fP
Xthat should really have been to \fIrcp(1c)\fP.  If it finds
Xthe misplaced man page, it reports where the reference
Xthought it was and where it really was.  Otherwise it
Xreports the man page as missing.  
X.PP
XThe 
X.LB $MANPATH
Xvariable may be overridden by 
Xthe \fB-p\fP option.  
XAll checks will 
Xbe performed across each subtree specified in the manpath
X(either from the environment or the command line),
Xunless altered with the \fB-x\fP option.  As a short-cut, 
Xthe \fIxrefpath\fP may have a leading colon to indicate
Xthat it is to be the concatenation of the \fImanpath\fP
Xand the supplied \fIxrefpath\fP.  
X.PP
XYou can restrict the sections checked with the \fB-s\fP
Xswitch.  By default, sections 1 through 8 will be examined.
XThe section may be a shell metacharacter expression, 
Xlike 
X.Q ?
Xor 
X.Q [18lpn] .
X.PP
XYou may restrict the individual man pages cross-referenced
Xby specifying which ones you're interested in on the command
Xline.  These may be full pathnames, simple names like 
X.Q tty ,
Xor a shell metacharacter expression like 
X.Q *net . 
XIf
Xno period occurs in the simple name, it is assumed to mean that
Xthe name may have any extension.  If you list specific 
Xman pages on the command line and 
X.I cfman
Xfinds none matching your specification, it will report this fact.
XSee the 
X.LB "EXAMPLES"
Xsection.  
X.PP
XMan pages that are linked by placing a \fB.so\fP directive
Xon the first line will be correctly followed, and no man page
Xwill be checked twice in the same subtree.  Very limited support for alternate 
Xman macros is provided: the 
X.I "\fIRand MH Message Handling System\fP" 's
Xman macro set is recognized, but few others.
X.LB .SH.
X.Sh DIAGNOSTICS
XRequires 
X.I perl 
Xto be at least version 3.0, patchlevel 1 to run.  The 
Xprogram will abort if you try to run it with an 
Xearlier version of \fIperl\fR.
X.PP
XFive different tracing levels can be specified with the \fB-d\fP
Xoption.  If any debugging is turned on, the walk through
Xthe different components of the manpath is traced.
XDebug values are numeric and additive, and are interpreted
Xthis way:
X.Sp
X.in +.5i
X.nf
X.ne 5
X	1	Trace each man page examined
X	2	Trace each cross reference examined
X	4	Trace each \s-1\fB.TH\s+1\fP check
X	8	Trace each file-existence test 
X	16	Trace each line
X'in -.5i
X.fi
X.Sp
XTracing information and other warnings are printed to 
X\fIstderr\fP, but normal messages about bad cross references
Xare printed to \fIstdout\fP as that is \fIcfman\fP's principal
Xtask. 
X.PP
XEmbedded 
X.I troff
Xstring macros starting \e*( cannot be resolved, and they
Xwill trigger a warning message if found in the 
X.LB .TH
Xor 
X.LB "SEE ALSO"
Xsections.
X.Sh EXAMPLES
X.nf
X\f(TA
Xcfman							# do all in $MANPATH
Xcfman -p /export/exec/sun3/share/man		# sun man pages
Xcfman -p $HOME/man:/usr/local/mh/man:/usr/local/man:/usr/man
Xcfman -p /usr/local/man	 -x :/usr/man	# xref also in /usr/man
Xcfman -s 18nlp					# only these sections
Xcfman '*tty*' fubar		    			# check for *tty*.* and fubar.*
Xcfman `pwd`/*.[1-8]		    		# just check these files
Xcfman -s 23 'sys*'					# sys*.* files in sections 2,3
Xcfman -s 1 -p /export/exec/sun3/share/man
X.fi
X\fR
X.PP
XThe last command produced this output on my machine:
X.nf
X\f(TA
Xbanner.1v: thinks it's in banner(1)
Xfoption.1: skyversion(8) missing
Xfrom.1: prmail(1) missing
Xmake.1: rstat(8c) missing
Xman.1: apropos(1) missing
Xold-perfmon.1: missing .TH
Xoldperfmon.1: missing .TH
Xoldsetkeys.1: thinks it's in setkeys(1)
Xorganizer.1: restore(1v) really in restore(8)
Xsunview.1: traffic(1) really in traffic(1c)
Xsort.1v: thinks it's in sort(1)
Xsum.1v: thinks it's in sum(1)
X.fi 
X\fR
X.Sh ENVIRONMENT
XThe default manpath will be taken from 
X.LB $MANPATH
Xif set.
X.Sh "SEE ALSO"
Xman(1), troff(1), perl(1), man(7).
X.Sh BUGS
XDue to the current implementation of globbing in 
X.I perl,
Xyou can get 
X.Q "Arguments too long"
Xerrors.  The workaround is to run 
X.I cfman
Xfirst on 
X.Q [a-m]* ,
Xand then on 
X.Q [n-z]* .
X.Sh AUTHOR
XTom Christiansen, \s-1CONVEX\s+1 Computer Corporation.
EOFMARK
echo x cfman.pl
sed -e 's/^X//' << \EOFMARK > cfman.pl
X#!/usr/local/bin/perl
X#
X# cfman v2.0: man page cross-referencer
X# author: Tom Christiansen <tchrist@convex.com>
X# date: 15 November 89
X#
X# usage: cfman [ -d debug-level ] [ -s sub-sections ] 
X#	       [ -p manpath ] [ -x xrefpath ] 
X
X($iam = $0) =~ s%.*/%%;
X 
X$] =~ /(\d+\.\d+).*\nPatch level: (\d+)/;
Xdie "$iam: requires at least perl version 3.0, patchlevel 1 to run correctly\n"
X	if $1 < 3.0 || ($1 == 3.0 && $2 < 1);
X
X&Getopts('d:s:p:x:') || &usage;
X
X$manpath = $opt_p ? $opt_p : $ENV{'MANPATH'};
X$manpath = "/usr/man" unless $manpath;
X@manpath = split(/:/,$manpath);
X
X$opt_x =~ /^:/ && ( $opt_x = $manpath . $opt_x );
X@xrefpath = $opt_x ? split(/:/,$opt_x) : @manpath;
X
X$debug = $opt_d;
X
X@sections = $opt_s ? split(/ */,$opt_s) : 1..8;
X
Xif ($debug) {
X    $" = ':';
X    print "manpath is @manpath\n";
X    print "xrefpath is @xrefpath\n";
X    $" = ' ';
X} 
X
Xfile:    foreach $file ( $#ARGV >= $[ ? @ARGV : '*.*' ) {
X	     printf STDERR "considering %s\n", $file if $debug & 1;
X	     $bingo = 0;
Xtree:        foreach $tree ( @manpath ) {
X		 print "ROOT is $tree\n" if $debug;
X		 if (!chdir $tree) {
X		    warn "cannot chdir to $tree: $!";
X		    next tree;
X		 } 
X		 $rootdir = $tree;
X		 if ( $file =~ m#^/# ) {
X		    &read_manpages($file); 
X		    next file;
X		 } 
Xsection:         foreach $section ( @sections ) {
X		    &scan_section($tree,$section,$file);
X		 }
X	     } 
X	     print "no man pages matched \"$file\"\n" unless $bingo;
X	  }
X
X
Xexit 0;
X
X############################################################################
X#
X# scan_section()
X#
X#	checks a given man tree (like /usr/local/man) in a 
X#	certain subsection (like '1'), checking for a certain
X#	file, like 'tty' (which means 'tty.*'), 'system.3*', or '*.*'.
X#
X#	will recurse on a subsection name containing a shell meta-character
X#
X############################################################################
X
Xsub scan_section {
X    local ( $manroot, $subsec, $files ) = @_;
X    local ( $mandir );
X
X    $mandir = "man" . $subsec;
X
X
X    # subsec may have been ? or *; if so, recurse!
X    if ( &has_meta($mandir) ) {  
X	for (<${mandir}>) {
X	    if (&has_meta($_)) { 
X		warn "bad glob of $mandir"; 
X		last; 
X	    } 
X	    s/^man//;
X	    &scan_section($manroot,$_,$files);
X	} 
X	return;
X    } 
X
X    $files = "$files.*" unless $files =~ /\./;
X
X    if (!chdir $mandir) {
X	warn "couldn't chdir to $mandir: $!\n" if $debug;
X	return;
X    } 
X
X    printf STDERR "chdir to %s of %s\n", $mandir, $manroot if $debug & 1;
X
X    &read_manpages ( &has_meta($files) ? <${files}> : ($files));
X
X    chdir('..');
X} 
X
X############################################################################
X#
X# read_manpages()
X#
X#	passed a list of filenames, which are man pages.  opens each one,
X#	verifying that the file really is in the place that the .TH line
X#	says it is.  skips to SEE ALSO section and then verifies existence
X#	of each referenced man page.
X############################################################################
X
X
Xsub read_manpages {
X    local (@pages) = @_;
X
X    local ($junk, $sopage, $basename, $line, $page, $pname, $pext, $gotTH);
X    local(%seen);
X
X
Xpage:
X    foreach $page ( @pages ) {
X	next page if $page =~ /\.OLD$/;
X
X	if ($seen{$page}++) {
X	    print "already saw $page\n" if $debug & 1;
X	    next page;
X	}
X
X	if (!open page) {
X	    warn "couldn't open $page: $!\n";
X	    next page;
X	}
X
X	$bingo = 1; # global var
X
X	print "checking $page\n" if $debug & 1;
X
X	$gotTH = 0;
X	$line = 0;
X	$sopage = '';
X
Xline:   while (<page>) {
X	    print if $debug & 16;
X	    next line if /^'''/ || /^\.\\"/;
X
X	    # deal with .so's on the first line.
X	    # /usr/ucb/man uses this instead of links.
X	    if (!($line++) && /^\.so\s+(.*)/) {
X		$sopage = $1;
X		print "$page -> $sopage\n" if $debug & 1;
X		($basename = $sopage) =~ s%.*/%%;
X		if ($seen{$basename}++) {
X		    print "already saw $basename\n" if $debug & 1;
X		    next page;
X		} 
X		if (!open(page,"../$sopage")) {
X		    print "$page: cannot open $sopage: $!\n";
X		    next page;
X		} 
X		$page = $basename;
X		next line;
X	    } 
X
X	    # check for internally consistent .TH line
X	    if ( /^\.(TH|SC)/ ) { # SC is for mh
X		 $gotTH++;
X		 printf STDERR "TH checking %s", $_ if $debug & 4;
X		 do flush();
X		 s/"+//g;
X		 ($junk, $pname, $pext) = split;
X		 if (&macro($pname)) {
X			printf STDERR "%s: can't resolve troff macro in .TH: %s\n",
X			    $page, $pname;
X			next line;
X		 } 
X		 $pext =~ y/A-Z/a-z/;
X		 $pname =~ s/\\-/-/g;
X		 $pname =~ y/A-Z/a-z/ if $pname =~ /^[0-9A-Z_\-]+$/;
X		 ($pexpr = $page) =~ s/([.+])/\\$1/g;
X		 $pexpr =~ s%.*/%%;
X		 if ( "$pname.$pext" !~ /^$pexpr$/i) {
X		      printf "%s: thinks it's in %s(%s)\n", 
X			  $page, $pname, $pext;
X		 } 
X		 next line;
X	    }
X
X	    next line unless /^\.S[Hh]\s+"*SEE ALSO"*/ || /^\.Sa\s*$/; # damn mh
X
X	    # finally found the cross-references
Xxref:       while (<page>) {
X		print if $debug & 16;
X		last line if /^\.(S[Hh]|Co|Hi|Bu)/; # i really hate mh macros
X		next xref unless /\(/;
X		next xref if /^\.PP/;
X		chop;
X		s/\\f[RIPB]//g;
X		s/\\\|//g;
X		s/\\-/-/g;
Xentry:          foreach $entry ( split(/,/) ) {
X		    #print "got entry $entry\n";
X		    next entry unless $entry =~ /\(.*\)/;
X		    $pname = ''; $pext = '';
X		    $1 = ''; $2 = '';
X		    ($pname, $pext) = 
X			($entry =~ /([A-Za-z0-9\$._\-]+)\s*\(([^)]+)\).*$/); 
X		    if ($debug & 8) {
X			printf STDERR "entry was %s, pname is %s, pext is %s\n",
X			    $entry, $pname, $pext;
X		    }     
X		    if (&macro($pname)) {
X			printf "%s: can't resolve troff macro in SEE ALSO: %s\n",
X			    $page, $pname;
X			next entry;
X		    } 
X		    next entry if !$pname || !$pext || $pext !~ /^\w+$/;
X		    $pext =~ y/A-Z/a-z/;
X		    $pname =~ y/A-Z/a-z/ if $pname =~ /^[A-Z_0-9\-]+$/;
X		    #($psect = $pext) =~ s/^(.).*/$1/;
X		    do check_xref($page,$pname,$pext);
X
X		}	# entry: foreach $entry ( split(/,/) ) 
X	    }		# xref:  while (<page>)
X	}		# line:  while (<page>) 
X	printf "%s: missing .TH\n", $page if (!$gotTH);
X    }  			# page:  foreach $page ( @pages )
X}     			# sub    read_manpages
X
X
X###########################################################################
X#
X# check_xref()
X#
X#	given the name of the page we're looking for, check for a
X#	cross reference of a given man page and its assumed subsection
X#
X###########################################################################
X
Xsub check_xref {
X    local ($name, $target, $section) = @_;
X    local ($basesec, $subsec, $newsec );
X
X    printf STDERR " xref of %s(%s)\n", $target, $section if $debug & 2;
X
X    return if &pathcheck($target,$section);
X
X
X    # if we get this far, something's wrong, so begin notify
X    printf "%s: %s(%s)", $name, $target, $section;
X
X    ($basesec, $subsec) = ($section =~ /^(\d)(.*)$/);
X
X    if ($name =~ /\.\d*([nlp])$/ && ($section == 1 || $section == 8)
X	    && ($newsec = &pathcheck($target,$1))) { # hack for manl idiocy
X	&really($target,$newsec);
X	return;
X    }
X
X    # first check if page.Xn is really in page.X
X    if ( $subsec && ($newsec = &pathcheck($target,$basesec))) {
X	&really($target,$newsec);
X	return;
X    } 
X
X    if ( $basesec == 1 && &pathcheck($target,8))  {
X	&really($target,8);
X	return;
X    }
X
X    if ( $basesec == 8 && &pathcheck($target,1))  {
X	&really($target,1);
X	return;
X    }
X
X    # maybe it thinks it's in 1 or 8 but erroneously got put in l
X    if ( $basesec =~ /[18]/ && ($newsec = &pathcheck($target,'l')))  {
X	&really($target,$newsec);
X	return;
X    } 
X
X    # maybe page.X is really in page.Xn; this is expensive
X    if ( !$subsec && ($newsec = &pathcheck($target,$basesec.'*'))) {
X	&really($target,$newsec);
X	return;
X    } 
X
X    printf " missing\n";
X    do flush();
X}
X
X###########################################################################
X#
X# pathcheck()
X#
X#	takes a name (like 'tty') and a section (like '1d')
X#	and looks for 'tty.1d' first in the current root, 
X#	then in all other elements of @xrefpath.  the section
X#	may have a meta-character in it (like '8*').
X#
X#	returns the subsection in which we found the page, or
X#	null if we failed.
X#
X###########################################################################
X
Xsub pathcheck {
X    local ( $name, $section ) = @_;
X    local ( $basesec, $metasec, $fullpath, @expansion, $tree, %checked  ); 
X    local ( $return ) = 0;
X
X    $metasec = &has_meta($section);
X
X    ($basesec) = ($section =~ /^(.)/);
X
X    foreach $tree ( $rootdir, @xrefpath ) {
X	next if !$tree || $checked{$tree}++;  # only check each tree once
X
X	$fullpath = "$tree/man$basesec/$name.$section";  
X
X	print "   testing $fullpath\n" if $debug & 8;
X
X	if (!$metasec) {
X	    if (-e $fullpath) {
X		$return = $section;
X	    }
X	} else {
X	    open(SAVERR, '>&STDERR');  # csh globbing brain damage
X	    close STDERR;
X	    if ((@expansion = <${fullpath}>) && !&has_meta($expansion[0])) {
X	    			# redundant meta check due to sh brain-damage
X		#for (@expansion) { s/.*\.//; } 
X		#$section = join(' or ',@expansion);
X		($section) = ($expansion[0] =~ /([^.]+)$/);
X		$return = $section;
X	    }
X	    open(STDERR, '>&SAVERR');  # csh globbing brain damage
X	    close SAVERR;
X	}
X    } 
X    printf STDERR "   pathcheck returns $return\n" if $debug & 8;
X    $return;
X} 
X
X#---------------------------------------------------------------------------
X
Xsub flush {
X    $| = 1; 
X    print ''; 
X    $| = 0;
X}
X
Xsub has_meta {
X    $_[0] =~ /[[*?]/;
X} 
X
Xsub macro {
X    $_[0] =~ /^\\\*\(/;
X} 
X
Xsub really {
X    local($was,$is) = @_;
X    print " really in $was($is)\n";
X    #print " really in $was($is)\n", $_[0], $_[1];
X}
X
Xsub usage {
X    die "usage: $iam [-d debug-level] [-s sub-sections] [-p manpath] 
X    	[-x xrefpath] [pattern ...] \n";
X}
X
X
X# 
X# Straight from the perl library, almost
X# modified by Tom Christiansen to return 1 for success, 0 for bad options
X#
X
X;# getopts.pl - a better getopt.pl
X
X;# Usage:
X;#      do Getopts('a:bc');  # -a takes arg. -b & -c not. Sets opt_* as a
X;#                           #  side effect.
X
X
X
Xsub Getopts {
X    local($argumentative) = @_;
X    local(@args,$_,$first,$rest);
X    local($errs);
X
X    $errs = 0;
X
X    @args = split( / */, $argumentative );
X    while(($_ = $ARGV[0]) =~ /^-(.)(.*)/) {
X	($first,$rest) = ($1,$2);
X	$pos = index($argumentative,$first);
X	if($pos >= $[) {
X	    if($args[$pos+1] eq ':') {
X		shift(@ARGV);
X		if($rest eq '') {
X		    $rest = shift(@ARGV);
X		}
X		eval "\$opt_$first = \$rest;";
X	    }
X	    else {
X		eval "\$opt_$first = 1";
X		if($rest eq '') {
X		    shift(@ARGV);
X		}
X		else {
X		    $ARGV[0] = "-$rest";
X		}
X	    }
X	}
X	else {
X	    print STDERR "Unknown option: $first\n";
X	    $errs = 1;
X	    if($rest ne '') {
X		$ARGV[0] = "-$rest";
X	    }
X	    else {
X		shift(@ARGV);
X	    }
X	}
X    }
X    return $errs == 0;
X}
EOFMARK
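
For anyone puzzling over how -p, -x and $MANPATH interact, here is
roughly what the option handling in cfman.pl does, redone as a small
sh sketch.  The function names resolve_manpath and resolve_xrefpath
are invented for illustration; the real work is done in perl above.

```shell
#!/bin/sh
# Hypothetical helpers mirroring the manpath/xrefpath rules in the man page.

# resolve_manpath OPT_P
#   -p wins if given, then $MANPATH if set, then /usr/man.
resolve_manpath() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
    elif [ -n "${MANPATH:-}" ]; then
        printf '%s\n' "$MANPATH"
    else
        printf '%s\n' /usr/man
    fi
}

# resolve_xrefpath MANPATH OPT_X
resolve_xrefpath() {
    case "$2" in
        '') printf '%s\n' "$1"   ;;  # no -x: cross-reference within the manpath
        :*) printf '%s\n' "$1$2" ;;  # leading colon: manpath plus the extra trees
        *)  printf '%s\n' "$2"   ;;  # otherwise -x replaces the manpath
    esac
}

# Same as: cfman -p /usr/local/man -x :/usr/man
manpath=$(resolve_manpath /usr/local/man)
resolve_xrefpath "$manpath" :/usr/man    # prints /usr/local/man:/usr/man
```

The leading-colon form is purely a convenience; spelling out the full
list after -x gives the same result.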

    Tom Christiansen                       {uunet,uiucdcs,sun}!convex!tchrist 
    Convex Computer Corporation                            tchrist@convex.COM
		 "EMACS belongs in <sys/errno.h>: Editor too big!"