[comp.bugs.4bsd] bugfix and extension to slice

greim@sbsvax.UUCP (Michael Greim) (05/17/88)

Hello netland,

Some months ago a program called 'slice' was posted to comp.sources.unix.
We tried it here, but found a bug almost instantly. Our system
administrator wanted to use slice to extract tarmail pieces from
a mailbox, so I made 2 extensions to slice for him.

I sent the following to Rich Salz, suggesting a reposting, but did
not hear anything from him. So I assume it's OK if I present my changes here.

Here are
1.) the BUG
2.) 2 extensions to slice, description
3.) context diff of slice.c (a cure for the ailment)
4.) context diff of slice.1
5.) the tarmail extracting script

1.) the BUG

1.1) Symptoms

	I tried "slice -f file -n100 A#n" and expected slice to produce
	some file Ann. But to my surprise it said: "can not use -n option
	together with pattern" or some such.

1.2) Diagnosis

	The first command-line argument not starting with '-' was taken as
	the pattern, regardless of the other options specified. So with -n
	given, the format "A#n" was mistaken for a pattern.

1.3) Therapy

	Apply the context diff in 3.
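
	The corrected decision can be sketched in C as follows. This is
	only an illustration, not code from slice.c: the struct seen_opts
	and the two function names are made up here; only the flag names
	and the two conditions are taken from the diff in 3.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what the option loop has seen so far. */
struct seen_opts {
    bool e_flag;          /* -e pattern was given */
    bool m_flag;          /* -m (mailbox pattern) was given */
    bool s_flag;          /* -s (shell pattern) was given */
    bool n_flag;          /* -n n_lines was given */
    const char *pattern;  /* pattern chosen so far, or NULL */
};

/*
 * Should the first non-flag argument be taken as the pattern?
 * Mirrors the patched test:
 *     if (!pattern && !m_flag && !s_flag && !n_flag)
 * The old code only checked !pattern, so "-n100 A#n" went wrong.
 */
static bool takes_pattern(const struct seen_opts *o)
{
    return o->pattern == NULL && !o->m_flag && !o->s_flag && !o->n_flag;
}

/* Must the combination be rejected? -e together with -m, -s or -n. */
static bool conflicting(const struct seen_opts *o)
{
    return o->e_flag && (o->m_flag || o->s_flag || o->n_flag);
}
```

	With only n_flag set, takes_pattern() is false, so "A#n" is left
	over as a format, which is what I wanted in the first place.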

2.) 2 extensions to slice, description

	Both extensions concern parameter substitution in format strings.
	#0nn : with this format you can specify up to 99 parameters instead
		of only 9. We needed this!
	#-nn : take the nn'th parameter counting back from the last. nn=00
		means the last parameter, so #-00 is equal to #$ when you have
		fewer than 99 parameters.

	NOTE:
		To make this work properly, set MAXPARM in opts.h to 99.
		(no context diff included because of {what-you-like} :-)

	Apply the context diff in 3.
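
	How the two new forms map to a token index can be sketched like
	this. It is a stand-alone illustration, not the code from the
	diff: resolve_param() is a made-up name, and I assume here that
	'last' is the zero-based index of the last token on the line
	(the value the patch gets from lastparm()).

```c
#include <ctype.h>

#define MAXPARM 99   /* the value recommended above for opts.h */

/*
 * Resolve a "#0nn" or "#-nn" reference to a zero-based token index.
 * spec points at the character right after '#', i.e. '0' or '-'.
 * last is assumed to be the zero-based index of the last token.
 * Returns the index, or -1 for a malformed or out-of-range reference.
 */
int resolve_param(const char *spec, int last)
{
    int nn;

    if (spec[0] != '0' && spec[0] != '-')
        return -1;
    /* exactly two digits must follow, as the manual page requires */
    if (!isdigit((unsigned char)spec[1]) || !isdigit((unsigned char)spec[2]))
        return -1;
    nn = (spec[1] - '0') * 10 + (spec[2] - '0');
    if (nn > MAXPARM)
        return -1;
    if (spec[0] == '-') {
        if (last < nn)          /* not enough tokens to count back */
            return -1;
        return last - nn;       /* "-00" is the last token */
    }
    return nn - 1;              /* "001" is index 0; "000" gives -1 */
}
```

	For a line with 5 tokens (last == 4), "-00" gives index 4 and
	"-02" gives index 2, while "012" gives index 11 regardless of
	'last'. In this sketch "000" comes out as -1 and is therefore
	treated as an error.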

3.) context diff of slice.c (a cure for the ailment)

*** slice.c.old	Wed Mar 23 18:41:41 1988
--- slice.c	Wed Mar 23 18:40:42 1988
***************
*** 43,48 ****
--- 44,52 ----
  bool exclude = FALSE;			/* exclude matched line from o/p files */
  bool split_after = FALSE;		/* split after matched line */
  bool m_flag = FALSE;			/* was -m option used */
+ bool s_flag = FALSE;			/* was -s option used */
+ bool n_flag = FALSE;			/* was -n option used */
+ bool e_flag = FALSE;			/* was -e option used */
  
  FILE *output = (FILE *) NULL;	/* fd of current output file */
  FILE *rejectfd = (FILE *) NULL;	/* fd of reject file */
***************
*** 105,110 ****
--- 109,115 ----
  					usage(1);
  				}
  				pattern = *argv;
+ 				e_flag = TRUE;
  				break;
  			}
  			case 'm': {				/* mailbox pattern */
***************
*** 113,119 ****
  				break; 
  			}
  			case 's': {				/* shell pattern */
! 				pattern = "^#! *\/bin\/sh";
  				break; 
  			}
  			case 'n': {				/* -n n_lines -- split every n lines */
--- 118,125 ----
  				break; 
  			}
  			case 's': {				/* shell pattern */
! 				pattern = "^#! *\\/bin\\/sh";
! 				s_flag = TRUE;
  				break; 
  			}
  			case 'n': {				/* -n n_lines -- split every n lines */
***************
*** 123,128 ****
--- 129,135 ----
  					error("-n: number must be at least 1\n");
  					exit(EXIT_SYNTAX);
  				}
+ 				n_flag = TRUE;
  				break;
  			} 
  			case 'f': {
***************
*** 163,179 ****
  		    }
  		}			/* end switch */
  	  } else {	
! 		if (!pattern) pattern = *argv;	/* first non-flag is pattern */
  		else break;						/* break while loop */
  	  }			/* end if */
       }		/* end while */
  
  	 if (!argc) {
! 		if (m_flag) {
  			format = mboxformat;
! 		} else {
  			format = defaultfmt;
- 		}
  		n_format = 1; 
  	 } else {
  		format = argv;
--- 170,195 ----
  		    }
  		}			/* end switch */
  	  } else {	
! 		/*
! 		 * mg, 22.mar.88
! 		 * the first non-flag is pattern, if not one of -s -n or -m
! 		 * was specified or -e pattern
! 		 */
! 		if (!pattern && !m_flag && !s_flag && !n_flag)
! 			pattern = *argv;	/* first non-flag is pattern */
  		else break;						/* break while loop */
  	  }			/* end if */
       }		/* end while */
  
+ 	if (e_flag && (m_flag || s_flag || n_flag)) {
+ 		error("don't use -e  together with -m, -s or -n flags\n");
+ 		usage(EXIT_SEMANT);
+ 	}
  	 if (!argc) {
! 		if (m_flag)
  			format = mboxformat;
! 		else
  			format = defaultfmt;
  		n_format = 1; 
  	 } else {
  		format = argv;
***************
*** 486,491 ****
--- 506,539 ----
  					q += strlen(tempbuf);
  					break;
  				}
+ 				/*
+ 				 * mg, 18.mar.88
+ 				 * - use #0nn to specify parameter numbers greater than 9
+ 				 * - use #-nn to select the nn'th parameter from the last
+ 				 *		#-00 is equivalent to #$
+ 				 */
+ 				case '-':
+ 				case '0':
+ 					if (!isdigit(*(p+1)) || !isdigit(*(p+2))) {
+ 						error("Invalid use of #%cnn format in '%s'\n", *p, *format);
+ 			 			exit(EXIT_RUNERR);
+ 					}
+ 					i = (*(p+1) - '0') * 10 + *(p+2) - '0';
+ 					if (i > MAXPARM) {
+ 						error("Number of parameter (%1d) exceeds max (%1d)\n", i, MAXPARM);
+ 			 			exit(EXIT_RUNERR);
+ 					}
+ 					if (*p == '-') {
+ 						j = lastparm ();
+ 						if (j < i) {
+ 							error ("Not enough parameters to take difference.\n");
+ 							exit (EXIT_RUNERR);
+ 						}
+ 						i = j - i;
+ 					} else
+ 						i--;
+ 					p += 2;
+ 					goto do_form;
  				case '1':
  				case '2':
  				case '3':
***************
*** 501,506 ****
--- 549,555 ----
  					} else {
  						i = (*p) - '1';
  					}
+ do_form:
  					if (*(p+1) == '%') {
  						p++;
  						fmtcode = getfmt(fmt,p);


4.) context diff of slice.1

*** slice.1.old	Wed Mar 23 18:42:28 1988
--- slice.1	Wed Mar 23 18:40:42 1988
***************
*** 38,45 ****
  into one or more output files.  The output files are named according
  to the \fIformat\fR strings provided.  The input file is split
  whenever a pattern is matched or every \fIn\fR lines, depending on the
! options selected.  Because some of the options are mutually exclusive,
  there are three forms of the command.
  .LP
  Whenever a pattern match is used to slice the file, lines occurring
  before the first match are sent to the \fIreject\fR file (which is
--- 38,47 ----
  into one or more output files.  The output files are named according
  to the \fIformat\fR strings provided.  The input file is split
  whenever a pattern is matched or every \fIn\fR lines, depending on the
! options selected.
! Because some of the options are mutually exclusive,
  there are three forms of the command.
+ It is an error to specify a pattern together with options -m, -s or -n.
  .LP
  Whenever a pattern match is used to slice the file, lines occurring
  before the first match are sent to the \fIreject\fR file (which is
***************
*** 111,119 ****
  output file produced by the current output format.  When an output
  format produces the same name twice, a new format is selected and
  numbering begins again with the initial value.
! .IP "#\&1, #\&2 ..."
! Parameters of the form #\&1, #\&2, ... #\&9 are replaced by corresponding
  tokens drawn from the source line which matched the slice pattern.
  For example, if each procedure in a C program began with a comment
  line of the following form:
  .sp
--- 113,129 ----
  output file produced by the current output format.  When an output
  format produces the same name twice, a new format is selected and
  numbering begins again with the initial value.
! .IP "#\&1, #\&2 ..., #\&0nn, #\&-nn"
! Parameters of the form #\&1, #\&2, ... #\&9 or #\&0nn, where 'nn' is
! a 2 digit number are replaced by corresponding
  tokens drawn from the source line which matched the slice pattern.
+ If you specify #\&-nn, you can select a parameter relative to
+ the last token on the line. #\&-00 is the last token on the line,
+ #\&-01 the last but one, ...
+ .br
+ Note that it is an error to not specify two digits when using #\&0nn
+ or #\&-nn.
+ .br
  For example, if each procedure in a C program began with a comment
  line of the following form:
  .sp
***************
*** 131,136 ****
--- 141,149 ----
  \ \ \ \ \From garyp@cognos Tue Sep 15 15:08:23 EDT 1987
  .sp
  then "#$" would select "1987", the last token on the line.
+ .br
+ Currently there are 99 addressable tokens on an input line. If a line
+ is split into more tokens, #$ will hold the last one.
  .SH FORMAT SPEC's
  .LP
  Substitution parameters can be followed by an optional 
***************
*** 240,245 ****
--- 253,264 ----
  generate the correct filenames, either slice has to lookahead to find
  the next match line or it has to direct lines for the current slice
  into a temporary file until it finds the line matching the pattern.
+ .IP c) 4
+ When you use slice on machines with a filesystem which allows you
+ only a (usually small) number of characters for filenames (e.g. 14),
+ slice might not detect that it is overwriting a file and/or
+ its diagnostic output may be wrong. Especially filenames generated by the -m
+ option are too long. Just specify a format when slicing a mailbox.
  .SH DIAGNOSTICS
  ``Internal Error'' indicates a bug in \fIslice\fR, and should be reported.
  Exit status 1 indicates an error parsing options \- for example, if an unknown
***************
*** 249,254 ****
--- 268,279 ----
  be opened.
  .LP
  If a reject file is not provided, a count of rejected lines is reported.
+ .SH "AUTHOR"
+ Originally written by Russell Quinn as "mailsplit".
+ .sp
+ Revised and extended by Gary Puckering <cognos!garyp>.
+ .sp
+ Extended some more by Michael Greim.
  .SH "SEE ALSO"
  .I cat (1),
  .I ed (1),


5.) the tarmail extracting script

The author, Bernard Sieloff (bs@sbsvax.UUCP), says it could be improved,
but it is already 30% faster than the version using csplit.


#! /bin/sh
# @(#)untarpack 2.1 (UniSB[bs]) 88/03/20
PATH=/usr/ucb:/bin:/usr/bin:/usr/local
if [ $# -lt 1 -o $# -gt 2 ]; then
	echo "Usage: untarpack \"subject-string\"[ your-tarmailbox]"
	exit 1
fi
trap 'echo "untarpack: cancelled"; exit 9' 1 2 3 15
TS=$1;
if [ $# -eq 2 ]; then
	MB=$2
else
	MB=/usr/spool/mail/$USER
fi
if [ ! -s $MB ]; then
	echo "untarpack: no such file: $MB"
	exit 1
fi
rm -f utm.boxfile.???-of-???
echo "starting unpacking now---please wait..."
sed -n -e "/^Subject: $TS - part/,/^---end beef/p" $MB |
slice "^Subject: $TS - part" 'utm.boxfile.#-02%03d-of-#$%03d'
if [ $? -ne 0 ]; then
	echo "untarpack: slice error"
	exit 2
fi
if [ ! -s utm.boxfile.001-of-??? ]; then
	echo "untarpack: can't find subjects \"$TS\" in file \"$MB\""
	exit 3
fi
FOUND=`ls utm.boxfile.???-of-??? | wc -l`
PACKS=`expr substr utm.boxfile.001-of-??? 20 3`
if   [ $FOUND -lt $PACKS ]; then
	FOUND=`expr $FOUND + 0`
	PACKS=`expr $PACKS + 0`
	echo "untarpack: lack of tarmail packets ($FOUND instead of $PACKS)"
	exit 4
elif [ $FOUND -gt $PACKS ]; then
	FOUND=`expr $FOUND + 0`
	PACKS=`expr $PACKS + 0`
	echo "untarpack: packet overrun?!? ($FOUND instead of $PACKS)"
	exit 5
fi
echo '---end beef' > utm.boxfile.000-of-$PACKS
echo -n "Done---do you want to UNTARMAIL the tarmail? [y/n]:"
read answer junk
answer=${answer}x
if expr $answer : '[yY].*x'>/dev/null; then
	echo "OK---UNTARMAILing your tarmail..."
	exec untarmail utm.boxfile.???-of-???
else
	echo 'Use "untarmail utm.boxfile.???-of-???" to reconstruct the TARMAIL'
fi
exit 0
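
To see what the format 'utm.boxfile.#-02%03d-of-#$%03d' in the script
above is meant to produce, here is a small C sketch. It is not part of
slice: packet_name() is a made-up helper, and I assume that tokens are
separated by white space and that the subject line ends in something
like "part 3 of 12". The layout of the resulting name is taken from the
script, which reads the packet count from characters 20 to 22.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXTOK 99   /* same limit as MAXPARM; longer lines are cut off here */

/*
 * Build "utm.boxfile.NNN-of-MMM" from a subject line. NNN is taken from
 * the third-from-last token (#-02), MMM from the last token (#$), and
 * both are printed with %03d. Returns 0 on success, or -1 if the line
 * has fewer than three tokens.
 */
int packet_name(const char *subject, char *out, size_t outlen)
{
    char buf[512];
    char *tok[MAXTOK];
    char *t;
    int n = 0;

    /* work on a copy, because strtok() modifies its argument */
    strncpy(buf, subject, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    for (t = strtok(buf, " \t\n"); t != NULL && n < MAXTOK;
         t = strtok(NULL, " \t\n"))
        tok[n++] = t;

    if (n < 3)
        return -1;

    snprintf(out, outlen, "utm.boxfile.%03d-of-%03d",
             atoi(tok[n - 3]), atoi(tok[n - 1]));
    return 0;
}
```

Under these assumptions the line "Subject: mytar - part 3 of 12" gives
the name utm.boxfile.003-of-012, which matches the utm.boxfile.???-of-???
pattern the script tests for.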


Absorb, apply and enjoy,

		Michael

-- 

snail-mail : Michael Greim,
			 Universitaet des Saarlandes, FB 10 - Informatik (Dept. of CS),
             Bau 36, Im Stadtwald 15, D-6600 Saarbruecken 11, West Germany
E-mail     : greim@sbsvax.UUCP