[net.lang.c] C and AWK questions

bobr@chemabs (bobr) (08/12/84)

We would appreciate information from anyone who has found an answer to the 
following problems: 

We are searching for debugging techniques to track down malloc/realloc 
aborts.  Occasionally, our programs get memory faults within the stdio's 
_flsbuf() subroutine.  Does anyone have hints on tracking down the types of 
things that would clobber the malloc buffers?  

Does anyone have a routine which performs the same expansion on its 
argument as does the Bourne shell?  That is, would expand [], ?, *, [!] 
into appropriate file names?  

Does anyone know of a way to invoke a program from within AWK, passing it 
one or more awk variables, and then trapping its output into another awk 
variable?  Does anyone know how to get input from multiple files 
simultaneously from within an awk program?  

We have a file which contains lines which are duplicates.  We want to be
able to get only one occurrence of these lines.  Is there a way to tell
one of the versions of grep to find the pattern and obtain only the
first (or last) hit found?

Thanks in advance,

     Bob Richards (...cbosgd!chemabs!bobr)
     Chemical Abstracts Service
     (614) 421-3600 X2486

ajs@hpfcla.UUCP (08/13/84)

> Does anyone know of a way to invoke a program from within AWK, passing it 
> one or more awk variables, and then trapping its output into another awk 
> variable?

The invoking part is easy; just print to a pipe.  You can print variables:

	print foo bar | echo

I don't know of a way to trap the output from the echo, however, except
perhaps in a file (which does the current awk invocation no good).

> Does anyone know how to get input from multiple files simultaneously from
> within an awk program?  

The only way I know of is ugly.  Try this:

		sep="==sep=="			# or some other pattern.

	{	cat file1
		echo $sep
		cat file2
	}	|
		awk '
		(flag == 0) {			# first file.
		    if ($0 == "'$sep'")
		    {
			flag = 1;
			next;
		    }
		    # do stuff from first file here (save data?)
		    next;
		}
		{				# second file.
		    # do stuff from second file here
		}'

gwyn@BRL-VLD.ARPA (08/14/84)

From:      Doug Gwyn (VLD/VMB) <gwyn@BRL-VLD.ARPA>

There is a bug in the realloc routine through UNIX System III.
There are some other subtler bugs in all mallocs through UNIX
System V Release 1.  My advice is to find a working malloc and
substitute it for the one in your C library.

Most versions of malloc can be conditionally recompiled to
produce a fairly strict checking version for debugging.  This
will usually abort with a core dump as soon as something goes
wrong with malloc-controlled storage.

moss@BRL-VLD.ARPA (08/14/84)

From:      "Gary S. Moss (DRXBR-VLD-V)" <moss@BRL-VLD.ARPA>

 
> We are searching for debugging techniques to track down malloc/realloc 
> aborts.  Occasionally, our programs get memory faults within the stdio's 
> _flsbuf() subroutine.  Does anyone have hints on tracking down the types of 
> things that would clobber the malloc buffers?  

I believe there are DEBUG defines you can turn on in the malloc source
to compile a debugging version (depending on your OS).

> Does anyone have a routine which performs the same expansion on its 
> argument as does the Bourne shell?  That is, would expand [], ?, *, [!] 
> into appropriate file names?  

I have a boolean subroutine which returns whether or not a string matches
a pattern which can contain those operators.  The only caveat is that
although it mimics sh(1) file name expansion, it was not meant for use
with file names so it does not require that '/' or '.' at the beginning
of a file or directory name be matched explicitly.  This will be left
as an exercise to the reader [:-)].

______________ tear here __ match.c _______________________________________
/*
 *	SCCS id:	@(#) match.c	1.2
 *	Last edit: 	8/14/84 at 11:33:26
 *	Retrieved: 	8/14/84 at 11:33:45
 *	SCCS archive:	/vld/moss/work/libWIND/s.match.c
 *
 *	Author:		Gary S. Moss
 *			U. S. Army Ballistic Research Laboratory
 *			Aberdeen Proving Ground
 *			Maryland 21005
 *			(301)278-6647 or AV-283-6647
 */
static
char	sccsTag[] = "@(#) match.c	1.2	last edit 8/14/84 at 11:33:26";
#include <stdio.h>
#include <string.h>
#include "./ascii.h"
extern void	prnt1Err();

/*	m a t c h ( )
 *	if string matches pattern, return 1, else return 0
 *	special characters:
 *		*	Matches any string including the null string.
 *		?	Matches any single character.
 *		[...]	Matches any one of the characters enclosed.
 *		[!..]	Matches any character NOT enclosed.
 *		-	May be used inside brackets to specify range
 *			(e.g. str[1-58] matches str1, str2, ... str5, str8)
 *		\	Escapes special characters.
 */
match(	 pattern,  string )
register
char	*pattern, *string;
{
	do {
		switch( pattern[0] ) {
		case '*': /* Match any string including null string.	*/
			if( pattern[1] == '\0' || string[0] == '\0' )
				return	1;
			while( string[0] != '\0' ) {
				if( match( &pattern[1], string ) )
					return	1;
				++string;
			}
			return	0;
		case '?': /* Match any character.			*/
			break;
		case '[': /* Match one of the characters in brackets
				unless first is a '!', then match
				any character not inside brackets.
			   */
			{ register char	*rgtBracket;
			  static int	negation;

			++pattern; /* Skip over left bracket.		*/
			/* Find matching right bracket.			*/
			if( (rgtBracket = strchr( pattern, ']' )) == NULL ) {
				prnt1Err( "Unmatched '['." );
				return	0;
			}
			/* Check for negation operator.			*/
			if( pattern[0] == '!' ) {
				++pattern;
				negation = 1;
			} else {
				negation = 0;
			}	
			/* Traverse pattern inside brackets.		*/
			for(	;
				pattern < rgtBracket
			     &&	pattern[0] != string[0];
				++pattern
				)
			{
				if(	pattern[ 0] == '-'
				    &&	pattern[-1] != '\\'
					)
				{
					if(	pattern[-1] <= string[0]
					    &&	pattern[-1] != '['
					    &&	pattern[ 1] >= string[0]
					    &&	pattern[ 1] != ']'
					)
						break;
				}
			}
			if( pattern == rgtBracket ) {
				if( ! negation ) {
					return	0;
				}
			} else {
				if( negation ) {
					return	0;
				}
			}
			pattern = rgtBracket; /* Skip to right bracket.	*/
			break;
			}
		case '\\': /* Escape special character.			*/
			++pattern;
			/* WARNING: falls through to default case.	*/
		default:  /* Compare characters.			*/
			if( pattern[0] != string[0] )
				return	0;
		}
		++pattern;
		++string;
	} while( pattern[0] != '\0' && string[0]  != '\0' );
	if( (pattern[0] == '\0' || pattern[0] == '*' ) && string[0]  == '\0' )
		return	1;
	else	return	0;
}
______________ tear here __ matchtest.c ___________________________________
/*
 *	SCCS id:	@(#) matchtest.c	1.1
 *	Last edit: 	8/14/84 at 11:32:48
 *	Retrieved: 	8/14/84 at 11:33:49
 *	SCCS archive:	/vld/moss/work/libWIND/s.matchtest.c
 *
 *	Author:		Gary S. Moss
 *			U. S. Army Ballistic Research Laboratory
 *			Aberdeen Proving Ground
 *			Maryland 21005
 *			(301)278-6647 or AV-283-6647
 */
#if ! defined( lint )
static char
sccsTag[] = "@(#) matchtest.c	1.1	last edit 8/14/84 at 11:32:48";
#endif
#include <stdio.h>
extern int	match();
char	*usage[] = {
"",
"matchtest(1.1)",
"",
"Usage: matchtest [pattern string]",
"",
"If no arguments are given, the program reads words on its standard input.",
"The program writes to its standard output.",
0
};
void		prntUsage(), prnt1Err();
static char	*pattern, *string;
static char	patbuf[BUFSIZ], strbuf[BUFSIZ];
/*	m a i n ( )
 */
main( argc, argv )
char	*argv[];
{
	if( ! parsArgv( argc, argv ) ) {
		prntUsage();
		exit( 1 );
	}
	if( pattern != NULL ) {
		if( match( pattern, string ) ) {
			(void) printf(	"'%s' matches '%s'.\n",
					pattern,
					string
					);
			exit( 0 );
		} else {
			(void) printf(	"'%s' does not match '%s'.\n",
					pattern,
					string
					);
			exit( 1 );
		}
	}
	while( scanf( "%s %s", patbuf, strbuf ) == 2 ) {
		if( match( patbuf, strbuf ) ) {
			(void) printf( "'%s' matches '%s'.\n", patbuf, strbuf );
		} else {
			(void) printf(	"'%s' does not match '%s'.\n",
					patbuf,
					strbuf
					);
		}
	}		
	exit( 0 );
}

/*	p a r s A r g v ( )
 */
parsArgv( argc, argv )
register char	**argv;
{
	register int	c;
	extern int	optind;
	extern char	*optarg;

	/* Parse options.					*/
	while( (c = getopt( argc, argv, "" )) != EOF ) {
		switch( c ) {
		case '?' :
			return	0;
		}
	}
	if( argc - optind != 2 ) {
		if( argc == optind ) {
			pattern = string = NULL;
		} else {
			(void) fprintf( stderr, "Arg count wrong!\n" );
			return	0;
		}
	} else {
		pattern = argv[optind++];
		string = argv[optind++];
	}
	return	1;
}

/*	p r n t U s a g e ( )
 *	Print usage message.
 */
void
prntUsage() {
	register char	**p = usage;
	while( *p )
		(void) fprintf( stderr, "%s\n", *p++ );
	return;
}

/*	p r n t 1 E r r ( )
 *	Print error message with 1 argument.
 */
void
prnt1Err( str )
char	*str;
{
	(void) fprintf( stderr, "%s", str );	/* str is data, not a format */
	return;
}
________________________________________________________________________
-- Moss.

zarth@drutx.UUCP (CovartDL) (08/15/84)

I am not sure about the first part, or that I understand the duplicate
entry part, but have you looked at the UNIX command uniq(1)?

			Zarth Arn

chris@umcp-cs.UUCP (08/15/84)

The 4.2BSD malloc has (if what I've been told is correct) a range
checking option that will cause ``free'' and ``realloc'' to abort
if the area allocated has been overflowed.  I've never seen a
version of malloc & friends that allows one to insert calls to a
malloc-checking routine as desired, although it shouldn't be too
hard to write.
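
A minimal sketch of such an insertable checker follows.  It is not the
4.2BSD or CalTech code; the names guard_malloc, guard_check and guard_free
are hypothetical, as are the guard length and byte value.  Each block
records its size in a header and is followed by a run of known bytes, so
guard_check can be called at any point to see whether the block has been
overrun.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; this is not the 4.2BSD malloc. */
#define GUARD_LEN  8
#define GUARD_BYTE 0xA5

/* Header stored in front of every block.  The union pads it so the
 * user area that follows keeps a conservative alignment. */
typedef union {
    size_t size;
    long double align_ld;
    void *align_p;
} guard_hdr;

/* Allocate size bytes followed by GUARD_LEN bytes of a known pattern. */
void *guard_malloc(size_t size)
{
    guard_hdr *h;
    unsigned char *user;

    if (size > (size_t)-1 - sizeof *h - GUARD_LEN)
        return NULL;                    /* request would overflow */
    h = malloc(sizeof *h + size + GUARD_LEN);
    if (h == NULL)
        return NULL;
    h->size = size;
    user = (unsigned char *)(h + 1);
    memset(user + size, GUARD_BYTE, GUARD_LEN);
    return user;
}

/* Return 1 if the guard bytes after the block are intact, else 0. */
int guard_check(const void *p)
{
    const guard_hdr *h = (const guard_hdr *)p - 1;
    const unsigned char *tail = (const unsigned char *)p + h->size;
    size_t i;

    for (i = 0; i < GUARD_LEN; i++)
        if (tail[i] != GUARD_BYTE)
            return 0;
    return 1;
}

/* Free a guarded block; the assert fires if it was overrun. */
void guard_free(void *p)
{
    if (p == NULL)
        return;
    assert(guard_check(p));
    free((guard_hdr *)p - 1);
}
```

Sprinkling guard_check calls between suspect statements narrows down which
one does the damage.  This sketch only catches writes past the end of a
block, not writes before its start.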

If anyone is interested, I have a modified version of the CalTech
power-of-two malloc (the same one as used in 4.2BSD) with range
checking which works properly even on Pyramids, which I could
probably post.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci (301) 454-7690
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland

sjh@PURDUE.ARPA (08/15/84)

From:  Steve Holmes <sjh@PURDUE.ARPA>

 
>> We are searching for debugging techniques to track down malloc/realloc 
>> aborts.  Occasionally, our programs get memory faults within the stdio's 
>> _flsbuf() subroutine.  Does anyone have hints on tracking down the types of 
>> things that would clobber the malloc buffers?  
>
>I believe there are DEBUG defines you can turn on in the malloc source
>to compile a debugging version (depending on your OS).

WRT the malloc problem: I had the same problem for which the solution was
to initialize the pointers to the various blocks of storage to null on the
first call.  I don't have the code handy but could dig it up if needed.

Steve Holmes

pem1a@ihuxr.UUCP (Tom Portegys) (08/17/84)

On your awk/malloc problem:

We just got done finding a nasty bug in a lex program which
clobbered the malloc memory.  It either gave memory faults or
looped.  Turned out to be caused by an overflow of an array
called yytext, which holds the input characters being matched.
This array was compiled to hold 200 characters, yet an 
expression we wanted to match could far exceed that.  So lex
merrily proceeded to overflow the array and destroy the
malloc linkages.  The bug would appear when an fprintf attempted
to get some memory from malloc.  The answer was to either make the 
array yytext bigger, or to change the way to match the expression. 
We chose the latter.  We were also very disappointed that lex
did not do any special checking on this overflow problem.
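
The kind of check we wished for might look roughly like the sketch below.
This is not lex's code: token_buf, token_append and token_reset are
hypothetical stand-ins for a scanner's fixed text buffer, and only the
200-byte size is taken from the description above.

```c
#include <assert.h>
#include <stddef.h>

#define TOKEN_MAX 200   /* the size quoted for yytext above */

/* Hypothetical stand-ins for a scanner's fixed text buffer. */
static char   token_buf[TOKEN_MAX];
static size_t token_len = 0;

/* Append one character, keeping room for the terminating NUL.
 * Returns 1 on success, 0 if the character would not fit. */
int token_append(int c)
{
    assert(token_len < TOKEN_MAX);      /* invariant: NUL always fits */
    if (token_len + 1 >= TOKEN_MAX)
        return 0;                       /* refuse instead of overrunning */
    token_buf[token_len++] = (char)c;
    token_buf[token_len] = '\0';
    return 1;
}

/* Start a new, empty token. */
void token_reset(void)
{
    token_len = 0;
    token_buf[0] = '\0';
}
```

The caller decides what a refusal means: report an over-long token, or
switch to a matching strategy that does not need the whole text at once.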

                     Tom Portegys, Grant Rose, Mark Young
                     Bell Labs, Naperville, Ill.
                     ihnp4!ihuxr!pem1a

jim@ism780b.UUCP (08/18/84)

ism780b!jim    Aug 16 14:22:00 1984

malloc buffers get clobbered by falling off the end, usually by
an improper limit check on an array index.  A very frequent method
is
	newfoo = strcpy(malloc(strlen(foo)), foo);

which is wrong because strlen does not include the terminating NUL.
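
A corrected version might look like the sketch below, which reserves the
extra byte and checks the result of malloc before copying.  The helper name
copy_string is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* copy_string is a hypothetical helper name, not a libc routine.
 * Returns a freshly allocated copy of src, or NULL if malloc fails. */
char *copy_string(const char *src)
{
    size_t len;
    char *dst;

    assert(src != NULL);
    len = strlen(src);          /* excludes the terminating NUL */
    dst = malloc(len + 1);      /* so reserve one extra byte for it */
    if (dst == NULL)
        return NULL;
    memcpy(dst, src, len + 1);  /* copies the NUL as well */
    return dst;
}
```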

Several programs, such as make and find, contain subroutines which
do shell filename expansion.  For some reason these have never been
extracted out into a libc routine.
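
On later POSIX systems the C library does provide such a routine, glob(),
though it was not available on the systems discussed here.  A minimal
sketch of its use, assuming a POSIX <glob.h>; the wrapper name
print_matches is hypothetical.

```c
#include <assert.h>
#include <glob.h>
#include <stdio.h>

/* print_matches is a hypothetical helper; glob() and globfree() are POSIX.
 * Writes each file name matching pattern to out, one per line.
 * Returns the number of matches, or -1 on error. */
int print_matches(const char *pattern, FILE *out)
{
    glob_t g;
    size_t i;
    int rc, n;

    assert(pattern != NULL && out != NULL);
    rc = glob(pattern, 0, NULL, &g);
    if (rc == GLOB_NOMATCH)
        return 0;               /* nothing matched */
    if (rc != 0)
        return -1;              /* read error or out of memory */
    for (i = 0; i < g.gl_pathc; i++)
        fprintf(out, "%s\n", g.gl_pathv[i]);
    n = (int)g.gl_pathc;
    globfree(&g);
    return n;
}
```

A call such as print_matches("*.[ch]", stdout) would list the C sources and
headers in the current directory, much as the shell would expand the same
pattern.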

It is depressing that awk does not have so fundamental a function as
system(), so there is no way to execute other commands from within awk.
Nor is there any way to read input from more than one file.

The "uniq" command will eliminate duplicates, yield only the lines which
are not duplicates, yield only lines which are duplicates, or produce
each unique line with a count of occurrences, according to your fancy.
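
Note that uniq compares adjacent lines only, so the input is usually
sorted first.  The default behaviour can be sketched in C as below.  This
is an illustration, not the uniq source; the name drop_adjacent_dups and
the line-length limit are assumptions, and a longer line is handled in
pieces.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LINE_MAX_LEN 1024   /* assumed upper bound on line length */

/* drop_adjacent_dups is a hypothetical name.  It copies in to out,
 * skipping any line equal to the line just before it.
 * Returns the number of lines written. */
int drop_adjacent_dups(FILE *in, FILE *out)
{
    char prev[LINE_MAX_LEN], cur[LINE_MAX_LEN];
    int have_prev = 0, written = 0;

    assert(in != NULL && out != NULL);
    while (fgets(cur, sizeof cur, in) != NULL) {
        if (!have_prev || strcmp(cur, prev) != 0) {
            fputs(cur, out);
            written++;
            strcpy(prev, cur);  /* same size as cur, so it always fits */
            have_prev = 1;
        }
    }
    return written;
}
```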

If you just want the first occurrence of a pattern in a given file,
use  sed '/pattern/q'  or  awk '/pattern/ {print; exit}'

-- Jim Balter, INTERACTIVE Systems (ima!jim)

jim@ism780b.UUCP (08/27/84)

ism780b!jim    Aug 25 15:41:00 1984

>        print foo bar | echo

I missed "|" when reading the manual to provide my first answer.
Sorry about that.  But the above won't work.  What you really want is

	cmd = "echo " foo " " bar; print "" | cmd

Also, note that awk does not wait for the command to complete, yielding
results that are surprising, to say the least.

-- Jim Balter, INTERACTIVE Systems (ima!jim)