[comp.unix.wizards] Look! An xargs!!

bph@buengc.BU.EDU (Blair P. Houghton) (09/03/89)

Okay, kids.  Shell script 101 is now in session.

Try this, after putting it in a file and turning on the execute flag.

    while read arrrrg
    do
	    $* $arrrrg
    done

It's in sh(1), so no whining from the "no csh(1)" gallery.  I'm sorry,
it's dirt simple.  That's why I'm not pleased that I have to post it.
You all should have thought of it yerselves.

It will not work for command2 if it requires special syntax for its file
arguments (e.g., dd(1)), and it allows only one filename per invocation
of command2 (not efficient if you have a parallel-file-grepping grep,
as comes with Umax 4.2 for the Encore Multimax multiprocessing
machine, and useless if you're trying to specify both input and
output filenames to a single command), but it should suffice for
([ef])grep.

				--Blair
				  "Sh(1) makes the sound of
				   two hands clapping seem
				   complex without bound."

cpcahil@virtech.UUCP (Conor P. Cahill) (09/03/89)

In article <4026@buengc.BU.EDU>, bph@buengc.BU.EDU (Blair P. Houghton) writes:
> Okay, kids.  Shell script 101 is now in session.
> 
> Try this, after putting  it in a file and turning on the execute flag.
> 
>     while read arrrrg
>     do
> 	    $* $arrrrg
>     done
> 

The reason why xargs was suggested was so the following type of operation
could be executed without fork/execing a grep for every file.

	find . [args] -print -exec grep [RE] {} ";"

Xargs will use the following structure AND process as many files
as it can in a single iteration:

	find . [args] -print | xargs grep [RE]

So your solution would be somewhat less efficient than using the
find to exec the grep itself, since you would require a pipe,
a shell, and then the fork/exec FOR every file.
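
The difference in process count can be checked with a small experiment. What follows is only a sketch, not something from the original post: it assumes a POSIX sh plus the widely available mktemp utility, and the scratch directory and the variable names per_file and batched are made up for the illustration.

```shell
#!/bin/sh
# Hypothetical experiment: count how many times the command is started.
# Each invocation of the inner sh prints one line, so the number of
# lines equals the number of fork/execs of the command.
demo_dir=$(mktemp -d) || exit 1
for n in 1 2 3 4 5; do : > "$demo_dir/file$n"; done

# One invocation per file: find starts the command once for each match.
per_file=$(find "$demo_dir" -type f -exec sh -c 'echo run' sh {} \; | grep -c run)

# Batched: xargs packs as many names as fit onto one command line.
batched=$(find "$demo_dir" -type f -print | xargs sh -c 'echo run' sh | grep -c run)

echo "find -exec: $per_file invocations; xargs: $batched invocation(s)"
rm -rf "$demo_dir"
```

With only five short names everything fits into a single xargs batch; a read loop in the shell would start the command five times, as find -exec does, and pay for the pipe and the shell on top.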



-- 
+-----------------------------------------------------------------------+
| Conor P. Cahill     uunet!virtech!cpcahil      	703-430-9247	|
| Virtual Technologies Inc.,    P. O. Box 876,   Sterling, VA 22170     |
+-----------------------------------------------------------------------+

deboor@buddy.Berkeley.EDU (Adam R de Boor) (09/04/89)

Shell scripts 201 (Graduate level :)

#!/bin/sh -
args=""
while read arg; do
	args="$args $arg"
done
$* $args


Problems with args like -m'this thing has spaces' when given to the above
script are left as an exercise for the reader.

While the lecturer in SS101 was wrong in his solution of the problem (or
perhaps misunderstood the problem itself), his contention that xargs can
be solved with a rudimentary shell script is correct, q.e.d. (or some
similar abbreviation, hee hee)

a

dce@Solbourne.COM (David Elliott) (09/04/89)

In article <4026@buengc.BU.EDU> bph@buengc.bu.edu (Blair P. Houghton) writes:
>Okay, kids.  Shell script 101 is now in session.
>
>Try this, after putting  it in a file and turning on the execute flag.
>
>    while read arrrrg
>    do
>	    $* $arrrrg
>    done

What if one of the members of $* or $arrrrg contains spaces?

If you're going to be pedantic, at least be correct ;-)

	eval ${1+"$@"} "$arrrrg"
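
The word-splitting problem is easy to see in isolation. A minimal sketch, assuming a POSIX sh with the default IFS; count_args, via_star and via_at are hypothetical helper names, not anything from the thread:

```shell
#!/bin/sh
# count_args is a hypothetical helper: it reports how many arguments it got.
count_args() { echo "$#"; }

# Pass the same two arguments on, once through unquoted $* and once
# through "$@".
via_star() { count_args $*; }      # re-split on whitespace (IFS)
via_at()   { count_args "$@"; }    # each argument kept intact

via_star grep "two words"   # prints 3: "two words" was split in two
via_at   grep "two words"   # prints 2
```

Note that eval joins its arguments with spaces and parses the result again, so arguments that survived "$@" intact are split once more on the way through eval unless they carry their own quoting.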

-- 
David Elliott		dce@Solbourne.COM
			...!{uunet,boulder,nbires,sun}!stan!dce

"We don't do this because we love you or like you...we don't even know you!"

cpcahil@virtech.UUCP (Conor P. Cahill) (09/04/89)

In article <16816@pasteur.Berkeley.EDU>, deboor@buddy.Berkeley.EDU (Adam R de Boor) writes:
> 
> Shell scripts 201 (Graduate level :)
> 
> #!/bin/sh -
> args=""
> while read arg; do
> 	args="$args $arg"
> done
> $* $args

Yet another non-solution.  This one does not handle the problem where the
list of arguments exceeds the maximum length (usually 5120 bytes). 

Also there is a dependency that the args variable in the while loop is 
available outside the loop.  I have found that this is not always true 
due to the fact that the while loop may be implemented using a sub-shell.

I guess we need a post-graduate level course, or a beginners c program class.


gwyn@smoke.BRL.MIL (Doug Gwyn) (09/04/89)

In article <16816@pasteur.Berkeley.EDU> deboor@buddy.Berkeley.EDU.UUCP (Adam R de Boor) writes:
-Shell scripts 201 (Graduate level :)
-While the lecturer in SS101 was wrong in his solution of the problem (or
-perhaps misunderstood the problem itself), his contention that xargs can
-be solved with a rudimentary shell script is correct, q.e.d. (or some
-similar abbreviation, hee hee)

The first solution was actually better, because it didn't bump into
the exec args limit that is the main reason for xargs's existence.
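
The limit in question can be queried rather than guessed. A small sketch, assuming a system that provides the POSIX getconf utility; the variable name arg_max is just for the illustration:

```shell
#!/bin/sh
# Ask the system for its limit on the combined size of the argument list
# and environment handed to exec. The value differs between systems.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX on this system: $arg_max bytes"
```

Any argument-collecting script has to stay under this figure, which is why a version that runs one command per input line never hits it and a version that gathers everything into one variable eventually does.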

deboor@buddy.Berkeley.EDU (Adam R de Boor) (09/05/89)

ooooooooo. meanie! I guess I've been pampered by using the Sun sh. Gotten
burned a few times from having {} be in a subshell, but never a while loop.
This one will do more work, but should be correct. Of course, I don't have
a limited sh, so who knows? I've tested this on my system and it does
The Right Thing... 

I *do* hope you'll have the good grace to not mention how early versions of
the shell didn't accept comments? (lots o' grins here).

#!/bin/sh -
#
# Remove all possible temp files we might create on exit or signal...
#
tmpbase=/tmp/xargs$$
trap "rm -f ${tmpbase}*; exit 0" 0 1 2 3 15

#
# Assume command has no spaces, but quote the rest of the args to make sure
# If your system doesn't have echo -n, remove the -n and add a \c at the
# end of each string.
#
cmd=`echo -n "$1 "; shift; for i in "$@"; do echo -n "'$i' "; done`
#
# Copy all the args to a file for easier processing without worry about
# overflowing shell variables.
#
cat - > $tmpbase
#
# If file small enough (say 4000 bytes), just execute the command directly.
# Use "set" to put the length and filename produced by "wc" into $1 and $2
# since we're done with our args anyway.
#
# Note we need to use tr to map from newline to space in all the files,
# else the shell will execute the command on the first file, then try
# and execute all the remaining files, since it keeps the newlines from the cat
#
# "eval" is necessary to cause $cmd to be broken into component words. The
# ''s we put around $2-n will keep spaces in them from getting in the way,
# as well as avoiding any wildcard expansion that might otherwise happen.
#
set - `wc -c $tmpbase`
if [ $1 -le 4000 ]; then
    eval "$cmd `cat $tmpbase | tr '\012' ' '`"
else
    #
    # Tough luck. Since we're working on filenames, we'll be really
    # conservative and break the list into 50-file groups. That gives
    # 80-chars per name with our conservative limit of 4000 (though
    # all the MAXARGS system constants I've seen are above 10000)
    #
    split -50 $tmpbase ${tmpbase}s
    for i in ${tmpbase}s*; do
	eval "$cmd `cat $i | tr '\012' ' '`"
    done
fi
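
For comparison, the same grouping idea can be sketched without temp files or eval by collecting the names in the positional parameters. This is an illustration only, under the assumption of a POSIX sh (for read -r and arithmetic expansion) and one argument per input line; run_batched and its variable names are hypothetical:

```shell
#!/bin/sh
# run_batched N command [args...]
# Hypothetical helper: reads one argument per line from stdin and runs
# the command with at most N of them appended per invocation.
run_batched() {
    batch_size=$1; shift
    prefix_len=$#                     # words belonging to the command itself
    while IFS= read -r line; do
        set -- "$@" "$line"           # append without re-splitting
        if [ $(($# - prefix_len)) -ge "$batch_size" ]; then
            # Keep the command away from the list we are still reading.
            "$@" < /dev/null
            # Keep only the command prefix for the next batch.
            i=0
            for word in "$@"; do
                if [ "$i" -eq 0 ]; then set --; fi
                if [ "$i" -lt "$prefix_len" ]; then set -- "$@" "$word"; fi
                i=$((i + 1))
            done
        fi
    done
    # Run whatever is left over after the last full batch.
    if [ "$#" -gt "$prefix_len" ]; then "$@" < /dev/null; fi
    return 0
}
```

Something like `find . -print | run_batched 50 grep foobar` would then run grep on 50 names at a time. Batching by count rather than by total length is the same conservative shortcut the script above takes; names containing newlines are still mishandled.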

deboor@buddy.Berkeley.EDU (Adam R de Boor) (09/05/89)

> The first solution was actually better, because it didn't bump into
> the exec args limit that is the main reason for xargs's existence.

I beg to differ: the first solution was worse than using the -exec arg
to find since it still executed the command for each found file but
had the overhead of the shell and pipe and while loop to contend with
besides. It also didn't handle args with spaces in them :)

a

dg@lakart.UUCP (David Goodenough) (09/05/89)

OK - will you all get off my case, already!!

#! /bin/sh

awk 'BEGIN   {
		command = "'$1'"
		rec = command
	     }
	     {
		for (i = 1; i <= NF; i++)
		 {
		    if (length(rec " " $i) > 512 && rec != command)
		     {
			print rec
			rec = command
		     }
		    rec = rec " " $i
		 }
	     }
     END     {
		if (rec != command)
		    print rec
	     }'

The 512 can be changed to match the limits of whatever you want - it could
even be made a second parameter to this grungy little shell script. Or if
you want to make command everything else on the invocation line:

xargs grep -i foobar

type of thing, then the $1 in the BEGIN section can be changed to $*
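
A variant with the limit and the command passed in as parameters might look like the sketch below. It assumes an awk that accepts -v assignments (nawk, gawk, mawk and POSIX awk do); chunk_words is a hypothetical name, and backslashes in the command string would be interpreted by -v:

```shell
#!/bin/sh
# chunk_words LIMIT COMMAND
# Hypothetical wrapper around the same idea: read whitespace-separated
# words on stdin and print command lines no longer than LIMIT characters
# (a single over-long word still gets a line of its own).
chunk_words() {
    awk -v limit="$1" -v command="$2" '
        BEGIN { rec = command }
        {
            for (i = 1; i <= NF; i++) {
                if (length(rec " " $i) > limit && rec != command) {
                    print rec
                    rec = command
                }
                rec = rec " " $i
            }
        }
        END { if (rec != command) print rec }'
}
```

As with the script above, the output is a list of command lines rather than the result of running them; piping it to sh would run them, with the caveat that any shell metacharacters in the names get interpreted along the way.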

So there :-P  Like I said, I do it all with awk :-)
-- 
	dg@lakart.UUCP - David Goodenough		+---+
						IHS	| +-+-+
	....... !harvard!xait!lakart!dg			+-+-+ |
AKA:	dg%lakart.uucp@xait.xerox.com			  +---+

bob@wyse.wyse.com (Bob McGowen Wyse Technology Training) (09/07/89)

In article <1126@virtech.UUCP> cpcahil@virtech.UUCP (Conor P. Cahill) writes:
>In article <16816@pasteur.Berkeley.EDU>, deboor@buddy.Berkeley.EDU (Adam R de Boor) writes:
>> 
---deleted---
>> while read arg; do
>> 	args="$args $arg"
>> done
---deleted---
>
>Yet another non-solution.  This one does not handle the problem where the
>list of arguments exceeds the maximum length (usually 5120 bytes). 

true, but a test could be added to avoid this problem.  As a quick and
not too efficient solution:

	if [ `echo $args|wc -c` -gt 3000 ]
	then
	    $* $args
	fi

A final product would require better checking, I think, and testing of
command line items, etc, but this would seem to be a way to start.

>Also there is a dependency that the args variable in the while loop is 
>available outside the loop.  I have found that this is not always true 
>due to the fact that the while loop may be implemented using a sub-shell.

ONLY when the loop's input or output is redirected to/from other programs or
a file.  If this is not the case the loop will always execute in the current
shell.  Thus the form in the example is quite valid.
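
Whether a given shell forks for a loop is easy to test. The sketch below is an illustration under assumptions: it was written with current POSIX-style shells such as bash and dash in mind, where only the pipeline case loses the variable; other shells (the older Bourne shell discussed here, or ksh) may behave differently in either case. The variable names are made up.

```shell
#!/bin/sh
# Case 1: loop fed by a pipeline. In bash and dash each side of the pipe
# runs in a subshell, so the assignment made inside the loop is lost.
last_piped=""
printf '%s\n' one two | while read -r word; do last_piped=$word; done

# Case 2: loop with its input redirected from a file. Current POSIX-style
# shells run this in the current shell, so the assignment survives.
list=$(mktemp) || exit 1
printf '%s\n' one two > "$list"
last_redirected=""
while read -r word; do last_redirected=$word; done < "$list"
rm -f "$list"

echo "after pipeline: '$last_piped'"
echo "after redirect: '$last_redirected'"
```

A loop that reads the script's own standard input, with no redirection on the loop itself, is the form used in the example under discussion.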

>I guess we need a post-graduate level course, or a beginners c program class.

OK, but are you willing to pay for it?

Bob McGowan  (standard disclaimer, these are my own ...)
Customer Education, Wyse Technology, San Jose, CA
..!uunet!wyse!bob
bob@wyse.com

gwyn@smoke.BRL.MIL (Doug Gwyn) (09/07/89)

In article <2404@wyse.wyse.com> bob@wyse.UUCP (Bob McGowen Wyse Technology Training) writes:
->Yet another non-solution.  This one does not handle the problem where the
->list of arguments exceeds the maximum length (usually 5120 bytes). 
-	if [ `echo $args|wc -c` -gt 3000 ]
-	then
-	    $* $args
-	fi

Worse and worse.  Now it does nothing at all when there are a lot of
arguments (which is usually the case for "xargs"!).

bob@wyse.wyse.com (Bob McGowen Wyse Technology Training) (09/08/89)

In article <10967@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>In article <2404@wyse.wyse.com> bob@wyse.UUCP (Bob McGowen Wyse Technology Training) writes:
---deleted code---
>Worse and worse.  Now it does nothing at all when there are a lot of
>arguments (which is usually the case for "xargs"!).

My apologies for not attempting to write a complete, debugged and fully
functional program for you.

My objective, which has also been the contention of others, is to state
that you CAN implement xargs in the shell (or awk or ?) and handle the
problems, NOT to write the whole thing!

(I noticed that someone had just posted a "complete" xargs written in awk.
Perhaps this will meet all your objections and solve all the problems you
can think of.)

I would also like to know the circumstances which prompt you to state that
xargs will do nothing when there are a lot of "arguments".  Do you mean
the items piped in or those on the xargs command line?  The manual and
my experience indicate that the program will work fine with large amounts
of standard input, it generates a series of invocations of the command to
run without any problems that I can see.  Note that I have only used the
xargs default buffer (which the manual (XENIX 2.3) says is 470 characters
max) which may be different in your environment.

Cheers!


gwyn@smoke.BRL.MIL (Doug Gwyn) (09/08/89)

In article <2415@wyse.wyse.com> bob@wyse.UUCP (Bob McGowen Wyse Technology Training) writes:
>I would also like to know the circumstances which prompt you to state that
>xargs will do nothing when there are a lot of "arguments".

Nothing has ever prompted me to state that!  What I said was, your attempted
example of an xargs replacement had that problem.  I don't see the point of
an xargs replacement that didn't add SOMEthing beyond simply typing the
desired command to the shell in the first place, and your script in effect
was a somewhat elaborate implementation of the shell's `` facility, with
the additional feature that it would not even attempt to run a command
under precisely the circumstances that lead one to choose to use xargs.

Why not simply write a genuine xargs implementation, say in C where you
can do it right without a lot of hassle.

barnett@crdgw1.crd.ge.com (Bruce Barnett) (09/09/89)

In article <10978@smoke.BRL.MIL>, gwyn@smoke (Doug Gwyn) writes:

>Why not simply write a genuine xargs implementation, say in C where you
>can do it right without a lot of hassle.

Hear! hear!

Who really cares if you can *ALMOST* do it in 35 lines when you can do
it *RIGHT* in 70?

Forgive this posting of the sources in this newsgroup, but maybe it will
reduce the noise level. Here is the version of xargs.c from comp.sources.whatever

/* xargs -- quickie version of System V xargs(1): read a list of
 *	arguments from stdin, apply to the command which is
 *	the argument(s) of xargs
 */

#include <stdio.h>

char *cmdname;		/* should point to argv[0] */

char command[BUFSIZ];	/* command given to xargs */
char line[BUFSIZ];	/* current input line */
char cmdbuf[BUFSIZ];	/* command + input lines */

main(argc, argv)
	int argc;
	char *argv[];
{
	char *gets();
	char *strcat(), *strcpy();

	cmdname = argv[0];

	/* skip (xargs) command name */

	argv++, argc--;

	/* construct command from arguments */

	strcpy(command, "exec");
	while (argc--) {
		(void) strcat(command, " ");
		(void) strcat(command, *argv++);
	}

	/* initialize for command execution loop */

	(void) strcpy(cmdbuf, command);

	/* here's where all the action is: read in arguments
	 * from stdin, appending to the current command buffer
	 * if next line would overflow command buffer, execute
	 * command buffer and reinitialize
	 */

	while (gets(line) != NULL) {

		/* the "+2" is for the blank and trailing null char */

		if (strlen(cmdbuf)+strlen(line)+2 > BUFSIZ) {
			docmd(cmdbuf);
			(void) strcpy(cmdbuf, command);
		}
		(void) strcat(cmdbuf, " ");
		(void) strcat(cmdbuf, line);
	}

	/* see if there is any left to do */

	if (strlen(cmdbuf) != strlen(command)) {
		docmd(cmdbuf);
	}
}

docmd(cmdbuf)
char *cmdbuf;
{
	return system(cmdbuf);
}

--
Bruce G. Barnett	<barnett@crd.ge.com>   uunet!crdgw1!barnett

andrew@alice.UUCP (Andrew Hume) (09/09/89)

please, no more!!
people wanting to write xargs (or programs sort of like xargs)
should form their own newsgroup. every implementation seen so far,
and there have been a swag of them, has been wrong. in the sense
of being incorrect. if you really feel like you REALLY have to
write an xargs, presumably as a rite of passage, test it before
posting it (to some other group). a minimal test would include
the horror directory that chesson or bourne used to keep that had
all the legal single character filenames.
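
A rough stand-in for such a test directory can be built in a few lines. This is a sketch under several assumptions, not a reconstruction of the original directory: it covers only the 7-bit ASCII names, it expects a case-sensitive filesystem that accepts control characters in names, and it relies on mktemp, find -print0 and xargs -0, which are widely available (GNU and BSD) but were not part of the xargs being discussed.

```shell
#!/bin/sh
# Build a scratch directory holding one file for every single-byte ASCII
# name (bytes 1-127 except "/" and ".").
LC_ALL=C; export LC_ALL
horror=$(mktemp -d) || exit 1
made=0
i=1
while [ "$i" -le 127 ]; do
    if [ "$i" -ne 47 ] && [ "$i" -ne 46 ]; then
        # The trailing x keeps $(...) from stripping a newline name.
        name=$(printf '%bx' "\\0$(printf '%03o' "$i")")
        name=${name%x}
        : > "$horror/$name"
        made=$((made + 1))
    fi
    i=$((i + 1))
done

# NUL-delimited hand-off: every name should arrive as exactly one argument.
seen=$(find "$horror" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)
echo "created $made files; xargs -0 passed on $seen arguments"
rm -rf "$horror"
```

Feeding the same directory to a candidate implementation through a plain newline-separated find -print shows quickly how it copes with names made of a space, a quote, a backslash or a newline.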

leo@philmds.UUCP (Leo de Wit) (09/11/89)

In article <2205@crdgw1.crd.ge.com> barnett@crdgw1.crd.ge.com (Bruce Barnett) writes:
|In article <10978@smoke.BRL.MIL>, gwyn@smoke (Doug Gwyn) writes:
|
|>Why not simply write a genuine xargs implementation, say in C where you
|>can do it right without a lot of hassle.
|
|Hear! hear!
|
|Who really cares if you can *ALMOST* do it in 35 lines when you can do
|it *RIGHT* in 70?
|
|Forgive this posting of the sources in this newsgroup, but maybe it will
|reduce the noise level. Here is the version of xargs.c from comp.sources.whatever

    [next wrong xargs implementation deleted]

This program doesn't handle spaces in arguments correctly. Moreover,
since the system() function is used to fire up the command, an
(incorrect) re-interpretation of the arguments is done (think of
metacharacters like ',\,",$,` etc). This can be dealt with much better
by using argv[] directly (and execve(), left as an exercise for the
reader).  Another bad feature is the use of gets(); I'd like to feed a line
longer than BUFSIZ to your xarrrrrgs. Lastly, your xargs performs
poorly:  the system-imposed limit on argument lists is
typically much higher than BUFSIZ.
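
The re-interpretation problem can be shown from the shell itself. In the sketch below, sh -c stands in for what system() does and a quoted expansion stands in for passing argv[] straight to exec; the filename and the variable names are invented for the illustration.

```shell
#!/bin/sh
tricky='$HOME'    # a legal filename: a dollar sign followed by HOME

# Roughly what system() does: paste the name into a string and hand the
# string to a shell, which then expands $HOME a second time.
via_shell=$(sh -c "printf '[%s]' $tricky")

# Roughly what exec with an argv[] does: the name is passed on untouched.
via_argv=$(printf '[%s]' "$tricky")

echo "through sh -c: $via_shell"
echo "direct argv:   $via_argv"
```

Only the second form hands the command the name as it was read; the first hands it whatever the extra shell made of it.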

   Leo.

barnett@crdgw1.crd.ge.com (Bruce Barnett) (09/12/89)

In article <2205@crdgw1.crd.ge.com> I posted the
sources to xargs.c

In article <1086@philmds.UUCP>, leo@philmds (Leo de Wit) writes:
>This program doesn't handle spaces in arguments correctly. Moreover,
>since the system() function is used to fire up the command, an
>(incorrect) re-interpretation of the arguments is done (think of
>metacharacters like ',\,",$,` etc). This can be dealt with much better
>by using argv[] directly (and execve(), left as an exercise for the
>reader).  Another bad feature is the use of gets(); I'd like to feed a line
>longer than BUFSIZ to your xarrrrrgs. Lastly, your xargs performs
>poorly:  the system-imposed limit on argument lists is
>typically much higher than BUFSIZ.

I forgot to include the source of the xargs I posted.

It was mod.sources, V3, n106. Written by Gordon A. Moffett. Posted Feb, 1986.

I might as well include Gordon's disclaimer:

"Here is a reimplementation of the System V utility xargs.  I haven't
heard any complaints about it, though [1] There is room for improvement
regarding the command buffer size (tho' it is better than the System V
area in that particular regard) [2] It does not have all the features
of the System V version (as the man page points out)."


I suggest if someone finds this program offensive, they should fix
it up and send a new version to comp.sources.unix. I can assure you
I will not be offended.


rbj@dsys.ncsl.nist.gov (Root Boy Jim) (09/19/89)

? From: Bruce Barnett <barnett@crdgw1.crd.ge.com>

? Who really cares if you can *ALMOST* do it in 35 lines when you can do
? it *RIGHT* in 70?

Any program that uses `gets()' isn't done right.

? Bruce G. Barnett	<barnett@crd.ge.com>   uunet!crdgw1!barnett

	Root Boy Jim and
	the GNU Bohemians