[net.unix-wizards] comm '! sort file1' '! sort file2'

ado@elsie.UUCP (Arthur David Olson) (09/22/86)

I regularly want to run "comm" on a pair of unsorted files.  So I ended up
writing a shell script to do the job, the guts of which (simplified for the
purposes of this article) is:

	tmp=/tmp/\#scomm.$$
	sort "$1" > $tmp
	sort "$2" | comm $tmp -
	rm $tmp

All fine and dandy.  But then I got to thinking. . .that what I'd *really*
like to be able to do is use a command like

	comm '! sort file1' '! sort file2'

(where the space after the '!' should be your clue that I use csh) and have
comm do the dirty work.

And then I got to thinking. . .that if fopen turned
	fopen("!whatever", "r")
calls into
	popen("whatever", "r")
calls (with other changes as necessary), then what I'd like to see happen
with the "comm" command would happen generally.
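
A minimal sketch of the same effect obtained in a wrapper instead of in fopen, assuming one is willing to type an extra word in front of the command. The name `bangwrap`, its variables, and the temporary-file layout are hypothetical; nothing like it is proposed in this article.

```shell
# bangwrap CMD ARG...: hypothetical wrapper. Every ARG that begins with "!"
# is run as a shell command with its output saved in a temporary file, and
# the file name replaces the ARG before CMD itself is run.
bangwrap() {
	bw_cmd=$1
	shift
	bw_dir=$(mktemp -d "${TMPDIR:-/tmp}/bangwrap.XXXXXX") || return 1
	bw_n=0
	# Rebuild the argument list in place: append the new form of each
	# argument, then shift the old form off the front.
	for bw_arg in "$@"; do
		case $bw_arg in
		'!'*)
			bw_n=$((bw_n + 1))
			sh -c "${bw_arg#?}" > "$bw_dir/$bw_n"   # "#?" drops the leading "!"
			set -- "$@" "$bw_dir/$bw_n"
			;;
		*)
			set -- "$@" "$bw_arg"
			;;
		esac
		shift
	done
	"$bw_cmd" "$@"
	bw_status=$?
	rm -rf "$bw_dir"
	return $bw_status
}
```

With this in place, `bangwrap comm '! sort file1' '! sort file2'` behaves like the desired command, at the cost of temporary files rather than pipes.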

Comments?
--
UNIX is a registered trademark of AT&T.
--
	UUCP: ..decvax!seismo!elsie!ado   ARPA: elsie!ado@seismo.ARPA
	DEC, VAX, Elsie & Ado are Digital, Borden & Ampex trademarks.

aburt@isis.UUCP (Andrew Burt) (09/24/86)

Rather than play with the functionality of fopen to allow for "!cmd..."
as an argument and popen it, as suggested, how about the traditional
shell script approach:

	% cat .../bin/cmd
	eval "$@" "> /tmp/cmd$$"
	echo /tmp/cmd$$
	% comm `cmd sort file1...` `cmd sort file2...`

The main flaw is that it can't know when you're done to remove the
tmp files.
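
One possible way around that flaw, sketched under the assumption that the whole command line is handed to a wrapper which owns the temporary directory. The names `with_cmd_tmp` and `CMD_TMPDIR` are hypothetical, and `mktemp` is assumed to be available.

```shell
# cmd COMMAND...: variant of the script above. The output file is created
# inside the directory named by CMD_TMPDIR instead of directly in /tmp.
cmd() {
	cmd_out=$(mktemp "${CMD_TMPDIR:?set CMD_TMPDIR first}/cmd.XXXXXX") || return 1
	"$@" > "$cmd_out"
	echo "$cmd_out"
}

# with_cmd_tmp 'COMMAND LINE': hypothetical helper. Creates the directory,
# evaluates the command line (which may contain $(cmd ...) substitutions),
# and removes every temporary file once the command has finished.
with_cmd_tmp() {
	CMD_TMPDIR=$(mktemp -d "${TMPDIR:-/tmp}/cmd.XXXXXX") || return 1
	export CMD_TMPDIR
	eval "$@"
	wct_status=$?
	rm -rf "$CMD_TMPDIR"
	unset CMD_TMPDIR
	return $wct_status
}
```

The command line has to be quoted so that the substitutions run inside the wrapper: `with_cmd_tmp 'comm $(cmd sort file1) $(cmd sort file2)'`.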

On the original note, though, of modifying fopen: boo! hiss!  Some
programs do direct open(2)'s, meaning not every command could be expected
to handle the !cmd syntax.  Shades of MS-DOS and the wildcards that
not every program can handle...

And restricting the meaning of ! as a first character in a filename
at the kernel level doesn't seem true to the Unix spirit (besides which
the kernel shouldn't have to decide if you want to "csh -c" it or "sh -c"
it).

Perhaps a whimsical solution would be a new file type: when opened via
open(2) the kernel runs the command the file has as its contents.
A few syscalls to manipulate it as text (to get the commands into and
out of the file) and we're set.  Call 'em "run" files.

The 'cmd' program above would then set the command given as $* into
such a "run" file then return the name as it does.  (A "remove on close"
flag set by the special "run" file syscalls would also alleviate the
problem of the file persisting after the command finishes.  Even without
this flag, the "run" file would probably be much smaller than the tmp
files left by my 'cmd' program above.)
-- 

Andrew Burt
isis!aburt   or   aburt@udenver.csnet

ggs@ulysses.UUCP (Griff Smith) (09/24/86)

> I regularly want to run "comm" on a pair of unsorted files.  So I ended up
> writing a shell script to do the job, the guts of which (simplified for the
> purposes of this article) is:
> 
> 	tmp=/tmp/\#scomm.$$
> 	sort "$1" > $tmp
> 	sort "$2" | comm $tmp -
> 	rm $tmp
> 
> All fine and dandy.  But then I got to thinking. . .that what I'd *really*
> like to be able to do is use a command like
> 
> 	comm '! sort file1' '! sort file2'
> 
> (where the space after the '!' should be your clue that I use csh) and have
> comm do the dirty work.
> 
> And then I got to thinking. . .that if fopen turned
> 	fopen("!whatever", "r")
> calls into
> 	popen("whatever", "r")
> calls (with other changes as necessary), then what I'd like to see happen
> with the "comm" command would happen generally.
> 
> Comments?
> --
> UNIX is a registered trademark of AT&T.
> --
> 	UUCP: ..decvax!seismo!elsie!ado   ARPA: elsie!ado@seismo.ARPA
> 	DEC, VAX, Elsie & Ado are Digital, Borden & Ampex trademarks.

And then I get to re-build all the commands with the new library.
This is exactly what the shell was designed to avoid!  Such features
belong in the shell!  The Korn shell supports the syntax

	comm <(sort file1) <(sort file2)

The expression "<(sort file1)" turns into /dev/fd/n, where "n" is
the file descriptor number for the pipe.  True, this only works
on V8 and on systems that have had the /dev/std{in,out,err} feature
added to the kernel, but it is a much cleaner solution.  Then
I get to worry about

	diff <(sort file1) <(sort file2)

which fails because diff can't seek on the pipe.
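
The same <( ) syntax is also understood by bash and zsh. A small sketch of wrapping it in a function follows; the function names are hypothetical, and `bash -c` is used only so that the construct is parsed by a shell known to support it even when the caller is a plain sh.

```shell
# scomm FILE1 FILE2: run comm on two unsorted files.
# Inside bash -c, "scomm" becomes $0 and the file names become $1 and $2.
scomm() {
	bash -c 'comm <(sort "$1") <(sort "$2")' scomm "$1" "$2"
}

# show_fd_path: print what the shell substitutes for a <( ) expression.
# On systems with /dev/fd this is typically a /dev/fd/N name.
show_fd_path() {
	bash -c 'echo <(true)'
}
```

Both inputs arrive through pipes here, so the sketch only suits commands that read their input sequentially, as comm does.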
-- 

Griff Smith	AT&T (Bell Laboratories), Murray Hill
Phone:		(201) 582-7736
UUCP:		{allegra|ihnp4}!ulysses!ggs
Internet:	ggs@ulysses.uucp

jason@hpcnoe.UUCP (Jason Zions) (09/25/86)

> / ado@elsie.UUCP (Arthur David Olson) /  8:20 am  Sep 22, 1986 /
> I regularly want to run "comm" on a pair of unsorted files.
> 	[ ... ]
> All fine and dandy.  But then I got to thinking. . .that what I'd *really*
> like to be able to do is use a command like
> 
> 	comm '! sort file1' '! sort file2'
> 
> (where the space after the '!' should be your clue that I use csh) and have
> comm do the dirty work.
> 
> And then I got to thinking. . .that if fopen turned
> 	fopen("!whatever", "r")
> calls into
> 	popen("whatever", "r")
> calls (with other changes as necessary), then what I'd like to see happen
> with the "comm" command would happen generally.

You can do this more easily. Create a program, bang.c, the name of whose
executable is  !  . All this sucker does is a mktemp, then executes the
rest of the command line as a command, redirecting stdout into the temporary
file. !'s output is just the name of the temporary file.

Admittedly, it's a bit dirty; the file isn't deleted, so /tmp could fill up
quick. Perhaps ! could fork; the child sleeps for 5 minutes, then unlinks
the temporary file. This depends on comm (or whatever) opening the temporary
file soon enough. You could make the 5 minutes into 1 hour and be pretty sure.
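
A rough sketch of that idea as a shell function rather than the C program described. The name `at_cmd` and the `AT_CMD_DELAY` variable are hypothetical, and `mktemp` here is the command-line utility, assumed to be installed.

```shell
# at_cmd COMMAND...: hypothetical stand-in for the "@" program.
# Runs COMMAND with stdout sent to a temporary file, prints the file name,
# and leaves a background job behind to delete the file later.
# AT_CMD_DELAY is the delay in seconds; the default is the 5 minutes above.
at_cmd() {
	at_file=$(mktemp "${TMPDIR:-/tmp}/atcmd.XXXXXX") || return 1
	"$@" > "$at_file"
	# The reaper's output goes to /dev/null so that a `at_cmd ...`
	# substitution returns at once instead of waiting for the sleep.
	( sleep "${AT_CMD_DELAY:-300}"; rm -f "$at_file" ) > /dev/null 2>&1 &
	echo "$at_file"
}
```

Usage would then look like `comm $(at_cmd sort file1) $(at_cmd sort file2)`, with the same caveat as above: the consumer has to open the files before the delay runs out.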

I would select some character other than ! as a command name; perhaps @. Not
sacred to any shell that I know of.

How's that?
--
This is not an official statement of Hewlett-Packard Corp., and does not 
necessarily reflect the views of HP. It is provided completely without warranty
of any kind. Lawyers take 3d10 damage and roll a saving throw vs. ego attack.

Jason Zions				Hewlett-Packard
Colorado Networks Division		3404 E. Harmony Road
Mail Stop 102				Ft. Collins, CO  80525
	{ihnp4,seismo,hplabs,gatech}!hpfcdc!hpcnoe!jason

chris@pixutl.UUCP (chris) (10/02/86)

In article <830003@hpcnoe.UUCP>, jason@hpcnoe.UUCP (Jason Zions) writes:
> > / ado@elsie.UUCP (Arthur David Olson) /  8:20 am  Sep 22, 1986 /
> > I regularly want to run "comm" on a pair of unsorted files.
> > 	[ ... ]
> > All fine and dandy.  But then I got to thinking. . .that what I'd *really*
> > like to be able to do is use a command like
> > 
> > 	comm '! sort file1' '! sort file2'
> > 

It can be done easily on SYSV, using fifo's. I keep one around in my home
directory for this kind of thing.  This is what you do, after creating
the fifo (mknod ~/fifo p).

sort file1 > ~/fifo & sort file2 | comm - ~/fifo

				Chris

-- 

 Chris Bertin       :  (603) 881-8791 x218
 xePIX Inc.         :
 51 Lake St         :  {allegra|ihnp4|cbosgd|ima|genrad|amd|harvard}\
 Nashua, NH 03060   :     !wjh12!pixel!pixutl!chris

chris@umcp-cs.UUCP (Chris Torek) (10/07/86)

>/ ado@elsie.UUCP (Arthur David Olson) /  8:20 am  Sep 22, 1986 /
>>I regularly want to run "comm" on a pair of unsorted files.
>> 	[ ... ]
>>All fine and dandy.  But then I got to thinking. . .that what I'd *really*
>>like to be able to do is use a command like
>> 
>> 	comm '! sort file1' '! sort file2'
>> 

In article <35@pixutl.UUCP> chris@pixutl.UUCP (chris) writes:
>It can be done easily on SYSV, using fifo's.

Unfortunately, one must be root to create a fifo, no?

It should be (but is not presently) possible to do this in 4.3BSD
using AF_UNIX sockets.  The socket `file' that is created should
be usable by any ordinary process.  An open() could be translated
into a socket(),connect() pair, and the process would then be
talking to the creator of the socket.

Perhaps in 4.4...
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1516)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu

ekrell@ulysses.UUCP (Eduardo Krell) (10/08/86)

I just implemented this feature on SVR3 by adding "seekable pipes". What I really
did was to port the /dev/fd hack from V8 and I also added a new file control
command to "block on eof".
  When a process opens a file for reading with this flag, an attempt to read() past
the end of file will block provided some other process is writing to that file.
When data is written to the file, the reader will awake and continue as normal.
Very similar to pipes.
  When you type (in ksh) "diff <(sort file1) <(sort file2)", the following happens:

1) A file is created in /tmp. It is opened once for reading and once for writing.
An fcntl() is done to add the "block on eof" attribute. It is then unlink()ed.

2) A pipe is created to read from "sort file1". Say the reader file descriptor is 3.
The "<(sort file1)" argument is replaced by "/dev/fd/3". The same thing happens to the
second argument.

3) When diff opens /dev/fd/3, it will actually be reading from the pipe. seek()s
are ok since it is really reading from a file in /tmp.

4) Since the file was unlink()ed, it will go away after both sort and diff finish.
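
The unlink-while-open part of the steps above can be approximated in an ordinary shell, without any kernel change. This is only a sketch under that assumption: there is no "block on eof" flag, so each sort must finish before diff starts, and the function name `fddiff` is hypothetical. It relies on /dev/fd being present.

```shell
# fddiff FILE1 FILE2: diff the sorted contents of two files.
# Each sort writes to a temporary file that is kept open on a descriptor
# and then unlinked; diff is handed the /dev/fd/N names.
fddiff() {
	fd_a=$(mktemp "${TMPDIR:-/tmp}/fddiff.XXXXXX") || return 1
	fd_b=$(mktemp "${TMPDIR:-/tmp}/fddiff.XXXXXX") || { rm -f "$fd_a"; return 1; }
	sort "$1" > "$fd_a"
	sort "$2" > "$fd_b"
	(
		# Descriptors 3 and 4 exist only inside this subshell.
		exec 3< "$fd_a" 4< "$fd_b"
		# The names are removed now; the data stays until 3 and 4 are closed.
		rm -f "$fd_a" "$fd_b"
		diff /dev/fd/3 /dev/fd/4
	)
}
```

Because the descriptors refer to regular files, diff can seek on them, and nothing is left in /tmp once the subshell exits.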

-- 
    
    Eduardo Krell                   AT&T Bell Laboratories, Murray Hill

    {ihnp4,seismo,ucbvax}!ulysses!ekrell

chris@pixutl.UUCP (chris) (10/08/86)

In article <3724@umcp-cs.UUCP>, chris@umcp-cs.UUCP writes:
> In article <35@pixutl.UUCP> chris@pixutl.UUCP (chris) writes:
> >It can be done easily on SYSV, using fifo's.
> 
> Unfortunately, one must be root to create a fifo, no?
> 
> In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1516)
> UUCP:	seismo!umcp-cs!chris
> CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu

No, you don't have to be root to create a fifo. SYSV is not THAT stupid!

Chris
-- 

 Chris Bertin       :  (603) 881-8791 x218
 xePIX Inc.         :
 51 Lake St         :  {allegra|ihnp4|cbosgd|ima|genrad|amd|harvard}\
 Nashua, NH 03060   :     !wjh12!pixel!pixutl!chris

geoff@desint.UUCP (Geoff Kuenning) (10/09/86)

In article <3724@umcp-cs.UUCP> chris@umcp-cs.UUCP (Chris Torek) writes:

> Unfortunately, one must be root to create a fifo, no?

Wrong.  I just did "mknod fifo p" from a normal account, and it worked
like a charm.
-- 

	Geoff Kuenning
	{hplabs,ihnp4}!trwrb!desint!geoff

bruce@stride.UUCP (Bruce Robertson) (10/09/86)

In article <3724@umcp-cs.UUCP> chris@umcp-cs.UUCP (Chris Torek) writes:
>Unfortunately, one must be root to create a fifo, no?

No.  You don't have to be root to create fifo's; the mknod() system
call works for everyone in the special case of a fifo.
-- 

	Bruce Robertson
	UUCP: cbosgd!utah-cs!utah-gr!stride!bruce
	ARPA: stride!bruce@utah-gr.arpa

ado@elsie.UUCP (Arthur David Olson) (10/10/86)

On the floor:  allowing something such as
	comm '! sort file1' '! sort file2'
to set up multiple pipes into a command.

In article <35@pixutl.UUCP>, chris@pixutl.UUCP (chris) writes:
> It can be done easily on SYSV, using fifo's. . .This is what you do,
> after creating the fifo (mknod ~/fifo p).
>
> sort file1 > ~/fifo & sort file2 | comm - ~/fifo

In article <3724@umcp-cs.UUCP> chris@umcp-cs.UUCP (Chris Torek) writes:
> Unfortunately, one must be root to create a fifo, no?

In article <264@desint.UUCP>, geoff@desint.UUCP (Geoff Kuenning) writes:
> . . .I just did "mknod fifo p" from a normal account, and it worked. . .

And in this article I write:
Even if a normal user can create a fifo, it may not be the wisest thing to do.
Think of the havoc that results if I put a
	sort "$1" > ~/fifo & sort "$2" | comm - ~/fifo
command into a shell script (named, for example, "scomm") and then,
six months from now when I've forgotten the implementation details,
type in
	scomm firstfile secondfile > results &
	scomm thirdfile fourthfile > moreresults &

The "/dev/fd/n" mechanism discussed in other recent postings doesn't suffer
from this defect.

guy@sun.uucp (Guy Harris) (10/11/86)

> Think of the havoc that results if I put a
> 	sort "$1" > ~/fifo & sort "$2" | comm - ~/fifo
> command into a shell script (named, for example, "scomm") and then,
> six months from now when I've forgotten the implementation details,
> type in
> 	scomm firstfile secondfile > results &
> 	scomm thirdfile fourthfile > moreresults &

At which point you say "Since the FIFO is a temporary file, I shouldn't just
save a FIFO in my home directory for a rainy day and use that; I should
create it at the beginning of the script, using "$$" in the name to make the
name unique, and delete it at the end!" and change the script to read like:

	fifoname=/tmp/scomm.$$
	trap "rm -f $fifoname; exit 1" 1 2 15
	/etc/mknod $fifoname p
	sort "$1" > $fifoname & sort "$2" | comm - $fifoname
	rm -f $fifoname
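
A sketch of the same script as a shell function, assuming the POSIX mkfifo utility and a mktemp that supports -d are available. The function name `scomm_fifo` is hypothetical. The two sorts are swapped relative to the script above so that column 1 of comm's output refers to the first file.

```shell
# scomm_fifo FILE1 FILE2: comm on two unsorted files through a private fifo.
# The body is a subshell, so the EXIT trap fires when the function finishes
# and the traps do not leak into the calling shell.
scomm_fifo() (
	dir=$(mktemp -d "${TMPDIR:-/tmp}/scomm.XXXXXX") || exit 1
	trap 'rm -rf "$dir"' EXIT
	trap 'exit 1' HUP INT TERM
	mkfifo "$dir/fifo" || exit 1
	# FILE2 goes through the fifo in the background; FILE1 arrives on stdin.
	sort "$2" > "$dir/fifo" &
	sort "$1" | comm - "$dir/fifo"
)
```

Since every invocation gets its own directory, two copies running at once do not share a fifo.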
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)

chris@pixutl.UUCP (chris) (10/14/86)

In article <7028@elsie.UUCP>, ado@elsie.UUCP (Arthur David Olson) writes:
>
> Even if a normal user can create a fifo, it may not be the wisest thing to do.
> Think of the havoc that results if I put a
> 	sort "$1" > ~/fifo & sort "$2" | comm - ~/fifo
> command into a shell script (named, for example, "scomm") and then,
> six months from now when I've forgotten the implementation details,
> type in
> 	scomm firstfile secondfile > results &
> 	scomm thirdfile fourthfile > moreresults &
> 

Think of the havoc that would be created if 6 months from now you forgot
the implementation details of 'rm -r'... You can create temporary fifo's
if you want to fix that problem.
	
	[ ${#} -ne 2 ] && { echo "Usage ${0} file1 file2"; exit 1; }
	trap "rm -f /tmp/fifo$$" 0 1 2 3
	/etc/mknod /tmp/fifo$$ p || { echo "can't create fifo" ; exit 1; }
 	sort "${1}" > /tmp/fifo$$ & sort "${2}" | comm - /tmp/fifo$$

Chris
-- 

 Chris Bertin       :  (603) 881-8791 x218
 xePIX Inc.         :
 51 Lake St         :  {allegra|ihnp4|cbosgd|ima|genrad|amd|harvard}\
 Nashua, NH 03060   :     !wjh12!pixel!pixutl!chris

rossc@metro.oz (Ross Cartlidge) (10/23/86)

>ado@elsie.UUCP (Arthur David Olson) /  8:20 am  Sep 22, 1986 /
>...
>All fine and dandy.  But then I got to thinking. . .that what I'd *really*
>like to be able to do is use a command like
> 	comm '! sort file1' '! sort file2'

Using the "2dpipe" set of commands I sent to net.sources
the above command can be implemented by

	comm `1 sort file1` `1 sort file2`