[news.software.b] C News sort -u | batcher ?

ronald@robobar.co.uk (Ronald S H Khoo) (09/26/90)

A while back, Henry suggested that the input to batcher might
be put through sort -u to remove duplicates caused by certain
types of sys file constructions involving multiple entries
(which I had asked about after stl!dww had pointed out certain
 advantageous tricks you could do with them).
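For anyone who hasn't tried it, here is a minimal sketch of what `sort -u` does to a togo list before batcher sees it. It assumes (not taken from the C News docs) one article pathname per line, with duplicates being exact repeats of the same line. The pathnames are hypothetical. Python's default string ordering approximates `sort` in the C locale for plain ASCII names.

```python
# Sketch: emulate `sort -u` on a batcher "togo" list.
# Assumption (not from the C News docs): one article pathname per line,
# and a duplicate is an exact repeat of an earlier line.


def sort_unique(lines):
    """Return the distinct lines in sorted order, roughly as
    `LC_ALL=C sort -u` would for plain ASCII pathnames."""
    return sorted(set(line.rstrip("\n") for line in lines))


if __name__ == "__main__":
    # Hypothetical togo contents: two sys entries fed the same site,
    # so the same comp.unix article was queued twice.
    togo = [
        "comp/unix/1234\n",
        "news/software/b/567\n",
        "comp/unix/1234\n",
        "comp/lang/c/89\n",
    ]
    for name in sort_unique(togo):
        print(name)
```

The duplicate disappears, but note that the surviving lines come out in pathname order rather than the order in which they were queued.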

I forgot about all this until I (finally :-) got round to reading the
C News paper this afternoon and realised, of course, that batcher does
actually gain from having sorted input.  You might not recover all of
the cost of doing the sort, but you certainly get some of it back.

The /bin/time figures are too coarse, and the disc subsystem on a
PC is rubbish, but for a few of today's 400k batches batcher costs
something like 2.5 to 3 seconds system time, virtually zero user time
and 25 seconds elapsed (you see what I mean about PC discs? :-() The sort
costs about 0.3 seconds user+system, and the saving in batcher seems to be
not far from that.

[ In looking at this, I noticed that batcher won't read stdin, which
 is a shame, though the extra cost of 'rm $tempfile' doesn't seem to
 be particularly significant. ]
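Since batcher wants a filename rather than stdin, the sorted list has to go through a temporary file. A minimal sketch of that write/run/remove pattern follows, assuming only that the program takes the list file as its last argument. The function name `run_on_tempfile` is hypothetical, and the demo uses a tiny stand-in program instead of the real batcher, whose exact invocation at your site is not assumed here.

```python
import os
import subprocess
import sys
import tempfile


def run_on_tempfile(cmd, lines):
    """Write `lines` to a temporary file, run `cmd` with that file's
    name appended as its last argument, then remove the file.

    Mirrors the `sort -u togo >$tempfile; batcher $tempfile;
    rm $tempfile` pattern needed when a program won't read stdin.
    """
    fd, path = tempfile.mkstemp(prefix="togo.")
    try:
        with os.fdopen(fd, "w") as f:
            # Ensure every entry ends with a newline, one per line.
            f.writelines(l if l.endswith("\n") else l + "\n" for l in lines)
        # The program reads the named file; its stdout is captured here.
        result = subprocess.run(cmd + [path], check=True,
                                capture_output=True, text=True)
        return result.stdout
    finally:
        os.unlink(path)  # the 'rm $tempfile' step


if __name__ == "__main__":
    # Stand-in for batcher: a tiny program that counts lines in the named file.
    counter = [sys.executable, "-c",
               "import sys; print(len(open(sys.argv[1]).readlines()))"]
    print(run_on_tempfile(counter, ["comp/unix/1234", "comp/lang/c/89"]), end="")
```

The `finally` clause removes the file even if the command fails, so an aborted run doesn't leave stale list files lying about.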

Unfortunately, these back of the envelope "measurements" on a few of
today's togo.* are rather buried in the noise level, but someone who
might have held back from putting in the sort for cost reasons might
now be less unwilling.

I thoroughly recommend the C News paper to any C News admin.  It was an
excellent read.  Perhaps there should be a louder pointer to it in the C
News distribution? I couldn't find one there at all!
-- 
   ronald@robobar.co.uk | +44 81 991 1142 (O) | +44 71 229 7741 (H) | YELL!
   "Nothing sucks like a VAX"   --   confirmed after recent radiator burst!

henry@zoo.toronto.edu (Henry Spencer) (09/27/90)

In article <1990Sep25.233128.10037@robobar.co.uk> ronald@robobar.co.uk (Ronald S H Khoo) writes:
>I forgot about all this until I (finally :-) got round to reading the
>C News paper this afternoon and realised, of course, that batcher does
>actually gain from having sorted input.  You might not recover all of
>the cost of doing the sort, but you certainly get some of it back.

There are, unfortunately, some other costs which are harder to quantify.
In particular, sorting can result in out-of-order delivery of articles,
when cross-postings are involved.  Granted, out-of-order delivery exists
already, but I'm unhappy about the thought of knowingly making it worse.
This is why the standard C News batcher doesn't sort.
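To make the ordering point concrete, here is a small hypothetical illustration: an original article in news.software.b, then a followup cross-posted to comp.unix and news.software.b and (by assumption) filed under its first group. Sorting by pathname puts the followup first. The `dedupe_keep_order` function is not part of C News; it is a swapped-in, order-preserving alternative that drops duplicates while keeping queue order, at the cost of giving up whatever batcher gains from sorted input.

```python
def dedupe_keep_order(lines):
    """Drop repeated lines, keeping each line's first occurrence so the
    surviving entries stay in the order they were queued."""
    seen = set()
    out = []
    for line in lines:
        name = line.rstrip("\n")
        if name not in seen:
            seen.add(name)
            out.append(name)
    return out


if __name__ == "__main__":
    # Hypothetical togo list, in the order the articles were queued.
    togo = [
        "news/software/b/567",   # original article
        "comp/unix/1234",        # cross-posted followup, filed under comp.unix
        "news/software/b/567",   # duplicate from a second sys entry
    ]
    print(sorted(set(togo)))         # like sort -u: followup now goes first
    print(dedupe_keep_order(togo))   # original stays ahead of the followup
```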

>I thoroughly recommend the C News paper to any C News admin.  It was an
>excellent read.  Perhaps there should be a louder pointer to it in the C
>News distribution ? I couldn't find one there at all!

A good point, although we'd have to include some details on obtaining it,
since most libraries don't carry the Usenix proceedings.
-- 
TCP/IP: handling tomorrow's loads today| Henry Spencer at U of Toronto Zoology
OSI: handling yesterday's loads someday|  henry@zoo.toronto.edu   utzoo!henry

dylan@ibmpcug.co.uk (Matthew Farwell) (09/28/90)

In article <1990Sep26.193235.10920@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>In article <1990Sep25.233128.10037@robobar.co.uk> ronald@robobar.co.uk (Ronald S H Khoo) writes:
>>I thoroughly recommend the C News paper to any C News admin.  It was an
>>excellent read.  Perhaps there should be a louder pointer to it in the C
>>News distribution ? I couldn't find one there at all!
>
>A good point, although we'd have to include some details on obtaining it,
>since most libraries don't carry the Usenix proceedings.

Why not just include it in the distribution? It's only 35k.

Dylan.
-- 
Matthew J Farwell                 | Email: dylan@ibmpcug.co.uk
The IBM PC User Group, PO Box 360,|        ...!uunet!ukc!ibmpcug!dylan
Harrow HA1 4LQ England            | CONNECT - Usenet Access in the UK!!
Phone: +44 81-863-1191            | Sun? Don't they make coffee machines?

henry@zoo.toronto.edu (Henry Spencer) (09/30/90)

In article <1990Sep28.165901.22657@ibmpcug.co.uk> dylan@ibmpcug.CO.UK (Matthew Farwell) writes:
>>>I thoroughly recommend the C News paper to any C News admin. ...
>>
>Why not just include it in the distribution? It's only 35k.

There are lots of things that it would sort of be nice to include in the
distribution!  We've been balking for the sake of keeping the size down.

I agree that it's an interesting read :-), but it doesn't really tell
you anything that's important to getting the software working, I think.
-- 
Imagine life with OS/360 the standard  | Henry Spencer at U of Toronto Zoology
operating system.  Now think about X.  |  henry@zoo.toronto.edu   utzoo!henry