[news.software.b] CNews building up files in "in.coming"

mjb@acd4.UUCP (Mike Bryan) (10/23/89)

I've noticed that when C News is unbatching articles in the in.coming
directory, it doesn't remove the batch files until it is completely
done.  It seems to me that newsrun should remove each file in turn
immediately after unbatching the articles it contains.  In the current
implementation, you have to have room to store each new article twice
(actually about 1.5 times, since the batched file is compressed).  I'm
currently running the 23-Jul-89 patch.  (I haven't made time to upgrade
to the latest patch yet; my company expects me to do real work!!!)

Apparently newsrun will at least remove these files in the middle of
unbatching if it sees the disk space creep too low.  Since this
lessens the problem of running out of space due to keeping the batches
around, the whole question is probably moot.  I'm curious, though: why
not just remove each file when you are done with it?

Normally, this wouldn't cause me any problems.  However,
/usr/spool/news is not its own segment on our system; it shares a
filesystem with a couple of other lightly-used directory trees.
Earlier this week our news feed got screwed up overnight, and I was
unbatching nearly a whole day's worth of news around 9AM.  If we had
been doing any other work on that segment, there would have been a
chance of running the system out of space before C News finished its
current file.  [We weren't, however.]  Granted, it's a slim chance,
but it would be lessened if C News removed the batch files one at a
time.

-- 
Mike Bryan, Applied Computing Devices, 100 N Campus Dr, Terre Haute IN 47802
Phone: 812/232-6051  FAX: 812/231-5280  Home: 812/232-0815
UUCP: uunet!acd4!mjb  INTERNET: mjb%acd4@uunet.uu.net
"Agony is born of desire; that's what you get for wanting." --- Moev

henry@utzoo.uucp (Henry Spencer) (10/23/89)

In article <1989Oct22.192159.4827@acd4.UUCP> mjb@acd4.UUCP (Mike Bryan) writes:
>I've noticed that when C News is unbatching articles in the in.coming
>directory, it doesn't remove the batch files until it is completely
>done.  It seems to me that newsrun should remove each file in turn
>immediately after unbatching the articles it contains...

If you are running a current C News, it will (a) remove the batch files
after every 50 files processed, or (b) remove the batch file after each
file processed, depending on whether space is tight or not.  This was
a deliberate change from our earlier version that always did (b).

>... I'm curious, though: why
>not just remove each file when you are done with it?

Because there is a noticeable performance loss in doing *anything*
unnecessary on each pass through the inner loop.  As it is, on systems
with high-performance shells (with the "test" command built in), when
space is plentiful the inner loop runs compress and relaynews, period.
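
In outline, the loop amounts to something like this (an illustrative
sketch only, not the real newsrun; "space_is_tight" is a stand-in for
the spacefor machinery):

    # Illustrative sketch of the newsrun inner loop -- not the real code.
    space_is_tight () {
            false   # placeholder; the real decision comes from spacefor
    }

    kept=""         # batches processed but not yet removed
    n=0
    for f in in.coming/*
    do
            compress -d <"$f" | relaynews   # unbatch one batch
            kept="$kept $f"
            n=`expr $n + 1`
            # strategy (b): space tight, clean up after every file;
            # strategy (a): space plentiful, clean up every 50 files
            if space_is_tight || test $n -ge 50
            then
                    rm -f $kept
                    kept=""
                    n=0
            fi
    done
    rm -f $kept     # remove whatever is left at the end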

>... If we had been doing any other work
>on that segment, there would have been a chance of running the
>system out of space before C News finished its current file.  [We
>weren't, however.]  Granted, it's a slim chance, but it would be
>lessened if C News removed the batch files one at a time.

This sort of thing is the reason why spacefor, as shipped, has safety
margins built in.  C News as shipped will fall back on strategy (b)
when space margin falls below 5000 blocks.  If you have programs that
suddenly eat multiple megabytes on your news filesystems, then you need
to either crank the margins up or modify the shell files to change the
strategy.  Setting the margins to zero is a bad idea no matter how
well-behaved your system is.
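
For concreteness, the decision spacefor makes boils down to something
like this (a paraphrase, not the shipped script; df output formats
vary from system to system):

    # Paraphrase of the idea behind spacefor, not the shipped script.
    # Usage: sketch <blocks-wanted>.  df output varies between systems;
    # adjust the awk field accordingly.
    wanted=${1-0}
    margin=5000     # safety margin, in blocks; don't set this to zero
    free=`df /usr/spool/news | awk 'NR == 2 { print $4 }'`
    room=`expr $free - $margin - $wanted`
    if test "$room" -gt 0
    then
            echo $room      # blocks to spare beyond the margin
    else
            echo 0          # margin reached: fall back to strategy (b)
    fi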
-- 
A bit of tolerance is worth a  |     Henry Spencer at U of Toronto Zoology
megabyte of flaming.           | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

bill@twwells.com (T. William Wells) (10/25/89)

In article <1989Oct23.023759.17067@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
: In article <1989Oct22.192159.4827@acd4.UUCP> mjb@acd4.UUCP (Mike Bryan) writes:
: >I've noticed that when C News is unbatching articles in the in.coming
: >directory, it doesn't remove the batch files until it is completely
: >done.  It seems to me that newsrun should remove each file in turn
: >immediately after unbatching the articles it contains...
:
: If you are running a current C News, it will (a) remove the batch files
: after every 50 files processed, or (b) remove the batch file after each
: file processed, depending on whether space is tight or not.  This was
: a deliberate change from our earlier version that always did (b).

This really isn't adequate, because it assumes relatively small
batch files.  On my system, I get batches of 250K (compressed) and
up, and having them accumulate can be a real bear, especially when
disk space gets tight.  (Why such big batches?  It makes sense if
you have a 9600+ bps modem with built-in error correction, like my
Telebit.)

Could you either make this based on the file sizes, or provide an
option or configuration parameter to make it delete after each
batch? Perhaps you could make that 50 a configuration parameter?
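
Something like this is what I have in mind (an untested sketch, not
real newsrun code; the 500K threshold is invented for illustration):

    # Hypothetical volume-based rule, not anything now in C News.
    limit=500000            # bytes of processed batches to let pile up
    bytes=0
    kept=""
    for f in in.coming/*
    do
            compress -d <"$f" | relaynews   # unbatch as usual
            size=`wc -c <"$f"`
            bytes=`expr $bytes + $size`
            kept="$kept $f"
            if test $bytes -ge $limit       # enough volume: clean up now
            then
                    rm -f $kept
                    kept=""
                    bytes=0
            fi
    done
    rm -f $kept             # remove whatever remains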

This behavior has messed me over more than a couple of times.

Now that I think about it, I have a feeling that there are
several assumptions in C News that are invalid when batch sizes
are typically very large. Perhaps you might want to think in terms
of configuration options for these kinds of systems.

---
Bill                    { uunet | novavax | ankh | sunvice } !twwells!bill
bill@twwells.com

henry@utzoo.uucp (Henry Spencer) (10/26/89)

In article <1989Oct25.023352.8840@twwells.com> bill@twwells.com (T. William Wells) writes:
>Could you either make this based on the file sizes, or provide an
>option or configuration parameter to make it delete after each
>batch? Perhaps you could make that 50 a configuration parameter?

Exactly what should the algorithm be for deciding how to make the choice?
The current code does it based on whether space is tight or not; this
seemed the only reasonable rule to me.  If there is lots of space, better
to optimize for time.  If space is tight, better pay more attention to
that.  How else?  I don't understand the circumstances in which the
current code is as ill-behaved as you imply; can you elaborate?

(If it's because the space margins in "spacefor" are set to zero or some
very small number, your warranty is void. :-))

>Now that I think about it, I have a feeling that there are
>several assumptions in C News that are invalid when batch sizes
>are typically very large. Perhaps you might want to think in terms
>of configuration options for these kinds of systems.

Compressed batches of 200-300K are common here; we haven't seen any
problem with them.
-- 
A bit of tolerance is worth a  |     Henry Spencer at U of Toronto Zoology
megabyte of flaming.           | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) (10/27/89)

>Now that I think about it, I have a feeling that there are
>several assumptions in C News that are invalid when batch sizes
>are typically very large. Perhaps you might want to think in terms

The horrible per-batch start-up overhead of C News makes large batches
the only way to go.

-- 
Branch Technology  <zeeff@b-tech.ann-arbor.mi.us>

chip@ateng.com (Chip Salzenberg) (10/27/89)

According to zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff):
>The horrible per-batch start-up overhead of C News makes large batches
>the only way to go.

Oh, piffle.

C News overhead isn't "horrible" no matter what you look at.  Except for
inews, maybe.  :-)  But inews isn't involved in unbatching.
-- 
You may redistribute this article only to those who may freely do likewise.
Chip Salzenberg at A T Engineering;  <chip@ateng.com> or <uunet!ateng!chip>
"'Why do we post to Usenet?'  Naturally, the answer is, 'To get a response.'"
                        -- Brad "Flame Me" Templeton

coolidge@brutus.cs.uiuc.edu (John Coolidge) (10/28/89)

chip@ateng.com (Chip Salzenberg) writes:
>According to zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff):
>>The horrible per-batch start-up overhead of C News makes large batches
>>the only way to go.

>C News overhead isn't "horrible" no matter what you look at.  Except for
>inews, maybe.  :-)  But inews isn't involved in unbatching.

IMHO both sides are right to some extent. Jon is right in claiming that
the per-batch start-up overhead is really pretty high. There's quite a
lot done in the standard code --- shell scripts are not cheap to start
with (lots of forks); locking and unlocking costs; checking free space
costs; running down the directory looking for batches isn't free.

On the other hand, Chip is right in claiming that C News is never "horrible"
if installed correctly. There are ways of doing things "better", but
you give something up for what you gain (mainly error recovery; see other
postings for specifics).

A good partial compromise is to rewrite newsrun in C, thereby removing
some of the really wasteful costs involved in running shell scripts
(albeit at the cost of portability --- trade-offs all over the place).

--John

--------------------------------------------------------------------------
John L. Coolidge     Internet:coolidge@cs.uiuc.edu   UUCP:uiucdcs!coolidge
Of course I don't speak for the U of I (or anyone else except myself)
Copyright 1989 John L. Coolidge. Copying allowed if (and only if) attributed.
You may redistribute this article if and only if your recipients may as well.

zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) (10/30/89)

>>The horrible per-batch start-up overhead of C News makes large batches
>>the only way to go.
>
>C News overhead isn't "horrible" no matter what you look at.  Except for
>inews, maybe.  :-)  But inews isn't involved in unbatching.

I posted test results a while back showing that C News was slower than
B News*.  Given the design goals of C News, this is horrible.  Please
post your test results that show otherwise.

* Sys V, 50K batches, measured for all processes (there were many)
from rnews to the article being in the spool directory.  There was
also much more disk I/O for C News.

-- 
Branch Technology  <zeeff@b-tech.ann-arbor.mi.us>

henry@utzoo.uucp (Henry Spencer) (10/30/89)

In article <9690@b-tech.ann-arbor.mi.us> zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) writes:
>>C News overhead isn't "horrible" no matter what you look at...
>
>I posted test results a while back showing that C News was slower than
>B News*.  Given the design goals of C News, this is horrible.  Please
>post your test results that show otherwise.

Well, apart from the fact that some of the problems you noticed have
since been fixed, the number of people who see a major performance
*improvement* -- including us -- is sufficient for a strong suspicion
that you're measuring the wrong thing somehow.
-- 
A bit of tolerance is worth a  |     Henry Spencer at U of Toronto Zoology
megabyte of flaming.           | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

bill@twwells.com (T. William Wells) (11/13/89)

In article <1989Nov1.184509.27953@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
: In article <1989Oct30.121734.1658@twwells.com> bill@twwells.com (T. William Wells) writes:
: >I typically have about 10M free on my disk just before expire
: >runs; unfortunately, I also tend to run jobs that do a lot of
: >data manipulation. One core dump, or one extraneous big data
: >file, can make that 10M disappear real quick!
:
: Unfortunately, this is really hard to deal with in any graceful way.  Even
: with a suitable change to the strategy in newsrun, there is still a problem
: with C News's basic approach, which is to anticipate space problems rather
: than trying to cope with them when they arrive.  There is an inherent
: assumption that the free space at a given time is a reasonable prediction
: of what the free space will be in the immediate future.
:
: (C News does make some effort to cope well with running out of space.
: Unfortunately, it is impossible to really do this right.  There are too
: many situations where space exhaustion in mid-stride means you trip and
: fall.  All the more so when stdio, dbm, the shell, etc. are involved as
: middlemen and you don't have direct access to the problem.  Hence the
: emphasis on prevention rather than cure.)

A generally good idea. The thoughts I have are based on a
different kind of prevention: minimize the amount of extraneous
space used by C News, especially the transient space, so that the
probability of running out of space is diminished.

So far I've been lucky. The closest I've come is a runaway
process. I caught it with just 20K to spare....

: >What I'd like to see, as a configuration option is:
: >
: >"Your system may be one of those where the amount of free disk
: >varies outrageously and unpredictably. If so, or if you have
: >other reasons, would you like C news to minimize disk usage,
: >almost certainly at the cost of increased processing time? [n]"
:
: I conjecture that this is a relatively rare situation.  If there are lots
: of people with such problems, I'd like to hear about them.  (Note, we are
: talking about major variations over very short time periods:  minutes not
: hours.)  I fear it would not be trivial to fit this in; I'm reluctant
: to do it unless there is widespread need.

Well, did you get much of a response?

BTW, the things I was thinking of doing, in response to a yes to
the above, were stuff like deleting each input file after it was
successfully processed and compressing the backup log and history
files.  Other things, like controlling batching and the length of
time news is stored, are already configurable.  This would probably
save almost 2M on my system during peak usage, a significant
savings indeed.  (Recall: at expire time I have 10M free -- actually,
no longer true; now it's 9M.)

Of course, I can patch C News to do all this myself, but if
others have the same problem, having it as a configuration option
would be nice.

---
Bill                    { uunet | novavax | ankh | sunvice } !twwells!bill
bill@twwells.com

henry@utzoo.uucp (Henry Spencer) (11/14/89)

In article <1989Nov13.101830.19896@twwells.com> bill@twwells.com (T. William Wells) writes:
>: I conjecture that this is a relatively rare situation.  If there are lots
>: of people with such problems, I'd like to hear about them.  (Note, we are
>: talking about major variations over very short time periods:  minutes not
>: hours.) ...
>
>Well, did you get much of a response?

Total silence, I'm afraid.  Looks like you're on your own on this one, Bill,
unless I have a sudden inspiration showing some easy way to deal with it.

>BTW, the things I was thinking of doing, in response to a yes to
>the above, were stuff like deleting each input file after it was
>successfully processed and compressing the backup log and history
>files...

On my current low-priority to-do list (the high-priority list is "dbz")
is to look at changes resembling the last two.  Utzoo, these days, in
fact deletes history.o and compresses log.o when relevant processing
is complete.
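
The change is nothing fancy; it amounts to roughly this, run from the
housekeeping script once expire is finished (a sketch only -- pathnames
depend on your installation):

    # Run once expire has finished with the old files (sketch;
    # /usr/lib/news is wherever your control files live).
    cd /usr/lib/news || exit 1
    rm -f history.o                         # old history is dead weight
    test -f log.o && compress -f log.o      # leaves log.o.Z instead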

You might also want to look at the -s option of expire (and doexpire).
-- 
A bit of tolerance is worth a  |     Henry Spencer at U of Toronto Zoology
megabyte of flaming.           | uunet!attcan!utzoo!henry henry@zoo.toronto.edu