mjb@acd4.UUCP ( Mike Bryan ) (10/23/89)
I've noticed that when C News is unbatching articles in the in.coming
directory, it doesn't remove the batch files until it is completely done.
It seems to me that newsrun should remove each file in turn immediately
after unbatching the articles it contains.  In the current implementation,
you have to have room to store each new article twice (actually about 1.5
times, since the batched file is compressed).  I'm currently running the
23-Jul-89 patch.  (I haven't made time to upgrade to the latest patch yet;
my company expects me to do real work!!!)

Apparently newsrun will at least remove these files in the middle of
unbatching if it sees disk space creep too low.  Since this lessens the
problem of running out of space due to keeping the batches around, the
whole question is probably moot.  I'm curious, though: why not just
remove each file when you are done with it?

Normally, this wouldn't cause me any problems.  However, /usr/spool/news
is not its own segment on our system; it shares with a couple of other
lightly used directory trees.  Earlier this week our news feed got screwed
up overnight, and I was unbatching nearly a whole day's worth of news
around 9 AM.  If we were doing any other work on that segment, there would
have been a chance of running the system out of space before C News
finished its current file.  [We weren't, however.]  Granted, it's a slim
chance, but it would be lessened if C News removed batch files one at a
time.
--
Mike Bryan, Applied Computing Devices, 100 N Campus Dr, Terre Haute IN 47802
Phone: 812/232-6051  FAX: 812/231-5280  Home: 812/232-0815
UUCP: uunet!acd4!mjb  INTERNET: mjb%acd4@uunet.uu.net
"Agony is born of desire; that's what you get for wanting." --- Moev
henry@utzoo.uucp (Henry Spencer) (10/23/89)
In article <1989Oct22.192159.4827@acd4.UUCP> mjb@acd4.UUCP ( Mike Bryan ) writes:
>I've noticed that when C News is unbatching articles in the in.coming
>directory, it doesn't remove the batch files until it is completely
>done.  It seems to me that newsrun should remove each file in turn
>immediately after unbatching the articles it contains...

If you are running a current C News, it will (a) remove the batch files
after every 50 files processed, or (b) remove the batch file after each
file processed, depending on whether space is tight or not.  This was
a deliberate change from our earlier version that always did (b).

>... I'm curious, though; why
>not just remove each file when you are done with it?

Because there is a noticeable performance loss in doing *anything*
unnecessary on each pass through the inner loop.  As it is, on systems
with high-performance shells (with the "test" command built in), when
space is plentiful the inner loop runs compress and relaynews, period.

>... If we were doing any other work
>on that segment, there would have existed a chance of running the
>system out of space before CNews finished its current file.  [We
>weren't, however.]  Granted, it's a slim chance, but it would be
>lessened if CNews would remove batch files one at a time.

This sort of thing is the reason why spacefor, as shipped, has safety
margins built in.  C News as shipped will fall back on strategy (b) when
the space margin falls below 5000 blocks.  If you have programs that
suddenly eat multiple megabytes on your news filesystems, then you need
to either crank the margins up or modify the shell files to change the
strategy.  Setting the margins to zero is a bad idea no matter how
well-behaved your system is.
--
A bit of tolerance is worth a  |  Henry Spencer at U of Toronto Zoology
megabyte of flaming.           | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
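[The two-strategy rule Henry describes can be sketched as a shell fragment
in the spirit of newsrun.  This is a hypothetical sketch, not the real
script: `spacefor` is stubbed out, and the directory and file names are
invented. --mod.]

```shell
#!/bin/sh
# Hypothetical sketch of the strategy described above: remove each
# batch immediately when space is tight, otherwise clean up in bulk
# every 50 files.  The real newsrun differs in detail; spacefor is
# stubbed here to report plentiful space.
spacefor() { echo 10000; }          # stub: pretend 10000 blocks free

MARGIN=5000                         # blocks; below this, space is "tight"
tmpdir=${TMPDIR-/tmp}/incoming.$$
mkdir "$tmpdir"
for i in 1 2 3; do : > "$tmpdir/batch.$i"; done   # dummy batch files

count=0
for batch in "$tmpdir"/batch.*; do
    : here the real script would run compress -d and relaynews
    count=`expr $count + 1`
    if [ `spacefor` -lt $MARGIN ]; then
        rm -f "$batch"              # strategy (b): remove each file now
    elif [ $count -ge 50 ]; then
        rm -f "$tmpdir"/batch.*     # strategy (a): bulk removal
        count=0
    fi
done
echo "processed $count batches since last cleanup"
```

With the stub reporting plentiful space, the three dummy batches survive
until the bulk-cleanup threshold; drop the stub's number below 5000 to
watch them disappear one at a time instead.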
bill@twwells.com (T. William Wells) (10/25/89)
In article <1989Oct23.023759.17067@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
: In article <1989Oct22.192159.4827@acd4.UUCP> mjb@acd4.UUCP ( Mike Bryan ) writes:
: >I've noticed that when C News is unbatching articles in the in.coming
: >directory, it doesn't remove the batch files until it is completely
: >done.  It seems to me that newsrun should remove each file in turn
: >immediately after unbatching the articles it contains...
:
: If you are running a current C News, it will (a) remove the batch files
: after every 50 files processed, or (b) remove the batch file after each
: file processed, depending on whether space is tight or not.  This was
: a deliberate change from our earlier version that always did (b).

This really isn't adequate, because it assumes relatively small batch
files.  On my system, I get batches of 250K (compressed) and up, and
having them accumulate can be a real bear, especially when disk space
gets tight.  (Why such big batches?  It makes sense if you have a 9600+
bps modem with built-in error correction, like my Telebit.)

Could you either make this based on the file sizes, or provide an option
or configuration parameter to make it delete after each batch?  Perhaps
you could make that 50 a configuration parameter?  This behavior has
messed me over more than a couple of times.

Now that I think about it, I have a feeling that there are several
assumptions in C News that are invalid when batch sizes are typically
very large.  Perhaps you might want to think in terms of configuration
options for these kinds of systems.
---
Bill
{ uunet | novavax | ankh | sunvice } !twwells!bill
bill@twwells.com
henry@utzoo.uucp (Henry Spencer) (10/26/89)
In article <1989Oct25.023352.8840@twwells.com> bill@twwells.com (T. William Wells) writes:
>Could you either make this based on the file sizes, or provide an
>option or configuration parameter to make it delete after each
>batch?  Perhaps you could make that 50 a configuration parameter?

Exactly what should the algorithm be for deciding how to make the choice?
The current code does it based on whether space is tight or not; this
seemed the only reasonable rule to me.  If there is lots of space, better
to optimize for time.  If space is tight, better to pay more attention to
that.  How else?  I don't understand the circumstances in which the
current code is as ill-behaved as you imply; can you elaborate?  (If it's
because the space margins in "spacefor" are set to zero or some very
small number, your warranty is void. :-))

>Now that I think about it, I have a feeling that there are
>several assumptions in C news that are invalid when batch sizes
>are typically very large.  Perhaps you might want to think in terms
>of a configuration options for these kinds of systems.

Compressed batches of 200-300K are common here; we haven't seen any
problem with them.
--
A bit of tolerance is worth a  |  Henry Spencer at U of Toronto Zoology
megabyte of flaming.           | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
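[For what it's worth, the size-based rule Bill asks about could look
something like the following.  This is a hedged sketch with an invented
threshold and file name, not anything shipped with C News. --mod.]

```shell
#!/bin/sh
# Sketch of a size-based deletion rule: remove a batch immediately
# after processing if it exceeds a threshold, regardless of how much
# space happens to be free.  THRESHOLD and the file name are invented
# for illustration.
THRESHOLD=204800                    # bytes: treat ~200K+ batches as "big"
batch=${TMPDIR-/tmp}/bigbatch.$$
dd if=/dev/zero of="$batch" bs=1024 count=300 2>/dev/null  # 300K dummy

: here the batch would be uncompressed and fed to relaynews

size=`wc -c < "$batch"`
if [ $size -gt $THRESHOLD ]; then
    rm -f "$batch"                  # big batch: reclaim the space at once
fi
```

A rule like this answers Henry's "how else?" for sites like Bill's, at
the cost of one extra stat-equivalent per pass through the inner loop.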
zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) (10/27/89)
>Now that I think about it, I have a feeling that there are
>several assumptions in C news that are invalid when batch sizes
>are typically very large.  Perhaps you might want to think in terms

The horrible per-batch start-up overhead of C News makes large batches
the only way to go.
--
Branch Technology <zeeff@b-tech.ann-arbor.mi.us>
chip@ateng.com (Chip Salzenberg) (10/27/89)
According to zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff):
>The horrible per batch start-up overhead of C news make large batches the
>only way to go.

Oh, piffle.  C News overhead isn't "horrible" no matter how you look at
it.  Except for inews, maybe. :-)  But inews isn't involved in unbatching.
--
You may redistribute this article only to those who may freely do likewise.
Chip Salzenberg at A T Engineering;  <chip@ateng.com> or <uunet!ateng!chip>
"'Why do we post to Usenet?'  Naturally, the answer is, 'To get a response.'"
        -- Brad "Flame Me" Templeton
coolidge@brutus.cs.uiuc.edu (John Coolidge) (10/28/89)
chip@ateng.com (Chip Salzenberg) writes:
>According to zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff):
>>The horrible per batch start-up overhead of C news make large batches the
>>only way to go.
>C News overhead isn't "horrible" no matter what you look at.  Except for
>inews, maybe. :-)  But inews isn't involved in unbatching.

IMHO both sides are right to some extent.  Jon is right in claiming that
the per-batch start-up overhead is really pretty high.  There's quite a
lot done in the standard code: shell scripts are not cheap to start with
(lots of forks); locking and unlocking costs; checking free space costs;
running down the directory looking for batches isn't free.  On the other
hand, Chip is right in claiming that C News is never "horrible" if
installed correctly.  There are ways of doing things "better", but you
give something up for what you gain (mainly error recovery; see other
postings for specifics).

A good partial compromise is to rewrite newsrun in C, thereby removing
some of the really wasteful costs involved in running shell scripts
(albeit at the cost of portability --- trade-offs all over the place).

--John
--------------------------------------------------------------------------
John L. Coolidge  Internet: coolidge@cs.uiuc.edu  UUCP: uiucdcs!coolidge
Of course I don't speak for the U of I (or anyone else except myself)
Copyright 1989 John L. Coolidge.  Copying allowed if (and only if) attributed.
You may redistribute this article if and only if your recipients may as well.
zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) (10/30/89)
>>The horrible per batch start-up overhead of C news make large batches the
>>only way to go.
>
>C News overhead isn't "horrible" no matter what you look at.  Except for
>inews, maybe. :-)  But inews isn't involved in unbatching.

I posted test results a while back showing that C News was slower than
B News*.  Given the design goals of C News, this is horrible.  Please
post your test results that show otherwise.

* Sys V, 50K batches, measured for all processes (there were many) from
rnews to the article being in the spool directory.  There was also much
more disk I/O for C News.
--
Branch Technology <zeeff@b-tech.ann-arbor.mi.us>
henry@utzoo.uucp (Henry Spencer) (10/30/89)
In article <9690@b-tech.ann-arbor.mi.us> zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) writes:
>>C News overhead isn't "horrible" no matter what you look at...
>
>I posted test results awhile back showing that C news was slower than
>B news*.  Given the design goals of C news, this is horrible.  Please post
>your test results that show otherwise.

Well, apart from the fact that some of the problems you noticed have
since been fixed, the number of people who see a major performance
*improvement* -- including us -- is sufficient for a strong suspicion
that you're measuring the wrong thing somehow.
--
A bit of tolerance is worth a  |  Henry Spencer at U of Toronto Zoology
megabyte of flaming.           | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
bill@twwells.com (T. William Wells) (11/13/89)
In article <1989Nov1.184509.27953@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
: In article <1989Oct30.121734.1658@twwells.com> bill@twwells.com (T. William Wells) writes:
: >I typically have about 10M free on my disk just before expire
: >runs; unfortunately, I also tend to run jobs that do a lot of
: >data manipulation.  One core dump, or one extraneous big data
: >file, can make that 10M disappear real quick!
:
: Unfortunately, this is really hard to deal with in any graceful way.  Even
: with a suitable change to the strategy in newsrun, there is still a problem
: with C News's basic approach, which is to anticipate space problems rather
: than trying to cope with them when they arrive.  There is an inherent
: assumption that the free space at a given time is a reasonable prediction
: of what the free space will be in the immediate future.
:
: (C News does make some effort to cope well with running out of space.
: Unfortunately, it is impossible to really do this right.  There are too
: many situations where space exhaustion in mid-stride means you trip and
: fall.  All the more so when stdio, dbm, the shell, etc. are involved as
: middlemen and you don't have direct access to the problem.  Hence the
: emphasis on prevention rather than cure.)

A generally good idea.  The thoughts I have are based on a different kind
of prevention: minimize the amount of extraneous space used by C News,
especially the transient space, so that the probability of running out of
space is diminished.  So far I've been lucky.  The closest I've come is a
runaway process; I caught it with just 20K to spare....

: >What I'd like to see, as a configuration option is:
: >
: >"Your system may be one of those where the amount of free disk
: >varies outrageously and unpredictably.  If so, or if you have
: >other reasons, would you like C news to minimize disk usage,
: >almost certainly at the cost of increased processing time? [n]"
:
: I conjecture that this is a relatively rare situation.  If there are lots
: of people with such problems, I'd like to hear about them.  (Note, we are
: talking about major variations over very short time periods: minutes not
: hours.)  I fear it would not be trivial to fit this in; I'm reluctant
: to do it unless there is widespread need.

Well, did you get much of a response?

BTW, the things I was thinking of doing, in response to a yes to the
above, were stuff like deleting each input file after it was successfully
processed and compressing the backup log and history files.  Other things,
like controlling batching and the length of time news is stored, are
already configurable.  This would probably save almost 2M on my system
during peak usage, a significant savings indeed.  (Recall: at expire time
I have 10M free.  Actually, that's no longer true: now 9M.)  Of course, I
can patch C News to do all this myself, but if others have the same
problem, having it as a configuration option would be nice.
---
Bill
{ uunet | novavax | ankh | sunvice } !twwells!bill
bill@twwells.com
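[Bill's second measure, compressing the backup log and history files, can
be sketched like this.  The directory and file names are illustrative
stand-ins for wherever your control files actually live, and the
`compress`-with-`gzip`-fallback is this sketch's assumption, not C News
behavior. --mod.]

```shell
#!/bin/sh
# Sketch: compress the backup log and delete the old history backup
# to recover transient space.  Paths are illustrative; point $ctl at
# your real news control directory instead.
ctl=${TMPDIR-/tmp}/newsctl.$$
mkdir "$ctl"
echo "yesterday's log"     > "$ctl/log.o"       # dummy backup files
echo "yesterday's history" > "$ctl/history.o"

# Compress the backup log instead of keeping it uncompressed...
compress "$ctl/log.o" 2>/dev/null || gzip "$ctl/log.o"
# ...and drop the old history backup once the new history files exist.
rm -f "$ctl/history.o"
```

Run from cron right after expire, a fragment like this is where the
"almost 2M" Bill estimates would come from on a system with large logs.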
henry@utzoo.uucp (Henry Spencer) (11/14/89)
In article <1989Nov13.101830.19896@twwells.com> bill@twwells.com (T. William Wells) writes:
>: I conjecture that this is a relatively rare situation.  If there are lots
>: of people with such problems, I'd like to hear about them.  (Note, we are
>: talking about major variations over very short time periods: minutes not
>: hours.) ...
>
>Well, did you get much of a response?

Total silence, I'm afraid.  Looks like you're on your own on this one,
Bill, unless I have a sudden inspiration showing some easy way to deal
with it.

>BTW, the things I was thinking of doing, in response to a yes to
>the above, were stuff like deleting each input file after it was
>successfully processed and compressing the backup log and history
>files...

On my current low-priority to-do list (the high-priority list is "dbz")
is to look at changes resembling the last two.  Utzoo, these days, in
fact deletes history.o and compresses log.o when relevant processing is
complete.  You might also want to look at the -s option of expire (and
doexpire).
--
A bit of tolerance is worth a  |  Henry Spencer at U of Toronto Zoology
megabyte of flaming.           | uunet!attcan!utzoo!henry henry@zoo.toronto.edu