[news.sysadmin] IBM RS/6000 unsuitable for news

bglenden@colobus.cv.nrao.edu (Brian Glendenning) (05/07/91)

This might save someone some time:

In my opinion, IBM/RS6000 machines running AIX 3, are unsuitable for
running usenet news because the filesystems only have 4k blocks, and
thus waste a lot of space for usenet news. (Also, the number of inodes
is fixed, which would be painful if we could drop the block size...).

A quickie shell script tells me that:
(total size in 4k blocks)/(total # of bytes in files) = 2.0
On a Sun with 1k blocks the number is 1.2.
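
The original shell script was not posted, so the following is only a minimal Python sketch of the same kind of measurement. It assumes every file occupies a whole number of blocks (no fragments); the function names and the spool path are illustrative assumptions.

```python
import os


def allocated_bytes(size, block_size):
    """Space taken by a file of `size` bytes when storage is handed out
    in whole blocks of `block_size` bytes (no fragments assumed)."""
    return -(-size // block_size) * block_size  # ceiling division


def overhead_ratio(sizes, block_size):
    """(total allocated bytes) / (total bytes in files)."""
    sizes = list(sizes)
    total = sum(sizes)
    if total == 0:
        raise ValueError("no file data to measure")
    return sum(allocated_bytes(s, block_size) for s in sizes) / total


def file_sizes(root):
    """Yield the size in bytes of every regular file under `root`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                yield os.path.getsize(path)


if __name__ == "__main__":
    # The spool location is an assumption; adjust for the local system.
    sizes = list(file_sizes("/usr/spool/news"))
    if sizes:
        for block in (1024, 4096):
            print(block, round(overhead_ratio(sizes, block), 2))
```

A ratio near 1.0 means little space is lost to rounding; a ratio near 2.0 means about half of the allocated space holds no data.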

For us, this is unacceptable, so I guess I'll have to scare up some
disk space on a Sun somewhere. Too bad, because otherwise things
worked quite well.

This ignores the role of fragments, which I would have thought would
go a long way to saving the day for AIX. Since I observe a lot of
wasted space, I gather it isn't. Strange.

Brian
--
       Brian Glendenning - National Radio Astronomy Observatory
bglenden@nrao.edu          bglenden@nrao.bitnet          (804) 296-0286

henry@zoo.toronto.edu (Henry Spencer) (05/07/91)

In article <BGLENDEN.91May6130729@colobus.cv.nrao.edu> bglenden@colobus.cv.nrao.edu (Brian Glendenning) writes:
>A quickie shell script tells me that:
>(total size in 4k blocks)/(total # of bytes in files) = 2.0
>On a Sun with 1k blocks the number is 1.2.
>This ignores the role of fragments, which I would have thought would
>go a long way to saving the day for AIX...

Do remember that the Sun is quoting space used in KB.  If AIX is being asked
to quote space usage in 4K blocks, quite plausibly it is rounding file sizes
up to multiples of 4K.  Make sure this is really a space-consumption problem
rather than just a reporting problem.
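
One way to separate a reporting effect from real consumption is to compare each file's length with the space the filesystem says it has allocated. The sketch below is an illustration only: it relies on Python's `os.stat`, whose `st_blocks` field is documented as a count of 512-byte blocks and is not available on every platform. The helper names are made up for this example.

```python
import os


def slack_bytes(size, blocks_512):
    """Bytes allocated beyond the file's length, given a block count
    expressed in 512-byte units (as st_blocks is)."""
    return blocks_512 * 512 - size


def report(path):
    """Return (length in bytes, allocated bytes) for one file."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512  # st_blocks: 512-byte units
```

If the allocated figure is consistently a multiple of 4096 for small files, the space is really being consumed and the problem is not just rounding in the reporting tool.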
-- 
And the bean-counter replied,           | Henry Spencer @ U of Toronto Zoology
"beans are more important".             |  henry@zoo.toronto.edu  utzoo!henry

josevela@mtecv2.mty.itesm.mx (Jose Angel Vela Avila) (05/07/91)

bglenden@colobus.cv.nrao.edu (Brian Glendenning) writes:

>In my opinion, IBM/RS6000 machines running AIX 3, are unsuitable for
>running usenet news because the filesystems only have 4k blocks, and
>thus waste a lot of space for usenet news. (Also, the number of inodes
>is fixed, which would be painful if we could drop the block size...).



  mmmm... How about News from remote server ???

 We have a 520 running News from our News server (Vax 6310) and everything

  works ok !!


Jose A. Vela

henry@zoo.toronto.edu (Henry Spencer) (05/07/91)

In article <1991May6.181144.23900@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>... Make sure this is really a space-consumption problem
>rather than just a reporting problem.

Brian reports (in private mail) that alas, it seems to be a real problem,
not just a reporting defect.
-- 
And the bean-counter replied,           | Henry Spencer @ U of Toronto Zoology
"beans are more important".             |  henry@zoo.toronto.edu  utzoo!henry

fitz@wang.com (Tom Fitzgerald) (05/07/91)

bglenden@colobus.cv.nrao.edu (Brian Glendenning) writes:
> In my opinion, IBM/RS6000 machines running AIX 3, are unsuitable for
> running usenet news because the filesystems only have 4k blocks, and
> thus waste a lot of space for usenet news. (Also, the number of inodes
> is fixed, which would be painful if we could drop the block size...).

It depends on how much you want to spend on disk....  The machine we're
using here also has 4 Kb block sizes, and yah, it wastes about 40% of
the disk.  (It fills up at about 180 Mb used out of 300 Mb allocated).
It's worth it to us because it's available - 120 Mb of wasted space is
cheaper than a whole new machine.

It helps a lot to throttle back the expiration time of talk.*, alt.flame,
etc., since the median article size for those groups is lower than the
median article size for comp.* and news.*.

---
Tom Fitzgerald   Wang Labs        fitz@wang.com
1-508-967-5278   Lowell MA, USA   ...!uunet!wang!fitz

jeffs@soul.esd.sgi.com (Jeff Smith) (05/07/91)

In <1991May6.181144.23900@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:

>In article <BGLENDEN.91May6130729@colobus.cv.nrao.edu> bglenden@colobus.cv.nrao.edu (Brian Glendenning) writes:
>>A quickie shell script tells me that:
>>(total size in 4k blocks)/(total # of bytes in files) = 2.0
>>On a Sun with 1k blocks the number is 1.2.
>>This ignores the role of fragments, which I would have thought would
>>go a long way to saving the day for AIX...

>Do remember that the Sun is quoting space used in KB.  If AIX is being asked
>to quote space usage in 4K blocks, quite plausibly it is rounding file sizes
>up to multiples of 4K.  Make sure this is really a space-consumption problem
>rather than just a reporting problem.

Well I'm not sure if bglenden's numbers are correct, but his conclusions are
correct--the JFS (AIX filesystem) uses a 4KB blocksize, and does not do
fragmentation (a la BSD FFS).  This makes it undesirable for a netnews server.


jeffs@sgi.com						(former RS/6000 user.)
"Let's get SWIZin!"

wisner@ims.alaska.edu (Bill Wisner) (05/07/91)

But they make wonderful doorstops.

Bill Wisner <wisner@ims.alaska.edu> Gryphon Gang Fairbanks AK 99775

ralphs@seattleu.edu (Ralph Sims) (05/07/91)

fitz@wang.com (Tom Fitzgerald) writes:

> It helps a lot to throttle back the expiration time of talk.*, alt.flame,
> etc., since the median article size for those groups is lower than the
> median article size for comp.* and news.*.

I'm not sure how *nix figures this kind of thing, but a check of the
size of the average news post (based on 10,000 messages online) is
~3K.  This is on my MS-DOS system with an almost-full feed.

--
          Of all the things I've lost, I miss my mind the most.
                       halcyon!ralphs@seattleu.edu

jackv@turnkey.tcc.com (Jack F. Vogel) (05/07/91)

In article <BGLENDEN.91May6130729@colobus.cv.nrao.edu> bglenden@colobus.cv.nrao.edu (Brian Glendenning) writes:
>
>In my opinion, IBM/RS6000 machines running AIX 3, are unsuitable for
>running usenet news because the filesystems only have 4k blocks, and
>thus waste a lot of space for usenet news. 

Yes, unfortunately this is also the case with AIX on the PS/2 and 370.
Here at Locus we keep news on a couple of 370 guests and it is a constant
struggle to keep from running out of space but, since we have lots of
370 DASD around we live with it.

One possibility struck me, since AIX Version 3 is a 'vnoded' filesystem
(which, alas, AIX/370 isn't) I wouldn't think it would be that hard to
provide a SysV 1K or FFS filesystem as an option. Has anyone out there in
6000 support thought about or considered this??

>(Also, the number of inodes
>is fixed, which would be painful if we could drop the block size...).
 
I have seen this said a couple of different times, is this literally
true? You mean you can't choose the number of inodes when the filesystem
is created?? This seems bizarre if true, but then I know nothing about
JFS.

Disclaimer: I don't speak for my employer.

-- 
Jack F. Vogel			jackv@locus.com
AIX370 Technical Support	       - or -
Locus Computing Corp.		jackv@turnkey.TCC.COM

ralphs@seattleu.edu (Ralph Sims) (05/08/91)

henry@zoo.toronto.edu (Henry Spencer) writes:

> Do remember that the Sun is quoting space used in KB.  If AIX is being asked
> to quote space usage in 4K blocks, quite plausibly it is rounding file sizes
> up to multiples of 4K.  Make sure this is really a space-consumption problem
> rather than just a reporting problem.

In an earlier post I mentioned that the average MS-DOS filesize for news
articles appeared to be ~3K.  Using a 4K blocksize would be fairly efficient
under that condition.  Would the same reasoning hold true with *nix and if
not, what differences are there?  I would reason that articles <2K would
get allocated a 4K block and those of 5K would get an 8K one.  Perhaps
with that in mind 1 or 2K blocks would be better.  My system uses 2K
clusters.


--
          Of all the things I've lost, I miss my mind the most.
                       halcyon!ralphs@seattleu.edu

dbeedle@rs6000.cmp.ilstu.edu (Dave Beedle) (05/08/91)

>But they make wonderful doorstops.

       Nahhh, they're too big...use a mac for that!    ;-)

-- 
  Dave Beedle                                    Office of Academic Computing
                                                    Illinois State University
  Internet:  dbeedle@rs6000.cmp.ilstu.edu                    136A Julian Hall  
    Bitnet:  dbeedle@ilstu.bitnet                          Normal, Il   61761

drake@drake.almaden.ibm.com (05/08/91)

In article <1991May07.160042.28634@turnkey.tcc.com> jackv@turnkey.TCC.COM (Jack F. Vogel) writes:
>In article <BGLENDEN.91May6130729@colobus.cv.nrao.edu> bglenden@colobus.cv.nrao.edu (Brian Glendenning) writes:
>>(Also, the number of inodes
>>is fixed, which would be painful if we could drop the block size...).
> 
>I have seen this said a couple of different times, is this literally
>true? You mean you can't choose the number of inodes when the filesystem
>is created?? This seems bizarre if true, but then I know nothing about
>JFS.

This isn't an issue.  When a filesystem is created (or later expanded),
the number of inodes is set to the number of 4K blocks in the filesystem.
In other words, except in cases where there are zero length files in the
filesystem, it's impossible to run out of inodes.

It's true that the number is "fixed", but only in that you can't change it.
But the default is always "big enough".


Sam Drake / IBM Almaden Research Center 
Internet:  drake@ibm.com            BITNET:  DRAKE at ALMADEN
Usenet:    ...!uunet!ibmarc!drake   Phone:   (408) 927-1861

dcm@codesmith.austin.ibm.com (Craig Miller) (05/08/91)

In article <1991May07.160042.28634@turnkey.tcc.com> jackv@turnkey.TCC.COM (Jack F. Vogel) writes:
>One possibility struck me, since AIX Version 3 is a 'vnoded' filesystem
>(which, alas, AIX/370 isn't) I wouldn't think it would be that hard to
>provide a SysV 1K or FFS filesystem as an option. Has anyone out there in
>6000 support thought about or considered this??


	Yep, a number of people have tossed this idea around.  I think it's a
	great idea, however (in all seriousness) support probably does not
	have the time to attack this.
	
	I think development could handle it though :-).

>I have seen this said a couple of different times, is this literally
>true? You mean you can't choose the number of inodes when the filesystem
>is created?? This seems bizarre if true, but then I know nothing about
>JFS.
	
	The number of inodes is approx 3% of the total size of the filesystem.
	To allocate more inodes, you'll have to extend the filesystem (which
	will give you more data blocks that are probably useless to you
	anyway if you're just looking for more inodes).
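
The two descriptions in this thread (one inode per 4K block, and inodes amounting to roughly 3% of the filesystem) agree with each other if each on-disk inode occupies 128 bytes. That inode size is an assumption made for this sketch, not something stated in the thread, and the names below are illustrative.

```python
BLOCK_SIZE = 4096          # JFS block size, as stated in the thread
ASSUMED_INODE_BYTES = 128  # assumption: size of one on-disk inode


def inode_count(fs_bytes):
    """One inode per 4K block in the filesystem."""
    return fs_bytes // BLOCK_SIZE


def inode_space_fraction(fs_bytes):
    """Fraction of the filesystem taken up by the inodes themselves."""
    return inode_count(fs_bytes) * ASSUMED_INODE_BYTES / fs_bytes
```

Under that assumption a 300 MB filesystem gets 76800 inodes, and the inodes take 128/4096 of the space, a little over 3%.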

>Disclaimer: I don't speak for my employer.

	me neither.

>Jack F. Vogel			jackv@locus.com
>AIX370 Technical Support	       - or -
>Locus Computing Corp.		jackv@turnkey.TCC.COM


	Craig
-- 
Craig Miller			Internet:	dcm@aixwiz.austin.ibm.com
IBM Austin			Vnet:		tkg007 at ausvmq
AIXV3 Change Team (level3)	IBM internal:	dcm@littleguy.austin.ibm.com
"I do not represent IBM or any other respectable company."

ghe@physics.orst.edu (Guangliang He) (05/08/91)

In article <1991May07.191244.3849@rs6000.cmp.ilstu.edu>, dbeedle@rs6000.cmp.ilstu.edu (Dave Beedle) writes:
|> >But they make wonderful doorstops.
|> 
|>        Nahhh, they're too big...use a mac for that!    ;-)

No. Mac is too light. RS/6000 has the right weight.

|> 
|> -- 
|>   Dave Beedle                                    Office of Academic Computing
|>                                                     Illinois State University
|>   Internet:  dbeedle@rs6000.cmp.ilstu.edu                    136A Julian Hall  
|>     Bitnet:  dbeedle@ilstu.bitnet                          Normal, Il   61761
 

---
  Guangliang He                |   If anything can go wrong, it will.
  ghe@physics.orst.edu         |            -- Murphy's Law

gwh@tornado.Berkeley.EDU (George William Herbert) (05/08/91)

In article <1991May07.191244.3849@rs6000.cmp.ilstu.edu> dbeedle@rs6000.cmp.ilstu.edu (Dave Beedle) writes:
>>But they make wonderful doorstops.
>       Nahhh, they're too big...use a mac for that!    ;-)

I dunno.  After having one refuse to compile some of its built-in header
files, I recommended to our campus people that we use ours as a doorstop.  They
disagreed, since moving it from the table would take too much effort as opposed
to letting it 'run' until we give it back.

-george
8-)

shaggy@kleikamp.austin.ibm.com (David J. Kleikamp) (05/08/91)

In article <731@rufus.UUCP> drake@drake.almaden.ibm.com writes:
>This isn't an issue.  When a filesystem is created (or later expanded),
>the number of inodes is set to the number of 4K blocks in the filesystem.
>In other words, except in cases where there are zero length files in the
>filesystem, it's impossible to run out of inodes.

Symbolic links are often contained within the inode.  If you have a
filesystem containing a large number of symbolic links you may very well
run out of inodes.

>It's true that the number is "fixed", but only in that you can't change it.
>But the default is always "big enough".

Usually big enough.

>Sam Drake / IBM Almaden Research Center 
-- 
---------------------------------------------------------------------------
David J. "Shaggy" Kleikamp	dave@kleikamp.austin.ibm.com
DISCLAIMER: The content of this posting is independent of
            official IBM position.

de5@ornl.gov (Dave Sill) (05/08/91)

In article <i03k25w164w@halcyon.uucp>, halcyon!ralphs@seattleu.edu (Ralph Sims) writes:
>
>I'm not sure how *nix figures this kind of thing, but a check of the
>size of the average news post (based on 10,000 messages online) is
>~3K.  This is on my MS-DOS system with an almost-full feed.

The average size is less important than the size distribution.  For
example, if 90% of the articles are under 2k and the other 10% are
very large source/binary/GIF's that bring the overall average up to
3k, then you're going to have a *lot* of wasted space with 4k blocks. 
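
To make the point concrete, here is a small sketch with two invented sets of ten files that share the same 3000-byte average. The sizes are made up for illustration, and whole-block allocation with no fragments is assumed.

```python
def allocated(size, block=4096):
    """Bytes consumed when space is handed out in whole blocks."""
    return -(-size // block) * block  # ceiling division


def wasted_fraction(sizes, block=4096):
    """Share of the allocated space that holds no file data."""
    used = sum(allocated(s, block) for s in sizes)
    return (used - sum(sizes)) / used


# Nine small articles plus one large posting; mean is 3000 bytes.
skewed = [1500] * 9 + [16500]
# Ten articles of exactly the mean size.
uniform = [3000] * 10

if __name__ == "__main__":
    print("skewed :", round(wasted_fraction(skewed), 3))
    print("uniform:", round(wasted_fraction(uniform), 3))
```

With these invented numbers the skewed set wastes a noticeably larger share of its allocated space than the uniform set, even though both have the same mean.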

-- 
Dave Sill (de5@ornl.gov)	  It will be a great day when our schools have
Martin Marietta Energy Systems    all the money they need and the Air Force
Workstation Support               has to hold a bake sale to buy a new bomber.

nraoaoc@nmt.edu (NRAO Array Operations Center) (05/09/91)

In article <1F7k22w164w@halcyon.uucp> halcyon!ralphs@seattleu.edu (Ralph Sims) writes:
>In an earlier post I mentioned that the average MS-DOS filesize for news
>articles appeared to be ~3K.  Using a 4K blocksize would be fairly efficient
>under that condition.  

Not if you have hundreds of tiny articles and a few giant ones which skew the
average.

>I would reason that articles <2K would
>get allocated a 4K block and those of 5K would get an 8K one.  Perhaps
>with that in mind 1 or 2K blocks would be better.  

Precisely. If your file sizes were random, you would always waste on average
half the filesystem block size per file. Even in this situation, 2K/file is
a lot to waste (as opposed to 512 bytes, which is half a Berkeley filesystem
fragment). In reality, news tends to be made up as stated above, which means 
that *on average* you are wasting more than 2K/file.
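
The "half a block per file" figure can be checked with a few lines of Python. This sketch treats every file size from 1 byte up to a few allocation units as equally likely, which is the random-size assumption above and not a model of real news traffic; the function name is illustrative.

```python
def mean_slack(unit, max_units=4):
    """Average unused bytes in the last allocation unit, taken over
    every file size from 1 to max_units * unit bytes."""
    sizes = range(1, max_units * unit + 1)
    return sum((-s) % unit for s in sizes) / len(sizes)


if __name__ == "__main__":
    print(mean_slack(4096))  # 4K blocks
    print(mean_slack(1024))  # 1K fragments
```

The result is (unit - 1) / 2 in both cases: just under 2K for 4K blocks and just under 512 bytes for 1K fragments.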
-- 
Ruth Milner
Systems Manager                     NRAO/VLA                    Socorro NM
Computing Division Head      rmilner@zia.aoc.nrao.edu

karish@pangea.Stanford.EDU (Chuck Karish) (05/09/91)

In article <1991May07.160042.28634@turnkey.tcc.com> jackv@turnkey.TCC.COM
(Jack F. Vogel) writes:
>One possibility struck me, since AIX Version 3 is a 'vnoded' filesystem
>(which, alas, AIX/370 isn't) I wouldn't think it would be that hard to
>provide a SysV 1K or FFS filesystem as an option. Has anyone out there in
>6000 support thought about or considered this??

If a widely-used filesystem architecture were supported,
IBM would also be adding value of another sort to their
machines.  In particular, having a FFS option available might
make it possible to use Sun or DEC SCSI disks when their hosts
are out of commission.

It would probably be easier for IBM to support DEC and/or Sun
filesystems than it would be for the others to support IBM's,
so there'd be a net advantage in value added to IBM.
--

	Chuck Karish		karish@mindcraft.com
	(415) 323-9000		karish@forel.stanford.edu

cudep@warwick.ac.uk (Ian Dickinson) (05/09/91)

In article <731@rufus.UUCP> drake@drake.almaden.ibm.com writes:
>This isn't an issue.  When a filesystem is created (or later expanded),
>the number of inodes is set to the number of 4K blocks in the filesystem.
>In other words, except in cases where there are zero length files in the
>filesystem, it's impossible to run out of inodes.
>It's true that the number is "fixed", but only in that you can't change it.
>But the default is always "big enough".

Saying something "simply isn't the case" doesn't change the world around you.

/usr/spool/news (or wherever) is the classic case of not having enough
inodes and the system often refusing to allow you to tune it.

Certainly, on every system I've come across, the default is *NOT*
"big enough" for news.  And it's only recently that more manufacturers
have allowed you to muck around with the number of inodes per partition.

( 4k block sizes for news? *UGH*! )
-- 
\/ato                                                               /'\  /`\
Ian Dickinson                  TED KALDIS FOR PRESIDENT!           /^^^\/^^^\
vato@warwick.ac.uk                                                /TWIN/TEATS\
@c=GB@o=University of Warwick@ou=Computing Services@cn=Ian Dickinson  /       \

kent@manzi.unx.sas.com (Paul Kent) (05/09/91)

In article <1991May8.191430.6864@nmt.edu>, nraoaoc@nmt.edu (NRAO Array Operations Center) writes:
>In article <1F7k22w164w@halcyon.uucp> halcyon!ralphs@seattleu.edu (Ralph Sims) writes:
>>In an earlier post I mentioned that the average MS-DOS filesize for news
>>articles appeared to be ~3K.  Using a 4K blocksize would be fairly efficient
>>under that condition.  
>
>Not if you have hundreds of tiny articles and a few giant ones which skew the
>average.
>

the discussion of the length of news articles allows me to kill
two birds with one stone... i can contribute a sample
distribution and make a shameless plug for SAS at the same time :-)

~80% of our news files are <2250 bytes.. food for thought, eh?

the 30000 group seems to have a lump, as it really includes all
files in the tail of the distribution.

SAS's interface to unix pipes makes it easy to summarise the
unix statistics in ways that were previously "challenging" if
you limit yourself to (say) 10 punch cards



this SAS job...
-----------------------------

  /*-- figure the stats on the lengths of the news files --*/
  /*-- mozart is the computer where news lives..         --*/
  /*-- i chose to awk the length field on the remote host--*/
  /*-- to minimise net traffic.                          --*/
  filename lenpip pipe "remsh mozart ""find /usr/spool/news -type f | xargs ls -l | awk '{print \$5}'""";

  data len;
   infile lenpip;
   input len;
   run;

  options ps=60 ls=75;
  title2 'File Size Distribution for /usr/spool/news';
  proc chart;
   hbar len /midpoints = 0 to 30000 by 500;
   run;
-----------------------------
 
gets you this chart...

                             The SAS System 
                File Size Distribution for /usr/spool/news                1

    LEN                                            Cum.              Cum.
  Midpoint                                  Freq   Freq  Percent  Percent
          |
      0   |                                   14     14     0.03     0.03
    500   |***********                      5671   5685    11.76    11.79
   1000   |******************************  14814  20499    30.72    42.51
   1500   |***********************         11347  31846    23.53    66.03
   2000   |************                     6159  38005    12.77    78.80
   2500   |*******                          3392  41397     7.03    85.84
   3000   |****                             1805  43202     3.74    89.58
   3500   |**                               1178  44380     2.44    92.02
   4000   |**                                841  45221     1.74    93.77
   4500   |*                                 618  45839     1.28    95.05
   5000   |*                                 381  46220     0.79    95.84
   5500   |*                                 310  46530     0.64    96.48
   6000   |                                  216  46746     0.45    96.93
   6500   |                                  174  46920     0.36    97.29
   7000   |                                  157  47077     0.33    97.62
   7500   |                                  114  47191     0.24    97.85
   8000   |                                   65  47256     0.13    97.99
   8500   |                                   92  47348     0.19    98.18
   9000   |                                   64  47412     0.13    98.31
   9500   |                                   47  47459     0.10    98.41
  10000   |                                   39  47498     0.08    98.49
  10500   |                                   35  47533     0.07    98.56
  11000   |                                   43  47576     0.09    98.65
  11500   |                                   38  47614     0.08    98.73
  12000   |                                   18  47632     0.04    98.77
  12500   |                                   36  47668     0.07    98.84
  13000   |                                   29  47697     0.06    98.90
  13500   |                                   18  47715     0.04    98.94
  14000   |                                   30  47745     0.06    99.00
  14500   |                                   26  47771     0.05    99.05
  15000   |                                   19  47790     0.04    99.09
  15500   |                                   21  47811     0.04    99.14
  16000   |                                   17  47828     0.04    99.17
  16500   |                                   12  47840     0.02    99.20
  17000   |                                   10  47850     0.02    99.22
  17500   |                                   16  47866     0.03    99.25
  18000   |                                   10  47876     0.02    99.27
  18500   |                                    8  47884     0.02    99.29
  19000   |                                   14  47898     0.03    99.32
  19500   |                                    5  47903     0.01    99.33
  20000   |                                   10  47913     0.02    99.35
  20500   |                                    5  47918     0.01    99.36
  21000   |                                    7  47925     0.01    99.37
  21500   |                                    4  47929     0.01    99.38
  22000   |                                    7  47936     0.01    99.40
  22500   |                                    4  47940     0.01    99.40
  23000   |                                    8  47948     0.02    99.42
  23500   |                                    5  47953     0.01    99.43
  24000   |                                    4  47957     0.01    99.44
  24500   |                                    2  47959     0.00    99.44
  25000   |                                    7  47966     0.01    99.46
  25500   |                                    5  47971     0.01    99.47
  26000   |                                    1  47972     0.00    99.47
  26500   |                                    3  47975     0.01    99.48
  27000   |                                    3  47978     0.01    99.48
  27500   |                                    2  47980     0.00    99.49
  28000   |                                    2  47982     0.00    99.49
  28500   |                                    2  47984     0.00    99.50
  29000   |                                    1  47985     0.00    99.50
  29500   |                                    2  47987     0.00    99.50
  30000   |                                  240  48227     0.50   100.00
          --------+-------+-------+------
                 4000    8000   12000

                     Frequency
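
For readers without SAS, the binning behind the chart can be approximated in Python. This is a sketch under assumptions: each size is counted under the nearest multiple of 500, everything beyond the last midpoint is lumped into 30000, and a value exactly halfway between two midpoints goes to the higher one (how SAS treats that boundary is not stated here). The function names are illustrative.

```python
from collections import Counter


def midpoint_bin(size, step=500, last=30000):
    """Return the chart midpoint a file of `size` bytes is counted under."""
    nearest = ((size + step // 2) // step) * step  # nearest multiple of step
    return min(nearest, last)  # the final bar absorbs the whole tail


def size_histogram(sizes, step=500, last=30000):
    """Count files per midpoint, like the Freq column of the chart."""
    return Counter(midpoint_bin(s, step, last) for s in sizes)
```

With this binning, the bar at midpoint 2000 covers files up to just under 2250 bytes, which is why the cumulative percentage at that row corresponds to the "<2250 bytes" figure.
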
--
Paul Kent (SQL r&d)                   " nothing ventured, nothing disclaimed "
kent@unx.sas.com         SAS Institute Inc, SAS Campus Dr, Cary NC 27513-2414.

lws@comm.wang.com (Lyle Seaman) (05/10/91)

jackv@turnkey.tcc.com (Jack F. Vogel) writes:
>One possibility struck me, since AIX Version 3 is a 'vnoded' filesystem
>(which, alas, AIX/370 isn't) I wouldn't think it would be that hard to
>provide a SysV 1K or FFS filesystem as an option. Has anyone out there in
>6000 support thought about or considered this??

Agh! don't do that!!   First off, it would be a pain to put
a SysV filesystem under vnodes, and useless.  Do the BSD FFS.

BTW, is AIX's VM system anything like SunOS's?  Ie, no distinction
between process pages and IO pages, just one big cache...  ?

-- 
Lyle 	508 967 2322  		
lws@wang.com 	
Wang Labs, Lowell, MA, USA 	

kent@manzi.unx.sas.com (Paul Kent) (05/11/91)

hello,

apologies if this already made it out to your site.
i noticed the first post had distribution=sas which
(i hope) would have restricted the distribution of the article.


cheers,
--
Paul Kent (SQL r&d)                   " nothing ventured, nothing disclaimed "
kent@unx.sas.com         SAS Institute Inc, SAS Campus Dr, Cary NC 27513-2414.

fitz@wang.com (Tom Fitzgerald) (05/14/91)

> In article <1F7k22w164w@halcyon.uucp> halcyon!ralphs@seattleu.edu (Ralph Sims) writes:
> >In an earlier post I mentioned that the average MS-DOS filesize for news
> >articles appeared to be ~3K.  Using a 4K blocksize would be fairly efficient
> >under that condition.  

nraoaoc@nmt.edu (NRAO Array Operations Center) writes:
> Not if you have hundreds of tiny articles and a few giant ones which skew the
> average.

Which is indeed the case.  Most articles are less than 1536 bytes.  From
a snapshot of the news here:

size (bytes)     # articles    cumulative
-------------    ----------    ----------
1-512                   832           832
513-1024               8551          9383
1025-1536             10069         19452
1537-2048              6139         25591
2049-2560              3301         28892
2561-3072              1699         30591
3073-3584              1052         31643
3585-4096               734         32377
4097-4608               468         32845
4609-5120               316         33161
5121-5632               192         33353
5633-infinite          1513         34866

mean:	2603 bytes
median: 1300-1400 bytes, or somewhere around there

A 4K block size wastes about 40% of the disk.  Take my word for it, that's
what we're running here.
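
A rough estimate of the waste can also be derived from the snapshot itself. The sketch below is only an approximation: it takes the midpoint of each size bucket as the typical size, assumes files in the open-ended last bucket lose half a block each, and uses the reported mean to get the total data volume. All names are illustrative, and the result should not be expected to match a figure measured on a live disk.

```python
BLOCK = 4096

# (low, high, count) rows copied from the snapshot table.
BUCKETS = [
    (1, 512, 832), (513, 1024, 8551), (1025, 1536, 10069),
    (1537, 2048, 6139), (2049, 2560, 3301), (2561, 3072, 1699),
    (3073, 3584, 1052), (3585, 4096, 734), (4097, 4608, 468),
    (4609, 5120, 316), (5121, 5632, 192),
]
TAIL_COUNT = 1513   # articles of 5633 bytes or more
MEAN_BYTES = 2603   # reported mean article size


def total_files():
    return sum(count for _low, _high, count in BUCKETS) + TAIL_COUNT


def estimated_waste_fraction(block=BLOCK):
    """Estimated share of allocated space holding no data.
    `block` should be a multiple of 512 so that no bucket
    straddles a block boundary."""
    slack = 0.0
    for low, high, count in BUCKETS:
        midpoint = (low + high) / 2
        blocks = -(-high // block)  # same for every size in the bucket
        slack += count * (blocks * block - midpoint)
    slack += TAIL_COUNT * block / 2  # assumption: half a block lost per file
    data = total_files() * MEAN_BYTES
    return slack / (data + slack)
```

Calling `estimated_waste_fraction(1024)` gives the corresponding estimate for a 1K allocation unit, which makes the comparison with a Sun-style filesystem easy to repeat.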

It depends a LOT on the flavor of the newsfeed, too.  Articles in talk.*,
rec.* and soc.* have a smaller median size than articles in comp.* and
news.*.  Moderated groups have larger articles than non-moderated groups.

---
Tom Fitzgerald   Wang Labs        fitz@wang.com
1-508-967-5278   Lowell MA, USA   ...!uunet!wang!fitz

rg@gandp (Dick Gill) (05/14/91)

In article <1991May7.015756.9432@ims.alaska.edu> wisner@ims.alaska.edu (Bill Wisner) writes:
>But they make wonderful doorstops.
               ^^^^^^^^^^^^^^^^^^^^

I offer, instead, an NCR 32/200 to hold your door open; please
Fedex me the RS/6000 you are currently using for your door !-)

-- 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Dick Gill     Gill & Piette, Inc.             (703)761-1163  ..uunet!gandp!rg

  

deraadt@cpsc.ucalgary.ca (deraadt) (05/14/91)

In article <353@gandp> rg@gandp (Dick Gill) writes:
> In article <1991May7.015756.9432@ims.alaska.edu> wisner@ims.alaska.edu (Bill Wisner) writes:
> >But they make wonderful doorstops.
>               ^^^^^^^^^^^^^^^^^^^^
>
> I offer, instead, an NCR 32/200 to hold your door open; please
> Fedex me the RS/6000 you are currently using for your door !-)
You're making a big mistake..... :-)
--

SunOS 4.0.3: /usr/include/vm/as.h, Line 44      | Theo de Raadt
SunOS 4.1.1: /usr/include/vm/as.h, Line 49      | deraadt@cpsc.ucalgary.ca
Is it a typo? Should the '_'  be an 's'?? :-)   | deraadt@cpsc.ucalgary.ca