[net.unix-wizards] UNIX 4.2 thrashing - the cause?

long@ittvax.UUCP (H. Morrow Long [Systems Center]) (01/21/85)

> A while ago, our VAX 780 showed disastrous performance.
> It couldn't even echo keystrokes from the terminal, let alone do
> any real work.  'uptime' reported 14 users, load average about 4.5.
> All I could do was halt the CPU (^P) and reboot.
> As far as I can guess, it was thrashing, a rare occurrence.
> If you have any experience, please send me relevant information,
> such as the cause you found, the remedy, etc.
> Here is the disk usage for reference (any hints?).
> 
> Filesystem    kbytes    used   avail capacity  Mounted on
> /dev/hp0a       7421    6519     159    98%    /
> /dev/hp1h     137616  120835    3019    98%    /usr
> /dev/hp1g      74691   65354    1868    97%    /usr/spool
> /dev/hp2h     137616  120775    3079    98%    /va <- user area
> /dev/hp0g      38639   34747      28   100%    /vb <- user area
> /dev/hp2a       7429      30    6656     0%    /tmp
> /dev/hp2g      74691   62096    5126    92%    /UDS <- utilities
> -- 

We have also experienced the problem described above.  Here is what I found
out:

	From "A Fast File System for UNIX*", Revised July 27, 1983,
	by Marshall Kirk McKusick, William N. Joy, Samuel J. Leffler, and
	Robert S. Fabry, CSRG, UCB.  Unix Programmer's Manual, Vol 2c.


	"In order for the layout policies to be effective, the disk cannot be
	kept completely full.  Each file system maintains a parameter that
	gives the minimum acceptable percentage of file system blocks that can
	be free.  If the number of free blocks drops below this level, only
	the system administrator can continue to allocate blocks.  The value of
	this parameter can be changed at any time, even when the file system is
	mounted and active.  The transfer rates to be given in section 4 were
	measured on file systems kept less than 90% full.  If the reserve of
	free blocks is set to zero, the file system throughput rate tends to be
	cut in half, because of the inability of the file system to localize
	the blocks in a file.  If the performance is impaired because of
	overfilling, it may be restored by removing enough files to obtain 10%
	free space.  Access speed for files created during periods of little
	free space can be restored by recreating them once enough space is
	available............"


I believe there is a lesson here.  4.2bsd sites should try to keep all
filesystems below 90% full (especially those where a great deal of creation
and deletion takes place daily - /usr, /usr/spool) or suffer degradation.

This is our configuration before and after heeding this advice.

Before:
Filesystem    kbytes    used   avail capacity  Mounted on
/dev/hp0a       7421    6449     229    97%    /
/dev/hp2a       7415     308    6365     5%    /tmp
/dev/hp2g     208595  124922   62813    67%    /u
/dev/hp2h     140564  108384   18124    86%    /ittvax
/dev/hp3g     208595  181990    5745    97%    /usr
/dev/hp3h     140564  115800   10707    92%    /psc
/dev/hp0e      26223   24036       0   102%    /usr/src

After:
Filesystem    kbytes    used   avail capacity  Mounted on
/dev/hp0a       7421    5660    1018    85%    /
/dev/hp2a       7415     425    6248     6%    /tmp
/dev/hp2g     208595  124922   62813    67%    /u
/dev/hp2h     140564  108397   18111    86%    /ittvax
/dev/hp3g     208595  137287   50448    73%    /usr
/dev/hp3h     140564  115808   10700    92%    /psc
/dev/hp0d       7421    4605    2073    69%    /sys
/dev/hp0e      26223   24036       0   102%    /usr/src
/dev/hp0f     102899   42003   50606    45%    /usr/local


-- 

				H. Morrow Long
				ITT-ATC Systems Center, Shelton, CT
	
path = {allegra bunker ctcgrafx dcdvaxb dcdwest ucbvax!decvax duke eosp1
	ittral lbl-csam milford mit-eddie psuvax1 purdue qubix qumix 
	research sii supai tmmnet twg uf-cgrl wxlvax yale}!ittvax!long

jhhur@kaist.UUCP (Hur, Jinho) (01/21/85)

A while ago, our VAX 780 showed disastrous performance.
It couldn't even echo keystrokes from the terminal, let alone do
any real work.  'uptime' reported 14 users, load average about 4.5.
All I could do was halt the CPU (^P) and reboot.
As far as I can guess, it was thrashing, a rare occurrence.
If you have any experience, please send me relevant information,
such as the cause you found, the remedy, etc.
Here is the disk usage for reference (any hints?).

Filesystem    kbytes    used   avail capacity  Mounted on
/dev/hp0a       7421    6519     159    98%    /
/dev/hp1h     137616  120835    3019    98%    /usr
/dev/hp1g      74691   65354    1868    97%    /usr/spool
/dev/hp2h     137616  120775    3079    98%    /va <- user area
/dev/hp0g      38639   34747      28   100%    /vb <- user area
/dev/hp2a       7429      30    6656     0%    /tmp
/dev/hp2g      74691   62096    5126    92%    /UDS <- utilities
-- 
real:	Hur, Jinho	Dept of Computer Science, KAIST
uucp:	..!hplabs!kaist!jhhur
csnet:	jhhur%kaist.csnet

john@genrad.UUCP (John Nelson) (01/22/85)

In article <1608@ittvax.UUCP> long@ittvax.UUCP (H. Morrow Long [Systems Center]) writes:
>> Filesystem    kbytes    used   avail capacity  Mounted on
>> /dev/hp0a       7421    6519     159    98%    /
>> /dev/hp1h     137616  120835    3019    98%    /usr
>> /dev/hp1g      74691   65354    1868    97%    /usr/spool
>> /dev/hp2h     137616  120775    3079    98%    /va <- user area
>> /dev/hp0g      38639   34747      28   100%    /vb <- user area
>> -- 
>	The transfer rates to be given in section 4 were
>	measured on file systems kept less than 90% full.  If the reserve of
>	free blocks is set to zero, the file system throughput rate tends to be
>	cut in half, because of the inability of the file system to localize
>	the blocks in a file.

I think that you missed the fact that the available free space reported by
4.2 df ALREADY TAKES the 10% into account when reporting available blocks
and capacity used.  Note that even on the 100% partition, used is still well
below kbytes.  This 10% is not hardwired; it is settable with tunefs(8).
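
For reference, the reserve can be inspected and changed roughly like this
(a sketch only; the device name is borrowed from the quoted listing, and the
5% figure is just an example, not a recommendation):

	# show the current free-space reserve (the "minfree" figure in the
	# superblock); the exact output format varies between releases
	dumpfs /dev/hp1h | grep minfree

	# lower the reserve on that filesystem to 5%
	tunefs -m 5 /dev/hp1h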

west@sdcsla.UUCP (Larry West) (01/23/85)

In article <1608@ittvax.UUCP> long@ittvax.UUCP (H. Morrow Long
[Systems Center]) writes {this is severely edited!}:
 > > A while ago, our VAX 780 showed disastrous performance.
 > > As far as I can guess, it was thrashing, a rare occurrence.
 > > Here is the disk usage for reference (any hints?).
 > > 
 > > Filesystem    kbytes    used   avail capacity  Mounted on
 > > /dev/hp0a       7421    6519     159    98%    /
 > > /dev/hp2h     137616  120775    3079    98%    /va <- user area
 > > /dev/hp0g      38639   34747      28   100%    /vb <- user area
 >
 >We have also experienced the problem described above.  Here is what I found
 >out:
 >	"In order for the layout policies to be effective, the disk cannot be
 >	kept completely full.  Each file system maintains a parameter that
 >	gives the minimum acceptable percentage of file system blocks that can
 >	be free. 
 >
 >I believe there is a lesson here.  4.2bsd sites should try to keep all
 >filesystems below 90% full (especially those where a great deal of creation
 >and deletion takes place daily - /usr, /usr/spool) or suffer degradation.
 >

From the 4.2bsd manual on "df" (which both previous contributors were
using to determine the "fullness" of their disks):

	Note that used+avail is less than the amount of space in the
	file system (kbytes); this is because the system reserves a
	fraction of the space in the file system to allow its file
	system allocation routines to work well.  The amount reserved
	is typically about 10%;

So a "df" listing saying 100% really means (ignoring tunefs(8)) about 90% of
the raw capacity.
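
To make that concrete, here is the arithmetic for the 100% /vb line in the
quoted listing, using my reading of how df computes the capacity column
(treat the formula as an approximation, not gospel):

	# kbytes, used, and avail for /dev/hp0g, copied from the quoted output
	echo "38639 34747 28" | awk '{
		kbytes = $1; used = $2; avail = $3
		# capacity as df shows it: used over what users may allocate
		printf "df capacity  = %.0f%%\n", 100 * used / (used + avail)
		# fullness relative to the whole partition
		printf "raw fullness = %.0f%%\n", 100 * used / kbytes
	}'
	# prints 100% and 90% respectively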

-- 

--|  Larry West, UC San Diego, Institute for Cognitive Science
--|  UUCP:	{decvax!ucbvax,ihnp4}!sdcsvax!sdcsla!west
--|  ARPA:	west@NPRDC	{ NOT: <sdcsla!west@NPRDC> }

richl@daemon.UUCP (Rick Lindsley) (01/23/85)

I don't think you understand...the file systems, in the example you
gave, ARE only 90% full.  If you add used and avail, the sum does not
come to the full partition size (less the superblock overhead).  4.2 has
ALREADY reserved that 10% for you, and if you have a relatively inactive
file system which you are just using for convenient storage, where
efficiency is not as big a concern as available space, it may be worth
your while to regain that space for yourself.  Check out tunefs(8).
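
In other words, on a partition used purely for storage something like the
following trades away the locality benefit to get the reserved space back
(the device name is only an example):

	# hand a purely archival filesystem its reserved 10% back;
	# expect slower, more scattered allocation once it really fills up
	tunefs -m 0 /dev/hp2h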

This is not to say your solution is incorrect -- any time a disk is
nearly full you will suffer a slight but increasing degradation as files
fragment hither and yon.  It IS better to spread things around.  But your
view of the 10% threshold is not entirely correct.

Rick Lindsley

bloom%ucbshadow@Berkeley (Jim Bloom) (01/24/85)

The 10% is automatically left free for you by the system.  If one
examines the figures below, the sum of "used" and "avail" equals
90% of the actual disk space listed under "kbytes".  Leaving another 10%
free won't hurt performance, but I don't think it buys much further
improvement in the filesystems.  One probably wants some extra space
(1-2 Mbytes) free just for normal fluctuations.
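
A crude way to notice before that cushion disappears is to run something
like this out of cron (purely hypothetical -- the 2000-kbyte threshold is
arbitrary):

	# complain about any filesystem with less than ~2 Mbytes available;
	# field 4 of df output is "avail", field 6 is the mount point
	df | awk 'NR > 1 && $4 < 2000 { print $6 ": only " $4 " kbytes free" }'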

					Jim Bloom
					ucbvax!bloom
					bloom@ucb-vax.arpa
> 
> Filesystem    kbytes    used   avail capacity  Mounted on
> /dev/hp0a       7421    6519     159    98%    /
> /dev/hp1h     137616  120835    3019    98%    /usr
> /dev/hp1g      74691   65354    1868    97%    /usr/spool
> /dev/hp2h     137616  120775    3079    98%    /va <- user area
> /dev/hp0g      38639   34747      28   100%    /vb <- user area
> /dev/hp2a       7429      30    6656     0%    /tmp
> /dev/hp2g      74691   62096    5126    92%    /UDS <- utilities
> -- 

preece@ccvaxa.UUCP (01/25/85)

>	I believe there is a lesson here.  4.2bsd sites should try to keep all
>	filesystems below 90% full (especially those where a great deal of
>	creation and deletion takes place daily - /usr, /usr/spool) or suffer
>	degradation.
----------
You don't really have to try very hard to keep your filesystems under 90%
full -- 4.2 will keep normal users from pushing them past that point.
You're missing the fact that the df listing is giving you percentages
relative to a 90% ceiling, NOT relative to total capacity.  Note that
a filesystem shown as 100% full and with 0 blocks available will still
have about 10% fewer blocks in the used column than in the kbytes column.
Superusers are able to go beyond the 90% limit, and df will obediently
report that up to about 110% of the filesystem is being used.
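
The 102% /usr/src line earlier in the thread is an example of exactly that;
the arithmetic, assuming the default 10% reserve and my reading of the df
formula, works out like this:

	# /dev/hp0e from the earlier listing: 26223 kbytes total, 24036 used;
	# ordinary users can only allocate ~90% of kbytes, so that is df's ceiling
	echo "26223 24036" | awk '{
		kbytes = $1; used = $2
		printf "df capacity = %.0f%%\n", 100 * used / (0.90 * kbytes)
	}'
	# prints "df capacity = 102%"; a partition filled to the last block
	# by root would show 100/0.90, i.e. just over 110%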

Still, it's probably better to try to keep from pushing the 90% (shown
as 100%) level.

scott preece
ihnp4!uiucdcs!ccvaxa!preece