[comp.unix.wizards] What, exactly, are stat.st_blocks, statfs.f_bsize?

jik@athena.mit.edu (Jonathan I. Kamens) (02/26/91)

  My "delete" application wants to be able to tell exactly how much space on a
disk is actually occupied by a file.  This involves two things:

1. Finding out how many disk blocks the file occupies.
2. Finding out the size of each block.

Now, on a 4.3BSD system, the stat structure contains the st_blocks field,
which tells the "actual number of blocks allocated."  Given that
description, the question becomes, what exactly is a "block?"  There are two
possible answers:

1. The size specified by DEV_BSIZE.
2. The size in the f_bsize field of the statfs structure of the filesystem on
   which the file resides.

Now, it seemed to me that f_bsize would be the logical choice, since different
filesystems can have different minimum block sizes, but some experimentation
indicates that actually, DEV_BSIZE is what's being used.  The 4.3reno stat(2)
man page goes even further; it describes st_blocks as "The actual number of
blocks allocated for the file in 512-byte units."  But that leaves me with
another question -- is it DEV_BSIZE, or 512 bytes?

  Besides all of these problems, we have the problem that some Unix
implementations (POSIX systems, in particular) don't even have an st_blocks
field in the stat structure, so all I've got to work with is st_size.

  So, I would like to ask if the following outlined method of determining the
actual space usage of a file on many different flavors of Unix is reliable:

#include <sys/param.h> /* for DEV_BSIZE */

int actual_bytes;
struct stat statbuf;

... assume statbuf is initialized ...

#ifdef ST_BLOCKS_EXISTS /* I'll define this myself, as necessary */
#ifdef DEV_BSIZE
actual_bytes = statbuf.st_blocks * DEV_BSIZE;
#else
actual_bytes = statbuf.st_blocks * 512;
#endif /* DEV_BSIZE */
#else /* ! ST_BLOCKS_EXISTS */
#ifdef DEV_BSIZE
actual_bytes = DEV_BSIZE * (statbuf.st_size / DEV_BSIZE + 
                            ((statbuf.st_size % DEV_BSIZE) ? 1 : 0));
#else
actual_bytes = statbuf.st_size;
#endif /* DEV_BSIZE */
#endif /* ST_BLOCKS_EXISTS */

  One final question: I thought that f_bsize was the minimum block size for a
filesystem, but when statfs()ing certain filesystems, I find it possible to
create a file that takes up much less space than what f_bsize says.  So, what
is f_bsize supposed to represent?

  Thanks for any help you can provide!
-- 
Jonathan Kamens			              USnail:
MIT Project Athena				11 Ashford Terrace
jik@Athena.MIT.EDU				Allston, MA  02134
Office: 617-253-8085			      Home: 617-782-0710

torek@elf.ee.lbl.gov (Chris Torek) (02/26/91)

In article <1991Feb25.205932.16587@athena.mit.edu> jik@athena.mit.edu
(Jonathan I. Kamens) writes:
>... on a 4.3BSD system, the stat structure contains the st_blocks field,
>which tells the "actual number of blocks allocated."  Given that
>description, the question becomes, what exactly is a "block?"  There are two
>possible answers:
>
>1. The size specified by DEV_BSIZE.
>2. The size in the f_bsize field of the statfs structure of the filesystem on
>   which the file resides.

The answer is `none of the above'.

>Now, it seemed to me that f_bsize would be the logical choice,

No: f_bsize is the `block' size and not the `fragment' size under
4.3BSD-reno, i.e., typically 8K rather than 1K.  (SunOS and 4BSD are
different here; SunOS defines f_bsize as the fragment size.)

>The 4.3reno stat(2) man page goes even further; it describes st_blocks
>as "The actual number of blocks allocated for the file in 512-byte units."
>But that leaves me with another question -- is it DEV_BSIZE, or 512 bytes?

It is 512 bytes; it does not matter what DEV_BSIZE is.  Under 4.3tahoe
on the Tahoe, DEV_BSIZE was 1024; 4.3reno has no DEV_BSIZE at all (well,
it has one as a compatibility hack) and each disk's block size is a
property of that disk.

Note that there may be (probably are) some systems out there in which
st_blocks is in terms of 1 kbyte blocks; these should dwindle away,
but will probably leave a lingering stench. :-)
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab EE div (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov

jik@athena.mit.edu (Jonathan I. Kamens) (02/26/91)

  Well, if there are systems that measure st_blocks in terms of 1k blocks, how
can I detect them in my source code?

  Assuming that it's always 512 bytes would leave me with the following code:

int actual_bytes;
struct stat statbuf;

... assume statbuf is initialized ...

#ifdef ST_BLOCKS_EXISTS
actual_bytes = statbuf.st_blocks * 512;
#else
actual_bytes = statbuf.st_size;
#endif /* ST_BLOCKS_EXISTS */

But this is going to lose on sites that have 1k blocks.  Is there any way to
detect them?

  And, on a historical note, what led to the decision to measure in terms of
512-byte blocks, and why do some sites measure in terms of 1k blocks instead?

-- 
Jonathan Kamens			              USnail:
MIT Project Athena				11 Ashford Terrace
jik@Athena.MIT.EDU				Allston, MA  02134
Office: 617-253-8085			      Home: 617-782-0710

rbj@uunet.UU.NET (Root Boy Jim) (03/01/91)

In article <10283@dog.ee.lbl.gov> torek@elf.ee.lbl.gov (Chris Torek) writes:

>[Reno is 512, Tahoe is 1024]
>Note that there may be (probably are) some systems out there in which
>st_blocks is in terms of 1 kbyte blocks; these should dwindle away,
>but will probably leave a lingering stench. :-)

Methinks the stench comes from POSIX, which gutlessly refused
to buck existing but ancient practice. I assume that's why BSD changed.

Oddly enough, this came at a time when 1K FS blocks were becoming
more common in System V. It is a fortunate coincidence that
2^10 ~= 10^3. Too important not to take advantage of.

I was delighted when Berkeley "defined" a "block" as 1K. No more
doubling or halving in one's head when trying to convert blocks to chars.

To make it worse, Pyramid's FS block sizes are 2K to 16K (yes, the
sectors are 2K), and so they report blocks in 2K increments. It is
rather sad to see filesystems quadruple in size when reported
between NFS partitions mounted to or from a Pyramid. Oh well...
-- 
		[rbj@uunet 1] stty sane
		unknown mode: sane

md@sco.COM (Michael Davidson) (03/02/91)

jik@athena.mit.edu (Jonathan I. Kamens) writes:

>  And, on a historical note, what led to the decision to measure in terms of
>512-byte blocks, and why do some sites measure in terms of 1k blocks instead?

I'm sure that, like most design decisions, it was essentially
arbitrary. It was, however, a very natural choice in the context
of the machines on which the early UNIX filesystems were implemented.
These machines had small disks with a physical sector size of 512 bytes
and small amounts of main memory. So, 512 bytes was a natural choice.

Try reading some of Ritchie and Thompson's papers on UNIX for an
introduction to the design philosophy (Bell System Technical Journal
for July-August 1978 is a good place to start).

A more interesting question is "why did it stay that way so long ..."

bzs@world.std.com (Barry Shein) (03/04/91)

>To make it worse, Pyramid's FS block sizes are 2K to 16K (yes, the
>sectors are 2K), and so they report blocks in 2K increments. It is
>rather sad to see filesystems quadruple in size when reported
>between NFS partitions mounted to or from a Pyramid. Oh well...

That's a real bug, I bet their NFS is based on an older LAI version.
There are two different values kept in the server, one for the local
unit and another for the external unit. They're confused in a few
places in the code, particularly in the call that does whatever it is
that "df" wants, I forget the NFS name for this op.

I've fixed this before in that code, it's just a matter of changing
the name of the struct element used in a few places to the correct
one.

-- 
        -Barry Shein

Software Tool & Die    | bzs@world.std.com          | uunet!world!bzs
Purveyors to the Trade | Voice: 617-739-0202        | Login: 617-739-WRLD

greywolf@unisoft.UUCP (The Grey Wolf) (03/05/91)

In article <1991Feb26.010146.27490@athena.mit.edu> jik@athena.mit.edu (Jonathan I. Kamens) writes:
# 
#   And, on a historical note, what led to the decision to measure in terms of
# 512-byte blocks, and why do some sites measure in terms of 1k blocks instead?
#

512 bytes seems to be the usual size of a physical sector on a disk
(as I have discovered the hard way via the /stand/diag stuff for a sun).
System V used 512-byte blocks in its filesystems from <insert generic
deity here>-knows-when until 1k and 8k filesystem blocks became
available.  (8k, I suspect, was for filesystems chock full of data, so that
it could be schlumped around at "reasonable" speed.)  And even after
that, the 512-byte block survived as a unit of measure.

Of course, there's one in every crowd:  Our pyramid's filesystems have
16k blocks and 2k fragments, and the disk itself is tuned to 2k physical
block size.  Go figure.

The reasoning behind figuring in terms of 1k blocks (or appearing to,
in the case of "du") was probably mathematical.  It might not take a whole
lot of effort to double or halve a number (especially if you make the machine
do it for you :-), but someone out there probably figured that Wouldn't
Life Be So Much Simpler If... and it went from there.

Someone out there probably has even more historical info than this.

It would break some things, but would anyone else out there find it useful
to have the stat structure contain the number of logical blocks and the
number of fragments, rather than/in addition to the number of physical
blocks?

Are fs_blocksize and fs_fragsize for a file system defined anywhere?
(probably somewhere in the superblock, but is there a system call to
return this information?  fsstat/statfs deal with the fundamental blocksize,
but they don't provide info about the fragment size.  Assuming 8:1
isn't always right.)

Is it just me or should more information about a filesystem be available?



-- 
# The days of the computer priesthood are not over.
# May they never be.
# If it sounds selfish, consider how most companies stay in business.

sas@shadow.pyramid.com (Scott Schoenthal) (03/08/91)

In article <BZS.91Mar3133828@world.std.com> bzs@world.std.com (Barry Shein) writes:
>
>>To make it worse, Pyramid's FS block sizes are 2K to 16K (yes, the
>>sectors are 2K), and so they report blocks in 2K increments. It is
>>rather sad to see filesystems quadruple in size when reported
>>between NFS partitions mounted to or from a Pyramid. Oh well...

There is nothing in the NFS protocol that specifies a required filesystem
or directory block size.  The NFS statfs response returns the "fundamental"
block size and the total and free # of blocks in the server's filesystem.

Some applications (e.g., OSx 'du') don't do the statfs() when calculating
# of blocks used.  If an application uses the local notion of device
block size, block calculations will be wrong when interacting with
a server with a different block size.

>That's a real bug, I bet their NFS is based on an older LAI version.

You would lose.  OSx NFS is based upon Sun NFSSRC with multi-processor,
scaling, and "dual universe" extensions.

>There are two different values kept in the server, one for the local
>unit and another for the external unit. They're confused in a few
>places in the code, particularly in the call that does whatever it is
>that "df" wants, I forget the NFS name for this op.

'df' does use statfs() (at least the Sun NFSSRC and OSx 'df') and ought to
work properly.  If not, send mail to bugs@pyramid.com  If the problem
is in our server code, it will get fixed in a timeframe relative to the
customer severity.

Pyramid has successfully participated in Sun NFS/ONC Connectathons
for several (>5) years.

/sas
----
Scott Schoenthal   			sas@shadow.pyramid.com
Pyramid Technology Corp.		{sun,hplabs,decwrl,uunet}!pyramid!sas

guy@auspex.auspex.com (Guy Harris) (03/12/91)

>There is nothing in the NFS protocol that specifies a required filesystem
>or directory block size.

It also doesn't specify the units to be used in the "blocks" field of
the "fattr" structure in an NFS GETATTR reply; this is extremely
unfortunate, as it led various vendors not to use 512-byte chunks as the
unit, and therefore causes users of programs running on other machines
to be unpleasantly surprised when said programs assume, incorrectly as
it turns out, that the "st_blocks" result of a "stat()" is in units of
512-byte chunks.

Given that S5R4 and 4.3-reno both specify, in the documentation, that
"st_blocks" is in units of 512-byte chunks, a convention needs to be
specified - either in the NFS protocol, or in some kind of side notes to
it - ensuring that (modern UNIX) clients can arrange to report
"st_blocks" in those units.

Given that most (modern UNIX) clients probably just use what they get
back from the server in the "blocks" field, the most appropriate
convention would probably be to say "'blocks' is in units of 512-byte
chunks, regardless of what the block or fragment size of the underlying
file system, or the disk block size, is."

>Some applications (e.g., OSx 'du') don't do the statfs() when calculating
># of blocks used.  If an application uses the local notion of device
>block size, block calculations will be wrong when interacting with
>a server with a different block size.

*Lots* of applications on *non*-Pyramid systems don't do the "statfs()"
when calculating # of blocks used from the "st_blocks" field; I suspect,
in fact, most applications on most systems don't.

rbj@uunet.UU.NET (Root Boy Jim) (03/12/91)

In article <147432@pyramid.pyramid.com> sas@shadow.pyramid.com (Scott Schoenthal) writes:
>In article <BZS.91Mar3133828@world.std.com> bzs@world.std.com (Barry Shein) writes:
>>
>There is nothing in the NFS protocol that specifies a required filesystem
>or directory block size.  The NFS statfs response returns the "fundamental"
>block size and the total and free # of blocks in the server's filesystem.

No, but there is existing practice, and there are the Connectathons
you mentioned below. Didn't you notice that your numbers were
different locally than across the network? Didn't it bother you?

>Some applications (e.g., OSx 'du') don't do the statfs() when calculating
># of blocks used.  If an application uses the local notion of device
>block size, block calculations will be wrong when interacting with
>a server with a different block size.

DU shouldn't use statfs. It can cross filesystems.

So what's left? Lowest common denominator, 512 byte blocks.
I would rather see 1K "Blocks" regardless of actual size.

BTW, kudos for making your sector sizes 2k and allowing 16k blocks.

>'df' does use statfs() (at least the Sun NFSSRC and OSx 'df') and ought to
>work properly.  If not, send mail to bugs@pyramid.com  If the problem
>is in our server code, it will get fixed in a timeframe relative to the
>customer severity.

DF also prints in kilobytes, as does ls, as does du, as does sum.
Here again (sum), y'all took 'blocks' literally, and print a different
result than other BSD systems.

>Pyramid has successfully participated in Sun NFS/ONC Connectathons
>for several (>5) years.

Yes, I note that you were one of the first. However, why don't
you have a lock daemon, and is your code the latest version?
-- 
		[rbj@uunet 1] stty sane
		unknown mode: sane

richard@aiai.ed.ac.uk (Richard Tobin) (03/12/91)

In article <6558@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
>>There is nothing in the NFS protocol that specifies a required filesystem
>>or directory block size.

>It also doesn't specify the units to be used in the "blocks" field of
>the "fattr" structure in an NFS GETATTR reply; this is extremely
>unfortunate,

Particularly given that it appears to specify it; the obvious
interpretation of the protocol specification is that it's in units of
"blocksize":

 "'blocksize' is the size in bytes of a block of the file ... 'blocks'
  is the number of blocks the file takes up on disk"

  [NFS: Version 2 Protocol Specification, reproduced in the SunOS
   4.1 documentation]

It would take a mind-reader to guess that the two uses of "block" in
one sentence had different meanings, and that the first use in fact
meant "the optimal transfer size".

>Given that most (modern UNIX) clients probably just use what they get
>back from the server in the "blocks" field, the most appropriate
>convention would probably be to say "'blocks' is in units of 512-byte
>chunks, regardless of what the block or fragment size of the underlying
>file system, or the disk block size, is."

This does seem the best solution.  Fortunately, disk block sizes are
usually a multiple of 512 bytes, so the space occupied can be reported
accurately.

-- Richard
-- 
Richard Tobin,                       JANET: R.Tobin@uk.ac.ed             
AI Applications Institute,           ARPA:  R.Tobin%uk.ac.ed@nsfnet-relay.ac.uk
Edinburgh University.                UUCP:  ...!ukc!ed.ac.uk!R.Tobin