[comp.sys.ibm.pc] confused: text files waste space? How to correct it?

deng@shire (Mingqi Deng) (02/15/89)

While using NC and Norton utility programs, I found that disk space
was used very inefficiently by ASCII files (nearly 40% wasted).

Both NC and Norton showed that some of my text files occupied up to
78% more space than their 'real size' (as shown by the DIR command).
(What was actually shown is the space used by all the files in a
directory. Norton's FS gives an explicit report of the percentage of
slack. I worked the figure out myself for NC.)

I am confused. I know each sector is 512 bytes (256? The real figure
is not very important here.) in DOS 3.3, and any file whose size is not
a multiple of 512 bytes will leave the last sector allocated to it
partially empty. But this only increases the space used by at most
about 500 bytes. My files are not just 200 bytes in size. That is why
I am puzzled.

One reasonable explanation I can think of is the following. The text
file could have been created over many editing sessions, and DOS
simply allocates additional sectors wherever new text has been
inserted, rather than rewriting the whole file to optimize space.
This is a trade-off between speed and space.

But for large files, this is certainly a big waste. Does anybody know
how to 'compact' the disk space used by text files?

Thanks.

Mingqi 

deng@shire.cs.psu.edu
deng@psuvaxs.bitnet
deng@psuvaxs.UUCP

silver@eniac.seas.upenn.edu (Andy Silverman) (02/16/89)

In article <4295@psuvax1.cs.psu.edu> deng@shire (Mingqi Deng) writes:
>While using NC and Norton utility programs, I found that disk space
>was used very inefficiently by ASCII files (nearly 40% wasted).
>
>Both NC and Norton showed that some of my text files occupied up to
>78% more space than their 'real size' (as shown by the DIR command).
>(What was actually shown is the space used by all the files in a
>directory. Norton's FS gives an explicit report of the percentage of
>slack. I worked the figure out myself for NC.)
>
>I am confused. I know each sector is 512 bytes (256? The real figure
>is not very important here.) in DOS 3.3, and any file whose size is not
>a multiple of 512 bytes will leave the last sector allocated to it
>partially empty. But this only increases the space used by at most
>about 500 bytes. My files are not just 200 bytes in size. That is why
>I am puzzled.
>
Well, the fallacy here is that while a sector in DOS does indeed contain
512 bytes, all files are allocated in minimum units of "clusters."  A
cluster on a floppy is usually 2 sectors (1K of data), and hard drives
have cluster sizes ranging from 4 to 16 sectors (2K to 8K), depending on
the size of the drive.  So while you may have a text file that's only 15
bytes, it will take up a minimum of 1K on a floppy, or 2K or even 8K on
a hard disk.  Norton's program reports on "slack," which is the
difference between a file's true size and the amount of space it takes
up on the drive (determined by cluster size).

One way to make text files take up less drive space is to use a program
like ARC or PKPAK to combine several text files into one large library
file, so that only that one file carries any slack.  The program also
compresses the contents to reduce disk usage even more.
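
To make the cluster arithmetic above concrete, here is a minimal Python
sketch (not part of the original posts). It rounds a file size up to a
whole number of clusters and reports the slack for a set of files. The
function names and the list of file sizes are made up for illustration,
and reporting slack as a percentage of the allocated space is an
assumption; Norton's exact formula is not given in this thread.

```python
SECTOR = 512  # bytes per sector, as stated in the thread


def allocated_bytes(file_size, sectors_per_cluster):
    """Space a file occupies when it is given whole clusters only."""
    cluster = SECTOR * sectors_per_cluster
    # Round up to the next multiple of the cluster size.
    # Assumption: an empty file owns no clusters at all.
    return (file_size + cluster - 1) // cluster * cluster


def slack_percent(file_sizes, sectors_per_cluster):
    """Slack as a percentage of the space allocated to all the files.

    Assumption: this is only one plausible way to define the figure.
    """
    used = sum(file_sizes)
    allocated = sum(allocated_bytes(s, sectors_per_cluster)
                    for s in file_sizes)
    return 100.0 * (allocated - used) / allocated


# The 15-byte file from the post on 2-, 4- and 16-sector clusters.
for spc in (2, 4, 16):
    print(spc, "sectors/cluster:", allocated_bytes(15, spc), "bytes")

# Hypothetical directory of small text files on a 4-sector (2K) disk.
sizes = [300, 1200, 2500, 700, 5000]
print("slack: %.1f%%" % slack_percent(sizes, 4))
```

With these made-up sizes, the five files hold 9700 bytes of text but
occupy 16384 bytes on disk, so roughly 40% of the allocated space is
slack even though no single file is tiny.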

Andy Silverman
Internet: silver@eniac.seas.upenn.edu
CompuServe: 72261,531

hollen@spot.megatek.uucp (Dion Hollenbeck) (02/17/89)

From article <4295@psuvax1.cs.psu.edu>, by deng@shire (Mingqi Deng):
> While using NC and Norton utility programs, I found that disk space
> was used very inefficiently by ASCII files (nearly 40% wasted).
> 
> [...stuff deleted...]
> 
> I am confused. I know each sector is 512 bytes (256? The real figure
> is not very important here.) in DOS 3.3, and any file whose size is not
> a multiple of 512 bytes will leave the last sector allocated to it
> partially empty. But this only increases the space used by at most
> about 500 bytes. My files are not just 200 bytes in size. That is why
> I am puzzled.
> 
One factor you have not taken into account is allocation unit size.
DOS does not allocate one sector at a time, but one allocation unit at
a time.  The size of the allocation unit depends on the size of the
disk (the bigger the disk, the bigger the allocation unit); otherwise,
the standard File Allocation Table would not be big enough to map large
disks.  Because of this scheme, a file containing only 1 byte could take
up as much as 4096 bytes on a very large disk.  Try increasing your file
size in steps of 512 bytes and see when the next jump in the space
actually allocated happens; the size of that jump is the allocation
unit size on your disk.
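
The experiment described above can be sketched in Python (not part of
the original posts). Since the real disk is not available here,
make_disk() is a hypothetical stand-in that models a disk handing out
whole allocation units and nothing else; probe_allocation_unit() then
grows a file 512 bytes at a time and reports the size of the first jump
in allocated space. Both names are made up for illustration.

```python
SECTOR = 512  # bytes per sector, as in the thread


def make_disk(cluster_bytes):
    """Return a function giving the space used by a file of a given size.

    Hypothetical model: whole allocation units only, no other overhead.
    """
    def allocated(file_size):
        clusters = (file_size + cluster_bytes - 1) // cluster_bytes
        return clusters * cluster_bytes
    return allocated


def probe_allocation_unit(allocated, max_steps=128):
    """Grow a file one sector at a time and watch for the jump."""
    size = 1
    first = allocated(size)  # space taken by a 1-byte file
    for _ in range(max_steps):
        size += SECTOR
        now = allocated(size)
        if now > first:
            # Each step adds only one sector, so the first jump
            # is exactly one allocation unit.
            return now - first
    return None  # no jump seen within max_steps


disk = make_disk(4096)  # the "very large disk" case from the post
print(probe_allocation_unit(disk))
```

On a real disk you would replace the simulated function with whatever
tool reports the space a file actually occupies.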

	Dion Hollenbeck             (619) 455-5590 x2814
	Megatek Corporation, 9645 Scranton Road, San Diego, CA  92121

                                seismo!s3sun!megatek!hollen
                                ames!scubed/

simon@ms.uky.edu (Simon Gales) (02/22/89)

In article <495@megatek.UUCP> hollen@spot.megatek.uucp (Dion Hollenbeck) writes:
> ...
>Try increasing
>your file sizes by multiples of 512 and see when the next jump in actual
>size allocated happens and you should then know what the allocation
>unit size is on your disk.
>

Just run chkdsk; it will tell you your cluster (allocation unit) size.
I have a 40-meg hard disk running under DOS 4, and the cluster size is 2K.

-- 
/------------------------------------------------------------------------\
  Simon Gales@University of Ky
  {rutgers, uunet}!ukma!simon  -  simon@ms.uky.edu  -  simon@UKMA.BITNET