[comp.sys.mac] Inefficiency with DiskFit 1.5 during backups

jlc@atux01.UUCP (J. Collymore) (04/11/89)

I am having a small problem with DiskFit 1.5.  I use DiskFit 1.5 to back up my
entire 30Mb external hard disk every day.  Right now I am holding at around
20-21 Mb on the hard disk.  Optimally, this should require 27 or 28 disks, but
instead it is using about 31 disks.  I find that SOME disks, after doing
incrementals, have anywhere from 50K to 250K unused!  I thought DiskFit was
supposed to shuffle things around so that it would make maximum use of ALL
free space on the backup disks, but repeatedly since I began using DiskFit 2
years ago, this is NOT the case!

What I have to do periodically is trash the DiskFit Info file in the system
folder and start my backup FROM SCRATCH to ensure that each disk's space is
used optimally.  At the 21Mb level, this can free up anywhere from 1 to 3 disks
that I was otherwise using in my "SmartSet."  I wish I did NOT have to use this
method, because (with Write Verifies) this procedure takes 45 - 60 minutes for
21Mb!

If Fabian Ramirez, or anyone else working for SuperMac, reads this, please
tell me if there is something else I should be doing to make better use
of my disks, and if not, would SuperMac PLEASE do something to improve this
feature in the next release?

Thank you.

						Jim Collymore

chuq@Apple.COM (Chuq Von Rospach) (04/13/89)

>Optimally, this should require 27 or 28 disks, but
>instead it is using about 31 disks.  I find that SOME disks, after doing
>incrementals, have anywhere from 50K to 250K unused!  I thought DiskFit was
>supposed to shuffle things around so that it would make maximum use of ALL
>free space on the backup disks, but repeatedly since I began using DiskFit 2
>years ago, this is NOT the case!

Yes and no. Yes, DiskFit will try to use disks as efficiently as possible, but
no, it won't move files around unless they've changed. DiskFit won't
release a disk from a DiskFit set, either, so the size of the DiskFit
archive matches the high-water mark of your disk usage.

I've mentioned both limitations (they aren't problems or bugs -- that's the
way DiskFit is supposed to work) to Dantz, and have been told they'll look
into it. What I suggested was (1) to allow a migration option so that files
can move from the end of the DiskFit backup to lower numbered disks when
possible, and (2) to allow a floppy to be de-commissioned without having to
scrag the set and start over [although, if you read my recent posting on
floppy disks going bad out from under you, maybe that's not a bad idea every
few months....].

No bugs here -- just design features that are different than what you'd
want. 



Chuq Von Rospach       -*-      Editor,OtherRealms      -*-      Member SFWA
chuq@apple.com  -*-  CI$: 73317,635  -*-  Delphi: CHUQ  -*-  Applelink: CHUQ
      [This is myself speaking. No company can control my thoughts.]

USENET: N. A self-replicating phage engineered by the phone company to cause
computers to spend large amounts of their owners budget on modem charges.

dplatt@coherent.com (Dave Platt) (04/13/89)

In article <1122@atux01.UUCP> jlc@atux01.UUCP (J. Collymore) writes:

> I am having a small problem with DiskFit 1.5.  I use DiskFit 1.5 to
> back up my entire 30Mb external hard disk every day.  Right now I am
> holding at around 20-21 Mb on the hard disk.  Optimally, this should
> require 27 or 28 disks, but instead it is using about 31 disks.  I find
> that SOME disks, after doing incrementals, have anywhere from 50K to
> 250K unused!  I thought DiskFit was supposed to shuffle things around so
> that it would make maximum use of ALL free space on the backup disks,
> but repeatedly since I began using DiskFit 2 years ago, this is NOT the
> case!

Jim: you've just described a very difficult problem.  The job of packing
M objects (files) of different sizes into N "buckets" (diskettes), so
that the total space used/wasted is minimized, is one of a large class
of problems that are "NP-complete".  No one has ever designed an algorithm
that produces the "best" solution for any of these problems and runs in
less than "exponential" time.

It turns out that the only way known to find the "best" solution to one
of these problems is to examine _every_ alternative, and then pick the
best one... there's no way to prune the search-space down to a
reasonable size, unless you're willing to abandon hope for the "best"
solution and accept one that's merely "good".  The number of possible
alternative solutions goes up _very_ fast as the number of objects
increases.  Adding one file to the problem may double or triple the
number of alternatives that must be considered... or even worse!

I believe that calculating the optimal packing of as few as 300 files onto
30 diskettes would take your Mac longer than the remaining lifespan of the
Sun.  That doesn't even include the time needed to write the files to
disk ;-}.

If you compare the "best" solutions for a collection of M files, and
the best solution for M+1 files (i.e. add one new file, with all of the
others being unchanged), then you will frequently find that the
solutions look entirely different... all of the diskettes will have to
be rearranged to move from the M-file case to the M+1-file case.
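
In practice, backup programs settle for one of those "merely good" greedy
heuristics.  Purely as an illustration -- this is a textbook method, NOT
DiskFit's actual algorithm, and the 800K capacity and file sizes below are
made-up numbers -- here is a minimal first-fit-decreasing packer in C: sort
the files largest-first, then drop each one onto the first diskette that
still has room.  It runs in polynomial time and is known to need at most
roughly 11/9 of the optimal number of disks.

/* First-fit-decreasing bin packing -- illustrative only, not DiskFit's code.
 * Capacities and file sizes are hypothetical, in K. */
#include <stdio.h>
#include <stdlib.h>

#define DISK_CAPACITY 800L
#define MAX_DISKS     64

static int cmp_desc(const void *a, const void *b)
{
    long x = *(const long *)a, y = *(const long *)b;
    return (x < y) - (x > y);            /* sort largest first */
}

int main(void)
{
    long files[] = { 620, 410, 350, 300, 280, 200, 150, 90, 60, 40 };
    int  nfiles  = sizeof files / sizeof files[0];
    long freek[MAX_DISKS];               /* free space left on each diskette */
    int  ndisks = 0, i, d;

    qsort(files, nfiles, sizeof files[0], cmp_desc);

    for (i = 0; i < nfiles; i++) {
        for (d = 0; d < ndisks; d++)     /* first diskette with enough room */
            if (freek[d] >= files[i])
                break;
        if (d == ndisks)                 /* nothing fits: start a new disk  */
            freek[ndisks++] = DISK_CAPACITY;
        freek[d] -= files[i];
        printf("file of %3ldK -> disk %d\n", files[i], d + 1);
    }
    printf("%d disks used\n", ndisks);
    return 0;
}

A greedy packer along these lines typically lands within a few disks of
optimal on a real directory listing, which is roughly the sort of gap Jim is
describing -- the price of finishing the backup before the Sun burns out.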

No, DiskFit does not "shuffle around" files in the way that you're
thinking.  If a file has not been changed on your hard disk, then
DiskFit will _not_ move the backup-copy around from one disk to another
in order to permit tighter file-packing.  This could greatly increase
the amount of time that an incremental backup would require... in
effect, any incremental backup could take just as long as a full backup.

I'm not sure what algorithm DiskFit uses to decide how to group files
together.  I do know that it tends to place files from a specific folder
onto the same set of diskettes... this helps keep the number of HFS
directories on each diskette to a relatively small number (a Good Thing
for several reasons) and makes it somewhat easier to locate and restore
files by hand (from the Finder) if you should wish to do so.

DiskFit could perhaps pack files more tightly together if it were
willing to split even small files across multiple floppies.  Currently,
it splits files only if they're too large to fit on a single floppy.  I
would not wish to see this behavior change... it'd make manual file
restoration rather more of a pain.

> What I have to do periodically is trash the DiskFit Info file in the
> system folder and start my backup FROM SCRATCH to ensure that each
> disk's space is used optimally.  At the 21Mb level, this can free up
> anywhere from 1 to 3 disks that I was otherwise using in my "SmartSet."
> I wish I did NOT have to use this method, because (with Write Verifies)
> this procedure takes 45 - 60 minutes for 21Mb!

Doing a full-backup-from-scratch is a good idea occasionally, anyhow,
especially if you reinitialize your floppies first... this gives you an
additional guarantee that the floppies themselves are still good.

Is it so bad a thing to have a SmartSet that's 31 diskettes long, rather
than 27 or 28?  I figure that the slight inefficiency is costing you
perhaps $6 per SmartSet... hardly a backbreaking cost.

Re what you can do... not much.  In my experience, DiskFit seems to keep
the SmartSets within about 10% of being fully-utilized, even if I've
been doing incremental backups to the same set for weeks or months.  The
space-waste doesn't seem to rise above 15% unless I delete a large
number of files... and the space gets reused when I add new files to the
hard-disk.

On a more theoretical level... if you can devise an algorithm for any
NP-complete problem that will produce an optimal solution while
requiring only a polynomial-order amount of time and space, you'll be
famous... I mean that quite seriously.  If you can _prove_ that it
cannot be done, you'll be equally famous.  The P-versus-NP question
is to Computer Science what Fermat's Last Theorem is to mathematics.
-- 
Dave Platt    FIDONET:  Dave Platt on 1:204/444        VOICE: (415) 493-8805
  UUCP: ...!{ames,sun,uunet}!coherent!dplatt     DOMAIN: dplatt@coherent.com
  INTERNET:   coherent!dplatt@ames.arpa,  ...@uunet.uu.net 
  USNAIL: Coherent Thought Inc.  3350 West Bayshore #205  Palo Alto CA 94303

fozzard@boulder.Colorado.EDU (Richard Fozzard) (04/13/89)

In article <1122@atux01.UUCP> jlc@atux01.UUCP (J. Collymore) writes:
>I find that SOME disks, after doing
>incrementals, have anywhere from 50K to 250K unused!  I thought DiskFit was
>supposed to shuffle things around so that it would make maximum use of ALL
>free space on
>... would SuperMac PLEASE do something to improve this
>feature in the next release?

I second this motion - I am having the same problem.



========================================================================
Richard Fozzard
University of Colorado				"Serendipity empowers"
fozzard@boulder.colorado.edu

chuq@Apple.COM (Chuq Von Rospach) (04/14/89)

>I'm not sure what algorithm DiskFit uses to decide how to group files
>together.

I've taken a look at what DiskFit seems to be doing, and while I don't have
the exact algorithm, I can make some educated guesses:

o Use as few floppies as possible.
o Don't split files unless necessary.
o Try to group the older (not recently modified) files together, so you
  don't get into the "I grew 5K, grab a new floppy" shuffle. 
o Try to keep files in folders together.
o Try to put a file back on the same floppy it came from.

DiskFit also tries to leave a little free space on each floppy, for two
reasons: to allow a file to grow slightly without having to move it to a new
floppy (thereby reducing fragmentation), and to leave room for the Desktop
file that is created if you open the floppy in the Finder (DiskFit will
delete it when it sees it...).
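
Purely for illustration, here is a tiny C sketch of the shape those guesses
add up to: score every existing diskette for a file and pick the best, and
start a new diskette only when nothing fits.  The rule names, the weights,
and the 16K slack figure are my own assumptions, not anything published by
Dantz or SuperMac.

/* A toy placement rule modeled on the guesses above -- the priorities,
 * weights, and slack are illustrative assumptions, not DiskFit's code. */
#include <stdio.h>

#define SLACK 16L    /* K held back per floppy for growth / a Desktop file */

struct disk { long freek; };   /* free space on an existing SmartSet disk */

static long score(const struct disk *d, long filek,
                  int same_disk_as_last_backup, int folder_already_here)
{
    long s = 0;
    if (d->freek - SLACK < filek)
        return -1;                          /* doesn't fit: not a candidate */
    if (same_disk_as_last_backup) s += 100; /* put it back where it was     */
    if (folder_already_here)      s += 50;  /* keep folders together        */
    s -= (d->freek - filek) / 100;          /* mildly prefer a tighter fit  */
    return s;
}

int main(void)
{
    struct disk set[] = { { 120 }, { 430 }, { 760 } };
    long filek = 90, best_score = -1, sc;
    int  best = -1, i;

    for (i = 0; i < 3; i++) {
        /* pretend the file's old copy and its folder both live on disk 2 */
        sc = score(&set[i], filek, i == 1, i == 1);
        if (sc > best_score) { best_score = sc; best = i; }
    }
    if (best >= 0)
        printf("place the %ldK file on disk %d\n", filek, best + 1);
    else
        printf("no existing disk fits; start a new diskette\n");
    return 0;
}

The real program presumably weighs more factors than this, but the overall
shape -- score the existing diskettes, fall back to a new one only when
nothing fits -- is the usual way an incremental packer gets built.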

>DiskFit could perhaps pack files more tightly together if it were
>willing to split even small files across multiple floppies.

Except that one design aspect of DiskFit I *really* like is that all of the
floppies are Finder readable -- the only backup utility to do so. This means
that I can go get a file, even if DiskFit throws up and dies on me. After
losing some files to HFS Backup 1.0 due to corruption, I'll happily toss in
a couple of extra floppies for this feature.




Chuq Von Rospach       -*-      Editor,OtherRealms      -*-      Member SFWA
chuq@apple.com  -*-  CI$: 73317,635  -*-  Delphi: CHUQ  -*-  Applelink: CHUQ
      [This is myself speaking. No company can control my thoughts.]

USENET: N. A self-replicating phage engineered by the phone company to cause
computers to spend large amounts of their owners budget on modem charges.

Fabian@cup.portal.com (Fabian Fabe Ramirez) (04/14/89)

Jim,

Boy...Chuq and Dave sure gave you a lot to think about.  Thanks(!) for the
feedback, I'll pass it along to product management and to Dantz.

One thing to remember: DiskFit tries to avoid having to split files.  That's
one of its features -- Finder-compatible backup.  Just insert the particular
SmartSet diskette and copy the appropriate icon back to your hard disk.  And
yes, it's true, it would be neat if DiskFit would "release" unused diskettes,
but it doesn't.  That's part of the design.

One way, as you indicated, is to delete the DiskFit Info file; a better way
is to delete the SmartSet information via the SmartSet item under DiskFit's
Window menu.  This will cause DiskFit not to recognize the SmartSet and to
give you the option to Scan or start a new SmartSet.

During the rescan, suppose you have a 30-disk SmartSet and you know that 5
diskettes are empty.  When you get to the last 5 diskettes of your SmartSet
(26.whatever - 30.whatever), simply use the Missing button.  This tells
DiskFit that your SmartSet is actually 25 diskettes, and when DiskFit starts
the backup it will use those "empties".

Or, as Dave suggested, it never hurts to create a new SmartSet every now and
then.  As for the backup time, is there any particular reason you're using
the Verify Writes option?  It is an additional safeguard on top of DiskFit's
"normal" safeguards, which are themselves "extra" on other programs.

Again, thanks for the feedback!

Fabian Ramirez
SuperMac Technology

fabian@cup.portal.com
sun!cup.portal.com!fabian

kaz@nanovx.UUCP (Mike Kazmierczak) (04/15/89)

In article <1122@atux01.UUCP> jlc@atux01.UUCP (J. Collymore) writes:
>What I have to do periodically is trash the DiskFit Info file in the system
>folder and start my backup FROM SCRATCH to ensure that each disk's space is
>used optimally.  At the 21Mb level, this can free up anywhere from 1 to 3 disks
>that I was otherwise using in my "SmartSet."  I wish I did NOT have to use this
>method, because (with Write Verifies) this procedure takes 45 - 60 minutes for
>21Mb

When you delete the DiskFit Info file (and Scan all disks to rebuild the
set), tell the program that about 75% of your disks still exist and that the
last quarter are missing.  This will force reorganization onto as few disks
as possible.

Mike Kazmierczak, Ph.D. (and proud of it!)
kaz @ nanovx.uucp

canonical disclaimer applies...

Fabian@cup.portal.com (Fabian Fabe Ramirez) (04/15/89)

Greetings,

Here's a reply from Larry Zulch of Dantz Development Corp.

"Optimal use of backup disks must take into account other factros than raw
percentage utilization.  DiskFit employs a rather sophisticated algorithm
that weights quite a number of factors.  In rough order from greater to
lesser weighting, some of these factors include: keeping a file in Finder
format, utilizing existing space on the backup set, preventing folders from
being fragmented across backup disks, and minimizing disk insertions.

If files are removed from the source, DiskFit will remove them from the
backup set and reclaim the space.  It will not, however, move files from one
disk to another in order to release disks from the set--that would be slow
and require a great number of disk insertions or recopying.  Typically, the
amount of data stored on a hard disk ebbs and flows, so the released disks
would have to be added again.  Anyway, relative to data, disks are not
particularly expensive.

We have carefully analyzed the direct and resultant usage of DiskFit 
SmartSet disks and have spent a lot of time tuning the performance.  I do
not think you will find a faster, more reliable way to keep a virtual, or 
shadow, copy of your hard disk for backup purposes.

Larry Zulch
Dantz Development Corp."

gae@sphere.mast.ohio-state.edu (Gerald Edgar) (04/15/89)

jlc@atux01.UUCP (J. Collymore) writes:
]I find that SOME disks, after doing
]incrementals, have anywhere from 50K to 250K unused!  I thought DiskFit was
]supposed to shuffle things around so that it would make maximum use of ALL
]free space on
]... would SuperMac PLEASE do something to improve this
]feature in the next release?

fozzard@boulder.Colorado.EDU (Richard Fozzard) writes:
>I second this motion - I am having the same problem
>

Actually, I disagree.  If it can be done with no trade-off, fine.  But I doubt
it.  I would not want to reduce the size of my SmartSet by 10 percent if
it means an increase in the time taken by incremental backups.  I figure
I'll spend a few extra dollars for disks in return for the time saved.

-- 
  Gerald A. Edgar          
  Department of Mathematics                     TS1871@OHSTVMA.bitnet
  The Ohio State University                     gae@sphere.mast.ohio-state.edu
  Columbus, OH 43210   ...!{att,pyramid}!osu-cis!sphere.mast.ohio-state.edu!gae