[comp.lang.pascal] Fitting files on disk

NORM%IONAACAD.BITNET@cunyvm.cuny.edu ( Norman Walsh) (03/14/91)

Hello,

I have a question that I cannot answer sufficiently for myself:

Given a collection of n files (.zip files in my case) is there an
algorithm (short of brute-force) for determining the best arrangement
of files to fit on as few disks (of a given size) as possible?

Suppose, for example, I have 40 files that range in size from 30 to
400 kb.  I want to move them all to floppies but I want to do so in
such a way as to use as few floppies as possible.  I want each file to
be complete on one disk, i.e. I don't want to split files across
disks.

My first-guess algorithm is to always place the largest file that will
fit on a particular disk on that disk.  However, I have my suspicions
that this is not the guaranteed best way of doing it.  Unfortunately,
I can't think of a better way, nor can I demonstrate mathematically that
the biggest-first algorithm is the best...(or not)

Any thoughts?

                                                  ndw

forbesmc@matai.vuw.ac.nz (03/15/91)

In article <26272@adm.brl.mil>, NORM%IONAACAD.BITNET@cunyvm.cuny.edu ( Norman Walsh) writes:
> Hello,
> 
> I have a question that I cannot answer sufficiently for myself:
> 
> Given a collection of n files (.zip files in my case) is there an
> algorithm (short of brute-force) for determining the best arrangement
> of files to fit on as few disks (of a given size) as possible?
> 
> Suppose, for example, I have 40 files that range in size from 30 to
> 400 kb.  I want to move them all to floppies but I want to do so in
> such a way as to use as few floppies as possible.  I want each file to
> be complete on one disk, i.e. I don't want to split files across
> disks.
> 
> My first-guess algorithm is to always place the largest file that will
> fit on a particular disk on that disk.  However, I have my suspicions
> that this is not the guaranteed best way of doing it.  Unfortunately,
> I can't think of a better way, nor can I demonstrate mathematically that
> the biggest-first algorithm is the best...(or not)
> 
> Any thoughts?
> 
>                                                   ndw
I think your first guess is right: keep a list of the files in order
of descending size, pick the largest file and write it to the disk,
find out how much space is left on the disk and put the largest
remaining file that will fit onto it.  Repeat until no more files
will fit on the disk, then repeat the entire procedure for the
next disk, and so on.  (This method is used to minimise the queuing
time for printers and the like, so it should work for minimising the
number of disks.)
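
For illustration, here is a minimal Turbo Pascal sketch of that greedy
approach.  Everything in it is an assumption for the example (the
constants, the type and procedure names, and a nominal 1.44 Mb = 1440 kb
diskette), and size[1..n] is expected to be sorted into descending order
already:

const
  MaxFiles = 100;
  DiskSize = 1440;                      { capacity in kb, assumed }
type
  SizeArray = array[1..MaxFiles] of longint;
  DiskArray = array[1..MaxFiles] of integer;

{ Repeatedly put the largest unplaced file that still fits onto the }
{ current disk; when nothing more fits, start a new disk.  Files    }
{ larger than DiskSize are left with disk[i] = 0.                   }
procedure PackLargestFit(var size: SizeArray; n: integer;
                         var disk: DiskArray; var disksUsed: integer);
var
  i, d: integer;
  free: longint;
  placedAny: boolean;
begin
  for i := 1 to n do disk[i] := 0;      { 0 = not yet placed }
  d := 0;
  disksUsed := 0;
  repeat
    d := d + 1;                         { open a new disk }
    free := DiskSize;
    placedAny := false;
    for i := 1 to n do                  { descending order: largest first }
      if (disk[i] = 0) and (size[i] <= free) then
      begin
        disk[i] := d;
        free := free - size[i];
        placedAny := true
      end;
    if placedAny then disksUsed := d
  until not placedAny
end;

Because the sizes are pre-sorted, scanning them in order means the largest
file that still fits on the current disk is always the one placed next.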

-- 
+------------------------------------------------------------------+
| I'm a toxophilite,         | Murray Forbes,                      |
| so what's your problem?    | Physics Department,                 |
|----------------------------| Victoria University of Wellington,  |
| standard disclaimer :      | New Zealand.                        |
| all opinions here are mine | FORBESMC@MATAI.VUW.AC.NZ            |
+------------------------------------------------------------------+

ts@uwasa.fi (Timo Salmi) (03/15/91)

In article <26272@adm.brl.mil> NORM%IONAACAD.BITNET@cunyvm.cuny.edu ( Norman Walsh) writes:
>
>Given a collection of n files (.zip files in my case) is there an
>algorithm (short of brute-force) for determining the best arrangement
>of files to fit on as few disks (of a given size) as possible?
:
>Any thoughts?

Your problem is an operations research problem known as the cutting stock
(or bin packing) problem.  You could conduct a literature search using these
keywords, or something near them.  I'm not personally familiar with
the exact algorithms, but I know that the task has been known and
studied for a long time in the O.R. literature.  Perhaps readers of
sci.math.num-analysis would have some further information on this. 

...................................................................
Prof. Timo Salmi        
Moderating at garbo.uwasa.fi anonymous ftp archives 128.214.12.37
School of Business Studies, University of Vaasa, SF-65101, Finland
Internet: ts@chyde.uwasa.fi Funet: gado::salmi Bitnet: salmi@finfun

mcastle@mcs213e.cs.umr.edu (Mike Castle {Nexus}) (03/15/91)

In article <26272@adm.brl.mil> NORM%IONAACAD.BITNET@cunyvm.cuny.edu ( Norman Walsh) writes:
>My first-guess algorithm is to always place the largest file that will
>fit on a particular disk on that disk.  However, I have my suspicions
>that this is not the guaranteed best way of doing it.  Unfortunately,
>I can't think of a better way, nor can I demonstrate mathematically that
>the biggest-first algorithm is the best...(or not)
>
>Any thoughts?

Amazingly enough, I happen to have to write a program that does something
very similar to this (putting objects into backpacks) for my data
structures class.  Of course, I haven't started on it yet.

Starting off with the biggest file IS one method (a heuristic method, as my
prof said).  The method we have to use goes through all the possible
packing combinations, reporting which one will work (well, we stop tracing a
combination back once it becomes apparent that it won't work).  I believe
the complexity is O(n^2) once you apply dynamic programming in addition
to the divide-and-conquer method, but I'm not sure (too lazy to look it up
in my notes :-).  Maybe when I get done with that program, I'll upload the
generic code to simtel and garbo.
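
For what it is worth, here is a minimal Turbo Pascal sketch of that kind of
exhaustive search with pruning (plain backtracking, no dynamic programming;
the constants and names are just assumptions).  It answers "can files i..n
still be placed on at most k disks?", abandoning a branch as soon as a file
fits nowhere:

const
  MaxFiles = 30;
  MaxDisks = 20;
  DiskSize = 1440;                      { kb, assumed for the example }
type
  SizeArr = array[1..MaxFiles] of longint;
  FreeArr = array[1..MaxDisks] of longint;

{ Try each of the k disks for file i in turn, recurse on file i+1,   }
{ and undo the placement on the way back.  Call with i = 1 and       }
{ free[1..k] = DiskSize.                                             }
function CanPack(var size: SizeArr; n, i, k: integer;
                 var free: FreeArr): boolean;
var
  d: integer;
  ok: boolean;
begin
  if i > n then
    ok := true                          { every file has been placed }
  else
  begin
    ok := false;
    d := 1;
    while (d <= k) and not ok do
    begin
      if size[i] <= free[d] then
      begin
        free[d] := free[d] - size[i];   { tentatively place file i here }
        ok := CanPack(size, n, i + 1, k, free);
        free[d] := free[d] + size[i]    { take it back, try the next disk }
      end;
      d := d + 1
    end
  end;
  CanPack := ok
end;

In the worst case this still examines exponentially many placements;
calling it with k = 1, 2, 3, ... until it succeeds gives the true minimum
number of disks.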

Interesting.
-- 
Mike Castle (Nexus) S087891@UMRVMA.UMR.EDU (preferred)       | XEDIT: Emacs
                mcastle@mcs213k.cs.umr.edu (unix mail-YEACH!)| on a REAL
Life is like a clock:  You can work constantly, and be right | operating
all the time, or not work at all, and be right twice a day.  | system. :->

ccoprrm@prism.gatech.EDU (Robert E. Minsk) (03/15/91)

  This algorithm has a time complexity in O(n^2), and for n files of random
size it will use approximately 0.3*sqrt(n) extra disks.  For a good discussion
of this and many more algorithms see:
Computer Algorithms
Introduction to Design and Analysis
by Sara Baase
-- 
Robert E. Minsk - Information Technology | After WREKage their is Ruins...   |
ARPA: ccoprrm@prism.gatech.edu           | Fridays at 10pm WREK Atlanta 91.1 |
uucp: ...!{allegra,amd,hplabs,seismo,ut-ngp}!gatech!prism!ccoprrm
Georgia Institute of Technology, Atlanta Georgia, 30332 

amead@s.psych.uiuc.edu (alan mead) (03/16/91)

As a practical matter, you would be better served by utilities that
"section" zip files into multiple parts of arbitrary (say 360K) sizes.

I'm not sure where such a beast can be had, but it exists (I suppose
you could write your own in TP pretty easily--just watch out for extra
DOS characters like ^Z).  I think a friend said that the version he had
wasn't 100% reliable, so use with some caution.
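
For what it's worth, here is a rough Turbo Pascal sketch of such a
"sectioner" (the names, piece size and buffer size are just assumptions).
Using untyped files with BlockRead/BlockWrite keeps everything binary, which
sidesteps the ^Z problem; the pieces can be rejoined with COPY /B or a
mirror-image loop:

{ Copies InName into numbered pieces OutBase.001, OutBase.002, ...  }
{ each at most PieceSize bytes long.                                }
{ e.g. SplitFile('BIGFILE.ZIP', 'BIGFILE')  (hypothetical names)    }
procedure SplitFile(InName, OutBase: string);
const
  PieceSize = 368640;                   { 360 kb per output piece }
  BufSize   = 4096;
var
  src, dst: file;                       { untyped (binary) files }
  buf: array[1..BufSize] of byte;
  got, want: word;
  left: longint;
  piece: integer;
  ext: string[3];
begin
  Assign(src, InName);
  Reset(src, 1);                        { record size = 1 byte }
  piece := 0;
  while FilePos(src) < FileSize(src) do
  begin
    piece := piece + 1;
    Str(piece:3, ext);                  { '  1' -> '001' }
    while Pos(' ', ext) > 0 do ext[Pos(' ', ext)] := '0';
    Assign(dst, OutBase + '.' + ext);
    Rewrite(dst, 1);
    left := PieceSize;
    while (left > 0) and (FilePos(src) < FileSize(src)) do
    begin
      if left > BufSize then want := BufSize else want := left;
      BlockRead(src, buf, want, got);
      BlockWrite(dst, buf, got);
      left := left - got
    end;
    Close(dst)
  end;
  Close(src)
end;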

-alan mead : amead@s.psych.uiuc.edu

nmouawad@watmath.waterloo.edu (Naji Mouawad) (03/16/91)

In article <26272@adm.brl.mil> NORM%IONAACAD.BITNET@cunyvm.cuny.edu ( Norman Walsh) writes:
>Hello,
>
>I have a question that I cannot answer sufficiently for myself:
>
>Given a collection of n files (.zip files in my case) is there an
>algorithm (short of brute-force) for determining the best arrangement
>of files to fit on as few disks (of a given size) as possible?
>
>Suppose, for example, I have 40 files that range in size from 30 to
>400 kb.  I want to move them all to floppies but I want to do so in
>such a way as to use as few floppies as possible.  I want each file to
>be complete on one disk, i.e. I don't want to split files across
>disks.
>
>My first-guess algorithm is to always place the largest file that will
>fit on a particular disk on that disk.  However, I have my suspicions
>that this is not the guaranteed best way of doing it.  Unfortunately,
>I can't think of a better way, nor can I demonstrate mathematically that
>the biggest-first algorithm is the best...(or not)
>
>Any thoughts?
>
>                                                  ndw


This is the well-known knapsack problem, where you are given
a knapsack and a set of different weights and you are asked to
fit as much weight as possible into the knapsack.

It is an NP-complete problem, meaning that the best known
algorithms run in time exponential in the number of weights.

If you really want to write such a program, your best
bet is dynamic programming, where you build a table that
solves the problem for a set of increasing knapsack capacities,
using earlier solutions to construct the current one.

Having said that, your intuitive algorithm may on average give
you a solution that differs from the optimal one by a constant.
I suggest that you stick to it, unless you really want to
squeeze the last byte out of your diskettes.
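
As an illustration of that table-building idea, here is a minimal Turbo
Pascal sketch of the classic 0/1 knapsack dynamic program for a single
diskette (the constants and names are assumptions; sizes are whole
kilobytes).  Packing one diskette as full as possible this way, removing
those files and repeating is one heuristic use of it, though that is not
guaranteed to minimise the total number of diskettes:

const
  MaxFiles = 100;
  DiskSize = 1440;                      { capacity in kb, assumed }
type
  SizeArr = array[1..MaxFiles] of integer;

{ best[c] ends up as the largest total size (in kb) of any subset of }
{ the n files that fits in capacity c; the answer for a full         }
{ diskette is best[DiskSize].                                        }
function BestFill(var size: SizeArr; n: integer): integer;
var
  best: array[0..DiskSize] of integer;
  i, c: integer;
begin
  for c := 0 to DiskSize do best[c] := 0;
  for i := 1 to n do
    for c := DiskSize downto size[i] do { downwards so each file is used once }
      if best[c - size[i]] + size[i] > best[c] then
        best[c] := best[c - size[i]] + size[i];
  BestFill := best[DiskSize]
end;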

--Naji.

References:
  - Garey and Johnson: Computers and Intractability: A Guide to the
    Theory of NP-Completeness, Freeman.

  - Aho, Hopcroft, and Ullman: Data Structures and Algorithms,
    Addison-Wesley.
-- 
     -------------------------------------------------------------------
    | Naji Mouawad  |          nmouawad@watmath.waterloo.edu            |
    |  University   |---------------------------------------------------|
    | Of Waterloo   |   "The Stranger in us is our most familiar Self"  |

hampton@tigger.Colorado.EDU (Jeff Hampton) (03/16/91)

I have been thinking about a similar problem myself.  The problem, however,
is NP-complete.  If you use the first-fit strategy, the worst-case time
complexity is on the order of n^2 and it produces good results.  Below is an
example algorithm for nonincreasing first fit (taken from Computer Algorithms
by Sara Baase, 2nd ed., p. 339).

input: S=(s1,...,sn), where 0<si<=1 for 1<=i<=n
output: An array bin where for 1<=i<=n, bin[i] is the number of the bin into
        which si is placed.

const MaxN = 100;                 { assumed bound so the arrays can be declared }
type  RealArray    = array[1..MaxN] of real;
      IntegerArray = array[1..MaxN] of integer;

procedure NIFF (S: RealArray; n: integer; var bin: IntegerArray);
var used: array[1..MaxN] of real;
    {used[j] is the amount of space in bin j already used up.}
    i, j: integer;
begin
  {Sort S into nonincreasing order (omitted here; S is assumed sorted).}
  for j := 1 to n do used[j] := 0;
  for i := 1 to n do
  begin
    {Look for the first bin in which S[i] fits.}
    j := 1;
    while used[j] + S[i] > 1 do j := j + 1;
    bin[i] := j;
    used[j] := used[j] + S[i]
  end; { for }
end; { NIFF }

The sort can be done in O(n*lg(n)) time, and j is incremented while searching
for an appropriate bin at most n(n-1)/2 times.  All the other instructions are
executed at most n times, so the worst-case time complexity is no more than
O(n^2).  The expected number of extra bins is at most roughly 0.3*sqrt(n).
Other strategies include best-fit and next-fit.  With best-fit, if the si are
sorted into nonincreasing order, it works about as well as first-fit.  Next-fit
is faster but will create extra bins.  Those are some things to think about
when considering your problem.  I hope this helps somewhat.
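
For example, a hypothetical driver (using the declarations above): each si is
the file's size in kb divided by the 1440 kb capacity of a nominal 1.44 Mb
diskette, and the sizes must already be in nonincreasing order before NIFF is
called.

const
  DiskKb = 1440;
  nFiles = 6;
var
  kb: array[1..nFiles] of integer;
  S: RealArray;
  bin: IntegerArray;
  i: integer;
begin
  { sizes in kb, already in nonincreasing order }
  kb[1] := 900; kb[2] := 800; kb[3] := 700;
  kb[4] := 300; kb[5] := 200; kb[6] := 100;
  for i := 1 to nFiles do S[i] := kb[i] / DiskKb;   { scale into 0..1 }
  NIFF(S, nFiles, bin);
  for i := 1 to nFiles do
    writeln(kb[i]:4, ' kb  ->  diskette ', bin[i])
end.

With these numbers NIFF should put the 900, 300 and 200 kb files on
diskette 1, the 800 and 100 kb files on diskette 2, and the 700 kb file
on diskette 3.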

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ Jeff Hampton | email: hampton@tigger.colorado.edu | University of Colorado +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

phys169@csc.canterbury.ac.nz (03/19/91)

In article <26272@adm.brl.mil>, NORM%IONAACAD.BITNET@cunyvm.cuny.edu ( Norman Walsh) writes:
> 
> Given a collection of n files (.zip files in my case) is there an
> algorithm (short of brute-force) for determining the best arrangement
> of files to fit on as few disks (of a given size) as possible?

As others have said, this can take a long time to solve.  I have a program
that does exactly this, and you should be able to get it via anonymous ftp from
cantva.canterbury.ac.nz [132.181.30.3] in the pc subdirectory.  It outputs a
series of COPY and PAUSE commands to the standard output, and has various
levels of optimisation.  Sorry, the source isn't available at the moment.
> 
> My first-guess algorithm is to always place the largest file that will
> fit on a particular disk on that disk.  However, I have my suspicions
> that this is not the guaranteed best way of doing it. 

That's what my program does with level-zero optimisation.  The default
optimisation level (2) usually takes a bit longer (it varies a lot with the
data, but 50% longer is typical), i.e. not as long as the brute-force method,
but often enough it does better than your (DFF) method.

There is little benefit in putting too much computational effort into finding
the optimum; you might only save one diskette. What is worth doing is using
a backup program that can split files across diskettes.  I notice that the
PKZIP file format specifications allow for multiple-volume backups, and wonder
why this hasn't been implemented yet. 

Mark Aitchison, Physics, University of Canterbury, New Zealand.