[comp.parallel] More on very large data sets

eugene@eos.arc.nasa.gov (Eugene Miya) (03/24/89)

I was asked a couple more questions on the nature of large data sets.

The people who work on large data sets can tell you the issues:
1) Access time is sometimes more important than volume, so you can't
   JUST have volume.
2) Cost is obviously a factor, and there are several types of cost.
3) Removing old data.  Scientists never want to throw data away.  Ever.
   Witness the O3 hole, which was thought of as an anomaly at first.

One problem I never see mentioned: the incompatibility of formats
between analog recording devices and digital tape systems.

When I first started working in the "real" world, I joined a thing
called the Radar Science and Applications Group at JPL.  I walked into
a room which had racks of 9 track tape.  It was pointed out to me that
these were tapes which had flown to the Moon on Apollo 17.  There had been
money for processing only 1% of the data.  The tapes (recorded in 1972)
were 200 BPI.  JPL also had 7 track 556 BPI tapes (thousands).  I worked
two summers in an R&D group for a company which made tape heads.  I had
never heard of 556 (but then we were doing 6250 in 1977 and looking
at 50K BPI problems).  There was no JPL money to convert 200 BPI to
anything more recent.

I can imagine writing a special program for a 9 track tape to read 200
BPI bits on a 6250 drive.  Gawd, what a nightmare!  Then getting a
note: "The project has been cancelled."

I do think there are ways to grab large volumes of N dimensional
data, but it's not clear to me that existing computers (even the
spiffy new database machines) can do this for arbitrary N.  Consider
a 3 or 4 dimensional cube.  You want any row, column, surface, or
subcube as fast as any other.  Instead we map all this into a
one-dimensional address space.  Sutherland wrote about this problem
in a display memory paper in TOG some years back.  He BTW was one
of the inspirations for the CM.  Too bad he does not make
supercomputers like he makes graphics systems (yet, anyway).
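
To make the cube problem concrete, here is a rough sketch (mine, not from
Sutherland's paper) of the usual row-major mapping of an N dimensional
index into a linear address space.  The function names and the 4x4x4
shape are made up for illustration.  It only shows why one axis comes
out contiguous and the others do not.

```python
def strides(shape):
    """Row-major strides (in elements) for an N-dimensional array shape."""
    result = [1] * len(shape)
    for axis in range(len(shape) - 2, -1, -1):
        result[axis] = result[axis + 1] * shape[axis + 1]
    return result


def linear_address(index, shape):
    """Map an N-dimensional index to its offset in a 1-D address space."""
    return sum(i * s for i, s in zip(index, strides(shape)))


shape = (4, 4, 4)  # a small 3-D cube; sizes chosen arbitrarily

# Walking along the last axis touches consecutive addresses...
row = [linear_address((1, 2, k), shape) for k in range(4)]

# ...while walking along the first axis jumps 16 elements per step.
column = [linear_address((k, 2, 1), shape) for k in range(4)]

print(row)     # [24, 25, 26, 27]
print(column)  # [9, 25, 41, 57]
```

Whichever axis you put last gets the fast, contiguous access; every
other direction through the cube pays for its stride.  That asymmetry
is exactly the complaint above.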

Another gross generalization from

--eugene miya, NASA Ames Research Center, eugene@aurora.arc.nasa.gov
  resident cynic at the Rock of Ages Home for Retired Hackers:
  "Mailers?! HA!", "If my mail does not reach you, please accept my apology."
  		Live free or die. <-JPL management won't understand this.