[comp.misc] Binary data formats

gg10@prism.gatech.EDU (Galloway, Greg) (12/03/89)

Problem:

We have code and data which is developed on a set of MicroVAXes and then
ported to other larger and faster (non-VAX) machines to do the processing.
The most difficult task is converting all the necessary files to ASCII,
sending them to the larger machine and then converting them back to binary.
This usually means having well over a dozen file conversion programs.
Each data item is a character, an integer, a boolean, or a floating point
number.  The files hold images (real and byte), facet information for models,
topographical height fields and associated color and infrared files, etc.

* Booleans do not pose a problem when sending between machines as long as
  they are stored as a single (8-bit) byte with 0 as false and non-0 as true.

* Characters are 8-bit bytes on all machines, as far as I know, and
  are in either ASCII or EBCDIC format.

* Integer data (to my knowledge) comes in only two byte orders:

  - Big-endian, or Motorola-type: Most significant byte first (16 or 32 bit)
  - Little-endian, or Intel-type: Least significant byte first

* Floating point data is the greatest obstacle.  Almost all machines that
  I have been able to find use ANSI/IEEE Std 754-1985 for floating point.
  The only exceptions are the VAX and CONVEX (the CONVEX has IEEE as 
  optional hardware).
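
As a rough illustration of the integer and floating point bullets above,
here is a minimal Python sketch.  The function names are made up for this
example, and the VAX F_floating layout used (two little-endian 16-bit words,
sign and excess-128 exponent in the first word, hidden leading fraction bit)
is an assumption taken from the usual description of that format; check it
against real data before relying on it.

```python
import math


def read_int(raw: bytes, byte_order: str, signed: bool = True) -> int:
    """Decode an integer whose byte order is stated explicitly.

    byte_order is "big" (most significant byte first) or "little".
    """
    return int.from_bytes(raw, byteorder=byte_order, signed=signed)


def vax_f_to_float(raw: bytes) -> float:
    """Decode a 4-byte VAX F_floating value as laid out in VAX memory."""
    if len(raw) != 4:
        raise ValueError("VAX F_floating values are 4 bytes long")
    # Two little-endian 16-bit words; the first holds sign and exponent.
    word0 = raw[0] | (raw[1] << 8)
    word1 = raw[2] | (raw[3] << 8)
    sign = word0 >> 15
    exponent = (word0 >> 7) & 0xFF            # stored excess-128
    fraction = ((word0 & 0x7F) << 16) | word1  # 23 stored fraction bits
    if exponent == 0:
        if sign:
            raise ValueError("reserved operand (sign set, exponent zero)")
        return 0.0
    # Restore the hidden bit: the mantissa is 0.1fff... in binary.
    mantissa = 0x800000 | fraction
    value = math.ldexp(mantissa, exponent - 128 - 24)
    return -value if sign else value
```

For example, read_int(b"\x12\x34", "big", signed=False) gives 0x1234 while
the same bytes read as "little" give 0x3412, and
vax_f_to_float(b"\x80\x40\x00\x00") gives 1.0.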

Questions:

* Does anyone know of any other formats for characters, integers or 
  floating point?

* Are there any binary file formats in existence which attempt to attack
  this problem?

I am in the process of attempting to develop a public-domain tag-based
binary file format which will tag the datatype with the data so that the
application program can read the data and the type of machine on which
it was created and convert as necessary to be useable.  This file format
will be similar to the TIFF file format developed by Microsoft and Aldus
for use in Desktop publishing but will support floating point as well
as integer data.
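
To make the idea concrete, here is a minimal Python sketch of one tagged
record.  Everything about the layout is hypothetical (the type codes, the
one-byte "I"/"M" byte-order marker borrowed in spirit from TIFF's "II"/"MM"
header, and the 32-bit count); it is not the format proposed here, only an
illustration of tagging data with its type and originating byte order.  It
also assumes floats are stored as IEEE single precision.

```python
import struct

# Hypothetical type codes for this sketch only.
T_BYTE, T_INT32, T_FLOAT32 = 1, 2, 3
_FMT = {T_BYTE: "B", T_INT32: "i", T_FLOAT32: "f"}
_MARKER = {"<": b"I", ">": b"M"}          # little-endian / big-endian
_ORDER = {b"I": "<", b"M": ">"}


def write_record(type_code, values, byte_order="<"):
    """Pack values as: marker (1 byte), type (1 byte), count (4 bytes), data."""
    header = _MARKER[byte_order] + struct.pack(
        byte_order + "BI", type_code, len(values)
    )
    body = struct.pack(
        f"{byte_order}{len(values)}{_FMT[type_code]}", *values
    )
    return header + body


def read_record(buf):
    """Decode one record, using the marker to pick the writer's byte order."""
    byte_order = _ORDER[buf[:1]]
    type_code, count = struct.unpack_from(byte_order + "BI", buf, 1)
    # The header is 6 bytes long: marker + type + 32-bit count.
    values = struct.unpack_from(
        f"{byte_order}{count}{_FMT[type_code]}", buf, 6
    )
    return type_code, list(values)
```

A reader on any machine calls read_record() and gets native values back
whichever byte order the writer used, e.g.
read_record(write_record(T_INT32, [1, -2, 3], ">")) returns
(T_INT32, [1, -2, 3]).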

I have heard of the IFF format from Amiga.  Does anyone have a write-up of
this standard?  Does it support various formats between various machines?

If anyone has any comments or interest I would like to hear from them.

Please reply by E-mail (I don't subscribe to all these newsgroups).

Thanks,

Greg Galloway
Georgia Tech Research Institute
Georgia Institute of Technology
gg10@prism.gatech.edu
(404)894-3357

salem@mandala.think.com (Jim Salem) (12/04/89)

In article <4021@hydra.gatech.EDU> gg10@prism.gatech.EDU (Galloway, Greg) writes:

   Problem:

   We have code and data which is developed on a set of MicroVAXes and then
   ported to other larger and faster (non-VAX) machines to do the processing.

   * Floating point data is the greatest obstacle.  Almost all machines that
     I have been able to find use ANSI/IEEE Std 754-1985 for floating point.
     The only exceptions are the VAX and CONVEX (the CONVEX has IEEE as 
     optional hardware).

   Questions:

   * Does anyone know of any other formats for characters, integers or 
     floating point?
Cray also has their own float format.


   * Are there any binary file formats in existence which attempt to attack
     this problem?
Yes.  Binary floating point and integer data are handled quite well by
either the HDF or the CDF standard.  I've included info on them below.

   I am in the process of attempting to develop a public-domain tag-based
   binary file format which will tag the datatype with the data so that the
   application program can read the data and the type of machine on which
   it was created and convert as necessary to be useable.  

Ack!  I was beginning to do this myself before I found HDF and CDF.  HDF
is a tagged format (CDF may be one as well).  I'm virtually certain that
either HDF or CDF will meet your needs; both are free and, best of all,
supported.

The graphics industry has already lost due to the plethora of virtually
identical but incompatible image file formats.  Two floating point file
formats is already one too many.  I hope you help push towards 
standardization by supporting one of these formats.

------------------------------------------------------------------------------
HDF Info  

HDF is an extensible tagged file format for storing scientific data.  The
current data types supported are 8 and 24 bit images and floating point
data.  The data may be annotated with labels and other useful information.
I have used this extensively and am happy with it.

Here is an excerpt from their README :-
--------------------------------------------------
NCSA HDF is the Hierarchical Data Format, a standard file format 
developed by NCSA.  For more information about HDF, see the 
January/February 1989 NCSA Data Link article, the document "NCSA HDF", and 
the document "HDF  Specification".

This version of HDF runs on CRAYs running UNICOS, ALLIANTs, SUNs and 
IRIS 4D machines running Unix, MACs running MacOS, VAXen running VMS and 
PCs running MS/DOS.

Compilation of these programs produces a library of HDF routines that 
can be called from either FORTRAN or C programs.

You can FTP the latest version from zaphod.ncsa.uiuc.edu (128.174.20.50).

If you have any questions, problems or suggestions, you can contact us 
via Email at mfolk@ncsa.uiuc.edu or likkai@ncsa.uiuc.edu,  or by writing 
to Mike Folk, Software Development, NCSA, 605 East Springfield Ave., 
Champaign, IL 61820, or call 217 244 0647.

------------------------------------------------------------------------------
CDF Info

CDF was originally developed at NASA Goddard and now seems to be largely
supported by UCAR Unidata (related to the Nat. Center for Atmospheric
Research).  I have not had very much experience with it but it looks quite
good. 

Here is an excerpt from their README :-
--------------------------------------------------
The purpose of the Network Common Data Form (netCDF) interface is to
allow you to create, access, and share scientific data in a form that
is self-describing and network-transparent.  "Self-describing" means
that a file includes information defining the data it contains.
"Network-transparent" means that a file is represented in a form
that can be accessed by computers with different ways of storing
integers, characters, and floating-point numbers.  Using the netCDF
interface for creating new scientific data sets can improve the
accessibility of the data.  Using the netCDF interface in new software
for scientific data access, management, analysis, and display can
improve the reusability of the software for other data sets and by
other users.

You can obtain a copy of the latest version of netCDF software using
anonymous FTP.  For UNIX systems, a compressed tar file can be
accessed (in binary mode) from the file netCDF.tar.Z in the anonymous
FTP directory of unidata.ucar.edu (128.117.140.3).  VMS sites can get
a backup saveset of the same software from the anonymous FTP directory
of laurel.ucar.edu (128.117.140.6).

We welcome comments or suggestions about the netCDF data access
interface and utilities.  Please direct questions and comments to
russ@unidata.ucar.edu, or write to Russ Rew, UCAR Unidata Project
Center, P.O. Box 3000, Boulder, Colorado 80307-3000.
------------------------------------------------------------------------------

Good luck !

-- jim
Jim Salem (salem@think.com)
Thinking Machines Corporation, Cambridge MA

--

-- Jim Salem
   Thinking Machines Corporation, Cambridge, MA   617-876-1111
   salem@think.com
   mit-eddie!think!salem, rutgers!think!salem, harvard!think!salem