[comp.sys.cdc] Summary of responses, reading a CDC tape

dave@seas.gwu.edu (David M. Owczarek) (05/08/91)

The question:

I recently was asked if I could read two CDC tapes on our systems here.
We have a Sun 3/480 (SunOS 4.1), Sun 4-280 (SunOS 4.1.1), HP 9000-835SE
(HP-UX 3.1B), and Alliant FX/8 (Concentrix 5.6.00).  I was also told that
a CDC tape uses either an IBM CMS format or something similar.  I was able
to get a program which reads IBM tapes from uunet.uu.net, but this 
program tells me the tape is not IBM format.

Is there a kind soul out there who might be able to suggest:

1) A source for code to read the tape (anonymous ftp)?
2) What the tape format looks like?

-Dave

---summary of responses

From robp@anubis.network.com Mon Apr 29 15:45:04 1991

  A primer on CDC formats:

  Your tape could be any one of these formats:  
  I = (NOS) "internal" format
  SI = "system internal" format (NOS,NOS/BE)
  LB = "large block" format (VSOS)
  V = "variable" format (VSOS)

  There are even older ones called, if I remember correctly, E and L
  format.  L format tapes are called "stranger" tapes on NOS and BE.

  Your best bet is to read the tape using "dd" or some other utility
  that reads the (tape) device directly, and wish for luck.

From pwh@bradley.bradley.edu Mon Apr 29 16:20:02 1991

  My bet is your biggest problem is going to be finding the density.

  Our CDC 932 can write tapes at a variety of densities, from 1600bpi
  to 6250bpi.  Most unix machines I'm familiar with only handle 1600bpi
  from 9-track.

From cc100aa@prism.gatech.edu Mon Apr 29 17:22:48 1991

  Golly, it could be most anything.  There are three CDC operating
  systems: NOS/BE, NOS, and NOS/VE, each with distinct formats.  All can
  create tapes in reasonable format for interchange, but the default,
  "internal" formats may be difficult to untangle: NOS/BE and NOS use
  6-bit "display code" characters; NOS/VE is ASCII-based.

  If you could determine the physical block size on the tape, that would
  help narrow it down.  For example, NOS "I" format has 512 60-bit words
  plus a 48-bit trailer per full block.  That would show as 3846 8-bit
  bytes per block.
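
  The arithmetic above can be checked with a short Python sketch.  The
  function name and its defaults are illustrative only; rounding up to a
  whole byte is an assumption that does not matter for the 512-word case.

```python
def block_bytes(words: int, word_bits: int = 60, trailer_bits: int = 48) -> int:
    """Size in 8-bit bytes of a block of `words` words plus a trailer."""
    total_bits = words * word_bits + trailer_bits
    # Round up to a whole byte (assumption; 512 words divide evenly anyway).
    return (total_bits + 7) // 8


print(block_bytes(512))  # 3846
```

  A full block is 512 * 60 + 48 = 30768 bits, which is exactly 3846 bytes.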

  We're working on a CDC-to-UNIX tape conversion program, but it's not
  quite done yet.  We'd have to run it through our legal eagles before we
  could distribute it, anyway.

From rrr@duck.svl.cdc.com Mon Apr 29 17:44:42 1991

  I fear you will need to get some more information. There are
  a multitude of tape formats available to users on CDC systems
  (just as on other systems). The basic order of identification
  for an unknown tape goes something like:

  - 7 track or 9-track (fortunately most 7-track are gone)
  - Density (800, 1600, 6250)
  - Labeled (if so, ANSI or non-standard) or unlabeled
  - Parity of information (usually reflects binary versus ASCII data)

  After you figure this much out, you are ready to figure out the
  data contents. Here the best bet is to find out what entity wrote
  the tape (e.g., a Fortran program, a copy command, an archiving 
  utility, a Cobol program, etc.). This also helps define the data
  and record structure one might have to cope with. 

  <general information about examining data format deleted>

  Sorry I can't give you an instant solution but I fear there is
  no simple solution with so little information.

From fuller@nye.nscee.edu Mon Apr 29 23:50:47 1991

  Check SIMTEL20 (26.2.0.74, anon ftp) ... I picked up a tool called "magtape"
  (or something like that) that purports to read NOS SI ("SCOPE Internal")
  format tapes on a Unix box.  NOTE: I've built it, but haven't tried it 
  yet (it's only been a year or so ... I'll get to it!).  I seem to recall 
  the name of the directory that it's in as "PD2:<UNIX-TAPES>".

From fty@sunvis.rtpnc.epa.gov Tue Apr 30 09:12:37 1991

  Hmmmm.  We used to have trouble reading CDC tapes on CDC machines ...
 
  Was it written by NOS or NOS/BE?  Is it binary or text?  I think the only
  way you're going to be able to deal with it is to dd it off the tape
  to disk and try to figure out the data type and blocking factor.  Oh, and
  it may be labeled or unlabeled; if it's labeled, just skip over the first
  file on the tape.
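
  One way to get at the blocking factor is to tally the sizes of the
  blocks that come back from the drive.  The sketch below is only that:
  read_block is a hypothetical callable that returns one physical block
  per call and an empty result at a file mark, and the "tape" here is a
  made-up list of byte strings.

```python
from collections import Counter
from typing import Callable


def tally_block_sizes(read_block: Callable[[], bytes]) -> Counter:
    """Count block lengths until read_block returns b'' (taken as a file mark)."""
    sizes = Counter()
    while True:
        block = read_block()
        if not block:
            break
        sizes[len(block)] += 1
    return sizes


# Stand-in for a tape: three full blocks, one short block, then a file mark.
fake_tape = iter([b"x" * 3846] * 3 + [b"x" * 126, b""])
print(tally_block_sizes(lambda: next(fake_tape)))
```

  On a real system read_block might wrap os.read() on the raw tape device
  with a generous buffer, assuming the driver hands back one block per
  read; that behavior varies and should be checked locally.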

From billm@prism.gatech.edu Tue Apr 30 09:50:47 1991

   It's going to depend on whether the tape was written as a backup tape or
 deliberately created to be read on other systems.  If the former is true, it
 will depend on which operating system (NOS or NOS/VE) the machine was running.
 In the latter case, it depends on the parameters that were specified when
 writing out the tape.
   If the tape was written for reading off of a cdc machine, then you should
 just be able to use dd in some form or another, once you find out what the
 block size, record length, etc. are, assuming the person who wrote the tape out
 knew what they were doing.
   Hope this helps.

From ddh@hare.udev.cdc.com Tue Apr 30 10:02:01 1991
 
  Give me a call, <tel. no. deleted>, and I'll see if I can at least narrow
  down the field of things you have to look at.  However, I can assure
  you that if you're looking at system dump tapes, you're 92% SOL.

From adams@hare.udev.cdc.com Tue Apr 30 14:47:07 1991

  Dave,
  1)  I am not a CDC tape expert!
  2)  What I do know:

  There are several operating systems, and a number of backup utilities that
  might be involved on any particular tape.  My usual starting point is any
  written information on the tape reels, or paperwork attached to the tapes.
  If I see something like "mt$1600", it would indicate NOS/VE operating system;
  "D=GE F=I LB=KL" sends me to a NOS operating system machine.  "DUMPPF" is
  a backup utility used by NOS.  The available list is pretty long.
  
  No results promised, but, if you send me a few lines of whatever printing
  you can find, I'll look and/or run it past my sources.

From trevc@mips.com Tue Apr 30 18:38:01 1991

  There is a package called MAG whose description says it

  " reads a Control Data Cyber system tape (SI-format)
       as written under NOS/BE or SCOPE by COPY, COPYBF or COPYBR.
       Each Cyber record is written to a separate file. The EOR-
       levels appear on standard output..."

  and various other things.

(Dave's note: This looks like the same as the package above, so I'll
 hold off on giving out the contact information)

From dmb@bigd.cray.com Wed May  1 13:02:09 1991
David Bowen

    CDC Cyber systems had a bunch of tape formats and without more details
  you are probably in for a lot of trial and error.  The first question is
  whether the tape was written by a 60-bit machine or a 64-bit machine.  I
  believe the new machines use ASCII internally, but the old Cybers used a
  6-bit console display code packed ten characters per word.  For the old
  machines there were four formats.  Two (S and L) allowed any sort of
  blocking the user chose.  The other two (SI and KI) were very similar.
  Block size was 5120 characters (or was it 1280) with short blocks having
  an extra 48 bit trailer.  CDC had a real elaborate file structure hierarchy
  and the 48 bits provided information on that.  The console display code
  mapped A-Z to 1-26, 0-9 to 27-36 and the special characters went after that.
  The zero character was used as an end of line character under some systems,
  while others used two zero characters at the end of the word for end of line
  and a single zero character as a colon.  My memory of further details has
  faded from disuse.
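
  A minimal Python sketch of the letter and digit mapping described
  above.  It assumes the 6-bit characters are packed most significant
  bits first into the 8-bit bytes read from the tape, which should be
  verified against real data.  Special characters are not mapped here
  and come out as "?", since their codes depend on the character set in
  use.  All names are illustrative.

```python
import string

# Codes 1-26 are A-Z and 27-36 are 0-9, as described above.
DISPLAY = {i + 1: c for i, c in enumerate(string.ascii_uppercase + string.digits)}


def six_bit_codes(data: bytes) -> list:
    """Split a byte string into 6-bit codes, most significant bits first."""
    count = len(data) * 8 // 6
    spare = len(data) * 8 - count * 6      # leftover low-order bits, dropped
    bits = int.from_bytes(data, "big") >> spare
    return [(bits >> (6 * (count - 1 - i))) & 0o77 for i in range(count)]


def decode(codes, unknown="?"):
    """Translate 6-bit codes to text; unmapped codes become `unknown`."""
    return "".join(DISPLAY.get(c, unknown) for c in codes)


# "CDC1" is codes 3, 4, 3, 28, which pack into exactly three bytes.
packed = ((3 << 18) | (4 << 12) | (3 << 6) | 28).to_bytes(3, "big")
print(decode(six_bit_codes(packed)))  # CDC1
```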

From mimsy!widener.UUCP!emory!dragon!cts@uunet.UU.NET Sun May  5 06:16:45 1991
Charles Smith
cts@dragon.uucp  or  cts%dragon@mathcs.emory.edu

  CDC systems could create a lot of tape formats.  If it's a CDC
 "backup" (PFDUMP) tape, it can be just about impossible to read
 without a pretty detailed knowledge of the internals of the CDC
  system.

  I have some programs that handle some of the CDC formats running under
  VMS.  If you can either get some additional info on the format, or 
  send me a bit-for-bit copy of the tape, I can take a look at it.

From: woolsey@netcom.COM (Jeff Woolsey)

  The hardware you listed is capable of it.  The software, however....

  CDC machines could be told to write tapes that other machines were
  likely to be able to read (such as blocked formats).  If the user
  didn't take the trouble to write such formats, you got CDC
  internal-format tapes.

  The former, anybody can read.  The latter, I can read.
  I can do just about anything with CDC-written tapes on Suns, provided
  the tapes are old enough.  If they're new, you don't need me.

---
Thanks also to cain@geomag.gly.fsu.edu and sbk@cbnewsl.att.com for
providing additional information.

I was able to get blocking information and tape density through a
few phone calls.  I now have 3 tapes to read, not two, although it
turns out that the two important ones were written in ascii, so 
I've gotten good results with dd.  My only unresolved problem is
that there are no CR/LF's in the data, so the first file I read had
823 words on one line.  If I can't find a reasonable way to break the
lines up it will become a problem for the guy who wants the data, as
I am not going to plow through gazillions of files sticking in new lines.
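
If the ASCII files turn out to hold fixed-length records, the line
breaks could be put back mechanically.  The following Python sketch
assumes exactly that; the record length of 10 is made up for the
example, and the real value would have to come from the blocking
information for the tape.

```python
def split_fixed_records(data: bytes, record_length: int) -> list:
    """Cut data into record_length-byte records and strip trailing blanks."""
    if record_length <= 0:
        raise ValueError("record_length must be positive")
    return [
        data[i:i + record_length].rstrip(b" ")
        for i in range(0, len(data), record_length)
    ]


# Three blank-padded 10-byte records run together with no line breaks.
raw = b"".join(w.ljust(10) for w in (b"FIRST", b"SECOND", b"THIRD"))
for line in split_fixed_records(raw, 10):
    print(line.decode("ascii"))
```

Run over each file in a loop, this would avoid inserting new lines by
hand.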

Thanks for your support--I've gotten my feet wet, and some results
with your help.  Thanks also to those who offered to help me 
personally with this issue; it was an unexpected, pleasant surprise.

-Dave
-- 
----------------------------------------------------------------------------
Dave Owczarek, Operations Team  dave@seas.gwu.edu or uunet!seas.gwu.edu!dave
The George Washington University Engineering Computing Facility,  Wash. D.C.
----------------------------------------------------------------------------

berger@clio.sts.uiuc.edu (Mike Berger) (05/12/91)

dave@seas.gwu.edu (David M. Owczarek) writes:
>I was able to get blocking information and tape density through a
>few phone calls.  I now have 3 tapes to read, not two, although it
>turns out that the two important ones were written in ascii, so 
>I've gotten good results with dd.  My only unresolved problem is
>that there are no CR/LF's in the data, so the first file I read had
>823 words on one line.  If I can't find a reasonable way to break the
>lines up it will become a problem for the guy who wants the data, as
>I am not going to plow through gazillions of files sticking in new lines.
*----
Did anybody bother to mention that CDC denotes end-of-line with 12 bits
of 0 at the end of a 60 bit word?  I believe Compass even had a command
to detect that.
--
	Mike Berger
	Department of Statistics, University of Illinois
	AT&TNET     217-244-6067
	Internet    berger@atropa.stat.uiuc.edu

woolsey@netcom.COM (Jeff Woolsey) (05/14/91)

In article <1991May11.211134.21449@ux1.cso.uiuc.edu> berger@clio.sts.uiuc.edu (Mike Berger) writes:
>Did anybody bother to mention that CDC denotes end-of-line with 12 bits
>of 0 at the end of a 60 bit word?  I believe Compass even had a command
>to detect that.

Well, it's actually between 12 and 66 bits of zero right-justified in 
a word or two.  And if you're in the 64-character set, you get to worry
about preserving colons at the ends of lines...  What fun!
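
A rough Python sketch of that rule, for words already unpacked into
60-bit integers.  It treats a word whose low 12 bits are zero as the end
of a line and strips all trailing zero characters, so it does not
preserve colons at the ends of lines in the 64-character set.  The
function name is illustrative.

```python
def split_lines(words):
    """Group 60-bit words into lines of 6-bit character codes."""
    lines, current = [], []
    for word in words:
        # Ten 6-bit codes per word, leftmost character first.
        current.extend((word >> shift) & 0o77 for shift in range(54, -1, -6))
        if word & 0o7777 == 0:             # low 12 bits zero: end of line
            while current and current[-1] == 0:
                current.pop()              # drop the zero fill
            lines.append(current)
            current = []
    if current:
        lines.append(current)              # unterminated tail, kept as-is
    return lines


hi_word = (8 << 54) | (9 << 48)            # "HI" followed by zero fill
print(split_lines([hi_word]))              # [[8, 9]]
```

A nine-character line followed by an all-zero word (the 66-bit case)
comes out as a single line of nine codes.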
-- 
Jeff Woolsey	Microtec Research, Inc	+1 408 980-1300
woolsey@netcom.COM	...!amdcad!sun0!woolsey