[comp.sys.apollo] more SR10 questions

krowitz@RICHTER.MIT.EDU (David Krowitz) (12/02/88)

Can a node running SR9.7 do backups for nodes running SR10 that are on
the same ringnet? The SR10 transition class I took could not come up
with a conclusive answer. SR9.7 nodes are supposed to be able to have
file access to SR10 nodes and vice versa, but the class supervisor wasn't
certain whether the SR9.7 version of WBAK/RBAK could handle the SR10 ACLs
correctly, and he could not find out whether there was a version of the
SR10 WBAK/RBAK that had been compiled to run under SR9.7. I have not
found anything in the SR10 release notes (as far as I've read so far) that
gives me any hints; my tape drive is attached to a DSP80, which cannot be
upgraded to run SR10. Can anyone give me a definitive answer as to whether
I'll be able to run backups on my SR10 master node?


 -- David Krowitz

krowitz@richter.mit.edu   (18.83.0.109)
krowitz%richter@eddie.mit.edu
krowitz%richter@athena.mit.edu
krowitz%richter.mit.edu@mitvma.bitnet
(in order of decreasing preference)

donp@CAEN.ENGIN.UMICH.EDU (Don Peacock) (12/02/88)

	Can a node running SR9.7 do backups for nodes running SR10 that are on
	the same ringnet? The SR10 transition class I took could not come up
	with a conclusive answer. SR9.7 nodes are supposed to be able to have
	file access to SR10 nodes and vice versa, but the class supervisor wasn't
	certain whether the SR9.7 version of WBAK/RBAK could handle the SR10 ACLs
	correctly, and he could not find out whether there was a version of the
	SR10 WBAK/RBAK that had been compiled to run under SR9.7. I have not
	found anything in the SR10 release notes (as far as I've read so far) that
	gives me any hints; my tape drive is attached to a DSP80, which cannot be
	upgraded to run SR10. Can anyone give me a definitive answer as to whether
	I'll be able to run backups on my SR10 master node?
	 
	 
	 -- David Krowitz

	We have 400 sr97 nodes and maybe 6-10 sr10 nodes.  Our backups are done
exclusively on sr97 and we have noticed NO problems.

Don Peacock
University of Michigan
donp@caen.engin.umich.edu

lnz@LUCID.COM (Leonard N. Zubkoff) (12/02/88)

I have been having backups done by an SR9.7 node of my SR10 nodes for several
months now without any problems.  Should a restore be necessary, however, the
ACLs created on the SR10 node would not be quite correct; the SR10-required
entries for person, group, and organization would not be set, and the SR9.7 tape
ACL would show up as an SR10 extended ACL.

My understanding is that the SR9.7 node only "sees" SR9.7-format ACLs; when 9.7
accesses 10, or vice versa, appropriate mappings take place so each node is happy
examining the other's ACLs.

The bottom line is that the data will be preserved just fine, but the ACL
information will need some fixup on a restore.

	Leonard

wicinski@nrl-cmf.UUCP (Tim Wicinski) (12/02/88)

Will the 4.3 that Apollo ships with SR10 include "dump" and "rdump"? 

Will they ever fix their compiler (re NFS) or will we be forced to abandon
them for other vendors?

remember, their "4.2" was anything but...

tim

jec@iuvax.cs.indiana.edu (James E. Conley) (12/02/88)

	You'll have to abandon them for another vendor... SR10 does not include
dump, restore, or their remote versions.  This makes some sense if you allow
for Apollo's different file system format, but it is still a pain since wbak
and rbak are pretty primitive (and slow).

	I'm still waiting for Mach (not from Apollo, I hope!).

    III			Usenet:     iuvax!jec
UUU  I  UUU		ARPANet:    jec@iuvax.cs.indiana.edu
 U   I   U		Phone:      (812) 855-7729
 U   I   U		U.S. Mail:  James E. Conley
 U   I   U			    Indiana University 
  UUUIUUU			    Dept. of Computer Science
     I				    004 Lindley Hall
    III				    Bloomington, IN. 47405

dclemans.falcon@mntgfx.mentor.com (Dave Clemans) (12/03/88)

From article <152@nrl-cmf.UUCP>, by wicinski@nrl-cmf.UUCP (Tim Wicinski):
> Will they ever fix their compiler (re NFS) or will we be forced to abandon
> them for other vendors?
> 

Presumably you are talking about the ability to run Apollo binaries stored
on non-Apollo disks via NFS (or something similar).

The problem is not the compiler; the "problem" is the high degree of intelligence
in the program loader.  In contrast to typical Unix systems, the Apollo loader
just pages in the program from the disk on the remote node.  The system is
unable to do virtual memory paging over NFS, so the program can't be loaded.
Other problems involve file typing (something that doesn't exist in NFS).

The only way I can think of to implement this ability is:

    if while going through directories looking for a program to execute
    you cross a NFS boundary,

        don't check the Apollo file type; just assume that it is a COFF format file

        create a temporary file on the local node; copy the file from the remote
        node to the local temporary file

        then execute the program from the temporary file, arranging to delete
        the file when the program exits.

This would let you execute programs from NFS disks, though at a performance cost
proportional to the size of the program.
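
As a rough illustration of that copy-then-execute idea (nothing Apollo
actually ships), a wrapper along these lines would do it.  The script name
and pathnames are made up, and the file-type check mentioned above is
simply skipped:

    #!/bin/sh
    # nfsrun -- hypothetical wrapper: copy a program from an NFS-mounted
    # directory to local disk, run the local copy, and delete it on exit.
    prog="$1"; shift
    tmp=/tmp/nfsrun.$$
    cp "$prog" $tmp || exit 1        # pull the whole image across NFS
    chmod +x $tmp
    trap "rm -f $tmp" 0 1 2 15       # clean up when the program exits
    $tmp "$@"                        # execute from the local temporary file

The temporary copy is exactly the performance cost mentioned above: the
whole image crosses the network before anything runs.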

dgc

krowitz@RICHTER.MIT.EDU (David Krowitz) (12/03/88)

A DSP80 can only have 1.5 MB of memory on it, which is not enough
to run SR10 (which won't boot unless you have 2 MB or more, according
to my instructions). The DSP80A and the DSP90 can have 3 MB of
memory, so they are OK.


 -- David Krowitz

krowitz@richter.mit.edu   (18.83.0.109)
krowitz%richter@eddie.mit.edu
krowitz%richter@athena.mit.edu
krowitz%richter.mit.edu@mitvma.bitnet
(in order of decreasing preference)

achille@cernvax.UUCP (achille) (12/04/88)

In article <15471@iuvax.cs.indiana.edu> jec@iuvax.UUCP (James E. Conley) writes:
>
>	You'll have to abandon them for another vendor... SR10 does not include
>dump, restore, or their remote versions.  This makes some sense if you allow
>for Apollo's different file system format, but it is still a pain since wbak
>and rbak are pretty primitive (and slow).
>
Actually, I'm using both (dump+restore) and (r/wbak) almost daily. While I agree
that w/rbak are painfully slow, they actually allow you to save AND restore your
files in a PREDICTABLE way. The same is not true for dump and restore.

One of the biggest problems with dump is that it does not produce a list of saved
files; a few weeks after the backup, how do you know which file is on which tape?
Also, dump stores files in 'random' order on tape (I think it is disk block ordering),
so if you want to restore 2 files that were in the same directory, chances are they are
NOT on the same tape (yes, we have dumps that span up to 20 IBM 3480 cartridges).
Can you imagine the pleasure of mounting 20 cartridges to find out that the file you
want is on the last tape? With the wbak output, I can just pick up the right tape
and read only that one!

I care about backup speed, but I also think that speed is not everything!
Your information should not only be saved, it should also be retrievable!
It should be an easy task to do that; operators should be able to do it!

Do you know that after restoring a full dump, you are supposed to run another
full dump as soon as possible? That's what the Cray documentation says, without
a decent explanation of this requirement. Why?

I think dump and restore are pretty awkward pieces of software; they are just fast.
I would stop using them tomorrow if I had w/rbak on the Cray.

Probably Apollo hasn't produced very remarkable software, but w/rbak are really
neat (in functionality if not in speed) and that should be said.

Achille Petrilli
Cray & PWS Operations

pha@CAEN.ENGIN.UMICH.EDU (Paul H. Anderson) (12/04/88)

wbak and rbak are sufficient to back up a ring of 450 nodes with
around 80 gigs of disk storage daily.  rbak and wbak are hardly "primitive":
slow, yes, but primitive, no way.

Paul Anderson
CAEN

jec@IUVAX.CS.INDIANA.EDU (James E. Conley) (12/05/88)

	Well, by primitive I meant that they lacked some very useful features
of dump and restore, namely multiple levels of backup and tape retries if you
put a bad tape on.  It is better than nothing, but I'd prefer dump and restore
any day.  Not to mention the ability to do dumps on other machines (a VAX to
an Alliant, not just from Apollo to Apollo) using rdump and rrestore.  Also,
dump dates aren't updated until the backup finishes.

	And of course, wbak/rbak are slow.

	I am curious, though, what method you use to back up all that data.  We
have only about 4 GB here and it takes several hours to do an incremental of
that much data even when it only really writes about 1/5 of a tape.  I'll admit
that I'm not fluent in AEGIS, but since they are supposed to run UNIX I would
hope that I wouldn't need to be.

krowitz@RICHTER.MIT.EDU (David Krowitz) (12/05/88)

The reason that incremental backups with WBAK are so slow is that you wind
up having to touch every file on the disk to check the date/time it was
last modified. Unfortunately, the Apollo file system does not store the
DTM in the directory entry of the file, so WBAK must open each file in
addition to opening and reading the parent directory. 

One way in which backups can be sped up is the method used by Workstation
Solutions' backup product. They start clients on several nodes which all
feed data back to a server which writes the tape. Since the clients run
independently of each other they can process several disks simultaneously
and send the server buffers of data which have already been formatted for
the backup tape. The only drawback to this approach is that you wind up
with files from multiple disks all interleaved in a single backup file on
the tape rather than in separate backups. It is easier to retrieve files
from a backup when you know for certain which tape it is on. Incremental
backups, however, are frequently done with several disks all on a single
tape, in which case the method used by Workstation Solutions gives the
same results a whole lot faster.


 -- David Krowitz

krowitz@richter.mit.edu   (18.83.0.109)
krowitz%richter@eddie.mit.edu
krowitz%richter@athena.mit.edu
krowitz%richter.mit.edu@mitvma.bitnet
(in order of decreasing preference)

markley@celece.ucsd.edu (Mike Markley) (12/06/88)

In article <8812051455.AA06261@richter.mit.edu> krowitz@RICHTER.MIT.EDU (David Krowitz) writes:
>One way in which backups can be sped up is the method used by Workstation
>Solutions' backup product. They start clients on several nodes which all
>feed data back to a server which writes the tape. Since the clients run
>independently of each other they can process several disks simultaneously
>and send the server buffers of data which have already been formatted for
>the backup tape. The only drawback to this approach is that you wind up
>with files from multiple disks all interleaved in a single backup file on
>the tape rather than in separate backups. It is easier to retrieve files
>from a backup when you know for certain which tape it is on. Incremental
>backups, however, are frequently done with several disks all on a single
>tape, in which case the method used by Workstation Solutions gives the
>same results a whole lot faster.
>
>
> -- David Krowitz
>

I have read in the SR10 documentation that rbak/wbak will
write to a file, so it would be possible to create a backup
directory on every node and then run wbak as a server that
writes to the backup directory.  You could set it up so
that wbak ran at some odd hour, and then every morning it would
only be necessary to copy the backup directories to tape.
This would be faster since you could always do a full backup
and then wbak would not have to check the dates on the files.
This is the strategy that I plan to implement when I upgrade
to SR10 some time in the future.
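
A minimal sketch of that arrangement, as a per-node cron job, might look
like the following.  The paths are made up, and the exact wbak switches
for writing a full backup to a file have to come from the SR10 wbak
documentation, so the backup command itself is only a placeholder:

    #!/bin/sh
    # nightly_backup.sh -- hypothetical per-node cron job for the scheme
    # described above.  Run it at some odd hour from crontab, e.g.
    #   0 3 * * * /usr/local/bin/nightly_backup.sh
    BACKUP_DIR=/backup_area              # made-up local backup directory
    STAMP=`date +%y%m%d`                 # one dated backup file per night

    mkdir $BACKUP_DIR 2>/dev/null        # no harm if it already exists

    # Placeholder: run wbak on the trees you care about with the SR10
    # option for writing to a file instead of a tape device (see the
    # SR10 wbak documentation for the exact switches), sending the
    # backup to $BACKUP_DIR/full.$STAMP.

    # Tidy up: remove local backup files older than a week (assumes the
    # operator has already copied them to tape in the morning).
    find $BACKUP_DIR -name 'full.*' -mtime +7 -exec rm -f {} \;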

Mike Markley
markley@celece.ucsd.edu

jec@iuvax.cs.indiana.edu (James E. Conley) (12/06/88)

	That sounds like an excellent idea (backing up to files).  I think it
will probably solve most of my objections about the speed of backups (since doing
a distributed search will undoubtedly speed things up), but there is still
the problem that wbak writes slowly to the tape.  I believe 4.3 dump speed
should be achievable for a single file, at least enough to keep a streaming tape
drive busy so that it occasionally streams.

	I guess with a fast network you could always just rcp these files to
some machine with a faster tape system.

    III			Usenet:     iuvax!jec
UUU  I  UUU		ARPANet:    jec@iuvax.cs.indiana.edu
 U   I   U		Phone:      (812) 855-7729
 U   I   U		U.S. Mail:  James E. Conley
 U   I   U			    Indiana University 
  UUUIUUU			    Dept. of Computer Science
     I				    004 Lindley Hall
    III				    Bloomington, IN. 47405

mkhaw@teknowledge-vaxc.ARPA (Mike Khaw) (12/06/88)

From article <15567@iuvax.cs.indiana.edu>, by jec@iuvax.cs.indiana.edu (James E. Conley):
[regarding backup ...]
> 	I guess with a fast network you could always just rcp these files to
> some machine with a faster tape system.

But if SR10 behaves like SR9.x, when you rcp back from the remote system/
tape drive, certain files like binary executables won't have their file
type set to "obj" anymore, so you'll have to /com/obty them.

Mike Khaw
-- 
internet: mkhaw@teknowledge.arpa
uucp:	  {uunet|sun|ucbvax|decwrl|ames|hplabs}!mkhaw%teknowledge.arpa
hardcopy: Teknowledge Inc, 1850 Embarcadero Rd, POB 10119, Palo Alto, CA 94303

donp@CAEN.ENGIN.UMICH.EDU (Don Peacock) (12/06/88)

	From: krowitz@richter.MIT.EDU
	Subject: Re: more SR10 questions

<deleted text>
	
	One way in which backups can be sped up is the method used by Workstation
	Solutions' backup product. They start clients on several nodes which all
	feed data back to a server which writes the tape. Since the clients run
	independently of each other they can process several disks simultaneously
	and send the server buffers of data which have already been formatted for
	the backup tape. The only drawback to this approach is that you wind up
	with files from multiple disks all interleaved in a single backup file on
	the tape rather than in separate backups. It is easier to retrieve files
	from a backup when you know for certain which tape it is on. Incremental
	backups, however, are frequently done with several disks all on a single
	tape, in which case the method used by Workstation Solutions gives the
	same results a whole lot faster.
	

	As Paul Anderson (pha@caen.engin.umich.edu) stated, we have 450 Apollos
and do daily incrementals (around 2 gigs/day) and weekly full backups.  We use
some home-grown software to keep up with this mess, and I will quickly try to explain
how it works.  I have left out most of the specifics, but it really does work, and
better than I had expected while designing it.

Incrementals:

1) We have a bank of nodes (6 dn4000's with 329meg formatted disks) which are
	used for storing the incremental trees.  (more about this later)
2) Each node tries to do an incremental backup every half hour through cron.  What it
	actually tries to do is a cpt of the appropriate trees to an incremental node;
	this is where the date/time stamp is checked.
3) We have a locking mechanism for limiting the number of nodes cpt'ing to an
	incremental node at one time (currently this is set at 6).  Simple math
	tells us that this gives us a maximum of 36 concurrent cpt's at any given time.
	The incremental code also watches the incremental node's disk space and aborts
	if/when it thinks it can no longer finish and leave a certain amount of disk
	space on the incremental node (this is a buffer zone which is needed when the
	incremental node later goes to tape with its data, currently 10 MB).  A rough
	sketch of this locking scheme appears after this list.
4) Currently backup operators dump these incremental nodes to magtape, and by around
	10 or 11:00 am we have our incrementals for the day done and on tape.  We
	are currently completing all but 6-10 nodes per day, and those nodes are not
	getting done due to hardware problems, etc.
5) We can easily monitor which disks have not done their incrementals because it mails
	us a list every morning of the disks that have not done their backups for three
	consecutive days.  (This morning there were 11 nodes in this category for one reason
	or another.)  There is a backup person responsible for checking these problem nodes
	out each morning and responding to the rest of the backup group with the cause and status.

6) We use our full backup code to clean the incremental disks off each day, automatically
	deleting the incremental trees once they are safely put to tape and logged.
	Our logging automatically keeps listings of wbaks and creates the labels for the
	tapes, so our restore program (rest_req) can easily let a backup operator know
	which tapes need to be mounted, etc.

7) We have bought a couple of 8mm tape drives and are going to automate our incrementals
	further by allowing the backup operators to simply swap tapes once a day for
	incrementals, instead of using 10-20 mag tapes each day.
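
As promised above, here is a rough sketch of what the half-hourly job and
its locking scheme could look like.  This is not the actual CAEN code; the
node names, paths, lock directory, and the plain cpt call are all my
assumptions, and the free-space and date/time-stamp checks are only
indicated in comments:

    #!/bin/sh
    # incr_backup.sh -- hypothetical sketch of the half-hourly cron job:
    # grab a lock slot on the incremental node, back off if too many
    # cpt's are already running, then copy this node's trees over.
    NODE=`hostname`                      # this node's name
    INCR_NODE=//incr1                    # made-up incremental node
    LOCK_DIR=$INCR_NODE/backup_locks     # one lock directory per active cpt
    MAX_CPT=6                            # concurrent cpt limit per incremental node

    mkdir $LOCK_DIR/$NODE 2>/dev/null || exit 0   # previous run still going
    trap "rmdir $LOCK_DIR/$NODE" 0 1 2 15         # always release our slot

    count=`ls $LOCK_DIR | wc -l`
    [ $count -gt $MAX_CPT ] && exit 0    # too busy; try again next half hour

    # A real version would check free space on the incremental node here
    # and abort while the 10 MB buffer zone is still intact.

    # Copy the appropriate trees; the real code limits this to files
    # changed since the last incremental (the date/time-stamp check).
    /com/cpt //$NODE/user_data $INCR_NODE/incr/$NODE.`date +%y%m%d.%H%M`

The limit is only enforced approximately (each job counts after taking its
own slot), but that is good enough for a back-off-and-retry scheme like
this one.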

Full backups

1) A network-wide logging scheme is used (similar to the one for incrementals) so we can keep track
	of ANY node that has not had a full backup in the last 7 days.
2) To run a full backup, a backup operator simply crp's onto a node with a magtape and
	runs our backup code.  He then simply follows the instructions (i.e., load tape,
	swap tape, label tape with xxx.xx..., etc.).
3) ALL the logging, etc., is taken care of automatically.


	Now that I have tried to explain how we do our backups, I would like to make some
comments that don't necessarily relate to backups but do relate to this newsgroup,
and that I feel sure many of you will take issue with.
1) Although wbak and rbak are slow, it is because of what they do and how they intelligently
	interact with a VERY ROBUST network file system (NOT NFS).
2) The tools for managing a LARGE network of Apollos are NOT there, but the underlying
	capabilities for creating these tools are available, and they are taken for granted by
	the majority of those people who constantly flame Apollo for not being a vanilla
	Unix machine.  (Thank GOD it's not, because we couldn't keep 450 vanilla Unix
	machines happy without at least ten times the effort that it takes us to manage the
	APOLLOS.)
3) I don't agree with everything Apollo has done over the past couple of years, but I do know that
	my job is easier because of their capabilities and Apollo's efforts not to be pulled
	backwards into the stone age by a group of people worshipping an operating system that
	was never intended for anything more than a stand-alone machine.  I do like the Unix
	interface, but this beauty is only skin deep and needs a STRONG underlying structure
	to give us the ability to manage an entire network as a single machine.
 


Don Peacock
University of Michigan
donp@caen.engin.umich.edu

crgabb@sdrc.UUCP (Rob Gabbard) (12/06/88)

In article <5627@sdcsvax.UCSD.EDU>, markley@celece.ucsd.edu (Mike Markley) writes:
> In article <8812051455.AA06261@richter.mit.edu> krowitz@RICHTER.MIT.EDU (David Krowitz) writes:
> I have read in the SR10 documentation that rbak/wbak will
> write to a file so it would be possible to create a backup
> directory on every node and then run wbak as a server that
> writes to the backup directory.  You could set it up so

This would be a nice solution except for the fact that you would have to
have as much free space on each disk as you have used space (for a complete
backup), or be sure that you have as much free space as the amount that has
changed since the last backup (for an incremental).

At the ADUS conference I sat through a VERY interesting talk on NBS, the Apollo
Network Backup System. With NBS Apollo seems to be addressing all of these
backup complaints and much more.  It has its own scheduling language and is
designed to live in a heterogeneous world. I'm not sure about release info.


-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Rob Gabbard (uunet!sdrc!crgabb)                 _    /|
Workstation Systems Programmer                  \'o.O'
Structural Dynamics Research Corporation        =(___)=   
                                                   U
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

giebelhaus@hi-csc.UUCP (Timothy R. Giebelhaus) (12/13/88)

In article <8812041914.AA20493@umix.cc.umich.edu> jec@IUVAX.CS.INDIANA.EDU (James E. Conley) writes:
>
>	Well, by primitive I meant that they lacked some very useful features
>of dump and restore, namely multiple levels of backup and tape retries if you
>put a bad tape on.  It is better than nothing, but I'd prefer dump and restore
>any day.  Not to mention the ability to do dumps on other machines (a VAX to
>an Alliant, not just from Apollo to Apollo) using rdump and rrestore.  Also,
>dump dates aren't updated until the backup finishes.

I assume what you are looking for here is a set of levels of dumps.  You can do
this with wbak.  Simply do your full backup just as you would your level 0
with dump.  Then do each incremental with the -nhi switch.  The wbak facility
will forget that it did the incremental, so the next time wbak does an incremental
backup, it will back up everything changed since the last full backup.

You can get much more sophisticated by manipulating the backup_history files.
For example, you can get multiple levels of dumps by saving multiple
backup_history files.  Though this is more complicated, it can be handled
through scripts.
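
As one possible shape for such a script (only a sketch: the assumption that
backup_history sits at the top of the backed-up tree should be checked
against your own setup, and the wbak invocation itself is left as a
placeholder):

    #!/bin/sh
    # level_wbak.sh -- hypothetical multi-level backup built on saved
    # backup_history files, in the spirit of the suggestion above.
    # Usage: level_wbak.sh <level>    (0 = full; 1, 2, ... = incrementals)
    TREE=/user_data                      # tree being backed up (made up)
    HIST=$TREE/backup_history            # assumed location of wbak's history file
    SAVE=/backup_admin/history           # saved per-level histories (made up)
    LEVEL=$1

    if [ $LEVEL -gt 0 ]; then
        # Put back the history as it stood after the previous level, so
        # this run picks up everything changed since that level.
        cp $SAVE/backup_history.`expr $LEVEL - 1` $HIST || exit 1
    fi

    # ... run wbak on $TREE here with your site's usual options ...

    # Save the history this level produced for the next, deeper level.
    cp $HIST $SAVE/backup_history.$LEVEL

A level-2 run then picks up everything changed since the last level 1, much
like dump's numeric levels.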

>	And of course, wbak/rbak are slow.

It is not wbak that is slow, it is the hardware.  If you had a tape drive connected
which would stream, you would see much faster backups of the local disk.  Many
sites I have been to will not use rdump because it is too slow to back up over
the network.  The wbak program is also bound by the speed of the network.

It has already been brought up that one can back up to a file.  The file can be
over the network.  Finally, there is the new product which was mentioned.
-- 
UUCP: uunet!hi-csc!giebelhaus         UUCP: tim@apollo.uucp
ARPA: hi-csc!giebelhaus@umn-cs.arpa   ARPA: tim@apollo.com
Tim Giebelhaus, Apollo Computer, Regional Software Support Specialist.
My comments and opinions have nothing to do with work.

ross@sword.ulowell.edu (Ross Miller) (12/16/88)

> will probably solve most of my objections about the speed of backups (since doing
> a distributed search will undoubtedly speed things up), but there is still
> the problem that wbak writes slowly to the tape.  I believe 4.3 dump speed
> should be achievable for a single file, at least enough to keep a streaming tape
> drive busy so that it occasionally streams.
> 
> 	I guess with a fast network you could always just rcp these files to
> some machine with a faster tape system.


What I find works is running wbak on a machine that is very
lightly loaded and has the right configuration.  I can occasionally stream
a Cipher 6250 9-track off of a DSP80 if I run from the single-user shell;
even when it isn't streaming, the tape does not stop moving.  But if I attempt
to run that node with spm, Ethernet, llbd, and other stuff, then the 3.0 MB system
just won't keep up and the drive stops moving.

						Ross