[comp.sys.apollo] Diskless boot control

bonnetf@apo.esiee.fr (bonnet-franck) (02/13/91)

Hi,

I have a problem with diskless nodes.

Here we have about 100 nodes, only 40 of them disked. Very often, students
reboot the diskless nodes off any disked node on the ring with the
"DI N XXXX" command.

This is not good because we run three versions of DOMAIN_OS (10.1, 10.2 and 10.3),
and some software does not run under every version (e.g. MENTOR GRAPHICS).
It also creates a lot of /sys/node_data.xxxx directories for nothing.

In a word, we want to CONTROL which disked node a diskless node MUST boot from.

Has anybody done this before?

I am thinking of a mechanism similar to the one that controls "crp" through
the /sys/node_data/spm_control file.

This would be smart.

Thanks in advance for any help!

-------------------------------------------------------------------------------|
bonnetf@apo.esiee.fr                     |                                     |
Frank Bonnet                             | Surfing ...                         |
E.S.I.E.E                                |                                     |
BP99 93162 Noisy le Grand cedex.FRANCE.  | the rest is details !               |
Fax   : 33 1 45 92 66 99                 |                                     |
-------------------------------------------------------------------------------|

thompson@PAN.SSEC.HONEYWELL.COM (John Thompson) (02/14/91)

> First a brief description of diskless node booting ... 
> In order to boot a diskless machine, the partner machine must be running
> /sys/net/netman. "netman" only handles booting requests. After the diskless
> ....   It may get drafted into booting service whether or not it likes it!
True.  See pp 3-39 and 3-40 in _Managing_Aegis_System_Software_ (010852-A00).

> Now there used to be a way to control this in a very general method ...
> prior to SR10.0, ACL's not only had a user, group, and project field ...
> they also had a 4th field which was the node ID! Thus, file access could
> be restricted (or allowed) according to the node which requested access
> to the file. Presumably you could have ACL'd the /sauN directory according
> to which nodes you wanted to allow to boot from each partner.
Well, you can't set it up quite as flexibly as SR9 allowed for node
access, but there is a '-lao' (and '-nolao') switch on edacl.  (There's
an equivalent switch in chacl, Unix fans.)  This switch prevents the object
from being opened by a remote node.  I personally vote against this sort of 
thing, because I like the wide-open network-of-disks concept (although I
still protect the system software....)  Setting the /sauX directory to
local-access only would prevent ANY node from booting diskless off it.  You
couldn't allow nodes 1234 and 5678 access while locking out everyone else.

> Now as to what you can do under SR10.x ... one thing comes to mind ...
> When "netman" services a boot request it executes /sys/net/netman.rc,
> which is a link to either "netman.bin_sh" or "netman.com_sh". These
> shell scripts set up the /sys/node_data.NODE_ID directory for the
> diskless node. One of the arguments to the shell script is the node ID
> of the diskless node. You could edit the shell script to explicitly....
I'd do it in the script, if I were you.

-- jt --
John Thompson
Honeywell, SSEC
Plymouth, MN  55441
thompson@pan.ssec.honeywell.com

Me?  Represent Honeywell?  You've GOT to be kidding!!!

dbfunk@ICAEN.UIOWA.EDU (David B Funk) (02/14/91)

In posting <9102131559.AA02437@apo.esiee.fr>, bonnetf@apo.esiee.fr (bonnet-franck) says:

>> Here we have about 100 nodes and only 40 disked nodes,
>> very often some students reboot the diskless nodes anywhere
>> on the ring with the "DI N XXXX" command. 
>> 
  [stuff deleted]
>> 
>> In a word we want to CONTROL on which disked node a diskless node MUST boot.


In posting <9102132043.AA17329@richter.mit.edu>, krowitz@richter.mit.edu (David Krowitz)
replies with a good description of diskless node booting and ends with:

> Now as to what you can do under SR10.x ... one thing comes to mind ...
> When "netman" services a boot request it executes /sys/net/netman.rc,
> which is a link to either "netman.bin_sh" or "netman.com_sh". These
> shell scripts set up the /sys/node_data.NODE_ID directory for the
> diskless node. One of the arguments to the shell script is the node ID
> of the diskless node. You could edit the shell script to explicitly
> check the node ID before continuing on to create the diskless partner's
> directory. By refusing to create the `node_data directory, you could
> abort the attempt to boot the diskless node.

David has an excellent idea that is easily implemented. Here is a simple addition
to the "netman.com_sh" shell script that will provide the suggested control.
The first argument passed to the shell script is the node ID. By comparing this
with a list of authorized nodes, it is possible to control the boot process.
If the shell script returns with an "error" status, then netman will abort the
boot process and the diskless node will then fail in its boot up attempt.
My approach is to check the node ID against the contents of the "diskless_list"
file and, if the node ID is not found, return with "error". Thus the script
will succeed for the nodes it should support (the ones explicitly listed in
"diskless_list") and fail for any 'intruders' trying to force this node to be their partner.
Here are the first few lines of my "/sys/net/netman.com_sh" script:

  #!/com/sh
  eon
  #
  #  NETMAN.RC - shell script run by netman to setup `node_data for a diskless node
  #
  #  ^1 = NODE_ID
  #  ^2 = TYPE
  #
  # First check to see if the diskless node is one that we've been authorized
  # to provide boot service for. IE it's in our "diskless_list" file. (dbf 2/13/91)
  #
  if ( /com/fpat -i "%^1" "% *^1" < /sys/net/diskless_list > /dev/null )
    then #OK, this guy's in our diskless_list
      args "Doing boot for node: ^1"
    else #choke, we don't know this one
      args "Invalid boot request by node: ^1"
      return -e
  endif
  #
  #  The following remote paging file size must agree with what the
  #  remote node actually maps (in /os/ker/ast.pas: ast_$activate)!!!! 

Note that the effects of this "error" return on the diskless node are a bit alarming
if unexpected. The diskless node will start its normal boot procedure: it
will load the kernel, display its kernel revision number and date, and then give
a crash message with an "F0001" crash status code. This is how netman
aborts the diskless node's boot; it is NOT a system failure. Just be aware
that you may get a few panic calls from users the first time they get
caught by this. This will not prevent diskless nodes from running other "SAU"
tools (such as "calendar" and "self_test"); it only controls OS booting.
Obviously the /sys/net directory and the "diskless_list" and "netman.com_sh"
files will need to be protected from world write access to complete the
picture. Whenever netman invokes this shell script, its standard output is
directed into the file "/sys/node_data/systmp/netman.out". This file can be used
for debugging modifications to the script or for checking its actions.
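A quick way to see which nodes have been turned away is to pull the rejection messages back out of that file. The sketch below is plain Bourne shell rather than Aegis /com/sh; it assumes each rejection appears as one line exactly like the message the script above prints with `args`, and the function name rejected_nodes and the NETMAN_OUT override are made up for illustration. The post does not say whether netman appends to netman.out or overwrites it, so the result may reflect only the most recent boot request.

```sh
#!/bin/sh
# Sketch only: list the node IDs that the modified netman script turned
# away, by scanning its captured standard output. Assumes each rejection
# is a line of the form "Invalid boot request by node: XXXX".

# Default path taken from the post; NETMAN_OUT is a hypothetical override.
NETMAN_OUT=${NETMAN_OUT:-/sys/node_data/systmp/netman.out}

# rejected_nodes [FILE]: print each rejected node ID once, sorted.
rejected_nodes() {
    # -n plus the p flag prints only lines where the prefix was stripped
    sed -n 's/^Invalid boot request by node: *//p' "${1:-$NETMAN_OUT}" | sort -u
}
```

Run with no argument, rejected_nodes reads the default netman.out path.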
  For this script to be used, the link "/sys/net/netman.rc" must resolve to
"netman.com_sh" and you must have Aegis ("/com") loaded on your system.
If you are a Unix-only shop, you will need to make an equivalent modification
to the script "netman.bin_sh".
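For the Unix-only case, here is a minimal Bourne shell sketch of the same check; it is not the Apollo-supplied script. It assumes that netman.bin_sh, like netman.com_sh, receives the node ID as its first argument and that /sys/net/diskless_list holds one node ID per line; the function name authorize_boot and the DISKLESS_LIST override are hypothetical. The fpat patterns above appear to match any line that begins with the node ID (so an entry of "1234" would also admit node "12"); this version compares whole first fields instead, ignoring case. It was written for a current POSIX sh and awk, so the Domain/OS tools of the day may need adjusting.

```sh
#!/bin/sh
# Sketch only: a node-ID check for the top of /sys/net/netman.bin_sh,
# mirroring the Aegis check in netman.com_sh. Assumptions (not from
# Apollo documentation): $1 is the node ID, and the list file holds one
# node ID per line, optionally followed by a comment.

# Hypothetical override hook so the check can be tried on a scratch file.
DISKLESS_LIST=${DISKLESS_LIST:-/sys/net/diskless_list}

# authorize_boot NODE_ID
# Prints a message (which ends up in netman.out) and returns 0 only if
# NODE_ID equals the first field of some line in $DISKLESS_LIST.
# Comparison is case-insensitive and whole-field, so "12" does not
# match "1234". An empty ID or an unreadable list file means "reject".
authorize_boot() {
    if [ -n "$1" ] && awk -v id="$1" '
            tolower($1) == tolower(id) { found = 1; exit }
            END { exit !found }' "$DISKLESS_LIST" 2>/dev/null
    then
        echo "Doing boot for node: $1"
        return 0
    fi
    echo "Invalid boot request by node: $1"
    return 1
}

# In netman.bin_sh itself, abort with an error status for unknown nodes:
#   authorize_boot "$1" || exit 1
```

The non-zero exit status plays the role of `return -e` in the Aegis version, so netman should abort the boot in the same way.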

  An additional modification to this script can also close a security
loophole that Frank pointed out in a previous posting: when a diskless
node boots, its `node_data/etc directory is created with open ACLs, allowing
the world to modify its contents at will, including the "rc" scripts that are run
as "root" at boot time. There is also a simple cure for this problem.
First create a template directory tree "/etc/node_data" which contains
directories like "etc", "dev", "cron", and such. Then set the ACLs on these
directories as you would have them look for a properly running system. For
example, the ACL on "/etc/node_data/etc" might be:

  $ acl /etc/node_data/etc
  Acl for /etc/node_data/etc:
  Required entries
   root.%.%                         prwx-
   %.staff.%                        -r-x-
   %.%.none                         [ignored]
   %.%.%                            -r-xk
   Extended entry rights mask:      -----
(Note the "k" bit for the world)

Note that the ACLs on this template MUST be set up correctly for a properly
running system. In particular, "node_data" must be world writable or lots of
things will break. Basically, the template should contain every directory
whose ACLs you want to control.
Now modify the "netman.com_sh" script by adding the "/com/cpt /etc/node_data"
line at the end of this excerpt:

  #  {----------- Create NODE_DATA directory and setup acls.------------}
  #
  DIR := "/sys/node_data.^1"
  if existf ^DIR then
     /etc/ulkob ^DIR -f
  else
     /com/crd ^DIR -open
  endif
  #
  # Add copy of our ACL template directory to the new target (dbf 2/13/91)
  /com/cpt /etc/node_data ^DIR -md -sacl


Because the standard Apollo-supplied netman.com_sh script already contains the
line "/com/cpt /etc/templates ^DIR/etc -md -sacl", you will also need to make
sure that /etc/templates has the same ACL as /etc/node_data/etc.

Dave Funk