[comp.sys.apollo] questions on DDS network interfaces and name service

pha@caen.engin.umich.edu (Paul H. Anderson) (08/24/90)

I have several questions about putting one Apollo host on two
different networks via two different ring cards.  The intention
is to provide greater availability for the fileserver, and to
avoid fileserver traffic across a gateway.  The node is not
routing traffic itself, although it does have two interfaces.

Each interface is given a unique name, to allow the nameserver
to hold both.  The nameserver might have the following registered
in the ns_helper database:

	bigfs	1.9876
	bigfs_2	2.9876

Therefore, nodes may access node 9876 through one interface
or the other, depending on which one each node uses the very first
time (see question 2, below).  Thereafter, any reference to
either bigfs or bigfs_2 goes through the interface specified
the very first time.
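To make the caching behavior concrete, here is a toy model in Python. It assumes a client's hint file acts like a map from node ID to the first "net.node" address seen for that node. `Address`, `NAME_SERVER`, `HintCache`, and `route` are hypothetical names for illustration, not Apollo interfaces; the two name-server entries are the ones shown above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    net: int
    node_id: str

    def __str__(self):
        return f"{self.net}.{self.node_id}"


# The two ns_helper entries from above: same node ID, two networks.
NAME_SERVER = {
    "bigfs": Address(net=1, node_id="9876"),
    "bigfs_2": Address(net=2, node_id="9876"),
}


class HintCache:
    """Toy model of a client's hint file: node ID -> cached address (hypothetical)."""

    def __init__(self):
        self.hints = {}

    def route(self, name):
        addr = NAME_SERVER[name]
        # Only the first address seen for a node ID is cached; every later
        # reference with that node ID reuses it, whatever name was used.
        return str(self.hints.setdefault(addr.node_id, addr))


client = HintCache()
print(client.route("bigfs_2"))  # 2.9876
print(client.route("bigfs"))    # 2.9876 as well: the cached hint for 9876 wins
```

Because both names carry node ID 9876, whichever one a client touches first decides the interface used for both.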

Q1:  It appears that when the network on one interface
goes down, access to the fileserver on the other interface
also gets wedged, even though the second network is still up.
Am I doing something wrong, or is there no way around this?

This is a potential problem, because we would like to implement
an alternate network for large fileservers for robustness, but
if the alternate net can't access the alternate interface just
because one of the primary rings goes down, then we don't benefit.

Q2: Currently, when a new client comes on the net and then looks
at the fileserver through one interface, it uses that interface
ever after, even when given the alternate interface name.

I am able to change the interface that a client resolves to by
removing the /sys/node_data/hint_file, then rebooting the node,
then touching the interface that I want to use, first.  After
this is done, the client machine always uses that interface.

I tried various combinations of ctnode and modifying things
in the name server, but nothing did what I wanted.

Why do I want to do this?

Two scenarios come to mind:  1) we move a fileserver from one
ring to another, and change the network address, and 2) we have
two interface cards on a fileserver, and for some reason want
to be able to 'point' a client to one interface or the other.

In the first case, I believe that I must reboot every client
machine that wishes to see that particular fileserver, and
this can be a monumentally expensive task with 500 clients,
any one of which might access the fileserver.  The second case
still requires a reboot of the client node.

Is there a safe way for me to dynamically modify the hint_file to
use a different network number for a given node id?

I've spent the last day or so experimenting with different
configurations, and I'd like to figure out exactly what the
limits are to the things I'm trying.

Thanks in advance for any answers.

Paul Anderson
CAEN Systems Programmer
University of Michigan

ced@apollo.HP.COM (Carl Davidson) (08/25/90)

From article <1990Aug23.175212.29600@caen.engin.umich.edu>, by pha@caen.engin.umich.edu (Paul H. Anderson):
> 
> I have several questions about putting one Apollo host on two
> different networks via two different ring cards.  The intention
> is to provide greater availability for the fileserver, and to
> avoid fileserver traffic across a gateway.  The node is not
> routing traffic itself, although it does have two interfaces.
> 
> Each interface is given a unique name, to allow the nameserver
> to hold both.  The nameserver might have the following registered
> in the ns_helper database:
> 
> 	bigfs	1.9876
> 	bigfs_2	2.9876
> 
> Therefore, nodes may access node 9876 through one interface
> or the other, depending on which one each node uses the very first
> time (see question 2, below).  Thereafter, any reference to
> either bigfs or bigfs_2 goes through the interface specified
> the very first time.
> 
> Q1:  It appears that when the network on one interface
> goes down, access to the fileserver on the other interface
> also gets wedged, even though the second network is still up.
> Am I doing something wrong, or is there no way around this?
> 

No, you're not doing anything wrong. I assume both interfaces are Apollo 
Token Ring (ATR) boards. That is the only case I currently know of where 
this occurs. The problem is caused by the fact that the transmit code in the
ATR driver is responsible for maintaining the health of the ring (i.e., the
MAC layer is implemented in the driver). When one of the rings goes down
because the ring is broken, the next call to the ATR driver's send routine
results in the driver attempting to insert a new token and make the ring
healthy again. The other interface is affected because of the way Aegis
(yes, Virginia, this part of the kernel is STILL really Aegis :-))
handles mutexing around network transmits. The currently shipping
implementation allows exactly one network transmit to be in process at a time.
Since the send routine of the unhealthy ring is busy trying to heal that
broken ring, the other network cannot get "ring_$xmit_lock" to send anything,
hence your symptom.

This is being changed for SR11 to allow multiple "simultaneous" transmits. 
That should make this symptom go away.
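A rough way to see the difference is a small Python model, assuming the kernel's transmit mutex can be approximated by ordinary thread locks. `Interface`, `Transmitter`, and `demo` are hypothetical names; the single global lock stands in for "ring_$xmit_lock", and per-interface locks are only one possible reading of "multiple simultaneous transmits", not necessarily the SR11 design. The lock timeout exists only so the demo reports a wedge instead of hanging.

```python
import threading
import time


class Interface:
    """Toy model of one ring interface; not the real ATR driver."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.healing_started = threading.Event()

    def send(self, heal_seconds):
        if not self.healthy:
            # Broken ring: the send routine stays busy trying to heal it.
            self.healing_started.set()
            time.sleep(heal_seconds)
            return False
        return True


class Transmitter:
    """Serializes transmits with one global lock or with one lock per interface."""

    def __init__(self, interfaces, per_interface_locks):
        self.interfaces = {i.name: i for i in interfaces}
        self.per_interface_locks = per_interface_locks
        self.global_lock = threading.Lock()  # stands in for ring_$xmit_lock
        self.locks = {name: threading.Lock() for name in self.interfaces}

    def transmit(self, name, heal_seconds=1.0, wait=0.1):
        lock = self.locks[name] if self.per_interface_locks else self.global_lock
        # Give up after `wait` seconds so the demo reports a wedge instead of hanging.
        if not lock.acquire(timeout=wait):
            return "blocked"
        try:
            return "sent" if self.interfaces[name].send(heal_seconds) else "healing"
        finally:
            lock.release()


def demo(per_interface_locks):
    ring1 = Interface("ring1", healthy=False)
    ring2 = Interface("ring2", healthy=True)
    tx = Transmitter([ring1, ring2], per_interface_locks)
    worker = threading.Thread(target=tx.transmit, args=("ring1",))
    worker.start()
    ring1.healing_started.wait()  # ring1's send now holds its lock
    result = tx.transmit("ring2")
    worker.join()
    return result


print(demo(per_interface_locks=False))  # blocked: ring2 waits behind ring1
print(demo(per_interface_locks=True))   # sent: ring2 has its own lock
```

With one shared lock, the healthy ring cannot transmit while the broken ring's send routine is healing; with a lock per interface it proceeds independently.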

> 
> Q2: Currently, when a new client comes on the net and then looks
> at the fileserver through one interface, it uses that interface
> ever after, even when given the alternate interface name.
> 
> I am able to change the interface that a client resolves to by
> removing the /sys/node_data/hint_file, then rebooting the node,
> then touching the interface that I want to use, first.  After
> this is done, the client machine always uses that interface.
> 
> I tried various combinations of ctnode and modifying things
> in the name server, but nothing did what I wanted.
> 
> Why do I want to do this?
> 
> Two scenarios come to mind:  1) we move a fileserver from one
> ring to another, and change the network address, and 2) we have
> two interface cards on a fileserver, and for some reason want
> to be able to 'point' a client to one interface or the other.
> 
> In the first case, I believe that I must reboot every client
> machine that wishes to see that particular fileserver, and
> this can be a monumentally expensive task with 500 clients,
> any one of which might access the fileserver.  The second case
> still requires a reboot of the client node.
> 
> Is there a safe way for me to dynamically modify the hint_file to
> use a different network number for a given node id?
> 

The behavior you describe results from the fact that once a node has
an association between the node portion of a UID and an Internet address 
(net.node_id) cached in its hint file, it will send network traffic pertaining
to all objects whose UID contains that same node ID to the Internet address
cached in the hint file. You probably already knew this. In the currently 
shipping implementation, there is no safe way to modify the hint file "on
the fly". In fact, there is no mechanism to allow the user to modify the 
hint file at all, other than to delete it and reboot. Rebooting is necessary,
of course, because the kernel has the hint file mapped and locked and it
won't actually be deleted until the node is shut down. At that point, the
system releases its lock on the hint file and the file system deletes it,
since it was marked for deletion when you invoked the command 

    "dlf -du /sys/node_data/hint_file"

which tells the system to delete the file once it has been unlocked. When the
system boots, the kernel detects the absence of a hint file and creates a 
new, empty one. Your first reference to the fileserver then caches the desired
hint in the hint file and things work the way you would like.
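The delete-on-unlock sequence can be sketched as a toy Python model of one client node. It assumes the hint file is a map from node ID to "net.node" address that the kernel keeps locked while the node is up. `Node`, `dlf_du`, `reference`, `shutdown`, and `boot` are hypothetical names that only mimic the steps described above (mark with "dlf -du", reboot, then touch the desired interface first).

```python
class Node:
    """Toy model of a client node's hint file across a reboot (hypothetical)."""

    def __init__(self):
        self.hint_file = None          # no hint file before first boot
        self.locked = False
        self.delete_on_unlock = False
        self.boot()

    def boot(self):
        if self.hint_file is None:
            self.hint_file = {}        # kernel creates a new, empty hint file
        self.locked = True             # and keeps it mapped and locked

    def dlf_du(self):
        # Models "dlf -du": delete now if unlocked, otherwise once unlocked.
        if self.locked:
            self.delete_on_unlock = True
        else:
            self.hint_file = None

    def shutdown(self):
        self.locked = False            # the lock is released at shutdown...
        if self.delete_on_unlock:
            self.hint_file = None      # ...so the marked file is deleted now
            self.delete_on_unlock = False

    def reference(self, node_id, address):
        # The first reference to a node ID caches its address; later ones reuse it.
        return self.hint_file.setdefault(node_id, address)


node = Node()
print(node.reference("9876", "1.9876"))  # 1.9876
node.dlf_du()
print(node.reference("9876", "2.9876"))  # 1.9876: file is only marked
node.shutdown()
node.boot()
print(node.reference("9876", "2.9876"))  # 2.9876: first hint after reboot
```

Between marking the file and rebooting, the old hint stays in force, which is why the reboot is needed before the new first reference can take effect.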

I would like to make the system more pliable to your wishes at SR11. Whether
I will be completely successful or not remains to be seen. The naming and 
directory managers depend heavily on hints not disappearing once they have
been cached in the hint file. One undesirable consequence of deleting hints
"on the fly", especially in an Internet, can be that the naming server becomes
unable to tell where your current working directory is. There are also other
potential complications that I am not fully prepared to go into at this
time. Suffice it to say that the hint mechanism is one of the less
straightforward areas of the system.

> I've spent the last day or so experimenting with different
> configurations, and I'd like to figure out exactly what the
> limits are to the things I'm trying.
> 
> Thanks in advance for any answers.

I hope these explanations have helped. If you have more questions, please
feel free to post them or to send me mail (ced@apollo.hp.com)
directly. I'll do my best to respond, although I occasionally have to do some
coding, otherwise none of this stuff will make SR11. :-)

Best Regards,
Carl

> Paul Anderson
> CAEN Systems Programmer
> University of Michigan

Carl Davidson  (508) 256-6600 x4361    | In the High and Far-Off Time, the
The Apollo Systems Division of         | Elephant, Oh Best Beloved, had no
The Hewlett-Packard Company            | trunk.
DOMAIN: ced@apollo.HP.COM              |  -- Rudyard Kipling, Just So Stories