toddb@tekcrl.UUCP (Todd Brunhoff) (02/18/86)
4 bug fixes follow. The most important is one I did out of sheer frustration: when a server host goes down, the client kernel marks the connection as closing and refuses to start up another connection until all the processes which have used that host have exited. I now think that this is ridiculous requirement. And from the hassles I have received from users and those you have told me you get from your users, I surmise that I am not alone in ridiculing my own implementation. The fix simply entailed not waiting for processes using the connection to exit. The result is that any process that has open file descriptors or a working directory on the remote host, suddenly loses them... but then, they were lost anyway. Most important, though, the connection will restart (if the other host is up). Two files changed in the kernel, remote/rmt_io.c and remote/rmt_generic.c. First, here are the changes to remote/rmt_io.c: *************** *** 11,17 * may be copied, modified or used in any way, without fee, provided this * notice remains an unaltered part of the software. * ! * $Header: rmt_io.c,v 2.3 86/02/04 10:26:43 toddb Exp $ * * $Log: rmt_io.c,v $ * Revision 2.3 86/02/04 10:26:43 toddb - --- 11,17 ----- * may be copied, modified or used in any way, without fee, provided this * notice remains an unaltered part of the software. * ! * $Header: rmt_io.c,v 2.4 86/02/17 17:43:21 toddb Exp $ * * $Log: rmt_io.c,v $ * Revision 2.4 86/02/17 17:43:21 toddb *************** *** 14,19 * $Header: rmt_io.c,v 2.3 86/02/04 10:26:43 toddb Exp $ * * $Log: rmt_io.c,v $ * Revision 2.3 86/02/04 10:26:43 toddb * Added a missing rmt_unuse() call when it is too soon to call a host * that has failed a connection. Contributed by Dennis Rockwell @ csnet. - --- 14,25 ----- * $Header: rmt_io.c,v 2.4 86/02/17 17:43:21 toddb Exp $ * * $Log: rmt_io.c,v $ + * Revision 2.4 86/02/17 17:43:21 toddb + * If the host is closing, remote_getconnection() goes right ahead and + * closes and then tries to reopen. This provides a pretty good recover + * mechanism, except for the fact that current working directories are + * lost and open files are lost. + * * Revision 2.3 86/02/04 10:26:43 toddb * Added a missing rmt_unuse() call when it is too soon to call a host * that has failed a connection. Contributed by Dennis Rockwell @ csnet. *************** *** 88,93 return(rp->r_openerr); } /* * We may already have a connection */ else if (rp->r_sock) - --- 94,104 ----- return(rp->r_openerr); } /* + * If it is closing, do it here and get it over with. + */ + else if (rp->r_close) + rmt_closehost(rp); + /* * We may already have a connection */ else if (rp->r_sock) *************** *** 103,108 rp->r_failed = FALSE; } rp->r_opening = TRUE; /* * pseudo loop to avoid labels... - --- 114,120 ----- rp->r_failed = FALSE; } rp->r_opening = TRUE; + rp->r_close = FALSE; /* * pseudo loop to avoid labels... *************** *** 205,211 /* * get a connection. */ ! if ((so = rp->r_sock) == NULL) if (error = remote_getconnection(system)) { (void) m_freem(*m); return(error); - --- 217,223 ----- /* * get a connection. */ ! if ((so = rp->r_sock) == NULL || rp->r_close) if (error = remote_getconnection(system)) { (void) m_freem(*m); return(error); Changes to remote/rmt_generic.c: *************** *** 11,17 * may be copied, modified or used in any way, without fee, provided this * notice remains an unaltered part of the software. * ! * $Header: rmt_generic.c,v 2.3 86/02/17 14:36:37 toddb Exp $ * * $Log: rmt_generic.c,v $ * Revision 2.3 86/02/17 14:36:37 toddb - --- 11,17 ----- * may be copied, modified or used in any way, without fee, provided this * notice remains an unaltered part of the software. * ! * $Header: rmt_generic.c,v 2.4 86/02/17 17:46:07 toddb Exp $ * * $Log: rmt_generic.c,v $ * Revision 2.4 86/02/17 17:46:07 toddb *************** *** 14,19 * $Header: rmt_generic.c,v 2.3 86/02/17 14:36:37 toddb Exp $ * * $Log: rmt_generic.c,v $ * Revision 2.3 86/02/17 14:36:37 toddb * Fix so that the kernel properly reacts to nonexistant hosts * on the generic mount point: sockargs() was being called in the - --- 14,23 ----- * $Header: rmt_generic.c,v 2.4 86/02/17 17:46:07 toddb Exp $ * * $Log: rmt_generic.c,v $ + * Revision 2.4 86/02/17 17:46:07 toddb + * We no longer stop processing in isremote() if the connection is in + * the middle of closing. + * * Revision 2.3 86/02/17 14:36:37 toddb * Fix so that the kernel properly reacts to nonexistant hosts * on the generic mount point: sockargs() was being called in the *************** *** 441,448 } /* ! * We have a remote mount point. However, it may be in the midst of ! * shutting down from some error. This mount point may also be * a "generic" mount point, in which case, we must figure out * the host. */ - --- 445,451 ----- } /* ! * We have a remote mount point. This mount point may be * a "generic" mount point, in which case, we must figure out * the host. */ *************** *** 452,462 debug8("isremote: can't map path %s\n", path); return(TRUE); } - - } - - if (rp->r_close) { - - debug8("isremote: host %s is closing\n", rp->r_mntpath); - - u.u_error = ENOREMOTEFS; - - return(TRUE); } u.u_error = EISREMOTE; remote_sysindex = sysnum; - --- 455,460 ----- debug8("isremote: can't map path %s\n", path); return(TRUE); } } u.u_error = EISREMOTE; remote_sysindex = sysnum; Dennis Rockwell @ csnet found one of the counting bugs in the RFS bookkeepping. I had forgotten a call to rmt_unuse() if it is not time for a retry. Here are the changes to the kernel file remote/rmt_io.c: *************** *** 11,17 * may be copied, modified or used in any way, without fee, provided this * notice remains an unaltered part of the software. * ! * $Header: rmt_io.c,v 2.2 86/01/21 11:05:41 toddb Exp $ * * $Log: rmt_io.c,v $ * Revision 2.2 86/01/21 11:05:41 toddb - --- 11,17 ----- * may be copied, modified or used in any way, without fee, provided this * notice remains an unaltered part of the software. * ! * $Header: rmt_io.c,v 2.3 86/02/04 10:26:43 toddb Exp $ * * $Log: rmt_io.c,v $ * Revision 2.3 86/02/04 10:26:43 toddb *************** *** 14,19 * $Header: rmt_io.c,v 2.2 86/01/21 11:05:41 toddb Exp $ * * $Log: rmt_io.c,v $ * Revision 2.2 86/01/21 11:05:41 toddb * Missing argument to sleep() in rmt_getconnection(). * - --- 14,23 ----- * $Header: rmt_io.c,v 2.3 86/02/04 10:26:43 toddb Exp $ * * $Log: rmt_io.c,v $ + * Revision 2.3 86/02/04 10:26:43 toddb + * Added a missing rmt_unuse() call when it is too soon to call a host + * that has failed a connection. Contributed by Dennis Rockwell @ csnet. + * * Revision 2.2 86/01/21 11:05:41 toddb * Missing argument to sleep() in rmt_getconnection(). * *************** *** 92,98 * If we have failed previously, it may be time to try again. */ else if (rp->r_failed) { ! if (rp->r_age > time.tv_sec) return(rp->r_openerr); rp->r_failed = FALSE; } - --- 96,103 ----- * If we have failed previously, it may be time to try again. */ else if (rp->r_failed) { ! if (rp->r_age > time.tv_sec) { ! rmt_unuse(rp, system); return(rp->r_openerr); } rp->r_failed = FALSE; *************** *** 94,99 else if (rp->r_failed) { if (rp->r_age > time.tv_sec) return(rp->r_openerr); rp->r_failed = FALSE; } rp->r_opening = TRUE; - --- 99,105 ----- if (rp->r_age > time.tv_sec) { rmt_unuse(rp, system); return(rp->r_openerr); + } rp->r_failed = FALSE; } rp->r_opening = TRUE; Bill Sommerfeld found a bug in the kernel half of the name server. If some invalid host was implied by the pathname /net/host, the server properly returned a null internet address, but the kernel still called sockargs() to allocate space for it. Here are the changes to the kernel file remote/rmt_generic.c (my appologies, but you will probably have to do these context diffs by hand, because they contain the code for #ifdef BSD4_3 which my installation script removes). *************** *** 11,17 * may be copied, modified or used in any way, without fee, provided this * notice remains an unaltered part of the software. * ! * $Header: rmt_generic.c,v 2.2 86/01/02 13:52:32 toddb Exp $ * * $Log: rmt_generic.c,v $ * Revision 2.2 86/01/02 13:52:32 toddb - --- 11,17 ----- * may be copied, modified or used in any way, without fee, provided this * notice remains an unaltered part of the software. * ! * $Header: rmt_generic.c,v 2.3 86/02/17 14:36:37 toddb Exp $ * * $Log: rmt_generic.c,v $ * Revision 2.3 86/02/17 14:36:37 toddb *************** *** 14,19 * $Header: rmt_generic.c,v 2.2 86/01/02 13:52:32 toddb Exp $ * * $Log: rmt_generic.c,v $ * Revision 2.2 86/01/02 13:52:32 toddb * Ifdef'ed calls to sockargs() for the differences in 4.2 vs. 4.3. * - --- 14,25 ----- * $Header: rmt_generic.c,v 2.3 86/02/17 14:36:37 toddb Exp $ * * $Log: rmt_generic.c,v $ + * Revision 2.3 86/02/17 14:36:37 toddb + * Fix so that the kernel properly reacts to nonexistant hosts + * on the generic mount point: sockargs() was being called in the + * remotename() system call (case NM_NAMEIS) even if uap->name was null. + * Bill Sommerfeld (wesommer@athena.mit.edu). + * * Revision 2.2 86/01/02 13:52:32 toddb * Ifdef'ed calls to sockargs() for the differences in 4.2 vs. 4.3. * *************** *** 328,333 m_free(m); /* free extra mbuf */ m = remote_ns.rn_name = NULL; } #ifdef BSD4_3 error = sockargs(&m, uap->name, uap->namelen, MT_SONAME); #else BSD4_3 - --- 334,340 ----- m_free(m); /* free extra mbuf */ m = remote_ns.rn_name = NULL; } + if (uap->name) { #ifdef BSD4_3 error = sockargs(&m, uap->name, uap->namelen, MT_SONAME); #else BSD4_3 *************** *** 329,335 m = remote_ns.rn_name = NULL; } #ifdef BSD4_3 ! error = sockargs(&m, uap->name, uap->namelen, MT_SONAME); #else BSD4_3 error = sockargs(&m, uap->name, uap->namelen); #endif BSD4_3 - --- 336,342 ----- } if (uap->name) { #ifdef BSD4_3 ! error = sockargs(&m, uap->name, uap->namelen, MT_SONAME); #else BSD4_3 error = sockargs(&m, uap->name, uap->namelen); #endif BSD4_3 *************** *** 331,337 #ifdef BSD4_3 error = sockargs(&m, uap->name, uap->namelen, MT_SONAME); #else BSD4_3 ! error = sockargs(&m, uap->name, uap->namelen); #endif BSD4_3 if (error == 0) { cp = mtod(m, char *) + m->m_len; - --- 338,344 ----- #ifdef BSD4_3 error = sockargs(&m, uap->name, uap->namelen, MT_SONAME); #else BSD4_3 ! error = sockargs(&m, uap->name, uap->namelen); #endif BSD4_3 if (error == 0) { cp = mtod(m, char *) + m->m_len; *************** *** 333,339 #else BSD4_3 error = sockargs(&m, uap->name, uap->namelen); #endif BSD4_3 ! if (error == 0) { cp = mtod(m, char *) + m->m_len; if (error = copyin((caddr_t)uap->path, (caddr_t)cp, MIN(R_MNTPATHLEN, MLEN - m->m_len))) { - --- 340,346 ----- #else BSD4_3 error = sockargs(&m, uap->name, uap->namelen); #endif BSD4_3 ! if (error == 0) { cp = mtod(m, char *) + m->m_len; if (error = copyin((caddr_t)uap->path, (caddr_t)cp, MIN(R_MNTPATHLEN, MLEN - m->m_len))) { *************** *** 341,346 m = NULL; } debug13("nmsrv: %s@%x\n", cp, m); } remote_ns.rn_name = m; wakeup(&remote_ns.rn_name); - --- 348,354 ----- m = NULL; } debug13("nmsrv: %s@%x\n", cp, m); + } } remote_ns.rn_name = m; wakeup(&remote_ns.rn_name); Joe Othmer may have found the bug that caused a command like ls /host1/host2/file to hang. The sentry server (which does only nameserver functions for "/net/...", and listens for connections) had the proper call to the system call "remoteoff(NULL)" so that remote access is denied for the server. This sets the SNOREMOTE bit in the proc structure p_flag. Unfortunately, the gateway server (forked by the sentry server) and any of the other servers (forked by the gateway server), get this bit cleared by a fork(). The effect was that the server was actually allowing the hops to happen. Here are the changes so that any server is sure to have the SNOREMOTE bit set. The changes go in the server file remote/fileserver.c: *************** *** 12,17 * notice remains an unaltered part of the software. * * $Log: fileserver.c,v $ * Revision 2.0 85/12/07 18:21:14 toddb * First public release. * - --- 12,22 ----- * notice remains an unaltered part of the software. * * $Log: fileserver.c,v $ + * Revision 2.1 86/02/17 15:11:03 toddb + * The SNOREMOTE bit is cleared in the children of the server, so a call + * to remoteoff(NULL) was added to server() and to become_server(). + * Joe Othmer (vax135!jo). + * * Revision 2.0 85/12/07 18:21:14 toddb * First public release. * *************** *** 16,22 * First public release. * */ ! static char *rcsid = "$Header: fileserver.c,v 2.0 85/12/07 18:21:14 toddb Rel $"; #include <errno.h> #include <stdio.h> #include <ctype.h> - --- 21,27 ----- * First public release. * */ ! static char *rcsid = "$Header: fileserver.c,v 2.1 86/02/17 15:11:03 toddb Exp $"; #include <errno.h> #include <stdio.h> #include <ctype.h> *************** *** 177,182 /* * Ok, now be a server! */ for(;;) { if (i_am_gateway && ! i_have_control) - --- 182,188 ----- /* * Ok, now be a server! */ + remoteoff(NULL); for(;;) { if (i_am_gateway && ! i_have_control) *************** *** 342,347 * our pid, etc. Also, try to dup stderr so that flock will work. * If we can't do it, we are in big trouble. */ current_ppid = current_pid; current_pid = getpid(); if (i_am_gateway) - --- 348,354 ----- * our pid, etc. Also, try to dup stderr so that flock will work. * If we can't do it, we are in big trouble. */ + remoteoff(NULL); current_ppid = current_pid; current_pid = getpid(); if (i_am_gateway) ------- End of Blind-Carbon-Copy