brian@ucsd.EDU (Brian Kantor) (07/08/89)
Occasionally we here at UCSD seem to suffer from connectivity problems that I think are a result of lost routing information. The symptoms are that we stop being able to reach some networks or they us.

To be more specific about it, our campus Ethernet is connected via a Proteon router to the San Diego Supercomputer Center's Ethernet and to several other networks around California - "CERFnet". We rarely have trouble reaching those networks.

However, from time to time, some networks don't seem to be reachable from our campus network, but can be reached from machines on the SDSC Ether or from other CERFNet members. For example, right at this moment I can't ping any machines on the 192.31.103 network where RELAY.CS.NET and its nameservers live, nor can we ping anything on the Purdue campus. Yet both are quite reachable from SDSC. The NIC was unreachable for more than a day, and we haven't been able to get info from the UK nameservers for more than a week.

I don't get network unreachable ICMP messages. Our routing table consists of a few subnet entries and a default route to the SDSC Proteon.

SDSC has recently lost their network guru, and whilst they are trying quite hard to help, they're not quite up to speed just yet.

What I think is happening is that the reachability information for the UCSD network isn't getting propagated as well as it might be. I suspect that my outgoing pings are probably reaching their destinations, but that the return ping response can't find a route back to our network.

How can I test this from here (or elsewhere)?

Brian Kantor UCSD Postmaster UCSD Office of Academic Computing (619) 534-6865 UCSD C-010, La Jolla, CA 92093 USA fax: 619 534 7018 brian@ucsd.edu BRIAN@UCSD ucsd!brian
Gene.Hastings@BOOLE.ECE.CMU.EDU (07/09/89)
Brian, forgive me if some of what I say is old hat to you, but I feel it is preferable to give too much information than too little.

My first caveat is to distinguish between the statements "I can't reach the Internet." and "I can't reach this group of interesting machines." The reason for this is that the world beyond UCSD and SDSC is not homogeneous, and that it is possible that certain groups of networks may have a specific point of failure (such as SRI-NIC and SIMTEL-20, neither of which is directly connected to NSFNET, but which rely on inter-backbone gateways between NSFNET, ARPANET and MILNET). The value of this distinction is that it may provide some hint as to the nature of the failure.

The fact that you get no error messages back indicates that routing announcements of your networks are not reaching the far end, and thus the return traffic is dropped (that is, your traffic fails on the return path, not on the outgoing). This kind of thing is enormously hard to troubleshoot without the aid of someone at another site, preferably the other end of the path you're trying to troubleshoot.

What things can you do? A very powerful tool is traceroute, which has been described here before (which is my way of admitting I can't recall all of the pointers), and differs from the other tools in that it does not require special authentication to use, or running a particular protocol on the intervening routers. Other useful tools are in the SGMP/SNMP family (you can query a routing agent as to individual routes, or its entire routing table), which you may be able to use depending upon the nature of your agreements with your regional as to possession of the proper session/community names. Another tool which provides useful information (in the absence of any other, at least) is RIP query.

Even if you do not have personal access to the tools, your regional NOCs should, and may be able to talk you through the tests.

Gene
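The return-path failure Gene describes can be illustrated with a toy model. This is only a sketch under stated assumptions: the node names, the tables and the helper functions `forward` and `ping` are hypothetical, and real routers are not configured this way. Each node has a table mapping a destination to a next hop, with an optional "default" entry. The backbone node knows about SDSC but has never heard an announcement for UCSD, so requests from UCSD arrive while the replies are dropped.

```python
# Toy routing model; every name and table entry here is hypothetical.
TABLES = {
    "ucsd":     {"default": "sdsc"},
    "sdsc":     {"ucsd": "ucsd", "default": "backbone"},
    # The backbone has routes for purdue and sdsc, but none for ucsd.
    "backbone": {"purdue": "purdue", "sdsc": "sdsc"},
    "purdue":   {"default": "backbone"},
}


def forward(tables, start, dest, max_hops=30):
    """Follow next hops from start toward dest.

    Returns (delivered, path), where path lists the nodes visited.
    """
    path = [start]
    node = start
    while node != dest and len(path) <= max_hops:
        table = tables.get(node, {})
        # Prefer a specific route, fall back to the default if there is one.
        next_hop = table.get(dest, table.get("default"))
        if next_hop is None:
            break  # no route at this node: the packet is dropped here
        node = next_hop
        path.append(node)
    return node == dest, path


def ping(tables, src, dst):
    """A ping succeeds only if both the request and the reply are delivered."""
    request_ok, _ = forward(tables, src, dst)
    reply_ok, _ = forward(tables, dst, src)
    return request_ok and reply_ok


print(ping(TABLES, "ucsd", "purdue"))     # request arrives, reply is lost
print(ping(TABLES, "sdsc", "purdue"))     # both directions have routes
print(forward(TABLES, "purdue", "ucsd"))  # shows where the reply stops
```

The path returned for the reply direction ends at the node that lacks the route, which is roughly the information a traceroute run from the far end toward the campus would give.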
kwe@bu-cs.BU.EDU (kwe@bu-it.bu.edu (Kent W. England)) (07/10/89)
In article <1823@ucsd.EDU> brian@ucsd.EDU (Brian Kantor) writes:
>Occasionally we here at UCSD seem to suffer from connectivity problems
>that I think are a result of lost routing information. The symptoms are
>that we stop being able to reach some networks or they us.
>[...]
>What I think is happening is that the reachability information for the
>UCSD network isn't getting propagated as well as it might be. I suspect
>that my outgoing pings are probably reaching their destinations, but
>that the return ping response can't find a route back to our network.
>
>How can I test this from here (or elsewhere)?

I think you are right. It is hard for you to troubleshoot this yourself. You need help. The SDSCnet people should be able to deal with these things in response to mail from you as the campus representative, exactly like you posted to tcp-ip. Let SDSCnet or CERFnet have another shot at solving your problem for you.

As a local user, you should be able to ask your campus network manager (perhaps that is you) who can call on the regional network operations people who can call on MERIT, the backbone network people. MERIT has the tools and techniques to solve these problems, but they need to limit their interaction to the regional technical people. There are too many people on the net for them to work with everyone directly.

In the case of Purdue and CSnet, they were once well served by arpanet, and since the arpanet has evaporated very rapidly, many organizations are scrambling to migrate to new network services, and that means the NSF-Internet.

Right now, connectivity to many organizations and for many internetwork connections still takes place using default routes. Default routes tend to break when widespread connectivity changes are made, like taking down the arpanet. The most common default is still the good ol' arpanet, and many a slip twixt Hither and Yon on that old caravan route.
(I don't find any purdue nets or the cs.net in routing information from the backbone via jvncnet. It could be temporary, but I think not. My default routing still works. Lucky for me, they can find me in their defaults.)

--Kent England
brian@ucsd.EDU (Brian Kantor) (07/11/89)
Well, we found the problem - seems one of the intermediate routers had a default route pointing to a machine which no longer exists and which used to be that site's Arpanet gateway. Once that was fixed things started to flow again. Thanks all for your suggestions; they did help us! Brian Kantor UCSD Office of Academic Computing Academic Network Operations Group UCSD C-010, La Jolla, CA 92093 USA brian@ucsd.edu ucsd!brian BRIAN@UCSD
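The failure Brian reports, a default route whose next hop no longer exists, can be sketched as follows. This is a minimal illustration, not the configuration of any router in the thread: the prefixes, the gateway labels and the helpers `lookup` and `stale_routes` are all hypothetical. Anything without a more specific route falls through to the default and is handed to a machine that is gone, so traffic vanishes without a network unreachable message.

```python
import ipaddress

# Hypothetical routing table: prefix -> next-hop label.
ROUTES = {
    "192.0.2.0/24": "campus-gw",     # a specific, working route
    "0.0.0.0/0": "old-arpanet-gw",   # default route to a retired gateway
}


def lookup(routes, destination):
    """Longest-prefix match; returns the next-hop label or None."""
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in routes.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None


def stale_routes(routes, live_next_hops):
    """List the prefixes whose next hop is not known to be alive."""
    return sorted(p for p, hop in routes.items() if hop not in live_next_hops)


print(lookup(ROUTES, "192.0.2.7"))          # matched by the specific route
print(lookup(ROUTES, "192.31.103.5"))       # falls through to the default
print(stale_routes(ROUTES, {"campus-gw"}))  # flags the dead default route
```

An audit along the lines of `stale_routes` needs some independent idea of which gateways are alive; here that set is simply passed in by hand.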
heker@JVNCA.CSC.ORG (Sergio Heker) (07/13/89)
I tend to agree with Kent that troubleshooting routing problems requires interaction with other networks. But a more general statement can be made that includes not only routing but end-to-end service. This means connectivity as well as performance. In this more general case we need to remember that the Internet is a "network" that has distributed management; in other words, each of the Internet components is managed and operated by different (autonomous) entities. These "entities" have different levels of service (hours of operation and type of support, e.g. tools).

One of the greatest efforts to shed some light on this problem, in my opinion, is the NSFnet backbone. MERIT has been developing the infrastructure to be able to look into problems that affect users across the country that use the NSFnet network to pass inter-regional traffic, and is doing a very good job assisting the Regional Networks to get problems resolved.

The Regional Networks have a role in dealing with the regional users and helping them to get the problems outside their campuses resolved. This requires, among other things, that Regional Networks be prepared (have the facilities and infrastructure) to help their users. This raises the point of who the users of the Regional Network are. One answer is the institutions connected to it; the other answer is the people that pass traffic. If the Regional Network users are the "Campus" Network Organization (for the Campuses that have one), then they are responsible for assisting their users (the people that send the traffic).

The JvNCnet network, like other networks, has been dealing with all these issues for the last three years, and is working closely with the Institution members (Campuses), with MERIT and with other Regional Networks to assist users.
Consistent with this spirit of cooperation we have met a number of times with the principals of the Regional and State Networks in the North East of the country (PREPnet, NYSERnet, NEARnet and JvNCnet) to discuss technical issues of a Regional to Regional nature. These meetings have been very productive, and will continue in the future, in order to provide for the necessary coordination among the peer networks to free the end-user of unnecessary complications.

In doing this we have developed a group within the Network Department, called the Network Information Services Group, with the function of providing information to the JvNCnet members (among other things). A Network Operations Group deals with the daily operations of the network. Two other groups, sometimes not visible but nevertheless very important in supporting our network, are the Network Engineering Group and the Network Installation and Maintenance Group. This organization and the facilities available constitute our infrastructure to be able to support our community of users.

A problem that we have encountered is that some of the end users (or the Campus' users) don't know who to contact when there is a problem on the network. Occasionally, they call the wrong person, or a person who cannot help them to resolve the problem, or get forwarded a number of times. This only causes frustration for the end users. We are in the process, through the Network Information Services Group, of initiating some training for the JvNCnet Member sites so they can assist the end users. This effort will be discussed in the next JvNCnet Regional Network Meeting in September.

If anyone is interested in getting more information about JvNCnet please contact our Network Coordinator or myself at "nisc@nisc.jvnc.net" or by phone at (609) 520-2000.
-- Sergio
-----------------------------------------------------------------------------
| John von Neumann National Supercomputer Center |
| Sergio Heker tel: (609) 520-2000 |
| Director for Networking fax: (609) 520-1089 |
| Internet: "heker@jvnca.csc.org" Bitnet: "heker@jvnc" |
-----------------------------------------------------------------------------
schoff@SOLBOURNE.NYSER.NET ("Marty Schoffstall") (07/15/89)
I wish the problems were only routing; the reality of many situations is that they are caused by a myriad of problems:

1) the diameter of the Internet continues to grow. Ultrix systems, which out of the box are configured with a "low" TTL, are having lots of problems right now since there are tens of gateway hops now between many facilities. This is especially true during a failure where the redundant multiple path capability kicks in, but over a much "longer" path. This week within NYSERNet a T1 failed in NYC and for two days RockefellerUniv communicated with CUNY (both in NYC) through upstate NY.

2) networks break for periods of time and the word doesn't really get out. For instance both the NYSERNet and Merit/NSFNet NOCs saw truly horrible reachability problems into the MILNET this week. Why? We don't know.

3) networks run out of bandwidth; almost nothing gets through to some very important hosts like SRI-NIC.ARPA, with its ARPANET and MILNET only connections, during much of the day.

4) our backup connections are mere straws in comparison to the fire hoses we normally use. A T1 connection to NEARNET (through which CSNET has connectivity) has been very flaky of late; when it doesn't work, traffic backs off onto 56kbps ARPANET.

5) and then there is routing: string, chewing gum, glue and people pushing ISO "solutions".

Good Luck, just don't lay the blame on one cause or one group. We're all at fault.

Marty
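Marty's first point, a low initial TTL meeting a path that has grown longer, comes down to simple arithmetic. The sketch below assumes the usual IP behaviour in which each gateway decrements the TTL by one and discards the packet when the result is zero. The function name `survives` and the numbers used (an initial TTL of 15, paths of 12 and 20 gateways) are hypothetical and are not taken from the thread.

```python
def survives(initial_ttl, gateway_hops):
    """Return True if a packet sent with initial_ttl crosses gateway_hops gateways.

    Assumes each gateway decrements the TTL by one and discards the
    packet when the TTL reaches zero.
    """
    ttl = initial_ttl
    for _ in range(gateway_hops):
        ttl -= 1
        if ttl == 0:
            return False  # discarded in transit
    return True


# Hypothetical numbers: the normal path works, the backup path does not.
print(survives(15, 12))
print(survives(15, 20))
```

Under this model a host that works over the normal path can lose reachability only while a failure reroutes its traffic over a longer backup path, which makes the problem look intermittent.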