[comp.protocols.tcp-ip] network monitoring

hedrick@TOPAZ.RUTGERS.EDU (Charles Hedrick) (10/16/87)

I have just started to keep statistics and generate reports from our
cisco gateways.  I had been waiting for HEMP to finish, and cisco to
implement some whizbang ASN.1 monster.  Then I realized that really
that is unnecessary.  A simple program can connect to a gateway and by
issuing "show" commands get just about any piece of data I could ever
want.  The issue is not getting data.  I can easily get so much data
that I drown in paper.  The issue is what to do with it once I have
it.  So the question is, does anybody have enough experience with
network monitoring to know what kinds of statistics it is useful to
collect and what kinds of reports it is useful to produce?  For the
moment, I'm collecting data hourly, and producing daily reports on
errors and other items of comparatively short-term interest.  (Of
course we don't wait for the daily report to know that a line is down.
We have monitoring tools that ping gateways and selected hosts
regularly, so we know when something is down very soon.)  I am also
collecting packet counts for all the gateways, as well as counts of
some events that might indicate that the gateways are overloaded (if
they ever happened, which they don't seem to).  From this I plan to
produce usage reports weekly or monthly, and generate long-term
trends.  (Of course we all know what the graphs will look like, but
administrators like to see graphs showing that the stuff they have
paid for is getting growing usage.)  Also I will probably try to pull
out some specific numbers like the busiest hour, and usage vs. time of
day.  But there are zillions of things like this I could do.  Does
anyone have any suggestions as to which ones turn out to be useful?
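
To make the busiest-hour and usage-vs-time-of-day ideas concrete, here is a
rough sketch in Python (not the awk used for the real reports).  The sample
layout, a list of (timestamp, packets in that hour) pairs, and the function
names are assumptions made up for illustration, and the numbers are invented.

```python
from collections import defaultdict
from datetime import datetime

def busiest_hour(samples):
    """Return the (timestamp, packets) pair with the highest packet count."""
    return max(samples, key=lambda s: s[1])

def usage_by_hour_of_day(samples):
    """Average packets per hour, keyed by hour of day (0-23)."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for when, packets in samples:
        totals[when.hour] += packets
        counts[when.hour] += 1
    return {h: totals[h] / counts[h] for h in sorted(totals)}

# Invented hourly samples: (time of sample, packets seen in that hour).
samples = [
    (datetime(1987, 10, 14, 9), 120000),
    (datetime(1987, 10, 14, 14), 310000),
    (datetime(1987, 10, 15, 9), 140000),
    (datetime(1987, 10, 15, 14), 290000),
]

peak = busiest_hour(samples)
profile = usage_by_hour_of_day(samples)
```

Averaging by hour of day over several days is only one way to slice it; a
weekly or monthly report might want the same thing split by day of week.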

For your amusement, here's one of my daily error reports.  (This is
done more or less entirely in awk, by the way.)  [In case anybody
actually looks at it, a couple of comments:
  The reloads were to bring up new software.
  The large numbers of resets on some interfaces are mostly typical
	of 3Com Multibus Ethernet cards.  They don't seem to
	indicate anything wrong.  The Interlan cards on our
	newer boxes don't seem to do this.
  "lo-input" means an hour in which fewer than 10 packets were
	input.  This could indicate that something has stopped
	hearing the network.  In this case it happens to be
	interfaces whose networks aren't completely in service yet.
]
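
The "lo-input" figure described above could be computed roughly as follows.
This is a Python sketch, not the awk actually used.  It assumes the hourly
readings are cumulative input-packet counters that restart from zero on a
reload; that assumption, the function names, and the readings are all made up
for illustration.

```python
def hourly_deltas(counters):
    """Turn cumulative counter readings into per-hour packet counts.

    A reading lower than the previous one is taken to mean the gateway
    was reloaded and the counter restarted from zero (an assumption).
    """
    deltas = []
    for prev, cur in zip(counters, counters[1:]):
        deltas.append(cur - prev if cur >= prev else cur)
    return deltas

def lo_input_hours(counters, threshold=10):
    """Count the hours in which fewer than `threshold` packets were input."""
    return sum(1 for d in hourly_deltas(counters) if d < threshold)

# Invented hourly readings of one interface's input-packet counter.
readings = [1000, 1004, 1004, 5200, 3, 9000]
quiet_hours = lo_input_hours(readings)
```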

 Path: topaz.rutgers.edu!aramis.rutgers.edu!hedrick
 From: hedrick@aramis.rutgers.edu
 Newsgroups: ru.netlog
 Subject: gateway errors
 Message-ID: <1907@aramis.rutgers.edu>
 Date: 16 Oct 87 04:13:08 GMT
 Sender: root@aramis.rutgers.edu
 Lines: 55

Errors for lcsr-gw

Thu 1987 Oct 15 02:55:05 reload 1
Thu 1987 Oct 15 02:55:06 Ethernet0 state up
Thu 1987 Oct 15 02:55:06 Ethernet1 state up
Thu 1987 Oct 15 02:55:06 Ethernet2 state up
Thu 1987 Oct 15 02:55:06 Ethernet3 state up

interface   address         in-errs out-errs resets hangs in-hangs lo-input

Ethernet0   128.6.4.1             0        0      0     0        0        0
Ethernet1   128.6.5.41            0       39     11     0        0        0
Ethernet2   128.6.13.1            0        0      0     0        0        0
Ethernet3   128.6.21.3            0        0      0     0        0        0

Errors for nb-gw

Thu 1987 Oct 15 02:55:17 reload 1
Thu 1987 Oct 15 02:55:17 Ethernet1 state up
Thu 1987 Oct 15 02:55:17 Ethernet2 state up
Thu 1987 Oct 15 02:55:17 Ethernet3 state up
Thu 1987 Oct 15 02:55:17 Ethernet0 state up
Thu 1987 Oct 15 02:55:17 Serial0 state up
Thu 1987 Oct 15 02:55:17 DDN-18220 state up

interface   address         in-errs out-errs resets hangs in-hangs lo-input

Serial0     128.6.254.1           3        0      0     0        0        0
DDN-18220   10.1.0.89             4        0      0     0        0        3
Ethernet0   128.6.13.39           0        0      1     0        0        0
Ethernet1   128.6.21.1           11      206     18     0        0        0
Ethernet2   128.6.4.27           33     1448     85     0        0        0
Ethernet3   128.6.7.1             0      249     15     0        0        0

Errors for eng-gw

interface   address         in-errs out-errs resets hangs in-hangs lo-input

Ethernet0   128.6.21.2            0        0      0     0        0        0
Ethernet1   128.6.3.13            0        0      0     0        0        0
Ethernet2   128.6.14.1            0        0      0     0        0        0
Ethernet3   128.6.22.1            0        0      0     0        0       23

Errors for ccis-gw

Thu 1987 Oct 15 10:55:35 Serial0 state down
Thu 1987 Oct 15 18:55:54 Serial0 state up

interface   address         in-errs out-errs resets hangs in-hangs lo-input

Serial0     128.6.253.2          11        0      0     0        0       15
Serial1     128.6.252.2           0        0      0     0        0        0
Ethernet0   128.6.7.2             0        0      0     0        0        0
Ethernet1   128.6.21.7            0        0      0     0        0        0
Ethernet3   128.6.18.1           11       12      1     0        0        0
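
A report in this layout is also easy to post-process.  The Python sketch
below picks out the per-interface rows and lists the interfaces whose
combined input and output errors pass a limit.  The function names and the
limit of 100 are hypothetical, chosen only to illustrate the idea; the sample
rows are copied from the nb-gw table above.

```python
# Column names as they appear in the report header.
COLUMNS = ["in-errs", "out-errs", "resets", "hangs", "in-hangs", "lo-input"]

def parse_rows(lines):
    """Pick the per-interface rows out of a report, skipping everything else."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2 + len(COLUMNS):
            continue  # titles, blank lines, short event lines
        try:
            values = [int(f) for f in fields[2:]]
        except ValueError:
            continue  # column header line or an eight-field event line
        row = dict(zip(COLUMNS, values))
        row["interface"], row["address"] = fields[0], fields[1]
        rows.append(row)
    return rows

def noisy_interfaces(rows, limit=100):
    """Names of interfaces whose in-errs plus out-errs exceed `limit`.

    The limit of 100 is an arbitrary illustration, not a recommendation.
    """
    return [r["interface"] for r in rows if r["in-errs"] + r["out-errs"] > limit]

report = """\
Errors for nb-gw

Thu 1987 Oct 15 02:55:17 Ethernet1 state up

interface   address         in-errs out-errs resets hangs in-hangs lo-input

Serial0     128.6.254.1           3        0      0     0        0        0
Ethernet1   128.6.21.1           11      206     18     0        0        0
Ethernet2   128.6.4.27           33     1448     85     0        0        0
"""

rows = parse_rows(report.splitlines())
flagged = noisy_interfaces(rows)
```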