swatt@noc.net.yale.edu (Alan S. Watt) (08/21/90)
After several rounds of correspondence with cisco, I have now confirmed
that there is a bug in the integer encoding of SNMP variables. This bug
affects 8.1, and presumably 8.0. There were other bugs in prior
versions. This bug was confirmed after the 8.1(19) maintenance release,
so I assume it will be fixed in the next maintenance release, or in 8.2.
The symptoms are bad values at and immediately after a 32-bit
integer overflow. The increment implied across the overflow interval is
significantly larger than average, and several sampling periods later
a seemingly random value appears.
Greg Satz of cisco confirms the problem occurs when a value crosses a
0xff000000 boundary. I don't have any more details on this, so the
following might be wrong, but it appears the data can be fixed by
throwing out the values between the last one with the 0xff000000 bits
set and the apparently bogus one. That is, assume the second
discontinuity is the real overflow and the ones in between are in
error. You can then replace these values with ones calculated by
averaging the total increment over the interval.
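
A minimal sketch of that selection step, for readers who want to locate
the records to discard automatically. It assumes snmppoll-style input
with the decimal value in field 8 and a single variable per input; the
function name snmp_suspect_span is hypothetical, and the rule it applies
is only the guess described above, not anything confirmed by cisco.

```shell
#!/bin/sh
# Sketch: report the span of records between a drop that follows a value
# with the 0xff000000 bits set and the next drop (taken as the real wrap).
# Assumes one variable per input and the decimal value in field 8.
snmp_suspect_span() {
    awk '
    BEGIN { top = 4278190080 }          # 0xff000000
    {
        value = $8 + 0                  # force numeric comparison
        if (NR > 1 && value < prev) {
            if (start == 0 && prev >= top) {
                start = NR              # first drop, right after a 0xff...... value
            } else if (start != 0) {
                # second drop: treat it as the genuine overflow
                printf "suspect records %d-%d\n", start, NR - 1
                start = 0
            }
        }
        prev = value
    }
    ' "$@"
}
```

Run as "snmp_suspect_span sample.dat" on the example data later in this
post, it reports records 3-5 as the span to replace.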
The other problems with 7.1 releases show up as an entirely bad sample
(that is, all the variables appear to be wrong). I have found that if I
just throw that sample away, the next sample is a reasonable
continuation of the previous one.
Following are (1) an example of the problem, and (2) an "awk" script I
use to check for discontinuities in my snmppoll log files. It works
with the snmppoll format of NYSER SNMP distribution 3.0; I assume it
can be adapted to work with other versions.
- Alan S. Watt
High Speed Networking, Yale University
Computing and Information Systems
Box 2112 Yale Station
New Haven, CT 06520-2112
(203) 432-6600 X394
Watt-Alan@Yale.Edu
Disclaimer: It is a violation of federal law to use this article in
a manner inconsistent with this disclaimer.
======================================================================
Fri Aug 10 18:09:18 1990 ifOutOctets_2 0xff508f2a 4283469610
Fri Aug 10 18:24:23 1990 ifOutOctets_2 0xff7d315a 4286394714
Fri Aug 10 18:39:30 1990 ifOutOctets_2 0x00c060c8 12607688
Fri Aug 10 18:54:36 1990 ifOutOctets_2 0x00df8442 14648386
Fri Aug 10 19:09:42 1990 ifOutOctets_2 0x00f63868 16136296
Fri Aug 10 19:24:49 1990 ifOutOctets_2 0x00346748 3434312
Fri Aug 10 19:39:53 1990 ifOutOctets_2 0x0051fbdc 5372892
Fri Aug 10 19:55:01 1990 ifOutOctets_2 0x005cfb52 6093650
If you look at the increments at each interval (taken modulo 2^32, so
that a wrap shows up as a positive difference), you get:
record 2: diff = 2925104
record 3: diff = 21180270
record 4: diff = 2040698
record 5: diff = 1487910
record 6: diff = 4282265312
record 7: diff = 1938580
record 8: diff = 720758
The increment between samples 2 and 3 (when the overflow occurred) is
within ethernet capacity, but is so much higher than the average for
the entire day (not shown) that it is likely to be an error. If sample
3 is a genuine overflow, the value at sample 6 is obviously wrong.
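
The increments listed above can be recomputed with a short awk sketch
such as the following. It assumes the snmppoll layout with the decimal
value in field 8, and it treats any decrease as exactly one 32-bit wrap;
snmp_diffs is a hypothetical helper name, not part of the NYSER
distribution.

```shell
#!/bin/sh
# Sketch: print per-record increments of a 32-bit counter, modulo 2^32.
# Assumes the decimal value is in field 8 and the variable name in field 6.
snmp_diffs() {
    awk '
    BEGIN { wrap = 4294967296 }         # 2^32
    {
        name = $6
        value = $8 + 0                  # force numeric arithmetic
        if (name in prev) {
            diff = value - prev[name]
            if (diff < 0)
                diff += wrap            # counter went down: assume one wrap
            # %.0f avoids integer-format limits in some awk versions
            printf "record %d: diff = %.0f\n", NR, diff
        }
        prev[name] = value
    }
    ' "$@"
}
```

Usage: "snmp_diffs sample.dat". A very large result, such as the one at
record 6, means the counter dropped only slightly, which points to a bad
sample rather than real traffic.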
Following is the "corrected" sample, done by assuming the real overflow
occurred at record 6, and that records 3-5 should be replaced by a
calculated average increment between records 2 and 6:
Fri Aug 10 18:09:18 1990 ifOutOctets_2 0xff508f2a 4283469610
Fri Aug 10 18:24:23 1990 ifOutOctets_2 0xff7d315a 4286394714
Fri Aug 10 18:39:30 1990 ifOutOctets_2 0xffaafed5 4289396437
Fri Aug 10 18:54:36 1990 ifOutOctets_2 0xffd8cc50 4292398160
Fri Aug 10 19:09:42 1990 ifOutOctets_2 0x000699cb 432587
Fri Aug 10 19:24:49 1990 ifOutOctets_2 0x00346748 3434312
Fri Aug 10 19:39:53 1990 ifOutOctets_2 0x0051fbdc 5372892
Fri Aug 10 19:55:01 1990 ifOutOctets_2 0x005cfb52 6093650
Which gives increments of:
record 2: diff = 2925104
record 3: diff = 3001723
record 4: diff = 3001723
record 5: diff = 3001723
record 6: diff = 3001725
record 7: diff = 1938580
record 8: diff = 720758
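
The replacement values can be generated mechanically. The sketch below
assumes a single variable per input, the decimal value in field 8, and
two record numbers FIRST and LAST that bracket the bad records (2 and 6
in this example). The name snmp_interpolate and its arguments are
hypothetical. It uses an integer step, so any remainder falls on the
last interval, as in the listing above.

```shell
#!/bin/sh
# Sketch: replace records strictly between FIRST and LAST with values
# interpolated modulo 2^32.  usage: snmp_interpolate FIRST LAST [file ...]
snmp_interpolate() {
    first=$1; last=$2; shift 2
    awk -v first="$first" -v last="$last" '
    BEGIN { wrap = 4294967296 }         # 2^32
    { line[NR] = $0; value[NR] = $8 + 0 }
    END {
        total = value[last] - value[first]
        if (total < 0)
            total += wrap               # the real overflow lies in this span
        step = int(total / (last - first))
        for (i = 1; i <= NR; i++) {
            if (i > first && i < last) {
                v = value[first] + step * (i - first)
                if (v >= wrap)
                    v -= wrap
                hi = int(v / 65536)     # print hex as two 16-bit halves
                lo = v - hi * 65536
                split(line[i], f, " ")
                printf "%s %s %s %s %s %s 0x%04x%04x %.0f\n", \
                    f[1], f[2], f[3], f[4], f[5], f[6], hi, lo, v
            } else
                print line[i]
        }
    }
    ' "$@"
}
```

Usage: "snmp_interpolate 2 6 sample.dat" rewrites records 3-5 and leaves
the other records untouched.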
======================================================================
#! /bin/sh
#
# Format of snmppoll data:
#
# Sun Aug 19 02:02:09 1990 _mgmt_mib_system_sysUpTime_0 \
# 0xcb58b9c 213224348
#
# $1 Day of week ("Mon", "Tue", ...)
# $2 Month ("Jan", "Feb", ...)
# $3 Day of month (1, 2, 3, ...)
# $4 Time of day (HH:MM:SS [24-hour clock])
# $5 Year
# $6 variable name
# $7 value in hex
# $8 value in decimal
#
awk '
BEGIN {
    params["bogus"] = 0
}
{
    # reset NR when filename changes
    # also clear out stored previous values
    if (FILENAME != prevFILENAME) {
        NR = 1
        prevFILENAME = FILENAME
        for (var in params)
            params[var] = 0
    }
    # Get the variable name and value
    varName = $6
    varValue = $8 + 0       # force a numeric (not string) comparison below
    varTime = $2 " " $3 " " $4
    # A counter that went down marks a wrap or a bad sample.
    # No single quotes in the format: they would end the shell quoting.
    if (varValue < params[varName]) {
        printf "discontinuity: %s:%d; at %s; for %s\n", \
            FILENAME, NR, varTime, varName
    }
    params[varName] = varValue
}
' "$@"
======================================================================
This script, when run on the original sample data above, produces:
discontinuity: sample.dat:3; at Aug 10 18:39:30; for ifOutOctets_2
discontinuity: sample.dat:6; at Aug 10 19:24:49; for ifOutOctets_2