swatt@noc.net.yale.edu (Alan S. Watt) (08/21/90)
After several rounds of correspondence with cisco, I have now confirmed
that there is a bug in the integer encoding of SNMP variables.  This bug
affects 8.1, and presumably 8.0.  There were other bugs in prior versions.
This bug was confirmed after the 8.1(19) maintenance release, so I assume
it will be fixed in the next maintenance release, or in 8.2.

The symptoms are bad values on and immediately following a 32-bit integer
overflow.  The increment implied for the overflow interval is significantly
larger than average, and several sampling periods later what appears to be
a totally random value shows up.  Greg Satz of cisco confirms the problem
occurs when a value crosses a 0xff000000 boundary.

I don't have any more details on this, so the following might be wrong,
but it appears the data can be fixed by throwing out the values between
the last one with the 0xff000000 bits set and the apparently bogus one.
That is, assume the second discontinuity is the real overflow and the
samples in between are in error.  You can then replace these values with
ones calculated by averaging the total increment over the interval.

The other problems, with 7.1 releases, show up as an entirely bad sample
(that is, all the variables appear to be wrong).  I have found that if I
just throw that sample away, the next sample is a reasonable continuation
of the previous one.

Following are (1) an example of the problem, and (2) an "awk" script I use
to check for discontinuities in my snmppoll log files.  It works with the
snmppoll format of NYSER SNMP distribution 3.0; I assume it can be adapted
to work with other versions.

	- Alan S. Watt
	  High Speed Networking, Yale University Computing and Information Systems
	  Box 2112 Yale Station New Haven, CT 06520-2112
	  (203) 432-6600 X394
	  Watt-Alan@Yale.Edu

Disclaimer: It is a violation of federal law to use this article in a manner
inconsistent with this disclaimer.
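As a rough illustration of the symptom, the increment implied by two successive samples of a 32-bit counter can be computed modulo 2^32. The sketch below is not part of the original tooling: the function name `counter_diffs` is made up for this example, and it assumes the snmppoll log format documented with the script further on (field 6 is the variable name, field 8 the decimal value) and that any drop in value represents exactly one wrap.

```shell
# Hypothetical helper: print the increment between successive samples of
# each variable, treating the counter as unsigned 32-bit.
# Assumes snmppoll-style input: $6 = variable name, $8 = decimal value.
counter_diffs() {
    awk '
    {
        name = $6
        value = $8 + 0                  # force a numeric value
        if (name in prev) {
            diff = value - prev[name]
            if (diff < 0)
                diff += 4294967296      # 2^32: assume a single wrap
            # %.0f rather than %d, since some awks clip %d at 2^31-1
            printf "record %d: diff = %.0f\n", NR, diff
        }
        prev[name] = value
    }' "$@"
}
```

Run as `counter_diffs sample.dat` (or with the log on standard input). An interval whose increment is far above the day's average, followed a few samples later by one close to 2^32, matches the pattern described above.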
======================================================================
Fri Aug 10 18:09:18 1990  ifOutOctets_2  0xff508f2a  4283469610
Fri Aug 10 18:24:23 1990  ifOutOctets_2  0xff7d315a  4286394714
Fri Aug 10 18:39:30 1990  ifOutOctets_2  0x00c060c8  12607688
Fri Aug 10 18:54:36 1990  ifOutOctets_2  0x00df8442  14648386
Fri Aug 10 19:09:42 1990  ifOutOctets_2  0x00f63868  16136296
Fri Aug 10 19:24:49 1990  ifOutOctets_2  0x00346748  3434312
Fri Aug 10 19:39:53 1990  ifOutOctets_2  0x0051fbdc  5372892
Fri Aug 10 19:55:01 1990  ifOutOctets_2  0x005cfb52  6093650

If you look at the increments at each interval (taken modulo 2^32), you get:

	record 2: diff = 2925104
	record 3: diff = 21180270
	record 4: diff = 2040698
	record 5: diff = 1487910
	record 6: diff = 4282265312
	record 7: diff = 1938580
	record 8: diff = 720758

The increment between samples 2 and 3 (when the overflow apparently
occurred) is within Ethernet capacity, but is so much higher than the
average for the entire day (not shown) that it is likely to be an error.
And if sample 3 is a genuine overflow, the value at sample 6 is obviously
wrong.

Following is the "corrected" sample, done by assuming the real overflow
occurred at record 6, and that records 3-5 should be replaced by values
calculated from the average increment between records 2 and 6:

Fri Aug 10 18:09:18 1990  ifOutOctets_2  0xff508f2a  4283469610
Fri Aug 10 18:24:23 1990  ifOutOctets_2  0xff7d315a  4286394714
Fri Aug 10 18:39:30 1990  ifOutOctets_2  0xffaafed5  4289396437
Fri Aug 10 18:54:36 1990  ifOutOctets_2  0xffd8cc50  4292398160
Fri Aug 10 19:09:42 1990  ifOutOctets_2  0x000699cb  432587
Fri Aug 10 19:24:49 1990  ifOutOctets_2  0x00346748  3434312
Fri Aug 10 19:39:53 1990  ifOutOctets_2  0x0051fbdc  5372892
Fri Aug 10 19:55:01 1990  ifOutOctets_2  0x005cfb52  6093650

Which gives increments of:

	record 2: diff = 2925104
	record 3: diff = 3001723
	record 4: diff = 3001723
	record 5: diff = 3001723
	record 6: diff = 3001725
	record 7: diff = 1938580
	record 8: diff = 720758
======================================================================
#! /bin/sh
#
# Format of snmppoll data:
#
#	Sun Aug 19 02:02:09 1990 _mgmt_mib_system_sysUpTime_0 \
#		0xcb58b9c 213224348
#
#	$1	Day of week ("Mon", "Tue", ...)
#	$2	Month ("Jan", "Feb", ...)
#	$3	Day of month (1, 2, 3, ...)
#	$4	Time of day (HH:MM:SS [24-hour clock])
#	$5	Year
#	$6	variable name
#	$7	value in hex
#	$8	value in decimal
#
awk '
BEGIN	{
		# make sure params exists as an array before it is scanned
		params["bogus"] = 0
	}
	{
		# reset NR when filename changes
		# also clear out stored previous values
		if (FILENAME != prevFILENAME) {
			NR = 1
			prevFILENAME = FILENAME
			for (var in params)
				params[var] = 0
		}

		# Get the variable name and value
		varName = $6
		varValue = $8 + 0	# force a numeric comparison below
		varTime = $2 " " $3 " " $4

		# a counter that goes backwards has wrapped (or is bad)
		if (varValue < params[varName]) {
			# no single quotes in the format: they would end
			# the shell quoting around this awk program
			printf "discontinuity: %s:%d; at %s; for %s\n", \
				FILENAME, NR, varTime, varName
		}
		params[varName] = varValue
	}
' "$@"
======================================================================
This script, when run on the original sample data above, produces:

discontinuity: sample.dat:3; at Aug 10 18:39:30; for ifOutOctets_2
discontinuity: sample.dat:6; at Aug 10 19:24:49; for ifOutOctets_2
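
The repair described earlier (treat the second discontinuity as the real overflow and spread the total increment evenly over the samples in between) might be sketched as follows. This is only an illustration under stated assumptions, not a tested tool: the names `interpolate_samples` and `hex8` are invented here, the input is assumed to hold samples of a single variable in the snmppoll format above, and the two record numbers are supplied by hand from the discontinuity report.

```shell
# Hypothetical helper: interpolate_samples FIRST LAST < logfile
#   FIRST = record number of the last trusted sample (top byte still 0xff)
#   LAST  = record number of the sample taken to be the real overflow
# Records strictly between FIRST and LAST are replaced by evenly spaced
# values; all other records are passed through unchanged.
interpolate_samples() {
    awk -v first="$1" -v last="$2" '
    # format n (0 <= n < 2^32) as 0x plus 8 hex digits, without relying
    # on printf %x handling values above 2^31-1
    function hex8(n,    s, i, d) {
        s = ""
        for (i = 0; i < 8; i++) {
            d = n % 16
            s = substr("0123456789abcdef", d + 1, 1) s
            n = (n - d) / 16
        }
        return "0x" s
    }
    BEGIN { first += 0; last += 0; two32 = 4294967296 }
    {
        line[NR] = $0
        prefix[NR] = $1 " " $2 " " $3 " " $4 " " $5 " " $6
        value[NR] = $8 + 0
    }
    END {
        # total increment from FIRST to LAST, assuming one wrap
        total = value[last] - value[first]
        if (total < 0)
            total += two32
        step = int(total / (last - first))
        for (r = 1; r <= NR; r++) {
            if (r > first && r < last) {
                v = value[first] + step * (r - first)
                if (v >= two32)
                    v -= two32
                printf "%s %s %.0f\n", prefix[r], hex8(v), v
            } else
                print line[r]
        }
    }'
}
```

For the sample data above, the discontinuities are reported at records 3 and 6, so the last trusted sample is record 2 and the call would be `interpolate_samples 2 6 < sample.dat`. Because the step is truncated to an integer, any remainder ends up in the final interval, which is why the last corrected increment can differ from the others by a byte or two.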