cranston@guru.dec.com (Scott Cranston) (11/06/90)
This query is posted on both unix.admin and unix.ultrix. Please feel free
to respond directly to me or to the net.

I am interested in finding out what system administrators and other users
expect of system error logging. To this end I have several questions for
your consideration and reply. Please expand beyond these if you like.

Thank you for your time,
Scott

1. How is the error log used?
   - To diagnose a specific problem on a specific device or system
     component?
   - To isolate a problem to a failing subsystem?
   - For high-level monitoring of the system's health?
   - For system crash debugging?
   - Are tools like grep and awk used to further reduce the data and/or
     generate custom reports?

2. Who uses the error log?
   - System managers
   - Programmers
   - General users

3. What information should the error log contain?
   - Only summary information. For example:
         /dev/ra189: unrecoverable hard error
         /dev/ra189: bad block replacement, LBN 123456
         Uncorrectable memory error, phys adrs: 0x123456
         panic: duplicate inode
   - Detailed error information, for example device or controller
     register contents, error message packets, stack traces, etc.
     Should this detailed data simply be the octal or hex
     representation of the data? Or does it need a descriptive
     translation of the individual bits?
   - System context, such as time stamp, system ID, hardware type,
     operating system type and version.
   - Do different users (such as those in #2 above) have different
     requirements?

4. What format should the error log data be in?
   - Only plain ASCII text.
   - Only binary data, which requires a separate bit-to-text report
     generator tool.
   - Separate error logs: summary information in plain ASCII text, and
     highly detailed information in binary with a report generator.

5. Compatibility with other systems?
   - Is syslog a de facto standard?
   - What are the system/vendor interoperability requirements of error
     logging?

6. What requirements would you make of an error log system if you were
   designing it?
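
To make the data-reduction question in #1 concrete, here is a rough sketch of the kind of custom report I have in mind, run against the sample summary lines from #3. It is only an illustration, not a proposal: the rule that a device message begins with "/dev/<name>: ", the "system" bucket for everything else, and the names SAMPLE_LOG, DEVICE_LINE and summarize are all assumptions made up for this example.

```python
import re
from collections import Counter

# Sample lines copied from the summary examples in question 3.
SAMPLE_LOG = """\
/dev/ra189: unrecoverable hard error
/dev/ra189: bad block replacement, LBN 123456
Uncorrectable memory error, phys adrs: 0x123456
panic: duplicate inode
"""

# Assumed convention (hypothetical): a device message starts with "/dev/<name>: ".
DEVICE_LINE = re.compile(r"^(/dev/\S+): (.*)$")


def summarize(log_text):
    """Count messages per device; lines without a /dev/ prefix go under 'system'."""
    counts = Counter()
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        match = DEVICE_LINE.match(line)
        source = match.group(1) if match else "system"
        counts[source] += 1
    return counts


if __name__ == "__main__":
    for source, count in sorted(summarize(SAMPLE_LOG).items()):
        print(f"{source}: {count} message(s)")
```

A report like this is only easy to write when the log is plain text with a predictable prefix, which bears directly on the format question in #4.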