zs01#@ANDREW.CMU.EDU (Zalman Stern) (03/08/87)
Dear Abby: Explanation of results in an expert system should be viewed as a method of communication between intelligent entities. Conventional groups of human experts tend to fail very badly when nobody tells anybody else what is going on. If you expect anything different to happen with artificial experts, you are deluding yourself. I think explanation facilities must be designed into the standard interface a program uses to communicate with humans and other expert systems. Of course, telling too much tends to bore people too... Why not view AI as a chance to fix some of the bugs in human communication?

Analysis of unknown data: I guess the idea here is to come up with an expert version of the UNIX file program. (file is a program which is invoked like "file core" and tells you "core: core file from 'loseprog'".) The file program is written using very ad hoc techniques. It knows about all the magic numbers commonly used in a UNIX system, about keywords for common languages, about patterns that occur in various kinds of text... As you can guess, it assumes a lot.

One of the first things to realize is that there are files for which your system is not going to be able to come up with any useful information. Try feeding it 156MB of perfectly random numbers, for example. One must also figure out what kind of explanations this system is going to give. In the organization category, do you want explanations of the form "This file is columnized data," or "This file is in the proper format for a doctoral dissertation in Computer Science at Carnegie Mellon University"? Once the program has figured out what the file is, it can easily extract the "representation, organization, and content" of the file using information from its knowledge base. So the problem becomes one of designing a pattern matcher and coming up with a knowledge base that knows about all kinds of files.
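[The magic-number technique described above can be sketched as a toy classifier. This is an illustration, not the real file(1) implementation; the signature table below is an assumption standing in for a real magic database, and the printable-text fallback is only a crude heuristic.]

```python
# Toy file(1)-style classifier: match known magic numbers, then fall
# back to a simple "is it printable text?" heuristic.
# The signature list is illustrative, not a complete magic database.
MAGIC = [
    (b"\x7fELF", "ELF executable"),
    (b"%PDF-", "PDF document"),
    (b"\x1f\x8b", "gzip compressed data"),
    (b"PK\x03\x04", "ZIP archive"),
]

def classify(data: bytes) -> str:
    """Return a best-guess description of a file's contents."""
    for magic, description in MAGIC:
        if data.startswith(magic):
            return description
    # Crude text heuristic: every byte printable ASCII or whitespace.
    if data and all(32 <= b < 127 or b in (9, 10, 13) for b in data):
        return "ASCII text"
    return "data"

print(classify(b"\x7fELF\x02\x01\x01"))  # -> ELF executable
print(classify(b"hello world\n"))        # -> ASCII text
```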
Optionally, the program could try and deduce all the information desired from the file, but I think that would be much more difficult to do.

Here is one way to approach this problem: design a number of representations of a file. Examples of these are:

- ASCII text in line format (i.e., the way your favorite editor shows it).
- A numerical dump of the file.

Also, there are many formats specific to certain programs. For these, the representation is derived by firing up the appropriate program on the file. For example, if you are trying to classify a system executable, you will want to run the system debugger (or disassembler) on the file. There is an assumption here that files don't exist in a vacuum. If they did, they would be useless.

Once that is done, you are ready to start building a knowledge base. To do this you want a driver program that allows an expert to examine files and enter information into the system. The driver program will need enough "intelligence" to ask the expert why he did certain things. Of course, you can have humans analyze the experts' answers and encode them appropriately. Then just get a bunch of experts and a large file system, and let them hack at it... I think this may even be doable, but I doubt it would be worthwhile.

Have I made too many assumptions? Is this general enough? Is this what you consider automated?

Sincerely,
Zalman Stern
ARPA: zs01#@andrew.cmu.edu
Disclaimer: I am not involved in any kind of AI research and never have been.
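[The two generic representations Zalman lists, line-format text and a numerical dump, might be sketched like this. The od(1)-style octal layout is one assumption about what a "numerical dump" looks like; a hex dump would do just as well.]

```python
def text_lines(data: bytes) -> list:
    """ASCII text in line format, roughly as an editor would show it."""
    return data.decode("ascii", errors="replace").splitlines()

def octal_dump(data: bytes, width: int = 8) -> list:
    """A numerical dump of the file, in the spirit of od(1):
    an octal offset followed by the octal value of each byte."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        rows.append("%07o  %s" % (offset,
                    " ".join("%03o" % b for b in chunk)))
    return rows

sample = b"hello\nworld\n"
print(text_lines(sample))      # ['hello', 'world']
for row in octal_dump(sample):
    print(row)
```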
dave@MIMSY.UMD.EDU.UUCP (03/09/87)
>I guess the idea here is to come up with an expert version of the UNIX file
>program.

The problem with the `file' approach is that it assumes one already has knowledge of the "files" he is attacking. So, this technique might become more and more useful, but only "might".

>One of the first things to realize is that there are files for
>which your system is not going to be able to come up with any
>useful information. Try feeding it 156MB of perfectly random
>numbers for example.

Testing for randomness might be the first test; it sure would save a lot of subsequent computing if the data were random.

>files. Optionally, the program could try and deduce all the information
>desired from the file, but I think that would be much more difficult to do.

Yep. It would be nice to take a goal-driven, top-down approach, but sometimes data-driven inference, e.g., auto-correlation, is all there is.

>representation is derived from firing up the appropriate program on the file.
>For example, if you are trying to classify a system executable, you will want
>to run the system debugger (or disassembler) on the file. There is an
>assumption here that files don't exist in a vacuum. If they did, they would
>be useless.

Their uselessness, and whether they exist in a vacuum, are assumptions.

--
Dave Stoffel (703) 790-5357
seismo!mimsy!dave  dave@Mimsy.umd.edu
Amber Research Group, Inc.
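[The "test for randomness first" idea Dave raises could be approximated with a Shannon-entropy estimate over the file's bytes. This is only a sketch: the per-byte entropy measure and the sample inputs are assumptions for illustration, and real randomness testing would need stronger statistics than entropy alone.]

```python
import math
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte; values near 8.0 suggest the
    data is random (or compressed/encrypted), so deeper analysis is
    unlikely to pay off."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

english = b"the quick brown fox jumps over the lazy dog " * 100
uniform = bytes(range(256)) * 100  # every byte value equally likely

print(round(byte_entropy(english), 2))  # well below 8
print(byte_entropy(uniform))            # exactly 8.0 for this input
```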