sher@rochester.UUCP (11/17/85)
In a recent discussion of my work with a famous and capable vision researcher, the issue of bottom-up vs. top-down processing came up. I was surprised to find myself saying that I don't think the issue is relevant to my research.

To make this article intelligible to others, perhaps I should define terms. Bottom-up processing is taking a set of observed data and applying a series of not individually intelligent transformations to the data to get an interesting result. An example of this is taking an image, applying a first derivative to it, and then doing shape-from-shading relaxation to derive a field of surface orientations on the image. Top-down processing is a process of making hypotheses about the high-level structure of an image and verifying or eliminating those hypotheses based on data in the image. LL(1) parsing is a simplistic form of top-down processing. I think I have these definitions correct, but I may have screwed them up somehow. Feel free to correct me by mail or news posting.

My research is on applying a variety of models of image structure, any of which can apply to an image and several of which can apply in different parts of the same image. I would then use the information in these models, and information on the reliability of these models, to arrive at an interpretation of the image. As an example of two such models: one assumes that surfaces have uniform reflectance within a region, while another allows textured reflectance within a region. An image may contain both regions with uniform reflectance and regions with textured reflectance.

Now that I have built the context, I can discuss the issues. It has been a bone of contention in the vision community whether different aspects of visual processing are bottom-up or top-down. There are in fact two issues:
1) How does the brain do it?
2) What is the best way to do it?

I believe that in low- to intermediate-level vision the bottom-uppers dominate, but in high-level vision the top-downers are dominant. I would like to put forward a contention: I feel that the issue of bottom-up vs. top-down has the same relationship to vision research as the issue of control structure has to theorem proving. This means that much research can be done without even considering the issue of top-down vs. bottom-up. Any research on the mathematical relationship between an image and its structure can be done without the issue of bottom-up vs. top-down coming up.

I would like to see some discussion of this. I wouldn't mind seeing how this relates to people's research. If I confused people, simply tell me and I will try to explain myself more fully.

--
-David Sher
sher@rochester
seismo!rochester!sher
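To make the two control regimes defined above concrete, here is a minimal sketch in Python. It is an illustration only, not anything from the posting: the toy image, the gradient threshold, and the two candidate whole-image models are all assumptions chosen for brevity.

```python
# A minimal sketch of the two control regimes. Bottom-up: a fixed
# pipeline of "not individually intelligent" transformations applied
# to the data. Top-down: score whole-image hypotheses against the
# data and keep the best-supported one. The toy image, threshold,
# and candidate models below are illustrative assumptions.

def gradient_magnitude(image):
    """First horizontal difference: a stand-in for a derivative filter."""
    return [[abs(row[i + 1] - row[i]) for i in range(len(row) - 1)]
            for row in image]

def bottom_up(image, threshold=10):
    """Pipeline: differentiate, then threshold into an edge map."""
    return [[g > threshold for g in row] for row in gradient_magnitude(image)]

def top_down(image, hypotheses):
    """Hypothesize-and-test: return the model whose prediction has the
    lowest squared error against the observed image."""
    def error(h):
        return sum((p - o) ** 2
                   for prow, orow in zip(h["predict"](image), image)
                   for p, o in zip(prow, orow))
    return min(hypotheses, key=error)

# Two toy whole-image models: uniform reflectance vs. a two-region split.
uniform = {"name": "uniform reflectance",
           "predict": lambda img: [[50] * len(row) for row in img]}
two_region = {"name": "two regions",
              "predict": lambda img: [[20 if i < len(row) // 2 else 80
                                       for i in range(len(row))]
                                      for row in img]}

image = [[20, 20, 80, 80],
         [20, 20, 80, 80]]
print(bottom_up(image))                                # data-driven edge map
print(top_down(image, [uniform, two_region])["name"])  # 'two regions'
```

The point of the sketch is that the same image admits both control structures; which one you use says nothing about the mathematical relationship between the image and the models, which is the relationship the posting is concerned with.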
ht@epistemi.UUCP (Henry Thompson) (11/20/85)
I don't know about vision, but in natural language and/or speech processing work, I find it more useful to consider the instructive versus selective interaction question as opposed to the top-down versus bottom-up one. Making that shift places the emphasis where I think it belongs, namely on what the relationship is between various 'levels' of processing, and avoids the implication often conveyed by the use of the appellation 'bottom-up' that no 'higher-level' processing is involved at all.

If I understand your final point correctly, it is that you don't see any difference in outcome turning on the question of what sort of cross-level interaction is employed. I don't know of any counter-examples, but it still seems to me that more than just efficiency/resource issues are involved - the intellectual style of the two positions is quite different, and this encourages different sorts of exploration...
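To make the selective/instructive distinction concrete, here is a minimal sketch cast as toy word recognition. Everything in it - the lexicon, the scoring function, the expectation set - is an assumption invented for illustration, not anything from this thread.

```python
# A minimal sketch of selective vs. instructive cross-level
# interaction, cast as toy word recognition. The lexicon, scoring
# function, and expectation sets are assumptions made up here.

LEXICON = ["ship", "sheep", "chip", "cheap"]

def acoustic_score(word, signal):
    """Toy front end: position-wise match between word and 'signal'."""
    return sum(a == b for a, b in zip(word, signal)) / max(len(word), len(signal))

def selective(signal, expected):
    """Selective: the front end scores *every* lexical alternative;
    higher-level expectations only choose among the finished results."""
    scores = {w: acoustic_score(w, signal) for w in LEXICON}
    plausible = [w for w in scores if w in expected] or list(scores)
    return max(plausible, key=scores.get)

def instructive(signal, expected):
    """Instructive: expectations reach into the front end, so only
    the expected words are ever scored against the signal at all."""
    return max(expected, key=lambda w: acoustic_score(w, signal))

expected = {"sheep", "cheap"}
print(selective("sheep", expected))    # 'sheep'
print(instructive("sheep", expected))  # 'sheep'
```

Both calls return the same answer here; the difference is purely in where the higher-level information enters - after the lower level has run in full (selective) or before, restricting what the lower level computes at all (instructive) - which is exactly the relationship-between-levels emphasis of the post above.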
sher@rochester.UUCP (12/16/85)
In this message I would like to present the discussions that resulted from my posting on the issue of top-down vs. bottom-up. I would like to thank all those who responded to my message, as the resulting discussion helped me get some ideas straight about the issue. Anyway, here it is.

The original message was:

In a recent discussion of my work with a famous and capable vision researcher, the issue of bottom-up vs. top-down processing came up. I was surprised to find myself saying that I don't think the issue is relevant to my research.

To make this article intelligible to others, perhaps I should define terms. Bottom-up processing is taking a set of observed data and applying a series of not individually intelligent transformations to the data to get an interesting result. An example of this is taking an image, applying a first derivative to it, and then doing shape-from-shading relaxation to derive a field of surface orientations on the image. Top-down processing is a process of making hypotheses about the high-level structure of an image and verifying or eliminating those hypotheses based on data in the image. LL(1) parsing is a simplistic form of top-down processing. I think I have these definitions correct, but I may have screwed them up somehow. Feel free to correct me by mail or news posting.

My research is on applying a variety of models of image structure, any of which can apply to an image and several of which can apply in different parts of the same image. I would then use the information in these models, and information on the reliability of these models, to arrive at an interpretation of the image. As an example of two such models: one assumes that surfaces have uniform reflectance within a region, while another allows textured reflectance within a region. An image may contain both regions with uniform reflectance and regions with textured reflectance.

Now that I have built the context, I can discuss the issues. It has been a bone of contention in the vision community whether different aspects of visual processing are bottom-up or top-down. There are in fact two issues:
1) How does the brain do it?
2) What is the best way to do it?

I believe that in low- to intermediate-level vision the bottom-uppers dominate, but in high-level vision the top-downers are dominant. I would like to put forward a contention: I feel that the issue of bottom-up vs. top-down has the same relationship to vision research as the issue of control structure has to theorem proving. This means that much research can be done without even considering the issue of top-down vs. bottom-up. Any research on the mathematical relationship between an image and its structure can be done without the issue of bottom-up vs. top-down coming up.

I would like to see some discussion of this. I wouldn't mind seeing how this relates to people's research. If I confused people, simply tell me and I will try to explain myself more fully.

-David Sher
sher@rochester
seismo!rochester!sher

I received in response, from Paul Chou at the University of Rochester:

From: Paul Chou <chou>
Subject: topdown vs bottomup

Topdown vs. bottomup: vision tasks cannot be easily classified by these two words. There are other issues related to these two words that cannot be treated as interpretations of this one concept:

    data driven vs. demand driven
    prior vs. observed information
    serial vs. parallel processing

Any vision system can be characterized by these four issues.
Most of the time, topdown/demand-driven/prior-info/serial-processing are coupled together, and the same for bottomup/data-driven/observed-info/parallel-processing. But other combinations are also plausible. Should a vision system have a clean cut for each of these issues? The connectionist formalism seems to mess everything up in one big hierarchical network, while most working computer vision systems have clear boundaries. Can research be done without worrying about these issues? The answer is obvious. But how much this kind of research can contribute to the computer vision community is not clear. I think the problem is how we can advance vision research more efficiently.

I replied as follows:

From sher Sun Nov 17 20:28:09 1985
To: Paul Chou <chou>
Subject: Re: topdown vs bottomup

Interesting. I would say that top-down vs. bottom-up and demand-driven are control issues, while serial vs. parallel is an implementational issue, and prior vs. observed is a functional issue insofar as it makes any sense at all. Any real vision system will have to deal with all these issues, at least in some token way. The question that interests me is how much these issues can be addressed separately. I intend to address the functional issues and finesse the control and implementational issues for the most part. I think that these issues can be safely ignored until I get the functional foundations in place.

Are you sure that this actually categorizes the important features of computer vision systems? It seems like a real hack to me.

Have fun:
-David

He replied to this with:

From: Paul Chou <chou>
Subject: Re: topdown vs bottomup

No, I don't agree. I don't consider them control/implementation issues. Consider preattentive vision and serial searching (focus of attention, eye movement); those are the mechanisms by which a vision system tries to maximize its utility. They are a result of the given environment and the limitations of the vision hardware, plus millions of years of evolution. In short, those issues are not "very" separable in a vision system.

I don't understand what you mean by "functional issue". In a vision system, it makes a lot of difference how the prior knowledge comes in. It can be hardwired into the low-level modules, so that a system is more sensitive in some circumstances and poor in others; or it can come from a high-level knowledge base to direct/hypothesize the low-level processing. A vision system, I believe, should take all the issues into account at the same time. Optimizing each issue individually cannot guarantee achieving a global optimum.

In order to continue the discussion, I think we should formalize the problem and define the issues, so that we can understand each other.

Cheers!
p. chou

Here I respond to Paul (he has not read this yet, and I hope he reads this far!) for the first time:

From: David Sher
Subject: Re: Topdown vs Bottomup?

I feel that there are three levels at which a vision problem can be viewed:

1) The relationship between the observed input and the desired result, ignoring computational issues. That is: given an input image, or other kinds of sense data, what would you like to output if you had an arbitrarily powerful computer and an arbitrarily long time to compute with? A large percentage of my research lives at this level.
2) The function of the observed image you will compute, taking into account current limitations on hardware and time. That is: accepting the fact that we have only finite time and computational power available, what function of the input will we compute in an attempt to approximate the output we would really like if we had the time to compute it?

3) How the function that we are computing of the input image is computed. That is: what is the precise algorithm to calculate the function of the image we wish to calculate, and what are the problems with its accuracy and implementation? Practically none of my research addresses this issue.

The categories you presented, and the level I would put each at, are:

    Category                      Level
    top down/bottom up              2
    data/demand driven              3
    prior/observed information      1
    serial/parallel processing      3

I am not sure of this, however, since both my classification scheme and yours are rather fuzzy. Anyway, more food for thought.
-David

I also entered into a dialogue with Mr. Larry West at the Institute for Cognitive Science at UC San Diego: [I seem to have lost his letter authorizing me to publish this letter. It may be that I never received one. If so, I apologize for this unauthorized publication.]

From: west@nprdc.arpa (Larry West)
Subject: Re: Top down vs Bottom up?
Organization: UC San Diego: Institute for Cognitive Science

In article <13220@rochester.UUCP> you write:
> I think I have these definitions correct, but I may have screwed
> them up somehow. Feel free to correct me by mail or news posting.

Sounded fine. You could have thrown in a few buzzwords like "frames" or "schemata", but these would be more in the way of explanation than definition.

> Now that I have built the context, I can discuss the issues. It has
> been a bone of contention in the vision community whether different
> aspects of visual processing are bottom-up or top-down. There are in
> fact two issues:
> 1) How does the brain do it?
> 2) What is the best way to do it?
>
> I believe that in low- to intermediate-level vision the bottom-uppers
> dominate, but in high-level vision the top-downers are dominant.

This is what you would expect, of course: the low-to-intermediate vision people are dealing with the question (approximately) "How does the brain turn this array of photons into usable information?", which is naturally a bottom-up question. Similarly for the high-level vision people (with whom I am not familiar). In other words, the viewpoints are related to the problems being addressed.

> I would like to put forward a contention: I feel that the issue of
> bottom-up vs. top-down has the same relationship to vision research
> as the issue of control structure has to theorem proving.

This analogy doesn't convey much to me. Perhaps it will upon more reflection.

> This means that much research can be done without even considering
> the issue of top-down vs. bottom-up. Any research on the mathematical
> relationship between an image and its structure can be done without
> the issue of bottom-up vs. top-down coming up.

(or coming down?)

I disagree here, and on a fundamental level. I think. You speak of the mathematical relationship as a simple thing. What if it is a dynamic process which must involve both bottom-up and top-down communication? Have you looked at the book "Parallel Models of Associative Memory", edited by Geoff Hinton and James A. Anderson (1981, Lawrence Erlbaum)? Or other works in the connectionist line?
My point would be that both top-down and bottom-up are occurring in vision, perhaps more of one in some places, e.g., more bottom-up in the low-level area. But I think there would still be some top-down communication, except perhaps at the retina (even then, I suppose saccades ... well, never mind).

Anyway, think of the connections to the lateral geniculate body from the primary visual cortex (area 17). There are more "backwards" connections (from area 17 to the LGN) than "forwards", and the LGN is fairly early in visual processing (in some obvious sense). To say that this communication is relatively unimportant is to say that the brain is spending a lot of effort (long-distance connections) doing unimportant things. I don't think that the brain is perfect, but I tend to assume that its implementation of {vision, intelligence, hearing, ...} is pretty efficient.

But perhaps I am misunderstanding the problem you are addressing...?

Larry West                          USA +619-452-6771
Institute for Cognitive Science     non-business hrs: 452-2256
UC San Diego (mailcode C-015)
La Jolla, CA 92093 USA
UUCP:   {ucbvax,ihnp4,sdcrdcf,decvax,gatech}!sdcsvax!sdcsla!west
        or {sun,ulysses}!sdcsla!west
ARPA:   <west@nprdc.ARPA> or <west@ucsd.ARPA>
DOMAIN: <west@nprdc.mil> or <west@csl.ucsd.edu>

I replied to this friendly missive with:

From sher Sun Nov 24 16:52:57 1985
Subject: Re: Top down vs Bottom up?

Here in Jerry Feldman's land I could not in any way avoid seeing a considerable bit of connectionist work. However, the human vision problem is not the one I am addressing. I am sensitive to human vision issues, since I too believe that the human brain does its processing in quite efficient ways. But I am interested in systems whose limitations may be different from those of the human visual system. To be more clear, I am interested in working out how a basically unlimited system would evaluate visual data, and then seeing the best way to approximate this evaluation using the available computational resources. One of the points I was trying to make is that when one is looking for optimal solutions in an unlimited system, the issue of top-down vs. bottom-up ceases to be relevant.

I also dialogued (who says you can't verb nouns?) with Henry Thompson of Edinburgh University:

From: ht@epistemi.UUCP (Henry Thompson)
Organization: Epistemics, Edinburgh U., Scotland

I don't know about vision, but in natural language and/or speech processing work, I find it more useful to consider the instructive versus selective interaction question as opposed to the top-down versus bottom-up one. Making that shift places the emphasis where I think it belongs, namely on what the relationship is between various 'levels' of processing, and avoids the implication often conveyed by the use of the appellation 'bottom-up' that no 'higher-level' processing is involved at all.

If I understand your final point correctly, it is that you don't see any difference in outcome turning on the question of what sort of cross-level interaction is employed. I don't know of any counter-examples, but it still seems to me that more than just efficiency/resource issues are involved - the intellectual style of the two positions is quite different, and this encourages different sorts of exploration...
My reply to this article was a simple expression of ignorance:

From: David Sher <sher>
Subject: This is my reply to the previous article

I am afraid that I don't recognize the terms "instructive versus selective interaction". Can you please define these terms for me, because they may suggest a useful paradigm for my work? Also, where did they come from (AI? philosophy? linguistics?)? If you think my ignorance is a widespread phenomenon, then you may want to post an answer to net.ai. Thank you.

And this is his graceful reply to me:

From: Henry Thompson <seismo!epistemi.ed.ac.uk!mcvax!ht>
Subject: Re: Top down vs Bottom up?

I think I first encountered the selective/instructional distinction in biology/philosophy (in the works of Humberto Maturana, who is a co-author with Lettvin and Pitts on 'What the Frog's Eye ...'). The basic idea, to put it informally, is: can processes dependent on 'higher-level' information actually guide/instruct/modify 'lower-level' processes, or are they restricted to selecting among alternatives at that level?

Examples from biology: Bone cells are long ellipsoids. It is observed that the long axis is oriented parallel to the major stress axis of the bone. Is this because, during bone growth, cells are somehow sensitive to stress and grow their long axes accordingly (an instructive account)? Answer (I'm told): no, cells are initially randomly distributed in orientation, but those not lined up with the stress lines disappear. Similarly, the immune system, long assumed to be instructional (the antigen somehow directs the production of appropriate antibodies), is now known to be selective (all possible antibodies are already present in the body - an antigen simply encourages differential reproduction of the 'right' antibody(s)).

The classic line in speech recognition was that syntactic and/or semantic expectations could be used to 'tell the front end' to look for certain acoustic features (an instructional account). This turned out to be very hard to manage in practice. We are now more inclined to try selective approaches, especially as the amount of computing power available has grown a lot.

Hope this helps.
ht

I also received this interesting commentary from Jay Glicksman at ADS:

From: glick@aids-unix.arpa (Jay Glicksman)
Subject: TD vs. BU processing in Vision

I agree that "much research can be done without even considering the issue of top down to bottom up." However, this does leave out an essential component of the vision system. Moreover, strictly TD or BU systems do not cover the entire range of possibilities. You state:

> I believe that in low- to intermediate-level vision the bottom-uppers
> dominate, but in high-level vision the top-downers are dominant.

It is my impression that in "high-level" vision (aka "model-based vision"), systems that rely on feedback are common. They may take the form of hypothesize-and-test, constraint satisfaction, the cycle of perception, etc., but might be generally categorized as middle-out. That is not to say that there aren't strictly TD model-based systems (Bolles' Verification Vision is the stereotype for that).

As someone whose research often concentrates on control for vision systems (e.g., see J. Glicksman, "Procedural Adequacy in an Image Understanding System", Proc. 5th CSCSI/SCEIO Conf., London, Ont., 1984, pp. 44-49), I will agree with the original statement (above), but I think that the issue of control is an important one in dealing with the "vision problem".

Jay Glicksman (glick@aids-unix)

I also received this bit of interesting commentary from Mr. Randy Boys:
From: Randy_Boys <boys%ti-eg.csnet@CSNET-RELAY.ARPA>
Subject: Hypotheses vs. Bottom-Up

Some thoughts that your VisionList posting stimulated...

Although the distinction you make between top-down and bottom-up processing, top-down being more or less equivalent to hypothesis testing, is a bit dichotomous, it certainly gets one in the ballpark. My only complaint is that sometimes the information known about an object or scene is definite, yet not low-level. As an example, one can "know" that a scene is a natural landscape or that an object is a quadrupedal mammal. These "facts" are not hypotheses, yet they are not necessarily always built up from a number of low-level primitives. They fall between your distinction of top-down and bottom-up. Where exactly they lie, however, is of little importance to the actual use of the information, which I think is what you were getting at.

As to the overall discussion of the need (by some researchers) to consider all vision processing in terms of T-D vs. B-U, I agree that such discussions are not always useful. I would contend that such a discrimination is useful when researchers care to share differing orientations to a problem, as this may be their only common ground... at first. More often than not, however, these terms are just being thrown around as buzzwords by those who do not have full insight into the workings of the system under consideration. Sometimes the terms are used with undue connotations, as by those who contend that their system is "more robust" because of its "high-level coverage."

Just rambling,
Randy Boys

And from Saumya Debray of SUNY Stony Brook, of AI list fame (who I hope won't mind my posting his message, since he didn't get back to me), I received this interesting response:

From: seismo!philabs!sbcs!debray (Saumya Debray)
Subject: Re: Top down vs Bottom up?

I find it conceptually easier to think of bottom-up reasoning as forward chaining, i.e. repeatedly applying a set of inference rules to a set of axioms until the goal has been achieved. This goal might be a theorem to be proved (in theorem proving), or a relation to be computed (in database systems). On the other hand, top-down reasoning corresponds to backward chaining, i.e. working back from a goal (by applying inference rules "backwards"), generating subproblems that have to be solved, until all subproblems can be solved directly from the axioms of the system.

OK, I've defined the terms in a setting that seems natural to me, which is theorem proving. What does this have to do with vision research? Well, any computation can (by Church's thesis) be described within a formal system; e.g., any Turing machine computation can be described in the lambda calculus. Thus, for example, the permissible initial states can correspond to the axioms of the formal system, the state transition rules correspond to the rules of inference, and the states generated by applying the transition rules to permissible initial states correspond to theorems of the system. Now, given an arbitrary state S, we can either start at an initial state, repeatedly apply transition rules, and try to generate the state S (bottom up); or we can start at S and try to work backwards, trying to find an initial state from which S can ultimately be obtained (top down). Does this make sense?

-Saumya Debray
SUNY at Stony Brook
... philabs!sbcs!debray
debray@sbcs.csnet
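To ground the forward/backward chaining framing in something runnable, here is a minimal sketch over a toy propositional rule base. The rule names and facts are invented for illustration; they are not from the correspondence.

```python
# A minimal sketch of forward (bottom-up) vs. backward (top-down)
# chaining over a toy acyclic propositional rule base. Rule heads
# map to lists of alternative bodies (conjunctions of subgoals).
# All rule names and facts here are illustrative assumptions.

RULES = {
    "edge":   [["intensity_step"]],
    "region": [["edge", "closed_contour"]],
    "object": [["region", "matches_model"]],
}
FACTS = {"intensity_step", "closed_contour", "matches_model"}

def forward_chain(goal, facts):
    """Bottom-up: saturate the fact set by applying rules to the
    axioms until nothing new is derived, then check for the goal."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for head, bodies in RULES.items():
            if head not in derived and any(
                    all(term in derived for term in body) for body in bodies):
                derived.add(head)
                changed = True
    return goal in derived

def backward_chain(goal, facts):
    """Top-down: reduce the goal to subgoals until everything bottoms
    out in the axioms. (A cyclic rule set would need a visited set.)"""
    if goal in facts:
        return True
    return any(all(backward_chain(term, facts) for term in body)
               for body in RULES.get(goal, []))

print(forward_chain("object", FACTS))   # True
print(backward_chain("object", FACTS))  # True
```

Both directions derive the same conclusion from the same rules; what differs is the control structure, not the underlying relation between axioms and theorems - which is essentially the analogy to theorem proving made in the original posting.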
I replied to this in this rather confused message:

From: sher
Subject: Re: Top down vs Bottom up?

I am not sure how it relates. In vision, being goal-directed generally has to do with what data is ignored (any data that doesn't help settle the goal). I would guess that Church's thesis is a blind alley, since any program can be expressed in the lambda calculus in an infinite number of ways (I think), and perhaps could be expressed as either top-down or bottom-up. Mind if I print your response along with a collection of responses I am constructing? I am hoping for some cross-fertilization.
-David

And so, this is the discussion resulting from my message on top-down vs. bottom-up. I hope others find this kind of discussion as stimulating as I do. I thank all participants, and the entire net, for making this forum possible, and I invite responses. With permission, I will repost responses to me (and maybe even to the net?) on this issue in the same manner (or a different manner, if a better way is suggested to me). If you have something to say, do not be shy - say it! Have fun!

--
-David Sher
sher@rochester
seismo!rochester!sher