[net.ai] top down vs bottom up?

sher@rochester.UUCP (11/17/85)

In a recent discussion of my work with a famous and capable vision researcher
the issue of bottom up vs top down processing came up.  I was
surprised to find myself saying that I didn't think the issue was
relevant to my research.  To make this article intelligible to others
perhaps I should define terms.  

Bottom up processing is taking a set of observed data and applying
a series of not individually intelligent transformations to the data
to get an interesting result.  An example of this is taking an
image, applying a first derivative to it, and then doing shape from
shading relaxation to derive a field of surface orientations on the

Top down processing is a process of making hypotheses about the high
level structure of an image and verifying or eliminating the
hypotheses based on data in the image.  LL(1) parsing is a simplistic
form of top down processing.  
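
As a minimal sketch of the LL(1) remark above: a recursive-descent parser
starts from the hypothesis "the input is an expr" and lets one token of
lookahead confirm or refute each sub-hypothesis.  The toy grammar, the
single-digit restriction, and all function names here are assumptions made
for illustration; none of them come from the original post.

```python
# Hypothetical toy grammar (an assumption for illustration):
#   expr -> term ('+' term)*
#   term -> DIGIT | '(' expr ')'

def tokenize(text):
    """Split a string into single-character tokens, skipping whitespace."""
    return [ch for ch in text if not ch.isspace()]

def parse_term(tokens, pos):
    """Hypothesis: a term starts at pos. Return (value, next_pos) or raise."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok.isdigit():
        return int(tok), pos + 1
    if tok == "(":
        value, pos = parse_expr(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return value, pos + 1
    raise SyntaxError(f"unexpected token {tok!r}")

def parse_expr(tokens, pos=0):
    """Hypothesis: an expr starts at pos. Return (value, next_pos)."""
    value, pos = parse_term(tokens, pos)
    # One token of lookahead decides whether the hypothesis extends.
    while pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_term(tokens, pos + 1)
        value += right
    return value, pos

def parse(text):
    """Verify the top-level hypothesis against the whole input."""
    tokens = tokenize(text)
    value, pos = parse_expr(tokens)
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return value
```

Here parse("1 + (2 + 3)") evaluates the sum, while parse("1 +") raises
SyntaxError because the data refutes the hypothesis that a term follows.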

I think that I have got these definitions correct but I may have
screwed them up somehow.  Feel free to correct me by mail or news
posting.

My research is on applying a variety of models of image structure, any
of which can apply to the image and several of which can apply in
different parts of it.  I would then use the information in these models,
together with information on their reliability, to arrive at an
interpretation of the image.  As an example of two such models, one
assumes that surfaces have uniform reflectance within a region, while
another assumes textured reflectance within a region.  A single image
may contain both regions with uniform reflectance and regions with
textured reflectance.

Now that I have built the context I can discuss the issues.  It has been a
bone of contention in the vision community whether different aspects
of visual processing are bottom up or top down.  There are two issues
in fact: 
1)	how does the brain do it?
2)	what is the best way to do it?

I believe that in low to intermediate level vision the bottom uppers
dominate but in high level vision the top downers are dominant.  

I would like to put forward a contention:  I feel that the issue of
bottom up to top down has the same relationship to vision research as
the issue of control structure has to theorem proving.

This means that much research can be done without even considering the
issue of top down to bottom up.  Any research on the mathematical
relationship between an image and its structure can be done without
the issue of bottom up to top down coming up.  

I would like to see some discussion of this.  I wouldn't mind seeing
how this relates to people's research.  If I have confused people, simply
tell me and I will try to explain myself more fully.  

-David Sher

ht@epistemi.UUCP (Henry Thompson) (11/20/85)

I don't know about vision, but in natural language and/or
speech processing work, I find it more useful to consider the
instructive versus selective interaction question as opposed
to the top-down versus bottom-up one.

Making that shift places the emphasis where I think it belongs,
namely on what the relationship is between various 'levels' of
processing, and avoids the implication often conveyed by the use
of the appellation 'bottom-up' that no 'higher level' processing
is involved at all.

If I understand your final point correctly, it is that you
don't see any difference in outcome turning on the question of
what sort of cross-level interaction is employed.  I don't know of
any counter-examples, but it still seems to me that more than just
efficiency/resource issues are involved - the intellectual style
of the two positions is quite different, and this encourages different
sorts of exploration...

sher@rochester.UUCP (12/16/85)

In this message I would like to present the discussions that resulted
from my posting on the issue of top down vs bottom up.  I would like to
thank all those who responded to my message as the resulting discussion
helped me get some ideas straight about the issue.  Anyway here it is:

The original message was:

In a recent discussion of my work with a famous and capable vision researcher
the issue of bottom up vs top down processing came up.  I was
surprised to find myself saying that I didn't think the issue was
relevant to my research.  To make this article intelligible to others
perhaps I should define terms.  

Bottom up processing is taking a set of observed data and applying
a series of not individually intelligent transformations to the data
to get an interesting result.  An example of this is taking an
image, applying a first derivative to it, and then doing shape from
shading relaxation to derive a field of surface orientations on the

Top down processing is a process of making hypotheses about the high
level structure of an image and verifying or eliminating the
hypotheses based on data in the image.  LL(1) parsing is a simplistic
form of top down processing.  

I think that I have got these definitions correct but I may have
screwed them up somehow.  Feel free to correct me by mail or news
posting.

My research is on applying a variety of models of image structure, any
of which can apply to the image and several of which can apply in
different parts of it.  I would then use the information in these models,
together with information on their reliability, to arrive at an
interpretation of the image.  As an example of two such models, one
assumes that surfaces have uniform reflectance within a region, while
another assumes textured reflectance within a region.  A single image
may contain both regions with uniform reflectance and regions with
textured reflectance.

Now that I have built the context I can discuss the issues.  It has been a
bone of contention in the vision community whether different aspects
of visual processing are bottom up or top down.  There are two issues
in fact: 
1)	how does the brain do it?
2)	what is the best way to do it?

I believe that in low to intermediate level vision the bottom uppers
dominate but in high level vision the top downers are dominant.  

I would like to put forward a contention:  I feel that the issue of
bottom up to top down has the same relationship to vision research as
the issue of control structure has to theorem proving.

This means that much research can be done without even considering the
issue of top down to bottom up.  Any research on the mathematical
relationship between an image and its structure can be done without
the issue of bottom up to top down coming up.  

I would like to see some discussion of this.  I wouldn't mind seeing
how this relates to people's research.  If I have confused people, simply
tell me and I will try to explain myself more fully.  

-David Sher

In response I received this from Paul Chou at the University of Rochester:
From: Paul Chou  <chou>
Subject: topdown vs bottomup

topdown vs. bottomup

Vision tasks cannot be easily classified by these two words.  There are
other issues related to these two words that cannot be treated as
interpretations of the same concept:
data/demand driven
prior/observed information
serial/parallel processing

Any vision system can be characterized by these four issues.  Most of the time,
topdown/demand-driven/prior-info/serial-processing are coupled together, and
the same goes for bottomup/data-driven/observed-info/parallel-processing.  But
other combinations are also plausible.

Should a vision system have a clean cut for each of the issues?
The connectionist formalism seems to mix everything up in a big hierarchical
network, while most working computer vision systems have clear boundaries.

Can research be done without worrying about these issues?  The answer is
obvious.  But how much this kind of research can contribute to the
computer vision community is not clear.  I think the problem is how
we can advance vision research more efficiently.

I replied as follows:

From sher Sun Nov 17 20:28:09 1985
To: Paul Chou  <chou>
Subject: Re:  topdown vs bottomup
Status: RO

Interesting.  I would say that top-down vs bottom-up and data vs demand driven
are control issues, while serial vs parallel is an implementational issue
and prior vs observed is a functional issue, in so far as it makes any
sense at all.  Any real vision system will have to deal with all these
issues at least in some token way.  The question that interests me is
how far these issues can be addressed separately.  I intend to address
the functional issues and finesse the control and implementational issues
for the most part.  I think that these issues can be safely ignored until
I get the functional foundations in place.  Are you sure that this actually
categorizes the important features of computer vision systems?  It seems
like a real hack to me.  Have fun:

He replied to this with:
From: Paul Chou  <chou>
Subject: Re:  topdown vs bottomup

No, I don't agree.  I don't consider them control/implementation issues.
Consider preattentive vision and serial search (focus of attention,
eye movement): those are mechanisms by which a vision system tries to maximize
its utility.  They result from the given environment and the limitations of
the vision hardware, plus millions of years of evolution.  In short, those
issues are not "very" separable in a vision system.
I don't understand what you mean by "functional issue".  In a vision system,
it makes a lot of difference how the prior knowledge comes in.  It can be
hardwired into the low level modules, so that a system is more sensitive in
some circumstances and poorer in others; or it can come from a high level
knowledge base to direct/hypothesize the low level processing.
A vision system, I believe, should take all the issues into account at the 
same time. Optimizing each  issue individually can not guarantee achieving

In order to continue the discussion, I think we should formalize the problem,
define the issues, so that we can understand each other.


p. chou

Here I respond to Paul (He has not read this yet and I hope he reads this
far!)  for the first time:
From: David Sher
Subject: Re: Topdown vs Bottomup?

I feel that there are three levels at which a vision problem can be viewed:
1)	The relationship between the observed input and the desired result
	ignoring computational issues:
    That is: Given an input image or other kinds of sense data, what would
    you like to output if you had an arbitrarily powerful computer and an
    arbitrarily long time to compute it with?  A large percentage of my
    research lives at this level.

2)	The function of the observed image you will compute taking into 
	account current limitations on hardware and time:
    That is: Accepting the fact that we only have finite time and computational
    power available, what function of the input will we compute in an attempt
    to approximate the output we would really like if we had the time
    to compute it?

3)	How the function that we are computing of the input image is
	computed:
    That is: What is the precise algorithm to calculate the function of the
    image we wish to calculate and what are the problems with its accuracy
    and implementation.  Practically none of my research addresses this 

The categories you presented, and the levels I would put them at, are:
Category				Level
top down/bottom up			2
data/demand driven			3
prior/observed information		1
serial/parallel processing		3

I am not sure of this, however, since both my classification scheme and
yours are rather fuzzy.  Anyway, more food for thought.

I also entered into a dialogue with Mr Larry West 
at the Institute for Cognitive Science in UC San Diego:
[ I seem to have lost his letter authorizing me to publish this letter.
  It may be that I never received one.  If so I apologize for this 
  unauthorized publishing ]

From: west@nprdc.arpa (Larry West)
Subject: Re: Top down vs Bottom up?
Organization: UC San Diego: Institute for Cognitive Science

In article <13220@rochester.UUCP> you write:

>  I think that I have got these definitions correct but I may have
>  screwed them up somehow.  Feel free to correct me by mail or news
>  posting.  

Sounded fine.   You could have thrown in a few buzzwords
like "frames" or "schemata", but these would be more in the
way of explanation than definition.

>  Now I have the context built I can discuss the issues.  It has been a
>  bone of contention in the vision community whether different aspects
>  of visual processing are bottom up or top down.  There are two issues
>  in fact: 
>  1)	how does the brain do it?
>  2)	what is the best way to do it?
>  I believe that in low to intermediate level vision the bottom uppers
>  dominate but in high level vision the top downers are dominant.  

This is what you would expect of course: the low-to-intermediate
vision people are dealing with the question (approximately):
	"How does the brain turn this array of photons into
	usable information"?
which is naturally a bottom-up question.   Similarly for
the high-level vision people (with whom I am not familiar).
In other words, the viewpoints are related to the problems
being addressed.

>  I would like to put forward a contention:  I feel that the issue of
>  bottom up to top down has the same relationship to vision research as
>  the issue of control structure has to theorem proving.

This analogy doesn't convey much to me.   Perhaps it will upon
more reflection.

>  This means that much research can be done without even considering the
>  issue of top down to bottom up.  Any research on the mathematical
>  relationship between an image and its structure can be done without
>  the issue of bottom up to top down coming up.  
				  (or coming down?)

I disagree here, and on a fundamental level.   I think.

You speak of the mathematical relationship as a simple thing.
What if it is a dynamic process which must involve both bottom
up and top-down communication?

Have you looked at the book ``Parallel Models of Associative Memory''
edited by Geoff Hinton and James A. Anderson (1981, Lawrence Erlbaum)?
Or other works in the Connectionist line?   

My point would be that both top-down and bottom-up are occurring in
vision, perhaps more of one in some places, e.g., more bottom-up
in the low-level area.   But I think there would still be some
top-down communication except perhaps at the retina (even then,
I suppose saccades ... well, never mind).   Anyway, think of the
connections to the Lateral Geniculate body from the primary visual
cortex (area 17).   There are more "backwards" connections (from
area 17 to LGN) than "forwards".   And the LGN is fairly early
in visual processing (in some obvious sense).

To say that this communication is relatively unimportant is
to say that the brain is spending a lot of effort (long-distance
connections) doing unimportant things.   I don't think that the
brain is perfect, but I tend to assume that its implementation
of {vision, intelligence, hearing...} is pretty efficient.

But perhaps I am misunderstanding the problem you are

>  -David Sher
>  sher@rochester
>  seismo!rochester!sher

Larry West				USA+619-452-6771
Institute for Cognitive Science		non-business hrs: 452-2256
UC San Diego (mailcode C-015)
La Jolla, CA  92093  USA

UUCP:	{ucbvax,ihnp4,sdcrdcf,decvax,gatech}!sdcsvax!sdcsla!west
				or     {sun,ulysses}!sdcsla!west
ARPA:	<west@nprdc.ARPA>	or	<west@ucsd.ARPA>
DOMAIN:	<west@nprdc.mil>	or	<west@csl.ucsd.edu>

I replied to this friendly missive with:

From sher Sun Nov 24 16:52:57 1985
Subject: Re: Top down vs Bottom up?

Here in Jerry Feldman's land I could not in any way avoid seeing
a considerable bit of connectionist work.   However, the human
vision problem is not the one I am addressing.  I am sensitive
to human vision issues, since I also believe that the human brain
processes output in quite efficient ways.  I am interested in systems
whose limitations may be different than the human visual system.

To be more clear, I am interested in working out how a basically
unlimited system would evaluate visual data and then seeing
the best way to approximate this evaluation using the available
computational resources.  One of the points I was trying to make
is that when one is looking for optimal solutions in an unlimited
system the issue of top down vs bottom up ceases to be relevant.

I also dialogued (who says you can't noun verbs?) with Henry Thompson
of Edinburgh University:

From: ht@epistemi.UUCP (Henry Thompson)
Organization: Epistemics, Edinburgh U., Scotland

I don't know about vision, but in natural language and/or
speech processing work, I find it more useful to consider the
instructive versus selective interaction question as opposed
to the top-down versus bottom-up one.

Making that shift places the emphasis where I think it belongs,
namely on what the relationship is between various 'levels' of
processing, and avoids the implication often conveyed by the use
of the appellation 'bottom-up' that no 'higher level' processing
is involved at all.

If I understand your final point correctly, it is that you
don't see any difference in outcome turning on the question of
what sort of cross-level interaction is employed.  I don't know of
any counter-examples, but it still seems to me that more than just
efficiency/resource issues are involved - the intellectual style
of the two positions is quite different, and this encourages different
sorts of exploration...

My reply to this article was a simple expression of ignorance:
From: David Sher  <sher>
Subject: This is my reply to the previous article

I am afraid that I don't recognize the terms :
instructive versus selective interaction
Can you please define these terms for me because they may suggest
a useful paradigm for my work.  Also where did they come from?
(AI? philosophy? linguistics?).   If you think that my ignorance
is a widespread phenomenon then you may want to post an answer to
net.ai.  Thank you.

And this is his graceful reply to me:
From: Henry Thompson <seismo!epistemi.ed.ac.uk!mcvax!ht>
Subject: Re: Top down vs Bottom up?

I think I first encountered the selective/instructional
distinction in biology/philosophy (in the works of Humberto Maturana,
who is a co-author with Lettvin and Pitts on 'What the Frog's eye ...').

The basic idea is, to put it informally, can processes dependent
on 'higher level' information actually guide/instruct/modify 'lower level' processes,
or are they restricted to selecting among alternatives at that level.

Examples from biology:
Bone cells are long ellipsoids.  It is observed that the long axis is
oriented parallel to the major stress axis of the bone.
Is this because during bone growth cells are somehow sensitive to stress
and grow their long axes accordingly (an instructive account)?
Answer (I'm told): No, cells are initially randomly distributed in orientation,
but those not lined up with stress lines disappear.

The immune system, long assumed to be instructional (the antigen somehow
directs the production of appropriate antibodies) is now known to be selective
(all possible antibodies are already present in the body - an antigen simply
encourages differential reproduction of the 'right' antibody(s)).

The classic line in speech recognition was that syntactic and/or semantic
expectations could be used to 'tell the front end' to look for certain
acoustic features (an instructional account).  This turned out
to be very hard to manage in practice.  We are now more inclined to
try selective approaches, especially as the amount of computing power
available has grown a lot.
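
A minimal sketch of the distinction, assuming a toy "front end" that marks
positions where a signal exceeds a threshold.  The function names, the
thresholds, and the signal are hypothetical illustrations, not taken from
any speech system mentioned in this thread.

```python
def detect_candidates(signal, threshold):
    """Low-level stage: indices where the signal exceeds a fixed threshold."""
    return [i for i, x in enumerate(signal) if x > threshold]

def selective(signal, expected, threshold=0.5):
    """Selective interaction: the higher level may only choose among the
    candidates the low level produced on its own."""
    candidates = detect_candidates(signal, threshold)
    return [i for i in candidates if i in expected]

def instructive(signal, expected, threshold=0.5, relaxed=0.2):
    """Instructive interaction: the higher level changes how the low level
    works, here by lowering the threshold wherever something is expected."""
    return [
        i for i, x in enumerate(signal)
        if x > (relaxed if i in expected else threshold)
    ]

signal = [0.1, 0.6, 0.3, 0.9, 0.25]
expected = {2, 3}          # positions the higher level expects a feature at
selected = selective(signal, expected)
instructed = instructive(signal, expected)
```

In this toy run the selective version can never report position 2, because
the low level never proposed it; the instructive version can, because the
expectation altered the low-level test itself.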

Hope this helps.

I also received this interesting commentary from Jay Glicksman at ADS:
From: glick@aids-unix.arpa (Jay Glicksman)
Subject: TD vs. BU processing in Vision

I agree that "much research can be done without even considering the
issue of top down to bottom up."  However this does leave out an
essential component of the vision system.

Moreover, strictly TD or BU systems do not cover the entire range of

You state:

	I believe that in low to intermediate level vision the bottom
	uppers dominate but in high level vision the top downers are
	dominant.

It is my impression that in "high level" vision (aka "model-based
vision"), systems that rely on feedback are common.  They may take the
form of hypothesize and test, constraint satisfaction, cycle of
perception, etc. but might be generally categorized as Middle Out.

That is not to say that there aren't strictly TD model-based systems
(Bolles' Verification Vision is the stereotype for that).

As someone whose research often concentrates on control for vision
systems (eg. see J. Glicksman, Procedural Adequacy in an Image
Understanding System, Proc. 5th CSCSI/SCEIO Conf., London, Ont. 1984,
pp. 44-49) I will agree with the original statement (above) but think
that the issue of control is an important one in dealing with the
"vision problem".

	Jay Glicksman (glick@aids-unix)

I also received this bit of interesting commentary from Mr. Randy Boys:
From: Randy_Boys <boys%ti-eg.csnet@CSNET-RELAY.ARPA>
Subject:  Hypotheses vs. Bottom-Up

Some thoughts that your VisionList posting stimulated...

   Although the distinction you make between top-down and bottom-up processing,
top-down being more or less equivalent to hypothesis testing, is a bit 
dichotic, it certainly gets one in the ballpark.  My only complaint is that 
sometimes the information known about an object or scene is definite, yet not
low-level.  As an example, one can "know" that a scene is a natural landscape
or that an object is a quadrupedal mammal.  These "facts" are not hypotheses, 
yet they are not necessarily always built up from a number of low level 
primitives.  They fall between your distinction of top-down and bottom-up.
Exactly where they lie, however, is of little importance to the actual use
of the information, which I think is what you were getting at.

   As to the overall discussion of the need (by some researchers) to consider
all vision processing in terms of T-D vs. B-U, I agree that such discussions 
are not always useful.  I would contend that such a discrimination is useful
when researchers care to share differing orientations to a problem, as this may
be their only common ground...at first.  More often than not, however, these
terms are just being thrown around as buzzwords by those who do not have full
insight as to the workings of the system under consideration.  Sometimes, the
terms are used with undue connotations, such as those who contend that their
system is "more robust" because of its "high-level coverage."

					Just rambling,

							Randy Boys

And from Saumya Debray of SUNY Stony Brook, of AI list fame, I received
this interesting response (he didn't get back to me, so I hope he won't
mind my posting his message):
From: seismo!philabs!sbcs!debray (Saumya Debray)
Subject: Re: Top down vs Bottom up?

I find it conceptually easier to think of bottom-up reasoning as forward
chaining, i.e. repeatedly applying a set of inference rules to a set of
axioms until the goal has been achieved.  This goal might be a theorem to
be proved (in theorem proving), or a relation to be computed (in database
systems).  On the other hand, top-down reasoning corresponds to backward
chaining, i.e. working back from a goal (by applying inference rules
"backwards"), generating subproblems that have to be solved, until all
subproblems can be solved directly from the axioms of the system.
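
A minimal sketch of the two chaining strategies over propositional rules.
The rule base, the fact names, and the function names are hypothetical and
chosen only to echo the vision setting of this thread; this is one simple
way to write each strategy, not a description of any particular prover.

```python
# Hypothetical rule base: each rule is (set_of_premises, conclusion).
RULES = [
    ({"edge", "closed"}, "region"),
    ({"region", "textured"}, "foliage"),
    ({"foliage", "trunk"}, "tree"),
]

def forward_chain(facts, rules, goal):
    """Bottom up: keep applying rules to the known facts until the goal
    appears or no rule adds anything new."""
    known = set(facts)
    changed = True
    while changed and goal not in known:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return goal in known

def backward_chain(goal, facts, rules, seen=frozenset()):
    """Top down: reduce the goal to subgoals until they are all facts."""
    if goal in facts:
        return True
    if goal in seen:          # guard against cyclic rule sets
        return False
    seen = seen | {goal}
    return any(
        all(backward_chain(p, facts, rules, seen) for p in premises)
        for premises, conclusion in rules
        if conclusion == goal
    )

facts = {"edge", "closed", "textured", "trunk"}
```

On this rule base both strategies agree on every goal; they differ only in
the order in which rules are tried and in which intermediate conclusions
are ever computed.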

Ok, I've defined the terms in a setting that seems natural to me, which is
theorem proving.  What does this have to do with vision research?  Well,
any computation can (from Church's thesis) be described within a formal
system, e.g. any Turing Machine computation can be described in the lambda
calculus.  Thus, for example, the permissible initial states can correspond
to the axioms of the formal system, the state transition rules correspond
to the rules of inference, and the states generated by applying the
transition rules to permissible initial states correspond to theorems of
the system.  Now, given an arbitrary state S, we can either start at an
initial state and repeatedly apply transition rules and try to generate the
state S (bottom up); or we can start at S and try to work backwards, trying
to find an initial state from which S can ultimately be obtained (top down).

Does this make sense?

-Saumya Debray					... philabs!sbcs!debray
SUNY at Stony Brook				debray@sbcs.csnet

I replied to this in this rather confused message:
From: sher
Subject: Re: Top down vs Bottom up?

I am not sure how it relates.  In vision being goal directed generally
has to do with what data is ignored (any that doesn't help settle the
goal).   I would guess that Church's thesis is a blind alley, since
any program can be expressed in the lambda calculus in an infinite number
of ways (I think) and perhaps could be expressed as either top down or bottom
up.
Mind if I print your response along with a collection of responses
I am constructing?  I am hoping for some cross fertilization.

And so, this is the discussion resulting from my message on top down vs bottom
up.  I hope others find this kind of discussion as stimulating as I do.
I thank all participants and the entire net for making this forum possible, and
I invite responses.  With permission, I will repost all responses sent to me
(and maybe even those posted to the net?) in the same manner, or in a different
manner if a better way is suggested to me.  If you have something
to say do not be shy, say it!  Have fun!

-David Sher
