[comp.arch] Teradata

david@cbnewsh.att.com (David Appell) (08/01/90)

   I'd like to read information on Teradata and Teradata-like architectures,
and I'd appreciate any advice on where to start. E-mail please. Thanks.



*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
*  -- David                     *
*  ...att!cbnewsh!david         *
*  david@cbnewsh.att.com        *
*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*

zed@mdbs.uucp (Bill Smith) (10/07/90)

This reply pushed one of my buttons.  I'll try hard to be civil.

>This is not necessarily clear.  There are various penalties
>associated with shared memories, including performance and cost.

There are various penalties involving performance and cost associated 
with *everything*.  That's why design is an interesting problem.

>>Sharing allows some nice programming techniques, and makes a
>>difference when trying to port a lot of software to the new machine.

>Granted, most parallel software was written under the shared memory
>model, and if the main constraint is porting existing software, then
>shared memory machines win. This argument was used in the past
>on various innovations.

What ever happened to software engineering?  If "most parallel software
was written under the shared memory model", it is, by definition, not
portable.  If it is not portable then, in the long term, the software will
cost more to keep than to toss it today and start over with a better
software model.  Why don't sponsors of large projects demand, in the 
requirements, that the project be developed in a fashion that will port 
to advances in architecture technology that might be reasonably expected 
in the next 10 (or even 5) years?  Not doing so makes no sense, considering how 
expensive software is.   Kale's Chare Kernel is one current research 
project that I am familiar with that doesn't care what the architecture 
of the computing engine is.  I'm sure there are more.  It's taken for 
granted that we don't write software that depends on whether the processor 
was implemented in CMOS, TTL, E2L or I2L.  Why can't this principle be 
extended to the next, IMHO, obvious level?
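
To illustrate the principle I have in mind (a hypothetical sketch in C,
not the Chare Kernel's actual interface): write the application against
a small abstract communication layer, and let each machine supply its
own implementation of that layer.

  /* Hypothetical portability layer -- an illustration only, not the
   * Chare Kernel API.  The application calls only these entry points;
   * each target (shared memory, message passing, a network of
   * workstations) supplies its own small implementation underneath. */
  #include <string.h>

  typedef struct { int dest; int len; char data[256]; } par_msg;

  extern int  par_self(void);              /* my node number         */
  extern int  par_nodes(void);             /* total number of nodes  */
  extern void par_send(const par_msg *m);  /* deliver to m->dest     */
  extern void par_recv(par_msg *m);        /* block for next message */

  /* Application code sees only the abstraction above, so it ports
   * unchanged when the machine underneath changes. */
  void ring_hello(void)
  {
      par_msg m;
      m.dest = (par_self() + 1) % par_nodes();
      memcpy(m.data, "hello", 6);
      m.len = 6;
      par_send(&m);     /* pass a greeting around the ring...       */
      par_recv(&m);     /* ...and wait for one from my predecessor. */
  }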

>While sharing at the page level across a network works, increasing
>the number of workstations to say, 1000, will seriously impede
>performance.

Well, then find a better way to get 1000 nodes to cooperate!

>I disagree.  Shared memory automatically creates contention
>for frequently used data.  Caching is a possible solution, but 
>cache coherence protocols for fine grained machines become incredibly
>complex, and further impact performance.

Again, find a better way!  If you get too much contention on frequently
used data, design algorithms that don't have any frequently used data.
If cache coherence protocols are incredibly complex and impact
performance, design an architecture that doesn't need multiple caches 
on a single piece of physical memory.
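
To make the first suggestion concrete, here is a toy sketch in C with
POSIX threads (purely illustrative, not tied to any machine discussed
here): instead of every processor bumping one shared counter, give each
processor its own padded slot and combine the slots once at the end, so
the "frequently used data" simply disappears.

  #include <pthread.h>
  #include <stdio.h>

  #define NTHREADS        8
  #define WORK_PER_THREAD 1000000

  /* One padded slot per thread: no shared hot word, so no lock and
   * (on most machines) no coherence ping-pong between caches.      */
  struct slot { long count; char pad[64 - sizeof(long)]; };
  static struct slot slots[NTHREADS];

  static void *worker(void *arg)
  {
      long id = (long)arg;
      for (long i = 0; i < WORK_PER_THREAD; i++)
          slots[id].count++;          /* private: no contention */
      return NULL;
  }

  int main(void)
  {
      pthread_t t[NTHREADS];
      long total = 0;

      for (long i = 0; i < NTHREADS; i++)
          pthread_create(&t[i], NULL, worker, (void *)i);
      for (long i = 0; i < NTHREADS; i++)
          pthread_join(t[i], NULL);
      for (long i = 0; i < NTHREADS; i++)
          total += slots[i].count;    /* combine once, at the end */

      printf("total = %ld\n", total);
      return 0;
  }

The same trick generalizes: any commutative, associative combining step
can be deferred and done once instead of fought over continuously.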

>Further, shared memory does not obey the natural laws of physics,
>in a sense.  

A law of physics is "natural" if it's been around long enough that we've
forgotten how much turmoil its discovery caused.

>The abstract shared memory model pretends that there
>is a large number of communication paths to a single point. This
>simply isn't true, and additional hardware is required to provide
>this illusion to the programmer. This hardware costs time and
>dollars.

>Parallel programs designed to reflect the underlying physical
>reality will operate faster than those that work in a virtual
>model. I suppose an IMHO is in order here.

At first, I thought these two paragraphs made valid points, but after
rereading them, I realized that they deny the entire history of
operating system research.  The whole point of a software engineering
environment is to hide as many details of the underlying physical reality
as was possible to hide at the time the environment was developed.  Consider
demand paged memory, NFS, Internet, pre-emptive process scheduling, 
standard device driver interfaces, etc.   The performance issues are not
only denigrated, but the whole point is that if the computer does more
work, the humans won't have to.  The essence of Computer Science is
illusions for programmers and users.

>Mike Bolotski          Artificial Intelligence Laboratory, MIT
>misha@ai.mit.edu       Cambridge, MA

Zed
sawmill!mdbs!zed
Standard disclaimer.

cgy@cs.brown.edu (Curtis Yarvin) (10/26/90)

In article <1990Oct7.131537.23798@mdbs.uucp> zed@mdbs.uucp (Bill Smith) writes:
>This reply pushed one of my buttons.  I'll try hard to be civil.
And yours pushed one of mine.  But I'll try too.

>>Granted, most parallel software was written under the shared memory
>>model, and if the main constraint is porting existing software, then
>>shared memory machines win. This argument was used in the past
>>on various innovations.
>
>Why don't sponsors of large projects demand, in the requirements, that the
>project be developed in a fashion that will port to advances in architecture
>technology that might be reasonably expected in the next 10 (or even 5)
>years?
...

>It's taken for granted that we don't write software that depends on whether
>the processor was implemented in CMOS, TTL, E2L or I2L.  Why can't this
>principle be extended to the next, IMHO, obvious level?

Abstraction costs.  It costs in speed and it costs in memory.  And not all
abstractions cost the same - thus your analogy is entirely bogus.  Cross-
architectural abstractions can be especially expensive.  And who cares if
you can write a program which will work on both a Butterfly and a Connection
Machine, if you can do the same job just as fast on a 386? I'm not arguing
that it can't be done, and done well.  But certainly nobody has done it yet.

>>While sharing at the page level across a network works, increasing
>>the number of workstations to say, 1000, will seriously impede
>>performance.
>
>Well, then find a better way to get 1000 nodes to cooperate!
...
>Again, find a better way!  If you get too much contention on frequently
>used data, design algorithms that don't have any frequently used data.
>If cache coherence protocols are incredibly complex and impact
>performance, design an architecture that doesn't need multiple caches 
>on a single piece of physical memory.

"WAAAAAHHHHH!  I waaaaanntt it faaasster!  MOMMY!"  

Seriously, architecture isn't quite that trivial.  If architects could build
an infinitely perfect architecture, then "software engineers" could write
infinitely slow programs.  Unfortunately, life just ain't that simple.

>>The abstract shared memory model pretends that there
>>is a large number of communication paths to a single point. This
>>simply isn't true, and additional hardware is required to provide
>>this illusion to the programmer. This hardware costs time and
>>dollars.
>The essence of Computer Science is illusions for programmers and users.

If wishes were horses, beggars would ride.
And if illusions were free, we'd all be using Smalltalk.

		-Curtis

"I tried living in the real world
 Instead of a shell
 But I was bored before I even began." - The Smiths

twl@cs.brown.edu (Ted "Theodore" W. Leung) (10/26/90)

>>>>> On 26 Oct 90 00:18:06 GMT, cgy@cs.brown.edu (Curtis Yarvin) said:

cgy> In article <1990Oct7.131537.23798@mdbs.uucp> zed@mdbs.uucp (Bill Smith) writes:
zed> This reply pushed one of my buttons.  I'll try hard to be civil.
cgy> And yours pushed one of mine.  But I'll try too.
Perhaps some gentle discussion is in order then.....

zed> Why don't sponsors of large projects demand, in the requirements, that the
zed> project be developed in a fashion that will port to advances in architecture
zed> technology that might be reasonably expected in the next 10 (or even 5)
zed>years?
This is hardly a reasonable thing to do.  If one looks at a recent 10
year window of advances in sequential processor architecture, one sees
that there has been a notable shift away from CISC architectures
toward RISC architectures.  The people that invented the RISC concept
gave CISC architecture a real heart attack with their results.  I
don't recall what large projects you were referring to, but since you
are talking about parallel architecture, I'll assume that these are
research projects.  Part of research involves throwing something away
when you've understood that you've done it wrong.  Sometimes you don't
understand that you've done it wrong until you actually build one.
I think that compatibility assurances should be made by vendors, not
research teams. 

zed>It's taken for granted that we don't write software that depends on whether
zed>the processor was implemented in CMOS, TTL, E2L or I2L.  Why can't this
zed>principle be extended to the next, IMHO, obvious level?
The reason that current software doesn't care about semiconductor
device technology is that different variations in technology all
support the same abstract machine model (architectural description).
The problem with moving that up to the obvious level right now is that
there is no standard abstract model for parallel computers.  There are
a large number of people out there who will argue for a particular
model, but no single one has been proven to be better than another.
In fact, everyone's criterion of "better" seems to be different.  If a
common model can be settled upon, then that will be the time to create
new abstractions (or eliminate non-conforming architectures).  Right
now, most people are still trying to find a general-purpose abstract
parallel machine -- and some people believe it can't be done.

cgy> Abstraction costs.  It costs in speed and it costs in memory.  And not all
cgy> abstractions cost the same - thus your analogy is entirely bogus.  Cross-
cgy> architectural abstractions can be especially expensive.  And who cares if
cgy> you can write a program which will work on both a Butterfly and a Connection
cgy> Machine, if you can do the same job just as fast on a 386? I'm not arguing
cgy> that it can't be done, and done well.  But certainly nobody has done it yet.
It is true that abstraction costs, but just because this is true does
not mean that his analogy is bogus.  Just as sequential architects
experimented with a variety of concepts and have discarded a large
number of them, parallel architects will probably do the same.  Again,
it's a matter of not understanding the problem well enough to know
what to throw away.  Just because nobody has done it yet, doesn't mean
it can't be done.  In this case, it just means that things are a lot
harder than people originally thought.

zed> Well, then find a better way to get 1000 nodes to cooperate!
There are a number of people working on just this problem.  Some of
them are even building their machines, so we may have some numbers and
experience to benefit from fairly shortly.

zed> Again, find a better way!  If you get too much contention on frequently
zed> used data, design algorithms that don't have any frequently used data.
zed> If cache coherence protocols are incredibly complex and impact
zed> performance, design an architecture that doesn't need multiple caches 
zed> on a single piece of physical memory.

cgy> "WAAAAAHHHHH!  I waaaaanntt it faaasster!  MOMMY!"  
Actually, zed's complaint here is somewhat reasonable.  There is the
emerging area of concurrent data structures, which allow high degrees
of concurrent access to the structure.  See Bill Dally's PhD thesis
out of Caltech for more details....  The problem with all this of
course, is that cooperating programs need a medium for cooperation.
Most people consider that to be shared memory of one kind or another,
so some of the problems of shared memory will be with us for some time.
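
A concrete illustration (entirely hypothetical, and not taken from
Dally's thesis): a hash table in C whose buckets are locked
independently, so operations on different keys proceed in parallel and
the single global lock that would otherwise be the hot spot goes away.

  #include <pthread.h>
  #include <stdlib.h>
  #include <string.h>

  #define NBUCKETS 256            /* more buckets => less contention */

  struct node  { char *key; int value; struct node *next; };
  struct table {
      struct node     *head[NBUCKETS];
      pthread_mutex_t  lock[NBUCKETS];   /* one lock per bucket */
  };

  static unsigned hash(const char *s)
  {
      unsigned h = 5381;
      while (*s) h = h * 33 + (unsigned char)*s++;
      return h % NBUCKETS;
  }

  void table_init(struct table *t)
  {
      memset(t->head, 0, sizeof t->head);
      for (int i = 0; i < NBUCKETS; i++)
          pthread_mutex_init(&t->lock[i], NULL);
  }

  /* Threads inserting different keys usually hit different buckets,
   * so they rarely serialize on the same lock. */
  void table_put(struct table *t, const char *key, int value)
  {
      unsigned b = hash(key);
      struct node *n = malloc(sizeof *n);
      n->key   = strdup(key);
      n->value = value;
      pthread_mutex_lock(&t->lock[b]);
      n->next    = t->head[b];
      t->head[b] = n;
      pthread_mutex_unlock(&t->lock[b]);
  }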

cgy> If wishes were horses, beggars would ride.
cgy> And if illusions were free, we'd all be using Smalltalk.
If people understood what made illusions expensive, then maybe we
would all be using Smalltalk.  See Dave Ungar's work on SOAR and Self
for good examples.  Pooh-poohing abstraction because it's expensive
won't work forever.  Software systems are getting too expensive and
complicated for us to keep doing without it.

Ted
--
--------------------------------------------------------------------
Internet/CSnet: twl@cs.brown.edu 	| Ted "Theodore" Leung
BITNET: twl@BROWNCS.BITNET		| Box 1910, Brown University
UUCP: uunet!brunix!twl			| Providence, RI 02912

Chuck.Phillips@FtCollins.NCR.COM (Chuck.Phillips) (10/29/90)

>>>>> On 7 Oct 90 13:15:37 GMT, zed@mdbs.uucp (Bill Smith) said:
Bill> What ever happened to software engineering?  If "most parallel
Bill> software was written under the shared memory model", it is, by
Bill> definition, not portable.

Considering multi-processor architectures account for a small portion (<1%,
I'd guess) of _existing_ computers, _anything_ depending on parallelism is,
by definition, non-portable.  If your objective is maximum portability,
parallelism doesn't make it out of the starting gate.

Also, the prevalence of software for the shared memory model is no
accident; most new development of multi-processor systems under $20K
involves loosely coupled shared memory architectures.  This may change,
but it certainly appears to be the current trend.

Bill> If it is not portable then, in the long term, the software will cost
Bill> more to keep than to toss it today and start over with a better
Bill> software model.

Not necessarily.  I, for one, would guess it costs less to rewrite as
needed (and _if_ needed) than to attempt to anticipate the lowest common
denominator of every "reasonable" architecture that may be invented over
the next 5 years.  Quantitatively oriented (a.k.a. RISC) engineering is
pushing single processor design throughput ever nearer to the physical
limits of switching speed.  In the pursuit of ever higher throughput, I
_expect_ the emergence of a plethora of new parallel designs and the
re-emergence of old parallelism ideas no longer impractical due to
subsequent technology advances.  Further, I _expect_ all this in just the
next 5 years.

The best a software engineer can do, IMHO, is to use a language that allows
the programmer the option of abstracting parallelism (e.g. Concurrent
C/C++, etc.) and leave it to compiler designers and hardware designers, in
tandem, to implement concurrent HLLs as efficiently as possible (and then,
for compiler designers to find and exploit high-level parallelization even
when concurrency is _not_ explicitly marked by the programmer).
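
Something like the following sketch is what I have in mind (plain C
with threads standing in for Concurrent C/C++; the parallel_for
interface is made up for this example): the programmer says only
"apply this body to every i", and how that maps onto processors is the
implementation's business.

  #include <pthread.h>

  /* The programmer's whole view of parallelism: "apply body to every
   * i in [0, n)".  Scheduling and processor count are hidden below. */
  typedef void (*loop_body)(long i, void *ctx);

  struct chunk { long lo, hi; loop_body body; void *ctx; };

  static void *run_chunk(void *arg)
  {
      struct chunk *c = arg;
      for (long i = c->lo; i < c->hi; i++)
          c->body(i, c->ctx);
      return NULL;
  }

  void parallel_for(long n, loop_body body, void *ctx, int nthreads)
  {
      pthread_t    tid[64];
      struct chunk ch[64];

      if (nthreads > 64) nthreads = 64;
      for (int t = 0; t < nthreads; t++) {
          ch[t].lo   = n *  t      / nthreads;
          ch[t].hi   = n * (t + 1) / nthreads;
          ch[t].body = body;
          ch[t].ctx  = ctx;
          pthread_create(&tid[t], NULL, run_chunk, &ch[t]);
      }
      for (int t = 0; t < nthreads; t++)
          pthread_join(tid[t], NULL);
  }

  /* Usage: double the elements of an array a on 8 threads.
   *   static void scale(long i, void *a) { ((double *)a)[i] *= 2.0; }
   *   parallel_for(1000000, scale, a, 8);
   */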


>While sharing at the page level across a network works, increasing
>the number of workstations to say, 1000, will seriously impede
>performance.

Bill> Well, then find a better way to get 1000 nodes to cooperate!

>I disagree.  Shared memory automatically creates contention
>for frequently used data. ...

Bill> Again, find a better way!  If you get too much contention on
Bill> frequently used data, design algorithms that don't have any
                                                              ^^^
Bill> frequently used data.

This isn't _always_ possible!  (Some OLTP DBMS applications immediately
come to mind.)  That said, one approach is to encapsulate the data a la
Object Oriented Design and use a client-server model to reduce network
traffic.  That is, the remote client only performs the most abstract of
manipulations (hopefully reducing network overhead) while the server does
all the heavy I/O, low level manipulations and provides enforcement of data
integrity.  X Windows is an example.  Display PostScript, in its various
forms, is even more so.  This is _not_ a panacea, but it often works.
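
A toy sketch of the difference in C (the record layout, the request
format, and the send_msg/recv_msg helpers are all made up for this
example; this is not the X or Display PostScript protocol):

  /* Helpers assumed to exist: a per-record fetch and a raw message
   * send/receive over a connection to the server. */
  struct record { long id; long balance; };
  extern struct record fetch_record(int server_fd, long id);
  extern void send_msg(int fd, const void *buf, long len);
  extern void recv_msg(int fd, void *buf, long len);

  /* Chatty: one network round trip per record. */
  long count_overdrawn_chatty(int server_fd, long nrecords)
  {
      long n = 0;
      for (long id = 0; id < nrecords; id++)
          if (fetch_record(server_fd, id).balance < 0)
              n++;
      return n;
  }

  /* Encapsulated: one abstract request, one reply.  The scanning and
   * the heavy I/O happen on the server, right next to the data. */
  struct request { char op[16]; long arg; };
  struct reply   { long result; };

  long count_overdrawn(int server_fd)
  {
      struct request rq = { "COUNT_OVERDRAWN", 0 };
      struct reply   rp;

      send_msg(server_fd, &rq, sizeof rq);
      recv_msg(server_fd, &rp, sizeof rp);
      return rp.result;
  }

Either way, the integrity checks live with the server; the client never
sees raw pages or rows.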

Bill> The whole point of a software engineering environment is to hide as
Bill> many details of the underlying physical reality as was possible to
Bill> hide at the time the environment was developed.  Consider demand
Bill> paged memory, NFS, Internet, pre-emptive process scheduling, standard
Bill> device driver interfaces, etc.  The performance issues are not only
Bill> denigrated, but the whole point is that if the computer does more
Bill> work, the humans won't have to.  The essence of Computer Science is
Bill> illusions for programmers and users.

Well put.  Hmmm.  Perhaps we're in agreement after all.  :-)  I'd add the
caveat, "Adopt no illusion before its time".  (e.g. using Lisp for OLTP:-)

#include <std/disclaimer.h>
--
Chuck Phillips  MS440
NCR Microelectronics 			chuck.phillips%ftcollins.ncr.com
2001 Danfield Ct.
Ft. Collins, CO.  80525   		...uunet!ncrlnk!ncr-mpd!bach!chuckp

pcg@cs.aber.ac.uk (Piercarlo Grandi) (10/31/90)

Mike Bolotski (misha@ai.mit.edu) wrote:

misha> The abstract shared memory model pretends that there is a large
misha> number of communication paths to a single point. This simply
misha> isn't true, and additional hardware is required to provide this
misha> illusion to the programmer. This hardware costs time and dollars.
misha> Parallel programs designed to reflect the underlying physical
misha> reality will operate faster than those that work in a virtual
misha> model. I suppose an IMHO is in order here.

IMNHO too, by Jove! The reader will note that "designed to reflect
the underlying physical reality" does not mean "designed at the level of
the underlying physical reality".

On 7 Oct 90 13:15:37 GMT, zed@mdbs.uucp (Bill Smith) said:

zed> This reply pushed one of my buttons.  I'll try hard to be civil.

I'll try hard too!

zed> At first, I thought these two paragraphs said valid points, but
zed> after rereading them, I realized that they deny the entire history
zed> of operating system research.  The whole point of a software
zed> engineering environment is to hide as many details of the
zed> underlying physical reality as was possible to hide at the time the
zed> environment was developed.

Here we disagree (euphemism). This is IMNHO a facetious view of the
object of software engineering, which is to deliver cost effective
solutions. If not, we would all be programming Turing machines :-).

The whole point of a software engineering environment is to make it
possible to use currently available technology in an abstract way, which
is quite different from making currently available technology irrelevant
in program design (or else we would all be functional programmers).
Indeed a lot of current OS research is still dealing with the basic and
regrettable fact that there are still several orders of magnitude in
speed between central memory and mass storage. Indeed a lot of research
in compilers and languages is based on the regrettable fact that there
are huge differences in bandwidth between registers, cache, and central
memory.

These regrettable facts may be ignored, if you are willing to run your
codes several times slower than codes with hardware dependent design
choices. Some of us, e.g. Herman Rubin, spend a lot of money on CPU time
each year, and even a 10% difference may mean big bucks.

zed> Consider demand paged memory, NFS, Internet, pre-emptive process
zed> scheduling, standard device driver interfaces, etc.  The
zed> performance issues are not only denigrated, but the whole point is
zed> that if the computer does more work, the humans won't have to.  The
zed> essence of Computer Science is illusions for programmers and users.

And here you shoot yourself in the foot. The fact that a lot of people
think like you, mistakenly, has given us wonders like zillions of
programs with poor locality of reference, zillions of poorly tuned NFS
networks, etc...

If you have the illusion of infinite available memory, this does not mean
that you have infinite available memory; you still have to be very
careful about memory access patterns if you want to make use of your
hardware in a cost effective way, or else you can join the GNU project
:-).  If you have the illusion of having local disks, you still must be
careful about not overloading the shared network you are using, if you
want to be cost effective, or else you can join my department's support
group :-).  Compilers generate good assembler code for you, but you
still must write well thought out and efficient programs. Aggressive
optimizers may give you the illusion that you can write bad code and it
will be turned magically into good code, but that means that they are
becoming program generators.
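
A mundane example of the hard thinking the illusion does not excuse
you from, sketched in C (the exact penalty varies by machine): both
functions below sum the same array in the same "flat" address space,
but one walks memory in the order it is actually laid out and the
other strides across it, defeating both the cache and the pager.

  #include <stdio.h>
  #include <stdlib.h>

  #define N 1024

  /* C stores a[i][j] row by row.  Row-major traversal is stride 1;
   * column-major traversal jumps N*sizeof(double) bytes per access. */
  double sum_rows(double (*a)[N])
  {
      double s = 0.0;
      for (int i = 0; i < N; i++)
          for (int j = 0; j < N; j++)
              s += a[i][j];          /* stride 1: good locality */
      return s;
  }

  double sum_cols(double (*a)[N])
  {
      double s = 0.0;
      for (int j = 0; j < N; j++)
          for (int i = 0; i < N; i++)
              s += a[i][j];          /* stride N: poor locality */
      return s;
  }

  int main(void)
  {
      double (*a)[N] = calloc(N, sizeof *a);  /* ~8 MB of "infinite" memory */
      if (!a) return 1;
      printf("%g %g\n", sum_rows(a), sum_cols(a));
      free(a);
      return 0;
  }

Same answer both ways; very different memory behaviour.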

Abstraction is there to make your work easier and more portable, not to
make hard thinking irrelevant. Pragmatics are damn important, not just
semantics ("works well" is more important that "it works"...).

Architecture design is the difficult task of combining pragmatics with
semantics, like the architecture of buildings: making things that are both
elegant and cost effective, usable within the constraints of current
technology, working around or even exploiting those constraints.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk