[comp.arch] Sort Co-Processors

dave@sdeggo.UUCP (David L. Smith) (10/09/87)

Has anyone out there ever run across a sorting co-processor?  I was doodling
around the other day and it seemed to me that a lot of computers spend a
lot of time sorting things out.  An add-on box that you could stuff things
into and pull them back out of in sorted order would be real handy.

Is there any research being done in this, or does it sound like a useless
idea?


-- 
David L. Smith
{sdcsvax!amos,ihnp4!jack!man, hp-sdd!crash, pyramid}!sdeggo!dave
sdeggo!dave@amos.ucsd.edu 
"How can you tell when our network president is lying?  His lips move."

mat@amdahl.amdahl.com (Mike Taylor) (10/09/87)

In article <112@sdeggo.UUCP>, dave@sdeggo.UUCP (David L. Smith) writes:
> Has anyone out there ever run across a sorting co-processor?  I was doodling

I think somebody makes one for Suns.  The original prototype for the Teradata
RDBMS machine was a sorting machine, and a large part of the machine is a
sorting network. I don't think any one has ever got rich from the idea, though.
It's a nice idea because sorts can be readily parallel-ized.

-- 
Mike Taylor                        ...!{ihnp4,hplabs,amd,sun}!amdahl!mat

[ This may not reflect my opinion, let alone anyone else's.  ]

reiter@endor.harvard.edu (Ehud Reiter) (10/09/87)

In article <112@sdeggo.UUCP> dave@sdeggo.UUCP (David L. Smith) writes:
>Has anyone out there ever run across a sorting co-processor?  I was doodling
>around the other day and it seemed to me that a lot of computers spend a
>lot of time sorting things out.  An add-on box that you could stuff things
>into and pull them back out of in sorted order would be real handy.

Lots of people have designed various VLSI sorting chips (this seems to be a
popular master's thesis these days).  I believe that one or two of the
database machine companies may use special sort chips, although I'm not
certain.

However, the main problem with a sort co-processor is that sorting tends to
be an I/O intensive task.  If you're trying to sort 100 MB of data in 1 MB
of RAM, what takes all the time is swapping the data in and out of main memory
(remember, the big IBM mainframes used for commercial tasks tend to have
ridiculously small amounts of memory, like 32MB in a $1 million machine - and
most of the 32MB is probably taken up by MVS ...).  In fact, I believe that
traditionally, database people have completely ignored CPU time and looked only
at I/O time when computing the expected times for sorts, joins, etc.

Some researchers have looked at special disk hardware to speed up database
operations, but I don't believe this has been very successful (I can dig up
some references if people are really interested).

Another problem with a sort co-processor is that it would probably have to
be closely integrated with a lot of memory (a uniprocessor sorter which used
the main memory bus probably would be of only marginal benefit).  This brings
up the standard problem of either having to transfer massive amounts of data
into and out of the sorter's memory, or of having the sorter's memory
somehow be usable as main memory by the host.

So, sorting coprocessors are possible, but the problem goes far beyond
designing clever VLSI comparison circuits, which is what most people seem
to do.

					Ehud Reiter
					reiter@harvard	(ARPA,BITNET,UUCP)
					reiter@harvard.harvard.EDU  (new ARPA)

bcase@apple.UUCP (Brian Case) (10/09/87)

In article <112@sdeggo.UUCP>, dave@sdeggo.UUCP (David L. Smith) writes:
> Has anyone out there ever run across a sorting co-processor?  I was doodling

Advanced Micro Devices makes a chip called the CADM (Content Addressable
Data Manager or something; we used to call it the Sort Buffer internally).
Basically, it is a chip with a CAM and an 8-bit processor; it reportedly
sorts really fast; it only sorts what's on the chip; it can be cascaded
up to 256 times (total of 256 chips in parallel).

philip@amdcad.AMD.COM (Philip Freidin) (10/09/87)

In article <112@sdeggo.UUCP> dave@sdeggo.UUCP (David L. Smith) writes:
>Has anyone out there ever run across a sorting co-processor?  I was doodling
>around the other day and it seemed to me that a lot of computers spend a
>lot of time sorting things out.  An add-on box that you could stuff things
>into and pull them back out of in sorted order would be real handy.
>
>Is there any research being done in this, or does it sound like a useless
>idea?
>
>
>-- 
>David L. Smith
>{sdcsvax!amos,ihnp4!jack!man, hp-sdd!crash, pyramid}!sdeggo!dave
>sdeggo!dave@amos.ucsd.edu 
>"How can you tell when our network president is lying?  His lips move."


AMD builds a standard product called the 95C85. Content addressable
data manager. CADM. it can sort and search, and multiple devices can
be cascaded for larger sort memories. can also be used for tag sorts.

Reasonably fast.


Philip Freidin @ AMD SUNYVALE on {favorite path!amdcad!philip)
Section Manager of Product Planning for Microprogrammable Processors
(you know.... all that 2900 stuff...)
"We Plan Products; not lunches" (a quote from a group that has been standing
				 around for an hour trying to decide where
				 to go for lunch)

littauer@amdahl.amdahl.com (Tom Littauer) (10/09/87)

In article <2967@husc6.UUCP> reiter@harvard.UUCP (Ehud Reiter) writes:
>In article <112@sdeggo.UUCP> dave@sdeggo.UUCP (David L. Smith) writes:
>>Has anyone out there ever run across a sorting co-processor?  ...
>However, the main problem with a sort co-processor is that sorting tends to
>be an I/O intensive task.   ...
>                                               In fact, I believe that
>traditionally, database people have completely ignored CPU time and looked only
>at I/O time when computing the expected times for sorts, joins, etc.
>So, sorting coprocessors are possible, but the problem goes far beyond
>designing clever VLSI comparison circuits, which is what most people seem
>to do.

*** Bias warning and disclaimer ***

I've been disliking IBM for 22 years now, and I work for a company that
competes with them (and nicely, too :-). The following opinions are mine,
not (necessarily) Amdahl's.

*** we now return to our program ***

This didn't prevent IBM from ballyhooing (that IS a proper word, isn't
it :-) the Sort Assist Feature on some of their CPUs. Net result: their
sort (assisted) still didn't beat SyncSort (unassisted). More smoke and
mirrors intended to confuse CPU buyers.
-- 
UUCP:  littauer@amdahl.amdahl.com
  or:  {sun,decwrl,hplabs,pyramid,ihnp4,ames,uunet,cbosgd}!amdahl!littauer
DDD:   (408) 737-5056
USPS:  Amdahl Corp.  M/S 330,  1250 E. Arques Av,  Sunnyvale, CA 94086

I'll tell you when I'm giving you the party line. The rest of the time
it's my very own ravings (accept no substitutes).

brian@neptune.AMD.COM (Brian McMinn) (10/09/87)

In article <112@sdeggo.UUCP> dave@sdeggo.UUCP (David L. Smith) writes:
>Has anyone out there ever run across a sorting co-processor?

This is not an idle idea, and there have been several sort coprocessors
proposed and a few built.  A few years ago, I was involved in the design
of the Am95C85 Content Addressable Data Manager (CADM for short).  The
CADM "is a microprocessor peripheral device capable of both storing
and managing data, thus relieving the host CPU of many time-consuming
data manipulation and management tasks."  (from the Tech Manual)

Each CADM contains 1K byte of data RAM and a micro control engine.
The CPU interface resembles a floating point co-processor: data and
commands are written to the CADM as an I/O device (or a special
address in memory, or ...).  Since 1K is not sufficient for most data
base applications, the micro control engine knows how to extend sort
and find operations across multiple cascaded CADMs.  This provides
the potential for up to 256 CADMs to operate as a single unit (but
capacitance considerations make 16 CADMs a much better choice for
speed).

For general company hype about the Am95C85, call your local field
sales office.  If that doesn't work well enough, call AMD Austin at
(512)-462-5334 and ask for Reed Borie.  He's our marketing wizard in
charge of CADM things.
-- 
	Brian McMinn		1-(512)-462-5389
	Advanced Micro Devices	domainLand:  brian@neptune.AMD.COM
	Austin, Texas		bangLand:   ...!amdcad!neptune!brian

nobody@ism780c.UUCP (Unprivileged user) (10/10/87)

In article <112@sdeggo.UUCP> dave@sdeggo.UUCP (David L. Smith) writes:
<Has anyone out there ever run across a sorting co-processor?  I was doodling
<around the other day and it seemed to me that a lot of computers spend a
<lot of time sorting things out.  An add-on box that you could stuff things
<into and pull them back out of in sorted order would be real handy.
<
<Is there any research being done in this, or does it sound like a useless
<idea?

I was doing research at NCR in 1972.  My project was to study the feasibility
of a compiling engine as a co-processor.  At the same my boss was studing
a sorting co-processor.  Needless to say neither device was ever built.

		Marv Rubinstein -- Interactive Systems

ching@amd.AMD.COM (Mike Ching) (10/10/87)

In article <112@sdeggo.UUCP> dave@sdeggo.UUCP (David L. Smith) writes:
>Has anyone out there ever run across a sorting co-processor?  I was doodling
>around the other day and it seemed to me that a lot of computers spend a
>lot of time sorting things out.  An add-on box that you could stuff things
>into and pull them back out of in sorted order would be real handy.

AMD has a chip called the 95C85 Content Addressable Data Manager
that performs sorts and searches on its contents. It's 1K bytes and
cascadable with programmable sizes for key and pointer fields. The
part is just reaching the sampling stage after a lengthy pregnancy.

mike ching
Disclaimer: I work in an AMD sales office.

ohbuchi@unc.cs.unc.edu (Ryutarou Ohbuchi) (10/11/87)

Many poeple knows about the IBM 3090 VF, a vector processor add-on
to the 370 architecture mainframe. It is a coprocessor. Before 3090VF,
some Japanese mainframe manufacuturer started adding various coprocessors
to their mainframes. In late 1970 (?), Hitachi added 'Integreted vector
processor', a coprocessor to perform vector math, as an extension to
IBM 370 architecture machine. It shares instruction pipeline with 
the main processor (souped-up 370), and add vector operation to it.
(It was a memory-to-memory type architecture, unlike Cray-1 nor 3090VF.)
NEC did the same thing to their machine. Now, last spring, Hitachi came 
up with 'Integrated Database Processor (IDP)'. (Actually, Mitubishi Elec.
have been adding strange coprocessors to their machines for many years,
including 'database acceralator'. But, since I am not knowledgable to 
these, I do not mention more.) IDP is a vector processor for sorting and
searching. It supports operations like merge, search, gather/scatter, 
etc. It is designed to support (mainly) relational database. It seems to
work on attribute-wise, rather than tuple by tuple operation. IDP 
make sorting, searching, etc. faster. It is a commercial product, added 
to their fastest mainframe M680 series. (Rumor is that they Hitachi up 
with another coprocessor, for Prolog execution, for their super-mini.(?) 
It is called something like "IPP (Integrated Prolog Processor)". I am 
not sure if it is out on the market or not.) 


<Standard disclaimor : The opinion stated here (if any) is my own, ...>


Ryutarou Ohbuchi     "Climb now, work later."  "Life's a rock." 
ohbuchi@unc  on csnet

dan@srs.UUCP (Dan Kegel) (10/11/87)

In article <112@sdeggo.UUCP> dave@sdeggo.UUCP (David L. Smith) writes:
> Has anyone out there ever run across a sorting co-processor?  
> Is there any research being done in this, or does it sound like a useless
> idea?

I heard somewhere that, using log N simple sort processors arranged cleverly,
one could perform a sort in N-time.  Has this ever been tried in hardware?
- Dan Kegel   rochester!srs!dan

forrest@blia.BLI.COM (Jon Forrest) (10/12/87)

I thought I'd add a quick note about database machines since
special purpose hardware seems to be the topic of the week.

Database machines are appropriate when the amount of work neccessary
to perform the database task is considered "too high" to be performed
on a general purpose host or when data needs to be shared amoung
heterogenous environments. In the first case, we have seen that
our database machine doesn't do well when it is being used only as
a record server; the I/O time kills us. Most OS's can get the next
record from a sequential file faster that we can get the next tuple
(based on an ordering defined by the user) and send it over one
of the interfaces we support to a host, which then has to treat the
data the same as if it had come straight off a disk. However,
if the users wants data that is the result of a complex join,
perhaps involving a complex qualification, and wants the data sorted
then this is where our (or anyone else's) database machine is useful.
If the database machine could be more closely coupled to a host,
such as through a true co-processor, then the I/O bandwidth from the
database machine to the host would increase, resulting in higher
total "thoughput" being seen by users.

The other advantage to database machines is that data can be shared
by systems that either don't talk to each other cleanly or by systems
that don't have adequate data locking facilities. For example, a
VM/CMS user running an multi-user application might find that using
a database machine to store and maintain their data would be easier
than inventing ad-hoc methods for concurrency control. (I am told
that VM/CMS doesn't provide such a facility. I could be wrong).
In addition, it might be easier for a Unix (or VMS) Vax to share
data with a VM/CMS site by using a database machine than by linking
the two together.

Jon Forrest

ucbvax!mtxinu!blia!forrest
{pyramid|voder}!blia!forrest

rbl@nitrex.UUCP ( Dr. Robin Lake ) (10/14/87)

In article <379@srs.UUCP> dan@srs.UUCP (Dan Kegel) writes:
>In article <112@sdeggo.UUCP> dave@sdeggo.UUCP (David L. Smith) writes:
>> Has anyone out there ever run across a sorting co-processor?  
>> Is there any research being done in this, or does it sound like a useless
>> idea?
>
>I heard somewhere that, using log N simple sort processors arranged cleverly,
>one could perform a sort in N-time.  Has this ever been tried in hardware?
>- Dan Kegel   rochester!srs!dan

Raymond J. Nelson (author of the Automata Theory book and Prof. Emeritus
at CWRU) has a patent on sort hardware that sorts in one bit-time.  Basically
a set of incoming 0/1 bit streams on N input lines are "steered" to N output
lines as they pass thru the device.  Mail to me regarding this will be passed
to Dr. Nelson.

-- 
Rob Lake
{decvax,ihnp4!cbosgd}!mandrill!nitrex!rbl

howard@cpocd2.UUCP (Howard A. Landman) (10/16/87)

In article <112@sdeggo.UUCP> dave@sdeggo.UUCP (David L. Smith) writes:
>Has anyone out there ever run across a sorting co-processor?  I was doodling
>around the other day and it seemed to me that a lot of computers spend a
>lot of time sorting things out.

The rule of thumb at Amdahl in about 1977 or so was that commercial IBM
computing installations spend 1/2 their CPU time sorting.  Yes, that was
50%.  This created a great opportunity for products like SyncSort, which
ran somewhat faster than the standard IBM sort.  A 10% speedup in sorting
was equivalent to speeding your whole machine up by about 5%, which, on
a $4M+ machine, was worth hundreds of thousands of dollars.

-- 
	Howard A. Landman
	{oliveb,hplabs}!intelca!mipos3!cpocd2!howard	<- works
	howard%cpocd2%sc.intel.com@RELAY.CS.NET		<- recently flaky
	"Unpick a ninny - recall Mecham"

dwc@homxc.UUCP (D.CHEN) (10/16/87)

In article <379@srs.UUCP>, dan@srs.UUCP (Dan Kegel) writes:
> In article <112@sdeggo.UUCP> dave@sdeggo.UUCP (David L. Smith) writes:
> > Has anyone out there ever run across a sorting co-processor?  
> > Is there any research being done in this, or does it sound like a useless
> > idea?
> 
> I heard somewhere that, using log N simple sort processors arranged cleverly,
> one could perform a sort in N-time.  Has this ever been tried in hardware?

there are O(N) sort algorithms for "pyramid" architectures.  i forget
the references.

danny chen
homxc!dwc