[comp.arch] Unix machines for large databases

billo@cmx.npac.syr.edu (Bill O) (05/03/88)

Help! By Friday we need to know if there is a Unix-based box that
can work as a very high-performance data-base server.  Yes, there are a
million Unix boxes out there, but a data-base server has to be able to
cope with concurrent access by multiple (possibly several hundred)
users.  Simple file locking isn't good enough -- users should never
have to read "file locked" error messages.  Also, the disk performance
should be very good.

What is the nature of the data base?  Would you believe we're not
sure?  The amount of data will probably be very large, and it may or
may not be based on the relational model.  Why am I asking such a
vague question?  Because we are trying to develop corporate funding
for a project that is still in the jello stage of conception (you can
see it, and it glistens, but you still can't get a good grip on it).
What we want to do is to be able to talk about existing Unix-based
solutions to very large data sharing problems.

Names we know about:
 Gould -- fast disks, but is there data-sharing software?
 Most other Big Unix Boxes in the world -- ditto above comment.
 Tandem -- is this Unix based?
 Stratus -- is this Unix based,  is this a good database machine?
 
I'm nervous about posting this, because I expect every Unix box maker
to tell me about their great machines.  That's fine, but are there
proven very-large data-sharing/data-base applications on those
machines?  Are the disks very high performance?  SCSI ports probably
won't hack it.   We will consider both uniprocessor and multiprocessor
solutions.  Non-Unix solutions, while interesting, are not the topic
of this posting.  Please mail to me, don't post.  If others express an
interest, and if the response is informative, I'll post a summary.

Bill O'Farrell, Northeast Parallel Architectures Center at Syracuse University
(billo@cmx.npac.syr.edu)

phil@osiris.UUCP (Philip Kos) (05/05/88)

Bill -

Please get in touch with me if you can.  The return mail path I got
out of news was completely worthless, as usual, and I don't want to
clutter the net with this stuff.  You should be able to get in touch
with me through the uucp paths in my signature - they're the only way,
as far as I know, to get here from anywhere else.

BTW, in case you couldn't guess, we're doing gigabyte-database OLTP
applications on UNIX boxes here at Hopkins...


                                                                 Phil Kos
...!decvax!decuac!\                                   Information Systems
  ...!uunet!mimsy!aplcen!osiris!phil           The Johns Hopkins Hospital
...!allegra!/                                               Baltimore, MD

markd@rtech.UUCP (Mark P. Diamond) (05/05/88)

From article <428@cmx.npac.syr.edu>, by billo@cmx.npac.syr.edu (Bill O):
> Help! By Friday we need to know if there is a Unix-based box that
> can work as a very high-performance data-base server.  

Take a look at the Sequent Symmetry.  This tightly coupled
UNIX multiple processor is an optimum machine for running
Relational Databases.  In a recent project with Relational
Technology a sixteen processor Symmetry achieved 104 Transactions
per second running the Debit Credit Benchmark (TP1) on
a fully sized 1.1G Byte database, at a price performance ratio
of about 1/8 that of Tandem  (all of this was verified by
the independent Codd & Date consulting group).  This is the  
fastest (by a factor of about three) any UNIX box has achieved
for this benchmark.  RTI, Oracle, Informix and Unify run Sequent
for their in-house applications.

Mark <>

PS  If anyone would like a full write up of this project send me
your postal address.
<>                                  <>                                  <> 
Mark P. Diamond  		 {sun, cbosgd, amdahl, mtxinu}!rtech!markd
from Sequent Computer Systems onsite at Relational Technology 

aglew@urbsdc.Urbana.Gould.COM (05/05/88)

>/* ---------- "Unix machines for large databases" ---------- */
>...
>Names we know about:
> Gould -- fast disks, but is there data-sharing software?
> Most other Big Unix Boxes in the world -- ditto above comment.
> Tandem -- is this Unix based?
> Stratus -- is this Unix based,  is this a good database machine?
> 
>I'm nervous about posting this, because I expect every Unix box maker
>to tell me about their great machines.  

Sorry, couldn't resist, but I'll be brief.

It's nice to see us (Gould) getting the respect we deserve,
at the top of a list :-). I've heard of applications like this,
and will try to get people who know the details to contact you.

Thanks!

Andy "Krazy" Glew. Gould CSD-Urbana.    1101 E. University, Urbana, IL 61801   
    aglew@gould.com     	- preferred, if you have MX records
    aglew@xenurus.gould.com     - if you don't
    ...!ihnp4!uiucuxc!ccvaxa!aglew  - paths may still be the only way
   
My opinions are my own, and are not the opinions of my employer, or any
other organisation. I indicate my company only so that the reader may
account for any possible bias I may have towards our products.

steve@edm.UUCP (Stephen Samuel) (05/05/88)

From article <428@cmx.npac.syr.edu>, by billo@cmx.npac.syr.edu (Bill O):
> Help! By Friday we need to know if there is a Unix-based box that
> can work as a very high-performance data-base server.  Yes, there are a

I think (from the propaganda I've heard) that something like oracle might sorta
fit your bill. One of the ways that they do this is by use of raw disk I/O
rather than putting the data base into the filesystem space.
  I assume that SMD drives are fast enough for you?

-------------
 Stephen Samuel 
  {ihnp4,ubc-vision,vax135}!alberta!edm!steve
  or userzxcv@uqv-mts.bitnet
-- 
-------------
 Stephen Samuel 			Disclaimer: You betcha!
  {ihnp4,ubc-vision,seismo!mnetor,vax135}!alberta!edm!steve
  BITNET: USERZXCV@UQV-MTS

UH2@PSUVM.BITNET (Lee Sailer) (05/06/88)

In article <428@cmx.npac.syr.edu>, billo@cmx.npac.syr.edu (Bill O) says:
>
>Help! By Friday we need to know if there is a Unix-based box that
>can work as a very high-performance data-base server.  Yes, there are a

Sure there is.  You can run Unix on a Cray.

eric@pyrps5 (Eric Bergan) (05/06/88)

In article <2050@rtech.UUCP> markd@rtech.UUCP (Mark P. Diamond) writes:
>From article <428@cmx.npac.syr.edu>, by billo@cmx.npac.syr.edu (Bill O):
>> Help! By Friday we need to know if there is a Unix-based box that
>> can work as a very high-performance data-base server.  
>
>Take a look at the Sequent Symmetry.  This tightly coupled
>UNIX multiple processor is an optimum machine for running
>Relational Databases.

	Rather a sweeping statement...

>In a recent project with Relational
>Technology a sixteen processor Symmetry achieved 104 Transactions
>per second running the Debit Credit Benchmark (TP1) on
>a fully sized 1.1G Byte database, at a price performance ratio
>of about 1/8 that of Tandem  (all of this was verified by
>the independent Codd & Date consulting group).  This is the  
>fastest (by a factor of about three) any UNIX box has achieved
>for this benchmark.

	Just to make sure we are comparing apples and apples here, I'm
a little surprised by the 1.1 Gbyte figure. How many tuples were you running
with in account? The "standard" 1,000,000 (with 1000 teller and 100 branch
tuples) or did you scale it up? If scaled up, it is probably not appropriate
to compare against any other tests that have been run, since the decreased
contention on the branch relation will improve performance. Did you run
with just one history relation, or did you split that up, and if so,
into how many pieces? I assume that this was with journaling turned on?
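
	The contention point can be made concrete with a rough sketch. It
assumes, purely for illustration, that each in-flight transaction picks one
branch tuple uniformly at random and holds it until commit; the function
name and the uniform-choice model are assumptions, not part of any
benchmark definition. Under that model, the chance that at least two
in-flight transactions want the same branch tuple drops as the branch
relation grows:

```python
def collision_probability(branches, in_flight):
    """Chance that at least two of `in_flight` transactions pick the same
    branch tuple, assuming each picks uniformly at random (an assumption)."""
    if in_flight > branches:
        return 1.0  # pigeonhole: some branch must be shared
    p_all_distinct = 1.0
    for i in range(in_flight):
        # The (i+1)-th transaction must avoid the i branches already taken.
        p_all_distinct *= (branches - i) / branches
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    # Same number of in-flight transactions, branch relation scaled 10x.
    for branches in (100, 1000):
        p = collision_probability(branches, in_flight=10)
        print(branches, "branches:", round(p, 3))
```

	Real lock granularity (tuple versus page) and non-uniform access
would change the numbers, so this only shows the direction of the effect.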

>RTI, Oracle, Informix and Unify run Sequent
>for their in-house applications.

	In general, claiming that a database vendor is using one particular
platform or another for in-house applications is similar to claiming
that Bell Labs has bought your computer - it's not a very exclusive
club. Oracle has purchased several Pyramid 9840s for their world-wide
sales applications. Several of the database companies use Pyramids
for their file servers.

	Obviously I am biased, but claiming that any computer is "optimum"
for so broad a range of possible uses as relational database applications
seems a little questionable.

pavlov@hscfvax.harvard.edu (G.Pavlov) (05/06/88)

In article <2050@rtech.UUCP>, markd@rtech.UUCP (Mark P. Diamond) writes:
> ...................... In a recent project with Relational
> Technology a sixteen processor Symmetry achieved 104 Transactions
> per second running the Debit Credit Benchmark (TP1) on
> a fully sized 1.1G Byte database, at a price performance ratio
> of about 1/8 that of Tandem  (all of this was verified by
> the independent Codd & Date consulting group). ................
> 
> PS  If anyone would like a full write up of this project send me
> your postal address.

  I do not doubt the benchmark and I would appreciate a copy of the write-up.
  But there has been a working relationship of one sort or another between
  the C&D group and RTI for a long time.  So "independent" is overstating 
 things a bit......
 

pavlov@hscfvax.harvard.edu (G.Pavlov) (05/06/88)

In article <3091@edm.UUCP>, steve@edm.UUCP (Stephen Samuel) writes:
> From article <428@cmx.npac.syr.edu>, by billo@cmx.npac.syr.edu (Bill O):
> > Help! By Friday we need to know if there is a Unix-based box that
> > can work as a very high-performance data-base server.  Yes, there are a
> 
> I think (from the propaganda I've heard) that something like oracle might sort
> a fit your bill. One of the ways that they do this is by use of raw disk I/O
> rather than putting the data base into the filesystem space.
> 
  But using raw disk i/o per se doesn't guarantee anything, does it ?  I think
  that the most relevant part of your message was the phrase in the parens.

    greg pavlov, fstrf, amherst, ny 

esf00@amdahl.uts.amdahl.com (Elliott S. Frank) (05/07/88)

In article <41647UH2@PSUVM> UH2@PSUVM.BITNET (Lee Sailer) writes:
>In article <428@cmx.npac.syr.edu>, billo@cmx.npac.syr.edu (Bill O) says:
>>
>>Help! By Friday we need to know if there is a Unix-based box that
>>can work as a very high-performance data-base server.  Yes, there are a
>
>Sure there is.  You can run Unix on a Cray.

Or on an Amdahl 5890/5990 running UTS.  You may have a floor space problem
past several Tb.
-- 
Elliott Frank      ...!{hplabs,ames,sun}!amdahl!esf00     (408) 746-6384
               or ....!{bnrmtv,drivax,hoptoad}!amdahl!esf00

[the above opinions are strictly mine, if anyone's.]
[the above signature may or may not be repeated, depending upon some
inscrutable property of the mailer-of-the-week.]

davek@rtech.UUCP (Dave Kellogg) (05/07/88)

I can understand Eric's skepticism because if someone told me 6 months
ago that INGRES would exceed 100 TPS I might have asked them if they 
bumped their head on the way to the office.

However, I know what Mark Diamond says is true because I was in the room
with him, along with Tom Sawyer from Codd & Date consulting, when INGRES 
hit 104 TPS.

To appease any cynics I'll list the one caveat of the benchmark first:

	* The INGRES system (running on a Sequent Symmetry machine) which
	  hit 104 TPS was running a prototype version of RTI's next release.
	  As part of normal prototyping activity we asked ourselves "Just
	  how fast can this go?"  We convinced Sequent to let RTI use a
	  large Symmetry machine, and we were off...


Eric was surprised about the 1+ Gigabyte database size.  In fact, the 
benchmark was run with a DebitCredit defined 100 TPS sized database.  Before
continuing, a little background on the DebitCredit benchmark is in order.

DebitCredit is a well-defined standard benchmark and was written in the 
late 1970's  by Jim Gray and about 20 other database professionals.  The 
paper was eventually published in DATAMATION under the title "A Measure of 
Transaction Processing" by the authors "Anon et al."  Rumour has it the 
authors wished to remain secret due to flame-ups that occurred after Dave 
DeWitt and Dina Bitton wrote their paper on DBMS benchmarking.

DebitCredit is one of three benchmarks described in the paper, and various
degenerate forms of DebitCredit  have become loosely known in the industry
as "TP1."  The problem with TP1, and the ensuing "TPS" (transactions/second)
measurements, is that most vendors size the database irregularly (i.e. smaller
than DebitCredit defines).  Thus, as Eric points out, when comparing TPS 
measurements one is often comparing apples and oranges.

For the Silver Bullet benchmarks, to which Mark refers, the database was
sized at 100 TPS, or 10 Million 100 byte account records, 10,000 100 byte
teller records, and 1,000 100 byte branch records.  Thus, a real purist
would rob RTI of the 104 TPS (and grant only 100 TPS) because the database 
was sized for 100 TPS.  (If you do the multiplication you'll see that
the account relation alone is 1 gigabyte of data.)
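
The multiplication can be sketched as follows. The per-TPS ratios are
derived from the 100 TPS figures quoted above, taking the 1,000-record
relation to be the branch relation (an assumption, but one consistent with
the 1,000,000 / 1000 / 100 account, teller and branch counts mentioned
earlier in the thread). The function and constant names are made up for
this illustration, and the history relation is left out.

```python
# Records per 1 TPS of rated throughput, derived from the thread's
# 100 TPS figures: 10,000,000 accounts, 10,000 tellers, 1,000 branches.
ACCOUNTS_PER_TPS = 100_000
TELLERS_PER_TPS = 100
BRANCHES_PER_TPS = 10
RECORD_BYTES = 100  # every record is 100 bytes in the figures above


def debitcredit_size(tps):
    """Return (record counts, total bytes) for a database sized for `tps`.

    Covers only the account, teller and branch relations.
    """
    counts = {
        "account": ACCOUNTS_PER_TPS * tps,
        "teller": TELLERS_PER_TPS * tps,
        "branch": BRANCHES_PER_TPS * tps,
    }
    total_bytes = sum(counts.values()) * RECORD_BYTES
    return counts, total_bytes


if __name__ == "__main__":
    counts, total_bytes = debitcredit_size(100)
    print("account relation bytes:", counts["account"] * RECORD_BYTES)
    print("all three relations, bytes:", total_bytes)
```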

Overall, the benchmark conformed to DebitCredit standards quite well, 
including the submission of transactions via a network.  However, there
were a few things we didn't do 100% to the DebitCredit spec.  But then
again, we did a couple to exceed the spec.  In any case, the auditor's 
report is being published tomorrow so all DebitCredit whizzes can take
a look.

In conclusion, I saw one "pop-off" on the net (flame semi-on) which questioned
the integrity of the auditor since "Codd and Date and RTI have always had 
a good working relationship..." I'll reply to that with

	* If we wanted to pay someone to lie we wouldn't have paid
	  Codd and Date's rates!  ;-)

	* Mr. Sawyer was the auditor of Tandem's 208 TPS benchmark.

	* I personally hope that he is not on the net to see this random
	  assault on his character.

	* If you read his report you'll see that he is certainly impartial.



Finally, if you're interested in seeing the benchmark report you can reply
to this message with a postal address and I'll do my best to get you a copy.


Dave Kellogg
ucbvax!rtech!davek (might need a mtxinu before the rtech)


"Hmmm.  We hit 100 TPS, can I go to bed now??"

chuck@amdahl.uts.amdahl.com (Charles Simmons) (05/07/88)

In article <41647UH2@PSUVM> UH2@PSUVM.BITNET (Lee Sailer) writes:
>In article <428@cmx.npac.syr.edu>, billo@cmx.npac.syr.edu (Bill O) says:
>>
>>Help! By Friday we need to know if there is a Unix-based box that
>>can work as a very high-performance data-base server.  Yes, there are a
>
>Sure there is.  You can run Unix on a Cray.

Unless you have a real big need for the vector processor of the Cray,
an Amdahl machine may well provide better performance at a lower cost.

-- Cs

pardo@june.cs.washington.edu (David Keppel) (05/08/88)

esf00@amdahl.uts.amdahl.com (Elliott S. Frank) writes:
>UH2@PSUVM.BITNET (Lee Sailer) writes:
>>billo@cmx.npac.syr.edu (Bill O) says:
>>>Help! By Friday we need to know if there is a Unix-based box that
>>>can work as a very high-performance data-base server.  Yes, there are

>>Sure there is.  You can run Unix on a Cray.

>Or on an Amdahl 5890/5990 running UTS.  You may have a floor space
>problem past several Tb.

Check out optical disk drives.  I believe DEC is now selling them
for the VAX line; I'd imagine that most other vendors have similar
products in mind.  They can solve your floor space problems well
beyond "several Tb", and, being write-once-read-many (WORM) are well-
suited to an application requiring a permanent history.  Typically
they are large enough so that you don't fill them very fast even if
you don't care about a permanent record.

    ;-D on  ( Bliss is Bliss, Ignorance is Ignorance, I'm happy )  Pardo

news@edm.UUCP (news software) (05/09/88)

From article <564@hscfvax.harvard.edu>, by pavlov@hscfvax.harvard.edu (G.Pavlov):
# In article <3091@edm.UUCP>, steve@edm.UUCP (Stephen Samuel) writes:
#> From article <428@cmx.npac.syr.edu>, by billo@cmx.npac.syr.edu (Bill O):
#> I think (from the propaganda I've heard) that something like oracle
#> might sort a fit your bill. One of the ways that they do this is by use
#> of raw disk I/O rather than putting the data base into the filesystem space.
#> 
#   But using raw disk i/o per se doesn't guarantee anything, does it ?  I think
It tends to promise that address locality implies spatial locality. This is
a nice assumption to be able to make when you want to improve your speed.
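
A minimal sketch of the idea, assuming a storage layer that keeps
fixed-size blocks at offset = block number * block size. An ordinary
temporary file stands in for the raw partition here, so on a real
filesystem the blocks may still be scattered on disk; on a raw device the
same offsets would map directly to neighbouring disk addresses. The block
size and all function names are hypothetical, not taken from any DBMS.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


def write_block(fd, block_no, payload):
    """Store `payload` in block `block_no`, zero-padded to a full block."""
    data = payload.ljust(BLOCK_SIZE, b"\0")
    if len(data) != BLOCK_SIZE:
        raise ValueError("payload larger than one block")
    os.pwrite(fd, data, block_no * BLOCK_SIZE)


def read_blocks(fd, first_block, count):
    """Fetch `count` logically adjacent blocks with one positioned read."""
    return os.pread(fd, count * BLOCK_SIZE, first_block * BLOCK_SIZE)


def demo():
    # A temporary file plays the role of the raw partition.
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        for n in range(4):
            write_block(fd, n, b"block %d" % n)
        chunk = read_blocks(fd, 1, 2)  # blocks 1 and 2 in a single request
        return [
            chunk[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE].rstrip(b"\0")
            for i in range(2)
        ]


if __name__ == "__main__":
    print(demo())
```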
-- 
-------------
 Stephen Samuel 
  {ihnp4,ubc-vision,vax135}!alberta!edm!steve
  or userzxcv@uqv-mts.bitnet

milbery@rtech.UUCP (Jim Milbery) (05/09/88)

In article <2050@rtech.UUCP> markd@rtech.UUCP (Mark P. Diamond) writes:
>From article <428@cmx.npac.syr.edu>, by billo@cmx.npac.syr.edu (Bill O):
>> Help! By Friday we need to know if there is a Unix-based box that
>> can work as a very high-performance data-base server.  
>
>Take a look at the Sequent Symmetry.  This tightly coupled
>UNIX multiple processor is an optimum machine for running
>Relational Databases.

Relational Technology has been working with several unix-based 
multiprocessing machines with the INGRES product.

Significant performance gains can be had using cpu/io improvements
that these vendors offer.

Pyramid is also offering significant features in a balanced fashion
(cpu, disk i/o and terminal i/o) that INGRES can take advantage of.

No selling here, but INGRES is hot, and can take advantage of the
multiprocessing capabilities that Pyramid and Sequent and others offer.

** Opinions are my own and not necessarily RTI's.

jim Milbery, RTI Technical Support Burlington,MA

fox@alice.marlow.reuters.co.uk (Paul Fox) (05/11/88)

In article <428@cmx.npac.syr.edu> billo@cmx.npac.syr.edu (Bill O'Farrell) writes:
>Help! By Friday we need to know if there is a Unix-based box that
>can work as a very high-performance data-base server.  Yes, there are a
>million Unix boxes out there, but a data-base server has to be able to
>cope with concurrent access by multiple (possibly several hundred)
>users.  Simple file locking isn't good enough -- users should never
>have to read "file locked" error messages.  Also, the disk performance
>should be very good.
>
I have no answers for you but I do have some questions ...

Is your database mainly for reads, or for reads and writes? If it's mainly for
reading information, then having a large disk cache (e.g. 8MB) will far
outweigh the speed of the disk.
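
As a rough illustration of why the cache can matter more than the disk,
here is the usual weighted-average estimate of access time. The function
name and the millisecond figures in the usage lines are assumptions made
up for the example, not measurements of any machine.

```python
def effective_access_ms(hit_ratio, cache_ms, disk_ms):
    """Average access time when a fraction `hit_ratio` of reads hit the cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * disk_ms


if __name__ == "__main__":
    # Hypothetical numbers: a slower disk behind a cache with a 90% hit rate
    # versus a disk twice as fast with no useful cache at all.
    print("slow disk + cache:", effective_access_ms(0.9, 0.1, 30.0))
    print("fast disk, no cache:", effective_access_ms(0.0, 0.1, 15.0))
```

The estimate only helps a read-mostly workload; writes that must reach
the disk before commit are not covered by it.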

If you really are going to try to support hundreds of users, then one
major problem will be finding a network interface that can reliably
support this many virtual circuits. One of the biggest problems with
all machines seems to be the limit on the number of concurrent sessions.
Not only is a large amount of memory (ie several K) needed per
session, but also one has to consider things like the size of the
ARP tables.

=====================
     //        o      All opinions are my own.
   (O)        ( )     The powers that be ...
  /    \_____( )
 o  \         |
    /\____\__/      
  _/_/   _/_/         UUCP:     fox@alice.marlow.reuters.co.uk

boughter@ghostwheel.UUCP (Ellen Boughter) (05/11/88)

1.1 GB *is* the standard size of the database for Debit_Credit,
according to the original specification. That's without mirroring,
which is optional. If the HISTORY relation is not divided (which it
probably isn't, on the Sequent), I wonder if the tail is being locked?
This has been an issue for us; if the tail is not locked, one runs
the risk of losing history data. If it is locked, there is a big hot
spot.
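
One way to picture the trade-off is the sketch below, which assumes the
history relation is split into several partitions, each with its own tail
lock, and that a transaction chooses its partition from the teller id.
Both the class and the partitioning rule are hypothetical; nothing here
describes how any particular DBMS does it. With one partition every append
queues on the same lock (the hot spot); with several, appends are spread
out and still none are lost.

```python
import threading


class PartitionedHistory:
    """History relation split into partitions, each with its own tail lock."""

    def __init__(self, partitions):
        self._locks = [threading.Lock() for _ in range(partitions)]
        self._tails = [[] for _ in range(partitions)]

    def append(self, teller_id, record):
        # Hypothetical rule: the teller id selects the partition.
        p = teller_id % len(self._tails)
        # The lock stands in for a DBMS tail lock. (In CPython a bare
        # list.append is already atomic; the lock is here to show the idea.)
        with self._locks[p]:
            self._tails[p].append(record)

    def count(self):
        return sum(len(tail) for tail in self._tails)


def run(history, workers=8, per_worker=1000):
    """Append from several threads at once and return the records stored."""
    def work(teller_id):
        for n in range(per_worker):
            history.append(teller_id, (teller_id, n))

    threads = [threading.Thread(target=work, args=(t,)) for t in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return history.count()


if __name__ == "__main__":
    print("one tail:", run(PartitionedHistory(1)))
    print("four tails:", run(PartitionedHistory(4)))
```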

peterg@murphy (Peter Gutmann) (05/11/88)

In article <428@cmx.npac.syr.edu>, billo@cmx.npac.syr.edu (Bill O) writes:
> deleted stuff about "need this yesterday" and record locking.....
> 
> What is the nature of the data base?  Would you believe we're not
> sure?  The amount of data will probably be very large, and it may or
> may not be based on the relational model.  Why am I asking such a
> vague question?  Because we are trying to develop corporate funding
> for a project that is still in the jello stage of conception (you can
> see it, and it glistens, but you still can't get a good grip on it).
> What we want to do is to be able to talk about existing Unix-based
> solutions to very large data sharing problems.

	Without knowing the "nature" of the database there are some 
	questions that must be answered about the logical location
	of the data (two phase commit), how much transaction logging 
	is required, distribution of the users, etc. 
> 
> Names we know about:
>  Gould -- fast disks, but is there data-sharing software?
>  Most other Big Unix Boxes in the world -- ditto above comment.
>  Tandem -- is this Unix based?

	Tandem has a UNIX box. It is based upon the Altos 3068 System V
	using the MC68020 as the processor. The rest of their products
	do not use UNIX.

>  Stratus -- is this Unix based,  is this a good database machine?

	Not a UNIX box; there is a UNIX kernel that runs under their
	VOS operating system. Performance is a question using this
	method.
>  
> Deleted fears about being swamped with product stuff....
> 
> Bill O'Farrell, Northeast Parallel Architectures Center at Syracuse University
> (billo@cmx.npac.syr.edu)

Based upon the outline provided, there appear to be two approaches.

Hardware - Use a database machine (like britton lee, etc). These
	usually can be accessed thru a network of some sort. The
	machines containing the users would reach the machine thru
	a network.

Software - The software approach is where this gets interesting. At this
	point the outline of how your users are connected to the
	data becomes important. If the database can sit on a
	database machine and the users exist on different
	machines, you should look into requester/server designs.
	We are currently doing some development in the SYBASE 
	environment and have found the server to be very well
	hardened. 
	
	The dataserver from SYBASE manages the locks, and the 
	transaction logging for you.

	I know of several organizations that are developing
	applications that need high transaction rates and guaranteed
	recovery (all of the major financial exchanges). All of 
	these people are using SYBASE as a database engine. 

These thoughts, ideas, and misstakes are mine and mine alone.

Peter Gutmann		UUCP:	cmcl2!manhat!mancol!murphy!peterg
(212) 227-7706			philabs!pencom!murphy!peterg

pavlov@hscfvax.harvard.edu (G.Pavlov) (05/11/88)

In article <3102@edm.UUCP>, news@edm.UUCP (news software) writes:
> #> I think (from the propaganda I've heard) that something like oracle
> #> might sort a fit your bill. One of the ways that they do this is by use
> #> of raw disk I/O rather than putting the data base into the filesystem space.
> #> 
> #   But using raw disk i/o per se doesn't guarantee anything, does it ?  I think
> It tends to promise that address locality implies spacial locality. This is
> a nice assumption to be able to make when you want to improve your speed.
> -- 

  I did not mean that raw disk i/o can't be put to good use.  But particular 
  techniques and technologies are just that; they may make great copy in an
  advertising campaign, but they are no guarantee of a superior (or even good)
  product.

phil@osiris.UUCP (05/12/88)

Well, I promised this to a couple of people, so here it is.  Excuse the
cross-posting to comp.arch, but since I haven't heard anything from anyone
about "getting it the hell out of here", I'll assume that it's OK to leave
it for now.  Let me know (nicely, please) if this changes.

I posted a followup last week to an article from Bill O'Farrell of Syracuse
University, who was requesting info on maintaining large databases on UNIX
systems.  I mentioned that I was willing to discuss our experiences
privately with Bill.  Since then, several other people have sent me mail
asking that I either post or email more info on our applications, and I
noticed that there has been some interest in this topic from different
people, so here's a quick summary of what we're doing.  This report was
written with significant help from and ultimate approval of Dr. Steve
Kahane, who is currently our application development director.


We support a very heterogeneous production environment at JHH.  In 1983 we
chose to network existing sub-systems using Ethernet and chose Sun's RPC
mechanism (using both TCP and XNS) for interprocess communication support.
Significant integration of an IBM MVS/CICS sub-system, several different
types of MUMPS sub-systems, and our UNIX sub-systems has been achieved.

Most of our new development has been in UNIX using C.  More specifically:

	UNIX production hardware: Pyramid 98xe and 9820 computers with lots
		of disk and main memory

	OS: Pyramid OSx (dual port of 4.2BSD and SysV.2)

	DBMS: Relational Technology INGRES (5.0/04a)

	Databases: Several, ranging from several MB to about 1.5 GB, with
		tables as large as 500 MB

	Avg. # of concurrent users: ~35 from UNIX machines (increasing
		rapidly), another 5-6 (and probably more) from non-UNIX
		clients

	UNIX production applications: Patient ID, patient demographics and
		history, emergency room, radiology scheduling, report
		transcription and lookup, radiology film tracking,
		outpatient clinic management and appointment scheduling,
		automated history report generation, and others

	IBM production: Pre-admission, Admission, Transfer & Discharge,
		Pharmacy, several financial sub-systems, and others

	MUMPS production: Laboratory Systems, Surgical Pathology, Oncology
		(OCIS), Library System, and several medical literature
		databases

(The above lists only those production systems that are integrated with our
databases over the network.)

If anyone has any specific questions, feel free to ask and I'll get more
precise info out ASAP.

                                                                 Phil Kos
                                                      Information Systems
...!uunet!pyrdc!osiris!phil                    The Johns Hopkins Hospital
                                                            Baltimore, MD