billo@cmx.npac.syr.edu (Bill O) (05/03/88)
Help! By Friday we need to know if there is a Unix-based box that can work as a very high-performance data-base server. Yes, there are a million Unix boxes out there, but a data-base server has to be able to cope with concurrent access by multiple (possibly several hundred) users. Simple file locking isn't good enough -- users should never have to read "file locked" error messages. Also, the disk performance should be very good. What is the nature of the data base? Would you believe we're not sure? The amount of data will probably be very large, and it may or may not be based on the relational model. Why am I asking such a vague question? Because we are trying to develop corporate funding for a project that is still in the jello stage of conception (you can see it, and it glistens, but you still can't get a good grip on it). What we want to do is to be able to talk about existing Unix-based solutions to very large data sharing problems. Names we know about: Gould -- fast disks, but is there data-sharing software? Most other Big Unix Boxes in the world -- ditto above comment. Tandem -- is this Unix based? Stratus -- is this Unix based, is this a good database machine? I'm nervous about posting this, because I expect every Unix box maker to tell me about their great machines. That's fine, but are there proven very-large data-sharing/data-base applications on those machines? Are the disks very high performance? SCSI ports probably won't hack it. We will consider both uniprocessor and multiprocessor solutions. Non-Unix solutions, while interesting, are not the topic of this posting. Please mail to me, don't post. If others express an interest, and if the response is informative, I'll post a summary. Bill O'Farrell, Northeast Parallel Architectures Center at Syracuse University (billo@cmx.npac.syr.edu)
phil@osiris.UUCP (Philip Kos) (05/05/88)
Bill - Please get in touch with me if you can. The return mail path I got out of news was completely worthless, as usual, and I don't want to clutter the net with this stuff. You should be able to get in touch with me through the uucp paths in my signature - they're the only way, as far as I know, to get here from anywhere else. BTW, in case you couldn't guess, we're doing gigabyte-database OLTP applications on UNIX boxes here at Hopkins... Phil Kos ...!decvax!decuac!\ Information Systems ...!uunet!mimsy!aplcen!osiris!phil The Johns Hopkins Hospital ...!allegra!/ Baltimore, MD
markd@rtech.UUCP (Mark P. Diamond) (05/05/88)
From article <428@cmx.npac.syr.edu>, by billo@cmx.npac.syr.edu (Bill O):
> Help! By Friday we need to know if there is a Unix-based box that
> can work as a very high-performance data-base server.

Take a look at the Sequent Symmetry. This tightly coupled UNIX multiple processor is an optimum machine for running Relational Databases. In a recent project with Relational Technology a sixteen processor Symmetry achieved 104 Transactions per second running the Debit Credit Benchmark (TP1) on a fully sized 1.1G Byte database, at a price performance ratio at about 1/8 that of Tandem (all of this was verified by the independent Codd & Date consulting group). This is the fastest (by a factor of about three) any UNIX box has achieved for this benchmark.

RTI, Oracle, Informix and Unify run Sequent for their in-house applications.

Mark

<> PS If anyone would like a full write up of this project send me
your postal address.
<>
<>
<> Mark P. Diamond {sun, cbosgd, amdahl, mtxinu}!rtech!markd
from Sequent Computer Systems onsite at Relational Technology
aglew@urbsdc.Urbana.Gould.COM (05/05/88)
>/* ---------- "Unix machines for large databases" ---------- */ >... >Names we know about: > Gould -- fast disks, but is there data-sharing software? > Most other Big Unix Boxes in the world -- ditto above comment. > Tandem -- is this Unix based? > Stratus -- is this Unix based, is this a good database machine? > >I'm nervous about posting this, because I expect every Unix box maker >to tell me about their great machines. Sorry, couldn't resist, but I'll be brief. It's nice to see us (Gould) getting the respect we deserve, at the top of a list :-). I've heard of applications like this, and will try to get people who know the details to contact you. Thanks! Andy "Krazy" Glew. Gould CSD-Urbana. 1101 E. University, Urbana, IL 61801 aglew@gould.com - preferred, if you have MX records aglew@xenurus.gould.com - if you don't ...!ihnp4!uiucuxc!ccvaxa!aglew - paths may still be the only way My opinions are my own, and are not the opinions of my employer, or any other organisation. I indicate my company only so that the reader may account for any possible bias I may have towards our products.
steve@edm.UUCP (Stephen Samuel) (05/05/88)
From article <428@cmx.npac.syr.edu>, by billo@cmx.npac.syr.edu (Bill O):
> Help! By Friday we need to know if there is a Unix-based box that
> can work as a very high-performance data-base server. Yes, there are a

I think (from the propaganda I've heard) that something like oracle might sorta fit your bill. One of the ways that they do this is by use of raw disk I/O rather than putting the data base into the filesystem space.

I assume that SMD drives are fast enough for you?

-------------
Stephen Samuel
{ihnp4,ubc-vision,vax135}!alberta!edm!steve
or userzxcv@uqv-mts.bitnet
--
-------------
Stephen Samuel Disclaimer: You betcha!
{ihnp4,ubc-vision,seismo!mnetor,vax135}!alberta!edm!steve
BITNET: USERZXCV@UQV-MTS
UH2@PSUVM.BITNET (Lee Sailer) (05/06/88)
In article <428@cmx.npac.syr.edu>, billo@cmx.npac.syr.edu (Bill O) says: > >Help! By Friday we need to know if there is a Unix-based box that >can work as a very high-performance data-base server. Yes, there are a Sure there is. You can run Unix on a Cray.
eric@pyrps5 (Eric Bergan) (05/06/88)
In article <2050@rtech.UUCP> markd@rtech.UUCP (Mark P. Diamond) writes:
>From article <428@cmx.npac.syr.edu>, by billo@cmx.npac.syr.edu (Bill O):
>> Help! By Friday we need to know if there is a Unix-based box that
>> can work as a very high-performance data-base server.
>
>Take a look at the Sequent Symmetry. This tightly coupled
>UNIX multiple processor is an optimum machine for running
>Relational Databases.

Rather a sweeping statement...

>In a recent project with Relational
>Technology a sixteen processor Symmetry achieved 104 Transactions
>per second running the Debit Credit Benchmark (TP1) on
>a fully sized 1.1G Byte database, at a price performance ratio
>at about 1/8 that of Tandem (all of this was verified by
>the independent Codd & Date consulting group). This is the
>fastest (by a factor of about three) any UNIX box has achieved
>for this benchmark.

Just to make sure we are comparing apples and apples here, I'm a little surprised by the 1.1 Gbyte figure. How many tuples were you running with in account? The "standard" 1,000,000 (with 1000 teller and 100 branch tuples) or did you scale it up? If scaled up, it is probably not appropriate to compare against any other tests that have been run, since the decreased contention on the branch relation will improve performance. Did you run with just one history relation, or did you split that up, and if so, into how many pieces? I assume that this was with journaling turned on?

>RTI, Oracle, Informix and Unify run Sequent
>for their in-house applications.

In general, claiming that a database vendor is using one particular platform or another for in-house applications is similar to claiming that Bell Labs has bought your computer - it's not a very exclusive club. Oracle has purchased several Pyramid 9840s for their world-wide sales applications. Several of the database companies use Pyramids for their file servers.
Obviously I am biased, but claiming that any computer is "optimum" for so broad a range of possible uses as relational database applications seems a little questionable.
pavlov@hscfvax.harvard.edu (G.Pavlov) (05/06/88)
In article <2050@rtech.UUCP>, markd@rtech.UUCP (Mark P. Diamond) writes:
> ...................... In a recent project with Relational
> Technology a sixteen processor Symmetry achieved 104 Transactions
> per second running the Debit Credit Benchmark (TP1) on
> a fully sized 1.1G Byte database, at a price performance ratio
> at about 1/8 that of Tandem (all of this was verified by
> the independent Codd & Date consulting group). ................
>
> PS If anyone would like a full write up of this project send me
> your postal address.

I do not doubt the benchmark and I would appreciate a copy of the write-up. But there has been a working relationship of one sort or another between the C&D group and RTI for a long time. So "independent" is overstating things a bit......
pavlov@hscfvax.harvard.edu (G.Pavlov) (05/06/88)
In article <3091@edm.UUCP>, steve@edm.UUCP (Stephen Samuel) writes:
> From article <428@cmx.npac.syr.edu>, by billo@cmx.npac.syr.edu (Bill O):
> > Help! By Friday we need to know if there is a Unix-based box that
> > can work as a very high-performance data-base server. Yes, there are a
>
> I think (from the propaganda I've heard) that something like oracle might sorta
> fit your bill. One of the ways that they do this is by use of raw disk I/O
> rather than putting the data base into the filesystem space.
>

But using raw disk i/o per se doesn't guarantee anything, does it? I think that the most relevant part of your message was the phrase in the parens.

greg pavlov, fstrf, amherst, ny
esf00@amdahl.uts.amdahl.com (Elliott S. Frank) (05/07/88)
In article <41647UH2@PSUVM> UH2@PSUVM.BITNET (Lee Sailer) writes: >In article <428@cmx.npac.syr.edu>, billo@cmx.npac.syr.edu (Bill O) says: >> >>Help! By Friday we need to know if there is a Unix-based box that >>can work as a very high-performance data-base server. Yes, there are a > >Sure there is. You can run Unix on a Cray. Or on an Amdahl 5890/5990 running UTS. You may have a floor space problem past several Tb. -- Elliott Frank ...!{hplabs,ames,sun}!amdahl!esf00 (408) 746-6384 or ....!{bnrmtv,drivax,hoptoad}!amdahl!esf00 [the above opinions are strictly mine, if anyone's.] [the above signature may or may not be repeated, depending upon some inscrutable property of the mailer-of-the-week.]
davek@rtech.UUCP (Dave Kellogg) (05/07/88)
I can understand Eric's skepticism because if someone told me 6 months ago that INGRES would exceed 100 TPS I might have asked them if they bumped their head on the way to the office. However, I know what Mark Diamond says is true because I was in the room with him, along with Tom Sawyer from Codd & Date consulting, when INGRES hit 104 TPS.

To appease any cynics I'll list the one caveat of the benchmark first:

* The INGRES system (running on a Sequent Symmetry machine) which hit 104 TPS was running a prototype version of RTI's next release. As part of normal prototyping activity we asked ourselves "Just how fast can this go?" We convinced Sequent to let RTI use a large Symmetry machine, and we were off...

Eric was surprised about the 1+ Gigabyte database size. In fact, the benchmark was run with a DebitCredit defined 100 TPS sized database.

Before continuing, a little background on the DebitCredit benchmark is in order. DebitCredit is a well-defined standard benchmark and was written in the late 1970's by Jim Gray and about 20 other database professionals. The paper was eventually published in DATAMATION under the title "A Measure of Transaction Processing Power" by the authors "Anon et al." Rumour has it the authors wished to remain secret due to flame-ups that occurred after Dave DeWitt and Dina Bitton wrote their paper on DBMS benchmarking. DebitCredit is one of three benchmarks described in the paper, and various degenerate forms of DebitCredit have become loosely known in the industry as "TP1."

The problem with TP1, and the ensuing "TPS" (transactions/second) measurements, is that most vendors size the database irregularly (i.e. smaller than DebitCredit defines). Thus, as Eric points out, when comparing TPS measurements one is often comparing apples and oranges.

For the Silver Bullet benchmarks, to which Mark refers, the database was sized at 100 TPS, or 10 Million 100 byte account records, 10,000 100 byte teller records, and 1,000 100 byte branch records.
Thus, a real purist would rob RTI of the 104 TPS (and grant only 100 TPS) because the database was sized for 100 TPS. (If you do the multiplication you'll see that the account relation alone is 1 gigabyte of data.)

Overall, the benchmark conformed to DebitCredit standards quite well, including the submission of transactions via a network. However, there were a few things we didn't do 100% to the DebitCredit spec. But then again, we did a couple to exceed the spec. In any case, the auditor's report is being published tomorrow so all DebitCredit whizzes can take a look.

In conclusion, I saw one "pop-off" on the net (flame semi-on) which questioned the integrity of the auditor since "Codd and Date and RTI have always had a good working relationship..." I'll reply to that with

* If we wanted to pay someone to lie we wouldn't have paid Codd and Date's rates! ;-)
* Mr. Sawyer was the auditor of Tandem's 208 TPS benchmark.
* I personally hope that he is not on the net to see this random assault on his character.
* If you read his report you'll see that he is certainly impartial.

Finally, if you're interested in seeing the benchmark report you can reply to this message with a postal address and I'll do my best to get you a copy.

Dave Kellogg
ucbvax!rtech!davek (might need a mtxinu before the rtech)
"Hmmm. We hit 100 TPS, can I go to bed now??"
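The multiplication mentioned in the post above can be checked with a short Python sketch. It assumes the per-TPS scaling implied by the figures in this thread (100,000 account, 100 teller and 10 branch records per TPS, each 100 bytes), and it reads the third relation as the branch relation, consistent with Eric Bergan's "1,000,000 / 1000 teller / 100 branch" figures. The function and constant names are made up for illustration, and the history relation is left out because the thread gives no size for it.

```python
# Sketch only: DebitCredit sizing as described in this thread.
# Scaling per TPS is inferred from the posts (10,000,000 accounts at 100 TPS, etc.).
RECORD_BYTES = 100          # record size quoted in the thread
ACCOUNTS_PER_TPS = 100_000  # 10,000,000 accounts / 100 TPS
TELLERS_PER_TPS = 100       # 10,000 tellers / 100 TPS
BRANCHES_PER_TPS = 10       # 1,000 branches / 100 TPS


def debitcredit_size(tps):
    """Return {relation: (rows, bytes)} for a database sized for `tps`."""
    rows = {
        "account": ACCOUNTS_PER_TPS * tps,
        "teller": TELLERS_PER_TPS * tps,
        "branch": BRANCHES_PER_TPS * tps,
    }
    return {name: (n, n * RECORD_BYTES) for name, n in rows.items()}


sizes = debitcredit_size(100)
for name, (n, nbytes) in sizes.items():
    print(f"{name:8s} {n:>12,d} rows {nbytes:>15,d} bytes")
```

At 100 TPS the account relation comes to 10,000,000 x 100 bytes = 1,000,000,000 bytes, which is the "1 gigabyte" in the post. Calling `debitcredit_size(10)` gives the 1,000,000-account configuration Eric asked about, with ten times fewer branch records for the same transaction mix to contend over.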
chuck@amdahl.uts.amdahl.com (Charles Simmons) (05/07/88)
In article <41647UH2@PSUVM> UH2@PSUVM.BITNET (Lee Sailer) writes: >In article <428@cmx.npac.syr.edu>, billo@cmx.npac.syr.edu (Bill O) says: >> >>Help! By Friday we need to know if there is a Unix-based box that >>can work as a very high-performance data-base server. Yes, there are a > >Sure there is. You can run Unix on a Cray. Unless you have a real big need for the vector processor of the Cray, an Amdahl machine may well provide better performance at a lower cost. -- Cs
pardo@june.cs.washington.edu (David Keppel) (05/08/88)
esf00@amdahl.uts.amdahl.com (Elliott S. Frank) writes:
>UH2@PSUVM.BITNET (Lee Sailer) writes:
>>billo@cmx.npac.syr.edu (Bill O) says:
>>>Help! By Friday we need to know if there is a Unix-based box that
>>>can work as a very high-performance data-base server. Yes, there are
>>Sure there is. You can run Unix on a Cray.
>Or on an Amdahl 5890/5990 running UTS. You may have a floor space
>problem past several Tb.

Check out optical disk drives. I believe DEC is now selling them for the VAX line; I'd imagine that most other vendors have similar products in mind. They can solve your floor space problems well beyond "several Tb", and, being write-once-read-many (WORM) are well-suited to an application requiring a permanent history. Typically they are large enough so that you don't fill them very fast even if you don't care about a permanent record.

;-D on ( Bliss is Bliss, Ignorance is Ignorance, I'm happy ) Pardo
news@edm.UUCP (news software) (05/09/88)
From article <564@hscfvax.harvard.edu>, by pavlov@hscfvax.harvard.edu (G.Pavlov):
# In article <3091@edm.UUCP>, steve@edm.UUCP (Stephen Samuel) writes:
#> From article <428@cmx.npac.syr.edu>, by billo@cmx.npac.syr.edu (Bill O):
#> I think (from the propaganda I've heard) that something like oracle
#> might sorta fit your bill. One of the ways that they do this is by use
#> of raw disk I/O rather than putting the data base into the filesystem space.
#>
# But using raw disk i/o per se doesn't guarantee anything, does it ? I think
It tends to promise that address locality implies spatial locality. This is
a nice assumption to be able to make when you want to improve your speed.
--
-------------
Stephen Samuel
{ihnp4,ubc-vision,vax135}!alberta!edm!steve
or userzxcv@uqv-mts.bitnet
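One way to read the locality point above: on a raw partition a DBMS can compute where a record lives from its number alone, so neighbouring record numbers sit at neighbouring byte offsets, with no filesystem block map in between. The Python sketch below only illustrates that offset arithmetic. It uses an ordinary temporary file as a stand-in for a raw device, the 100-byte record size and the `read_record` helper are assumptions made up for the example, and `os.pread` is available on Unix-like systems only. A real raw device would also need suitable privileges and aligned transfers.

```python
import os
import tempfile

RECORD_BYTES = 100  # fixed record size, assumed for illustration


def read_record(fd, record_no):
    """Read one fixed-size record; its offset is computed, not looked up."""
    return os.pread(fd, RECORD_BYTES, record_no * RECORD_BYTES)


# Stand-in for a raw partition: an ordinary temporary file.
with tempfile.TemporaryFile() as f:
    for i in range(1000):
        # Pad each record to exactly RECORD_BYTES so offsets stay regular.
        f.write(f"record {i:06d}".encode().ljust(RECORD_BYTES, b"."))
    f.flush()
    rec = read_record(f.fileno(), 42)
    print(rec[:13])
```

Records 42 and 43 are 100 bytes apart in the file. On a raw partition that address distance generally corresponds to physical adjacency as well, which is the assumption the post says is nice to be able to make. As the earlier reply in the thread notes, this by itself guarantees nothing about overall performance.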
milbery@rtech.UUCP (Jim Milbery) (05/09/88)
In article <2050@rtech.UUCP> markd@rtech.UUCP (Mark P. Diamond) writes: >From article <428@cmx.npac.syr.edu>, by billo@cmx.npac.syr.edu (Bill O): >> Help! By Friday we need to know if there is a Unix-based box that >> can work as a very high-performance data-base server. > >Take a look at the Sequent Symmetry. This tightly coupled >UNIX multiple processor is an optimum machine for running >Relational Databases. Relational Technology has been working with several unix-based multiprocessing machines with the INGRES product. Significant performance gains can be had using cpu/io improvements that these vendors offer. Pyramid is also offering significant features in a balanced fashion (cpu, disk i/o and terminal i/o) that INGRES can take advantage of. No selling here, but INGRES is hot, and can take advantage of the multiprocessing capabilities that Pyramid and Sequent and others offer. ** Opinions are my own and not necessarily RTI's. jim Milbery, RTI Technical Support Burlington,MA
fox@alice.marlow.reuters.co.uk (Paul Fox) (05/11/88)
In article <428@cmx.npac.syr.edu> billo@cmx.npac.syr.edu (Bill O'Farrell) writes:
>Help! By Friday we need to know if there is a Unix-based box that
>can work as a very high-performance data-base server. Yes, there are a
>million Unix boxes out there, but a data-base server has to be able to
>cope with concurrent access by multiple (possibly several hundred)
>users. Simple file locking isn't good enough -- users should never
>have to read "file locked" error messages. Also, the disk performance
>should be very good.
>

I have no answers for you but I do have some questions ...

Is your database mainly for reads, or for reads and writes? If it's mainly for reading information, then having a large disk cache (e.g. 8MB) will far outweigh the speed of the disk.

If you really are going to try to support hundreds of users, then one major problem will be finding a network interface that can reliably support this many virtual circuits. One of the biggest problems with all machines seems to be the limit on the number of concurrent sessions. Not only is a large amount of memory (i.e. several K) needed per session, but also one has to consider things like the size of the ARP tables.

=====================
All opinions are my own.
The powers that be ...
UUCP: fox@alice.marlow.reuters.co.uk
boughter@ghostwheel.UUCP (Ellen Boughter) (05/11/88)
1.1 GB *is* the standard size of the database for Debit_Credit, according to the original specification. That's without mirroring, which is optional. If the HISTORY relation is not divided (which it probably isn't, on the Sequent), I wonder if the tail is being locked? This has been an issue for us; if the tail is not locked, one runs the risk of losing history data. If it is locked, there is a big hot spot.
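The hot-spot concern above can be pictured with a small Python sketch: every transaction appends to the tail of the history relation, so one tail lock serializes all writers, while splitting the relation spreads the appends over several tails. This is a toy model, not how INGRES or any other DBMS implements it. The `History` class, the modulo partitioning rule and the thread counts are all hypothetical, and the lock only marks where a DBMS tail lock would serialize writers.

```python
import threading


class History:
    """Toy append-only history split into `parts` pieces, one tail lock each."""

    def __init__(self, parts=1):
        self.parts = [[] for _ in range(parts)]
        self.locks = [threading.Lock() for _ in range(parts)]

    def append(self, teller_id, row):
        i = teller_id % len(self.parts)  # hypothetical partitioning rule
        with self.locks[i]:              # only one writer per tail at a time
            self.parts[i].append(row)

    def __len__(self):
        return sum(len(p) for p in self.parts)


def run(history, tellers=8, txns_each=1000):
    """Have `tellers` threads each append `txns_each` history rows."""
    def worker(tid):
        for n in range(txns_each):
            history.append(tid, (tid, n))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(tellers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(history)


single = run(History(parts=1))  # all eight writers queue on one tail lock
split = run(History(parts=4))   # writers are spread over four tail locks
print(single, split)
```

Both runs keep every row. The difference is how many writers can wait on the same lock: eight with a single history, two per piece with four pieces. The sketch makes no timing claim, and, as Eric Bergan's questions earlier in the thread suggest, the number of history pieces is one of the things to pin down before comparing benchmark results.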
peterg@murphy (Peter Gutmann) (05/11/88)
In article <428@cmx.npac.syr.edu>, billo@cmx.npac.syr.edu (Bill O) writes:
> deleted stuff about "need this yesterday" and record locking.....
>
> What is the nature of the data base? Would you believe we're not
> sure? The amount of data will probably be very large, and it may or
> may not be based on the relational model. Why am I asking such a
> vague question? Because we are trying to develop corporate funding
> for a project that is still in the jello stage of conception (you can
> see it, and it glistens, but you still can't get a good grip on it).
> What we want to do is to be able to talk about existing Unix-based
> solutions to very large data sharing problems.

Without knowing the "nature" of the database there are some questions that must be answered about the logical location of the data (two phase commit), how much transaction logging is required, distribution of the users, etc.

>
> Names we know about:
> Gould -- fast disks, but is there data-sharing software?
> Most other Big Unix Boxes in the world -- ditto above comment.
> Tandem -- is this Unix based?

Tandem has a UNIX box. It is based upon the Altos 3068 System V using the MC68020 as the processor. The rest of their products do not use UNIX.

> Stratus -- is this Unix based, is this a good database machine?

Not a UNIX box; there is a UNIX kernel that runs under their VOS operating system. Performance is a question using this method.

>
> Deleted fears about being swamped with product stuff....
>
> Bill O'Farrell, Northeast Parallel Architectures Center at Syracuse University
> (billo@cmx.npac.syr.edu)

Based upon the outline provided, there appear to be two approaches.

Hardware - Use a database machine (like Britton Lee, etc). These usually can be accessed thru a network of some sort. The machines containing the users would reach the machine thru a network.

Software - The software approach, this gets interesting. At this point the outline of how your users are connected to the data becomes important. If the database can sit on a database machine and the users exist on different machines, you should look into requester/server designs. We are currently doing some development in the SYBASE environment and have found the server to be very well hardened. The dataserver from SYBASE manages the locks and the transaction logging for you.

I know of several organizations that are developing applications that need high transaction rates and guaranteed recovery (all of the major financial exchanges). All of these people are using SYBASE as a database engine.

These thoughts, ideas, and misstakes are mine and mine alone.

Peter Gutmann
UUCP: cmcl2!manhat!mancol!murphy!peterg (212) 227-7706
philabs!pencom!murphy!peterg
pavlov@hscfvax.harvard.edu (G.Pavlov) (05/11/88)
In article <3102@edm.UUCP>, news@edm.UUCP (news software) writes:
> #> I think (from the propaganda I've heard) that something like oracle
> #> might sorta fit your bill. One of the ways that they do this is by use
> #> of raw disk I/O rather than putting the data base into the filesystem space.
> #>
> # But using raw disk i/o per se doesn't guarantee anything, does it ? I think
> It tends to promise that address locality implies spatial locality. This is
> a nice assumption to be able to make when you want to improve your speed.
> --

I did not mean that raw disk i/o can't be put to good use. But particular techniques and technologies are just that; they may make great copy in an advertising campaign, but they are no guarantee of a superior (or even good) product.
phil@osiris.UUCP (05/12/88)
Well, I promised this to a couple of people, so here it is. Excuse the cross-posting to comp.arch, but since I haven't heard anything from anyone about "getting it the hell out of here", I'll assume that it's OK to leave it for now. Let me know (nicely, please) if this changes.

I posted a followup last week to an article from Bill O'Farrell of Syracuse University, who was requesting info on maintaining large databases on UNIX systems. I mentioned that I was willing to discuss our experiences privately with Bill. Since then, several other people have sent me mail asking that I either post or email more info on our applications, and I noticed that there has been some interest in this topic from different people, so here's a quick summary of what we're doing. This report was written with significant help from and ultimate approval of Dr. Steve Kahane, who is currently our application development director.

We support a very heterogeneous production environment at JHH. In 1983 we chose to network existing sub-systems using Ethernet and chose Sun's RPC mechanism (using both TCP and XNS) for interprocess communication support. Significant integration of an IBM MVS/CICS sub-system, several different types of MUMPS sub-systems, and our UNIX sub-systems has been achieved. Most of our new development has been in UNIX using C. More specifically:

UNIX production hardware: Pyramid 98xe and 9820 computers with lots of disk and main memory
OS: Pyramid OSx (dual port of 4.2BSD and SysV.2)
DBMS: Relational Technology INGRES (5.0/04a)
Databases: Several, ranging from several MB to about 1.5 GB, with tables as large as 500 MB
Avg. # of concurrent users: ~35 from UNIX machines (increasing rapidly), another 5-6 (and probably more) from non-UNIX clients
UNIX production applications: Patient ID, patient demographics and history, emergency room, radiology scheduling, report transcription and lookup, radiology film tracking, outpatient clinic management and appointment scheduling, automated history report generation, and others
IBM production: Pre-admission, Admission, Transfer & Discharge, Pharmacy, several financial sub-systems, and others
MUMPS production: Laboratory Systems, Surgical Pathology, Oncology (OCIS), Library System, and several medical literature databases

(The above lists only those production systems that are integrated with our databases over the network.)

If anyone has any specific questions, feel free to ask and I'll get more precise info out ASAP.

Phil Kos
Information Systems ...!uunet!pyrdc!osiris!phil
The Johns Hopkins Hospital
Baltimore, MD