jas@rtech.UUCP (05/29/87)
Thought I'd post a few thoughts on database machines to try to move the discussion away from the meta-topic of what constitutes unacceptable advertising. In the following paragraphs, "vanilla" means "general purpose" (not "unremarkable"), and "DBMS" means the DBMS proper, not including user interfaces.

It seems to me that database machines try to achieve superior performance in two different ways: (1) by designing special-purpose hardware that can do database-intensive things much faster than vanilla hardware; and (2) by eliminating the vanilla operating system, allowing the DBMS software to run directly on the hardware (a.k.a. writing a special-purpose OS that will support only the DBMS software).

Personally, I'm a good deal more skeptical about (1) than about (2). Special-purpose hardware has high leverage in some application areas: 4 x 4 matrix multiplication hardware is pretty handy for doing real-time 3-D transformations, for example. I'm not convinced that DBMS software has similar, high-enough-leverage operations. While Britton-Lee tries to develop special-purpose hardware that does some subset of the data manager's job so fast that the ENTIRE data manager runs 10 times faster, the vanilla hardware vendors are building computers that do EVERYTHING faster. My money says the special-purpose hardware people are going to have a hard time keeping up with Sun, DEC, and the rest of the vanilla hardware vendors.

(2) seems more promising. Lots of vanilla OS's are lousy platforms for DBMS implementation (UNIX is certainly not an exception). Strictly off the cuff, I could imagine a 100% performance improvement to be had by implementing a DBMS on a special-purpose OS (or maybe on a vanilla OS that provided the right services, if such a beast existed), instead of on UNIX.

Comments? (Oh, yes: these are my personal opinions only.)
-- 
Jim Shankland
..!ihnp4!cpsc6a!\
                 rtech!jas
..!ucbvax!mtxinu!/
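[For concreteness, the kind of high-leverage operation Jim has in mind -- a 4 x 4 homogeneous transform applied to a 3-D point, the inner loop that special-purpose graphics hardware makes fast -- can be sketched like this. Modern Python; the function names are invented for illustration, not from any graphics library.]

```python
def mat_vec_4x4(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-vector."""
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]

def translate(dx, dy, dz):
    """Build a 4x4 homogeneous translation matrix."""
    return [[1, 0, 0, dx],
            [0, 1, 0, dy],
            [0, 0, 1, dz],
            [0, 0, 0, 1]]

# Transform the point (1, 2, 3) by a translation of (10, 0, 0).
point = [1, 2, 3, 1]                   # homogeneous coordinates
moved = mat_vec_4x4(translate(10, 0, 0), point)
print(moved)                           # [11, 2, 3, 1]
```

The 16 multiply-accumulates per point are what dedicated hardware parallelizes; the open question in the post is whether DBMS workloads contain any loop this concentrated.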
mjr@well.UUCP (Matthew Rapaport) (05/29/87)
PICK by Pick Systems (So. Cal.) is a perfect example of the #2 approach to database machines, and there is a very good discussion of the problems and promises (and the promises are not great) of the #1 approach in C.J. Date's "An Introduction to Database Systems" Vol. II (Addison-Wesley, 1983).
forrest@blia.BLI.COM (Jon Forrest) (05/29/87)
In article <863@rtech.UUCP>, jas@rtech.UUCP (Jim Shankland) writes:
> Thought I'd post a few thoughts on database machines to try to move the
> discussion away from the meta-topic of what constitutes unacceptable
> advertising.

Thanks. One of the reasons I posted my original message was to stir up activity in this group.

> It seems to me that database machines try to achieve superior
> performance in two different ways:
> (1) by designing special-purpose
> hardware that can do database-intensive things much faster than vanilla
> hardware;

I'm skeptical about this one too, but for non-technical reasons. As new technology comes out, we will probably always be able to construct a database machine that uses it. This will reestablish the price/performance ratio that we probably will have lost due to using what has become old technology in the previous version of our database machine. The problem here is that hardware development is terribly expensive, so it will be very difficult for us (or anyone else) to make any money following this approach. The approach taken by Sybase will result in machines with less performance than ours but also with lower costs. This could result in a price/performance ratio as good as (or better than) ours. If their software is as good as ours, then they will have a very competitive product, all other things being equal.

> (2) by eliminating the vanilla operating system, allowing
> the DBMS software to run directly on the hardware (a.k.a. writing a
> special-purpose OS that will support only the DBMS software).

This is probably the largest difference between us and Sybase. I think your comments were exactly right on this. Our operating system doesn't have to worry about lots of the stuff that Unix or VMS has to look after. I don't know anything about the kernel that runs on the IDM, but from what I've heard, it is a very important reason why our performance is so good, especially given the age of the IDM.
These opinions are mine only, which should be clear since the first one contradicts our whole approach. I should also add that I make no claim of being a database expert. I'm in charge of our VMS host software.

Jon Forrest
ucbvax!mtxinu!blia!forrest
vollmer@manta.UUCP (Tom Vollmer) (05/31/87)
We went through a bit of agonizing a year or so ago. There was a heavily loaded VAX 11/780 also running a software RDBMS, and complaints about performance were the soup du jour. We were seriously looking at some database machines for the application when the VAX was upgraded to an 8600 for other reasons. We piggybacked on the upgrade, the performance complaints disappeared, and no, the 8600 has not bogged down with increasing load during this year. I do understand there are dedicated applications that make sense for database engines, but 1) sometimes it's nice to have your applications on general-purpose systems and get a 'free' ride on a hardware upgrade, and 2) sometimes a faster CPU makes up for a lot of problems with RDBMS and OS design.

Tom Vollmer (vollmer@nosc)
Computer Sciences Corporation
miket@blic.UUCP (06/08/87)
Jim Shankland (jas@rtech.UUCP) has suggested the two major differences between a 'traditional' software RDBMS and a database machine based RDBMS are (1) the presence of special-purpose hardware and (2) the use of a special-purpose O/S. I agree.

He went on to speculate that "the special-purpose hardware people are going to have a hard time keeping up with Sun, DEC, and the rest of the vanilla hardware vendors." My problem with this speculation is that it implies that a vendor of database machine based RDBMSs must use a fixed mix of special-purpose hardware and software over time and at all levels of its product line. For a counterexample, look at the three biggest U.S. 'database machine' vendors today (Britton Lee, Teradata, and Sybase) and you will find a large variation in special-purpose hardware content. Teradata has the most specialized hardware (a group of up to 1024 micros interconnected by a special bus called the 'Y-net'). Sybase has no special hardware at all (it runs on Suns and Vaxen). Britton Lee's products are in between (a minicomputer-like design that has been 'biased' in some ways to make it run an RDBMS faster). In Britton Lee's case the amount of special-purpose hardware even depends on the product line (the higher-priced/higher-performance BL 700 having more special-purpose hardware than the lower-priced/lower-performance BL 300). Not surprisingly, the Teradata is the most expensive, Sybase is the least expensive, and Britton Lee is again in between.

This same pattern also occurs over time. In 1980, 'N typical foo type transactions' might have required special hardware. In 1987 that same load might be achievable with normal hardware, but might be cheaper if done with special-purpose hardware. By 1990 it may be cheaper if done with vanilla hardware. But even in 1990 there will be other loads (perhaps 2*N or 10*N, or 'foo bar' transactions rather than 'foo' transactions) which do require special-purpose hardware.
Does that restrict database machines to the high end? Perhaps; we'll just have to wait and see. Does that restrict vendors of database machine based relational systems to the high end? Only if they perceive themselves as "special-purpose hardware people". If we perceive ourselves as "high-performance relational database" people, then we don't have to compete with DEC and Sun. We buy from others what works well, and we make only what we can do better. The exact mix of what we make vs. buy depends on the price/performance target for the product and the availability of appropriate products from other vendors.

In summary, perhaps Jim and I do agree about the low end. In the middle, I expect a mixture of vanilla hardware and some special hardware. In the high end, for some time to come, I expect to see special-purpose hardware.

--miket (Mike Tossy)
(I do work for Britton Lee, Inc. However, these are only my personal opinions and do not necessarily represent the views of Britton Lee, Inc.)
larry@ingres.Berkeley.EDU (Larry Rowe) (06/17/87)
Several comments on the recent discussion of ``database machines.''

1. I too am skeptical that custom hardware can be made price/performance competitive with software database systems. While I agree with the folks from Britton-Lee that they can use new technology to build the next generation hardware sooner than a vanilla hardware vendor, I don't think they can sell enough boxes to make a very profitable business. Britton-Lee has had a rough time the past 12-18 months because they are selling products based on 2-5 year old technology (Z8000 + custom processor). The rumors about their new machine are that it is a tightly-coupled, shared-memory processor. You can buy the same hardware from a vanilla vendor today and run a software DBMS on it. Examples are: stratus, sequent, encore, mips, etc. The software DBMS's will do the same thing that Britton-Lee will do in terms of shared memory buffer managers, etc., so the solutions will be roughly the same. (Of course, at any given time one vendor's product will be ahead or behind another vendor's -- Britton-Lee has done a good job delivering DBMS software.)

Now here's the rub. The vanilla hardware vendors will sell several thousand of their boxes. When DEC delivers their tightly-coupled, shared-memory processor, they will deliver tens of thousands. Britton-Lee will be lucky to sell a thousand. The vanilla vendors will have more sales over which to amortize their costs. They will drive the cost down on the boxes as they compete, and Britton-Lee will have a harder time maintaining their margins. The key advantage that Britton-Lee has is their software. The software DBMS vendors have not directly attacked this market (e.g., by oem'ing hardware and doing more software customization) because the market for sales is much, much greater in the ``run everywhere'' and distributed, heterogeneous DBMS markets. They have 50-100 man-years of development to do to be competitive in that market. The dbmachine market is too small.

2.
The above analysis says that Sybase has a creditable strategy because they are doing software customization on a few machines. A good example is their Unix kernel mods for the Sun. They deliver improved performance at a specific cost -- running a nonstandard OS. It remains to be seen if Sybase can deliver a robust system that matches the advertised performance claims in a production environment. Also, they will be pressured into ``running everywhere'' (they've announced a VAX product, and their recent venture with Microsoft suggests a lot of work on PC hardware) and they will quickly fall into the morass of customizing the code for N environments (e.g., do you run on the VAX cluster yet, how's your MAP network protocol support, ...).

3. Teradata sells custom hardware and software. However, it is my opinion that 90% of their advantage comes from the fact they are running a distributed DBMS on multiple machines. The only novel feature of their architecture is the Y-Net, which they claim gives them big performance improvements. I'd really love to see a benchmark with identical hardware/software except a different network. One advantage that I can see to the Y-Net is the parallel sorting capability. However, Jim Gray wrote a parallel sort package at Tandem that sped up sorting to roughly twice the time to read the data (i.e., you must read and write the data at least once -- sorting is overlapped completely with it). So, how does a software parallel sort compare to the Y-Net? Another possible advantage of the Y-Net is response time and throughput during heavy loads. A LAN can bog down when many messages are competing for the net. Remember that distributed DBMS's will ship a lot of data around to answer ad hoc queries. Another interesting experiment: how fast an ethernet or token ring is needed to achieve the same performance.

4. The Tandem high transaction rates come from a vanilla distributed relational DBMS running on vanilla hardware.
The big difference is that they have spent 10 years optimizing their storage system, buffer manager, logging system, etc.

5. Another thing. When doing performance comparisons, it is important to compare apples and apples. Numbers ought to be $'s/xact (guess what, a program on an ibm 3094 is faster than a vax!) and/or use identical hardware (2 processors are better than 1). i'm tired of seeing claims that a dbmachine is faster than a loaded central machine. of course it is, the central machine has other things to do. compare the performance/cost to buying a larger central machine or buying a second general purpose processor.

-------
Bottom line:
1. Hardware is nice, but software is cheaper and probably faster. (Larry's Lament: Hardware companies build bigger valuations faster because of the size of the business (i.e., more revenues and more expenses). Software companies ought to produce higher profits....)
2. Distributed DBMS's are a big, big win. Every DBMS vendor better have one by 1990 or they'll be seriously disadvantaged in the marketplace. So, where are all these vendors going to find capital to fund a 20 man-year project to build a distributed DBMS?
3. Benchmark wars will continue to be fought, and they might tell you something, and then again, they might lie.
billc@blia.BLI.COM (Bill Coffin) (06/22/87)
>From larry@ingres.Berkeley.EDU (Larry Rowe) Wed Jun 17 09:19:02 1987
>Several comments on the recent discussion of ``database machines.''
>
>1. I too am skeptical that custom hardware can be made price/performance
>competitive with software database systems. [ ... ]

I may be biting the hand that feeds, but I agree with this. Partly, Britton Lee's approach has historical reasons. When the first BLI box came out, there was no off-the-shelf hardware that could easily be used. That's no longer the case.

>2. The above analysis says that Sybase has a creditable strategy [ ... ]
> [ ... ] they will quickly fall
>into the morass of customizing the code for N environments [ ... ]

This is a problem on all distributed and all server architectures. Even on a one-machine server architecture you need host software.

> [ ... ]
>5. Another thing. When doing performance comparisons, it is important to
>compare apples and apples. Numbers ought to be $'s/xact (guess what, a
>program on an ibm 3094 is faster than a vax!) and/or use identical hardware
>(2 processors are better than 1). i'm tired of seeing claims that a dbmachine
>is faster than a loaded central machine. of course it is, the central machine
>has other things to do. compare the performance/cost to buying a larger
>central machine or buying a second general purpose processor.

This is odd. When you compare a loaded front end vs. a dbmachine, you are comparing real-life usages. If you care about db speed, then you must consider the typical work loads. Secondly, buying a bigger central machine may solve a "raw" speed problem, but server architectures still solve the problem of sensitivity to the host work load. Most people really do care if some host process causes db accesses to slow to a crawl, or if db access causes other important (non-db) processes to wimp out. A faster machine may get things going faster, but the sensitivity is still there. (Server architectures have other benefits as well.
I won't go into all the server-architecture arguments here, but you can't ignore these factors.)

>2. Distributed DBMS's are a big, big win. [ ... ]

Why? I'm convinced that many of the people who THINK they want distributed DBMS's REALLY need server architectures. See Jim Gray's article in the May UNIX REVIEW. I won't elaborate here -- but I would like to see a distribution vs. server discussion on the net. For the record, I'm not "opposed" to distributed DBMS architectures, but I DO think they're being oversold. There are job mixes that make servers look bad, and there are job mixes that will make distributed dbms's look bad. Mr. Natural says, "get the right tool for the right job."

>3. Benchmark wars will continue to be fought and they might tell you something,
> and then again, they might lie.

Or to paraphrase an old quote, "There are lies, there are damned lies, and then there are benchmarks."

-- W.H.Coffin.  billc@blia.BLI.COM (ucbvax!{mtxinu|ucsfcgl}!blia!billc)
>> the usual disclaimer about my employer and my wretched opinions. <<
>> the usual witticisms that swell netnews to ridiculous proportions. <<
larry@ingres.Berkeley.EDU (Larry Rowe) (06/23/87)
In article <2861@blia.BLI.COM> billc@blia.BLI.COM (Bill Coffin) writes:
>>From larry@ingres.Berkeley.EDU (Larry Rowe) Wed Jun 17 09:19:02 1987
>>Several comments on the recent discussion of ``database machines.''
>>
>>2. The above analysis says that Sybase has a creditable strategy [ ... ]
>> [ ... ] they will quickly fall
>>into the morass of customizing the code for N environments [ ... ]
>
>This is a problem on all distributed and all server architectures.
>Even on a one-machine server architecture you need host software.

Remember to distinguish between a hardware server and a software server. again, you can build a software server dbms and run it on conventional hardware. btw, from what i can tell, most software vendors are implementing (software) servers (e.g., rti, sybase, oracle, ...).

>
>> [ ... ]
>>5. Another thing. When doing performance comparisons, it is important to
>>compare apples and apples. Numbers ought to be $'s/xact (guess what, a
>>program on an ibm 3094 is faster than a vax!) and/or use identical hardware
>>(2 processors are better than 1). i'm tired of seeing claims that a dbmachine
>>is faster than a loaded central machine. of course it is, the central machine
>>has other things to do. compare the performance/cost to buying a larger
>>central machine or buying a second general purpose processor.
>
>This is odd. When you compare a loaded front end vs. a dbmachine, you
>are comparing real-life usages. If you care about db speed, then you must
>consider the typical work loads. Secondly, buying a bigger central
>machine may solve a "raw" speed problem, but server architectures still
>solve the problem of sensitivity to the host work-load. Most people
>really do care if some host process causes db accesses to slow to a
>crawl, or if db access causes other important (non-db) processes to
>wimp out. A faster machine may get things going faster, but the
>sensitivity is still there.
>
the point here is that without including cost/performance in your comparison everyone would buy the biggest machine(s) that ran their desired software. personally, i'd love to have a cray performance machine for my personal workstation. but, as we all know, it isn't practical. i agree that a ``back-end'' dbmachine may be the most cost effective and may offer substantial performance benefits. what i don't agree with is that it is the only solution, or that people should accept it without questioning what their problem really is. for example, an equally plausible solution to a heavily loaded central machine is to off-load user-interface/application programs to a personal workstation. from what i can tell, people don't really do these comparisons.

btw, another major advantage of all back-end dbservers is the ability to interconnect different host computers. for example, i think 50% of teradata's sales come from the fact that they are the only dbms available today that allows applications on VM and MVS to share access to a database. britton-lee has a similar advantage with pdp-11's and vaxes. over time, all software dbms's will offer similar features.

>
>>2. Distributed DBMS's are a big, big win. [ ... ]
>
>Why? I'm convinced that many of the people who THINK they want
>distributed DBMS's REALLY need server architectures. See Jim Gray's
>article in the May UNIX REVIEW. I won't elaborate here -- but
>I would like to see a distribution vs. server discussion on the net.
>For the record, I'm not "opposed" to distributed DBMS architectures,
>but I DO think they're being oversold. There are job mixes that
>make servers look bad, and there are job mixes that will make
>distributed dbms's look bad. Mr. Natural says, "get the right tool
>for the right job."
>

i haven't read jim's article, but knowing him and having read some tandem tech reports on the topic, i think i have some additional insights. first, application design for distributed applications is very, very hard.
average people don't have the experience, and the vendors' products do not offer enough help yet to make it easy to define them. consequently, only pioneers and very brave people will attempt to build them. btw, tandem has only recently come out with an SQL interface to its distributed dbms offering. i'll be curious to see how much usage goes up now that end-users and mere humans can access the distributed databases.

second, tandem's distributed dbms is a single-vendor hardware solution. when i visit companies and universities, senior managers say their number 1 problem is managing the diversity of hardware/software that proliferates through the organization. a distributed heterogeneous dbms can cover up this diversity and give people control again of their corporate data. the complete solution to this will take years to achieve, but from my discussions, people really want it. also, some new application growth areas (e.g., factory automation) insist on distributed dbms's. so, i stand by my statement.

btw bill, what happens to a britton-lee customer who's bought 5 machines and now wants to query data that is spread across the machines? do they have to copy it by hand to one machine and run the query? all a distributed dbms does is make that operation easier.
eric@hippo.UUCP (Eric Bergan) (06/24/87)
In article <2918@zen.berkeley.edu>, larry@ingres.Berkeley.EDU (Larry Rowe) writes:
> In article <2861@blia.BLI.COM> billc@blia.BLI.COM (Bill Coffin) writes:
> >>From larry@ingres.Berkeley.EDU (Larry Rowe) Wed Jun 17 09:19:02 1987
> >>2. Distributed DBMS's are a big, big win. [ ... ]
> >
> >Why? I'm convinced that many of the people who THINK they want
> >distributed DBMS's REALLY need server architectures. See Jim Gray's
> >article in the May UNIX REVIEW. I won't elaborate here -- but
> >I would like to see a distribution vs. server discussion on the net.
>
> i haven't read jim's article, but knowing him and having read some tandem
> tech reports on the topic, i think i have some additional insights. first,
> application design for distributed applications is very, very hard. average
> people don't have the experience and the vendors products do not offer enough
> help yet to make it easy to define them. consequently, only pioneers and
> very brave people will attempt to build them. btw, tandem has only recently
> come out with an SQL interface to its distributed dbms offering. i'll be
> curious to see how much usage goes up now that end-users and mere humans
> can access the distributed databases.
>
> second, tandem's distributed dbms is a single-vendor hardware solution.
> when i visit companies and universities, senior managers say their number
> 1 problem is managing the diversity of hardware/software that proliferates
> through the organization. a distributed heterogeneous dbms can cover up
> this diversity and give people control again of their corporate data. the
> complete solution to this will take years to achieve, but from my discussions,
> people really want it. also, some new application growth areas (e.g.,
> factory automation) insist on distributed dbms's. so, i stand by my statement.
I think for this argument (and particularly for Gray's papers, both in Unix Review and in the June 1986 issue of IEEE Transactions on Software Engineering), it is important to distinguish two very different uses of relational databases. The first is what most of the database products have initially been used for - ad hoc queries against a database, where the number of queries far exceeds the number of updates. Typically such applications are characterized by a relatively low number of transactions per second, but the transactions themselves are probably more complex - joins, aggregates, etc. The second is a much more transaction-oriented system, the classic case being airline reservations. Here, transactions tend to be simple, but the transaction rate is much higher.

I think Gray's comments are much more addressed to the transaction-oriented distributed applications. In a transaction-oriented system, transparency is much less important. At the time the application is written, the queries are determined, and it is possible to map out which servers have the data that is needed for a given transaction. There are almost no "ad hoc" queries which would require some kind of distributed optimizer to sort out at query time. Gray's point (which I think Larry is talking about) is that there are enough other headaches in a transaction-oriented system just with the networking and understanding the design of the application, without having to worry about how efficiently the database system decides to process the query.

This is especially true in hooking up heterogeneous machines and databases. The chance of having an MVS VSAM file become a "transparent" part of a distributed relational database is pretty small. But it is feasible to hook up a server to it that can participate in a distributed requester/server model. This does, of course, have the problem of having to change the application(s) if you decide to move the data partitioning around.
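[To make the requester/server idea above concrete: a non-relational keyed file fronted by a server that answers only the requests the applications were written for. A minimal sketch in modern Python with invented names -- not any real vendor's interface.]

```python
class KeyedFileServer:
    """Wraps a keyed flat file (a stand-in for something like a VSAM data set)."""

    def __init__(self, records):
        # Index the records by key, the way the keyed file would.
        self.records = {r["key"]: r for r in records}

    def handle(self, request):
        # Only the predetermined requests are supported -- no ad hoc
        # queries, so no distributed optimizer is needed.
        if request["op"] == "get":
            return self.records.get(request["key"])
        raise ValueError("unsupported request: %r" % (request,))

server = KeyedFileServer([{"key": "42", "balance": 100}])
print(server.handle({"op": "get", "key": "42"}))  # {'key': '42', 'balance': 100}
```

The narrow request set is the point: the file participates in the requester/server model without having to become a "transparent" relational node.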
I like the Sybase approach to this - namely, the transactions are stored in the database itself, rather than in the applications. While you still have to change the transactions if you change the schema, at least they are all in one place, and the applications themselves do not have to be rebuilt.

I think the real challenge for the database vendors will be how to interact with other database products - primarily non-relational ones. Surprisingly, many of the corporate databases are kept in VSAM files or the like - very few are relational. Given that, trying to force relational semantics on these databases is going to be very difficult. While I believe it is likely that someday most of these corporate databases will convert to relational databases, I think that the transition time will be at least 10 years, maybe longer. The transition will happen as applications are replaced - not because they are converted.

One final point in the distributed vs. single server discussion. Very few applications live in a vacuum (or if they do, it was because they were forced to). Almost all of them would like to be able to share data with other related applications that already exist, have their own database systems in place, and either work perfectly well or would be too expensive to convert to something else. A single server model does not seem able to handle the economics (and sometimes politics) of such a case. A distributed system (not necessarily "transparent") does allow the new applications to share the data with a minimum (no?) impact on the existing applications.

Bill - do you envision a single server approach also being desirable in the case of a geographically distributed system, where the sites are primarily autonomous, but some data replication and cross-site queries are good? How would you design such a system?
-- 
eric
...!ptsfa!hippo!eric
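[The stored-transaction approach Eric describes can be sketched in miniature -- modern Python, all names invented; a sketch of the general idea, not Sybase's actual interface. Clients invoke a named transaction kept in the server, so a schema change touches one stored definition rather than every application.]

```python
class Server:
    def __init__(self):
        self.accounts = {"a": 100, "b": 50}
        self.transactions = {}          # name -> stored transaction

    def store_transaction(self, name, proc):
        # The logic lives in the database, in one place.
        self.transactions[name] = proc

    def execute(self, name, *args):
        # Applications only know the name and the arguments.
        return self.transactions[name](self.accounts, *args)

def transfer(accts, src, dst, amt):
    # Debit one account, credit the other -- atomically, in a real system.
    accts[src] -= amt
    accts[dst] += amt

server = Server()
server.store_transaction("transfer", transfer)
server.execute("transfer", "a", "b", 25)
print(server.accounts)                  # {'a': 75, 'b': 75}
```

If the schema changes, only the stored `transfer` definition changes; callers of `execute("transfer", ...)` are untouched.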
whwb@cgcha.UUCP (Hans W. Barz) (06/25/87)
In article <2891@zen.berkeley.edu>, larry@ingres.Berkeley.EDU (Larry Rowe) writes:
> Several comments on the recent discussion of ``database machines.''
>
> THE OTHER LINES HAVE BEEN REMOVED FOR READABILITY OF THIS NEWS

Usually I do not add news in this group, since I have the impression that most of the participants are working only with small databases -- i.e., < 300 MB. Some comments in a recent news article on database machines -- see above -- made me write these lines, since these databases are useful for bigger applications:

1) The Teradata system -- the name originates from TERABYTE -- is not a competitive system to the Britton-Lee. Its performance and price range are much higher.

2) Distributed database systems usually have worse performance compared to non-distributed systems. The only chance for a distributed database to gain performance is by partitioning a single job or query across multiple processors. Currently this works only for the Teradata.

3) The Teradata is not really a distributed database, since the communication on the Y-NET is synchronous while different processors are working asynchronously on the data -- this concept is not trivial, and I cannot explain it completely in a few lines.

4) I have done benchmarks on Teradata, DB2 (3090), Britton-Lee (IDM) and Tandem-SQL. It is rather difficult to summarize, but Teradata has the best price/performance ratio and usually delivers the best performance of the four machines above.

H.W. Barz, WRZ, CIBA-GEIGY, CH
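[The query partitioning mentioned in point 2 can be illustrated with a toy example -- modern Python threads standing in for parallel processors; my sketch, not Teradata's internals. Rows are hash-partitioned across "processors", each scans only its share, and the partial results are combined.]

```python
from concurrent.futures import ThreadPoolExecutor

def partitioned_sum(rows, key, nprocs=4):
    # Distribute rows to partitions by hashing the partitioning column.
    parts = [[] for _ in range(nprocs)]
    for row in rows:
        parts[hash(row[key]) % nprocs].append(row)

    # Each processor computes a partial aggregate over its own partition.
    def partial(part):
        return sum(row["amount"] for row in part)

    with ThreadPoolExecutor(max_workers=nprocs) as pool:
        partials = pool.map(partial, parts)

    # Combine the partial results into the final answer.
    return sum(partials)

rows = [{"acct": i % 3, "amount": 10} for i in range(12)]
print(partitioned_sum(rows, "acct"))    # 120
```

The speedup comes entirely from each processor scanning a fraction of the data; the combine step is cheap for aggregates like SUM.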
billc@blia.BLI.COM (Bill Coffin) (06/25/87)
>From larry@ingres.Berkeley.EDU (Larry Rowe)
>>Even on a one-machine server architecture you need host software.
>
>Remember to distinguish between a hardware server and a software server.
>again, you can build a software server dbms and run it on conventional
>hardware.

Yes, I noted that. In fact, I think a s/w server will beat a hardware server in price/performance except at the high end (the people who will pay anything for the extra speed). However, an effective server needs to control its hardware. I'm saying that a server should have a whole machine to itself and should have a custom operating system (or an off-the-shelf OS that has been pacified).

> btw, from what i can tell, most software vendors are implementing
>(software) servers (e.g., rti, sybase, oracle, ...).

I think there is a difference between a server (hardware or software) and a distributed dbms. Oracle and, I think, RTI are implementing distributed dbms's. Sybase is implementing a server. My point was that there is a proliferation problem no matter which architecture. Perhaps life is a bit simpler at BLI, since we don't have to port the dbms internals, but we still have to port the host software. There's no free lunch in this area.

>the point here is that without including cost/performance in your comparison
>everyone would buy the biggest machine(s) that ran their desired software.
>personally, i'd love to have a cray performance machine for my personal
>workstation. but, as we all know, it isn't practical. i agree that a
>``back-end'' dbmachine may be the most cost effective and may offer substantial
>performance benefits. what i don't agree is that it is the only solution
>and that people should accept without questioning what their problem really
>is. for example, an equally plausible solution to a heavily loaded central
>machine is to off-load user-interface/application programs to a personal
>workstation. from what i can tell, people don't really do these comparisons.
I guess that I'm in agreement here. People need to analyze real-life situations when they choose a dbms configuration. Certainly a back-end isn't the only solution, and frequently it's a poor solution. btw, I think your alternative solution enhances the server argument -- why not offload the user interfaces to bitmapped workstations (application servers) AND offload the dbms to a dbms server? And set up the old mainframe as a number-crunching server, and so on. It's nice to be able to hook up servers based on specialized abilities rather than put apples and pancakes in the same bag (i.e., mutually antagonistic processes in the same machine). Anyway, comparing a back-end with an unloaded front-end is certainly not examining a real-life situation.

>btw, another major advantage of all back-end dbservers is the ability to
>interconnect different host computers.
> [ ... ] britton-lee
>has a similar advantage with pdp-11's and vaxes.

Minor plug (here we go!): We also support data sharing between VM/CMS, Vax VMS, most major flavors of UNIX (SysV, BSD, Ultrix), MS-DOS, and a lot of other machines.

> [ ... ]
>second, tandem's distributed dbms is a single-vendor hardware solution.
>when i visit companies and universities, senior managers say their number
>1 problem is managing the diversity of hardware/software that proliferates
>through the organization. a distributed heterogenous dbms can cover up
>this diversity and give people control again of their corporate data. the
>complete solution to this will take years to achieve, but from my discussions,
>people really want it. also, some new application growth areas (e.g.,
>factory automation) insist on distributed dbms's. so, i stand by my statement.

OK, but it seems to me that heterogeneity and distribution are two different issues. I haven't yet seen any good solutions to the heterogeneity problem, just some talk. (Gray's article has some interesting observations on this.)
Server architectures are well understood and currently available. Mature heterogeneous/distributed dbms's are still in the future. Even when they do mature, there will still be many situations in which a server is a clear win.

>btw bill, what happens to a britton-lee customer who's bought 5 machines
>and now wants to query data that is spread across the machines? do they
>have to copy it by hand to one machine and run the query? all a distributed
>dbms does, is make that operation easier.

No, it's easier than that (now). However, it is not location transparent. OK, a distributed dbms is different from a server. They solve some similar problems, but they are not the same. I think it's going to be a while before distributed dbms's can match the performance, reliability, and security of servers. The very nature of a server is that you don't worry about distributing data over many of them. If you do worry about that, then the server is probably not the solution you're looking for. (And, if you lick the heterogeneity problem then you can have N servers transparently within your distributed database.)

-- W.H.Coffin. billc@blia.BLI.COM (ucbvax!{mtxinu|ucsfcgl}!blia!billc)
>> the usual disclaimer about my employer and my wretched opinions. <<
>> the usual witticisms that swell netnews to ridiculous proportions. <<
billc@blia.BLI.COM (Bill Coffin) (06/25/87)
In article <131@hippo.UUCP>, eric@hippo.UUCP (Eric Bergan) writes:
> Bill - do you envision a single server approach also being
> desirable in the case of a geographically distributed system, where the
> sites are primarily autonomous, but some data replication and cross
> site queries are good? How would you design such a system?

Of course not. This is the classic case where a distributed dbms is exactly what's wanted. I think there are many cases where you can change a few of those requirements and find that a server will do. For instance, salesmen who carry laptops and occasionally dial in updates or queries. Or systems where security is critical. Anyway, servers and distributed dbms's are not necessarily mutually exclusive. The classic distributed model has single machines in separate cities (this is the model shown on the teacher's blackboard when "distributed databases" is the day's topic). This is probably unrealistic; nodes on a distributed system could include several dissimilar LANs connected by long-haul lines and/or gateways. A LAN is a great place for a server. A distributed dbms could be built on top of this model, treating the whole LAN, via its server, as a single node in the distributed dbms.

-- W.H.Coffin. billc@blia.BLI.COM (ucbvax!{mtxinu|ucsfcgl}!blia!billc)
>> the usual disclaimer about my employer and my wretched opinions. <<
>> the usual witticisms that swell netnews to ridiculous proportions. <<
larry@ingres.Berkeley.EDU (Larry Rowe) (06/26/87)
In article <2877@blia.BLI.COM> billc@blia.BLI.COM (Bill Coffin) writes:
>I think there is a difference between a server (hardware or software) and
>a distributed dbms. Oracle and, I think, RTI are implementing
>distributed dbms's. Sybase is implementing a server.

[yet another minor plug (YAMP)] rti currently sells a distributed dbms that runs on vax's under vms. it probably runs on some of the other systems too, but i can't keep up with them. oracle announced a distributed dbms at a big news conference in new york with projected delivery dates in late 86/early 87. in early 87 they ``withdrew the product.'' of course the press covered the first announcement but neglected to mention the second. so, it doesn't seem to matter whether a system actually works as advertised... sigh!

on another point. rti did a joint study with one of their larger customers that built a database gateway to ibm/mvs databases. the architecture is:

             ingres/star (dist-dbms)
                /        |        \
               /         |         \
              /          |          \
             /           |           \
         ingres       ingres     ibm-gateway
                                      |
                                db2/ims/vsam

the user enters arbitrary sql queries to ingres/star, which optimizes and executes the query. joins across machine boundaries worked. interestingly, the ibm-gateway used an ibm data extract product (DXT) to get data out of the databases on mvs. it worked faster than most folks expected, so that joins across machine boundaries actually ran credibly. this system is not a product today, but i suspect rti and most relational system vendors will be delivering similar products over the next couple of years. distributed databases and gateways to other data stores (file systems or data managers) will be very useful tools when they are widely available. i agree with bill that it will be a couple of years before these configurations are widely available and as reliable as single-site relational systems are today. but, if your company/vendor isn't working on them today, you'll be significantly behind and struggling to catch up.

larry
erics@cognos.uucp (Eric Schurr) (07/02/87)
> This does, of course, have the problem of having to change the
>application(s) if you decide to move the data partitioning around. I
>like the Sybase approach to this - namely the transactions are stored
>in the database itself, rather than in the applications. While you still
>have to change the transactions if you change the schema, at least they
>are all in one place, and the applications themselves do not have
>to be rebuilt.

This statement intrigues me. I don't know anything about Sybase--do they allow you to model/define *transactions*? Is this simply referring to table (file/record) definitions or to the much broader--and more complicated--notion of a transaction? What mechanism do they use to define and report this?
--
Eric Schurr                  3755 Riverside Dr.
Cognos Incorporated          Ottawa, Ontario
(613) 738-1440               CANADA K1G 3N3
decvax!utzoo!dciem!nrcaer!cognos!erics
markh@rtech.UUCP (Mark Hanner) (07/06/87)
In article <1026@sirius.UUCP> erics@cognos.UUCP (Eric Schurr) writes:
>In article <131@hippo.UUCP> eric@hippo.UUCP (Eric Bergan) writes:
>> This does, of course, have the problem of having to change the
>>application(s) if you decide to move the data partitioning around. I
>>like the Sybase approach to this - namely the transactions are stored
>>in the database itself, rather than in the applications. While you still
>>have to change the transactions if you change the schema, at least they
>>are all in one place, and the applications themselves do not have
>>to be rebuilt.

you can go one step further and prevent having to alter your applications at all when redistributing data in a distributed database: have the query language use a distributed database catalog to define "aliases" for the data locations. in ingres/star, there is a catalog which contains the "alias" and the associated node/database/table name information required to locate the data. thus, the following statement in an application:

    select * from staff;

does not need to be changed just because the database administrator moved the data table:

    create temporary link oldstaff
        with node = corp, database = personnel, table = staff;
    create table staff as select * from oldstaff
        with node = newcorp, database = newpersonnel, table = staff;

this brings up the general problem of maintaining large systems with hundreds or thousands of applications in various stages of the product life cycle. the more difficult it is for the system administrators to tune their systems (including load balancing, where distributed database provides some unique opportunities), the slower those systems will run. applications ideally should be built without respect to performance, but should allow the system administrator to tune the performance of applications to match real system loads through techniques such as changing storage structures, adding indexes, moving data to different disks or nodes, etc.
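[The alias-catalog indirection described above can be sketched in a few lines of modern Python. This is a toy illustration of the idea only, not Ingres/Star's actual mechanism; every name below is invented.]

```python
# Applications name a table by alias ("staff"); a catalog maps that alias
# to its current node/database/table. Relocating the data means updating
# the catalog entry -- the application's query text never changes.

catalog = {
    "staff": {"node": "corp", "database": "personnel", "table": "staff"},
}

def resolve(alias):
    """Translate an application-level alias into a physical location."""
    loc = catalog[alias]
    return (loc["node"], loc["database"], loc["table"])

def move_table(alias, node, database, table):
    """The DBA relocates the data; applications are unaffected."""
    catalog[alias] = {"node": node, "database": database, "table": table}

# The application always asks for "staff"...
assert resolve("staff") == ("corp", "personnel", "staff")

# ...and keeps working after the DBA moves the table to another node.
move_table("staff", "newcorp", "newpersonnel", "staff")
assert resolve("staff") == ("newcorp", "newpersonnel", "staff")
```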
as networks get more complex and users demand more distributed data capability ("i know it's on the ibm mainframe, but i want to be able to use it with data on my department's vax"), giving this flexibility to the system administrator will be essential.

>This statement intrigues me. I don't know anything about Sybase--do
>they allow you to model/define *transactions*? Is this simply referring
>to table (file/record) definitions or to the much broader--and more
>complicated--notion of a transaction? What mechanism do they use to
>define and report this?

yes, sybase does store transactions in the database:

    define transaction getpayroll as
        select s.name, s.salary*t.hours
        from staff s, timecard t
        where s.name = t.name and s.type = "HOURLY";

and then use:

    exec sql getpayroll;

in an application to run the query. the next step is to allow the definition of assertions (referential integrity) and asynchronous actions ("whenever reactor_coolant < 100 then update valves (coolant = coolant + 3)"). making these features work will depend greatly upon how much application development support is provided to help manage the proliferation of application objects stored in databases. the world is just now beginning to agree on sql as a standard for interaction with relational databases. but CASE and other major advances in application development technology are begging for standards in data dictionaries and display environments, where there are multitudes of feuding factions. this discussion may belong in comp.case, but a database system is useless unless it's easy to build applications with it...

cheers,
mark
--
markh@rtech.UUCP        ucbvax!mtxinu!rtech!markh
"someone else was using my login to express the above opinions..."
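[The stored-transaction idea can be mimicked in miniature: keep named query definitions in a catalog and invoke them by name, so a schema change means editing one stored definition rather than rebuilding every application. A hypothetical Python sketch, not Sybase's actual interface; the tables and names are made up.]

```python
# A catalog of named "transactions" stored with the data, invoked by name.

stored_transactions = {}

def define_transaction(name, fn):
    """Like 'define transaction <name> as <query>': store the definition."""
    stored_transactions[name] = fn

def exec_transaction(name):
    """Like 'exec sql <name>;': run the stored definition by name."""
    return stored_transactions[name]()

# Toy relations standing in for the staff and timecard tables.
staff = [{"name": "ann", "salary": 10, "type": "HOURLY"},
         {"name": "bob", "salary": 20, "type": "SALARIED"}]
timecard = [{"name": "ann", "hours": 4}]

def getpayroll():
    # select s.name, s.salary*t.hours from staff s, timecard t
    # where s.name = t.name and s.type = "HOURLY"
    return [(s["name"], s["salary"] * t["hours"])
            for s in staff for t in timecard
            if s["name"] == t["name"] and s["type"] == "HOURLY"]

define_transaction("getpayroll", getpayroll)

print(exec_transaction("getpayroll"))  # [('ann', 40)]
```

If the schema changes, only the stored `getpayroll` definition is edited; every application that says `exec_transaction("getpayroll")` is untouched.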
bradbury@oracle.UUCP (Robert Bradbury) (07/16/87)
In article <2943@zen.berkeley.edu>, larry@ingres.Berkeley.EDU (Larry Rowe) writes:
> [yet another minor plug (YAMP)] rti currently sells a distributed dbms
> that runs on vax's under vms. it probably runs on some of the other
> systems too, but i can't keep up with them. oracle announced a distributed
> dbms at a big news conference in new york with projected delivery
> dates in late 86 early 87. in early 87 they ``withdrew the product.''
> of course the press covered the first announcement but neglected to mention
> the second. so, it doesn't seem to matter whether a system actually works
> as advertised... sigh!

Oracle version 5.1 does support distributed access to homogeneous and heterogeneous machines. The big stumbling block which caused the delay was building enough network interfaces to make the product really useful. The database stuff is relatively simple compared to the variety of network interfaces required:

    VMS:  DECNET, TCP/IP (Excelan, Wollongong), Async
    IBM:  3270, DECNET, VTAM, TCP/IP
    PC:   3270, DECNET, TCP/IP, Async
    UNIX: Async, TCP/IP

The VMS production release with DECNET support has been around for months. The UNIX production releases of 5.1 with TCP/IP support should be available next month for: 3B2, 3B5, 3B20, Sun, Apollo, Ultrix, Sequent, and Xenix. The PC and mainframe products with network support should see the light of day in the Sept-Oct time frame. A simple calculation of the number of machines on which Oracle runs and the number of network interfaces possible on those machines indicates that there are 10's (perhaps 100's) of different machine/network combinations for which code must be written. (Opinion: vendors are going to fritter away man-years interfacing to networks unless an IEEE/ANSI committee adopts a standard interface.)

> Comments about Ingres to DB2 interface.

> this system is not a product today, but i suspect rti and most relational
> system vendors will be delivering similar products over the next couple
> of years.
Oracle's interface to DB2 (SQL*CONNECT) is currently in alpha testing at a major customer site. It should be generally available before the end of the year.

> distributed databases and gateways to other data stores (file systems
> or data managers) will be very useful tools when they are widely available.
> i agree with bill that it will be a couple of years before these configs
> are widely available and as reliable as single-site relational systems are
> today. but, if your company/vendor isn't working on them today, you'll
> be significantly behind and struggling to catch up.

While the interface between an RDBMS and DB2 is fairly straightforward, interfaces to IMS and ISAM files are less so. We estimate a good interface to IMS (including 2-phase commit and transaction recovery) to be a 10+ man-year project. An interface to ISAM files is simpler (perhaps 6 man-months) but requires a lot of user "interfacing" due to the lack of a data dictionary. It isn't clear that RDBMS and hierarchical/flat-file interfaces will ever be useful for anything other than retrievals, due to the matching problems in transaction and locking models. As always, none of the above should be construed as a commitment by Oracle. The dates are, however, from the product managers and should be accurate.
--
Robert Bradbury
Oracle Corporation
(206) 364-1442                  hplabs!oracle!bradbury
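[The machine-times-network combinatorics behind the porting burden discussed above are easy to sketch numerically. The lists below are illustrative guesses assembled from the platforms named in this thread, not any vendor's actual product matrix.]

```python
# Every (machine, network interface) pair is a separate port: the work
# grows multiplicatively, which is why a standard interface would help.

machines = ["VMS", "IBM", "PC", "3B2", "Sun", "Apollo", "Ultrix",
            "Sequent", "Xenix"]
interfaces = ["DECNET", "TCP/IP", "Async", "3270", "VTAM"]

combinations = [(m, n) for m in machines for n in interfaces]
print(len(combinations))  # 45 -- already "10's" of combinations
```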
mcclure@tut.cis.ohio-state.edu (James Edward McClure) (09/14/88)
Could someone please tell me exactly what a data base machine is and how it differs from a DBMS (besides being hardware based)? Thanks for your help!!
andy@garfield (Andy Lowry) (09/15/88)
In article <21755@tut.cis.ohio-state.edu> mcclure@tut.cis.ohio-state.edu (James Edward McClure) writes:
>
> Could someone please tell me exactly what a data base machine is and
>how it differs from a DBMS (besides being hardware based)?
>
>Thanks for your help!!

A database machine is not necessarily hardware based (in the sense of containing hardware that was specially designed for database processing). The term has been used in many different ways, but generally I think they all share the feature that there is some piece of hardware that is dedicated to some or all of the task of database processing. That could be anything from a general-purpose computer that isn't used for anything but database processing (in which case there is probably a severely stripped-down, low-overhead operating system supporting the database software) to a highly engineered collection of special-purpose hardware like associative memories and hardware sorters. The system configuration could be such that the database machine sits as a self-contained unit serving as a back-end processor providing complete database services to one or more hosts, or it could simply mean that there are some smart peripherals, like disks with on-the-fly filters on their heads to make the disk behave associatively.

There have been over a hundred proposals since the late 1960's for database machines of varying scope and complexity, and many of these machines have been prototyped. A few have even been offered commercially. A book by Stanley Su, just published this year, gives the most comprehensive survey of the area that I have encountered. Here's the reference, a la Scribe:

    @book(su88a, key="Su", author="Stanley Y.W. Su",
          title="Database Computers: Principles, Architectures, and Techniques",
          publisher="McGraw-Hill", address="New York", year="1988")

If you'd like something a little less ambitious, I wrote a 30-page survey this past spring titled "Synchronization, Communication and I/O Factors in Database Machine Performance" that I would be glad to send you (or anybody else). It does not describe the machines it covers in great detail, but rather explores the problems mentioned in the title and the ways various designs have attempted to overcome them. The bibliography will also point you to many detailed papers and some other good surveys.

-Andy Lowry
dberg@cod.NOSC.MIL (David I. Berg) (09/15/88)
In article <21755@tut.cis.ohio-state.edu>, mcclure@tut.cis.ohio-state.edu (James Edward McClure) writes:
>
> Could someone please tell me exactly what a data base machine is....

A database machine (DBM) is a unit of hardware with a hard-coded DBMS in its firmware and some amount of high-speed disk space to store the data base. It is usually used as a back-end processor to offload the processing of data base queries and I/O to the data base from the remote computer(s). One or more remote computers can be connected to it directly or via a network. Queries are formulated remotely and sent to the DBM for processing; results are then directed back to the source of the query.
--
David I. Berg (dberg@nosc.mil)
GENISYS Information Systems, Inc., 4250 Pacific Hwy #118, San Diego, CA 92110
MILNET: dberg@nosc.mil
UUCP: {ihnp4 akgua decvax dcdwest ucbvax}!sdcsvax!noscvax!dberg
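[The host/DBM division of labor described in this post can be sketched as follows. The "wire" here is just a function call standing in for the channel or network link, and every name and the trivial query representation are invented for illustration.]

```python
# Host side formulates a query and ships it to the DBM; the DBM does the
# scanning/filtering near the data and returns only the qualifying rows.

DATABASE = {
    "parts": [{"id": 1, "color": "red"}, {"id": 2, "color": "blue"}],
}

def dbm_execute(query):
    """Runs on the database machine: unpack the request, scan, return rows."""
    table, predicate = query  # a trivial stand-in for a real query language
    return [row for row in DATABASE[table] if predicate(row)]

def host_query(table, predicate):
    """Runs on the host: send the query across the 'wire', receive results."""
    return dbm_execute((table, predicate))

rows = host_query("parts", lambda r: r["color"] == "red")
assert rows == [{"id": 1, "color": "red"}]
```

The point of the split is that the host never touches non-qualifying rows; only the result set crosses the link.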
mike@blipyramid.BLI.COM (Mike Ubell) (09/16/88)
By my definition, a database machine is a system that has been architected specifically for database management tasks. It may contain specialized hardware, or general hardware components in a system with special architectural features to support dbms tasks. The machine will include software to perform the dbms tasks. The two companies who have been selling database machines the longest, Teradata and Britton Lee, both use standard processors in their current offerings, with some specialized hardware in a total system architected for DBMS. Teradata has a special interconnection bus that connects many specialized 80x86 processor boards (up to 1k, I believe). The bus is the patented Y-bus, which actually has active components that can do data merging and, I believe, some concurrency control. Britton Lee provides a family of systems with two or more specialized Z8000 processor boards and an optional Data Base Accelerator, which is a custom-logic search engine. Our newest product uses a custom processor plus 68020-based I/O processors connected to a large shared memory. There have been many very specialized dbms processors proposed in the literature, and some built. (The foregoing is not intended as a sales pitch; sorry if it comes across as such.)
DMasterson@cup.portal.com (09/16/88)
> Could someone please tell me exactly what a data base machine is and
>how it differs from a DBMS (besides being hardware based)?

In general, there is little difference. The concept is becoming blurred. Typically, a database machine is a smart machine to which database requests can be passed (typically in an SQL/QUEL-like language); it will process the query and return the relevant rows from the database. This assumes, of course, a relational database machine, which need not be the case. There have been textual database machines -- even a file server might be considered a database machine! Originally, database systems that implemented a front-end/back-end mechanism on two different machines were considered database machines. Now, however, a lot of the database systems that grew up as software-only database systems are going that route (Ingres/Star, Informix-Turbo, Oracle(?), Sybase), so the distinction has definitely become blurred in the commercial sense.

David Masterson
DMasterson@cup.portal.com
sysop@stech.UUCP (Jan Harrington) (09/21/88)
in article <21755@tut.cis.ohio-state.edu>, mcclure@tut.cis.ohio-state.edu (James Edward McClure) says:
>
> Could someone please tell me exactly what a data base machine is and
> how it differs from a DBMS (besides being hardware based)?
>
> Thanks for your help!!

There are a number of definitions of a database machine, though the one most commonly used is a computer (usually a mini) set up as a slave to a host computer and dedicated to database processing. The database machine usually runs a standard DBMS. The database machine offloads much of the database processing from the host computer. Since database processing is often CPU-bound (a lot of work is required to translate a user's logical requests for data into physical storage locations), having a database machine can speed operations, since the host computer can be doing other things besides address translations. People seem to feel that a database machine increases throughput. Does that help?

Jan Harrington, sysop
Scholastech Telecommunications
UUCP: husc6!amcad!stech!sysop or allegra!stech!sysop
BITNET: JHARRY@BENTLEY

********************************************************************************
Miscellaneous profundity:
    "No matter where you go, there you are."
                              Buckaroo Banzai
********************************************************************************
roger@esquire.UUCP (Ro Reid) (09/22/88)
in article <21755@tut.cis.ohio-state.edu>, mcclure@tut.cis.ohio-state.edu (James Edward McClure) says:
>
> Could someone please tell me exactly what a data base machine is and
> how it differs from a DBMS (besides being hardware based)?
>
> Thanks for your help!!

Speed (fast as hell, especially on complex queries).
Cost (more than a software-based DBMS by quite a factor).
Number of things that can go wrong (2-3 times more than a software DBMS resident on the host).

Elaboration: At one time, the ONLY way to get serious speed without completely swamping your host was to have a separate box to offload the database work to. This box has its own hardware and software specifically designed to be fast at RDB-type things, instead of being general purpose. The drawbacks were always there: more hardware to fail, a network (or some sort of interface) is involved and can fail, and lack of portability: in a Unix shop, you can move your software to the newest, latest, fastest box, regardless of vendor. A hardware database machine is about as proprietary a beast as there is. Things have changed since we bought our first database machine 7 years ago. The 11/70 is no longer the workhorse of the Unix world; there are some fast, relatively cheap boxes out there, and they keep getting faster and faster. So now many applications can afford to use the more generalized hardware to take advantage of the speed gains as they are developed, even if you maintain the separation between the front end and the back end, and basically make yourself a database machine by loading a software-based DBMS on one box and using another box as host. My experience is that right now, there are some very good software-based DBMS's out there that can negate the need for a specialized database machine. This holds until you start doing things above a certain level of complexity. For example, we found a software DBMS that was faster than our database machine, until we hit it with 5-, 6-, 7-way joins.
And then the software system went to hell in a bitbucket, while the hardware system continued to be fast. We also found that software systems didn't do so well when the conditions for the retrieve got real complex and nasty. These are by no means final benchmark figures on all software systems, which is one reason I'm not naming names. But if you are not going to torture a database system the way we do, you probably should stick to software. It's when you find that there is no software out there that can do the job for you that you have to start considering database machines. We still have hopes that we can move to a software-based system, if we can find one that can get the work done in the available CPU cycles for us. If you want architectural-type details, you might contact Britton Lee; they might be glad to fill you in. I've had it explained to me before, and it's amazing the things they do to make that sucker hum!
--
Ro Reid
{rutgers|phri|cucard}!cmcl2!esquire!roger
uunet!esquire!roger
roger@woof.columbia.edu
"...to understand is always an ascending movement; this is why comprehension ought always to be concrete. (one is never got out of the cave, one comes out of it.)" -Simone Weil, First and Last Notebooks
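[The multi-way join blow-up described in the post above is easy to demonstrate: a naive nested-loop plan examines the full cross product of the joined tables, so each additional table multiplies the work. A toy Python sketch of that worst case, not any vendor's actual query plan; the tables and predicate are made up.]

```python
# Join N tables by scanning their full cross product (the worst-case plan
# a query optimizer tries to avoid). With 5 tables of 10 rows each, that
# is already 10^5 tuples examined; real tables make this catastrophic.

from itertools import product

def nested_loop_join(tables, predicate):
    """Return matching tuples plus a count of tuples examined."""
    examined = 0
    result = []
    for combo in product(*tables):
        examined += 1
        if predicate(combo):
            result.append(combo)
    return result, examined

tables = [list(range(10))] * 5        # five toy tables of 10 rows each
matches, examined = nested_loop_join(tables,
                                     lambda c: len(set(c)) == 1)  # all equal
assert examined == 10 ** 5            # a 5-way join examined 100,000 tuples
assert len(matches) == 10
```

Adding a sixth table of 10 rows multiplies the examined count by another 10, which is the shape of the cliff the post describes.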
kfw@ecrcvax.UUCP (Fai Wong) (10/09/89)
Hi, everybody. I am seeking information on commercial database machines. I've read about the Britton-Lee IDM, the Intel iDBP, and the Teradata DBC machines. The information I have describing these machines is out of date (ca. 1985). Could someone tell me if these companies still exist? If they do, how could I contact them? I would also appreciate any information on other existing commercial database machines. Many thanks in advance.

Cheers,
Kam-Fai.