tomr@ashtate (Tom Rombouts) (08/01/90)
Are relational databases an unproven technology regarding performance?

Just to keep this group lively, here are some unauthorized excerpts from an article titled "Rude Awakening" on page 23 of the July 30, 1990 Computerworld:

"A report by a British consultancy is throwing some cold water in the face of the primarily U.S. relational database management system industry. The report, titled 'Database, an Evaluation and Comparison,' attempts to sort out the often misleading claims of DBMS vendors - and to make some general statements about the usefulness of RDBMS technology itself.

< background material deleted >

....A key tenet of the report is that RDBMS technology has been available for 20 years but still has not been proved in large, complex applications. The report notes that users associate these products with poor system performance, even though they may be flexible and easier to implement."

The article then goes on to cite a firm that is reluctant to replace IMS with DB2, and discusses other sites that use a mixture of relational and possibly non-relational systems.

Please, let's not start a war here telling each other how wonderful relational technology is. Just wanted to bring this to the attention of this group. Maybe others out there are more familiar with the original British report. Maybe others have ideas on strategies to prove that "relational" does not have to mean "overhead." (Or does it? :-) )

I now stand back....

Tom Rombouts  Torrance Techie  tomr@ashtate.A-T.com  V:(213)538-7108

DISCLAIMER: The above posting is intended to be informational only and should not reflect my opinions or those of any known corporation.
tim@ohday.sybase.com (Tim Wood) (08/02/90)
In article <1073@ashton.UUCP> tomr@ashton.UUCP (Tom Rombouts) writes:
>Are relational databases an unproven technology regarding
>performance?
>
>....A key tenet of the report is that RDBMS technology has been
>available for 20 years but still has not been proved in large,
>complex applications. The report notes that users associate
>these products with poor system performance, even though they may
>be flexible and easier to implement. The article then goes on
>to cite a firm that is reluctant to replace IMS with DB2, and
>discusses other sites that use a mixture of relational and
>possibly non-relational systems.

There isn't really enough information in your excerpt to comment on the article. The question, "Relational, yes or no?" is becoming less and less germane. The question most important to most users is "Will widget W solve my problem, P?"

Relational systems have so far been deployed in smaller-scale applications than have hierarchical and network systems. This is due to several factors: relational is "newer" (that is, the technology existed long before successful commercial products) and the older database architectures were deployed in the days when nearly all commercial computing resources were centralized and operating in a batch-processing environment. In that environment, updates and access to the database are relatively rigidly controlled.

The appeal of relational systems has been the promise of flexible access to the database by users far removed from the DP department. The trend toward decentralized access has been strengthened by the growth of processing power directly available to individual users, and by the changing nature of applications themselves. Relational systems lend themselves well to distributed database, where by definition there will be fewer, if any, centralized points of transaction processing activity.
So, an individual site in the distributed relational database may look "slow" compared to IMS on an IBM-MVS 3090, but the aggregate throughput of the networked database can be prodigious. This is not to say that there is some theoretical limit to the performance of individual relational database engines. Indeed, a major focus of the industry now is to develop local transaction processing speeds that rival those of older architectures on a platform of similar scale. Today's technology is proving (already has, actually) that the assertion that relational is slow is out-of-date. What's more, relational technology is solving the problems of distributed applications better than the older architectures, at transaction speeds that are so far adequate for most applications. A recent Digital Review survey asked relational users about their throughput requirements. They found that about 90% of applications required no more than about 12TPS. This TPS number will surely increase, as will the ability of relational systems to carry more load. Organizational reluctance to replace existing non-relational systems is now very understandable. Replacements will occur as the economic benefits of the distributed high-performance relational model increasingly outweigh the costs of changing. Actual replacements will be preceded by gradual integration of RDBMS into the organization's DP framework. It is important for relational products to allow connection with existing heterogenous systems, rather than requiring their replacement. -TW --- Sybase, Inc. / 6475 Christie Ave. / Emeryville, CA / 94608 415-596-3500 tim@sybase.com {pacbell,pyramid,sun,{uunet,ucbvax}!mtxinu}!sybase!tim One day, when I can afford enough lawyers, I will speak for a whole company. For now, I speak just for myself.
swfc@ulysses.att.com (Shu-Wie F Chen) (08/04/90)
In article <10371@sybase.sybase.com>, tim@ohday.sybase.com (Tim Wood) writes: |>In article <1073@ashton.UUCP> tomr@ashton.UUCP (Tom Rombouts) writes: |>>Are relational databases an unproven technology regarding |>>performance? |>> |>>....A key tenet of the report is that RDBMS technology has been |>>available for 20 years but still has not been proved in large, |>>complex applications. The report notes that users associate |>>these products with poor system performance, even though they may |>>be flexible and easier to implement. The article then goes on |>>to cite a firm that is reluctant to replace IMS with DB2, and |>>discusses other sites that use a mixture of relational and |>>possibly non-relational systems. |> [some deleted stuff] |> |>Relational systems have so far been deployed in smaller-scale |>applications than have hierarchical and network systems. This is due |>to several factors: relational is "newer" (that is, the technology |>existed long before successful commercial products) and the older |>database architectures were deployed in the days when nearly all |>commercial computing resources were centralized and operating in a |>batch-processing environment. In that environment, updates and access |>to the database are relatively rigidly controlled. I don't see how these reasons (which are not incorrect) explain why relational systems have so far only been deployed in smaller-scale applications. |> |>The appeal of relational systems has been the promise of flexible |>access to the database by users far removed from the DP department. RDBMSs have made two contributions: 1. non-procedural access 2. data independence I don't see what relational systems have to do with "the promise of flexible access ... far removed from the DP department." Are you implying that network communication or client/server is restricted to relational systems? 
|>The trend toward decentralized access has been strengthened by the |>growth of processing power directly available to individual users, and |>by the changing nature of applications themselves. Hmmm. Last week I spoke with a Sybase tech support person who said that Sybase's client/server architecture was geared toward having most of the computation performed at the server end. My response was "How about all that CPU power directly available to the user?" It seems that Sybase feels that database computation should not be done at the client end(I read this as personal workstation) because it would take away CPU cycles for editing, reading news, etc. They believe they can overcome the CPU bottleneck at the server end. This seems to contradict the above statement by Tim (who works for Sybase). [sorry for this digression, but Tim's position (which I agree with) seems to differ from that of his company's] |>Relational systems lend themselves well to distributed database, where |>by definition there will be fewer, if any, centralized points of ^^^^^^^^^^^^^ Huh? What definition? I think relational systems lend themselves well to distributed databases because they are set-oriented, rather than navigational systems like the hierarchical and network models. You can think in terms of sets of tuples coming from each site instead of thinking on the level of individual records. |>transaction processing activity. So, an individual site in the |>distributed relational database may look "slow" compared to IMS on an |>IBM-MVS 3090, but the aggregate throughput of the networked database |>can be prodigious. Is this an argument for throughput over response time? From the user's point of view, it is much easier to gauge response time. |> |>This is not to say that there is some theoretical limit to the |>performance of individual relational database engines. 
Indeed, a major |>focus of the industry now is to develop local transaction processing |>speeds that rival those of older architectures on a platform of similar |>scale. Today's technology is proving (already has, actually) that the |>assertion that relational is slow is out-of-date. What's more, I think that that assertion was proven incorrect about 10-15 years ago. |>relational technology is solving the problems of distributed |>applications better than the older architectures, at transaction speeds |>that are so far adequate for most applications. A recent Digital |>Review survey asked relational users about their throughput |>requirements. They found that about 90% of applications required no |>more than about 12TPS. This TPS number will surely increase, as will |>the ability of relational systems to carry more load. The figure 12TPS by itself is meaningless. How many users, what architecture, etc. should accompany any figures. Sybase claims 34 TPS for 30(?) users on a Sun-4. What do other vendors claim? |> |>Organizational reluctance to replace existing non-relational systems |>is now very understandable. Replacements will occur as the economic RDBMSs have their benefits. Non-RDBMSs have their benefits. Though it is true that RDBMSs are not as slow as anti-RDBMSers (of the great debate at SIGMOD in the 70's) claimed them to be, they still do not match the performance of navigational systems like IMS. One of the reasons that many corporations have not moved from IMS to relational systems is for this exact reason. 12TPS may be acceptable to relational users, but it surely isn't for IMS users. |>benefits of the distributed high-performance relational model increasingly ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Where? Commercial RDBMS vendors claim high-performance. What they are really claiming is that their RDBMS performs faster than the competition. 1000TPS is high-performance. 12TPS (or 34) is acceptable. |>outweigh the costs of changing. 
Actual replacements will be preceded
|>by gradual integration of RDBMS into the organization's DP framework.
|>It is important for relational products to allow connection with
|>existing heterogenous systems, rather than requiring their replacement.

As I stated earlier, the two major contributions of the relational model have been non-procedural access and data independence. However, the implementation that provides these features incurs overhead that navigational systems (like hierarchical and network) do not have to pay for. For instance, joins are a real big performance killer for relational systems. So there is some substance behind the users associating relational "... products with poor system performance, even though they may be flexible and easier to implement." [from the original posting on the British report]

As Tom [the original poster] suggested, let's not start a war telling each other how wonderful relational technology is. But to answer Tom's question on whether "relational" has to mean "overhead": Relational does not mean overhead, but since it provides more "features" (flexible, easier to implement, easier to use(?)), some overhead *must* be incurred.

I think a good discussion would be over where the overheads are. For starters, relational query compilation has to be smarter. But it may not (never?) be smart enough!?!

Flames to /dev/null
Discussion to comp.databases

*swfc
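
To put a rough number on the join-overhead point, here is a minimal sketch comparing a naive nested-loop join with a hash join on made-up data. The table contents and the function names `nested_loop_join` and `hash_join` are invented for illustration; nothing here is claimed to reflect how IMS, DB2, Sybase, or any other product in this thread actually executes a join.

```python
def nested_loop_join(left, right):
    """Join on the first field by comparing every left row with every right row."""
    comparisons = 0
    out = []
    for l in left:
        for r in right:
            comparisons += 1
            if l[0] == r[0]:
                out.append(l + r[1:])
    return out, comparisons


def hash_join(left, right):
    """Join on the first field by building a hash table on the right input."""
    index = {}
    for r in right:
        index.setdefault(r[0], []).append(r)
    probes = 0
    out = []
    for l in left:
        probes += 1  # one hash lookup per left row
        for r in index.get(l[0], []):
            out.append(l + r[1:])
    return out, probes


# Hypothetical data: 200 customers, 1000 orders spread evenly across them.
customers = [(i, f"cust{i}") for i in range(200)]
orders = [(i % 200, f"order{i}") for i in range(1000)]

rows_nl, cost_nl = nested_loop_join(customers, orders)
rows_h, cost_h = hash_join(customers, orders)
print(len(rows_nl), cost_nl)
print(len(rows_h), cost_h)
```

Both functions return the same joined rows; only the amount of work differs. Which strategy a real engine picks, and what it costs in I/O rather than comparisons, depends on the product and the physical design.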
tim@ohday.sybase.com (Tim Wood) (08/06/90)
In article <13532@ulysses.att.com> swfc@ulysses.att.com (Shu-Wie F Chen) writes:
>In article <10371@sybase.sybase.com>, tim@ohday.sybase.com (Tim Wood) writes:
>|>
>|>Relational systems have so far been deployed in smaller-scale
>|>applications than have hierarchical and network systems.
>
>I don't see ... why
>relational systems have so far only been deployed in smaller-scale
>applications.

What I'm driving at is that relational has not developed in a DP/MIS context, and DP/MIS is where most of the large-scale business applications have traditionally resided. Relational is the architecture of choice for the "bottom-up" development of organizational databases, where local DP departments are creating relational databases to manage their local operations, and looking for ways to tie all those local databases together.

>|>The appeal of relational systems has been the promise of flexible
>|>access to the database by users far removed from the DP department.
>
>RDBMSs have made two contributions:
>1. non-procedural access
>2. data independence

True, but most users running canned applications won't be as aware of these features as applications programmers, who benefit most from them. I was really only discussing the end-users, since they are the largest group of database utilizers in an organization. Your comment is correct and rounds out my point.

>I don't see what relational systems have to do with "the promise of
>flexible access ... far removed from the DP department." Are you
>implying that network communication or client/server is restricted to
>relational systems?

No, but relational seems to be the context in which client/server is most rapidly being deployed. I do think it's easier to distribute a relational db than a navigational (thanks, good adjective) one, because of the looser coupling among data objects.
>|>The trend toward decentralized access has been strengthened by the >|>growth of processing power directly available to individual users, and >|>by the changing nature of applications themselves. > >Hmmm. Last week I spoke with a Sybase tech support person who said that >Sybase's client/server architecture was geared toward having most of the >computation performed at the server end. My response was "How about all >that CPU power directly available to the user?" Those MIPS are used for the applications. That local power makes it economical to perform complicated analysis and transformations on the data. Basically, the server preserves and disseminates existing knowledge, but new knowledge is created on the front-end. The front-end then submits that new knowledge to the server, which may reject it because the knowledge does not fit the world model known to the server (in slogan-speak, this is "DBMS enforced integrity"). Or the server accepts it, and the whole organization becomes "smarter." This is still a relatively new concept feature of products, an improvement over the case where each application has to apply the model. >It seems that Sybase >feels that database computation should not be done at the client end... >They believe they can overcome the CPU bottleneck at the server end... At this point, "database computation" is too vague a term to allow a response. >|>Relational systems lend themselves well to distributed database, where >|>by definition there will be fewer, if any, centralized [servers] > >Huh? What definition? If a database is distributed, then the database state is maintained by more than one server. The limiting case is where every machine on the network is of similar size and maintains an equal part of the database. A more likely scenario is a server hierarchy, such as in telephone exchanges. 
>I think relational systems lend themselves well to distributed databases >because they are set-oriented, rather than navigational systems like the >hierarchical and network models. That's what I was driving at. Thanks for the words beyond "so many words". >|>... the aggregate throughput of the networked database can be prodigious. > >Is this an argument for throughput over response time? From the user's >point of view, it is much easier to gauge response time. Distributed balances both. It's analogous to caching, or virtual memory in that you have a small, frequently-used subset of the database that is local, so average response times are close to what they would be if the entire database was local on a behemoth machine. Yet the whole database might be so large that it would take a buildingful of 3090's (or clones :-) to hold it all locally. I am speaking in generality here, much design and measurement must go into one's distributed db schema so that average response time is good and worst case not awful. Probably beyond the state of the practice today. Maybe one reason why distributed is slow catching on. >|>... Today's technology is proving (already has, actually) that the >|>assertion that relational is slow is out-of-date. What's more, > >I think that that assertion was proven incorrect about 10-15 years ago. It was proven that RDBMS COULD be as fast as existing navigational systems, but there haven't been competitive products till recently. "Proof" for many folks requires no less than a released (or announced :-) product. >|>... A recent Digital >|>Review survey asked relational users about their throughput >|>requirements. They found that about 90% of applications required no >|>more than about 12TPS. > >The figure 12TPS by itself is meaningless. How many users, what >architecture, etc. should accompany any figures. Sybase claims 34 TPS >for 30(?) users on a Sun-4. What do other vendors claim? Twelve TPS as measured at the server. 
So as you pile on users, response time will tank (i.e. go up). I think the survey intended an implicit clause, "with acceptable user response time."

>.... One of the
>reasons that many corporations have not moved from IMS to relational
>systems is [unacceptable performance]. 12TPS may be acceptable to relational
>users, but it surely isn't for IMS users.

'Cuss not. For large DP hardware, you'd better be talking well into the 100's. To handle volume, an RDBMS product has to scale with hardware.

>... 1000TPS is high-performance. 12TPS (or 34) is acceptable.

Show me someone getting 1000TPS on a Sun 3/280. What must not exist in a product is a performance ceiling above which throughput stops growing (linearly) with increase in platform scale. That's the essence of the perceived "relational bug."

>... [J]oins are a real big performance killer for relational systems.

Not if they are pre-optimized or pre-computed.

>So there is some substance behind the users
>associating relational "... products with poor system performance, even
>though they may be flexible and easier to implement."[from the original
>posting on the British report]

Sure, the substance is based on historical knowledge. That knowledge is being obsoleted by the onset of RDBMSs that scale well. I believe users will be able to have both DP-scale performance and ease of use in RDBMS in the near future.

>But to answer Tom's question on whether "relational" has to mean "overhead":
>Relational does not mean overhead, but since it provides more "features"
>(flexible, easier to implement, easier to use(?)), some overhead *must*
>be incurred.

The question is where to place that overhead. That's one problem we (Sybase anyway) are trying to solve.

>I think a good discussion would be over where the overheads are. For
>starters, relational query compilation has to be smarter.
Hmm, I've been developing the opinion that query compilation is a largely solved problem (cost-based optimizers, etc.), but that fundamental things like I/O management and access methods policies need a lot more work in RDBMS. So sounds like we have a good discussion ahead of us :-) . -TW --- Sybase, Inc. / 6475 Christie Ave. / Emeryville, CA / 94608 415-596-3500 tim@sybase.com {pacbell,pyramid,sun,{uunet,ucbvax}!mtxinu}!sybase!tim One day, when I can afford enough lawyers, I will speak for a whole company. For now, I speak just for myself.
dafuller@sequent.UUCP (David Fuller) (08/07/90)
In article <10371@sybase.sybase.com> tim@ohday.sybase.com (Tim Wood) writes: >In article <1073@ashton.UUCP> tomr@ashton.UUCP (Tom Rombouts) writes: >>Are relational databases an unproven technology regarding >>performance? >> >>....A key tenet of the report is that RDBMS technology has been >>available for 20 years but still has not been proved in large, >>complex applications. The report notes that users associate >>these products with poor system performance, even though they may >>be flexible and easier to implement. The article then goes on >>to cite a firm that is reluctant to replace IMS with DB2, and >>discusses other sites that use a mixture of relational and >>possibly non-relational systems. Some random thoughts from someone who's in the trenches and deals with less than, uhh, theoretical arguments... In my experience with Very Large Databases, the DBMS type is less important than the quality of the individual system's implementation. The 10% of the time you spend developing is quickly subsumed by the requirement to plan for and provide a stable applications environment. To wit: The typical SQL-based RDBMS is abstract enough from what's going on down deep to permit gross errors in implementation. I've looked at systems which fetched 100,000 records and threw away every one except the single tuple of interest. The fact that it was an RDBMS was irrelevant. You coulda been doing IMS or FOCUS and made that mistake. Axiom 1: There is no substitute for planning. > >Relational systems have so far been deployed in smaller-scale >applications than have hierarchical and network systems. This is due >to several factors: relational is "newer" (that is, the technology >existed long before successful commercial products) and the older >database architectures were deployed in the days when nearly all >commercial computing resources were centralized and operating in a >batch-processing environment. 
In that environment, updates and access
>to the database are relatively rigidly controlled.

Sure, relational is "new", but the basic access methods have not considerably improved; we still use B-trees and relative files and maybe hashed files. The "relational" aspect is a layer above this. I can write slow code in any environment, and there is nothing inherent to the relational model which makes it slower than any other model.

The fact is that noble 3NF implementations almost always get mutated by harsh reality: you end up generating "extract" tables and other de-facto optimizations once you do a simple calculation of how many I/Os it's gonna take to support your subsecond, online application. That's reality; you can either spend money for hardware or take a hardnosed approach to implementation.

Second, the biggest horror to big DBMS DBAs is the unknown called "ad-hoc queries". It is easy to hurt a production system on many platforms by issuing queries from hell that can't ever complete but require massive sequential scans. Big DBMS engines usually have strict controls on adhocery and either prioritize such queries low or require that they complete in batch. In fact, lots of big systems do overnight extracts and provide an online system to promote decision support. Rarely do these systems permit queries against "live" data, simply because supporting the surge load caused by ad hoc queries in current implementations costs too much money.

Axiom #2: Ad-hoc means unpredictable, which represents a basic incongruity against the goal of production. No current DBMS or implementation knows how to balance the two in a truly large implementation automatically. (I have not seen the DBMS yet that sends me mail and counsels "Dave, I've been reviewing access patterns and I really think you should consider a clustered index...")

...

In conclusion:

1) There's no free lunch. Until we find a more expressive mechanism for revealing the intent of the user to the DBMS, we're going to live with controls over what a particular user can do. We need to be able to control plowing of new furrows thru a DBMS carefully versus handling heads-down data entry with predictable speed.

2) Experience at Tandem shows that a true SQL RDBMS doesn't have to be slower; in fact the State of California has committed to NonStop SQL for their entire vehicle database based on some strenuous benchmarks.

3) We are a long way from creating DBMS systems into which data can be poured and then relied on to balance access and update needs. No matter what your DBMS, it will take intelligence and forethought to create a successful implementation.

Speaking for myself, as always...

--
Dave Fuller
Sequent Computer Systems      Think of this as the hyper-signature.
(312) 318-0050 (humans)       It means all things to all people.
{uunet,sun,...}!sequent!dafuller
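
The "fetched 100,000 records and threw away every one except the single tuple of interest" mistake is easy to reproduce. The following is a minimal sketch using Python's standard `sqlite3` module purely as a stand-in engine; the `vehicle` table, its columns, and the two function names are hypothetical and chosen only to echo the examples in the post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vehicle (id INTEGER PRIMARY KEY, plate TEXT)")
conn.executemany(
    "INSERT INTO vehicle (id, plate) VALUES (?, ?)",
    [(i, f"PLATE{i:06d}") for i in range(100_000)],
)


def find_wasteful(conn, wanted_id):
    """Pull every row to the application and discard all but one."""
    fetched = 0
    hit = None
    for row in conn.execute("SELECT id, plate FROM vehicle"):
        fetched += 1
        if row[0] == wanted_id:
            hit = row
    return hit, fetched


def find_selective(conn, wanted_id):
    """Let the DBMS apply the predicate so only the matching row comes back."""
    rows = conn.execute(
        "SELECT id, plate FROM vehicle WHERE id = ?", (wanted_id,)
    ).fetchall()
    return (rows[0] if rows else None), len(rows)


print(find_wasteful(conn, 4242))
print(find_selective(conn, 4242))
```

Both calls return the same tuple, but the first moves 100,000 rows across the application boundary to get it. As the post says, the same mistake can be made against any data model; the relational layer neither causes nor prevents it.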
swfc@ulysses.att.com (Shu-Wie F Chen) (08/07/90)
In article <10419@sybase.sybase.com>, tim@ohday.sybase.com (Tim Wood) writes:
|>In article <13532@ulysses.att.com> swfc@ulysses.att.com (Shu-Wie F Chen) writes:
|>>In article <10371@sybase.sybase.com>, tim@ohday.sybase.com (Tim Wood) writes:
|>>|>
|>>|>Relational systems have so far been deployed in smaller-scale
|>>|>applications than have hierarchical and network systems.
|>>
|>>I don't see ... why
|>>relational systems have so far only been deployed in smaller-scale
|>>applications.
|>
|>What I'm driving at is that relational has not developed in a DP/MIS
|>context, and DP/MIS is where most of the large-scale business applications
|>have traditionally resided. Relational is the architecture of choice for
|>the "bottom-up" development of organizational databases, where local DP
|>departments are creating relational databases to manage their local
|>operations, and looking for ways to tie all those local databases
|>together.
|>

Yes, relational databases are easier to implement (from a DBA's point of view).

|>>|>The appeal of relational systems has been the promise of flexible
|>>|>access to the database by users far removed from the DP department.
|>>
|>>RDBMSs have made two contributions:
|>>1. non-procedural access
|>>2. data independence
|>
|>True, but most users running canned applications won't be as aware of
|>these features as applications programmers, who benefit most from them.
|>I was really only discussing the end-users, since they are the largest
|>group of database utilizers in an organization. Your comment is
|>correct and rounds out my point.

If you are talking about end users and canned applications, the model used isn't that important. If you talk about the programmers who implement the canned application, then it is a different story. Frankly, I am now confused. Your previous arguments made sense for application programmers, but now you say you were really talking about end users.
|>>|>Relational systems lend themselves well to distributed database, where
|>>|>by definition there will be fewer, if any, centralized [servers]
|>>
|>>Huh? What definition?
|>
|>If a database is distributed, then the database state is maintained by
|>more than one server. The limiting case is where every machine on
|>the network is of similar size and maintains an equal part of the database.
|>A more likely scenario is a server hierarchy, such as in telephone exchanges.
|>

My question arose because you gave no reason why you thought relational systems were better for distributed databases. I then gave my reason below.

|>>I think relational systems lend themselves well to distributed databases
|>>because they are set-oriented, rather than navigational systems like the
|>>hierarchical and network models.
|>
|>That's what I was driving at. Thanks for the words beyond "so many words".
|>
|>>|>... Today's technology is proving (already has, actually) that the
|>>|>assertion that relational is slow is out-of-date. What's more,
|>>
|>>I think that that assertion was proven incorrect about 10-15 years ago.
|>
|>It was proven that RDBMS COULD be as fast as existing navigational systems,
|>but there haven't been competitive products till recently. "Proof"
|>for many folks requires no less than a released (or announced :-) product.
|>

Well, have there been any released competitive products? (By competitive, I don't mean better than other *relational* DBMSs, but better than any *other* DBMSs.)

|>
|>>... 1000TPS is high-performance. 12TPS (or 34) is acceptable.
|>
|>Show me someone getting 1000TPS on a Sun 3/280. What must not exist in a
|>product is a performance ceiling above which throughput stops growing
|>(linearly) with increase in platform scale. That's the essence of
|>the perceived "relational bug."

This was really a cheap shot on my part. I was referring to Kai Li's main-memory database system at Princeton which I believe achieved 1000 TPS.
No, it was not on a Sun 3/280... |> |>>... [J]oins are a real big performance killer for relational systems. |> |>Not if they are pre-optimized or pre-computed. What do you mean by pre-optimized or pre-computed? What if I performed a join that was not pre-optimized or pre-computed? |> |>>So there is some substance behind the users |>>associating relational "... products with poor system performance, even |>>though they may be flexible and easier to implement."[from the original |>>posting on the British report] |> |>Sure, the substance is based on historical knowledge. That knowledge |>is being obsoleted by the onset of RDBMSs that scale well. I believe |>users will be able to have both DP-scale performance and ease of use in RDBMS |>in the near future. |> How many RDBMSs scale well (besides Sybase, of course ;-)? Better yet, how many RDBMSs scale? |>>But to answer Tom's question on whether "relational" has to mean "overhead": |>>Relational does not mean overhead, but since it provides more "features" |>>(flexible, easier to implement, easier to use(?)), some overhead *must* |>>be incurred. |> |>The question is where to place that overhead. That's one problem we |>(Sybase anyway) are trying to solve. |> |>>I think a good discussion would be over where the overheads are. For |>>starters, relational query compilation has to be smarter. |> |>Hmm, I've been developing the opinion that query compilation is a largely |>solved problem (cost-based optimizers, etc.), but that fundamental things |>like I/O management and access methods policies need a lot more work |>in RDBMS. So sounds like we have a good discussion ahead of us :-) . I/O management and access methods policies are orthogonal to the data model. These issues are just as important in navigational models. 
The reason I suggested query compilation as a point of study is that in navigational systems, the application programmer has to know the physical layout of the database files in order to write code that can navigate them. The programmer has to know about the clustering, the indices, what pointers to chase, etc. (Please correct me if I am wrong about this. I have never had the opportunity to program on a navigational system.) Therefore, a *good* application programmer would know the best way to access the database for a given query and could write optimal code.

On the other hand, in the relational model, application programmers are encouraged not to know the underlying physical layout of the database. They are dependent on the query compiler to map their logical view and operations to physical operations. I don't believe current compilers have reached the expertise of hand-crafted coders in performing this mapping.

It is certainly easier to talk about relational things since a declarative language is used instead of a procedural one. But the penalty of a declarative language is that it must be translated to a procedural one. Though RDBMSs (and in particular, Sybase) can use precompiled queries to improve performance, this does not solve the problem of ad-hoc queries.

BTW, is there such a thing as an ad-hoc query in navigational systems?

Cheers,
*swfc
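
The contrast drawn above between hand-navigated and declarative access can be sketched in a few lines. This is only an illustration under stated assumptions: the "navigational" half uses plain Python dictionaries and lists as a stand-in for parent/child record pointers (it is not IMS or any CODASYL interface), the relational half uses the standard `sqlite3` module, and all table, key, and function names are hypothetical.

```python
import sqlite3

STAFF = [("ann", "eng", 50), ("bob", "eng", 40), ("cy", "ops", 30)]

# Navigational style: the program knows the physical layout (each department
# record holds direct references to its employee records) and chases them.
departments = {
    "eng": {"name": "Engineering", "employees": []},
    "ops": {"name": "Operations", "employees": []},
}
for emp_name, dept_key, salary in STAFF:
    departments[dept_key]["employees"].append({"name": emp_name, "salary": salary})


def navigational_payroll(dept_key):
    """The programmer chooses the access path: parent record, then children."""
    total = 0
    for emp in departments[dept_key]["employees"]:
        total += emp["salary"]
    return total


# Declarative style: the program states what it wants; the query compiler
# decides how to map that onto the physical storage.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (dept_key TEXT PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE emp (name TEXT, dept_key TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO dept VALUES (?, ?)",
    [("eng", "Engineering"), ("ops", "Operations")],
)
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", STAFF)


def declarative_payroll(dept_key):
    """Same question, but no access path appears in the application code."""
    row = conn.execute(
        "SELECT SUM(e.salary) FROM emp e "
        "JOIN dept d ON d.dept_key = e.dept_key WHERE d.dept_key = ?",
        (dept_key,),
    ).fetchone()
    return row[0]


print(navigational_payroll("eng"), declarative_payroll("eng"))
```

The navigational function breaks if the layout changes (say, employees are no longer stored under their department), while the declarative one keeps working as long as the logical schema holds. That is the data-independence benefit, and the translation step hidden inside `conn.execute` is where the overhead under discussion lives.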
normb@sequent.UUCP (Norm Browne) (08/08/90)
In article <13545@ulysses.att.com> swfc@ulysses.att.com (Shu-Wie F Chen) writes: > > ... Though RDBMSs (and in particular, Sybase) can use >precompiled queries to improve performance, this does not solve the >problem of ad-hoc queries. Nothing solves the "problem" of ad-hoc queries (save of course more horsepower). I have never seen a single architecture that could possibly serve two divergent needs (such as transaction processing and decision support). The common and IMO appropriate methodology for handling these is to keep them separate, one system to handle TP and another (periodically refreshed) to provide DSS. >BTW, is there such a thing as an ad-hoc query in navigational systems? Focus (from Information Builders) has provided this capability in the mainframe world (ugh) for years. The report-writer/query language is non-procedural and can access such various data structures as VSAM, IMS, DB2 and SQL/DS, Adabas, Total, IDMS and just about anything else that runs on a 370. The end user is almost completely insulated from the underlying structure. There are other products that provide some of this type of functionality (Mark IV, Dyl280). ..NB
tim@ohday.sybase.com (Tim Wood) (08/11/90)
In article <13545@ulysses.att.com> swfc@ulysses.att.com (Shu-Wie F Chen) writes: >In article <10419@sybase.sybase.com>, tim@ohday.sybase.com (Tim Wood) writes: >|>In article <13532@ulysses.att.com> swfc@ulysses.att.com (Shu-Wie F >Chen) writes: >|>>In article <10371@sybase.sybase.com>, tim@ohday.sybase.com (Tim Wood) > >If you are talking about end users and canned applications, the model >used isn't that important. If you talk about the programmers who >implement the canned application, then it is a different story. >Frankly, I am now confused. Your previous arguments made sense for >application programmers, but now you say you were really talking about >end users. Making it very simple: The relational model eases application development. This tends to encourage application development. So the app. programmer and the users benefit: the users get more needs met because the app. programmer's job is easier because the relational model makes app. writing easier. >Well, have there [b]een any released competitive products (by ... >competitive, I don't mean better than other *relational* DBMSs, but >better than any *other* DBMSs)? Sure. Relational has made the metrics of non-procedural access and data independence part of the "competitiveness" equation. RDBMS's are competitive because people are buying them. They are now competing on another crucial metric, performance, so making navigational systems even less attractive. >|>>... [J]oins are a real big performance killer for relational systems. >|>Not if they are pre-optimized or pre-computed. >What do you mean by pre-optimized or pre-computed? What if I performed >a join that was not pre-optimized or pre-computed? Pre-optimized: the query processing strategy is determined and that strategy is saved in the DBMS in a pre-compiled form. The DBMS needs merely to execute the strategy to obtain the results. 
Pre-computed: the RESULTS (not just the strategy for obtaining them) are saved somewhere (in or out of the DBMS) saving the need to recompute them. A long ad-hoc join will tend to dent transaction processing performance (see Marc Zwieger@Sequent's thread on this). So maybe you run ad-hoc users at lower priority, or defer the volume updates for overnight (but allow your DBMS's view of the world to diverge several hours from reality), etc. >How many RDBMSs scale well (besides Sybase, of course ;-)? Better yet, >how many RDBMSs scale? I'm not motivated enough to do the research to answer this question. If I was a prospect or a consultant (or in Marketing :-), I would be. My point is, the important DBMSs will be ones that can operate efficiently at various scales, from departmental 80386(tm) to nerve-center mainframe. >|>Hmm, I've been developing the opinion that query compilation is a largely >|>solved problem (cost-based optimizers, etc.), but that fundamental things >|>like I/O management and access methods policies need a lot more work >|>in RDBMS. So sounds like we have a good discussion ahead of us :-) . >I/O management and access methods policies are orthogonal to the data >model. These issues are just as important in navigational models. Of course, but these issues haven't been as well addressed in relational models, and they are the ones (in most cases) that are hindering the ability of relational systems to scale well. The fact is, decreasing numbers of people are interested anymore in making navigational systems faster. >The reason I suggested query compilation as a point of study is that in >navigational systems, [ good summary of the application programming >issues deleted ]. [But in relational, programmers ] >are dependent on the query compiler to map their logical view and >operations to physical operations. I don't believe current compilers >have reached the expertise of [human] coders in performing this mapping. Probably not. 
The important ($) question for most users is, do today's optimizers do a GOOD ENOUGH job on ENOUGH queries, without doing a HORRIBLE job on nearly ANY query. The more decision support (ie ad-hoc queries) an organization does, the more important the optimizer will be. However, a decent optimizer has become a check-off item for any RDBMS today, as it should be: that is one of the things that allows them to perform at all on those easily-written (but sometimes hard to answer) ad-hoc queries. -Tim --- Sybase, Inc. / 6475 Christie Ave. / Emeryville, CA / 94608 415-596-3500 tim@sybase.com {pacbell,pyramid,sun,{uunet,ucbvax}!mtxinu}!sybase!tim One day, when I can afford enough lawyers, I will speak for a whole company. For now, I speak just for myself.
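The two terms defined in this post, pre-optimized and pre-computed, can be sketched as below. This is a toy under stated assumptions, not a description of Sybase or any other DBMS: the function names, the "plan" strings, and the sample rows are hypothetical, and the only "optimization" is choosing which input to build a hash table on.

```python
def choose_plan(left, right):
    """Decide the strategy once: build the hash table on the smaller input."""
    return "hash_left" if len(left) <= len(right) else "hash_right"


def execute_plan(plan, left, right):
    """Run a saved strategy: a hash join on the first field of each row.
    Output rows are (key, left_value, right_value), sorted."""
    build, probe = (left, right) if plan == "hash_left" else (right, left)
    table = {}
    for key, value in build:
        table.setdefault(key, []).append(value)
    out = []
    for key, value in probe:
        for match in table.get(key, []):
            pair = (match, value) if plan == "hash_left" else (value, match)
            out.append((key,) + pair)
    return sorted(out)


# Hypothetical sample data.
customers = [(1, "Ann"), (2, "Bob")]
orders = [(1, "widget"), (2, "gadget"), (1, "gizmo")]

# Pre-optimized: the strategy is stored; the join itself still runs each time.
saved_plan = choose_plan(customers, orders)
fresh_rows = execute_plan(saved_plan, customers, orders)

# Pre-computed: the results are stored; nothing runs on the next request,
# but the stored copy goes stale once the underlying data changes.
saved_rows = list(fresh_rows)
```

In this sketch a saved plan skips only `choose_plan`; the cost of `execute_plan` remains. Only the saved results avoid the join work, at the price of possibly diverging from the current data.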
ghm@ccadfa.adfa.oz.au (Geoff Miller) (08/13/90)
tim@ohday.sybase.com (Tim Wood) writes: >Making it very simple: >The relational model eases application development. This tends to >encourage application development. So the app. programmer and >the users benefit: the users get more needs met because >the app. programmer's job is easier because the relational model makes >app. writing easier. I would agree with Tim, and particularly with his choice of words - "relational model" rather than "RDBMS". One can (and we have) successfully implement databases designed using the relational model without using an RDBMS, and we still obtain the advantages which Tim points out. I have been concerned for some years now that the marketers of so-called "relational" products have persuaded a gullible user community into thinking that a relational model can only be implemented using an RDBMS, which simply is not so! Geoff Miller (ghm@cc.adfa.oz.au) Computer Centre, Australian Defence Force Academy
swfc@ulysses.att.com (Shu-Wie F Chen) (08/13/90)
In article <10494@sybase.sybase.com>, tim@ohday.sybase.com (Tim Wood) writes: |>In article <13545@ulysses.att.com> swfc@ulysses.att.com (Shu-Wie F Chen) writes: |>>In article <10419@sybase.sybase.com>, tim@ohday.sybase.com (Tim Wood) writes: |>>|>In article <13532@ulysses.att.com> swfc@ulysses.att.com (Shu-Wie F |>>Chen) writes: |>>|>>In article <10371@sybase.sybase.com>, tim@ohday.sybase.com (Tim Wood) |>> |>>If you are talking about end users and canned applications, the model |>>used isn't that important. If you talk about the programmers who |>>implement the canned application, then it is a different story. |>>Frankly, I am now confused. Your previous arguments made sense for |>>application programmers, but now you say you were really talking about |>>end users. |> |>Making it very simple: |>The relational model eases application development. This tends to |>encourage application development. So the app. programmer and |>the users benefit: the users get more needs met because |>the app. programmer's job is easier because the relational model makes |>app. writing easier. |> I agree... |>>Well, have there [b]een any released competitive products (by ... |>>competitive, I don't mean better than other *relational* DBMSs, but |>>better than any *other* DBMSs)? |> |>Sure. Relational has made the metrics of non-procedural access and |>data independence part of the "competitiveness" equation. RDBMS's |>are competitive because people are buying them. They are now |>competing on another crucial metric, performance, so making navigational |>systems even less attractive. |> Okay. Though I was expecting the names of the released products with high performance, I agree with your statement. |>>|>>... [J]oins are a real big performance killer for relational systems. |>>|>Not if they are pre-optimized or pre-computed. |>>What do you mean by pre-optimized or pre-computed? What if I performed |>>a join that was not pre-optimized or pre-computed? 
|> |>Pre-optimized: the query processing strategy is determined and that |>strategy is saved in the DBMS in a pre-compiled form. The DBMS needs |>merely to execute the strategy to obtain the results. Pre-computed: |>the RESULTS (not just the strategy for obtaining them) are saved |>somewhere (in or out of the DBMS) saving the need to recompute them. A Yes, pre-optimized and pre-computed queries will definitely be wins. But with respect to pre-optimized queries as a solution to improving joins performance, I had thought that the overhead came from the *execution* of the strategy, not the determination of the strategy. This really is a minor quibbling point on my part, the argument that a compiled program runs faster than an interpreted one is true regardless of whether the program contains joins. [some deleted stuff about ad-hoc queries, scalability of RDBMSs] |> |>>|>Hmm, I've been developing the opinion that query compilation is a largely |>>|>solved problem (cost-based optimizers, etc.), but that fundamental things |>>|>like I/O management and access methods policies need a lot more work |>>|>in RDBMS. So sounds like we have a good discussion ahead of us :-) . |>>I/O management and access methods policies are orthogonal to the data |>>model. These issues are just as important in navigational models. |> |>Of course, but these issues haven't been as well addressed in relational |>models, and they are the ones (in most cases) that are hindering |>the ability of relational systems to scale well. The fact is, |>decreasing numbers of people are interested anymore in making navigational |>systems faster. |> I've been using navigational to mean hierarchical and network DBMSs. It is true that there is decreasing interest in making these systems faster. However, there is increasing interest in object-oriented systems which are inherently navigational because of the class-composition hierarchy. (This does not imply that there can not be non-navigational components.) 
This discussion is leading to my last paragraph further down... |>>The reason I suggested query compilation as a point of study is that in |>>navigational systems, [ good summary of the application programming |>>issues deleted ]. [But in relational, programmers ] |>>are dependent on the query compiler to map their logical view and |>>operations to physical operations. I don't believe current compilers |>>have reached the expertise of [human] coders in performing this mapping. |> |>Probably not. The important ($) question for most users is, do today's |>optimizers do a GOOD ENOUGH job on ENOUGH queries, without doing |>a HORRIBLE job on nearly ANY query. The more decision support (ie ad-hoc |>queries) an organization does, the more important the optimizer will be. |>However, a decent optimizer has become a check-off item for any RDBMS today, |>as it should be: that is one of the things that allows them to perform |>at all on those easily-written (but sometimes hard to answer) ad-hoc queries. |>-Tim |>--- As far as I can tell, we are agreeing that RDBMS is a proven technology that has matured quite nicely into commercial products. It is not perfect for all applications (e.g. CAD/CAM, software programming environments, gigantic IMS databases). There remains work to be done on improving performance. The new question is: what comes next? For starters, I'll cite two references: 1. Third-Generation Data Base System Manifesto by the Committee for Advanced DBMS Function (Stonebraker, Rowe, Lindsay, Gray, Carey, Brodie, Bernstein, Beech). 2. The Object-Oriented Database System Manifesto by Atkinson, Bancilhon, DeWitt, Dittrich, Maier, and Zdonik. Any comments? *swfc
swfc@ulysses.att.com (Shu-Wie F Chen) (08/13/90)
In article <1809@ccadfa.adfa.oz.au>, ghm@ccadfa.adfa.oz.au (Geoff Miller) writes: |>tim@ohday.sybase.com (Tim Wood) writes: |> |>>Making it very simple: |>>The relational model eases application development. This tends to |>>encourage application development. So the app. programmer and |>>the users benefit: the users get more needs met because |>>the app. programmer's job is easier because the relational model makes |>>app. writing easier. |> |>I would agree with Tim, and particularly with his choice of words - |>"relational model" rather than "RDBMS". One can (and we have) successfully |>implement databases designed using the relational model without using an |>RDBMS, and we still obtain the advantages which Tim points out. I have |>been concerned for some years now that the marketers of so-called |>"relational" products have persuaded a gullible user community into thinking |>that a relational model can only be implemented using an RDBMS, which |>simply is not so! |> Won Kim recently defined an object-oriented database to be a database that implements the object-oriented model (which he kind of defined ;-). Following this logic, isn't an RDBMS a database that implements the relational model? *swfc