kfw@ecrcvax.UUCP (Fai Wong) (10/10/89)
Hi, I am seeking information on commercial database machines. I've read about the Britton-Lee IDM, the Intel iDBP and the Teradata DBC. The information I have describing these machines is out of date (ca. 1985). Could anyone tell me whether these machines are still being made? If they are, how and where could I obtain more information about them? I would also appreciate information on any other commercial database machines. Thanks a billion in advance. Cheers, Kam-Fai. (please send e-mail to kfw@ecrcvax.UUCP)
johnl@esegue.segue.boston.ma.us (John R. Levine) (10/10/89)
In article <785@ecrcvax.UUCP> kfw@ecrcvax.UUCP (Kam-Fai Wong) writes:
> I am seeking for information on commercial database machines. I've
> read about the Britton-Lee IDM, the Intel iDBP and the Teradata
> DBC. ...

I was pretty much the only OEM user of the iDBP. I wrote a compiler for its command language (a tokenized relational algebra with a lot of other stuff) which either ran standalone or embedded in C, and a few demo applications, a text searcher and a QBE subset. The logical design was great, but the implementation by a bunch of iRMX jocks left a lot to be desired. They never finished it and withdrew it before it shipped.

The design was basically relational, except that you could store pointers either to precompute joins or to implement oldthink architectures, and you could also have pointers to regions of unstructured data for text or images. I liked the design better than the IDM because it let you manipulate your data in flexible ways; it was easy, for example, to get each record in file A followed by the records in file B that are joined to it, without having to generate a joined file and retrieve the A record for every B record, stuff like that.

--
John R. Levine, Segue Software, POB 349, Cambridge MA 02238, +1 617 864 9650
johnl@esegue.segue.boston.ma.us, {ima|lotus|spdcc}!esegue!johnl
Massachusetts has over 100,000 unlicensed drivers.  -The Globe
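[The grouped retrieval Levine describes can be sketched in modern terms. This is a Python illustration of the access pattern only, not iDBP code; the table names and fields are invented for the example. A flat join repeats the A record once per matching B record, while the pointer-based retrieval yields each A record once, followed by its joined B records.]

```python
# Sketch (not iDBP code): each record in "file A" followed by the
# "file B" records joined to it, without materializing a joined file.
from collections import defaultdict

depts = [{"dno": 1, "name": "eng"}, {"dno": 2, "name": "sales"}]   # "file A"
emps  = [{"dno": 1, "emp": "ann"}, {"dno": 1, "emp": "bob"},
         {"dno": 2, "emp": "eve"}]                                  # "file B"

# Precompute the join pointers once (the iDBP let you store such pointers).
by_dno = defaultdict(list)
for e in emps:
    by_dno[e["dno"]].append(e)

def grouped_retrieval():
    """Yield each A record once, then the B records joined to it."""
    for d in depts:
        yield ("A", d)
        for e in by_dno[d["dno"]]:
            yield ("B", e)

for tag, rec in grouped_retrieval():
    print(tag, rec)
```

The contrast with a flat join is that the A record's fields are transferred once per group rather than once per B record.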
lamaster@ames.arc.nasa.gov (Hugh LaMaster) (02/28/90)
In reviewing my (old) files on DataBase machines (e.g. ShareBase == Britton Lee), it appeared that the main "problem" a DBM is supposed to address is the limitation of main memory bandwidth. Specifically, if processing can be done "in the channel" rather than on the main processor(s), then only data which will be used later will be written to memory. Also, some level of parallelization can take place, and cycle-consuming operations that can be done in parallel can be offloaded. Presumably, this allows cheaper machines with limited memory bandwidth to avoid wasting that bandwidth on reading in data which will be filtered out at the first pass, or on "trivial but expensive" operations such as compression. Presumably the result is cheaper parallelism.

Now, this is an issue which reappears in many forms: specialized network processors, specialized graphics processors, and specialized database processors. The question with any such processor is whether it provides a big enough performance boost to keep the product ahead of general purpose machines, which are usually on a much faster design cycle.

I have several questions:

1) Just what kind of performance, by various appropriate measures, does the current crop of DBMs provide vs. standard architecture machines? (BTW - I notice that there are now 2 measures of standard TPS units - any enlightenment and/or correspondence between the two appreciated.) Does anyone know what kind of performance IBM gets out of ACP and RDBMSs, say, on its biggest iron? How about smaller machines? Sun was quoting pretty high numbers, by one of the TPS measures, on the 4/490, for a bus-based workstation server. How about other, more complex operations?

2) Has anyone considered an extended filesystem approach for Unix, wherein specialized database operations are supported in the filesystem, and a specialized bus-based (e.g. VME, FutureBus) processor is attached to the system?
This would appear to be more flexible and allow many companies to access the functionality, the same way that they do with new disk controllers, etc. What primitive operations should be supported: powerful enough to reduce traffic to memory by a large fraction, general enough to use with a wide variety of database systems? What I envision is a definition for a dbfs, a database filesystem, which would support various RDBMSs such as Oracle and Sybase via multiple dbfs's. The parallelism would have to be across controller boards/filesystems, with processors in the controllers for the operations. You might support better keyed access, compression, lock support (what kind?), disk optimizations (the usual), and any other useful parallelizable operations. In configuration terms, you might have 4 controller boards on your VME (say) system, and get some fraction of parallel performance across the boards.

Comments?

Hugh LaMaster, m/s 233-9,  UUCP ames!lamaster
NASA Ames Research Center  ARPA lamaster@ames.arc.nasa.gov
Moffett Field, CA 94035    Phone: (415)604-6117
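[The "in the channel" filtering LaMaster proposes for a dbfs can be sketched as follows. This is a hypothetical illustration in Python, not a real dbfs API: the point is that the controller board evaluates a selection predicate at the disk side, so only qualifying records cross the bus and consume host memory bandwidth.]

```python
# Hypothetical sketch of predicate filtering on a dbfs controller board.
def disk_blocks():
    """Stand-in for raw blocks streaming off a disk: lists of records."""
    yield [{"id": 1, "balance": 10}, {"id": 2, "balance": 5000}]
    yield [{"id": 3, "balance": 7500}, {"id": 4, "balance": 20}]

def controller_scan(blocks, predicate):
    """Filter records before the bus transfer, as a controller CPU might."""
    sent = kept = 0
    out = []
    for block in blocks:
        for rec in block:
            sent += 1
            if predicate(rec):       # evaluated on the controller
                kept += 1
                out.append(rec)      # only these cross into host memory
    return out, sent, kept

rows, scanned, transferred = controller_scan(disk_blocks(),
                                             lambda r: r["balance"] > 1000)
print(f"scanned {scanned} records, transferred {transferred}")
```

With several such boards, each scanning its own spindles, the host sees only the union of the filtered streams, which is where the "some fraction of parallel performance across the boards" would come from.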
jkrueger@dgis.dtic.dla.mil (Jon) (03/01/90)
lamaster@ames.arc.nasa.gov (Hugh LaMaster) writes:
>[about database machines as an example of the tradeoffs
>involved in designing specialized machines of various sorts]
>The question with any such processor is whether it provides a big
>enough performance boost to keep the product ahead of general purpose
>machines, which are usually on a much faster design cycle.

It seems (I have no numbers) that database machines provided a big enough boost during roughly 1983 to 1988. After that, those KMs and shortened design cycles narrowed the performance gap and reversed the price/performance gap. But a lot of database machines, then and now, justify their purchase not by providing better performance for the database user but rather by giving the general purpose machine back to everybody else. Database applications can be bad citizens in general timesharing environments. People like to offload them to a separate machine. Of course, since users connect to their databases from front ends running on other machines, the catch is how many problems partition themselves poorly between front and back end, thus bottlenecking on network throughput and sometimes latency. No amount of performance in the workstation or database machine helps in this case, and it is common.

>I have several questions:
>1) Just what kind of performance, by various appropriate measures, do the
>   current crop of DBMs provide (BTW - I notice that there are now
>   2 measures of standard TPS units - any enlightenment and/or
>   correspondence between the two appreciated) vs standard architecture
>   machines.

Warning: TPS measurements are susceptible to manipulations that make string inlining on Dhrystone look honest to a fault. And TPS's are almost never measured or reported with anything but intent to deceive.
And even when done honestly, just as compiler optimizations achievable on small synthetic benchmarks can be poor predictors of performance for real applications, vendor X's TPS's on TP1 or DebitCredit can lead you down a garden path. A simple example: DebitCredit requires you to scale table size by measured TPS's; you have to titrate it until your measured TPS's are obtained against a table of appropriate size. Reported TPS's seldom include enough information to determine whether this was done. This is just a simple example; it gets much, much worse. There are many, many variables that can lead to misleading results, and misleadingly high results are seldom an accident.

Even when done honestly, TPS are a highly derived unit. They're even further from characterizing overall DBMS performance than "MIPS" are from characterizing overall computational performance. TPS results aren't comparable across software, hardware, or communications pathways between front and back ends. E.g. you can use results to measure improvements due to software (e.g. single versus multiple server processes) by comparing on the same hardware. But you can't compare results from database machines with results from different hardware and software; the data simply isn't meaningful. And when the data is collected, all it tells you is how well you did on short, simple queries. This is useful to people who want to attach their point-of-sale system to a general purpose database engine, but of very limited use to the rest of us. For instance, TPS numbers don't contribute a thing to predicting performance of CAD/CAM databases.

>2) Has anyone considered an extended filesystem approach for Unix,
>   wherein specialized DataBase operations are supported in
>   the filesystem

See:

The Use of Technological Advances to Enhance Database System Performance, P. Hawthorn and M. Stonebraker, Proceedings of the ACM-SIGMOD International Conference on Management of Data, Boston, Massachusetts, June 1979.

Performance Enhancements to a Relational Database System, M. Stonebraker, J. Woodfill, J. Ranstrom, M. Murphy, M. Meyer, and E. Allman, ACM Transactions on Database Systems, vol. 8, no. 2, June 1983.

Both reprinted in: The INGRES Papers, M. Stonebraker, ed., Addison-Wesley, 1986, chapters 5 and 6.

>   and, a specialized bus-based (e.g. VME, FutureBus)
>   processor is attached to the system. This would appear to be more
>   flexible and allow many companies to access the functionality, the
>   same way that they do with new disk controllers, etc.

This idea is new as far as I know.

> What primitive operations should be supported: powerful enough to reduce
> traffic to memory by a large fraction, general enough to use with a wide
> variety of database systems? What I envision is a definition for a
> dbfs, database filesystem, which would support various RDBMSs such as
> Oracle and Sybase via multiple dbfs's. The parallelism would have
> to be across controller boards/filesystems, with processors in the
> controllers for the operations. You might support better keyed access,
> compression, lock support (what kind?), disk optimizations (the usual),
> and any other useful parallelizable operations. In configuration terms,
> you might have 4 controller boards on your VME (say) system, and get
> some fraction of parallel performance across the boards.

Analysis of existing database systems shows that simple, short queries have distinctly different performance characteristics from complex queries. (TPS (sometimes) predicts performance for applications in which the former predominate.) Things that help are fast commits, fast acquisition and relinquishing of locks, deferred writes, and buffering of various sorts. Some of this can be helped by specialized hardware, but currently the bottlenecks tend to be software.
Also, the software can go a long way toward reducing raw hardware requirements. For instance, a log file can record committed transactions, from which the main table is silently (safely, transparently) updated. This decouples latency from throughput. You can then buy a high-latency device to store your tables and a high-throughput device to store the log file. The economies achievable by not requiring both in a single device are said to be substantial (again, no numbers, sorry).

Things that help complex queries are, well, complex. Hardware assists are more likely to help here, but in very close collaboration with matching software. For example, keeping as many parallel paths busy as possible is intimately connected with query decomposition. To saturate your special hardware resources, you have to know which parts of the query can execute in parallel, and how much faster the total query will (most likely) complete if a given amount of resource is invested in a given part of the query at a given stage, not to mention the costs and latencies of fetching from databases distributed over multiple machines.

Thus my guess is that general purpose machines will continue to have an edge for at least several years. Special hardware of various sorts will help some, but until the software can make intelligent use of it in real time, its potential will be mostly unrealized.

-- Jon
--
Jonathan Krueger    jkrueger@dtic.dla.mil   uunet!dgis!jkrueger
The Philip Morris Companies, Inc: without question the strongest and best
argument for an anti-flag-waving amendment.
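[The commit-log technique Jon mentions, a log of committed transactions from which the main table is silently updated, can be sketched in a few lines. This is a minimal Python illustration under my own assumptions, not any particular product's design: commits are sequential appends to a fast log device, the main table on a slower device is updated later from the log, and reads consult the unapplied log tail so the deferral is transparent.]

```python
# Sketch: decoupling commit latency (log appends) from table throughput.
class LoggedTable:
    def __init__(self):
        self.log = []       # fast append-only device: bounds commit latency
        self.table = {}     # slower random-access device: holds the table
        self.applied = 0    # count of log records already folded in

    def commit(self, key, value):
        """Fast path: one sequential append makes the transaction durable."""
        self.log.append((key, value))

    def apply_log(self):
        """Background path: silently roll committed updates into the table."""
        for key, value in self.log[self.applied:]:
            self.table[key] = value
        self.applied = len(self.log)

    def read(self, key):
        """Reads check the unapplied log tail first, newest entry wins."""
        for k, v in reversed(self.log[self.applied:]):
            if k == key:
                return v
        return self.table.get(key)
```

This is why the two devices can be bought separately: nothing on the commit path waits for the table device, and nothing on the apply path is latency-critical.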