tim@linus.sybase.com (Tim Wood) (09/02/88)
In article <6178@dasys1.UUCP> mtxinu!uunet!dasys1!alexis writes:
>
>There has been some discussion recently in comp.sys.mac concerning the relative
>merits of two different methods of storing database files: the monolithic
>(all-in-one-file) way and the distributed (files-all-over-the-place) method.
>...
>
>The benefits of monolithic structure are few.

It's dangerous to lead with your conclusion.  Your position is not
100% false or true.  Monolithic vs. separate-file performance is very
dependent on the underlying software platform (i.e. the OS) and its
implementation.  I object to the use of "distributed" also; as used
with DBMSs, this term refers to a database that is physically
fragmented across more than one machine but allows the user
transparent access to it, i.e. access not requiring knowledge of, or
even showing, the physical dispersion.

I should mention I am from the POV of a transaction-processing
relational DBMS vendor.  Many users, much traffic, much multi-user
contention.  The issues are different from the PC/Mac DBMS ones.

>With the Monolithic structure, it is possible (or guaranteed, depending on
>your particular choice of DBMS) that you will corrupt some other portion
>of your database.

Again, you are assuming your conclusion.  Properly designed databases
decouple their indexes sufficiently that a corrupt one may be deleted
and a new one created.  Now, an error can happen in a data structure
that an index and its table (or database) share; that can lead to
serious corruption.  But if such sharing is designed into a multi-file
system, then the same vulnerability exists there.  Also, the OS could
crash, leaving several files inconsistent.

>Good luck
>finding out what got damaged- it may take you weeks to find it,

Unless your system is properly instrumented, with automatic
checkpointing, and properly managed, with regular backups-

>by which time all your backups may have the damage as well.
>In rare cases you could trash your entire database....
>The danger, however, never goes away entirely. This reason alone should
>convince most people.

Our customers don't seem to be convinced of the danger.  Of course,
most software is not bug-free, and that's where customer support comes
in.  The idea is to make the catastrophe scenario very unlikely, with
good software, fault-tolerance features, regular checkpoints and
backups.  Reducing the likelihood of catastrophe makes the
optimization gained from a single, specially-managed file worth it.

>
>The other overriding reason to use a distributed structure is performance. If
>your DBMS has to go through its own file-management code as well as the OS's,
>it will always be slower than if it only needed to go through the OS.

Unless you don't go thru the OS code.  Know what a "raw disk" (in
UNIX) or a "foreign mounted" disk (in VMS) is?  They are
physical-block interfaces to disks and disk partitions.  They allow
I/O to/from the user's own buffer rather than the disk cache's, and
immediate write-thru instead of being subject to the OS's caching
policies.  Why use OS services for a DBMS when they weren't written
for a DBMS?

I think you are also exaggerating the trade-off.  Why are 2 directory
lookups (to reach table T in database D) faster than two file reads
(1 for the data dictionary, to get the table location, and the other
for the table page)?  Also, with an OS approach, to get decent caching
you'd use up file descriptors very quickly keeping previously-used
tables open.  Not so if you have your own table handles within the
DBMS, especially if they are a configurable resource.

>...With a distributed structure, the middle step doesn't
>exist. This time savings is not immense if the DBMS is very well-written, but
>the vast majority are not (at least when it comes to speed optimization).

Herewith, I toot my own horn: Sybase is very well-written (having
written about 8-10% myself and reviewed about 90%).  The system was
designed with data integrity and performance as the primary goals.
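Tim's point about table handles can be sketched in a few lines.  This
is a modern Python illustration, not Sybase's actual design: a small,
configurable pool of handles with least-recently-used reuse, so hot
tables stay open without tying up one OS file descriptor per table
forever.  The `HandlePool` name and its policy are invented for the
example.

```python
# Sketch of DBMS-internal "table handles": a bounded pool with LRU
# eviction, standing in for a cache of open-file state.  Illustrative
# only; no real product's structures are shown here.
from collections import OrderedDict

class HandlePool:
    def __init__(self, capacity, opener):
        self.capacity = capacity      # a configurable resource, per Tim
        self.opener = opener          # stand-in for an OS open() call
        self.handles = OrderedDict()  # table name -> handle, LRU order
        self.opens = 0                # how often we had to hit the "OS"

    def get(self, name):
        if name in self.handles:
            self.handles.move_to_end(name)    # mark most recently used
            return self.handles[name]
        if len(self.handles) >= self.capacity:
            self.handles.popitem(last=False)  # evict least recently used
        self.opens += 1
        self.handles[name] = self.opener(name)
        return self.handles[name]

pool = HandlePool(capacity=2, opener=lambda name: ("handle", name))
for t in ["emp", "dept", "emp", "emp", "dept"]:
    pool.get(t)
assert pool.opens == 2   # five table accesses, only two real opens
```

The point of the sketch is only that repeated access to a hot table
costs no OS call at all once its handle is cached.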
>
>There is a much bigger performance gain for distributed structures in a
>multiple-machine or multiple-hard-disk environment. ...
> [ discussion of multiple-disk interleaving, RAM-disks,
> arm-waving around the problem of multi-user access management:
>...Of course there are
>important logistics to consider, such as how to lock out access to data which
>is temporarily invalid because it is being updated privately by that node, but
>often this is not a problem at all. Even when it is, it's better than not
>being able to do the job at all. ]

Unless you don't mind inconsistent transactions and corrupt data.

As for the disk strategies, they're fine.  I should mention that a
mono organization should not prohibit you from using other disk
devices (or files).  If you have two disks, you want to lay things out
so that they carry about the same load at any given time.  A good mono
organization allows this interleaving without forcing dependency on OS
mechanisms for implementing it.

>
>There is one other very important reason to use a distributed structure that
>comes to mind. Any monolithic structure will impose arbitrary restrictions on
>the number of data files or fields (tables and columns) allowed in the
>database. Sometimes these restrictions are very severe. Furthermore, if you
>have a very large database, it may not fit on one physical disk, and with the
>monolithic structure you are limited (generally) to one device. With the
>distributed structure, these limitations just go away.

Continued simplistic tone.  OS's are software.  DBMS's are software.
Why can't a DBMS be written with the same liberal (or practically
nonexistent) limits that the magical OS managing the separate-file
structure has?

>
>....For example, one of the big DBMSs
>that runs under UNIX (RTI Ingres? Oracle?) bypasses the file system to write
>directly to the disk (let's skip the technical details...).
>While this is like
>the monolithic system in some ways, it still allows for some of the benefits of
>the distributed structure. UNIX wizards may have a lot to say about this...

That's us (at least).  We went so far as to modify the good ol' OS
(Sun UNIX) to support async I/O to raw disks, so that our (UNIX)
process can compute while the hardware deals with our last
request(s).  This feature is native to VMS, and we use it there too.

Really the issue here is OS independence, not how many files you use.
Our experience is that we know best how to deploy system resources to
run our software, not the writers of the host operating system, fine
as that OS may be for other purposes.
-TW

Disclaimer: I am responsible for these opinions.

{ihnp4!pacbell,pyramid,sun,{uunet,ucbvax}!mtxinu}!sybase!tim
..not an @ in the bunch...
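The raw-disk discipline described above, where the DBMS uses its own
buffer and gets immediate write-through rather than living under the
OS's caching policy, can be approximated today with `O_SYNC` on a
POSIX system.  A hedged Python sketch: a regular temp file stands in
for a raw partition, and the page size is made up.  The async
submission Tim mentions (native on VMS, patched into SunOS) is beyond
this short example.

```python
# Sketch: write-through I/O from a program-managed buffer, in the
# spirit of raw-disk access.  O_SYNC makes each write() return only
# after the data reaches the device, not when it lands in the OS
# cache.  A temp file stands in for the raw partition.
import os
import tempfile

PAGE = 2048  # hypothetical DBMS page size

tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.close()
fd = os.open(tmp.name, os.O_RDWR | os.O_SYNC)

page = b"payroll-row-17".ljust(PAGE, b"\0")  # DBMS-managed buffer
os.pwrite(fd, page, 3 * PAGE)    # place page 3 of our "partition"
readback = os.pread(fd, PAGE, 3 * PAGE)
assert readback == page          # data is on disk, not just cached

os.close(fd)
os.unlink(tmp.name)
```

`pwrite`/`pread` at explicit byte offsets mirror how a DBMS addresses
physical blocks itself instead of going through a file abstraction's
seek pointer.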
jkrueger@daitc.daitc.mil (Jonathan Krueger) (09/04/88)
In article <861@sybase.sybase.com> tim@linus.sybase.com (Tim Wood) writes:
>Herewith, I toot my own horn: Sybase is very well-written (having written
>about 8-10% myself and reviewed about 90%). The system was designed
>with data integrity and performance as the primary goals.

And have you met those goals?  How do you measure your success?

-- Jon
--
Jonathan Krueger    uunet!daitc!jkrueger    jkrueger@daitc.arpa
(703) 998-4777      Inspected by: No. 15
alexis@dasys1.UUCP (Alexis Rosen) (09/07/88)
Recently, Tim Wood wrote a response to my article on database file
structures that was, I feel, much more on-target than the other
response by Jon Krueger.  He missed the same thing as Jon, though-
that I was really addressing smaller machines.  At any rate, I
specifically excluded DBMSs which did raw I/O from my analysis.

BTW, the raw I/O stuff looks like distributed structure to me.  In
fact, the real breakdown, I guess, is between products that layer two
levels of file access on top of each other vs. the ones that only
have one layer (i.e. they either use the native file system for each
of their logical files, or do raw I/O and use their own file system).

In article <861@sybase.sybase.com>, Tim Wood (tim@linus.sybase.com) writes:
>In article <6178@dasys1.UUCP> mtxinu!uunet!dasys1!alexis writes:
>>There has been some discussion recently in comp.sys.mac concerning the
>>relative merits of two different methods of storing database files: the
>>monolithic (all-in-one-file) way and the distributed
>>(files-all-over-the-place) method. [etc.]
>>The benefits of monolithic structure are few.
>
>It's dangerous to lead with your conclusion. Your position is not
>100% false or true. Mono vs. separate file performance is very
>dependent on the underlying software platform (i.e. OS) and
>its implementation.

Very true.  But note my previous comments- for the general PC market,
which is (right now) MS-DOS and MacOS, my original conclusions are
true...

>I object to the use of "distributed" also;
>this term as used with DBMS's refers to a database that is
>physically fragmented across > 1 machine, but allows the
>user transparent access to it, i.e. access not requiring knowledge of
>or even showing the physical dispersion.

Also true.  I was aware of this use of the term, which is why I was
careful to describe exactly what I meant.  Still, if you've got a
clearer term, I'd be glad to adopt it.  I'm not sure that just
'separate' fits the bill...
>I should mention I am from the POV of a transaction-processing
>relational DBMS vendor. Many users, much traffic, much multi-user
>contention. Issues are different from the PC/Mac DBMS ones.

I wish they weren't...  I'd give an arm and a leg for a good fast
transaction-processing DBMS for the Mac.  FoxBase has some promise,
but it ain't there yet, not by a long shot.  So when will Sybase run
under the Mac's OS, Tim?

> [Makes a good point about invulnerability of large-system DBMSs which has,
> unfortunately for me, little to do with PCs]
>
>>Good luck
>>finding out what got damaged- it may take you weeks to find it,
>
>Unless your system is properly instrumented, with automatic checkpointing,
>and properly managed with regular backups-

Again, I just wish this kind of thing were available on smaller
systems.  I have a Mac II with 5 MB of RAM and 300 MB of disk.  This
is considerably more horsepower than a VAX 11/750 which I used to
use.  Why aren't such tools available?  The same could be said for
the 25MHz '386 systems out there.

>>The other overriding reason to use a distributed structure is performance. If
>>your DBMS has to go through its own file-management code as well as the OS's,
>>it will always be slower than if it only needed to go through the OS.
>
>Unless you don't go thru the OS code. Know what a "raw disk" (in UNIX)
>or a "foreign mounted" disk (in VMS) is?
> [followed by fairly good discussion of why raw I/O is Good.]

I specifically mentioned that raw I/O was excepted from my analysis.

>>There is a much bigger performance gain for distributed structures in a
>>multiple-machine or multiple-hard-disk environment. ...
>> [ discussion of multiple-disk interleaving, RAM-disks,
>> arm-waving around the problem of multi-user access management:
>>...Of course there are
>>important logistics to consider, such as how to lock out access to data which
>>is temporarily invalid because it is being updated privately by that node,
>>but often this is not a problem at all.
>>Even when it is, it's better than not
>>being able to do the job at all. ]
>
>Unless you don't mind inconsistent transactions and corrupt data.

Since Jon misunderstood this remark also, I guess I was very unclear
here.  What I was saying was that in many cases, with canned
applications, you have knowledge that certain tables will never be
written to except by one user (except on weekends, on Feb. 29, and
during a solar eclipse.  Whatever...)  In this case, you need not
establish a lock and can just do whatever you like without worries.
This is what I meant by it being 'not a problem'.

>As for the disk strategies, they're fine. I should mention that a
>mono organization should not prohibit you from using other disk
>devices (or files). [etc.]

Well, it's easy enough to do disk striping, but I'm not convinced
that the DBMS is always going to be smarter than I am.  What about
prioritization?  I want certain users of table A, who don't need
indices I, J, K, and L, to have very fast response time.  Other users
of A, who need I, J, K, and L, can go hang because they aren't upper
management.  :-)  In this case I want to put A on one spindle and
everything else on another.  I am not saying that it's impossible to
write a DBMS which can handle things like this.  I am saying that
that would involve a fair amount of AI, and I haven't seen anything
like it yet.

>>There is one other very important reason to use a distributed structure that
>>comes to mind. Any monolithic structure will impose arbitrary restrictions on
>>the number of data files or fields [etc.]
>
>Continued simplistic tone. OS's are software. DBMS's are software.
>Why can't a DBMS be written with the same liberal (or practically nonexistent)
>limits that the magical OS managing the separate-file structure has?

Very true.  I got carried away bitching about the current state of
the art in PCs.  This is, of course, not necessarily the case with
all systems.  My mistake.
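The hazard behind Tim's warning is worth making concrete.  The
no-lock bet above is safe only while the single-writer assumption
holds; when it fails, the classic failure is a lost update.  A minimal
Python sketch with an invented account table:

```python
# Two "sessions" interleave read-modify-write on the same value with
# no lock.  One update is silently lost -- exactly the inconsistency
# a locking DBMS exists to prevent.  Data and schedule are invented.
balance = {"acct": 100}

# Both sessions read before either writes:
a_read = balance["acct"]        # session A sees 100
b_read = balance["acct"]        # session B also sees 100

balance["acct"] = a_read + 50   # A deposits 50 -> 150
balance["acct"] = b_read - 30   # B withdraws 30, clobbering A's write

assert balance["acct"] == 70    # not the 120 a serial schedule gives
```

With any serial (or properly locked) schedule the result is 120; the
interleaving loses A's deposit entirely, which is why "one writer per
table" must be a guarantee, not a hope.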
I wonder, though, how many large DBMSs do have (virtually) unbounded
file size?  Surely Sybase does (;-) but what of the others?  Years
ago I used a DBMS on a VAX which had precisely this problem.  Don't
remember which one it was, though.

>>....For example, one of the big DBMSs
>>that runs under UNIX (RTI Ingres? Oracle?) bypasses the file system to write
>>directly to the disk (let's skip the technical details...). While this is
>>like the monolithic system in some ways, it still allows for some of the
>>benefits of the distributed structure. UNIX wizards may have a lot to say
>>about this...
>
>That's us (at least). [Toots own horn, has probably earned that right]
>Really the issue here is OS independence, not how many files you use.
>Our experience is that we know best how to deploy system resources
>to run our software, not the writers of the host operating system,
>fine as that OS may be for other purposes.
>
>-TW

Yes.  As I wrote at the beginning of this article, raw I/O fits my
'distributed' model more closely than it does the 'monolithic'
model.  Excluding this, though, does anyone think that mono
structures have any big advantages?

Anyway, I am glad that you have taken the time to write the perfect
DBMS back-end.  Now all you need to do is sell it for Macs and PCs
(not under UNIX, either) and I'll be very happy.

---

Whew.  Now, let me ask a question without making any sweeping
statements:

One point we all missed is the possibility of combining multiple
tightly-related tables on the same physical portion of the disk.  The
speed advantage here might be worth a mono structure (but I would
guess not).  Some SQLs allow you to create "clusters".  Now, what
exactly is the DBMS writing to the disk?  The only way I can think of
for doing this is to actually store a join of the two files.  This
could chew up an enormous amount of disk space.  What is it doing?
----
Alexis Rosen              {allegra,philabs,cmcl2}!phri\
Writing from              {harpo,cmcl2}!cucard!dasys1!alexis
The Big Electric Cat      {portal,well,sun}!hoptoad/
Public UNIX               if mail fails: ...cmcl2!cucard!cunixc!abr1
                          Best path: uunet!dasys1!alexis
tim@linus.sybase.com (Tim Wood) (09/10/88)
In article <6299@dasys1.UUCP> alexis@dasys1.UUCP (Alexis Rosen) writes:
>
>Recently, Tim Wood wrote a response to my article on database file structures
>...
>He missed ... that I was really addressing smaller
>machines. At any rate, I specifically excluded DBMSs which did raw I/O from my
>analysis.

I don't believe the PC focus was clear.  The article made rather
sweeping statements about different storage structures in general,
but the conclusions did not generalize beyond the single-user
environment, and maybe not within that.

As for raw I/O, if your DBMS does not take advantage of a mechanism
in the environment that offers potentially higher throughput, then it
is not the DBMS to use in your environment.  It's akin to saying "I'm
going to compare vendor X's bubble-sort vs. vendor Y's selection-sort
routines but ignore the quicksort from vendor W."

>
>BTW, the raw I/O stuff looks like distributed structure to me.

No.  A single raw disk partition (of a certain minimum size) can hold
the entire set of databases and tables the DBMS sees.  This is likely
to give suboptimal performance (because there is no interleaving) and
impose space constraints (i.e., the need to grow beyond the confines
of the partition).  It's the most restrictive case of a scalable
system (1 CPU, 1 disk).  But a flexible mono system will let you
scale upward and choose your physical layout.

>In fact, the
>real breakdown, I guess, is between products that layer two levels of file
>access on top of each other vs. the ones that only have one layer (i.e. they
>either use the native file system for each of their logical files, or do raw
>I/O and use their own file system).

That's a lot closer to the real issue.

>>>...Of course there are
>>>important logistics to consider, such as how to lock out access to data which
>>>is temporarily invalid because it is being updated privately by that node,
>>>but often this is not a problem at all. Even when it is, it's better than not
>>>being able to do the job at all.
]
>>
>>Unless you don't [like] inconsistent transactions and corrupt data.
>
>... I guess I was very unclear here.
>What I was saying was that in many cases, with canned applications, you have
>knowledge that certain tables will never be written to except by one user
>(except on weekends, on Feb. 29, and during a solar eclipse. Whatever...)
>In this case, you need not establish a lock and can just do whatever you like
>without worries. This is what I meant by it being 'not a problem'.

The only problem is trivial applications.  Inability to support
general concurrency with consistency severely limits the ability of
your system to model the real-world problems you are trying to
solve.  If you know that your business is going to continue to
operate from your garage with 3 employees for the foreseeable future,
you might go with manual control.  If your workload is going to grow,
though, that will quickly become untenable.  It's much better to
adopt a transaction-oriented DB design from the start.  Then your
applications can evolve at the level of your business problem rather
than "well, I guess it's time to find a DBMS that supports locking
and revamp my applications for it".

Moreover, with an active data dictionary, there will be concurrent
updates to the master object directory (e.g. "sysobjects"), even if
no user ever accesses an object in use by another, because the data
dictionary must maintain changing information about all the objects.

>
>>As for the disk strategies, they're fine. I should mention that a
>>mono organization should not prohibit you from using other disk
>>devices (or files). [etc.]
>
>Well, it's easy enough to do disk striping, but I'm not convinced that the
>DBMS is always going to be smarter than I am. What about prioritization? I
>want certain users of table A, who don't need indices I, J, K, and L, to have
>very fast response time. Other users of A, who need I, J, K, and L, can go
>hang because they aren't upper management.
>:-) In this case I want to put A on one spindle
>and everything else on another. I am not saying that it's impossible to write
>a DBMS which can handle things like this. I am saying that that would involve
>a fair amount of AI, and I haven't seen anything like it yet.

That's not AI, that's database design.  The DBMS needs to offer the
facilities for the DBA to specify physical storage usage at various
levels.  Good DB design and layout are powerful means of obtaining
performance.

Your example is puzzling, since indexes are supposed to make table
access faster.  You want the people who are hammering on A to use
whichever of I, J, K & L will give them the fastest response.  To do
much besides adding a row at the end of A or truncating it, you
should use the indexes.  The partitioning you suggest is a worthwhile
one.

>As I wrote at the beginning of this article, raw I/O fits my 'distributed'
>model more closely than it does the 'monolithic' model.

Um, well, you're taking an argument against your conclusion and
trying to make it one in favor.  The raw partition is used because
one desires to use only the DBMS storage management structures, not
those of the OS.  That fits the mono definition much more closely
than the separate-file definition.

>Excluding this, though,
>does anyone think that mono structures have any big advantages?

Hard to say, since you've qualified the monolithic idea so as to make
it nearly meaningless.

>
>Anyway I am glad that you have taken the time to write the perfect DBMS
>back-end. Now all you need to do is sell it for Macs and PCs (not under unix,
>either) and I'll be very happy.

Oh, I see.  UNIX (& OS/2?) non grata.  You are very attached, then,
to underpowered, facility-poor "OS"s like MS-DOS and the Mac's?  I
suggest you stick to toy applications to match those environments,
then.

>Whew. Now, let me ask a question without making any sweeping statements:

For a change.
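The "flexible mono system" and DBA-specified layout discussed above
can be sketched without any AI: one logical store whose extents are
allocated round-robin across several devices, so the spindles share
the load.  Everything here (device names, extent size) is invented
for illustration in Python.

```python
# Sketch: a single logical store spreading its extents across raw
# devices, i.e. interleaving without OS mechanisms.  Hypothetical
# names and sizes throughout.
EXTENT_PAGES = 16                 # pages per extent, made up
devices = ["raw0", "raw1"]        # two spindles
allocations = []                  # (device, extent slot on that device)

def allocate_extent(n):
    """Place the n-th logical extent round-robin across the devices."""
    dev = devices[n % len(devices)]
    slot = n // len(devices)
    allocations.append((dev, slot))
    return dev, slot

for n in range(6):                # grow the store by six extents
    allocate_extent(n)

# Both spindles end up carrying the same number of extents.
load = {d: sum(1 for dev, _ in allocations if dev == d) for d in devices}
assert load == {"raw0": 3, "raw1": 3}
```

A real DBMS would let the DBA pin specific tables or indexes to
specific devices (Alexis's table-A-on-its-own-spindle case) on top of
a default policy like this one.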
>One point we all missed is the possibility of combining multiple tightly-
>related tables on the same physical portion of the disk.

This is known generically as "clustering".  I believe Oracle does it.

>The speed advantage
>here might be worth a mono structure (but I would guess not). Some SQLs allow
>you to create "clusters". Now, what exactly is the DBMS writing to the disk?

One idea is that you try to place pages for joining tables
contiguously on the same disk track.  Then, when a join query comes
along, you can suck the first set of (potentially) joining rows into
memory with one disk read, and your join becomes an in-memory
operation.  This is only really effective if you are joining on the
primary (i.e. sort) keys of the tables, because that is the basis for
the clustering.  It is not very amenable to schema changes because it
reduces the independence of the data from the storage structure.

>The only way I can think of for doing this is to actually store a join of the
>two files. This could chew up an enormous amount of disk space. What is it
>doing?

Fortunately, it's not done that way.  I suggest that you read some
textbooks on databases, such as C.J. Date's _Intro to Database
Systems_, 2nd Ed., before making many more declarative postings.
-TW

{ihnp4!pacbell,pyramid,sun,{uunet,ucbvax}!mtxinu}!sybase!tim
..not an @ in the bunch...
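The clustering Tim describes, co-locating joining rows rather than
storing a materialized join, can be sketched briefly.  This Python
model is illustrative only: each "page" holds one department row plus
its employee rows, so joining one key costs one page read.  Table
names and layout are invented.

```python
# Sketch of clustered storage: rows of two tables that share a join
# key live on the same page.  No join result is stored; each row is
# stored exactly once.
depts = {10: "sales", 20: "eng"}
emps = [(10, "alice"), (20, "bob"), (10, "carol")]

# Build clustered "pages", one per join key.
pages = {}
for dno, dname in depts.items():
    pages[dno] = {"dept": (dno, dname), "emps": []}
for dno, ename in emps:
    pages[dno]["emps"].append((dno, ename))

def join_one_key(dno):
    """One 'disk read' (one page) yields the whole join for this key."""
    page = pages[dno]
    dname = page["dept"][1]
    return [(ename, dname) for _, ename in page["emps"]]

assert join_one_key(10) == [("alice", "sales"), ("carol", "sales")]
```

Note the space cost is roughly the two tables' rows, not their join,
which is Tim's answer to the "store a join of the two files" worry.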
daveb@geac.UUCP (David Collier-Brown) (09/11/88)
In article <6299@dasys1.UUCP> alexis@dasys1.UUCP (Alexis Rosen) writes:
| Recently, Tim Wood wrote a response to my article on database file structures
| ... I guess I was very unclear here.
| What I was saying was that in many cases, with canned applications, you have
| knowledge that certain tables will never be written to except by one user
| (except on weekends, on Feb. 29, and during a solar eclipse. Whatever...)
| In this case, you need not establish a lock and can just do whatever you like
| without worries. This is what I meant by it being 'not a problem'.

From article <976@sybase.sybase.com>, by tim@linus.sybase.com (Tim Wood):
| The only problem is trivial applications. Inability to support general
| concurrency with consistency severely limits the ability of your system to
| model the real-world problems you are trying to solve.

Actually, one wants to use a system with locks even in the trivial
case: what you're avoiding is the cost of dealing with collisions as
opposed to the cost of locking.  In a good system, the cost of
establishing a lock can be small.  For example, on Unix, one can
write one's lock-manager critical code as a dummy "device" driver[1],
and ensure the software path for a successful lock is relatively
short, on the order of magnitude of a single system call.  This is
not necessarily true of a collision-resolution system: there is a
fair bit more work involved there, often (but not always) on the
order of a process switch.  (Yes, there are PC/Mac databases in which
locking is frighteningly expensive...  but they don't **have** to be
that way: see also [1].)

In a large library system my employer was (re)building on top of
Ingres, we noted that the nature of the real-world construct being
modeled precluded almost all possible collisions (no two people can
borrow the same copy of a book at the same time, unless they tear it
in half).  This allowed us to run prototypes with no locking (because
they had no real database!)
and still get self-consistent results.  When we did the next
prototype, though, we used the database with locking turned on and
saw a moderate speed improvement (the DBMS was smarter than the
proto-kludge).  We kept the locking, arguing that we could not
preclude administrative-correction transactions which would require
locking to function (i.e., we found out the patron was using someone
else's card, or that the book was mislabeled, and a librarian was
unmunging the now-incorrect model).  The system ran "far faster"[2]
than one which had collisions on the patron tuple when it tried to
parallelize charge-out transactions...

--dave (once upon a time a boring programmer-type who worked
        with extra-boring librarian types) c-b

[1] This can be done on Macs and probably OS/2: see the last issue of
    MacTutor for an example.  You can also cheat and make it always
    return YES in the single-machine, single-user case.
[2] Purely estimation, alas.
--
David Collier-Brown.  | yunexus!lethe!dave
78 Hillcrest Ave.     | He's so smart he's dumb.
Willowdale, Ontario.  |     --Joyce C-B
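David's claim that a successful lock can cost about one system call
is easy to see with POSIX advisory record locks, which are the modern
stand-in for his 1988 dummy-driver trick.  A hedged Python sketch on
a throwaway file (the "patron record" is purely notional):

```python
# Sketch: the uncontended-lock fast path.  Acquiring an advisory lock
# on an unlocked file is a single fcntl() call; releasing it is one
# more.  The lock file here stands in for a patron record.
import fcntl
import os
import tempfile

tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.close()
fd = os.open(tmp.name, os.O_RDWR)

fcntl.lockf(fd, fcntl.LOCK_EX)       # one syscall: lock acquired
# ... critical section: update the patron tuple ...
fcntl.lockf(fd, fcntl.LOCK_UN)       # one syscall: lock released

# Non-blocking form: on collision this raises instead of waiting,
# so the caller decides whether to retry or queue.
got_it = True
try:
    fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    fcntl.lockf(fd, fcntl.LOCK_UN)
except OSError:
    got_it = False
assert got_it                        # uncontended, so it succeeded

os.close(fd)
os.unlink(tmp.name)
```

The expensive part of locking is contention handling (blocking,
queueing, possibly a process switch), which is exactly the cost the
library system's collision-free workload never paid.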
alexis@dasys1.UUCP (Alexis Rosen) (09/13/88)
In article <976@sybase.sybase.com> tim@linus.sybase.com (Tim Wood) writes:
>In article <6299@dasys1.UUCP> alexis@dasys1.UUCP (Alexis Rosen) writes:
>>Recently, Tim Wood wrote a response to my article on database file structures
>>...He missed ... that I was really addressing smaller machines. At any rate,
>>I specifically excluded DBMSs which did raw I/O from my analysis.
>
>I don't believe the PC focus was clear. The article made rather
>sweeping statements about different storage structures in general, but
>the conclusions did not generalize beyond the single-user environment,
>and maybe not within that.

No.  They don't generalize to OLTP environments on large systems.
They DO generalize to most multi-user DBMSs running on micros.

>As for raw I/O, if your DBMS does not take advantage of a mechanism in
>the environment that offers potentially higher throughput, then it is
>not the DBMS to use in your environment. [etc.]

That's not the situation for micro users.  In general that
environment doesn't support raw I/O, which is why I excluded it.

>>BTW, the raw I/O stuff looks like distributed structure to me.
>No. A single raw disk partition [is a rotten idea...]

You implied the ability to use multiple spindles with raw I/O.
Obviously, limiting access to one partition will always be
suboptimal.

>But a flexible mono system will let you scale upward and choose your physical
>layout.

Then it's not a mono system.  It's a hybrid, which is probably better
than a pure form of either structure.  (I should have mentioned this
possibility originally, but again, there are no major DBMSs for PCs
yet which offer this ability.)

>>In fact, the
>>real breakdown, I guess, is between products that layer two levels of file
>>access on top of each other vs. the ones that only have one layer [etc.]
>
>That's a lot closer to the real issue.
>
>> [discussion about various strategies to minimize record & table locks]
> ...
>Inability to support general
>concurrency with consistency severely limits the ability of your system to
>model the real-world problems you are trying to solve. If you know
>that your business is going to continue to operate from your garage
>with 3 employees for the foreseeable future, you might go with manual
>control. If your workload is going to grow, though, that will quickly become
>untenable. It's much better to adopt a transaction-oriented DB design
>from the start. [etc.]

You're missing the point here.  I agree that OLTP is a great thing.
I'd love to be able to use it.  But I can do things on a PC network
without OLTP that would take five times the money to do on a system
powerful enough to use OLTP, WITHOUT increasing the chances of
blowing away important data.  It won't take enormous effort on my
part, either.  Of course, in five years or so OLTP will be a lot
easier (with '786 or '080 CPUs and 32 MB of RAM in the average
micro).  For now, though, even large companies don't always have the
extra money to throw at a problem that OLTP demands.  They want it
done on a micro network, next month, with industry-standard tools
which hundreds of local programmers are already familiar with.

>Moreover, with an active data dictionary, there will be concurrent
>updates to the master object directory (e.g. "sysobjects"), even if no
>user ever accesses an object in use by another, because the data
>dictionary must maintain changing information about all the objects.

So?  Not too many micro DBMSs out there with any kind of data
dictionary.  That sucks, I know, but I can still do useful work in
them.

>>>As for the disk strategies, they're fine. I should mention that a
>>>mono organization should not prohibit you from using other disk
>>>devices (or files).

Then it's not a monolithic structure, is it?

>>Well, it's easy enough to do disk striping, but I'm not convinced that the
>>DBMS is always going to be smarter than I am. What about prioritization?
>>[example of prioritization/partitioning] I am not saying that it's impossible
>>to write a DBMS which can handle things like this. I am saying that that
>>would involve a fair amount of AI, and I haven't seen anything like it yet.
>
>That's not AI, that's database design.
>The DBMS needs to offer the facilities for the DBA to specify physical
>storage usage at various levels. Good DB design and layout are powerful means
>of obtaining performance. [etc.]
>The partitioning you suggest is a worthwhile one.

That's AI when the DBMS figures out the partitioning itself.  Without
that, the DBA has to put various files in various different physical
places, and that's not a mono structure anymore...

>>As I wrote at the beginning of this article, raw I/O fits my 'distributed'
>>model more closely than it does the 'monolithic' model.
>
>Um, well, you're taking an argument against your conclusion and trying
>to make it one in favor. The raw partition is used because one desires
>to use only the DBMS storage management structures, not those of the
>OS. That fits the mono definition much more closely than the separate-
>file definition.

It's hardly an argument against my conclusion when I specifically
excepted it.  Regardless, monolithic (by my definition) means 'one
physical file' (this does not prohibit disk striping).  The DBMS you
describe allows raw I/O on several different volumes, which may live
on different machines.  They are separate devices and separate
physical files.  It also allows the DBA to assign specific logical
files (tables) to specific physical files.  This is the outstanding
characteristic of distributed structures.  In fact, you're describing
a DBMS which is a hybrid.  As I said before, the hybrid is probably
the best way to do things.  If you disagree with my definition, fine.
I'll agree to disagree...

>>Excluding this, though,
>>does anyone think that mono structures have any big advantages?
>
>Hard to say, since you've qualified the monolithic idea so as to make it
>nearly meaningless.

I haven't.  Actually, there is one advantage which we both forgot
(thanks to Dennis Cohen, who reminded me).  Under certain OSs
(especially micro OSs) opening a file can take a great deal of time.
There may also be serious limits on the number of files you can keep
open at any one time.  These are both problems for a distributed file
structure.  I don't believe they outweigh the advantages, though- at
least not with MS-DOS or the Mac OS.

>>Anyway I am glad that you have taken the time to write the perfect DBMS
>>back-end. Now all you need to do is sell it for Macs and PCs (not under unix,
>>either) and I'll be very happy.
>
>Oh, I see. UNIX (& OS/2?) non grata.
>You are very attached then to underpowered, facility-poor "OS"s like
>MS-DOS and Mac? I suggest you stick to toy applications to match those
>environments, then.

I suggest you stick to what you know about (OLTP).  These OSs,
whatever their faults (and they are legion), dominate the PC market.
Nevertheless, I and many other people have developed many non-trivial
applications in these environments.

>>Whew. Now, let me ask a question without making any sweeping statements:
>
>For a change.
>...I suggest that you read some
>textbooks on databases, such as C.J. Date's _Intro to Database Systems_,
>2nd. Ed. before making many more declarative postings.

Don't be nasty.  It doesn't advance your position.  Should I suggest
that you read an introductory book on microcomputers?

All of the foregoing discusses whether or not various benefits I
attributed to distributed-file DBMSs are applicable to mono DBMSs as
well.  Except for what I wrote about file-opening overhead, I have
yet to hear of any advantages the mono structure has.  (This is
_specifically_ a mono structure managed by the OS.)  Are there any?
----
Alexis Rosen              {allegra,philabs,cmcl2}!phri\
Writing from              {harpo,cmcl2}!cucard!dasys1!alexis
The Big Electric Cat      {portal,well,sun}!hoptoad/
Public UNIX               if mail fails: ...cmcl2!cucard!cunixc!abr1
                          Best path: uunet!dasys1!alexis
jkrueger@daitc.daitc.mil (Jonathan Krueger) (09/15/88)
In article <6299@dasys1.UUCP> alexis@dasys1.UUCP (Alexis Rosen) writes:
>The response by Jon Krueger wasn't on-target

I don't think we're aiming for the same target.  I'm interested in
defining and solving database problems.  By trade I'm expected to
identify classes of machines capable of implementing a given
solution.  Articles listing capabilities of machines of a specified
size, shape, or price tag are relevant to this newsgroup and helpful
to me.  Articles that present current limitations of unspecified
machines as inherent characteristics of database management systems
waste my time and mislead the uninformed.  For instance:

>Any monolithic structure will impose arbitrary restrictions on
>the number of tables and columns allowed in the database.

is false and misleading.  If it were qualified, for instance:

    As of this writing, I know of no commercially available software
    running under MS-DOS that uses a monolithic structure that does
    not also impose arbitrary restrictions on the number of tables
    and columns allowed in the database.

it might be correct.  If it were upgraded to market research:

    Surveying current limits of some commercially available database
    management systems, I find:

    Name     OS      Type  MaxTables  MaxCols
    Foobase  MS-DOS  M     20         50
    Barbase  MS-DOS  D     Unlimited  255

it might even be helpful.  And if you stated and analyzed the trend:

    Total costs (hardware, operating system, database) are currently:

                  arbitrary restrictions   large or unlimited
    monolithic    $10,000 and up           $50,000 and up
    distributed   $2000 to $10,000         $10,000 and up

    Most of the difference in cost results from moving from
    single-user machines to shared systems.  Cost per user may be
    calculated (etc.)

your contribution to this group would be welcome and appreciated.

-- Jon
--
Jonathan Krueger    uunet!daitc!jkrueger    jkrueger@daitc.arpa
(703) 998-4777      Inspected by: No. 15
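Jon's "cost per user may be calculated" step can be sketched with his
own style of made-up figures.  Every number below is hypothetical,
including the user counts, which his table doesn't give; the point is
only the shape of the calculation, not market data.

```python
# Sketch of Jon's cost-per-user analysis.  All figures are invented
# placeholders in the spirit of his Foobase/Barbase examples.
configs = {
    # name: (total system cost in $, concurrent users supported)
    "shared monolithic system": (50_000, 20),   # hypothetical
    "single-user micro":        (2_000, 1),     # hypothetical
}

def cost_per_user(total_cost, users):
    """Total hardware+OS+DBMS cost amortized across concurrent users."""
    return total_cost / users

shared = cost_per_user(*configs["shared monolithic system"])
micro = cost_per_user(*configs["single-user micro"])
assert shared == 2500.0
assert micro == 2000.0
```

On numbers like these, the per-user gap is far smaller than the raw
price tags suggest, which is exactly the kind of trend analysis Jon
is asking for instead of blanket claims.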