david@ukma.UUCP (David Herron, NPR Lover) (12/16/85)
I just finished reading dmr's streams article from the BSTJ of last year,
and an idea has occurred to me (actually re-occurred).  I originally had
this idea when reading about sockets...

Could one use a variant of this to provide "funny" kinds of files?

What I mean is, ISAM is merely a protocol for using a file, right?
So to use a database "properly", open the file and push a <something>
onto the file which does the indexing/etc.  And you would still
be able to use the file in a "raw" mode for low-level patching, etc.

Is everybody confused now?
-- 
David Herron,  cbosgd!ukma!david, david@UKMA.BITNET.

Experience is something you don't get until just after you need it.
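The "ISAM is merely a protocol" claim can be sketched in a few lines of user-level C. Everything below is invented for illustration (none of these names come from any real streams module): an indexing layer keeps a key-to-offset table over a flat array of fixed-size records, so keyed access and "raw" byte-offset access coexist over the same storage.

```c
#include <string.h>

/* Hypothetical sketch: an ISAM-style "module" layered over a flat file.
 * The "file" is just a byte array of fixed-size records; the module
 * adds a key->offset index on top, leaving raw access untouched. */

#define RECSIZE 16
#define MAXREC  32

struct isamfile {
    char raw[RECSIZE * MAXREC];  /* the flat file, still patchable raw */
    int  key[MAXREC];            /* index: key for each record slot */
    int  nrec;
};

/* Append a record through the indexing layer. */
int isam_put(struct isamfile *f, int key, const char *data)
{
    if (f->nrec >= MAXREC)
        return -1;
    f->key[f->nrec] = key;
    memcpy(f->raw + f->nrec * RECSIZE, data, RECSIZE);
    return f->nrec++;
}

/* Keyed lookup: the protocol turns a logical key into a byte offset. */
long isam_seek_by_key(struct isamfile *f, int key)
{
    int i;
    for (i = 0; i < f->nrec; i++)
        if (f->key[i] == key)
            return (long)i * RECSIZE;  /* offset usable for raw access */
    return -1;
}
```

The point of the sketch is the return value of the lookup: the indexed view and the raw byte-offset view describe the same bytes, so "low-level patching" needs no extra mechanism.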
greg@ncr-sd.UUCP (Greg Noel) (12/20/85)
In article <2416@ukma.UUCP> david@ukma.UUCP (David Herron, NPR Lover) writes:
>I just finished reading dmr's streams article from the BSTJ of last year,
> ....
>Could one use a variant of this to provide "funny" kinds of files?
>
>What I mean is, ISAM is merely a protocol for using a file, right?
>So to use a database "properly", open the file and push a <something>
>onto the file which does the indexing/etc.  And you would still
>be able to use the file in a "raw" mode for low-level patching, etc.

I spoke to DMR after he presented his paper and suggested something
similar.  In fact, if you look at it closely, a buffered filesystem is
just a specialized interface pushed on top of the raw filesystem.  I
would like to see the whole stream mechanism generalized so that I could
push a stream module in front of \any/ file, not just a tty file.

This seems to be a simplification and unification of the concept of file
accessing, and Unix is renowned for the simplification and unification
of concepts, so why not?
-- 
-- Greg Noel, NCR Rancho Bernardo    Greg@ncr-sd.UUCP or Greg@nosc.ARPA
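The claim that a buffered filesystem is "just a specialized interface pushed on top of the raw filesystem" can be illustrated with a toy buffer cache. The sketch is invented for illustration (the names follow kernel convention but are no system's actual code): both interfaces have the same contract, and the buffered one only falls through to the "raw device" on a miss.

```c
#include <string.h>

/* Toy illustration: buffering as a layer over a raw device.
 * The "raw device" is an in-memory array of blocks; the cache layer
 * intercepts reads and only touches the device on a miss. */

#define BLKSIZE 8
#define NBLK    4

static char device[NBLK][BLKSIZE];   /* the "raw" layer */
static int  device_reads;            /* counts raw accesses */

static char cache[BLKSIZE];
static int  cached_blk = -1;         /* which block the cache holds */

/* Raw interface: always goes to the device. */
void raw_read(int blk, char *buf)
{
    device_reads++;
    memcpy(buf, device[blk], BLKSIZE);
}

/* Buffered interface: same contract, pushed on top of raw_read(). */
void buf_read(int blk, char *buf)
{
    if (cached_blk != blk) {         /* miss: fall through to raw layer */
        raw_read(blk, cache);
        cached_blk = blk;
    }
    memcpy(buf, cache, BLKSIZE);     /* hit: no device access at all */
}
```

Nothing in `raw_read()` knows the cache exists, which is exactly the "module pushed in front of a file" shape.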
gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (12/22/85)
> I would like to see the whole stream mechanism generalized so that I
> could push a stream module in front of \any/ file, not just a tty file.

Several of us have discussed this idea before.  It comes down to the
fact that there really are significant differences between random-access
(disk) files and sequential (communication) files.  To force random
files into the stream model would require sacrificing some of their
desirable properties (seekability, sharability, speed), alas.

Nice try, though.
david@ukma.UUCP (David Herron, NPR Lover) (12/24/85)
In article <964@brl-tgr.ARPA> gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) writes:
>> I would like to see the whole stream mechanism generalized so that I
>> could push a stream module in front of \any/ file, not just a tty file.
>
>Several of us have discussed this idea before.
> ...
> To force random files into the stream
>model would require sacrificing some of their
>desirable properties (seekability, sharability,
>speed), alas.
>
>Nice try, though.

Why do you lose "seekability"?  Pushing a protocol on top of a file
potentially brings in a whole set of ioctl()'s that can be performed.
So lseek() becomes ioctl(fd, FLSEEK, offset) or some such.  Other
operations can be performed with similar ioctl()'s.

Right?  Or am I missing something?
-- 
David Herron,  cbosgd!ukma!david, david@UKMA.BITNET.

Experience is something you don't get until just after you need it.
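The `ioctl(fd, FLSEEK, offset)` suggestion amounts to routing every operation through one generic entry point supplied by the pushed module. A hypothetical user-level version (the `FLSEEK`/`FLTELL` command codes and the struct are invented here, not real kernel ioctls):

```c
/* Hypothetical sketch of "everything becomes an ioctl": a module
 * exports one control entry point, and operations such as seek are
 * just command codes routed through it. */

#define FLSEEK 1   /* invented command codes, not real kernel ioctls */
#define FLTELL 2

struct vfile {
    long pos;
    long size;
    /* one generic entry point, as a pushed module might supply */
    long (*ctl)(struct vfile *, int cmd, long arg);
};

static long plain_ctl(struct vfile *f, int cmd, long arg)
{
    switch (cmd) {
    case FLSEEK:
        if (arg < 0 || arg > f->size)
            return -1;
        f->pos = arg;
        return arg;
    case FLTELL:
        return f->pos;
    default:
        return -1;              /* module doesn't support this op */
    }
}

struct vfile make_plain(long size)
{
    struct vfile f;
    f.pos = 0;
    f.size = size;
    f.ctl = plain_ctl;
    return f;
}
```

A module that doesn't make sense for seeking simply returns failure for `FLSEEK`, which is one reading of how "seekability" would degrade gracefully rather than disappear.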
gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (12/26/85)
> Why do you lose "seekability"?  Pushing a protocol on top of a file
> potentially brings in a whole set of ioctl()'s that can be performed.
> So lseek() becomes ioctl(fd, FLSEEK, offset) or some such.  Other
> operations can be performed with similar ioctl()'s.

By the time you have made open() slip necessary protocol modules onto
disk files, etc. you end up with just another, kludgier, implementation
of the UNIX file system.

Streams work as nicely as they do because they are modeled as
full-duplex pipes with protocol "filters" inserted into the pipeline.
Stream data comes from somewhere and goes somewhere.  Disk files just
sit; there is no inherent "flow".  File data is not normally "consumed",
for example.  Now, FIFOs and pipes are a different matter, and indeed
pipes are implemented using streams on 8th Edition UNIX.

One could certainly add features to ordinary files.  I once thought up
one, which Mike Karels told me had already been invented and called
"portals".  This would be code associated with a file that would "fire
up" when the file was accessed; not too different from the idea of
attaching protocol modules to files.  Something like this might be worth
doing, especially for databasing, but not by stretching the stream I/O
system beyond its basic model.
greg@ncr-sd.UUCP (Greg Noel) (12/26/85)
In article <964@brl-tgr.ARPA> gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>)
claims that there are
>significant differences between random-access
>(disk) files and sequential (communication)
>files.  To force random files into the stream
>model would require sacrificing some of their
>desirable properties (seekability, sharability,
>speed), alas.

I'm prepared to believe that the current implementation of streams makes
this difficult or impossible, but I'm not prepared to believe that the
idea itself is difficult or impossible.  After all, what is filesystem
buffering except a protocol pushed on top of raw filesystems?

Perhaps we should take this conversation off-line while you try to
convince me, since it doesn't seem to be something of general interest.
-- 
-- Greg Noel, NCR Rancho Bernardo    Greg@ncr-sd.UUCP or Greg@nosc.ARPA
ark@ut-sally.UUCP (Arthur M. Keller) (12/27/85)
In article <376@ncr-sd.UUCP> greg@ncr-sd.UUCP (Greg Noel) writes:
>In article <964@brl-tgr.ARPA> gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>)
>claims that there are
>>significant differences between random-access
>>(disk) files and sequential (communication)
>>files.  To force random files into the stream
>>model would require sacrificing some of their
>>desirable properties (seekability, sharability,
>>speed), alas.
>
>I'm prepared to believe that the current implementation of streams makes
>this difficult or impossible, but I'm not prepared to believe that the
>idea itself is difficult or impossible.

Some people consider random access to be "stream access with
repositioning".  That might work well in a single-user environment but
fails badly in a multi-user environment with concurrency control.

In a file system or a database system, the granularity of locking is an
important concept.  It corresponds to the granularity of sharing.  It
may, but need not, correspond to the granularity of access, although the
two must be comparable (equal, or one contained in the other, for all
practical purposes).  In streams, the granularity of access is not a
very well defined concept.  You don't really want to lock "the next 100
bytes I will read".  Rather, you probably want to lock "the next group
of related fields", commonly called a record.

The notion of streams generally indicates one source and one consumer
for each dataflow pathway, although there may or may not be a separate,
often implicit, reverse pathway.  The notion can be generalized to
multiple producers and consumers, but still obeying a FIFO (or priority
queue) discipline.  No notion of modification of the data in a stream
exists, although a transformer may be interposed that consumes a stream
and produces another, distinct stream by a transformation of the
consumed stream.

In a randomly accessed, shared file system, the data do not follow
anything resembling a FIFO discipline.  The data are not consumed, they
are referenced; they may also be created, updated, and destroyed.  To
consider a shared file to be a stream actually means that the file is
encapsulated in a process and your stream is communicating with that
process.  The process then uses a more traditional discipline to
interact with the file.  This would involve a protocol transformation
and the attendant overhead.

I've only briefly touched on some of the issues, but I hope that this
can give the readership of this newsgroup a feeling for the problems
involved.

Prof. Arthur M. Keller
The University of Texas at Austin
Department of Computer Sciences
-- 
------------------------------------------------------------------------------
Arpanet:  ARK@SALLY.UTEXAS.EDU
UUCP:     {gatech,harvard,ihnp4,pyramid,seismo}!ut-sally!ark
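The record-versus-byte-range granularity point can be made concrete with a toy lock table. The sketch is illustrative only (record numbers stand in for "groups of related fields"; a real DBMS would also need lock modes, wait queues, and deadlock handling):

```c
/* Toy record-lock table: locks are granted per record, the unit of
 * sharing, not per "next N bytes of the stream".  Illustrative only:
 * no lock modes, no queuing, no deadlock detection. */

#define NRECORDS 64

static int lock_owner[NRECORDS];   /* 0 = free, else owning user id */

/* Try to lock one record for `user`; fail if another user holds it. */
int rec_lock(int rec, int user)
{
    if (rec < 0 || rec >= NRECORDS)
        return -1;
    if (lock_owner[rec] != 0 && lock_owner[rec] != user)
        return -1;                 /* conflict at record granularity */
    lock_owner[rec] = user;
    return 0;
}

int rec_unlock(int rec, int user)
{
    if (rec < 0 || rec >= NRECORDS || lock_owner[rec] != user)
        return -1;
    lock_owner[rec] = 0;
    return 0;
}
```

A pure stream has no natural place for `rec` to come from: a byte position in the flow identifies nothing stable enough to lock.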
mishkin@apollo.uucp (Nathaniel Mishkin) (12/27/85)
At Apollo, we've developed a system for extending the concept of
"stream".  Basically, every "object" (read "file" if you're not familiar
with the Smalltalk/object-oriented view of the world) has a type, and
every type has a "manager" (read "subroutine library") that contains one
entry point (procedure) for each "operation" (read "generic procedure")
that the type supports.  In the case of stream I/O (the typed object
view of the world extends beyond simple stream I/O), the operations are
things like "read", "write", "seek", etc.

Operations are grouped into "traits" (read "interfaces").  The stream
I/O facility defines a set of traits including "IO" (contains the basic
I/O operations listed above), "Socket" (contains the 4.2bsd socket
operations like "bind", "connect", "listen", etc.), and "Pad" (contains
operations for manipulating windows).  Users can define new types, write
their associated managers, install them into the system (without having
to touch existing system source code), create objects of the new types,
and (lo and behold) have existing programs that use the stream I/O
interface (i.e. that call "read", "write", etc.) work on the new
objects.  Type IDs are 64 bits and unique, so there's never a problem
when moving objects from one system to another (as long as you bring the
managers with you).  Managers run in user state and are dynamically
loaded when needed.

As you might guess, there are lots of uses for this facility.  For
example, in the case of a DBMS, one simple facility you might want is to
be able to read the DBMS like a sequential ASCII file, independent of
what its real internal structure is.  This might not be appropriate or
reasonable for every database, but to take a simple example, today you
need a special program in order to dump the contents of a "dbm(3X)"
database file.  On an Apollo, such files could be typed as being "dbm"
files.  Then you could write a manager for the "dbm" type that
implemented the IO trait operations for that type.  (You can think of
the manager as being a different form of the special dump program.)
After you did this, you'd be able to run programs like "grep" on the
"dbm" files and get useful results.

If you were really ambitious, you might want to define a new trait --
call it the "ISAM" trait -- that had operations like "seek_by_key" that
took logical keys (i.e. NOT byte offsets) as arguments.  (Currently,
only Apollo -- not users -- can define new traits, but we hope to have
this fixed sometime.)  The idea is that this trait could be supported by
different DBMSs, that you'd be able to write programs that used those
operations, and that those programs would work with different DBMSs.

Of course it's not clear that you could come up with a set of operations
that made sense to enough different DBMSs to be worthwhile.  (Consider
the question of what the term "key" means to different DBMSs.)  But it'd
be interesting to investigate.
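The manager-per-type scheme maps naturally onto a table of function pointers per type. The C sketch below is a guess at the shape, not Apollo's actual interface: an "IO trait" struct, a plain-text manager, and a toy "dbm-like" manager whose IO operations render key/value pairs as text lines, so a grep-style client works on both.

```c
#include <stdio.h>
#include <string.h>

/* Guessed shape of a trait/manager scheme (not Apollo's real code):
 * one struct of operations per trait, one instance per type. */

struct io_trait {                       /* the "IO" trait */
    int (*getline_op)(void *obj, char *buf, int max);
};

/* Manager 1: a plain text object, lines separated by '\n'. */
struct textobj { const char *data; int pos; };

static int text_getline(void *obj, char *buf, int max)
{
    struct textobj *t = obj;
    int n = 0;
    if (t->data[t->pos] == '\0')
        return -1;                      /* end of object */
    while (t->data[t->pos] && t->data[t->pos] != '\n' && n < max - 1)
        buf[n++] = t->data[t->pos++];
    if (t->data[t->pos] == '\n')
        t->pos++;
    buf[n] = '\0';
    return n;
}

/* Manager 2: a toy "dbm-like" object; its IO manager renders each
 * key/value pair as one text line, hiding the internal structure. */
struct dbmobj { const char *keys[4]; const char *vals[4]; int n, pos; };

static int dbm_getline(void *obj, char *buf, int max)
{
    struct dbmobj *d = obj;
    if (d->pos >= d->n)
        return -1;
    snprintf(buf, max, "%s: %s", d->keys[d->pos], d->vals[d->pos]);
    d->pos++;
    return (int)strlen(buf);
}

struct io_trait text_mgr = { text_getline };
struct io_trait dbm_mgr  = { dbm_getline };

/* A generic client: works on any object whose type has an IO manager. */
int count_lines(struct io_trait *mgr, void *obj)
{
    char buf[64];
    int count = 0;
    while (mgr->getline_op(obj, buf, sizeof buf) >= 0)
        count++;
    return count;
}
```

`count_lines()` plays the role of an existing program like "grep": it never learns which manager it is talking to.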
ark@ut-sally.UUCP (Arthur M. Keller) (12/29/85)
In article <2afa6c05.3166@apollo.uucp> mishkin@apollo.UUCP (Nathaniel Mishkin) writes:
>At Apollo, we've developed a system for extending the concept of "stream".

I would argue that what you have really done is implemented the concept
of streams using the concept of objects.  Since the concept of objects
is at least as powerful as arbitrary procedure calls, this is not too
surprising.

>As you might guess, there are lots of uses for this facility.  For example,
>in the case of a DBMS, one simple facility you might want is to be able
>to read the DBMS like a sequential ASCII file, independent of what its
>real internal structure is.

This is the concept of information hiding.  You are using the features
common to streams and databases, so it is not surprising that you can
fit a streams-like interface on top of a database.  It's also possible
that the streams-like interface was chosen to be a subset of the
database-like interface, but that's not necessary.

>If you were really ambitious, you might want to define a new trait --
>call it the "ISAM" trait -- that had operations like "seek_by_key" that
>took logical keys (i.e. NOT byte offsets) as arguments.

If you do decide to do that, I'd suggest supporting any number of keys,
not just one, and don't unnecessarily bind the concept of a unique index
to the concept of a clustered index, but that's a whole 'nother pet
peeve of mine.

>Of course it's not clear that you could come up with a set of operations
>that made sense to enough different DBMS's to be worthwhile.  (Consider
>the question of what the term "key" means to different DBMS's.)  But
>it'd be interesting to investigate.

It should be possible to do it with any two databases designed using the
same model (relational, hierarchical, network, entity-relationship,
functional, or what-have-you).  Furthermore, a sufficiently powerful
network-model database implementation could have a relational-model
interface on it, which to the user would be indistinguishable from a
relational-model database written using a traditional relational
implementation style.

Arthur M. Keller
-- 
------------------------------------------------------------------------------
Arpanet:  ARK@SALLY.UTEXAS.EDU
UUCP:     {gatech,harvard,ihnp4,pyramid,seismo}!ut-sally!ark
jack@boring.UUCP (Jack Jansen) (01/02/86)
Doug Gwyn states that files are completely different from
pipes/FIFOs/etc., in that a disk file doesn't have a data flow like the
others.  I agree with this, but there's a way to make a file look like a
stream: just say that you're not talking to a *file*, but to a *file
server*.  This way, you get your stream model back.

Now it's easy to insert modules that do ASCII-EBCDIC conversion, sparse
file handling, database lookups, even readahead/writebehind, without
modifying the basic low-level file server.

There are great advantages to the file-server model:
- You don't pay for features you didn't ask for (ever heard
  database people raving about unix readahead?)
- It's easier to maintain, since it consists of more, but smaller,
  modules.
- Remote filesystems come for (almost) free.

This is, by the way, the approach used in the Amoeba distributed
operating system, and in some other message-passing operating systems.

Hmm.  Time to move to net.os?
-- 
	Jack Jansen, jack@mcvax.UUCP
	The shell is my oyster.
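The file-server view above can be sketched as a request/reply loop with an interposable filter module. Everything here is invented for illustration: in a real system like Amoeba the "messages" travel between processes, whereas here they are just struct arguments, and an uppercasing filter stands in for something like ASCII-EBCDIC conversion.

```c
#include <ctype.h>
#include <string.h>

/* Invented sketch of the file-server model: clients send request
 * "messages" to a server function; filter modules can be interposed
 * without the low-level file server knowing. */

enum op { FS_READ, FS_WRITE };

struct request { enum op op; int off; int len; char data[32]; };
struct reply   { int status; char data[32]; };

static char store[128];                 /* the server's backing file */

/* The basic low-level file server. */
struct reply fileserver(struct request r)
{
    struct reply rep;
    rep.status = 0;
    if (r.off < 0 || r.off + r.len > (int)sizeof store) {
        rep.status = -1;
        return rep;
    }
    if (r.op == FS_WRITE)
        memcpy(store + r.off, r.data, r.len);
    else
        memcpy(rep.data, store + r.off, r.len);
    return rep;
}

/* An interposed module: uppercases data flowing in either direction.
 * (Stands in for, say, ASCII-EBCDIC conversion.)  The file server
 * itself is untouched. */
struct reply upcase_module(struct request r)
{
    struct reply rep;
    int i;
    if (r.op == FS_WRITE)
        for (i = 0; i < r.len; i++)
            r.data[i] = toupper((unsigned char)r.data[i]);
    rep = fileserver(r);
    if (r.op == FS_READ)
        for (i = 0; i < r.len; i++)
            rep.data[i] = toupper((unsigned char)rep.data[i]);
    return rep;
}
```

A client that wants the conversion calls `upcase_module()`; one that doesn't calls `fileserver()` directly, which is the "don't pay for features you didn't ask for" property.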
larry@ingres.ARPA (Larry Rowe) (01/06/86)
In article <6717@boring.UUCP> jack@mcvax.UUCP (Jack Jansen) writes:
>
>There are great advantages to the file-server model:
>- You don't pay for features you didn't ask for (ever heard
>database people raving about unix readahead?)

In single-user benchmarks read-ahead is a definite win for queries that
scan a lot of pages to answer a query.  The reason should be pretty
obvious: read-ahead overlaps cpu and i/o processing.

A simple dbms will run very nicely with unix-style read-ahead, but a
sophisticated dbms will eventually have to replace the general operating
system read-ahead with a smarter read-ahead.  The reason has to do with
which page gets read on read-ahead.  Most dbms's impose a page structure
on the data file that includes a forward pointer to the next primary
page and a pointer to the overflow pages for the current page.  When
doing read-ahead, you want to scan the pages in the order: primary page,
overflow page, overflow page, ..., primary page, overflow page, etc.
General unix read-ahead reads the next logically sequential page, which
won't give this ordering.
	larry
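The page-chain ordering Larry describes can be sketched as a next-page function over primary/overflow pointers, which is what a DBMS-private read-ahead would follow instead of "current block + 1". The page layout below is invented for illustration:

```c
/* Invented page layout: each page points at its overflow chain and,
 * for primary pages, at the next primary page.  A DBMS read-ahead
 * follows these pointers; OS read-ahead would just fetch page+1. */

#define NOPAGE (-1)

struct page {
    int next_primary;   /* next primary page, or NOPAGE */
    int overflow;       /* first/next overflow page, or NOPAGE */
    int primary;        /* for an overflow page, its primary; else self */
};

/* Next page in scan order: primary, its overflows, next primary, ... */
int next_in_scan(const struct page *pg, int cur)
{
    if (pg[cur].overflow != NOPAGE)          /* drain overflow chain */
        return pg[cur].overflow;
    return pg[pg[cur].primary].next_primary; /* then next primary page */
}
```

When overflow pages sit elsewhere in the file, `next_in_scan()` and `cur + 1` disagree, so the general-purpose read-ahead prefetches the wrong page.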