duncant@mbunix.mitre.org (Thomson) (11/05/90)
I'm curious about something:

I understand that, on unix, the file system is designed so that a file always
looks like a sequence of bytes, with no record structure at all.
Is this correct?

If so, how does one implement an efficient database manager on unix in
a standard, portable, way? To be efficient, a database manager needs to
have random access into files on a record-oriented basis. It seems to me
that fseek() wouldn't do the job. (Am I wrong here?) If unix doesn't
provide a record-oriented view of files, then any database implementation
would have to go below unix, and access the mass storage devices directly.
Is this right?

I know there are database managers for unix, so there must be ways to
do it....

I'm just curious about this, not planning to write a huge efficient database
manager for unix or anything...
--
(Please excuse the typos and garbage caused by line noise.)
cpcahil@virtech.uucp (Conor P. Cahill) (11/05/90)
In article <125379@linus.mitre.org> duncant@mbunix.mitre.org (Thomson) writes:
>I understand that, on unix, the file system is designed so that a file always
>looks like a sequence of bytes, with no record structure at all.

No *system imposed* record structure.

>If so, how does one implement an efficient database manager on unix in
>a standard, portable, way? To be efficient, a database manager needs to

By having an application imposed record structure.

>have random access into files on a record-oriented basis. It seems to me
>that fseek() wouldn't do the job.

Most UNIX DBMSs will use read/write/lseek as opposed to the stdio functions
to ensure that the stdio buffering does not get in the way.

>provide a record-oriented view of files, then any database implementation
>would have to go below unix, and access the mass storage devices directly.
>Is this right?

Nope. It would only have to impose its own record structure on the file.
--
Conor P. Cahill (703)430-9247
Virtual Technologies, Inc., uunet!virtech!cpcahil
46030 Manekin Plaza, Suite 160
Sterling, VA 22170
thad@cup.portal.com (Thad P Floryan) (11/05/90)
duncant@mbunix.mitre.org (Thomson) in <125379@linus.mitre.org> writes:
>I understand that, on unix, the file system is designed so that a file
>always looks like a sequence of bytes, with no record structure at
>all. Is this correct?
YES, thank goodness! Contrast that UNIX view of a "file" to that on, say,
VAX/VMS where you find eleventy-seven RMS file types that complicate efficient
and portable I/O beyond belief. I have a commercial product in that market,
and I'm now porting it to UNIX, so this is not idle speculation.
>If so, how does one implement an efficient database manager on unix in
>a standard, portable, way? To be efficient, a database manager needs
>to have random access into files on a record-oriented basis. It seems
>to me that fseek() wouldn't do the job. (Am I wrong here?) If unix
>doesn't provide a record-oriented view of files, then any database
>implementation would have to go below unix, and access the mass
>storage devices directly. Is this right?
One can impose any "view" on the file one desires. Assuming fixed-length
'records' and no funny-stuff at the beginning of the file, a typical method
to calculate any record's relative address in the file could be:
address = (record_number - 1) * sizeof(record_structure);
and that "address" would be used per "lseek(fd, (long)address, 0);". See
the writeup of lseek(2) for the meaning of its 3rd parameter which provides
some interesting options. Of course, a real DBMS could be "smarter" and
calculate a block address instead, (possibly) map that into memory, and
then calculate the record's in-core offset from the beginning of that buffer.
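A minimal sketch of the fixed-length calculation above, assuming records are numbered from 1 and nothing precedes the first record in the file. The struct layout and the names record_offset, read_record and write_record are hypothetical, chosen only for illustration; SEEK_SET is the symbolic name for the whence value written as 0 above. Treating a short read or write as an error is a simplification.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fixed-length record layout, for illustration only. */
struct record {
    long id;
    char name[28];
};

/* Byte offset of a record, counting records from 1 as in the formula above. */
off_t record_offset(long record_number)
{
    assert(record_number >= 1);
    return (off_t)(record_number - 1) * (off_t)sizeof(struct record);
}

/* Read record `record_number` from fd into *out.
 * Returns 0 on success, -1 on error or short read. */
int read_record(int fd, long record_number, struct record *out)
{
    if (lseek(fd, record_offset(record_number), SEEK_SET) == (off_t)-1)
        return -1;
    if (read(fd, out, sizeof *out) != (ssize_t)sizeof *out)
        return -1;
    return 0;
}

/* Write *in as record `record_number`.
 * Returns 0 on success, -1 on error or short write. */
int write_record(int fd, long record_number, const struct record *in)
{
    if (lseek(fd, record_offset(record_number), SEEK_SET) == (off_t)-1)
        return -1;
    if (write(fd, in, sizeof *in) != (ssize_t)sizeof *in)
        return -1;
    return 0;
}
```

With a descriptor opened for reading and writing, write_record(fd, 3, &r) followed by read_record(fd, 3, &r2) should return the same bytes, since both seek to offset 2 * sizeof(struct record).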
If you're going in for really big files whose 'records' might even be
variable-length, use a secondary index file(s) whose records are fixed length
and "point" to the address of their associated data records in the big file.
Common datafile index methods are B-tree and ISAM.
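The simplest form of such an index, a flat file of fixed-length entries rather than a B-tree or ISAM structure, might look like the sketch below. It assumes new data records are always appended to the end of the data file and that index entries are numbered from 0; struct index_entry, put_record and get_record are hypothetical names, not taken from any real DBMS.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fixed-length index entry: where one data record lives. */
struct index_entry {
    long offset;   /* byte offset of the record in the data file */
    long length;   /* length of the record in bytes */
};

/* Append a variable-length record to the data file and store its
 * location as entry `n` (counting from 0) of the index file.
 * Returns 0 on success, -1 on error. */
int put_record(int data_fd, int index_fd, long n, const void *buf, long len)
{
    struct index_entry e;
    off_t end;

    assert(n >= 0 && len >= 0);
    end = lseek(data_fd, 0, SEEK_END);      /* new record goes at the end */
    if (end == (off_t)-1)
        return -1;
    if (write(data_fd, buf, (size_t)len) != (ssize_t)len)
        return -1;

    e.offset = (long)end;                   /* assumes the offset fits in a long */
    e.length = len;
    if (lseek(index_fd, (off_t)n * (off_t)sizeof e, SEEK_SET) == (off_t)-1)
        return -1;
    return write(index_fd, &e, sizeof e) == (ssize_t)sizeof e ? 0 : -1;
}

/* Fetch record `n` through the index, copying at most `cap` bytes into buf.
 * Returns the number of bytes copied, or -1 on error. */
long get_record(int data_fd, int index_fd, long n, void *buf, long cap)
{
    struct index_entry e;
    long want;

    assert(n >= 0 && cap >= 0);
    if (lseek(index_fd, (off_t)n * (off_t)sizeof e, SEEK_SET) == (off_t)-1)
        return -1;
    if (read(index_fd, &e, sizeof e) != (ssize_t)sizeof e)
        return -1;
    want = e.length < cap ? e.length : cap;
    if (lseek(data_fd, (off_t)e.offset, SEEK_SET) == (off_t)-1)
        return -1;
    if (read(data_fd, buf, (size_t)want) != (ssize_t)want)
        return -1;
    return want;
}
```

Because every index entry has the same size, finding entry n is the same multiply-and-seek used for fixed-length records; only the data file holds variable-length material.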
And if you're REALLY concerned about efficiency and your OS version permits
it, go for either the FFS or a 4K or 8K filesystem which could even be a
separate mount and dedicated to DBMS applications. Some "database" vendors
have claimed they've written their own filesystems due to perceived problems
with UNIX' filesystems, but I haven't seen the need for that even with some
of the humongous data files with which I operate. And a custom file system
means you're going to need a custom backup-and-restore facility and the
attendant special procedures.
Many standard filesystems use a logical block size of 1K, 2K or 4K. This means
the smallest allocated space for a given file (ignoring sparse files) would be
that size. It also means that for small files you may end up with a lot of
"wasted" space at the end of each file. A 1K block size, for example, means
each logical block comprises two 512-byte real sectors.
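The arithmetic behind that "wasted" space can be sketched as follows, assuming whole-block allocation only (no fragments, no indirect blocks, no sparse files). The function names are hypothetical.

```c
#include <assert.h>

/* Space allocated for a file of `size` bytes when the filesystem hands
 * out whole logical blocks of `block_size` bytes (round up to a block). */
long allocated_bytes(long size, long block_size)
{
    assert(size >= 0 && block_size > 0);
    return ((size + block_size - 1) / block_size) * block_size;
}

/* Unused ("wasted") bytes at the end of the file's last block. */
long slack_bytes(long size, long block_size)
{
    return allocated_bytes(size, block_size) - size;
}
```

Under these assumptions a 100-byte file occupies 1024 bytes on a 1K filesystem and 4096 bytes on a 4K one, so the slack grows with the block size when most files are small.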
Stick with the "standard" software and tools for greater portability, and
switch to custom methods only if the specific case warrants it. With today's
modern UNIX systems and fast I/O subsystems you may be pleasantly surprised.
One final comment: you used the word "portable" often. If that is of concern,
then you may wish to store your numeric data in ASCII form even though there
is a conversion penalty. To move binary data files amongst systems such as
a 386/486 and 680x0 and SPARC and MIPS and VAX and ... is asking for trouble,
even for integer data.
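One way to store a number in ASCII form is a fixed-width decimal field, which keeps records fixed-length and is independent of byte order and word size. This is only a sketch: the 11-character width (sign plus 10 digits, enough for a 32-bit value) and the names encode_long and decode_long are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define FIELD_WIDTH 11   /* hypothetical: sign plus 10 decimal digits */

/* Format `value` as a zero-padded, signed decimal field of FIELD_WIDTH
 * characters plus a terminating NUL. Returns `field` for convenience. */
char *encode_long(long value, char field[FIELD_WIDTH + 1])
{
    /* Restrict to a symmetric 32-bit range so the field never overflows. */
    assert(value >= -2147483647L && value <= 2147483647L);
    snprintf(field, FIELD_WIDTH + 1, "%+011ld", value);
    return field;
}

/* Parse a field produced by encode_long back into a long. */
long decode_long(const char *field)
{
    return strtol(field, NULL, 10);
}
```

Only the first FIELD_WIDTH characters would be written to the data file; the conversion in each direction is the penalty mentioned above.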
Thad Floryan [ thad@cup.portal.com (OR) ..!sun!portal!cup.portal.com!thad ]
gwyn@smoke.brl.mil (Doug Gwyn) (11/05/90)
In article <125379@linus.mitre.org> duncant@mbunix.mitre.org (Thomson) writes:
>I understand that, on unix, the file system is designed so that a file always
>looks like a sequence of bytes, with no record structure at all.

To be more precise, the operating system itself does not impose any
record structure on disk files within the standard hierarchical file
system. Some device types, for example magnetic tape or punched-card
reader, might have their own idea of what constitutes a "record"
(normally each such record would have a length specified by the UNIX
write() system call that provided its data, in the case of magnetic
tape, or a particular fixed length, for a card reader). Also, the
terminal handler under typical operation collects input from a terminal
port up through a new-line and treats it in many respects as a
(variable-length) record, although in this case partial, kernel-buffered
reads are fully supported.

>If so, how does one implement an efficient database manager on unix in
>a standard, portable, way? To be efficient, a database manager needs to
>have random access into files on a record-oriented basis. It seems to me
>that fseek() wouldn't do the job.

For normal disk files, applications are responsible for maintaining
whatever structure they wish to use. Clearly, lseek() is suitable for
getting directly to any known position within the file; if a fixed
record size is assumed, then the arithmetic for the byte offset is
trivial. For variable-sized records, a variety of organizations are
possible. (In fact, this is a big win for the UNIX approach.) A
typical one uses a separate "index file" with fixed, small record size
that points into a large variable-sized record database file. B-trees
and other structures are also commonly used.

>If unix doesn't provide a record-oriented view of files, then any database
>implementation would have to go below unix, and access the mass storage
>devices directly.

No, not at all, although a couple of database managers do support that
mode in order to bypass the kernel overhead for the block-buffered
inode-based file system.
rwhite@nusdecs.uucp (0257014-Robert White(140)) (11/06/90)
In article <125379@linus.mitre.org> duncant@mbunix.mitre.org (Thomson) writes:
>I understand that, on unix, the file system is designed so that a file always
>looks like a sequence of bytes, with no record structure at all.
>Is this correct?

You are correct.

> [How do you do dbms and "records" question here]

Counting "records" starting from zero, you fseek() to
(record_num * record_size) and then you get the data by reading
record_size bytes all at once.

In general fact (excepting some IBM systems and the like) this is what
every "record oriented" operating system does for you every time.

Putting the record-type-info in the programs that use the data is
"arguably better" because 'most' filing constructs do not need the
overhead (in storage and wasted processing) associated with
record-oriented storage.

e.g. of the following filing constructs
	Text File
	Flat File
	Program Executable
	Library File
	Archive File
	Directory
	DBMS Data File
	Index File (arguably a trivial case of the above)
only the last three are really(tm) fixed-length/known-variable-length
record structured files.

Rob.
samlb@pioneer.arc.nasa.gov (Sam Bassett RCS) (11/06/90)
In article <125379@linus.mitre.org> duncant@mbunix.mitre.org (Thomson) writes:
>then any database implementation
>would have to go below unix, and access the mass storage devices directly.

Sybase (and I believe Oracle) does exactly this -- raw partitions
are set aside and manipulated by the db manager, outside of the UNIX
file structure.

>Is this right?

Yes, if you mean "is that so"; as to whether it is the most correct
thing to do, I haven't the expertise to answer . . .

Sam'l Bassett, Sterling Software @ NASA Ames Research Center,
Moffett Field CA 94035 Work: (415) 604-4792; Home: (415) 969-2644
samlb@well.sf.ca.us samlb@ames.arc.nasa.gov
<Disclaimer> := 'Sterling doesn't _have_ opinions -- much less NASA!'
drd@siia.mv.com (David Dick) (11/08/90)
In <125379@linus.mitre.org> duncant@mbunix.mitre.org (Thomson) writes:
>I'm curious about something:
>I understand that, on unix, the file system is designed so that a file always
>looks like a sequence of bytes, with no record structure at all.
>Is this correct?

Yes.

>If so, how does one implement an efficient database manager on unix in
>a standard, portable, way? To be efficient, a database manager needs to
>have random access into files on a record-oriented basis.

Why does a database manager need record-level access?
Is the calculation of the address of a record going to be
more efficient in supervisor mode than in user mode?
Doesn't a database manager want to decide exactly where
data will be located and how it will be accessed?
If the OS provides record-level access *it* will be deciding
these things.

>that fseek() wouldn't do the job. (Am I wrong here?) If unix doesn't
>provide a record-oriented view of files, then any database implementation
>would have to go below unix, and access the mass storage devices directly.
>Is this right?

I don't think this is right. Lack of record-level access
does not imply the necessity for direct access. Direct access
should be used anyway, for efficiency. Besides, UNIX *does*
allow access to raw disk, which can be done in a portable way.
That's what the feature is for.

>I know there are database managers for unix, so there must be ways to
>do it....

The real reason to use "raw" disk is to avoid the logical-to-physical
mapping that UNIX filesystem code does in order to provide the
array-of-bytes model for a file. That mapping provides some
overhead that can be avoided by using the raw disk.

I don't think a good DBMS implementation should depend on an
OS record access implementation; it should access a disk as
directly as it can, so whether the OS provides record-level
access is irrelevant, IMHO.

David Dick
Software Innovations, Inc. [the Software Moving Company (sm)]
rob@b15.INGR.COM (Rob Lemley) (11/09/90)
In <125379@linus.mitre.org> duncant@mbunix.mitre.org (Thomson) writes:
:I understand that, on unix, the file system is designed so that a file always
:looks like a sequence of bytes, with no record structure at all.
:If so, how does one implement an efficient database manager on unix in
:a standard, portable, way? To be efficient, a database manager needs to
:have random access into files on a record-oriented basis. It seems to me
:that fseek() wouldn't do the job. (Am I wrong here?)
yes
: If unix doesn't
:provide a record-oriented view of files, then any database implementation
:would have to go below unix, and access the mass storage devices directly.
:Is this right?
Absolutely not. In fact, relational databases have been implemented on
UNIX which make extensive use of shell scripts.
A good book on this subject is:
UNIX Relational Database Management
(Application Development in the UNIX Environment)
by Rod Manis, Evan Schaffer, and Robert Jorgenson.
Prentice-Hall 1988.
Rob
--
Rob Lemley
System Consultant, Scanning Software, Intergraph, Huntsville, AL
rcl@b15.ingr.com OR ...!uunet!ingr!b15!rob
205-730-1546