[comp.unix.questions] unix file structure

duncant@mbunix.mitre.org (Thomson) (11/05/90)

I'm curious about something:

I understand that, on unix, the file system is designed so that a file always
looks like a sequence of bytes, with no record structure at all.

Is this correct?

If so, how does one implement an efficient database manager on unix in
a standard, portable, way?  To be efficient, a database manager needs to
have random access into files on a record-oriented basis.  It seems to me
that fseek() wouldn't do the job.  (Am I wrong here?)  If unix doesn't
provide a record-oriented view of files, then any database implementation 
would have to go below unix, and access the mass storage devices directly.

Is this right?

I know there are database managers for unix, so there must be ways to
do it....

I'm just curious about this, not planning to write
a huge efficient database manager for unix or anything...


--
(Please excuse the typos and garbage caused by line noise.)

cpcahil@virtech.uucp (Conor P. Cahill) (11/05/90)

In article <125379@linus.mitre.org> duncant@mbunix.mitre.org (Thomson) writes:
>I understand that, on unix, the file system is designed so that a file always
>looks like a sequence of bytes, with no record structure at all.

No *system imposed* record structure.

>If so, how does one implement an efficient database manager on unix in
>a standard, portable, way?  To be efficient, a database manager needs to

By having an application imposed record structure.

>have random access into files on a record-oriented basis.  It seems to me
>that fseek() wouldn't do the job.  

Most UNIX DBMSs will use read/write/lseek as opposed to the stdio functions
to ensure that the stdio buffering does not get in the way.
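
A minimal sketch of that approach, assuming fixed-length records counted
from 0.  The names struct rec, open_recfile, put_rec and get_rec are made up
for illustration and are not taken from any particular DBMS:

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fixed-length record; the layout is illustrative only. */
struct rec {
    long id;
    char name[24];
};

/* Open (creating if needed) a record file for unbuffered read/write. */
int open_recfile(const char *path)
{
    return open(path, O_RDWR | O_CREAT, 0644);
}

/* Byte offset of record recno, counting records from 0. */
static off_t rec_off(long recno)
{
    assert(recno >= 0);
    return (off_t)recno * (off_t)sizeof(struct rec);
}

/* Write one record in place.  Returns 0 on success, -1 on error. */
int put_rec(int fd, long recno, const struct rec *r)
{
    if (lseek(fd, rec_off(recno), SEEK_SET) == (off_t)-1)
        return -1;
    return write(fd, r, sizeof *r) == (ssize_t)sizeof *r ? 0 : -1;
}

/* Read one record.  Returns 0 on success, -1 on error or past end of file. */
int get_rec(int fd, long recno, struct rec *r)
{
    if (lseek(fd, rec_off(recno), SEEK_SET) == (off_t)-1)
        return -1;
    return read(fd, r, sizeof *r) == (ssize_t)sizeof *r ? 0 : -1;
}
```

Each call goes straight to the kernel, so there is no stdio buffer that can
hold a stale copy of a record another process has since rewritten.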

>provide a record-oriented view of files, then any database implementation 
>would have to go below unix, and access the mass storage devices directly.
>Is this right?

Nope.  It would only have to impose its own record structure on the file.


-- 
Conor P. Cahill            (703)430-9247        Virtual Technologies, Inc.,
uunet!virtech!cpcahil                           46030 Manekin Plaza, Suite 160
                                                Sterling, VA 22170

thad@cup.portal.com (Thad P Floryan) (11/05/90)

duncant@mbunix.mitre.org (Thomson) in <125379@linus.mitre.org> writes:

	I understand that, on unix, the file system is designed so that a file
	always looks like a sequence of bytes, with no record structure at
	all.  Is this correct?

YES, thank goodness!  Contrast that UNIX view of a "file" to that on, say,
VAX/VMS where you find eleventy-seven RMS file types that complicate efficient
and portable I/O beyond belief.  I have a commercial product in that market,
and I'm now porting it to UNIX, so this is not idle speculation.

	If so, how does one implement an efficient database manager on unix in
	a standard, portable, way?  To be efficient, a database manager needs
	to have random access into files on a record-oriented basis.  It seems
	to me that fseek() wouldn't do the job.  (Am I wrong here?) If unix
	doesn't provide a record-oriented view of files, then any database
	implementation would have to go below unix, and access the mass
	storage devices directly.  Is this right?

One can impose any "view" on the file one desires.  Assuming fixed-length
'records' and no funny-stuff at the beginning of the file, a typical method
to calculate any record's relative address in the file could be:

	address = (record_number - 1) * sizeof(record_structure);

and that "address" would be used per "lseek(fd, (long)address, 0);".  See
the writeup of lseek(2) for the meaning of its 3rd parameter which provides
some interesting options.  Of course, a real DBMS could be "smarter" and
calculate a block address instead, (possibly) map that into memory, and
then calculate the record's in-core offset from the beginning of that buffer.
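
The "smarter" block-oriented variant might look roughly like the sketch
below.  It uses the 1-based formula above, reads the containing block with
read() rather than mapping it, and assumes the record size divides the block
size so no record straddles two blocks.  BLOCK_SIZE, REC_SIZE and
fetch_via_block are illustrative names and values, not from the post:

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 4096L   /* assumed block size; match it to the filesystem */
#define REC_SIZE   64L     /* assumed record size; must divide BLOCK_SIZE */

/* Copy record record_number (counting from 1) into out, which must hold
 * REC_SIZE bytes.  Returns 0 on success, -1 on error or if the record
 * lies beyond the end of the file. */
int fetch_via_block(int fd, long record_number, char *out)
{
    static char block[BLOCK_SIZE];
    long address, block_no, in_core;
    ssize_t got;

    assert(record_number >= 1);
    address  = (record_number - 1) * REC_SIZE;  /* byte address in the file */
    block_no = address / BLOCK_SIZE;            /* which block holds it */
    in_core  = address % BLOCK_SIZE;            /* offset inside that block */

    if (lseek(fd, (off_t)block_no * BLOCK_SIZE, SEEK_SET) == (off_t)-1)
        return -1;
    got = read(fd, block, sizeof block);
    if (got < 0 || got < in_core + REC_SIZE)
        return -1;
    memcpy(out, block + in_core, REC_SIZE);
    return 0;
}
```

A real implementation would keep several blocks cached instead of rereading
one on every call; this only shows the address arithmetic.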

If you're going in for really big files whose 'records' might even be
variable-length, use a secondary index file(s) whose records are fixed length
and "point" to the address of their associated data records in the big file.
Common datafile index methods are B-tree and ISAM.
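
As a rough sketch of the index-file idea only (a flat index in arrival
order, not a B-tree or ISAM), assuming two already-open descriptors; the
names idx_entry, append_rec and fetch_rec are hypothetical:

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fixed-length index entry: where one data record lives. */
struct idx_entry {
    long offset;   /* byte offset of the record in the data file */
    long length;   /* length of the record in bytes */
};

/* Append a variable-length record to the data file and a matching entry
 * to the index file.  Returns the new record number (from 0), or -1. */
long append_rec(int data_fd, int idx_fd, const void *buf, long len)
{
    struct idx_entry e;
    off_t data_end = lseek(data_fd, 0, SEEK_END);
    off_t idx_end  = lseek(idx_fd, 0, SEEK_END);

    assert(len >= 0);
    if (data_end == (off_t)-1 || idx_end == (off_t)-1)
        return -1;
    e.offset = (long)data_end;
    e.length = len;
    if (write(data_fd, buf, (size_t)len) != (ssize_t)len)
        return -1;
    if (write(idx_fd, &e, sizeof e) != (ssize_t)sizeof e)
        return -1;
    return (long)(idx_end / (off_t)sizeof e);
}

/* Fetch record recno through the index.  Returns its length, or -1 on
 * error, unknown record, or if it does not fit in buf (cap bytes). */
long fetch_rec(int data_fd, int idx_fd, long recno, void *buf, long cap)
{
    struct idx_entry e;

    assert(recno >= 0);
    if (lseek(idx_fd, (off_t)recno * (off_t)sizeof e, SEEK_SET) == (off_t)-1)
        return -1;
    if (read(idx_fd, &e, sizeof e) != (ssize_t)sizeof e)
        return -1;
    if (e.length > cap)
        return -1;
    if (lseek(data_fd, (off_t)e.offset, SEEK_SET) == (off_t)-1)
        return -1;
    if (read(data_fd, buf, (size_t)e.length) != (ssize_t)e.length)
        return -1;
    return e.length;
}
```

The index entries are fixed length, so finding entry N is the same simple
multiplication as before; only the data file holds variable-length records.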

And if you're REALLY concerned about efficiency and your OS version permits
it, go for either the FFS or a 4K or 8K filesystem which could even be a
separate mount and dedicated to DBMS applications.  Some "database" vendors
have claimed they've written their own filesystems due to perceived problems
with UNIX's filesystems, but I haven't seen the need for that even with some
of the humongous data files with which I operate.  And a custom file system
means you're going to need a custom backup-and-restore facility and the
attendant special procedures.

Many standard filesystems use a block size of 1K, 2K or 4K.  This means the
smallest allocated space for a given file (ignoring sparse files) would be
that size.  It also means that for small files you may end up with a lot of
"wasted" space at the end of each file.  A 1K block size, for example, means
each logical block comprises two 512-byte real sectors.
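
The arithmetic behind that "wasted" space can be sketched as follows.  This
assumes space is handed out in whole blocks and ignores sparse files,
indirect blocks and fragment schemes; the function names are illustrative:

```c
#include <assert.h>

/* Bytes a file of file_size bytes occupies when space is allocated in
 * whole blocks of block_size bytes (size rounded up to a block multiple). */
long allocated_bytes(long file_size, long block_size)
{
    assert(file_size >= 0 && block_size > 0);
    return (file_size + block_size - 1) / block_size * block_size;
}

/* Bytes allocated but unused at the end of the file. */
long wasted_bytes(long file_size, long block_size)
{
    return allocated_bytes(file_size, block_size) - file_size;
}
```

For a 100-byte file on a 1K filesystem this gives 1024 bytes allocated and
924 bytes unused; the same file on a 4K filesystem leaves 3996 bytes unused.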

Stick with the "standard" software and tools for greater portability, and
switch to custom methods only if the specific case warrants it.  With today's
modern UNIX systems and fast I/O subsystems you may be pleasantly surprised.

One final comment: you used the word "portable" often.  If that is of concern,
then you may wish to store your numeric data in ASCII form even though there
is a conversion penalty.  To move binary data files amongst systems such as
a 386/486 and 680x0 and SPARC and MIPS and VAX and ... is asking for trouble,
even for integer data.
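
One possible way to do the ASCII storage, sketched under the assumption that
values fit in 32 bits and that a fixed-width field is wanted so records stay
fixed length.  FIELD_WIDTH, put_ascii_long and get_ascii_long are invented
names, and snprintf() requires a C99 or later library:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FIELD_WIDTH 11   /* sign plus 10 digits: enough for any 32-bit value */

/* Store value as a zero-padded, signed decimal field of FIELD_WIDTH bytes
 * (no terminating NUL in the field).  Returns 0, or -1 if it will not fit. */
int put_ascii_long(char *field, long value)
{
    char tmp[FIELD_WIDTH + 1];
    int n = snprintf(tmp, sizeof tmp, "%+0*ld", FIELD_WIDTH, value);

    if (n != FIELD_WIDTH)
        return -1;
    memcpy(field, tmp, FIELD_WIDTH);
    return 0;
}

/* Convert a FIELD_WIDTH-byte decimal field back to a long. */
long get_ascii_long(const char *field)
{
    char tmp[FIELD_WIDTH + 1];

    memcpy(tmp, field, FIELD_WIDTH);
    tmp[FIELD_WIDTH] = '\0';
    return strtol(tmp, NULL, 10);
}
```

The field reads the same on any byte order or word size, at the cost of the
conversion on every access and 11 bytes on disk instead of 4.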

Thad Floryan [ thad@cup.portal.com (OR) ..!sun!portal!cup.portal.com!thad ]

gwyn@smoke.brl.mil (Doug Gwyn) (11/05/90)

In article <125379@linus.mitre.org> duncant@mbunix.mitre.org (Thomson) writes:
>I understand that, on unix, the file system is designed so that a file always
>looks like a sequence of bytes, with no record structure at all.

To be more precise, the operating system itself does not impose any record
structure on disk files within the standard hierarchical file system.
Some device types, for example magnetic tape or punched-card reader, might
have their own idea of what constitutes a "record" (normally each such
record would have a length specified by the UNIX write() system call that
provided its data, in the case of magnetic tape, or a particular fixed
length, for a card reader).  Also, the terminal handler under typical
operation collects input from a terminal port up through a new-line and
treats it in many respects as a (variable-length) record, although in this
case partial, kernel-buffered reads are fully supported.

>If so, how does one implement an efficient database manager on unix in
>a standard, portable, way?  To be efficient, a database manager needs to
>have random access into files on a record-oriented basis.  It seems to me
>that fseek() wouldn't do the job.

For normal disk files, applications are responsible for maintaining
whatever structure they wish to use.  Clearly, lseek() is suitable for
getting directly to any known position within the file; if a fixed record
size is assumed, then the arithmetic for the byte offset is trivial.

For variable-sized records, a variety of organizations are possible.
(In fact, this is a big win for the UNIX approach.)  A typical one uses
a separate "index file" with fixed, small record size that points into
a large variable-sized record database file.  B-trees and other structures
are also commonly used.

>If unix doesn't provide a record-oriented view of files, then any database
>implementation would have to go below unix, and access the mass storage
>devices directly.

No, not at all, although a couple of database managers do support that mode
in order to bypass the kernel overhead for the block-buffered inode-based
file system.

rwhite@nusdecs.uucp (0257014-Robert White(140)) (11/06/90)

In article <125379@linus.mitre.org> duncant@mbunix.mitre.org (Thomson) writes:
>I understand that, on unix, the file system is designed so that a file always
>looks like a sequence of bytes, with no record structure at all.
>Is this correct?

You are correct.

> [How do you do dbms and "records" question here]

Counting "records" starting from zero, you fseek() to
	(record_num * record_size)

and then you get the data by reading record_size bytes all at once.
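
In stdio terms that could be sketched as below; read_record is a made-up
helper name, and the file is assumed to hold nothing but fixed-size records:

```c
#include <assert.h>
#include <stdio.h>

/* Read record record_num (counting from 0) of record_size bytes into buf.
 * Returns 0 on success, -1 on seek failure or short read. */
int read_record(FILE *fp, long record_num, size_t record_size, void *buf)
{
    assert(record_num >= 0);
    if (fseek(fp, record_num * (long)record_size, SEEK_SET) != 0)
        return -1;
    /* One fread() of a single record_size item: all or nothing. */
    return fread(buf, record_size, 1, fp) == 1 ? 0 : -1;
}
```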

In general (excepting some IBM systems and the like), this is in fact
what every "record oriented" operating system does for you every
time.  Putting the record-type-info in the programs that use the
data is "arguably better" because 'most' filing constructs do not
need the overhead (in storage and wasted processing) associated
with record-oriented storage.

e.g. of the following filing constructs

Text File
Flat File
Program Executable
Library File
Archive File
Directory
DBMS Data File
Index File (arguably a trivial case of the above)

only the last three are really(tm) fixed-length/known-variable-length
record structured files.

Rob.

samlb@pioneer.arc.nasa.gov (Sam Bassett RCS) (11/06/90)

In article <125379@linus.mitre.org> duncant@mbunix.mitre.org (Thomson) writes:
>then any database implementation 
>would have to go below unix, and access the mass storage devices directly.

	Sybase (and I believe Oracle) does exactly this -- raw partitions
are set aside and manipulated by the db manager, outside of the UNIX file
structure.

>Is this right?

	Yes, if you mean "is that so".  As to whether it is the most
correct thing to do, I haven't the expertise to answer . . .

Sam'l Bassett, Sterling Software @ NASA Ames Research Center, 
Moffett Field CA 94035 Work: (415) 604-4792;  Home: (415) 969-2644
samlb@well.sf.ca.us                     samlb@ames.arc.nasa.gov 
<Disclaimer> := 'Sterling doesn't _have_ opinions -- much less NASA!'

drd@siia.mv.com (David Dick) (11/08/90)

In <125379@linus.mitre.org> duncant@mbunix.mitre.org (Thomson) writes:

>I'm curious about something:

>I understand that, on unix, the file system is designed so that a file always
>looks like a sequence of bytes, with no record structure at all.

>Is this correct?

Yes.

>If so, how does one implement an efficient database manager on unix in
>a standard, portable, way?  To be efficient, a database manager needs to
>have random access into files on a record-oriented basis.

Why does a database manager need record-level access?  Is the calculation
of the address of a record going to be more efficient in supervisor
mode than in user mode?  Doesn't a database manager want to decide
exactly where data will be located and how it will be accessed?
If the OS provides record-level access *it* will be deciding these 
things.

>that fseek() wouldn't do the job.  (Am I wrong here?)  If unix doesn't
>provide a record-oriented view of files, then any database implementation 
>would have to go below unix, and access the mass storage devices directly.

>Is this right?

I don't think this is right.  Lack of record-level access does
not imply the necessity for direct access.  Direct access should
be used anyway, for efficiency.  Besides, UNIX *does* allow access to
raw disk, which can be done in a portable way.  That's what the
feature is for.

>I know there are database managers for unix, so there must be ways to
>do it....

The real reason to use "raw" disk is to avoid the logical-to-physical
mapping that UNIX filesystem code does in order to provide the
array-of-bytes model for a file.  That mapping adds some overhead
that can be avoided by using the raw disk.

I don't think a good DBMS implementation should depend on 
an OS record access implementation; it should access a disk 
as directly as it can, so whether the OS provides record-level
access is irrelevant, IMHO.

David Dick
Software Innovations, Inc. [the Software Moving Company (sm)]

rob@b15.INGR.COM (Rob Lemley) (11/09/90)

In <125379@linus.mitre.org> duncant@mbunix.mitre.org (Thomson) writes:

:I understand that, on unix, the file system is designed so that a file always
:looks like a sequence of bytes, with no record structure at all.

:If so, how does one implement an efficient database manager on unix in
:a standard, portable, way?  To be efficient, a database manager needs to
:have random access into files on a record-oriented basis.  It seems to me
:that fseek() wouldn't do the job.  (Am I wrong here?)

yes

:                                                       If unix doesn't
:provide a record-oriented view of files, then any database implementation 
:would have to go below unix, and access the mass storage devices directly.

:Is this right?

Absolutely not.  In fact, relational databases have been implemented on
UNIX which make extensive use of shell scripts.

A good book on this subject is:

	UNIX Relational Database Management
	(Application Development in the UNIX Environment)
	by Rod Manis, Evan Schaffer, and Robert Jorgenson.
	Prentice-Hall 1988.

Rob
--
Rob Lemley
System Consultant, Scanning Software, Intergraph, Huntsville, AL
rcl@b15.ingr.com    OR    ...!uunet!ingr!b15!rob
205-730-1546