[comp.unix.xenix] free form textual database

jbayer@ispi.UUCP (Jonathan Bayer) (01/06/89)

I have a need for a free form textual database for a Xenix system.  It
has to be able to store entries of arbitrary length (anywhere from a few
lines to several pages), and be indexed by at least a header, and if
possible by special key words located in the entries.  The key words in
the second case would be specified in a key word list; the database
should index any word in the entry which is in the key word list.

Additionally, it should be able to store data in a tree-structured
order, somewhat like the net news.


Any ideas?

Jonathan Bayer


-- 
Jonathan Bayer				"The time has come," the Walrus said...
Intelligent Software Products, Inc.	
19 Virginia Ave.				...uunet!ispi!jbayer
Rockville Centre, NY   11570	(516) 766-2867	jbayer@ispi

kinmonthprep@deneb.ucdavis.edu (Earl H. Kinmonth) (01/08/89)

In article <397@ispi.UUCP> jbayer@ispi.UUCP (Jonathan Bayer) writes:
>I have a need for a free form textual database for a Xenix system.  It
>has to be able to store entries of arbitrary length (anywhere from a few
>lines to several pages), and be indexed by at least a header, and if
>possible by special key words located in the entries.  The key words in
>the second case would be specified in a key word list, the database
>should index any word in the entry which is in the key word list.
>
>Additionally, it should be able to store data in a tree-structured
>order, somewhat like the net news.

I have such a data base.  It meets all of your criteria except tree-structure.
It uses hashed indexes instead.

Earl H. Kinmonth
History Department
University of California, Davis
Davis, California  95616
916-752-1636 (day: voice, night: fax)
916-752-0776 (secretary)
ucbvax!ucdavis!ucdked!cck (email)
cc-dnet.ucdavis.edu (request ucdked, login as guest)

jim@fsc2086.FSC.COM (Jim O'Connor) (01/08/89)

In article <397@ispi.UUCP>, jbayer@ispi.UUCP (Jonathan Bayer) writes:
> has to be able to store entries of arbitrary length (anywhere from a few
> lines to several pages), and be indexed by at least a header, and if
> possible by special key words located in the entries.  The key words in
> the second case would be specified in a key word list, the database
> should index any word in the entry which is in the key word list.
> 
> Additionally, it should be able to store data in a tree-structured
> order, somewhat like the net news.

Why not use the Xenix directory structure itself?  You could write a few
programs to accept the data (sounds like a WP file), create the header,
check for key words and add the file name to the key word index if any are 
found, and then store the whole thing in the appropriate directory.

By storing this data in files under the directory structure, you allow
yourself to use all of the existing Unix utilities to access this data.
Being able to use grep, awk, more (or less), etc. would come
in really handy when your users want to access the data.  You could
also use your favorite editor as the "data entry" program.
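
A minimal sketch of the indexing step described above, written in Python
rather than with the tools of the day.  It assumes each entry is a plain
text file whose first line is its header, and that the key word list is a
set of lower-case words.  The names index_entry and build_index are made
up for illustration.

```python
import os
import re

def index_entry(path, keywords, headers, index):
    """Record the header of one entry and any listed key words it contains."""
    with open(path) as f:
        text = f.read()
    # Assumption: the first line of every entry is its header.
    headers[path] = text.split("\n", 1)[0].strip()
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    # Only words that appear in the key word list are indexed.
    for word in words & keywords:
        index.setdefault(word, set()).add(path)

def build_index(root, keywords):
    """Walk the directory tree under root and index every file in it."""
    headers, index = {}, {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            index_entry(os.path.join(dirpath, name), keywords, headers, index)
    return headers, index
```

The directory tree supplies the tree-structured order, headers maps each
file to its header line, and index maps each key word to the set of files
that mention it.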

Sounds interesting, though.  I once thought of doing something similar as a
"report" management system, where electronic copies of all the reports
people generate around here could be stored in files, and then someone could
use "readrep" to read the reports.  Keys by topic, sender, key words, etc.
would have been useful.  If people got used to using it, it would cut down
some on the paper flow.

In reality, though, my "report" system could be done through e-mail or the news
by using special local groups.

--jim
------------- 
James B. O'Connor				jim@FSC.COM
Filtration Sciences Corp.			+1 615 821 4022 x651
105 W. 45th St. - Chattanooga, TN 37409

daveh@marob.MASA.COM (Dave Hammond) (01/09/89)

In article <379@fsc2086.FSC.COM> jim@fsc2086.FSC.COM (Jim O'Connor) writes:
>In article <397@ispi.UUCP>, jbayer@ispi.UUCP (Jonathan Bayer) writes:
>> has to be able to store entries of arbitrary length (anywhere from a few
>> lines to several pages), and be indexed by at least a header, [...]
>Why not use the Xenix directory structure itself?  You could write a few
>programs to accept the data (sounds like a WP file), create the header[...]
>By storing this data in files under the directory structure, you allow
>yourself to use all of the existing Unix utilities to access this data.
>Sounds like being able to use grep, awk, more (or less), etc. would come
>in real handy when your users would want to access the data.  You could
>also use your favorite editor as the "data entry" program.

I went this route on a project a few years ago, and was sorry later that
I did.  The advantage of data manipulation with standard tools was far
overshadowed by the tremendously inefficient disk usage.  Because of the
filesystem inode limit, we were bound to a maximum of ~16,000 inodes on
a 30 MB partition.  With the average database entry size under 1K, the
partition was effectively "filled" at ~16 MB, or about half capacity.

--
Dave Hammond
...!uunet!masa.com!{marob,dsix2}!daveh

jim@tiamat.FSC.COM (Jim O'Connor) (01/09/89)

In article <451@marob.MASA.COM>, daveh@marob.MASA.COM (Dave Hammond) writes:
> In article <379@fsc2086.FSC.COM> jim@fsc2086.FSC.COM (Jim O'Connor) writes:
  [ discussion of using the filesystem as the textual database ]
> 
> I went this route on a project a few years ago, and was sorry later that
> I did.  The advantage of data manipulation with standard tools was far
> overshadowed by the tremendously inefficient disk usage.  Because of the
> filesystem inode limit, we were bound to a maximum of ~16,000 inodes on
> a 30 MB partition.  With the average database entry size under 1K, the
> partition was effectively "filled" at ~16 MB, or about half capacity.

The number of inodes in a file system is configurable when you make the file
system.  Altos's utility to init new hard disks (and hence filesystems) asks
for the "number of bytes/inode" to use.  The default is 2048 (sounds like your
system) but can be set to 1024 if you want.  Or, if you use "mkfs"
directly, you can just specify the number of inodes that you want.

You are right about inefficient capacity usage with small files, though, since
space is allocated in block units. If you expect many small "records", a lot
of space is "wasted" in those partially filled blocks.
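
To put rough numbers on both points, here is a small Python calculation.
It assumes 1 MB = 1024 * 1024 bytes and a 1024-byte block size (check your
own system), and it ignores the space taken by the inode table itself.
The function names are only for illustration.

```python
MB = 1024 * 1024

def inode_count(partition_bytes, bytes_per_inode):
    """Number of inodes created at a given bytes-per-inode ratio."""
    return partition_bytes // bytes_per_inode

def slack_bytes(file_bytes, block_size=1024):
    """Bytes left unused in the last block of a file (block size assumed)."""
    blocks = -(-file_bytes // block_size)  # ceiling division
    return blocks * block_size - file_bytes

print(inode_count(30 * MB, 2048))  # 15360
print(inode_count(30 * MB, 1024))  # 30720
print(slack_bytes(300))            # 724
```

At 2048 bytes/inode a 30 MB partition gets about 15,000 inodes, which
matches the ~16,000 figure quoted above; halving the ratio doubles the
count.  A 300-byte record still occupies a whole block under these
assumptions, so most of that block is slack.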

Possible solution:
Have the program that "creates records" store "small" records together in a
single file, with some sort of index scheme to keep track of where each record
is.   When you need the record, extract it into a single file, process it, and
then perhaps put it back.  "ar" might be a candidate for this.
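
A rough sketch of that idea in Python (not "ar" itself): records are
appended to one data file, and a separate index file remembers the offset
and length of each.  The class name PackedStore, the JSON index format and
the file names in the usage note are assumptions made for this example.

```python
import json
import os

class PackedStore:
    """Keep many small records in one data file, located through an index."""

    def __init__(self, data_path, index_path):
        self.data_path = data_path
        self.index_path = index_path
        self.index = {}
        if os.path.exists(index_path):
            with open(index_path) as f:
                self.index = json.load(f)

    def put(self, key, text):
        """Append a record and point the index entry for key at it."""
        data = text.encode("utf-8")
        with open(self.data_path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(data)
        # Rewriting a key leaves the old copy behind as dead space.
        self.index[key] = [offset, len(data)]
        with open(self.index_path, "w") as f:
            json.dump(self.index, f)

    def get(self, key):
        """Read one record back using its stored offset and length."""
        offset, length = self.index[key]
        with open(self.data_path, "rb") as f:
            f.seek(offset)
            return f.read(length).decode("utf-8")
```

For example, store = PackedStore("records.dat", "records.idx") followed by
store.put("note1", "some text") and store.get("note1").  Because "put back"
is done by appending, a real version would need to compact the data file
from time to time.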

--jim
------------- 
James B. O'Connor			jim@FSC.COM
Filtration Sciences Corporation		615/821-4022 x. 651

root@chessene.UUCP (This System) (01/11/89)

In article <451@marob.MASA.COM>, daveh@marob.MASA.COM (Dave Hammond) writes:
% In article <379@fsc2086.FSC.COM> jim@fsc2086.FSC.COM (Jim O'Connor) writes:
% >In article <397@ispi.UUCP>, jbayer@ispi.UUCP (Jonathan Bayer) writes:
% >> has to be able to store entries of arbitrary length (anywhere from a few
% >> lines to several pages), and be indexed by at least a header, [...]
% >Why not use the Xenix directory structure itself?  You could write a few
% >programs to accept the data (sounds like a WP file), create the header[...]
% >...

% I went this route on a project a few years ago, and was sorry later that
% I did.  The advantage of data manipulation with standard tools was far
% overshadowed by the tremendously inefficient disk usage.  Because of the
% filesystem inode limit, we were bound to a maximum of ~16,000 inodes on
% a 30 MB partition.  With the average database entry size under 1K, the
% partition was effectively "filled" at ~16 MB, or about half capacity.

RTFM. I had the same problem with the spool filesystem running out of inodes
when I started getting news.

NAME
	mkfs - construct a file system
SYNOPSIS
	/etc/mkfs special sectors[:inodes] [gap sectors/cyl]
				  ^^^^^^^

Or were you running a version of UN*X even more brain-damaged than ours is?
(That can't be possible... :-)
--
Mark Buda                                Domain: hermit@chessene.uucp
Dumb: ...rutgers!bpa!vu-vlsi!devon!chessene!hermit
"Here, with a compressed air drill, parsnips are harvested." - an old newsreel