[comp.databases] Free Text Databases

kam@itivax.iti.org (Keith A. McNabb) (05/12/89)

Would anyone be able to recommend a good, efficient, and fairly
powerful FREE TEXT database for a PC-AT compatible (MS-DOS) or 
Sun/UNIX environment?  It should have no restrictions on record 
length and should allow pre-existing flat ASCII files to be 
easily incorporated.  At the same time, it should support the 
definition of various fields, so that searches may be more
selectively qualified, and it should support numeric operators.

I've learned that the PC version of BRS will do all of this.  
Is there anything better/slicker/more cost-effective out there?

Please excuse this query if the question has already come up in 
the past - I'm not a regular reader of this newsgroup.

--Thank you--

Keith McNabb
Industrial Technology Institute
kam@iti.org

fisher@sc2a.unige.ch (Markus Fischer) (05/30/89)

In article <1158@itivax.iti.org>, kam@itivax.iti.org (Keith A. McNabb) writes:
> Would anyone be able to recommend a good, efficient, and fairly
> powerful FREE TEXT database for a PC-AT compatible (MS-DOS) or 
> Sun/UNIX environment?  It should have no restrictions on record 
> length and should allow pre-existing flat ASCII files to be 
> easily incorporated.  At the same time, it should support the 
> definition of various fields, so that searches may be more
> selectively qualified, and it should support numeric operators.

It really depends on what you want to do with the data.  I now have some
experience with sequential `coded' databases, for which I use a dBASE-like
program, and with bibliographical databases, as I'm responsible for an
abstract service in the field of archaeology.

As I didn't find a good and cheap database system allowing character fields
with variable length (one field is the author : 20 to 200 char.; another
is the actual abstract : 1 to 100 lines of text ! ), I simply used the
word processor I was accustomed to : WordPerfect.  This might seem a little
strange, but it really has a subset of database functions : sort, extract
by conditions, numerical variables and modify structure...  Of course, one
doesn't expect full statistical functions, or even calculations...

The main problem is actually that the structure of the data cannot be defined,
which means that the user must be careful to put each field in the right place.
In other words, you will have trouble if several people are to enter or edit
the data.
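
To make the field-placement problem concrete, here is a minimal sketch,
assuming a hypothetical flat ASCII layout in which each record is a block
of `TAG: value` lines separated by a blank line.  The tags (AU, YR, AB) and
the function names are made up for illustration; this is not WordPerfect's
format or any product's.  It flags records with a missing field and then
runs a fielded search with a numeric operator.

```python
REQUIRED = ("AU", "YR", "AB")  # hypothetical field tags: author, year, abstract

def parse_records(text):
    """Split blank-line-separated blocks into dicts of tag -> value."""
    records = []
    for block in text.strip().split("\n\n"):
        rec = {}
        tag = None
        for line in block.splitlines():
            head, sep, rest = line.partition(":")
            if sep and head.strip() in REQUIRED:
                tag = head.strip()
                rec[tag] = rest.strip()
            elif tag is not None:
                # Continuation line: append to the field currently being read.
                rec[tag] += " " + line.strip()
        records.append(rec)
    return records

def missing_fields(rec):
    """List the required tags that a record does not carry."""
    return [t for t in REQUIRED if t not in rec]

SAMPLE = """AU: Smith, J.
YR: 1987
AB: Survey of a lake-side settlement,
    with notes on pottery.

AU: Jones, A.
AB: Record with the year left out.

AU: Brown, C.
YR: 1979
AB: Earlier excavation report."""

records = parse_records(SAMPLE)
# Records that would cause trouble when several people enter data.
problems = [(i, missing_fields(r)) for i, r in enumerate(records) if missing_fields(r)]
# Fielded search with a numeric operator: authors of entries from 1985 on.
recent = [r["AU"] for r in records if "YR" in r and int(r["YR"]) >= 1985]
```

A check of this kind only reports misplaced or missing fields; it does not
stop anyone from typing them in the wrong place to begin with.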

The main advantage is of course the quality of printouts : you are working
with a real word processor !  (i.e., the fields can contain formatting
codes...)

I hope I haven't annoyed anyone with these views; this wasn't meant to be
some kind of advertisement, either.  For the time being, I haven't found
anything as efficient as a `text-database' for text-processing...

Markus Fischer                -|--|--|--|--|--|--I   Department of Anthropology
                        -|--|--|--|--|--|--|-(#)-I   University of Geneva
      -|--|--|--|--|--|--|--|--|-(#)-|-(#)(#)(_)-I   CH-1227  Carouge (GE)
   -&-(_)-|--|--|-(#)-&--|-(#)(#)(_)(#)-&-(_)(#)-I   Switzerland
            -|--|--|--|--|-(#)(_)-|-(_)(_)(_)(#)-I
black (#) to kill ! --|--|-(#)(_)(_)(_)(#)(#)(_)(_)  fisher@sc2a.unige.ch
=+==+==+==+==+==+==+==+==+==+==+==+==+==+==+=(#)=+   fisher@cgeuge52.bitnet

bobd@bloom.UUCP (Bob Donaldson) (06/02/89)

In article <76@sc2a.unige.ch>, fisher@sc2a.unige.ch (Markus Fischer) writes:
> In article <1158@itivax.iti.org>, kam@itivax.iti.org (Keith A. McNabb) writes:
> > Would anyone be able to recommend a good, efficient, and fairly
> > powerful FREE TEXT database for a PC-AT compatible (MS-DOS) or 
> > Sun/UNIX environment?  It should have no restrictions on record 
> > length and should allow pre-existing flat ASCII files to be 
> > easily incorporated.  At the same time, it should support the 
> > definition of various fields, so that searches may be more
> > selectively qualified, and it should support numeric operators.
> 
> ...  I simply used the
> word processor I was accustomed to : WordPerfect.  This might seem a little
> strange, but it really has a subset of database functions : sort, extract
> by conditions, numerical variables and modify structure...  Of course, one
> doesn't expect full statistical functions, or even calculations...
> 
> The main problem is actually that the structure of the data cannot be defined,
> which means that the user must be careful to put each field in the right place.
> In other words, you will have trouble if several people are to enter or edit
> the data.
> 
> The main advantage is of course the quality of printouts : you are working
> with a real word processor !  (i.e., the fields can contain formatting
> codes...)

As a variation on this theme, I can suggest a hybrid.  Use your favorite
word processor to generate each "database" entry, then store the data in a
'real' DBMS - I would suggest Empress/32 (runs under DOS & Sun/UNIX), since
it handles large variable length fields quite well.  A little preprocessing
would both do some QA/QC on the data entry & data format, and also allow
the extraction of fixed-length fields in the database which could then be
indexed.  The wordprocessor files would be stored complete in a variable
length, unprocessed field (type = bulk in Empress).  This allows you to
include all of the formatting codes, etc.  I expect that other vendors have
similar capabilities, but check WHATEVER you choose carefully - I have found
a lot of un-documented or well-hidden limitations in the use of these
unstructured data types in some packages.
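
A rough sketch of the hybrid, with loud assumptions: SQLite (through
Python's standard sqlite3 module) stands in for the DBMS here, since I am
not reproducing Empress syntax, and the entry layout (author on the first
line, four-digit year on the second) is invented for the example.  The
preprocessing step does the QA and pulls out short indexable fields, and
the complete entry goes into an unprocessed BLOB column, formatting codes
and all.

```python
import sqlite3

def extract_fields(doc: bytes):
    """Hypothetical preprocessing: line 1 is the author, line 2 a 4-digit year."""
    lines = doc.decode("latin-1").splitlines()
    if len(lines) < 2:
        raise ValueError("entry needs an author line and a year line")
    author = lines[0].strip()[:40]  # fixed-length field: cut to 40 characters
    year = lines[1].strip()
    if not (year.isdigit() and len(year) == 4):
        raise ValueError("bad year: %r" % year)
    return author, int(year)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (author TEXT, year INTEGER, body BLOB)")
conn.execute("CREATE INDEX entry_author ON entry (author)")

def store(doc: bytes):
    """Validate the entry, then store extracted fields plus the raw bytes."""
    author, year = extract_fields(doc)
    conn.execute("INSERT INTO entry VALUES (?, ?, ?)", (author, year, doc))

store(b"Fischer, M.\n1989\n\x1b[bold]Abstract with formatting codes\x1b[/bold]\n")
store(b"Donaldson, B.\n1988\nShort entry\n")

# Search on the extracted field; the raw document comes back untouched.
rows = conn.execute("SELECT author, body FROM entry WHERE year >= 1989").fetchall()
```

Entries that fail the check are rejected before they reach the table,
which is where the QA/QC benefit comes from.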

-=-
Bob Donaldson              ...!cs.utexas.edu!natinst!radian!bobd
Radian Corporation                    ...!sun!texsun!radian!bobd
PO Box 201088       
Austin, TX  78720       (512) 454-4797

Views expressed are my own, not necessarily those of my employer.

ked@garnet.berkeley.edu (Earl H. Kinmonth) (06/02/89)

In article <76@sc2a.unige.ch> fisher@sc2a.unige.ch (Markus Fischer) writes:
>In article <1158@itivax.iti.org>, kam@itivax.iti.org (Keith A. McNabb) writes:
>> Would anyone be able to recommend a good, efficient, and fairly
>> powerful FREE TEXT database for a PC-AT compatible (MS-DOS) or 
>> Sun/UNIX environment?  It should have no restrictions on record 
>> length and should allow pre-existing flat ASCII files to be 
>> easily incorporated.  At the same time, it should support the 
>> definition of various fields, so that searches may be more
>> selectively qualified, and it should support numeric operators.

This wish list sounds like an advertisement for Bibliofile, a set of tools
I have written over several years.  The only difference is that it is not
MSDOS or **IX.  It is available on both.

Write me for a blurb describing the system.


Earl H. Kinmonth
History Department
University of California, Davis
Davis, California  95616
916-752-1636 (2300-0800 PDT for FAX)
916-752-0776 (secretary)
ucbvax!ucdavis!ucdked!cck (email)
cc-dnet.ucdavis.edu [128.120.2.251]
	(request ucdked, login as guest)

jordan@cs.columbia.edu (Jordan Hayes) (06/05/89)

Keith A. McNabb <kam@itivax.iti.org> asks:

	Would anyone be able to recommend a good, efficient, and fairly
	powerful FREE TEXT database for a PC-AT compatible (MS-DOS) or
	Sun/UNIX environment?

Have you seen TOPIC from Verity?  Runs on Sun/UNIX, VMS, MS-DOS, etc.,
and is pretty extensive.  Contact mcation@verity.com for more information
(hi mike!) ...

/jordan
#include <std/disclaimer.h>

paul@csnz.co.nz (Paul Gillingwater) (06/06/89)

In article <640@bloom.UUCP> bobd@bloom.UUCP (Bob Donaldson) writes:
+In article <76@sc2a.unige.ch>, fisher@sc2a.unige.ch (Markus Fischer) writes:
+> In article <1158@itivax.iti.org>, kam@itivax.iti.org (Keith A. McNabb) writes:
+> > Would anyone be able to recommend a good, efficient, and fairly
+> > powerful FREE TEXT database for a PC-AT compatible (MS-DOS) or 
+> > Sun/UNIX environment?  It should have no restrictions on record 
+> > length and should allow pre-existing flat ASCII files to be 
+> > easily incorporated.  At the same time, it should support the 
+> > definition of various fields, so that searches may be more
+> > selectively qualified, and it should support numeric operators.
+> 
+> ...  I simply used the
+> word processor I was accustomed to : WordPerfect.  This might seem a little
+> strange, but it really has a subset of database functions : sort, extract
+> by conditions, numerical variables and modify structure...  Of course, one
+> doesn't expect full statistical functions, or even calculations...
+> 
+> The main problem is actually that the structure of the data cannot be defined,
+> which means that the user must be careful to put each field in the right place.
+> In other words, you will have trouble if several people are to enter or edit
+> the data.
+> 
+> The main advantage is of course the quality of printouts : you are working
+> with a real word processor !  (i.e., the fields can contain formatting
+> codes...)
+
+As a variation on this theme, I can suggest a hybrid.  Use your favorite
+word processor to generate each "database" entry, then store the data in a
+'real' DBMS - I would suggest Empress/32 (runs under DOS & Sun/UNIX), since
+it handles large variable length fields quite well.  A little preprocessing

Hmm...  we are doing quite a bit of work with BRS/Search, which works
on many machines, from MS-DOS to Sun UNIX, DG/AOS etc.  The same files
can be used by the DOS and UNIX versions without conversion.  We have
the tools that can import Word Perfect documents, with all formatting
codes intact.  The advantage of BRS over a "classic" RDBMS like Empress
(hmm... she's come up in the world since she was "Mistress"! :-)
is the search engine - every single significant word is searchable,
because every word is added to a dictionary and indexed.
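
The general idea of indexing every significant word can be sketched as an
inverted index.  This is an illustration of the technique only, not how
BRS/Search is built internally; the stop-word list and the function names
are assumptions made for the example.

```python
import re
from collections import defaultdict

STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}  # illustrative list

def build_index(docs):
    """Map each significant word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            if word not in STOP_WORDS:
                index[word].add(doc_id)
    return index

def search(index, *words):
    """Return ids of documents containing every query word (boolean AND)."""
    sets = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()

docs = {
    1: "Excavation of a Neolithic settlement in the Alps",
    2: "Pottery of the Neolithic period",
    3: "The settlement patterns of Bronze Age farmers",
}
index = build_index(docs)
hits = search(index, "Neolithic", "settlement")  # only document 1 has both
```

A lookup touches only the entries for the query words, so search cost does
not depend on scanning the full text of every record.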

Field length is not a problem - "paragraphs" may be 64kb long, and there
is no loss of efficiency or wasted storage if you have one record with
20 bytes and another with 20 kb (which is a problem with the "classic"
approach).
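
The storage point can be checked with a little arithmetic.  The sketch
below compares a fixed-width layout, padded to the longest record, with a
length-prefixed variable layout.  Both layouts are assumptions chosen for
illustration; neither is the on-disk format of BRS or of any RDBMS.

```python
import struct

def pack_variable(records):
    """Length-prefixed layout: 4-byte big-endian length, then the bytes."""
    return b"".join(struct.pack(">I", len(r)) + r for r in records)

def unpack_variable(blob):
    """Walk the length prefixes to recover the original records."""
    out, pos = [], 0
    while pos < len(blob):
        (n,) = struct.unpack_from(">I", blob, pos)
        pos += 4
        out.append(blob[pos:pos + n])
        pos += n
    return out

def pack_fixed(records, width):
    """Fixed-width layout: every record padded with NULs to `width` bytes."""
    return b"".join(r.ljust(width, b"\0") for r in records)

records = [b"x" * 20, b"y" * 20 * 1024]       # one 20-byte, one 20 kb record
width = max(len(r) for r in records)          # fixed field must fit the largest
fixed_size = len(pack_fixed(records, width))  # 2 * 20480 = 40960 bytes
variable_size = len(pack_variable(records))   # 20 + 20480 + 2 * 4 = 20508 bytes
```

With these two records the fixed layout takes about twice the space, and
the gap widens as more short records share a field sized for the longest.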

Summary:  if you are working with large amounts of free text, use the
correct tool.  Sure, the license fee is a bit steep, but you get what
you pay for, and it's a solid product.

I like it because I use it, not because I sell it.

>would both do some QA/QC on the data entry & data format, and also allow
>the extraction of fixed-length fields in the database which could then be
>indexed.  The wordprocessor files would be stored complete in a variable
>length, unprocessed field (type = bulk in Empress).  This allows you to
>include all of the formatting codes, etc.  I expect that other vendors have
>similar capabilities, but check WHATEVER you choose carefully - I have found
>a lot of un-documented or well-hidden limitations in the use of these
>unstructured data types in some packages.


-- 
Paul Gillingwater, Computer Sciences of New Zealand Limited
Bang: ..!uunet!dsiramd!csnz!paul    Domain: paul@csnz.co.nz
Call Magic Tower BBS V21/23/22/22bis 24 hrs +0064 4 767 326

ked@garnet.berkeley.edu (Earl H. Kinmonth) (06/12/89)

>+> > Would anyone be able to recommend a good, efficient, and fairly
>+> > powerful FREE TEXT database for a PC-AT compatible (MS-DOS) or 
>+> > Sun/UNIX environment?  It should have no restrictions on record 

>Hmm...  we are doing quite a bit of work with BRS/Search, which works
>on many machines, from MS-DOS to Sun UNIX, DG/AOS etc.  The same files

>Summary:  if you are working with large amounts of free text, use the
>correct tool.  Sure, the license fee is a bit steep, but you get what
>you pay for, and it's a solid product.
>
>I like it because I use it, not because I sell it.

If "free text" and indexing on every word are the main criteria, allow
me to toot my own horn and recommend Bibliofile, a set of tools for
file management originally developed for Medieval Latin texts and
romanized Japanese bibliographies.

Bibliofile includes ked, an editor that mimics ex with an interface to
vi, kord, a sorting program with logic (if there's no author, sort on
title, if there's no title, sort on ______), kawk (a format interpreter
that is programmable in a subset of C), kwik (hashed searches on every
bloody word in a text), kref (plugs footnotes into manuscripts), kroff
(an nroff-style pretty printer that is an order of magnitude faster than
its namesake), etc.
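
The "sorting with logic" idea, falling back to the next field when one is
empty, can be sketched as a sort key.  This is not kord's implementation
or its syntax; the field names and the fallback order below are
assumptions for the example, and the chain stops at title.

```python
def fallback_key(record, fields=("author", "title")):
    """Return the first non-empty field value, lowercased; '' if none is set."""
    for name in fields:
        value = record.get(name, "").strip()
        if value:
            return value.lower()
    return ""

entries = [
    {"author": "Tanaka", "title": "Meiji Education"},
    {"author": "", "title": "Annals of the Abbey"},  # no author: use title
    {"title": "Zen Texts"},                          # author field absent
    {"author": "Bede", "title": "Historia"},
]
ordered = sorted(entries, key=fallback_key)
titles = [e["title"] for e in ordered]
```

Anonymous works are filed by title among the authors instead of piling up
at the front of the list.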

Bibliofile is currently priced at $00.00, but this may double in the
near future. Bibliofile is available as source and binary for:

MSDOS (XT, AT)

UNIX (4.2, 4.3) BSD

ULTRIX

UNIX (SUN)

SCO Xenix 286

For information write to the address given in the signature.  NOTE THAT THIS
IS NOT THE SAME ADDRESS YOU GET BY USING THE R COMMAND OF THE NEWSREADER.
IF YOU DO NOT HAVE BRAINS ENOUGH TO FOLLOW THIS INSTRUCTION, YOU PROBABLY
DON'T HAVE BRAINS ENOUGH TO USE BIBLIOFILE.  (If you think I'm being funny,
I'll show you my mail log; roughly one in two will ignore this message.)

Earl H. Kinmonth
History Department
University of California, Davis
916-752-1636 (voice, fax [2300-0800 PDT])
916-752-0776 secretary

ucbvax!ucdavis!ucdked!cck