[comp.databases] looking for text retrieval software

andy@garnet.berkeley.edu (Andy Lieberman) (08/31/89)

We're looking for a full text information retrieval package that runs on
UNIX (the more flavors the better).  The plan is to write our own user interface
(client) that communicates with a UNIX server that we also write.  The UNIX
server will take the client's search requests, pass them to the search
engine, and return the search results to the client.
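
A minimal sketch of that client / server / engine split, in Python. The
one-line "SEARCH <query>" protocol, the port number, and the `toy_engine`
stand-in are all assumptions made up for illustration; none of this reflects
the interface of BRS/Search, Topic, or any other product discussed here.

```python
import socketserver

def handle_request(line, engine):
    """Turn one request line into a reply string.

    Hypothetical protocol: the client sends "SEARCH <query>"; the reply is
    "OK <n>" followed by one document id per line, or "ERR <reason>".
    """
    verb, _, query = line.strip().partition(" ")
    if verb.upper() != "SEARCH" or not query.strip():
        return "ERR expected: SEARCH <query>\n"
    hits = engine(query.strip())      # hand the query to the search engine
    return "".join(["OK %d\n" % len(hits)] + ["%s\n" % h for h in hits])

def make_handler(engine):
    """Build a socketserver handler class bound to one engine callable."""
    class SearchHandler(socketserver.StreamRequestHandler):
        def handle(self):
            for raw in self.rfile:    # one request per line
                reply = handle_request(raw.decode("utf-8", "replace"), engine)
                self.wfile.write(reply.encode("utf-8"))
    return SearchHandler

def toy_engine(query):
    """Stand-in engine: ids of documents containing every query word."""
    docs = {"doc1": "full text retrieval on unix", "doc2": "sql databases"}
    words = query.lower().split()
    return sorted(d for d, text in docs.items()
                  if all(w in text.split() for w in words))

# To serve (port 7000 is arbitrary):
#   socketserver.TCPServer(("localhost", 7000),
#                          make_handler(toy_engine)).serve_forever()
```

Keeping `handle_request` free of socket code means the real engine's
programming interface only has to be wrapped in one callable, and the
protocol can be exercised without a network.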

The search engine must be able to do keyword boolean searches.  Word-proximity
and weighted searching are also of interest.
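
For readers unfamiliar with the three search styles named above, here is a
rough Python sketch over a positional inverted index. The function names, the
sample documents, and the scoring rule (weight times term frequency) are
assumptions chosen for illustration, not how any commercial package does it.

```python
import re
from collections import defaultdict

def build_index(docs):
    """Map each word to {doc_id: [word positions]} (a positional index)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
            index[word].setdefault(doc_id, []).append(pos)
    return index

def docs_with(index, word):
    """Set of document ids that contain the word."""
    return set(index.get(word.lower(), {}))

def near(index, a, b, distance):
    """Documents where a and b occur within `distance` words of each other."""
    hits = set()
    for doc_id in docs_with(index, a) & docs_with(index, b):
        pa, pb = index[a.lower()][doc_id], index[b.lower()][doc_id]
        if any(abs(x - y) <= distance for x in pa for y in pb):
            hits.add(doc_id)
    return hits

def weighted(index, weights):
    """Rank documents by the sum of weight * term frequency, best first."""
    scores = defaultdict(float)
    for word, weight in weights.items():
        for doc_id, positions in index.get(word.lower(), {}).items():
            scores[doc_id] += weight * len(positions)
    return sorted(scores, key=lambda d: (-scores[d], d))

docs = {
    1: "Full text retrieval for library catalogues",
    2: "Relational databases and SQL for library circulation",
    3: "Text compression; retrieval speed is a separate topic",
}
index = build_index(docs)
both = docs_with(index, "text") & docs_with(index, "retrieval")     # AND
either = docs_with(index, "sql") | docs_with(index, "retrieval")    # OR
not_sql = docs_with(index, "library") - docs_with(index, "sql")     # AND NOT
adjacent = near(index, "text", "retrieval", 1)                      # proximity
ranked = weighted(index, {"retrieval": 2.0, "library": 1.0})        # weighted
```

Boolean operators reduce to set operations on the posting lists, and
proximity needs the stored word positions. That is part of why indexes for
this kind of search can grow large.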

The search engine must have a good programming interface.  It would be nice if
there were a run-time version of the search engine available so we don't have
to buy a full package for every machine, although utilities for maintaining
the database are also a consideration.

I am currently looking at BRS/Search and Topic by Verity.  These both seem
acceptable, but I would like to know of any other choices on the market.
Comments about BRS/Search and Topic are also welcome (I already have their 
literature, so I'd be more interested in opinion than fact.)  Are there any 
journals or magazines I should be reading?  It seems that everything I hear
about is SQL...

Please mail responses and I'll post a summary.

Thanks,
Andy Lieberman
Library Systems Office
UC Berkeley

paul@csnz.co.nz (Paul Gillingwater) (09/06/89)

In article <1989Aug30.204014.27985@agate.uucp> andy@garnet.berkeley.edu (Andy Lieberman) writes:
>We're looking for a full text information retrieval package that runs on
>UNIX (the more flavors the better).  

It's easier for me to post -- replies bounce too often with many US
sites that refuse to use fully-qualified domains.

Disclaimer:  I work for a dealer that sells BRS/Search.

I believe that BRS/Search is probably a good choice for some
applications.  We have developed many applications that use it,
and find that it has a very good programming interface.  I'm less
happy with the documentation, although there have been big improvements
recently.  Where BRS/Search falls down is that it does not have
any relational links between records or strong subrecord structuring.

We have programmed around that limitation by writing an interface
between BRS/Search and Informix SQL, which gives the best of both
worlds, i.e. BRS is used for a full MARC catalogue, and Informix is
used for the circulation control system for our library package.
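
One way such a bridge could look, sketched in Python. This is an assumption
about the general shape only: `text_search` is a stand-in for the full-text
engine, sqlite3 stands in for Informix, and the `loans` table and its columns
are invented for the example.

```python
import sqlite3

def text_search(query):
    """Stand-in for the text engine: returns matching catalogue record ids."""
    catalogue = {101: "unix system administration",
                 102: "text retrieval systems",
                 103: "new zealand birds"}
    words = query.lower().split()
    return [rid for rid, title in sorted(catalogue.items())
            if all(w in title.split() for w in words)]

def loan_status(conn, record_ids):
    """Look up circulation rows for the record ids the text engine returned."""
    if not record_ids:
        return []
    marks = ",".join("?" * len(record_ids))      # one placeholder per id
    sql = ("SELECT record_id, borrower, due FROM loans "
           "WHERE record_id IN (%s) ORDER BY record_id" % marks)
    return conn.execute(sql, record_ids).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (record_id INTEGER, borrower TEXT, due TEXT)")
conn.executemany("INSERT INTO loans VALUES (?, ?, ?)",
                 [(101, "jones", "1989-10-01"), (102, "smith", "1989-09-20")])

hits = text_search("retrieval")        # full-text side picks the records
on_loan = loan_status(conn, hits)      # relational side supplies loan details
```

The shared record id is the only link between the two systems: the text
engine answers "which records match", and the SQL side answers structured
questions about those records.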

My only other grumble with BRS is its price -- I think it's a bit
high -- but now that BRS has been taken over by a publishing empire,
I think we may see some more aggressive marketing....

-- 
Paul Gillingwater, Computer Sciences of New Zealand Limited
Domain: paul@csnz.co.nz  Bang: uunet!vuwcomp!dsiramd!csnz!paul
Call Magic Tower BBS V21/23/22/22bis 24 hrs NZ+64 4 767 326
SpringBoard BBS for Greenies! V22/22bis/HST NZ+64 4 767 742

andy@garnet.berkeley.edu (Andy Lieberman) (09/09/89)

In article <1989Aug30.204014.27985@agate.uucp> I wrote:
>We're looking for a full text information retrieval package that runs on
>UNIX (the more flavors the better).  The plan is to write our own user interface
>(client) that communicates to a unix server that we also write.  The unix 
>server will take the clients search requests and communicate them to the 
>search engine and pass the search results back to the client.
>The search engine must be able to do keyword boolean searches.  Word-proximity
>searching, weighted searching are also of interest.
>

Here's what I got:

----------
From: billr@brspyr1.brs.com (Bill Rowe)
FYI.  We, here at BRS, are on the NET. (Big surprise, huh?) :-)

Since you already have documentation, you probably know that BRS/Search
does just about everything that you require, except provide information
on how to write an interface for our search engine.  Well, that's about to 
change:

My current project involves producing a manual which documents how to 
develop your own interface to the BRS Search Engine.  If you have questions
of a technical nature, email me and I'll do my best to get you the answer.

If you are interested in an advance copy of this manual, contact your Sales
Rep. (Whoever that is?).
-----------

The distributor I talked to was Main Street Software in New York,
(212)779-8398.  They also suggested ordering the MNS Reference Manual for $35.
MNS is their 4GL.  The manual explains everything that can be done through a 
C program interface.

-------------
From: rcsmith@anagld.berkeley.edu (Ray Smith)

   My company markets a hardware based search engine that you may find more
flexible than the two software packages you mentioned, depending on the
type of application you are working with and the amount of data. Our search
engine conducts a serial search of the data at disk throughput speeds. Since
it searches the data serially there is no need for the overhead associated
with indexing (typically the size of an index is 2-3 times the size of the
original text).
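
To illustrate the index-free approach in software terms, here is a small
Python sketch of a serial scan. It is an assumption about the general
technique only; the product described here does this in hardware, and the
function name and chunk size are invented for the example.

```python
import io

def serial_scan(stream, term, chunk_size=65536):
    """Yield the byte offset of every occurrence of `term`, reading the
    stream once from start to end with no index."""
    if not term:
        return
    keep = len(term) - 1      # bytes carried over between chunks so that a
    tail = b""                # match straddling a chunk boundary is found
    base = 0                  # stream offset of the first byte of `tail`
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        window = tail + chunk
        start = window.find(term)
        while start != -1:
            yield base + start
            start = window.find(term, start + 1)
        new_tail = window[-keep:] if keep else b""
        base += len(window) - len(new_tail)
        tail = new_tail

data = b"serial search: search the data serially"
offsets = list(serial_scan(io.BytesIO(data), b"search", chunk_size=8))
```

The trade-off is the one described above: nothing extra is stored and new
text is searchable immediately, but every query reads all of the data, so
speed is bounded by how fast the data can be streamed.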

   Since our product is hardware based we are limited in the number of
platforms we support. Currently the UNIX versions of our system are
supported on Sun 3's & Sun 4's, Solbourne and Gould PowerNodes systems.
With both the Sun and Solbourne systems we support an Ethernet based search
server where you can build a smart frontend on a number of different
platforms (PC's, MAC, etc.)

   For customization, our system comes with a full development library.

   Below is a copy of an article I posted to "comp.newprod" a while back.
I also have a product description if you would like me to send it to you.
You can reach me at the phone number listed in the .signature for more
information.

   In addition to the features listed below we have added a few new ones
recently.  These include support for Soundex queries, a technique to catch
typical typographical errors (handles four common errors), and a SunView
point-and-shoot interface.  We are currently working on a MAC HyperCard
interface and a MS/DOS interface as well as a point-and-shoot interface
under X Window.
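
For reference, a sketch of the classic American Soundex code in Python. It
shows what a Soundex query compares; it is not a description of how Textract
implements the feature, which the post does not say.

```python
# Letters that share a digit sound alike; vowels, h, w and y get no digit.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"],
                                start=1):
    for ch in letters:
        CODES[ch] = str(digit)

def soundex(word):
    """Return the 4-character Soundex code of a word, e.g. a letter + 3 digits."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue              # h and w do not separate repeated codes
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code               # a vowel resets prev, allowing a repeat
        if len(result) == 4:
            break
    return result.ljust(4, "0")   # pad short codes with zeros

def sounds_like(query, vocabulary):
    """Vocabulary words whose Soundex code equals that of the query."""
    target = soundex(query)
    return sorted(w for w in vocabulary if soundex(w) == target)
```

A Soundex query would then match any indexed word with the same code as the
query term, so "Rupert" and "Robert" retrieve each other.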

-Ray

-------------------------- Begin included text ----------------------------

	  "TEXTRACT" HIGH SPEED TEXT SEARCH AND RETRIEVAL SYSTEM

Analytics, Inc. is pleased to announce the availability and full technical
support of Textract, a hardware-based full text search and retrieval
engine.  Textract allows very high speed search and retrieval of ASCII
data stored in any format.

The user can query up to 100 information files at once, with natural
English language words and phrases, using up to 256 unique terms.  Both
search time and data file size are decreased by Textract's ability to
encode and compress data.

The Textract engine is capable of searching at speeds of 8 Megabytes per
second, although the effective search speed is limited by bus and disk
data transfer rates on host systems.  Textract currently runs on DEC
MicroVAX II's under VMS, Sun 3 & 4 under SunOS3.5 and SunOS4.0, and Gould
PowerNode platforms under UTX/32 R2.1.  Prices vary depending on host
computer.

Features include:
       -English language query with full Boolean logic operators
       -Single and multicharacter wild cards.
       -Numerical ranging
       -Vocabulary  and Thesaurus lookup
       -Ordered or unordered terms
       -Word proximity
       -ANSI terminal based menu interface
       -Complete "C" libraries for customization
       -Full Sun networking support including the use of NFS and the
	capability to use boards physically located on another machine
	if your local boards are busy.
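
Two of the listed features, wildcards and numerical ranging, can be sketched
in a few lines of Python. The `?` / `*` wildcard syntax and both function
names are assumptions for illustration; the announcement does not give
Textract's actual query syntax.

```python
import fnmatch
import re

def expand_wildcard(pattern, vocabulary):
    """Vocabulary words matching a pattern in which ? stands for exactly one
    character and * for any run of characters (assumed syntax).  Note that
    fnmatch also gives [ and ] a special meaning."""
    return sorted(w for w in vocabulary
                  if fnmatch.fnmatchcase(w, pattern.lower()))

def numbers_in_range(text, low, high):
    """Integers appearing in the text whose value lies in [low, high]."""
    return [int(n) for n in re.findall(r"\d+", text) if low <= int(n) <= high]

# Assumes the vocabulary is stored in lower case.
vocabulary = {"search", "searches", "searching", "serial", "seen", "soon"}
multi = expand_wildcard("search*", vocabulary)
single = expand_wildcard("s??n", vocabulary)
in_range = numbers_in_range("Sun 3, Sun 4, 8 Megabytes, 256 terms, 1989",
                            100, 2000)
```

Numerical ranging differs from plain term matching in that the digits are
compared by value, so "256" falls inside 100 to 2000 even though no query
term spells it out.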

For more information on Textract, feel free to contact Ray Smith at

	Analytics, Inc.         9891 Broken Land Pkwy.
	(301) 381-4300          Suite 200
				Columbia, MD 21046

	Email: rcsmith@anagld.UUCP
---------------------------------

From: Jkrueger <mtxinu!uunet.UU.NET!dgis!daitc.daitc.mil!jkrueger@ucbvax.berkeley.edu>
You can do these things with relational database management systems.
We're doing it.  There are advantages and disadvantages with respect
to text engines.

--------------------------------

From: paul@csnz.co.nz (Paul Gillingwater)
Disclaimer:  I work for a dealer that sells BRS/Search.

I believe that BRS/Search is probably a good choice for some
applications.  We have developed many applications that use it,
and find that it has a very good programming interface.  I'm less
happy with the documentation, although there have been big improvements
recently.  Where BRS/Search falls down is that it does not have
any relational links between records or strong subrecord structuring.

We have programmed around that limitation by writing an interface
between BRS/Search and Informix SQL, which gives the best of both
worlds, i.e. BRS is used for a full MARC catalogue, and Informix is
used for the circulation control system for our library package.

My only other grumble with BRS is its price -- I think it's a bit
high -- but now that BRS has been taken over by a publishing empire,
I think we may see some more aggressive marketing....

---------------------

I agree that the price seems a bit high...  That is one of the main reasons
I'm trying to find other packages to choose from.

--------------------

From: mtxinu!prlb.philips.be!sunbim!od@ucbvax.berkeley.edu (Olivier Declerfayt)

"TOPIC is the first full-text search and retrieval system designed for
networked computing environments. TOPIC allows you to store, manage,
and retrieve any document regardless of its format or location on a
network. Employing Concept-Based Retrieval technology, TOPIC features a
rule-based approach to searching documents by subject of interest and
presenting search results in relevance-ranked order. TOPIC can be used
not only for managing internal text databases, but also for managing
external information sources. The TOPIC Real-Time System can
automatically monitor any number of live sources of time sensitive
text information, such as newswire feeds, and classify and disseminate
selected information tailored to each individual user's interest
profile.

When used as a retrospective searching tool, TOPIC enables
organizations to manage and provide access to the increasing amounts
of unstructured text and image data, such as reports, manuals,
evaluations, and e-mail, that traditional database systems cannot. The
TOPIC SQL-Bridge allows TOPIC full-text databases to be integrated
with popular SQL-based RDBMS systems.

TOPIC, written in C, runs in heterogeneous networked environments of
computers and supports Digital VAX/VMS; UNIX-based workstations and
minicomputers, such as Sun Microsystems, Pyramid Series 9000 and
MIPS; and DOS and Xenix-based microcomputer systems. TOPIC is
available in two configurations: as a stand alone multi-user system
and as a network-based system. TOPIC supports most major file sharing
networks, including NFS, DecNet, Novell, 3Com, TOPS, Banyan, TCP/IP,
and ArcNet."

Verity's address: 1550 Plymouth Street, Mountain View, CA 94043-1230
                  Phone: (415) 960 7600, Fax: (415) 960 7698
---------------

Someone asked me if I had an e-mail address for Verity.  I did, but can't
find it.  Maybe someone from Verity could make themselves known...


Andy Lieberman, Library Systems Office, UC Berkeley