[comp.databases] Wildcard searches in dBASE -or- Clipper

tleylan@pegasus.com (Tom Leylan) (02/07/91)

In article <1991Feb6.002241.4960@oeo2.uucp> greinkem@oeo2.uucp (Mark Greinke) writes:
>I hope somebody out there can help me.  I am trying to implement a
>wildcard indexed search under dBASE III+ (eventually to be compiled 
>using Clipper, Summer '87).  Is there a way to do this?  
>
>Basically if the user inputs a search string such as "BOGOTA" 
>it would find valid matches in "AMEMBASSY BOGOTA" and 
>                               "AMERICAN EMBASSY BOGOTA" and
>                               "AMB BOGOTA"
>
>I know this can be done under Informix, but the customer wants all
>the work to be done in dBASE or Clipper.  If it cannot be done,
>can you point me to a manual or book which documents this fact.
>
Mark,

Clipper would offer the superior solution in many ways, most notably speed.

In order to do substring comparisons you would have to use the LOCATE
command, which must search every record until it finds a match, and of
course if there is no match it scans the entire file.  (This is just the
nature of substring searches.)  If you are willing to search "progressively"
on the key field instead, matches will be near instantaneous.  You could
type in "A" and have the A's listed on the screen, then type "M" and have
the AM's listed, and then "E" and have the AME's listed.  This is extremely
"cool" for the user since they narrow in on what they are looking for, and
if they spell something wrong ("AMERACAN") nothing is listed, so they delete
characters until they see the entry they're looking for.

That method eliminates the "enter a key", "search the key", "report on key"
cycle if somebody doesn't exactly know what they are looking for.
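
A minimal sketch of the progressive search in C, assuming the index can be
treated as a sorted array of key strings.  The function names are made up
for illustration; they are not Clipper or dBASE calls.  Each keystroke
lengthens the prefix, and a binary search finds where the matching keys
start, so no full scan of the file is needed.

```c
#include <stddef.h>
#include <string.h>

/* Return the index of the first key that is >= prefix, or n if none is.
   keys[] must already be sorted ascending, as an index would keep them. */
size_t first_at_or_after(const char *const keys[], size_t n, const char *prefix)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (strcmp(keys[mid], prefix) < 0)
            lo = mid + 1;       /* keys[mid] sorts before the prefix */
        else
            hi = mid;           /* keys[mid] is a candidate; look left */
    }
    return lo;
}

/* Count the keys that begin with prefix and store the position of the
   first candidate in *first.  Matching keys are contiguous because the
   array is sorted. */
size_t count_with_prefix(const char *const keys[], size_t n,
                         const char *prefix, size_t *first)
{
    size_t len = strlen(prefix);
    size_t i = first_at_or_after(keys, n, prefix);
    size_t count = 0;

    *first = i;
    while (i < n && strncmp(keys[i], prefix, len) == 0) {
        count++;
        i++;
    }
    return count;
}
```

With the three keys from the original question in sorted order, the prefix
"AME" narrows the list to two entries, and the misspelled "AMERA" narrows it
to none, which is the cue to delete a character.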

If the three "BOGOTA" entries you use refer to the same place then you'll
want to isolate the name (normalize the file) to eliminate the spelling
variations.  More questions?  Let me know.

tom leylan   tleylan@pegasus.com
ex-Senior Systems Analyst / Nantucket Corporation

tomr@dbase.A-T.COM (Tom Rombouts) (02/09/91)

In article <1991Feb6.002241.4960@oeo2.uucp> greinkem@oeo2.uucp (Mark Greinke) writes:
>I hope somebody out there can help me.  I am trying to implement a
>wildcard indexed search under dBASE III+ (eventually to be compiled 
>using Clipper, Summer '87).  Is there a way to do this?  
>
>Basically if the user inputs a search string such as "BOGOTA" 
>it would find valid matches in "AMEMBASSY BOGOTA" and 
>                               "AMERICAN EMBASSY BOGOTA" and
>                               "AMB BOGOTA"
>

For the above simple example, the dBASE "$" operator could be used
as such:

   "BOGOTA" $ <character expression> 

which would return .T. if "BOGOTA" were anywhere in the character
expression.  You could use UPPER() (or LOWER()) on either or
both expressions to ignore the case of the letters.
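
For anyone who ends up writing the test outside dBASE, here is a rough C
equivalent of a case-insensitive "$" test.  It is only a sketch: the
function name is hypothetical, and treating an empty search string as a
match is a choice made here, not a statement about what dBASE or Clipper do.

```c
#include <ctype.h>
#include <stddef.h>

/* Return 1 if needle occurs anywhere in haystack, ignoring letter case,
   otherwise 0.  An empty needle is treated as a match (a choice made for
   this sketch only). */
int contains_ignore_case(const char *needle, const char *haystack)
{
    size_t i, j;

    for (i = 0; ; i++) {
        /* Try to match the whole needle starting at haystack[i]. */
        for (j = 0; needle[j] != '\0'; j++) {
            unsigned char h = (unsigned char)haystack[i + j];
            unsigned char n = (unsigned char)needle[j];
            if (h == '\0' || toupper(h) != toupper(n))
                break;
        }
        if (needle[j] == '\0')
            return 1;           /* matched every needle character */
        if (haystack[i + j] == '\0')
            return 0;           /* haystack ended before the needle did */
    }
}
```

Like LOCATE with "$", this has to look at every character of every record
it is applied to, so it trades speed for flexibility.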

Of course, the _real_ way to do this would be to code up a
regular expression matching routine in C or .ASM, and link
it into either Clipper or the upcoming Ashton-Tate Professional
Compiler.  The source for GNU grep is readily available on the
net, and chapters 19 and 20 of Sedgewick's "Algorithms in C"
discuss the details of pattern matching theory.  (Less detailed,
but more readable than Knuth.)  Also, Blaise's "Power Search"
package provides an interesting example of this in .ASM code.
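
As a taste of what such a routine might look like, here is a small wildcard
matcher in C.  It is far less than a regular expression engine: it only
understands "*" (any run of characters) and "?" (any single character), and
the function name is invented for this sketch.  It is not taken from GNU
grep, Sedgewick, or the Blaise package.

```c
#include <stddef.h>

/* Return 1 if the whole of text matches pattern, else 0.
   '*' matches any run of characters (including none);
   '?' matches exactly one character. */
int wild_match(const char *pattern, const char *text)
{
    const char *star = NULL;    /* most recent '*' seen in pattern */
    const char *retry = NULL;   /* text position that '*' currently covers up to */

    while (*text != '\0') {
        if (*pattern == '*') {
            star = pattern++;   /* remember the star, try matching zero chars */
            retry = text;
        } else if (*pattern == '?' || *pattern == *text) {
            pattern++;          /* literal or single-character match */
            text++;
        } else if (star != NULL) {
            pattern = star + 1; /* mismatch: let the star absorb one more char */
            text = ++retry;
        } else {
            return 0;           /* mismatch and no star to fall back on */
        }
    }
    while (*pattern == '*')     /* trailing stars can match nothing */
        pattern++;
    return *pattern == '\0';
}
```

A pattern of "*BOGOTA" would accept all three names in the original
question, while "AM? BOGOTA" would accept only "AMB BOGOTA".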

Finally, for quick and dirty work, there are several DOS AWK
packages, as well as Friendly Finder, Ask Sam or other text
retrieval utilities that would likely work directly on a .DBF
type file.  (If not, you could first export the data to text
format.)

Hope this helps....


Tom Rombouts  Torrance 'Tater  tomr@ashtate.A-T.com  V:(213)538-7108

keithm@dbase.A-T.COM (Keith Mund) (02/12/91)

dBASE IV has a command, "SET NEAR on/OFF", that allows near matches in
databases. The dBASE IV 1.1 manual, on page 3-75, offers an example
similar to Tom's response.

I have worked with a number of existing applications that are operating
in both dBASE IV and all of the clones. Overall performance of dBASE IV
is excellent, and I find that the competitors' claims of speed are often
based on individual isolated instances rather than real-life use.

Keith Mund