[comp.sys.mac.hypercard] Large Scale DataBase

GFX@PSUVM.BITNET (09/09/88)

We use a rather large database (500,000 + records) holding  a few
variables ( 25 +/-) related to industrial establishments.  We manage
it with SAS on an IBM mainframe, but are curious as to whether such
a dataBase could be installed in an HyperCard environment to provide
interactive queries (a rarity, but nonetheless...)  Our major motive
in doing so would be to take advantage of the (apparently) very fast
searching capabilities of HyperCard.  (in most instances, interactive
users do not have the establishment ID number, and therefore must resort
to subString searches to locate the appropriate records.  VERY inefficient
at the present time)


I would be delighted to hear from anyone with similar experience, or
from anyone able to formulate educated guesses...

What we would have in mind at this time would be a single field per
record, concatenating all the relevant information in less than, say,
250 characters.  Basic questions are: what stack size would it mean?
Is CD-ROM an alternative we should consider? (there would NOT be more
than one site)  Can we build the stack automatically from an ASCII file?
What would be the expected search time?  Any other approach you think is
superior?
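
To make the single-field-per-record idea concrete, here is a minimal Python sketch of packing each record into one text field and running a substring search over it. It is not HyperCard code; the function names, the "|" separator and the sample establishments are all invented for illustration, and only the 250-character limit comes from the post.

```python
def pack_record(values, limit=250, sep="|"):
    """Join one record's variables into a single text field."""
    field = sep.join(str(v).strip() for v in values)
    if len(field) > limit:
        raise ValueError(f"record is {len(field)} characters, limit is {limit}")
    return field


def find_records(fields, needle):
    """Case-insensitive substring search; returns indices of matching records."""
    needle = needle.lower()
    return [i for i, field in enumerate(fields) if needle in field.lower()]


# Hypothetical sample data, invented for this example.
records = [
    ("000123", "Acme Steel Works", "Pittsburgh", "PA"),
    ("000456", "Keystone Paper Mill", "Erie", "PA"),
    ("000789", "Allegheny Steel Supply", "Altoona", "PA"),
]
fields = [pack_record(r) for r in records]
matches = find_records(fields, "steel")
print(matches)
```

A scan like this touches every record, which is why the search-time question matters at 500,000 records.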

Stephane

dan@Apple.COM (Dan Allen) (09/09/88)

In article <52305GFX@PSUVM> GFX@PSUVM.BITNET writes:
>We use a rather large database (500,000 + records) holding  a few
>variables ( 25 +/-) related to industrial establishments.  We manage
>it with SAS on an IBM mainframe, but are curious as to whether such
>a dataBase could be installed in an HyperCard environment to provide
>interactive queries (a rarity, but nonetheless...)  Our major motive
>in doing so would be to take advantage of the (apparently) very fast
>searching capabilities of HyperCard.  (in most instances, interactive
>users do not have the establishment ID number, and therefore must resort
>to subString searches to locate the appropriate records.  VERY inefficient
>at the present time)
>
>
>I would be delighted to hear from anyone with similar experience, or
>from anyone able to formulate educated guesses...
>
>What we would have in mind at this time would be a single field per
>record, concatenating all the relevant information in less than, say,
>250 characters.  Basic questions are: what stack size would it mean?
>Is CD-ROM an alternative we should consider? (there would NOT be more
>than one site)  Can we build the stack automatically from an ASCII file?
>What would be the expected search time?  Any other approach you think is
>superior?

To be brutally honest... WE DON'T KNOW.  Stacks of cards in the 500,000
plus range have not really been tested too well.  I do know that there
may be some interesting bugs lurking in huge stacks, but we are making
progress in this area: we just got a 1.4 GB hard disk to do testing of
large stacks.

Now for a back of the envelope calculation... 500,000 cards with a
minimum of 64 bytes of overhead per card and about 256 bytes of text per
card would give us 152 MB of text, not including room for HyperCard's
internal indexing.  I guess if it was even double it would still fit on
a CD-ROM with some room for growth, but looking through the stack... who
knows the search time... I hope it would be good.
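
The back-of-the-envelope figure above can be reproduced in a few lines of Python. This is only a sketch: the function name is made up for illustration, and "MB" is assumed to mean 2**20 bytes, which is the reading that gives 152.

```python
def stack_size_mb(cards, overhead_bytes, text_bytes):
    """Estimated size in MB (2**20 bytes), excluding any internal indexing."""
    return cards * (overhead_bytes + text_bytes) / 2**20


estimate = stack_size_mb(500_000, 64, 256)
print(f"{estimate:.1f} MB")
```

Doubling the result, as suggested above, gives roughly 305 MB.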

Yes, you could automatically build the stack from an ASCII file.  I built
a stack from 4 MB of ASCII text and it took an evening.  Figure
accordingly...
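
"Figure accordingly" can be worked out as follows, assuming build time grows linearly with the amount of text. That linearity is an assumption, not something measured here; the function name is hypothetical, and the 152 MB is the estimate from earlier in this post.

```python
def build_evenings(target_mb, sample_mb=4, sample_evenings=1):
    """Scale the observed build time linearly with the amount of text."""
    return target_mb / sample_mb * sample_evenings


evenings = build_evenings(152)
print(f"about {evenings:.0f} evenings")
```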

All in all, your proposition may not be that feasible, but it does give
us something to shoot for as we enhance HyperCard.

Dan Allen
HyperCard Team
Apple Computer

sysop@stech.UUCP (Jan Harrington) (09/14/88)

in article <52305GFX@PSUVM>, GFX@PSUVM.BITNET says:
> 
> We use a rather large database (500,000 + records) holding  a few
> variables ( 25 +/-) related to industrial establishments.  We manage
> it with SAS on an IBM mainframe, but are curious as to whether such
> a dataBase could be installed in an HyperCard environment to provide
> interactive queries (a rarity, but nonetheless...)  Our major motive
> in doing so would be to take advantage of the (apparently) very fast
> searching capabilities of HyperCard.  (in most instances, interactive
> users do not have the establishment ID number, and therefore must resort
> to subString searches to locate the appropriate records.  VERY inefficient
> at the present time)

Considering the size of your database, HyperCard would probably be a disaster.
It just wasn't created to manage that much data. I would suggest using one
of the faster Mac DBMSs to create an application. FoxBase would probably
be a good choice. It's very fast, and relatively easy to use.

Jan Harrington, sysop
Scholastech Telecommunications
UUCP: husc6!amcad!stech!sysop or allegra!stech!sysop
BITNET: JHARRY@BENTLEY

********************************************************************************
	Miscellaneous profundity:

		"No matter where you go, there you are."
				Buckaroo Banzai
********************************************************************************

rick@kimbal.UUCP (Rick Kimball) (09/17/88)

From article <660@stech.UUCP>, by sysop@stech.UUCP (Jan Harrington):
> in article <52305GFX@PSUVM>, GFX@PSUVM.BITNET says:
>> 
>> We use a rather large database (500,000 + records) holding  a few
>> variables ( 25 +/-) related to industrial establishments.  We manage
>> ...

I saw an advertisement in the Sept issue of MacTutor for a data base
package called "GridFile".  According to the ad it supports data
bases of 4 gigabytes in size and is available for HyperCard and
LightSpeed 'C'.

Standard Disclaimers.

___

Rick Kimball                                Software Design Group, Inc.  
Manager, Software Development               800 Trafalgar Ct. Suite 340
UUCP:rick@kimbal or ..!rtmvax!kimbal!rick   Maitland, FL 32751
CIS: 72277,214                              (407) 660-0006 , 788-6875

landman%hanami@Sun.COM (Howard A. Landman) (09/21/88)

In article <52305GFX@PSUVM> GFX@PSUVM.BITNET writes:
>We use a rather large database (500,000 + records) holding  a few
>variables ( 25 +/-) related to industrial establishments.  We manage
>it with SAS on an IBM mainframe, but are curious as to whether such
>a dataBase could be installed in an HyperCard environment to provide
>interactive queries (a rarity, but nonetheless...)

>I would be delighted to hear from anyone with similar experience, or
>from anyone able to formulate educated guesses...

A really rough guess, based on my stack with 1,600 cards which takes
nearly .5 MB, would be: (500,000 / 1,600) * .5 MB, or about 156 MB.

>What we would have in mind at this time would be a single field per
>record, concatenating all the relevant information in less than, say,
>250 characters.  Basic questions are: what stack size would it mean?

An easy lower bound is 500,000 cards * 250 bytes/card = 125 MB; and that's
just for the text itself.  So the 156 MB above looks pretty reasonable.
Of course, if the average utilization of the field is much less than
250 chars, then it could be smaller.
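
Both size estimates above can be checked with a short Python sketch. The variable names are illustrative, and 1 MB is taken as 1,000,000 bytes for the lower bound, which is the reading that gives 125.

```python
CARDS = 500_000

# Scale up from the observed stack: 1,600 cards in roughly 0.5 MB.
scaled_mb = CARDS / 1_600 * 0.5

# Lower bound: text alone at 250 bytes per card.
lower_bound_mb = CARDS * 250 / 1_000_000

print(scaled_mb, lower_bound_mb)
```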

>Is CD-ROM an alternative we should consider?

Not if you update things often, since you'd have to remaster the CD each time.
Plus, a CD-ROM drive will cost about the same as a large hard disk.

>Can we build the stack automatically from an ASCII file?

Building a text stack from an ASCII file is fairly straightforward.  I do
it in my Go Games and Fuseki stack.  Let me know if you want a copy ...

>What would be the expected search time?

The search speed of HyperCard gets slower as the size of the stack
increases, and if we assume linear degradation your searches might
take 2.5 hours since mine take up to half a minute.  (Additional searches
on the same key are much faster - I think HyperCard either builds indices
or at least scans ahead to the next one while you're not looking.)
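
The linear-degradation guess above works out as follows. This sketch assumes search time is proportional to the number of cards, which is exactly the assumption in question; the function name is made up for illustration.

```python
def extrapolate_linear(observed_seconds, observed_cards, target_cards):
    """Scale a search time in proportion to the number of cards."""
    return observed_seconds * target_cards / observed_cards


seconds = extrapolate_linear(30, 1_600, 500_000)
hours = seconds / 3600
print(f"{seconds:.0f} s, about {hours:.1f} hours")
```

That is in line with the rough 2.5 hours quoted above.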

>Any other approach you think is superior?

You should try to determine numbers for a fast database system (FoxBase?),
so you have a point of comparison.

	Howard A. Landman
	landman@hanami.sun.com
	UUCP: sun!hanami!landman

korn@eris.berkeley.edu (Peter "Arrgh" Korn) (09/21/88)

In <69189@sun.uucp>, landman@sun.UUCP (Howard A. Landman) said:  
>In article <52305GFX@PSUVM> GFX@PSUVM.BITNET writes:
>>We use a rather large database (500,000 + records) holding  a few
>>variables ( 25 +/-) related to industrial establishments.  We manage
>>it with SAS on an IBM mainframe, but are curious as to whether such
>>a dataBase could be installed in an HyperCard environment to provide
>>interactive queries (a rarity, but nonetheless...)
>
>>I would be delighted to hear from anyone with similar experience, or
>>from anyone able to formulate educated guesses...
>
>A really rough guess, based on my stack with 1,600 cards which takes
>nearly .5 MB, would be: (500,000 / 1,600) * .5 MB, or about 156 MB.

To add another data point:  I have a stack containing all of the registered
UUCP sites in the U.S. as of about two months ago.  The stack has 2,916
cards in it (including about 40 help cards), and is 1.36 Meg large, 
averaging out to 466 bytes/card.  The stack imports data from the uucp
sites database, and took several hours on a MacII to do the import (I
don't know exactly how long b/c I was out of the house at the time and
forgot to put timing information into the code that did the import).

Once imported, I compacted the stack twice to get optimal searching performance,
and then I 'locked' the stack to make it read-only.  Opening the stack takes
roughly 5 seconds.  Once open and HyperCard has cached whatever information
it caches on opening, search time is minimal.  To find "starnine", the first
occurrence of which is on the 889th card, takes under 1 second.  To find
"starnine" in field "site name" (ie: the card belonging to starnine), which
is the 1,408th card, also takes under a second.

For interactive query I find performance is *very* good.  Again, this is
on a twice-compacted locked stack of 1.3Meg and ~3,000 cards.  Also, I'm
running on a Quantum 80 Meg drive on a MacII w/1,000K devoted to HyperCard
1.2.1 under MultiFinder 6.0.  When I tried this same stack out on a MacSE
w/an Apple 20 Meg drive, search time was on the order of 5 seconds (using
HyperCard 1.2, not 1.2.1).

Again, quite reasonable for *interactive* query.  I would NOT want to use
it for generating reports however.

>... [other questions, the answers already given I agree with completely]

>>What would be the expected search time?
>
>The search speed of HyperCard gets slower as the size of the stack
>increases, and if we assume linear degradation your searches might
>take 2.5 hours since mine take up to half a minute.  (Additional searches
>on the same key are much faster - I think HyperCard either builds indices
>or at least scans ahead to the next one while you're not looking.)

Howard, judging from my search times, I'd guess that twice-compacted
locked stacks have FAR superior search times.  Also, if you know which
field your data is in, search time is also faster.  Extrapolating linearly
from my search times and assuming equivalent hardware etc. and 1/2 a
second for the search times (which is a good rough guess), the search time
for a ~150 Meg twice-compacted locked stack would be roughly 60 seconds.
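
The per-card figure and the 60-second extrapolation can be reproduced with the numbers given in this post. This is a sketch only: the constant names are illustrative, "Meg" is read as 1,000,000 bytes, and search time is assumed to scale linearly with stack size.

```python
STACK_BYTES = 1.36e6      # 1.36 Meg, read here as 1,360,000 bytes
STACK_CARDS = 2_916
SEARCH_SECONDS = 0.5      # rough per-search time assumed in the post

bytes_per_card = STACK_BYTES / STACK_CARDS

# Scale search time by stack size: ~150 Meg versus 1.36 Meg.
projected_seconds = SEARCH_SECONDS * 150 / 1.36

print(f"{bytes_per_card:.0f} bytes/card, {projected_seconds:.0f} s")
```

The projection comes to about 55 seconds, which rounds to the "roughly 60 seconds" above.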

Anyone at Apple have any data on this sort of thing?

>
>>Any other approach you think is superior?
>
>You should try to determine numbers for a fast database system (FoxBase?),
>so you have a point of comparison.

I agree entirely here.  60 seconds is just that:  60 seconds.  It isn't
fast or slow until you compare it with something.  However, for a number
of interactive things (like the UUCP stack), the HyperCard interface is
superior enough to anything FoxBase could give me that even if FoxBase
were twice or three times as fast it wouldn't even be a consideration (in
fact, for this application HyperCard is the ONLY consideration in my
opinion...).

Peter
--
Peter "Arrgh" Korn
korn@ucbvax.Berkeley.EDU
{decvax,hplabs,sdcsvax,ulysses,usenix}!ucbvax!korn

stein@premise.ZONE1.COM (Rich Epstein) (09/25/88)

In article <69189@sun.uucp> landman@sun.UUCP (Howard A. Landman) writes:
>In article <52305GFX@PSUVM> GFX@PSUVM.BITNET writes:
>>We use a rather large database (500,000 + records) holding  a few
>>variables ( 25 +/-) related to industrial establishments.  We manage
>>it with SAS on an IBM mainframe, but are curious as to whether such
>>a dataBase could be installed in an HyperCard environment to provide
>>interactive queries (a rarity, but nonetheless...)
>
>>Any other approach you think is superior?
>
A possible solution is a product called OSTAT (Oracle/SAS Interface)
from Software Interfaces (713 460-0707). I have not used the product
but the literature claims it is available under VM/CMS and MVS.

The general idea is to allow easy transfer of SAS data to an Oracle
database. Oracle has its own tools for interactive queries. In fact,
I believe Oracle has announced a Hypercard interface to their own
database. 

-Richard W. Epstein, Robin Computing Inc. Arlington MA
-(guest at Premise)