[comp.databases] HyperCard performance analysis with large datasets: Some answers

alexis@dasys1.UUCP (Alexis Rosen) (06/22/88)

In a previous message on comp.sys.mac I claimed that HyperCard would probably
be sufficient for searching large stacks of the size someone else had
discussed, but that I had no hard data to back this up. I promised that I
would provide this as soon as I got my system up.

Well, two days ago I dumped 33,327 records out of FoxBase+/Mac into a text
file (tab-delimited). This file was 4,497,393 bytes long. (FoxBase took less
than two minutes to dump it.)

Importing it to HyperCard was trivial, but it did require a script. Importing
_ANYTHING_ into HyperCard requires a script.
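
In case it's useful, here is a bare-bones sketch of the sort of import
handler I mean. This is not my actual script: the handler name, the file
prompt, and the background field "Data" are all made up for illustration,
and it just stuffs each whole tab-delimited line into that one field.

  on importFile
    ask "Full name of the tab-delimited text file?"
    if it is empty then exit importFile
    put it into theFile
    open file theFile
    repeat
      read from file theFile until return  -- one record per line
      if it is empty then exit repeat      -- empty read means end of file
      doMenu "New Card"
      put it into field "Data"             -- unqualified = background field
    end repeat
    close file theFile
  end importFile

Type 'importFile' into the message box to run it. Splitting each line at
the tabs into separate fields is left as an exercise; as far as I know
HyperTalk gives you no way to change the item delimiter from a comma, so
you have to walk the line yourself with offset().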

Unfortunately, HyperCard crapped out after importing 19,017 records. I don't
know why; it said something like "unexpected error 1837". I feel like an
idiot for not writing it down exactly, but I was too annoyed at the time.
Anyway, importing 19k records took about 4 hours, give or take 15%. The stack
size is now 5,283,840 bytes. From these numbers it appears that for fairly
normal text, stacks will be roughly twice the size of their straight-ASCII
equivalents. Not an unreasonable trade-off, I think.
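
(To spell out the arithmetic: 19,017 of 33,327 records is about 57% of the
file, or roughly 2,566,000 bytes of text actually imported, and 5,283,840
divided by 2,566,000 comes to about 2.06.)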

After the error, I quit and re-launched HyperCard. It gave me an error, saying
that the number of cards in the file was 'wrong'. After that, though, there
was no problem, so I assume it just detected an error in its internal
bookkeeping and corrected the problem automagically (it did take a few moments
for the disk drive to settle down after the error message).

On to the good stuff...
At first I was disappointed by HC's performance. It took about 45 seconds to
find a word unique to the 19,017th card the first time (after going there once,
re-finds were very quick, undoubtedly due to caching). This is NOT the whole
story, however. One very odd thing I noticed was that HyperCard kept my disk
drive going all the time. It reacted to user events in a perfectly normal
fashion, stopping disk access while responding, and then started seeking on the
disk again... It became clear to me that this was normal HC operation, and that
I had never observed it before because I never used a 20,000 card stack before.

When I gave it the minute (or so) it needed, it stopped seeking and left the
drive alone. After that, searches were *MUCH* faster than I had first
experienced. The same find of a unique word on a previously-unseen card
(#19,017) took only a few seconds. *ALL* finds were enormously quicker.

There is another factor as well. I know I saw it mentioned some time ago
(perhaps by Steve Maller at Apple), and my experience certainly supports it.
HyperCard searches using three-letter units. The more you give it, the faster
it will find what you're looking for. If you know you're looking for a jazz
CD named 'Living in the Crest of a Wave' by 'Bill Evans', HyperCard can find
that card if you say 'find "crest"'. It will do so several times faster if you
say 'find "liv cre wav bil eva"'. Every word you give it helps it find things
faster. I do not know how much effect fragments of four or more letters have
on search times. My guess is that they help substantially only when many cards
come close enough to the find string that they would match on three-letter
fragments alone. This is the least clear of all my guesses, though, and could
be totally wrong.
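
Incidentally, if you want to time finds on your own stacks, a trivial
handler in the stack script will do. 'timeIt' is my own name for it, and
'the ticks' is HyperCard's counter of 60ths of a second:

  on timeIt theCommand
    put the ticks into startTicks   -- ticks are 60ths of a second
    do theCommand                   -- execute the command string
    put (the ticks - startTicks) / 60 && "seconds"
  end timeIt

Then, from the message box:

  timeIt "find" && quote & "crest" & quote
  timeIt "find" && quote & "liv cre wav bil eva" & quote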

There is a potential problem, however. Obviously all of that background disk
access was HC loading pointers (or whatever) into memory, so that when I gave
it time to load them all up at the beginning, it was much faster. For 19,017
records, 750K of memory (HC's default MultiFinder partition) was not sufficient
to cache everything, and there were no dramatic speed gains. At a partition of
1.0 MB (which is a bit more than you get running UniFinder on a 1MB machine),
almost all RAM space was used. So, for ~20,000 records, I guess that HC needs
about 300K of RAM over the 750K default to work at its best. Perhaps a 1MB
machine will be sufficient for up to 15,000 records? (Note that all these
numbers apply to a stack where each record is of approximately the same size as
the ones I used. That comes out to about 135 bytes of ASCII text per card, or
270 bytes of stack space per card on the disk.)
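
(If that extra 300K grows linearly with the number of cards, it comes to
about 16 bytes of search overhead per card, or roughly 240K over the 750K
default for 15,000 cards; that is just about what a 1MB machine can spare,
which squares with the 15,000-record guess above. The linearity is pure
conjecture on my part, though.)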

All of the tests I performed were on a 4MB Mac II. How does this translate to
a Mac Plus or a Mac SE?  Probably better than you'd think, as long as the Mac
has sufficient memory. What is sufficient? The guidelines above should work as
a pretty good rule of thumb. For find commands given more than three words, my
Mac II almost always found stuff within two to three seconds.

CAVEATS:
1) For stacks with unbalanced word distributions (word X shows up on every
card), finding a word group which contains several unusually common word-
fragments and one rare one will definitely be slower than if all were rare.
In other words, the more unique the words and their combination, the faster
HyperCard will be. This seems eminently reasonable to me...
2) HyperCard likes to cache whole cards, I think. Certainly it likes to cache
bitmaps. So if you go through many cards in one session, performance may
degrade. 'Many' depends on the contents of the cards and the amount of free
RAM you have. I haven't actually tested this, but it seems likely.
3) I don't know if performance changes drastically with the ratio of cards to
text in the stack, but I would bet the difference isn't major. Probably it gets
somewhat faster with less text per card, net stack size staying the same. I
could easily be wrong on this one, though.
4) At all times the command 'Go Card {actual card number}' executed
instantaneously; see the timing note after this list. (I would guess that
pointers to each card within the file are loaded into memory on startup. At
4 bytes apiece, 20,000 cards comes to about 80K of RAM.)
5) I used version 1.0.1 for these tests. It is known to be much slower overall
than version 1.2 et al.  Nevertheless, I don't think there is any performance
difference between the two in this text-search situation.
6) Version 1.2 is *NOT* the latest version. That honor goes to 1.2.1.  This
version corrects three bugs in V1.2, the most important of which is V1.2's
tendency to crash with stacks over 8000-odd cards (8191?). This might be the
fix for my original problem of not being able to import all of the 33,327 lines
in my text file (earlier versions shared this bug). Then again, maybe not...
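
(Caveat 4, by the way, is easy to check with the timeIt handler given
earlier: from the message box, timeIt "go card 19017" should report
essentially zero seconds.)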


SUMMARY:
For production database work with large datasets, don't even THINK of using
HyperCard. In such situations, seek times must be measured in milliseconds, not
seconds. For serious database work on the Mac, there is only one choice...
FoxBase.

For PERSONAL use of mostly-text stacks up to 4 MBytes, any 1MB Mac should be
sufficient for the job. A decent hard disk is a must, of course. The new
fast 30 MB drives, available for ~$650 street, should be plenty. This won't be
a speed demon, but if you need to access less than 5-10 cards a minute you
should have no trouble whatsoever.

Of course, the faster the hardware, the better it gets...

Any comments, discussions, or corrections to this article are welcome.
Note that this was posted to a fairly broad range of groups; restrict your
follow-ups if appropriate.

I answer all mail, so if you don't hear from me try another path or just send
it again, since the local mailer is a trifle erratic sometimes.
I make no guarantees about this analysis or the performance of HyperCard.
I have no affiliations with anyone. So don't bother them, either...

-- 
Alexis Rosen                       {allegra,philabs,cmcl2}!phri\
Writing from                       {bellcore,harpo,cmcl2}!cucard!dasys1!alexis
The Big Electric Cat                  {portal,well,sun}!hoptoad/
Public UNIX                         if mail fails: ...cmcl2!cucard!cunixc!abr1

dan@Apple.COM (Dan Allen) (06/25/88)

In article <5100@dasys1.UUCP> alexis@dasys1.UUCP (Alexis Rosen) writes:
>5) I used version 1.0.1 for these tests. It is known to be much slower than
>version 1.2 et al.  Nevertheless, I don't think that there is any performance
>difference between the two, in this text-search situation.

1.2 has speeded up the text-search situation.  But in order for it to be
faster, you must do a Compact Stack TWICE on the stack.  It will then
search 6 times faster.
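
For what it's worth, a two-line handler does both passes in one shot (just
a sketch, with a name I made up):

  on compactTwice
    doMenu "Compact Stack"  -- same as choosing Compact Stack from File
    doMenu "Compact Stack"  -- the speedup requires this second pass
  end compactTwice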

Dan Allen
Apple Computer

kurtzman@pollux.usc.edu (Stephen Kurtzman) (06/25/88)

>1.2 has speeded up the text-search situation.  But in order for it to be
>faster, you must do a Compact Stack TWICE on the stack.  It will then
>search 6 times faster.

Wow, I'll bet it really goes fast if compact FOUR times! :-)

Seriously, is that 6x figure based on a mathematical analysis of the
algorithm, or on benchmarks? I have noticed that searches are faster, but
mine do not seem to be 6 times faster.