alexis@dasys1.UUCP (Alexis Rosen) (06/22/88)
In a previous message on comp.sys.mac I claimed that HyperCard would probably be sufficient for searching large stacks of the size someone else had discussed, but that I had no hard data to back this up. I promised that I would provide this as soon as I got my system up.

Well, two days ago I dumped 33,327 records out of FoxBase+/Mac into a tab-delimited text file. This file was 4,497,393 bytes long. (FoxBase took less than two minutes to dump it.) Importing it into HyperCard was trivial, but it did require a script. Importing _ANYTHING_ into HyperCard requires a script.

Unfortunately, HyperCard crapped out after importing 19,017 records. I don't know why; it said something like "unexpected error 1837". I feel like an idiot for not writing it down exactly, but I was too annoyed at the time. Anyway, importing 19K records took about 4 hours, give or take 15%. The stack size is now 5,283,840 bytes. From these numbers it appears that for fairly normal text, stacks will be roughly twice the size of their straight-ASCII equivalents. Not an unreasonable trade-off, I think.

After the error, I quit and re-launched HyperCard. It gave me an error saying that the number of cards in the file was 'wrong'. After that, though, there was no problem, so I assume it just detected an error in its internal bookkeeping and corrected the problem automagically (it did take a few moments for the disk drive to settle down after the error message).

On to the good stuff... At first I was disappointed by HC's performance. It took about 45 seconds to find a word unique to the 19,017th card the first time (after going there once, re-finds were very quick, undoubtedly due to caching). This is NOT the whole story, however. One very odd thing I noticed was that HyperCard kept my disk drive going all the time. It reacted to user events in a perfectly normal fashion, stopping disk access while responding, and then started seeking on the disk again...
It became clear to me that this was normal HC operation, and that I had never observed it before because I had never used a 20,000-card stack before. When I gave it the minute (or so) it needed, it stopped seeking and left the drive alone. After that, searches were *MUCH* faster than I had first experienced. The same find of a unique word on a previously-unseen card (#19,017) took only a few seconds. *ALL* finds were enormously quicker.

There is another factor as well. I know I saw it mentioned some time ago (perhaps by Steve Maller at Apple), and my experience certainly supports it. HyperCard searches using three-letter units. The more you give it, the faster it will find what you're looking for. If you know you're looking for a jazz CD named 'Living in the Crest of a Wave' by 'Bill Evans', HyperCard can find that card if you say 'find "crest"'. It will do so several times faster if you say 'find "liv cre wav bil eva"'. Every word you give it helps it find things faster.

I do not know how much effect four-letter-or-larger fragments have on seek times. I expect they help considerably only if there are many cards close enough to the find string that they would match if only three-letter fragments were given. This is the least clear of all my guesses, though, and could be totally wrong.

There is a potential problem, however. Obviously all of that background disk access was HC loading pointers (or whatever) into memory, so that when I gave it time to load them all up at the beginning, it was much faster. For 19,017 records, 750K of memory (HC's default MultiFinder partition) was not sufficient to cache everything, and there were no dramatic speed gains. At a partition of 1.0 MB (which is a bit more than you get running UniFinder on a 1MB machine), almost all RAM space was used. So, for ~20,000 records, I guess that HC needs about 300K of RAM over the 750K default to work at its best.
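A toy model may make the three-letter effect clearer. The sketch below (Python, purely illustrative; it is a guess at the behavior described above, NOT HyperCard's actual index format, which I have no inside knowledge of) indexes each word by its first three letters and intersects the candidate sets for every fragment in the find string. More fragments mean a smaller set of cards left to examine on disk:

```python
from collections import defaultdict

# Hypothetical card texts, loosely matching the example in the post.
cards = {
    1: "Living in the Crest of a Wave Bill Evans",
    2: "Crest toothpaste jingle collection",
    3: "Waverly Bill of sale",
}

# Index every word under its first three letters only.
index = defaultdict(set)
for num, text in cards.items():
    for word in text.lower().split():
        index[word[:3]].add(num)

def find(query):
    """Intersect the candidate sets for each fragment's 3-letter prefix."""
    candidates = None
    for frag in query.lower().split():
        hits = index.get(frag[:3], set())
        candidates = hits if candidates is None else candidates & hits
    return candidates if candidates else set()

print(find("crest"))                 # two candidate cards to examine
print(find("liv cre wav bil eva"))   # only one candidate left
```

With one fragment, two cards survive the index lookup and both must be checked; with five fragments, the intersection leaves a single card. A real search would still verify the surviving candidates against the full find string, but the expensive disk reads scale with the candidate set, which is why every extra word helps.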
Perhaps a 1MB machine will be sufficient for up to 15,000 records? (Note that all these numbers apply to a stack where each record is of approximately the same size as the ones I used. That comes out to about 135 bytes of ASCII text per card, or 270 bytes of stack space per card on the disk.)

All of the tests I performed were on a 4MB Mac II. How does this translate to a Mac Plus or a Mac SE? Probably better than you'd think, as long as the Mac has sufficient memory. What is sufficient? The guidelines above should work as a pretty good rule of thumb. For find commands given more than three words, my Mac II almost always found stuff within two to three seconds.

CAVEATS:

1) For stacks with unbalanced word distributions (word X shows up on every card), finding a word group which contains several unusually common word-fragments and one rare one will definitely be slower than if all were rare. In other words, the more unique the words and their combination, the faster HyperCard will be. This seems eminently reasonable to me...

2) HyperCard likes to cache whole cards, I think. Certainly it likes to cache bitmaps. So if you go through many cards in one session, performance may degrade. 'Many' depends on the contents of the cards and the amount of free RAM you have. I haven't actually tested this, but it seems likely.

3) I don't know if performance changes drastically with the ratio of cards to text in the stack, but I would bet the difference isn't major. Probably it gets somewhat faster with less text per card, net stack size staying the same. I could easily be wrong on this one, though.

4) At all times the command 'Go Card {actual card number}' executed instantaneously. (I would guess that pointers to each card within the file are loaded into memory on startup. At 4 bytes apiece, 20,000 cards comes to under 80K of RAM.)

5) I used version 1.0.1 for these tests. It is known to be much slower than version 1.2 et al.
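The guesses above are easy to check with back-of-the-envelope arithmetic. This sketch (Python, for brevity) uses only figures quoted in the post; nothing here measures HyperCard directly:

```python
# Back-of-the-envelope checks using only figures quoted in the post.
ascii_bytes = 4_497_393      # exported text file (33,327 records)
records     = 33_327
imported    = 19_017         # cards actually imported before the crash
stack_bytes = 5_283_840      # stack size with those 19,017 cards

ascii_per_card = ascii_bytes / records    # ~135 bytes of text per card
stack_per_card = stack_bytes / imported   # ~278 bytes of stack per card
overhead = stack_per_card / ascii_per_card  # ~2.1x: "roughly twice"

# Caveat 4: a table of 4-byte card pointers for a 20,000-card stack.
ptr_table_k = 20_000 * 4 / 1024           # ~78K of RAM

# Capacity guess: cards that fit in a 4 MB stack at ~270 bytes apiece.
cards_in_4mb = (4 * 1024 * 1024) // 270   # ~15,500: the "15,000 records"

print(round(ascii_per_card), round(stack_per_card), round(overhead, 2))
print(round(ptr_table_k), cards_in_4mb)
```

The per-card overhead works out to about 2.1x, the pointer table to roughly 78K, and a 4 MB stack to about 15,500 cards, all consistent with the estimates in the post.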
Nevertheless, I don't think that there is any performance difference between the two in this text-search situation.

6) Version 1.2 is *NOT* the latest version. That honor goes to 1.2.1. This version corrects three bugs in V1.2, the most important of which is V1.2's tendency to crash with stacks over 8000-odd cards (8191?). This might be the fix for my original problem of not being able to import all of the 33,327 lines in my text file (earlier versions shared this bug). Then again, maybe not...

SUMMARY:

For production database work with large datasets, don't even THINK of using HyperCard. In such situations, seek times must be measured in milliseconds, not seconds. For serious database work on the Mac, there is only one choice... FoxBase.

For PERSONAL use of mostly-text stacks up to 4 MBytes, any 1MB Mac should be sufficient for the job. A decent hard disk is a must, of course. The new fast 30 MB drives, available for ~$650 street, should be plenty. This won't be a speed demon, but if you need to access fewer than 5-10 cards a minute you should have no trouble whatsoever. Of course, the faster the hardware, the better it gets...

Any comments, discussions, or corrections to this article are welcome. Note that this was posted to a fairly broad range of groups; restrict your follow-ups if appropriate. I answer all mail, so if you don't hear from me, try another path or just send it again, since the local mailer is a trifle erratic sometimes.

I make no guarantees about this analysis or the performance of HyperCard. I have no affiliations with anyone. So don't bother them, either...
--
Alexis Rosen              {allegra,philabs,cmcl2}!phri\
Writing from              {bellcore,harpo,cmcl2}!cucard!dasys1!alexis
The Big Electric Cat      {portal,well,sun}!hoptoad/
Public UNIX               if mail fails: ...cmcl2!cucard!cunixc!abr1
dan@Apple.COM (Dan Allen) (06/25/88)
In article <5100@dasys1.UUCP> alexis@dasys1.UUCP (Alexis Rosen) writes:
>5) I used version 1.0.1 for these tests. It is known to be much slower than
>version 1.2 et al. Nevertheless, I don't think that there is any performance
>difference between the two, in this text-search situation.
>seconds. For serious database work on the Mac, there is only one choice...

1.2 has speeded up the text-search situation. But in order for it to be faster, you must do a Compact Stack TWICE on the stack. It will then search 6 times faster.

Dan Allen
Apple Computer
kurtzman@pollux.usc.edu (Stephen Kurtzman) (06/25/88)
>1.2 has speeded up the text-search situation. But in order for it to be
>faster, you must do a Compact Stack TWICE on the stack. It will then
>search 6 times faster.

Wow, I'll bet it really goes fast if you compact FOUR times! :-)

Seriously, is that 6x figure based on a mathematical analysis of the algorithm, or on benchmarks? I have noticed that the searches are faster, but mine do not seem to be 6 times faster.