rahardj@ccu.umanitoba.ca (Budi Rahardjo) (05/06/91)
How do I get a fast pattern matching in a large associative array ? I have a large associative array (approx. 17000), read from a dbm file. I'd like to match a pattern with the contents of the array. Assigning the index/keys to an array takes forever ... , never mind doing the matching. dbmopen(RECORD,"dbtest",0666) || die "Can't open dbm file"; @k = keys(%RECORD); <====== this takes almost 9 seconds !! For comparison, doing a grep thru a flatfile (with the same contents) takes 2 secs. If I do several "@k = keys(%RECORD);", this is going to be a problem. Is there a way I can iterate over dbm file without stepping thru the index/keys ? (Basically doing a 'grep' on the contents). Or maybe read the raw-dbm file (without the index), ie treat it as a flat file ... Thanks, -- budi
merlyn@iwarp.intel.com (Randal L. Schwartz) (05/07/91)
In article <1991May6.163310.21805@ccu.umanitoba.ca>, rahardj@ccu (Budi Rahardjo) writes: | For comparison, doing a grep thru a flatfile (with the same contents) | takes 2 secs. | | If I do several "@k = keys(%RECORD);", this is going to be | a problem. | | Is there a way I can iterate over dbm file without stepping thru | the index/keys ? (Basically doing a 'grep' on the contents). | Or maybe read the raw-dbm file (without the index), ie treat it | as a flat file ... You mean, like: while (($key,$val) = each(%RECORD)) { if ($key =~ /pattern/) { "someaction"; } } or do you mean something else? @_{4,3,2,1}=("hacker,","Perl ","another ","Just ");print while ($_,$_)=each _; -- /=Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ==========\ | on contract to Intel's iWarp project, Beaverton, Oregon, USA, Sol III | | merlyn@iwarp.intel.com ...!any-MX-mailer-like-uunet!iwarp.intel.com!merlyn | \=Cute Quote: "Intel: putting the 'backward' in 'backward compatible'..."====/
allbery@NCoast.ORG (Brandon S. Allbery KB8JRR/AA) (05/07/91)
As quoted from <1991May6.163310.21805@ccu.umanitoba.ca> by rahardj@ccu.umanitoba.ca (Budi Rahardjo): +--------------- | Assigning the index/keys to an array takes forever ... , never mind | doing the matching. +--------------- Standard dbm is slow. +--------------- | Is there a way I can iterate over dbm file without stepping thru | the index/keys ? (Basically doing a 'grep' on the contents). | Or maybe read the raw-dbm file (without the index), ie treat it | as a flat file ... +--------------- Dbm files aren't stored in any format that's meaningful to humans. You can combine the search with the processing, if it will make things faster: the "each" operator returns the next available value each time it is called, instead of doing it all at once. Check the manual. ++Brandon -- Me: Brandon S. Allbery Ham: KB8JRR/AA 10m,6m,2m,220,440,1.2 Internet: allbery@NCoast.ORG (restricted HF at present) Delphi: ALLBERY AMPR: kb8jrr.AmPR.ORG [44.70.4.88] uunet!usenet.ins.cwru.edu!ncoast!allbery KB8JRR @ WA8BXN.OH
rahardj@ccu.umanitoba.ca (Budi Rahardjo) (05/07/91)
Just another perl hacker (Randal L. Schwartz) writes: ...[deleted, read previous posting] ... >You mean, like: >while (($key,$val) = each(%RECORD)) { > if ($key =~ /pattern/) { > "someaction"; > } >} actually it's more like ... if ($val =~ /$pattern/) { ... It's so slow. I am tempted to dump the dbm file into a flatfile (keep 2 files, dbm and flat file), and grep the flatfile. -- budi
oz@yunexus.yorku.ca (Ozan Yigit) (05/07/91)
Brandon S. Allbery writes: > Standard dbm is slow. The dbm/ndbm package (as with any other external hashing scheme) is designed for a particular type of data access, and the result is a single [on the average] access to database for locating a key and its associated data. There may actually be a a slower re-implementation of dbm/ndbm out there, but in general, the dbm/ndbm and their re-implementations exist exactly because they work extremely well for the type of access they were designed for. So, I can only assume that when you say standard dbm is slow, you really mean a sequential pass over a database, which dbm is *not* designed for, is slow. In tested this assertion for the average use and I found that the usual sequential scan of keys in the entire dict/words-generated database with firstkey/nextkey/strncmp (using sdbm) is about 4 times slower than a fread/strncmp loop over the entire flat version of the database. I have not tested the regular ndbm. [2] On the other hand, BSD HASH [1] with ndbm compatibility routines ran as fast as the fread/strncmp loop. I have no idea why perl would be "so slow" in its sequential access to the database. oz --- [1] Seltzer, Margo and Ozan Yigit, ``A New Hash Package for UNIX'' Proceedings of the USENIX Technical Conference, Dallas, Texas. January 1991 [2] sdbm reads the entire database block-by-block during a sequential scan, but cannot avoid holes, ndbm uses a very clever hash value reconstruction algorithm to access only the occupied blocks, but is not noticably faster. --- SunOS: There are many edible ways to serve | internet: oz@nexus.yorku.ca up scrambled eggs, but one cannot get them | uucp: utzoo/utai!yunexus!oz back as over-easy or sunny side up. | phone: [416] 736 5257x33976
allbery@NCoast.ORG (Brandon S. Allbery KB8JRR/AA) (05/08/91)
As quoted from <22673@yunexus.YorkU.CA> by oz@yunexus.yorku.ca (Ozan Yigit): +--------------- | Brandon S. Allbery writes: | | > Standard dbm is slow. | | So, I can only assume that when you say standard dbm is slow, you really | mean a sequential pass over a database, which dbm is *not* designed for, | is slow. +--------------- I mean that *general* database operations, such as the original poster is asking for, are slow. dbm/ndbm have their place, but that place isn't and isn't intended to be as a replacement for a regular database manager. Or for sequential or relative access files. ++Brandon -- Me: Brandon S. Allbery Ham: KB8JRR/AA 10m,6m,2m,220,440,1.2 Internet: allbery@NCoast.ORG (restricted HF at present) Delphi: ALLBERY AMPR: kb8jrr.AmPR.ORG [44.70.4.88] uunet!usenet.ins.cwru.edu!ncoast!allbery KB8JRR @ WA8BXN.OH
oz@yunexus.yorku.ca (Ozan Yigit) (05/08/91)
Brandon S. Allbery writes: >dbm/ndbm have their place, but that place isn't and >isn't intended to be as a replacement for a regular database manager. I know, and I thought I did say something like that. ;-) Maybe Larry may want to look into Marcus J. Ranum's b+tree package as a more general-purpose database package for more general-purpose uses that perl users seem to want. oz