[comp.lang.perl] The best way to search an array ?

wengland@stephsf.stephsf.com (Bill England) (11/26/90)

  What is the best method for discovering if a string is in
  a given set of strings?  For example say you want to verify
  that the string 'WA' is a valid two char state abreviation?

  I can think of at least four methods ...

  Method      Description

    1       Search a string that contains all two letter state names;
            Here you would prefix and suffix the state var and
            then search the string.  Something like ;
			...
			$state = '~AK~AL~AR~AZ~CA~CO~...';
			$search_string = '~'.$prove_state.'~';
            if( /$search_string/ eq $state) { ... }

    2       Build an associative array %state so that you have;
            $state{'AK'}="Alaska";
            $state{'AL'}="Alabama";
            $state{'AR'}="Arkansas";
            $state{'AZ'}="Arizona";
            $state{'CA'}="California";
            ...
            Then test for the existance of $state{$prove_state}.
            ( Has disadvantage of extra bagage if you don't need the
              states full name.   )

    3       Build a simple array and then search with a foreach
                @state = ('AK','AL','AR','AZ','CA', ...);
                foreach $st(@state){
                    last if $st eq $prove_state;
                }

    4       The same as 3 above but, use a binary search instead of
            a linear search.



   Which of the above would be the best to use in perl? 

   Is there a better way ???

 +--------
 |  Bill England
 |  wengland@stephsf.COM
 |
  * *      H -> He +24Mev
 * * * ... Oooo, we're having so much fun making itty bitty suns *
  * *

chris@utgard.uucp (Chris Anderson) (11/29/90)

In article <443@stephsf.stephsf.com> wengland@stephsf.stephsf.com (Bill England) writes:
>  What is the best method for discovering if a string is in
>  a given set of strings?  For example say you want to verify
>  that the string 'WA' is a valid two char state abreviation?
>
>  I can think of at least four methods ...

[methods deleted]

>    2       Build an associative array %state so that you have;
>            $state{'AK'}="Alaska";
>            $state{'AL'}="Alabama";
>            $state{'AR'}="Arkansas";
>            $state{'AZ'}="Arizona";
>            $state{'CA'}="California";
>            ...
>            Then test for the existance of $state{$prove_state}.
>            ( Has disadvantage of extra bagage if you don't need the
>              states full name.   )

Naah.  You don't need the extra baggage:

			foreach $one (@all_the_states)  {
				$states{$one}++;
			}

			or

			undef %states;
			@states{@all_the_states} = ();

Either way, you don't have the baggage and you can find the name
easily and quickly.  Associative arrays are great!

Chris
-- 
+---------------------------------------------------------------+
|  Chris Anderson, QMA, Inc.      utgard!chris@csusac.csus.edu  |
|      My employer doesn't listen to me... why should you?      |
+---------------------------------------------------------------+

tchrist@convex.COM (Tom Christiansen) (11/30/90)

In article <1990Nov28.160822.21351@utgard.uucp> chris@utgard.uucp (Chris Anderson) writes:
>			undef %states;
>			@states{@all_the_states} = ();
>
>Either way, you don't have the baggage and you can find the name
>easily and quickly.  Associative arrays are great!

Yes, they're great, but the recent patches have made the quoted usage of
questionable utility.  This I find highly unfortunate, since it's so very
useful.

The problem is Larry made it so you can have a key defined, but whose
value is undefined.  This means strange things like this happen:

    @colors = ( 'red', 'blue', 'green' );
    @seen{@colors} = ();
    print "defined\n" 		if defined $seen{'blue'};
    print "undefined\n" 	unless defined $seen{'blue'};

That now prints undefined, although it used to print defined.

You can still say 
    for (@seen) { $colors{$_}++; }
with an explicit loop, but I really like the @foo{@bar} = () usage.

I can't figure out WHY you want entries in an associative arrays
with undefined values.  Does doing that really make sense??

Larry addressed this in <10334@jpl-devvax.JPL.NASA.GOV> on 12 Nov 90
19:22:14 GMT, in a message entitled "Re: defined() on an undef array
element created via slice."  After granting that correct interpretation
of this is a judgment call, he pointed out that it's nice to distinguish
between

    @foo{"bar"} = ();

and

    @foo{"bar"} = ('');

(I still get nervous with that usage because despite the parens, 
 novices will think it a scalar assignment as there's only one element
 in the slice.)

But what really bothers me is that 

    @seen{@colors} = ('');

makes red defined and blue not, but they're both valid keys, and this
isn't what I call following the principle the least surprise.

    @seen{@colors} = (1..$#colors);

is a hack that will get you by, but I still prefer the prior behavior.
Maybe I just got too used to it.


--tom

markb@agora.uucp (Mark Biggar) (12/03/90)

In article <443@stephsf.stephsf.com> wengland@stephsf.stephsf.com (Bill England) writes:
>
>  What is the best method for discovering if a string is in
>  a given set of strings?  For example say you want to verify
>  that the string 'WA' is a valid two char state abreviation?
>
>  I can think of at least four methods ...
>
>  Method      Description
>
>    1       Search a string that contains all two letter state names;
>            Here you would prefix and suffix the state var and
>            then search the string.  Something like ;
>			...
>			$state = '~AK~AL~AR~AZ~CA~CO~...';
>			$search_string = '~'.$prove_state.'~';
>            if( /$search_string/ eq $state) { ... }
>
If you search set is small this may be the bets way, but use index() instead of
of an regular expression, like so:

	     if (index($state, "~$prove_state~") > = 0)

Besides your if condition is wrong, it should be:

	     if ($state =~ /~$prove_state~/)

>    2       Build an associative array %state so that you have;
>            $state{'AK'}="Alaska";
>            $state{'AL'}="Alabama";
>            $state{'AR'}="Arkansas";
>            $state{'AZ'}="Arizona";
>            $state{'CA'}="California";
>            ...
>            Then test for the existance of $state{$prove_state}.
>            ( Has disadvantage of extra bagage if you don't need the
>              states full name.   )
>
if you don't need the full names just set the associated array value to 1.
this is a very good way to go if the search set is generated by your
program.  If you have a very large constant search set using a dbm file
for it may be the best way to go.

>    3       Build a simple array and then search with a foreach
>                @state = ('AK','AL','AR','AZ','CA', ...);
>                foreach $st(@state){
>                    last if $st eq $prove_state;
>                }
>
This is only good if you need the array for other reasons, such as an array
of month names that you wish to index by month number.

>    4       The same as 3 above but, use a binary search instead of
>            a linear search.
>
Same problems as with 3.

>   Which of the above would be the best to use in perl? 
>
>   Is there a better way ???

That pretty much covers it.
--
Perl's maternal uncle
Mark Biggar

wengland@stephsf.stephsf.com (Bill England) (12/13/90)

In article <1990Dec3.035541.24184@agora.uucp> markb@agora.uucp (Mark Biggar) writes:
>In article <443@stephsf.stephsf.com> wengland@stephsf.stephsf.com (Bill England) writes:
>
 [...]
>if you don't need the full names just set the associated array value to 1.
>this is a very good way to go if the search set is generated by your
>program.  If you have a very large constant search set using a dbm file
>for it may be the best way to go.
>
>--
>Perl's maternal uncle
>Mark Biggar


  Thanks for the ideas.  I've attached the resulting routine to
  the end of this  article.  I'm not entirely comforatable with
  referancing an array that has been undef'ed yet. ;-)

----- vrfy_state.pl -----
package state_check;
##
 # Copyright(c) 1991, Stephen Software Systems, Inc.
 # This software may be modified and copied under the terms of
 # the GNU Public License.
 #
 # State Check Library
 #   Given a USA $state determine if it is correct.
 #   Return a postmaster approved two letter abreviation.
 #   $state may be either states full name or two letter abreviation.
 #
 #    WSE Created Wed Nov 28 14:08:25 PST 1990
 #
 #    Send problems, suggestions, errors, and 
 #    heresys to support@stephsf.COM.
##

%st_name = split(/!/,
'ALASKA!AK!ALABAMA!AL!ARKANSAS!AR!ARIZONA!AZ!CALIFORNIA!CA!COLORADO!CO!CONNECTICUT!CT!DELAWARE!DE!FLORIDA!FL!GEORGIA!GA!HAWAII!HI!IOWA!IA!IDAHO!ID!ILLINOIS!IL!INDIANA!IN!KANSAS!KS!KENTUCKY!KY!LOUISIANA!LA!MASSACHUSETTS!MA!MARYLAND!MD!MAINE!ME!MICHIGAN!MI!MINNESOTA!MN!MISSOURI!MO!MISSISSIPPI!MS!MONTANA!MT!NEBRASKA!NB!NORTH CAROLINA!NC!NORTH DAKOTA!ND!NEW HAMPSHIRE!NH!NEW JERSEY!NJ!NEW MEXICO!NM!NEVADA!NV!NEW YORK!NY!OHIO!OH!OKLAHOMA!OK!OREGON!OR!PENNSYLVANIA!PA!RHODE ISLAND!RI!SOUTH CAROLINA!SC!SOUTH DAKOTA







!SD!TENNESSEE!TN!TEXAS!TX!UTAH!UT!VIRGINIA!VA!VERMONT!VT!WASHINGTON!WA!WISCONSIN!WI!WEST VIRGINIA!WV!WYOMING!WY');

%st_abr = split(/( )/,'AK AL AR AZ CA CO CT DE FL GA HI IA ID IL IN KS KY LA MA MD ME MI MN MO MS MT NB NC ND NH NJ NM NV NY OH OK OR PA RI SC SD TN TX UT VA VT WA WI WV WY ');

##
 # State_Check
 #   In:  usa_state (two letter abreviation, or full state name )
 #   Out: () || ST(Usa two letter state abreviation)
##
sub main'State_Check {
	local($state) = @_;
	$state =~ y/a-z/A-Z/;

	return $state            if defined($st_abr{$state});
	return $st_name{$state}  if defined($st_name{$state});
	return ();
}

-- 
 +-  Bill England,  wengland@stephsf.COM -----------------------------------+
 |   * *      H -> He +24Mev                                                |
 |  * * * ... Oooo, we're having so much fun making itty bitty suns *       |
 |__ * * ___________________________________________________________________| 

tchrist@convex.COM (Tom Christiansen) (12/14/90)

From the keyboard of wengland@stephsf.stephsf.com (Bill England):
:sub main'State_Check {
:	local($state) = @_;
:	$state =~ y/a-z/A-Z/;
:
:	return $state            if defined($st_abr{$state});
:	return $st_name{$state}  if defined($st_name{$state});
:	return ();
:}
:

Hmm... shouldn't that last one be:

	return '';  # or return undef

or since explicit returns cost more than implicit ones:

	'';   	    # or undef

The null _scalar_ value in perl is '' (defined) or undef (undefined),
whereas () is an empty list.  It probably will work out ok as is, but it's
one of those scalar/vector confusion things that always make me at least a
little nervous, like saying @x[1] when you mean $x[1].

--tom
--
Tom Christiansen		tchrist@convex.com	convex!tchrist
"With a kernel dive, all things are possible, but it sure makes it hard
 to look at yourself in the mirror the next morning."  -me