[comp.lang.perl] Uniq, anyone?

chris@utgard.uucp (Chris Anderson) (10/12/90)

On occasion, I find myself wanting something on the order of:

	@foo = (1,1,2,2,3,3);
	@bar = &uniq (@foo);

Where @bar contains: 1 2 3

I've made a couple of stabs at coding this and haven't come
up with anything that's very efficient.  Has anyone made such
a beast?

Chris

-- 
| Chris Anderson                                                       |
| QMA, Inc.                     email : {csusac,sactoh0}!utgard!chris  |
|----------------------------------------------------------------------|
| My employer never listens to me, so why should he care what I say?   |

jgreely@morganucodon.cis.ohio-state.edu (J Greely) (10/13/90)

In article <1990Oct12.075314.15054@utgard.uucp> chris@utgard.uucp
 (Chris Anderson) writes:
>On occasion, I find myself wanting something on the order of:
>	@foo = (1,1,2,2,3,3);
>	@bar = &uniq (@foo);

This seems to do it, although it strips null fields out entirely (I
call that a feature, myself):

sub uniq {
	local($o);
	grep(($_ ne $o) && defined($o = $_),sort @_);
}
--
J Greely (jgreely@cis.ohio-state.edu; osu-cis!jgreely)

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (10/13/90)

In article <1990Oct12.075314.15054@utgard.uucp> chris@utgard.uucp (Chris Anderson) writes:
: On occasion, I find myself wanting something on the order of:
: 
: 	@foo = (1,1,2,2,3,3);
: 	@bar = &uniq (@foo);
: 
: Where @bar contains: 1 2 3
: 
: I've made a couple of stabs at coding this and haven't come
: up with anything that's very efficient.  Has anyone made such
: a beast?


@in = (1,1,2,2,3,3);

# If we know that @in is sorted:

$prev = 'nonesuch';
@out = grep($_ ne $prev && (($prev) = $_), @in);
print "@out\n";

# If we don't know whether @in is sorted:

undef %saw;
@out = grep(!$saw{$_}++, @in);
print "@out\n";

# If you don't know about @in, and don't care whether @out is sorted:

undef %ary;
@ary{@in} = ();
@out = keys(%ary);
print "@out\n";

# Same as above, but assuming that all values of @in are small positive ints.

$prev = 'nonesuch';
@out = grep($_ != $prev && (($prev) = $_), @in);
print "@out\n";

undef @saw;
@out = grep(!$saw[$_]++, @in);
print "@out\n";

undef @ary;
@ary[@in] = @in;
@out = sort @ary;
print "@out\n";

Larry

markb@agora.uucp (Mark Biggar) (10/14/90)

In article <JGREELY.90Oct12142916@morganucodon.cis.ohio-state.edu> J Greely <jgreely@cis.ohio-state.edu> writes:
>In article <1990Oct12.075314.15054@utgard.uucp> chris@utgard.uucp
> (Chris Anderson) writes:
>>On occasion, I find myself wanting something on the order of:
>>	@foo = (1,1,2,2,3,3);
>>	@bar = &uniq (@foo);
>
>This seems to do it, although it strips null fields out entirely (I
>call that a feature, myself):
>
>sub uniq {
>	local($o);
>	grep(($_ ne $o) && defined($o = $_),sort @_);
>}

This requires sorting the list before uniq'ing it, which make the algorithm
nlogn.  The following algorithm is better:

sub uniq {
	local(%item) = ();
	for (@_) {
		$item{$_} = 1;
	}
	keys(%item);
}

This produces an unsorted result that can of course can be sorted.
--
Perl's Maternal Uncle
Mark Biggar

jgreely@morganucodon.cis.ohio-state.edu (J Greely) (10/15/90)

In article <1990Oct13.182400.22738@agora.uucp> markb@agora.uucp
 (Mark Biggar) writes:
>This requires sorting the list before uniq'ing it, which make the algorithm
>nlogn.  The following algorithm is better:

"Better" is a slippery word in algorithm analysis.  In this case,
you're ignoring the overhead of using an associative array, which can
be significant.  I find that when the array is filled with random
strings (for example, the lines of a fortune(6) file), the sort-and-
iterate method wins big (at n=32000, your solution failed with "Out of
memory!" after about 15 minutes, while mine completed in less than 30
seconds).

(Side note: you should declare $_ local in your solution, to avoid
stomping on its current value)
--
J Greely (jgreely@cis.ohio-state.edu; osu-cis!jgreely)

markb@agora.uucp (Mark Biggar) (10/17/90)

In article <JGREELY.90Oct15112321@morganucodon.cis.ohio-state.edu> J Greely <jgreely@cis.ohio-state.edu> writes:
>(Side note: you should declare $_ local in your solution, to avoid
>stomping on its current value)

But a foreach loop iterator variable is automatically local to the loop
so no local() is needed.

--
Perl's Maternal Uncle
Mark Biggar

jgreely@morganucodon.cis.ohio-state.edu (J Greely) (10/19/90)

In article <1990Oct17.022227.1525@agora.uucp> markb@agora.uucp
 (Mark Biggar) writes:
>But a foreach loop iterator variable is automatically local to the loop
>so no local() is needed.

I just hunted through the patches; sure enough, it's documented as of
patchlevel 27.  Now that the dbm problems are fixed, we'll probably
catch up sometime next week.
--
J Greely (jgreely@cis.ohio-state.edu; osu-cis!jgreely)