grady@fx.com (Steven Grady) (11/21/90)
grep is a useful unix utility that has a parallel perl operator.  What
about uniq?  I'd like to see uniq (or something) that strips out the
duplicates in an array.  It would be cool if it were built into perl.

Meanwhile, has anyone written an efficient routine to do this?  Linear
time would be nice.  I'd like one that doesn't require the array to be
sorted.  I could write one myself, but I thought I'd avoid duplicating
code..

--
	Steven
	grady@fx.com
   "Birds of prey know they're cool."
chris@utgard.uucp (Chris Anderson) (11/21/90)
In article <1990Nov21.021344.16038@fxgrp.fx.com> grady@postgres.berkeley.edu writes:
>grep is a useful unix utility that has a parallel perl operator.
>What about uniq?  I'd like to see uniq (or something) that strips
>out the duplicates in an array.  It would be cool if it were
>built into perl.

I asked this myself several weeks ago... here's the answer I used from
the ones presented by the net.

	sub uniq {
	    local (@in) = @_;
	    local (%ary, @out);

	    undef %ary;
	    @ary{@in} = ();
	    @out = keys(%ary);
	    @out;
	}

Hope this helps,

Chris
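A small sketch (not from Chris's post) of the hash-slice sub in use; note
that keys() hands the elements back in internal hash order, not input order:

```perl
# Sketch: exercising the hash-slice uniq above.  The output order is
# whatever keys() gives, so we sort here only to make it predictable.
sub uniq {
    local (@in) = @_;
    local (%ary, @out);

    undef %ary;
    @ary{@in} = ();        # elements become hash keys; values don't matter
    @out = keys(%ary);     # each key appears exactly once
    @out;
}

@in  = ('a', 'b', 'a', 'c', 'b');
@out = sort &uniq(@in);
print "@out\n";            # prints "a b c"
```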
tchrist@convex.COM (Tom Christiansen) (11/21/90)
In article <1990Nov21.021344.16038@fxgrp.fx.com> grady@postgres.berkeley.edu writes:
=grep is a useful unix utility that has a parallel perl operator.
=What about uniq? I'd like to see uniq (or something) that strips
=out the duplicates in an array. It would be cool if it were
=built into perl.
=Meanwhile, has anyone written an efficient routine to do this?
=Linear time would be nice. I'd like one that doesn't require the
=array to be sorted. I could write one myself, but I thought
=I'd avoid duplicating code..
This is turning into a FAQ, isn't it?
In <9952@jpl-devvax.JPL.NASA.GOV> on 12 Oct 90, Larry wrote a good
article on this.  He covers these cases:
1) If @in is sorted:
$prev = 'nonesuch';
@out = grep($_ ne $prev && (($prev) = $_), @in);
2) If we don't know whether @in is sorted:
undef %saw;
@out = grep(!$saw{$_}++, @in);
3) If we don't know whether @in is sorted, nor care whether @out is:
undef %ary;
@ary{@in} = ();
@out = keys(%ary);
(I usually use #3.)
And then he points out that if you know that @in contains
only small positive integers, you can use:
@out = grep(!$saw[$_]++, @in);
for case 2, and for case 3:
@ary[@in] = @in;
@out = sort @ary;
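To make the trade-offs concrete, here is a sketch (mine, not from Larry's
article) running cases 2 and 3 on the same data: case 2 keeps the first
occurrence of each element in input order, while case 3's order is whatever
keys() returns.  (Note too that the small-integer variant of case 3 leaves
undef holes in @ary for indices absent from @in, so sort @ary can include
empty entries.)

```perl
# Sketch comparing idioms 2 and 3 on sample data.
@in = ('b', 'a', 'b', 'c', 'a');

# Case 2: order-preserving; first occurrence wins.
undef %saw;
@out2 = grep(!$saw{$_}++, @in);
print "@out2\n";               # prints "b a c"

# Case 3: order of @out3 is the internal hash order.
undef %ary;
@ary{@in} = ();
@out3 = keys(%ary);
print scalar(@out3), "\n";     # prints "3"
```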
I guess I'll add it to the FAQ.
--tom
lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (11/22/90)
In article <1990Nov21.021344.16038@fxgrp.fx.com> grady@postgres.berkeley.edu writes:
: grep is a useful unix utility that has a parallel perl operator.
: What about uniq?  I'd like to see uniq (or something) that strips
: out the duplicates in an array.  It would be cool if it were
: built into perl.
:
: Meanwhile, has anyone written an efficient routine to do this?
: Linear time would be nice.  I'd like one that doesn't require the
: array to be sorted.  I could write one myself, but I thought
: I'd avoid duplicating code..

	sub uniq {
	    local(%seen);
	    grep(!$seen{$_}++, @_);
	}

	@out = &uniq(@in);

It's more-or-less linear, apart from the extra time to bifurcate %seen
occasionally as it's growing.  I don't know whether you want to call it
efficient--it's going to depend on how quickly your machine can
calculate the hash function.  It may be faster to sort and compare
adjacent values, depending...

Larry
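The sort-and-compare alternative Larry alludes to can be sketched like this
(my code, not Larry's): an O(n log n) sort up front, then a linear pass that
drops adjacent duplicates, with no hashing at all.

```perl
# Sketch: uniq by sorting first, then skipping adjacent duplicates.
# Assumes the sentinel 'nonesuch' does not occur in the data.
sub uniq_sorted {
    local (@sorted) = sort @_;
    local ($prev) = 'nonesuch';
    # The list assignment (($prev) = $_) is always true in boolean
    # context, so it both records the element and keeps it.
    grep($_ ne $prev && (($prev) = $_), @sorted);
}

@out = &uniq_sorted('b', 'a', 'b', 'c', 'a');
print "@out\n";                # prints "a b c"
```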
jgk@osc.COM (Joe Keane) (11/30/90)
In article <1990Nov21.021344.16038@fxgrp.fx.com> grady@postgres.berkeley.edu writes:
>grep is a useful unix utility that has a parallel perl operator.
>What about uniq?  I'd like to see uniq (or something) that strips
>out the duplicates in an array.  It would be cool if it were
>built into perl.

In article <10501@jpl-devvax.JPL.NASA.GOV> lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) writes:
>sub uniq {
>    local(%seen);
>    grep(!$seen{$_}++, @_);
>}

Note that the Unix `uniq' utility is defined to remove _adjacent_
duplicate lines, which is an easier task than removing _all_ duplicate
lines.  I don't know which the original poster wanted.

>It's more-or-less linear, apart from the extra time to bifurcate %seen
>occasionally as it's growing.

Why not linear?  You increase the size exponentially, don't you?
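If the adjacent-only behavior of the Unix uniq filter is what's wanted, it
is just case 1 from Tom's summary applied without sorting first; a sketch
under that assumption (again assuming the sentinel isn't in the data):

```perl
# Sketch: remove only *adjacent* duplicates, like the Unix uniq filter.
# A repeated element separated by other values survives.
sub uniq_adjacent {
    local ($prev) = 'nonesuch';          # assumed absent from the data
    grep($_ ne $prev && (($prev) = $_), @_);
}

@out = &uniq_adjacent('a', 'a', 'b', 'b', 'a');
print "@out\n";                # prints "a b a" -- the final 'a' survives
```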