[comp.lang.perl] Manipulating strings

setzer@matagh.ncsu.edu (Th'PoC) (05/24/91)

The problem: I have four files of identical length (< 5k).  Each
contains 8-bit data, but none contains any of `A'..`Z'.  At any fixed
offset into each file, either 1) all four characters are identical, or
2) one or more of the characters has the eighth bit set.  I want to
generate an "overlay", i.e., if (1), output the character; if (2),
output a letter in `A'..`P' that will tell me which files had the
eighth bit set.

The [*ugly*] solution:  I `open'ed the four files, slurped each
of them into an array, and looped over the length of the array:

--
open(A, "< a") || die "can't open 'a'";  @a=split(//,join("",<A>));
# do the same for files "b", "c", and "d"

for ($i=0; $i < $#a+1; $i++) { # do the `if's on `ord($a[$i])&0x80' etc.
--
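
For reference, the elided loop body might be filled in along these lines.
This is only a sketch for a current Perl 5, not the poster's actual code:
the helper name overlay_indexed is made up, the four files are assumed to
be already read into strings, and the letter encoding (0x40 OR'ed with one
bit per file, so `A' means only file a and `O' means all four) is an
assumption chosen to keep the result inside the requested letter range.

```perl
# Hypothetical helper: build the overlay from four equal-length byte strings.
sub overlay_indexed {
    my @files = @_;                          # one string per file, in order a, b, c, d
    my $out = '';
    for my $i (0 .. length($files[0]) - 1) {
        my $bits = 0;
        for my $f (0 .. $#files) {
            # Set bit $f when file $f has the eighth bit set at offset $i.
            $bits |= 1 << $f if ord(substr($files[$f], $i, 1)) & 0x80;
        }
        # Assumed encoding: 0x40 | bits for case (2), the shared character for case (1).
        $out .= $bits ? chr(0x40 | $bits) : substr($files[0], $i, 1);
    }
    return $out;
}

# Files a, b and d have the eighth bit set at offset 1: bits 1|2|8 = 11, chr(0x4B) = "K".
print overlay_indexed("x\x80y", "x\x80y", "xzy", "x\x80y"), "\n";   # prints "xKy"
```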

The question: Is there a "prettier" way to do this?  (There must be.
Isn't there a perl axiom that if you have to use an indexing variable,
you're doing it the "wrong" way?)  It comes down to simple string
manipulation, but I can't see a way to use any of the implicit looping
constructs (`for (@a)', `foreach ...').  I would appreciate insight
into an alternate way to solve the problem.  Thanks.

Wm.
--
If the fundamentalists don't hate you, you have the wrong lifestyle.
  -- James Nicoll

allbery@NCoast.ORG (Brandon S. Allbery KF8NH) (05/24/91)

As quoted from <SETZER.91May23110013@matagh.ncsu.edu> by setzer@matagh.ncsu.edu (Th'PoC):
+---------------
| The question: Is there a "prettier" way to do this?  (There must be.
| Isn't there a perl axiom that if you have to use an indexing variable,
| you're doing it the "wrong" way?)  It comes down to simple string
| manipulation, but I can't see a way to use any of the implicit looping
| constructs (`for (@a)', `foreach ...').  I would appreciate insight
| into an alternate way to solve the problem.  Thanks.
+---------------

Not for iterating over multiple objects in step, but:

open(A,'A');
$/ = undef;
@a = unpack('c*',<A>);
close(A);
# do the same for B, C, D
while (($a,$b,$c,$d)=(scalar(shift(@a)),scalar(shift(@b)),scalar(shift(@c)),
		      scalar(shift(@d))))
{
    # compare $a, $b, $c, $d
}

WARNING:  this is incredibly wasteful of memory.  Were your files not small,
this would cause problems.  But for short files it should be REAL fast.

Basically, we pull each file into memory, with the unpack(c*) splitting it
into arrays of characters.  Then the shift pulls off each character in turn.
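
One caveat on the loop above: as far as I can tell, a list assignment used
as a `while' condition yields the number of elements on its right-hand
side, which is always four here, so the loop would not stop when the
arrays run out. A minimal sketch of the same in-step idea for a current
Perl 5, testing the array itself instead, could look like this. The helper
name in_step and the sample strings are made up for illustration, and it
unpacks with 'C*' (unsigned) rather than 'c*' so that high-bit bytes come
out as 128..255 instead of negative numbers.

```perl
# Hypothetical helper: walk four equal-length lists in step.
# Takes four array references and returns one [a, b, c, d] row per offset.
sub in_step {
    my ($wa, $wb, $wc, $wd) = map { [@$_] } @_;   # copy so the callers keep their data
    my @rows;
    while (@$wa) {                                # stop once the first list is empty
        push @rows, [shift(@$wa), shift(@$wb), shift(@$wc), shift(@$wd)];
    }
    return @rows;
}

# Sample data standing in for the contents of the four files.
my @fa = unpack('C*', "x\x80y");
my @fb = unpack('C*', "x\x80y");
my @fc = unpack('C*', "xzy");
my @fd = unpack('C*', "x\x80y");

for my $row (in_step(\@fa, \@fb, \@fc, \@fd)) {
    # One digit per file: 1 where the eighth bit is set at this offset.
    my $flags = join '', map { ($_ & 0x80) ? 1 : 0 } @$row;
    print "$flags\n";                             # prints 0000, then 1101, then 0000
}
```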

++Brandon
-- 
Me: Brandon S. Allbery			 KF8NH: DC to LIGHT!  [44.70.4.88]
Internet: allbery@NCoast.ORG		 Delphi: ALLBERY
uunet!usenet.ins.cwru.edu!ncoast!allbery

tchrist@convex.COM (Tom Christiansen) (05/24/91)

From the keyboard of allbery@ncoast.ORG (Brandon S. Allbery KF8NH):
:Not for iterating over multiple objects in step, but:
:
:open(A,'A');
:$/ = undef;
:@a = unpack('c*',<A>);
:close(A);
:# do the same for B, C, D
:while (($a,$b,$c,$d)=(scalar(shift(@a)),scalar(shift(@b)),scalar(shift(@c)),
:		      scalar(shift(@d))))
:{
:    # compare $a, $b, $c, $d
:}
:

How come you're casting those shifts to a scalar context?

--tom
--
Tom Christiansen		tchrist@convex.com	convex!tchrist
		"So much mail, so little time." 

lwall@jpl-devvax.jpl.nasa.gov (Larry Wall) (05/25/91)

In article <SETZER.91May23110013@matagh.ncsu.edu> setzer@matagh.ncsu.edu (Th'PoC) writes:
: The problem: I have four files of identical length (< 5k).  Each
: contains 8-bit data, but none contains any of `A'..`Z'.  At any fixed
: offset into each file, either 1) all four characters are identical, or
: 2) one or more of the characters has the eighth bit set.  I want to
: generate an "overlay", i.e., if (1), output the character; if (2),
: output a letter in `A'..`P' that will tell me which files had the
: eighth bit set.
: 
: The [*ugly*] solution:  I `open'ed the four files, slurped each
: of them into an array, and looped over the length of the array:
: 
: --
: open(A, "< a") || die "can't open 'a'";  @a=split(//,join("",<A>));
: # do the same for files "b", "c", and "d"
: 
: for ($i=0; $i < $#a+1; $i++) { # do the `if's on `ord($a[$i])&0x80' etc.
: --
: 
: The question: Is there a "prettier" way to do this?  (There must be.
: Isn't there a perl axiom that if you have to use an indexing variable,
: you're doing it the "wrong" way?)  It comes down to simple string
: manipulation, but I can't see a way to use any of the implicit looping
: constructs (`for (@a)', `foreach ...').  I would appreciate insight
: into an alternate way to solve the problem.  Thanks.

Well, here's one way:

#!/usr/bin/perl

vec($foo,0,0);

$a = `cat a`;
$b = `cat b`;
$c = `cat c`;
$d = `cat d`;

$string = $a;

$a =~ tr/\200-\377/A/;
$a =~ tr/A/\0/c;
$b =~ tr/\200-\377/B/;
$b =~ tr/B/\0/c;
$c =~ tr/\200-\377/D/;
$c =~ tr/D/\0/c;
$d =~ tr/\200-\377/H/;
$d =~ tr/H/\0/c;

$highs |= $a;
$highs |= $b;
$highs |= $c;
$highs |= $d;

($mask = $highs) =~ tr/A-Z/\177/c;
$mask =~ tr/A-Z/\0/;
$string &= $mask;
$string |= $highs;

print $string;
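
To see what the tr and bitwise-string steps do, here is the same sequence
wrapped in a sub and run on a three-byte sample. This is a sketch for a
current Perl 5 rather than part of the script above: the name overlay_tr
and the sample strings are made up, and it relies on `|' and `&' acting
bytewise when both operands are strings. Like the script, it also depends
on the inputs containing no `A'..`Z'.

```perl
# Hypothetical wrapper around the tr/bitwise-string technique.
sub overlay_tr {
    my ($fa, $fb, $fc, $fd) = @_;        # four equal-length byte strings
    my $string = $fa;                    # source of the characters for case (1)

    # High-bit bytes become the file's flag letter; everything else becomes NUL.
    $fa =~ tr/\200-\377/A/;  $fa =~ tr/A/\0/c;
    $fb =~ tr/\200-\377/B/;  $fb =~ tr/B/\0/c;
    $fc =~ tr/\200-\377/D/;  $fc =~ tr/D/\0/c;
    $fd =~ tr/\200-\377/H/;  $fd =~ tr/H/\0/c;

    # A=0x41, B=0x42, D=0x44, H=0x48: OR-ing them combines one bit per file.
    my $highs = $fa | $fb | $fc | $fd;

    # Mask is \177 where no file had the high bit, NUL where some file did.
    (my $mask = $highs) =~ tr/A-Z/\177/c;
    $mask =~ tr/A-Z/\0/;

    return ($string & $mask) | $highs;
}

# At offset 1 files a, b and d are flagged: 0x41 | 0x42 | 0x48 = 0x4B, i.e. "K".
print overlay_tr("x\x80y", "x\x80y", "xzy", "x\x80y"), "\n";   # prints "xKy"
```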

: If the fundamentalists don't hate you, you have the wrong lifestyle.
:   -- James Nicoll

If the fundamentalists hate you when they should hate your lifestyle,
they've lost sight of the fundamental lifestyle.
  -- me

Larry

allbery@NCoast.ORG (Brandon S. Allbery KF8NH) (05/26/91)

As quoted from <1991May24.070032.18307@convex.com> by tchrist@convex.COM (Tom Christiansen):
+---------------
| :while (($a,$b,$c,$d)=(scalar(shift(@a)),scalar(shift(@b)),scalar(shift(@c)),
| :		      scalar(shift(@d))))
| 
| How come you're casting those shifts to a scalar context?
+---------------

I didn't have a Perl to test it on or a Perl manual to check (still working on
that), and I wasn't certain whether the shifts would gobble too much in an
array context.
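
A quick check, sketched for a current Perl 5 (the helper name take_two is
made up): shift removes and returns a single element whether or not it is
wrapped in scalar(), so inside a list each call contributes exactly one
value and the casts are harmless but not needed.

```perl
# Hypothetical demo: take two elements, one via plain shift, one via scalar(shift).
sub take_two {
    my @queue = @_;
    my @got = (shift(@queue), scalar(shift(@queue)));
    return (scalar(@got), scalar(@queue));     # how many were taken, how many are left
}

my ($taken, $left) = take_two(10, 20, 30, 40);
print "taken=$taken left=$left\n";             # prints "taken=2 left=2"
```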

++Brandon