[comp.unix.questions] a perl question

rjk@sawmill.uucp (Richard Kuhns) (11/10/89)

I'm not entirely sure that this is the newsgroup I should use, but
I've seen a number of perl questions/answers and I don't know of a
better newsgroup (until comp.lang.perl comes along).

My question:  I'd dearly love to have a filter, written in perl (the
rest of the code for this project is in perl, and I'll post it when I
get it working), which would turn the string `B^HBO^HOL^HLD^HD' into
`$bold_startBOLD$bold_end', where $bold_start and $bold_end are
predefined character strings.  I have a filter that does this already
written in C, but it seems to me I should be able to do it easier in
perl (using regular expressions?), but I can't come up with a good way
to do it.  /(.)\010$1/ recognizes one element of such a string (always
the first).  s/(.)\010$1/$1/g specifically does NOT work (it only
changes the first occurrence).

Thanks (in advance, of course).

Rich Kuhns
newton.physics.purdue.edu!sawmill!rjk

tchrist@convex.COM (Tom Christiansen) (11/11/89)

In article <RJK.89Nov9162936@sawmill.uucp> rjk@sawmill.uucp (Richard Kuhns) writes:

|I'm not entirely sure that this is the newsgroup I should use, but
|I've seen a number of perl questions/answers and I don't know of a
|better newsgroup (until comp.lang.perl comes along).

|My question:  I'd dearly love to have a filter, written in perl (the
|rest of the code for this project is in perl, and I'll post it when I
|get it working), which would turn the string `B^HBO^HOL^HLD^HD' into
|`$bold_startBOLD$bold_end', where $bold_start and $bold_end are
|predefined character strings.  I have a filter that does this already
|written in C, but it seems to me I should be able to do it easier in
|perl (using regular expressions?), but I can't come up with a good way
|to do it.  /(.)\010$1/ recognizes one element of such a string (always
|the first).  s/(.)\010$1/$1/g specifically does NOT work (it only
|changes the first occurrence).

This is quite close to what you want:

    $SO = "\033[1m";
    $SE = "\033[m";

    $_ = "this string is B\010BO\010OL\010LD\010D today\n";

    if (/(.)\010$1/) {
	$begin = $`;
	do { s/$&/$1/; } while /(.)\010$1/;
	( $end = $' ) =~ s/.(.*)/$1/;
	s/^$begin/$&$SO/;
	s/$end$/$SE$&/;
    }

    print;

I say "quite close" because if you consider the following string:

    $_ = "this string is B\010BO\010OL\010LD\010D and B\010BR\010RI\010IG\010GH\010HT\010T today\n";

The "and" also gets emboldened, which isn't quite right, but this should
be a good starting point.

It would be really nice if just
    s/((.)\010$1)+/${SO}$1${SE}/g; 
would somehow work without any explicit looping, but as with your substitute, 
$1 won't be reset on each scan.  I'll forward this to the perl-users
mailing list (who are waiting on comp.lang.perl) to see whether anybody
there has any bright ideas.
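For comparison, here is a sketch of the single-substitution version wished for above, written for a current Perl rather than the 1989 one. It is an assumption, not something from this thread: it relies on a backreference inside a repeated group and on the /e modifier. Group 1 captures a whole run of overstruck characters, and a small helper (strip_overstrike is a name made up for this example) removes the backspaces from the run before it is wrapped in $SO and $SE:

```perl
use strict;
use warnings;

# Hypothetical standout sequences, as in the example above.
my $SO = "\033[1m";
my $SE = "\033[m";

# Remove every "character + backspace" pair from a run,
# so an overstruck "BO" becomes a plain "BO".
# (strip_overstrike is a hypothetical helper name.)
sub strip_overstrike {
    (my $run = shift) =~ s/.\010//g;
    return $run;
}

# Group 1 is a whole run of overstruck characters; inside the repeated
# group, \2 refers back to the character captured in the same repetition.
# The /e modifier evaluates the replacement as Perl code.
sub embolden {
    my ($line) = @_;
    $line =~ s/((?:(.)\010\2)+)/$SO . strip_overstrike($1) . $SE/ge;
    return $line;
}

print embolden("this string is B\010BO\010OL\010LD\010D and B\010BR\010RI\010IG\010GH\010HT\010T today\n");
```

Because each run is matched separately, the "and" between the two bold words is left alone.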


--tom

    Tom Christiansen                       {uunet,uiucdcs,sun}!convex!tchrist 
    Convex Computer Corporation                            tchrist@convex.COM
		 "EMACS belongs in <sys/errno.h>: Editor too big!"

tchrist@convex.COM (Tom Christiansen) (11/12/89)

I just got mail from Larry Wall who pointed out that you need 
to use \1 in the LHS of the substitute.  He said:

|You want something like this:
|
|    s/(.)\010\1/<<<$1>>>/g;
|    s/>>><<<//g;
|
|where <<< and >>> can be anything that doesn't occur in the text.
|
|Within a pattern you want to use \1, not $1, because $1 means interpolate
|the previous pattern match.

Which makes it work.  When you're done, change the <<< and >>> into
start-standout and end-standout, like this:

    s/<<</$SO/g;  # or s/<<</\033[1m/g; or whatever
    s/>>>/$SE/g;
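Larry's point about \1 versus $1 can be checked in a current Perl with a small sketch (the strings "xy" and "B\010B" are arbitrary test data). An earlier successful match leaves "x" in $1, and that value is interpolated into the next pattern when it is built:

```perl
use strict;
use warnings;

# A successful match leaves "x" in $1.
"xy" =~ /(x)/;

my $text = "B\010B";    # an overstruck B

# $1 is interpolated when the pattern is built, so this pattern is
# effectively /(.)\010x/, which does not match "B\010B".
my $dollar_one = ($text =~ /(.)\010$1/) ? "match" : "no match";

# \1 is a backreference to whatever (.) captured in this same match.
my $backslash_one = ($text =~ /(.)\010\1/) ? "match" : "no match";

print "with \$1: $dollar_one\n";      # prints "with $1: no match"
print "with \\1: $backslash_one\n";   # prints "with \1: match"
```

In a current Perl, if no earlier match had set $1, it would be undefined and would interpolate as an empty string (with an "uninitialized value" warning under use warnings), leaving just /(.)\010/.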


--tom


maart@cs.vu.nl (Maarten Litmaath) (11/14/89)

At first I wrote it in sed:
----------8<----------8<----------8<----------8<----------8<----------
: top
	s/\(\n.*\)\n\(.\)^H\2/\1\2\
/
	t top
	s/\(.\)^H\1/\
\1\
/
	t top
	s/\n\([^\n]*\)\n/${bold}\1${endbold}/g
----------8<----------8<----------8<----------8<----------8<----------

Then I ran s2p on the script -> wrong perl script!  ('\n' is handled
incorrectly.)  A rewrite in perl:

----------8<----------8<----------8<----------8<----------8<----------
#!/usr/local/bin/perl
eval "exec /usr/local/bin/perl -S $0 $*"
	if $running_under_some_shell;

$SO = "\033[7m";
$SE = "\033[m";

line: while (<>) {
	while (1) {
		while (s/(\n.*)\n(.)\010\2/$1$2\n/) {
			;
		}
		last if !s/(.)\010\1/\n$1\n/;
	}
	s/\n(.*)\n/$SO$1$SE/g;
	print;
}
----------8<----------8<----------8<----------8<----------8<----------
-- 
"Richard Sexton is actually an AI program (or Construct, if you will) running
on some AT&T (R) 3B" (Richard Brosseau) | maart@cs.vu.nl, mcsun!botter!maart

merlyn@iwarp.intel.com (Randal Schwartz) (11/15/89)

In article <RJK.89Nov9162936@sawmill.uucp>, rjk@sawmill (Richard Kuhns) writes:
| I'm not entirely sure that this is the newsgroup I should use, but
| I've seen a number of perl questions/answers and I don't know of a
| better newsgroup (until comp.lang.perl comes along).
| 
| My question:  I'd dearly love to have a filter, written in perl (the
| rest of the code for this project is in perl, and I'll post it when I
| get it working), which would turn the string `B^HBO^HOL^HLD^HD' into
| `$bold_startBOLD$bold_end', where $bold_start and $bold_end are
| predefined character strings.  I have a filter that does this already
| written in C, but it seems to me I should be able to do it easier in
| perl (using regular expressions?), but I can't come up with a good way
| to do it.  /(.)\010$1/ recognizes one element of such a string (always
| the first).  s/(.)\010$1/$1/g specifically does NOT work (it only
| changes the first occurrence).

I saw this question come through the perl-users@virginia.edu mailing
list first, but I'll post my reply here (being the token Perl
wizard...:-):

#!/usr/bin/perl
$bold_start = "whatever"; $bold_end = "whatever";
while (<>) {
	if (/\010/) {
		s/(.)\010\1/\201\1\202/g; # surround bold with \201 and \202
		s/\202\201//g; # optimize away all end-start pairs
		s/\201/$bold_start/og; # replace start with real start
		s/\202/$bold_end/og; # and likewise for end
	}
	print;
}

There you have it.  OK, so it's not a one-liner... big deal.
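Randal's filter can be wrapped in a subroutine for testing. The sketch below is a lightly modernized version under stated assumptions: it runs under use strict and use warnings, uses \x81 and \x82 as the marker bytes (the same characters as octal \201 and \202), and fills in hypothetical ANSI bold sequences for $bold_start and $bold_end. It writes $1 rather than \1 on the replacement side, since current perls warn "\1 better written as $1" under use warnings.

```perl
use strict;
use warnings;

# Hypothetical bold sequences (ANSI); the post leaves them as "whatever".
my $bold_start = "\033[1m";
my $bold_end   = "\033[m";

# Same four steps as the loop body above, applied to one line.
# Assumes the text never contains the marker bytes \x81 and \x82.
sub embolden_line {
    my ($line) = @_;
    return $line unless $line =~ /\010/;   # no backspaces: nothing to do
    $line =~ s/(.)\010\1/\x81$1\x82/g;     # wrap each overstruck char in markers
    $line =~ s/\x82\x81//g;                # drop end/start pairs between bold chars
    $line =~ s/\x81/$bold_start/g;         # start marker -> real start
    $line =~ s/\x82/$bold_end/g;           # end marker -> real end
    return $line;
}

print embolden_line("B\010BO\010OL\010LD\010D and B\010BR\010RI\010IG\010GH\010HT\010T\n");
```

The merge step is what turns four separately marked characters into one bold word, so " and " between two runs stays plain.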

Just another Perl hacker,
(lwall says he's "Not just another Perl hacker"... :-)
-- 
/== Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ====\
| on contract to Intel's iWarp project, Hillsboro, Oregon, USA, Sol III  |
| merlyn@iwarp.intel.com ...!uunet!iwarp.intel.com!merlyn	         |
\== Cute Quote: "Welcome to Oregon... Home of the California Raisins!" ==/