[comp.lang.perl] Capitalizing words in a line.

marvit@hplpm.hpl.hp.com (Peter Marvit) (02/14/91)

My own thanks to the fellow who needed to UPPERCASE certain key words in
a file.  It's a problem I just needed a solution for.  However...

My conundrum? I need to *capitalize* some or all words on an input line.
That is, change from "lower case" to "Mixed Case And Capitalized Words"
(as in GnuEmacs's 'capitalize-word/region').

I tried doing a s|\w.|tr[a-z][A-Z]| and some variants, but nothing worked
correctly.  I plead to the perl community for help.

Please e-mail solutions and I will summarize for the net.

	-Peter "scalloped perl" Marvit


: Peter Marvit   Hewlett-Packard Labs in Palo Alto, CA   (415) 857-6646    :
: Internet: <marvit@hplabs.hpl.hp.com>  uucp: {any backbone}!hplabs!marvit :

marvit@hplpm.hpl.hp.com (Peter Marvit) (02/21/91)

Thanks to Bill Mann and Tom Christiansen, who provided solutions to "how
to capitalize words on a line."  Their nearly identical solutions:

	s/\b([a-z])/$b=$1,$b=~tr:a-z:A-Z:,$b/ge;
AND
	s#(\b.)#($x = $1) =~ tr/a-z/A-Z/, $x#ge;

both operating on $_, of course. I find the second to be a bit "trickier"
to read, but both work just fine.  

	-Peter "knit one, perl two" Marvit
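
For reference, a minimal sketch of the same idea as a standalone script.
The sub names capitalize_words_tr and capitalize_words_u are made up for
illustration and appear nowhere in the thread.  The first sub follows the
first posted substitution (copy the letter, uppercase the copy with tr,
return it); the second assumes a perl that supports the \u escape, which
Larry Wall's reply later in this thread uses.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: uppercase the first letter of each word using
# tr on a copy of the matched letter, as in the first posted solution.
sub capitalize_words_tr {
    my ($line) = @_;
    my $c;
    # /e evaluates the replacement as Perl code; the last value ($c) is used.
    $line =~ s/\b([a-z])/$c = $1, $c =~ tr:a-z:A-Z:, $c/ge;
    return $line;
}

# Hypothetical helper: same effect, using the \u escape instead of /e.
sub capitalize_words_u {
    my ($line) = @_;
    $line =~ s/\b([a-z])/\u$1/g;
    return $line;
}

print capitalize_words_tr("mixed case and capitalized words"), "\n";
print capitalize_words_u("mixed case and capitalized words"), "\n";
```

Note that \b also matches right after an apostrophe, so either form
turns "don't" into "Don'T".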



marvit@hplpm.hpl.hp.com (Peter Marvit) (02/21/91)

Well, another conundrum, this time one of efficiency.  I created several
subroutines for different capitalization needs -- the first word only,
every word, etc.

One of the routines leaves every word in the string alone (i.e., all
UPPER CASE) but changes only the first word to Mixed Case.  Unfortunately,
my first version causes the perl script to grow in memory usage, apparently
without bound, as a function of input data:

	sub capitalizefirstonly {
	    # Get the string itself
	    $_ = $_[1];
	    # Make the whole first word lower case
	    s/\b([A-Z]*)/$b=$1,$b=~tr:A-Z:a-z:,$b/e;   # <---------
	    # Now capitalize the first word
	    s/\b([a-z])/$b=$1,$b=~tr:a-z:A-Z:,$b/e;
	    # Return the whole string
	    return($_);
	}

It also chews up mucho CPU, though the memory usage is the real killer
(est. 2-4X input data size).  We tried many variations, and the key
code that causes the problem is the line which turns the first word
into lower case (marked "<-----" above).  Our eventual solution, which
appears to be much more efficient in both time and memory:

	sub capitalizefirstonly {
	    local($b,@words);
	    @words = split(' ',$_[1]);
	    $words[1] =~ tr/A-Z/a-z/;
	    $words[1] =~ s/\b([a-z])/$b=$1,$b=~tr:a-z:A-Z:,$b/e;
	    $_ = join(' ',@words);
	    return($_);
	}

My question: what's wrong with the first method?  It is more straightforward,
but a horrible memory pig.  Is this a perl bug (heaven forbid)?

	-Peter "knit 2, perl -1" Marvit
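
A minimal sketch of the split/join approach, assuming a Perl 5 or later
interpreter (for lc and ucfirst) and the default array base of 0.  The
posted code indexes with [1]; whether that relied on a non-default array
base ($[) is not stated in the post, so the 0 here is an assumption.  The
sub name capitalize_first_only_split is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical rewrite: lowercase the first word, then capitalize it,
# leaving every other word untouched.
sub capitalize_first_only_split {
    my ($line) = @_;
    # split ' ' drops leading whitespace and splits on runs of whitespace.
    my @words = split ' ', $line;
    return '' unless @words;
    $words[0] = ucfirst lc $words[0];
    return join ' ', @words;
}

print capitalize_first_only_split("HELLO   BIG WORLD"), "\n";
```

One consequence of this design is that split followed by join does not
preserve the original spacing: runs of whitespace come back as single
spaces, and leading whitespace is dropped.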





lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (02/22/91)

In article <MARVIT.91Feb20192455@hplpm.hpl.hp.com> marvit@hplpm.hpl.hp.com (Peter Marvit) writes:
: One of the routines leaves every word in the string alone (i.e., all
: UPPER CASE) but changes only the first word to Mixed Case.  Unfortunately,
: my first version causes the perl script to grow in memory usage, apparently
: without bound, as a function of input data:
: 
: 	sub capitalizefirstonly {
: 	    # Get the string itself
: 	    $_ = $_[1];
: 	    # Make the whole first word lower case
: 	    s/\b([A-Z]*)/$b=$1,$b=~tr:A-Z:a-z:,$b/e;   # <---------
: 	    # Now capitalize the first word
: 	    s/\b([a-z])/$b=$1,$b=~tr:a-z:A-Z:,$b/e;
: 	    # Return the whole string
: 	    return($_);
: 	}

: My question: what's wrong with the first method?  It is more straightforward,
: but a horrible memory pig.  Is this a perl bug (heaven forbid)?

Yep.  Each tr was dropping a couple of chunks of memory when compiled.
This doesn't show up until you put it inside an eval.

As mentioned earlier, the way you'd want to do that subroutine under 4.0 is:

	sub capitalizefirstonly {
	    local($tmp) = @_;
	    $tmp =~ s/\b(\w+)/\L\u$1/;
	    $tmp;
	}

Larry
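
A minimal sketch of how the subroutine above might look in a current
Perl, assuming `my` variables and `use strict` (neither is in the posted
code).  The escapes are written here as \u\L, with \u first; this is a
choice made for the sketch, not the order used in the post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Lowercase the first word, then uppercase its first letter.
# Without /g, only the first match (the first word) is changed.
sub capitalizefirstonly {
    my ($tmp) = @_;
    $tmp =~ s/\b(\w+)/\u\L$1/;
    return $tmp;
}

print capitalizefirstonly("HELLO BIG WORLD"), "\n";
```

Because the substitution edits the string in place, the spacing between
words and any leading whitespace are left as they were.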