[comp.lang.perl] Using 'split' with multi-character expressions

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (05/26/90)

In article <1933@pyrltd.UUCP> jimmy@pyrltd.UUCP (Jimmy Aitken) writes:
: (Patchlevel 18, running under OSx4.4c on a Pyramid)
: 
: I'm trying to read in a file, and split on a delimiter that separates
: different records.  In the test case below, these are lines of 7 '-'
: signs.  I can split on a simple string, but I want to guard against
: the case where the pattern could be included elsewhere.  In the case
: shown below, it works with the exception that record 2 line 3 is sent
: into a separate array element.
: 
: I've tried all possible combinations of pattern matching that I can
: think of to specify a unique pattern for split, but nothing seems to
: work as I expect it.  I would expect setting $artsep to
: "^-------$" to do what I want, but it does not.  Have I read the
: manual wrong, and if so, can someone point me in the correct direction
: please?

Go ahead and use "^-------$" but set $* to enable multi-line pattern matching.
Otherwise the ^ only matches at the beginning of the string because
the pattern matcher is optimizing on the assumption that your string only
contains one line.
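A minimal sketch of this advice in current Perl. There, $* no longer exists (it was removed in Perl 5.10), and the /m modifier turns on multi-line matching for a single pattern instead. The sample records are invented for illustration. The pattern here also consumes the newline after the dashes, which is an assumption about how the records should be trimmed.

```perl
use strict;
use warnings;

# Hypothetical input: three records separated by lines of exactly seven '-'
# signs; record 2 also contains seven dashes in the middle of a line.
my $text = join '',
    "rec1 line1\n",
    "rec1 line2\n",
    "-------\n",
    "rec2 line1\n",
    "rec2 line2\n",
    "rec2 has ------- inside line3\n",
    "-------\n",
    "rec3 line1\n";

# /m makes ^ match after every embedded newline, so only a line consisting
# solely of seven dashes (plus its newline) acts as a separator.
my @records = split /^-------\n/m, $text;

printf "%d records\n", scalar @records;
print "[$_]\n" for @records;
```

Because the separator is anchored both at the start of a line and at its newline, a line that merely contains seven dashes, or a line of eight dashes, is left inside its record.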

Larry

mleone@f.gp.cs.cmu.edu (Mark Leone) (05/26/90)

On a related note, why doesn't split allow a case-insensitive pattern?
I.e., split(/$word/i, $string).  It seems like all the other pattern
operations support case-insensitivity!

Is there some other good way to do this when $word and $string are not
known at compile-time?

--
Mark R. Leone  <mleone@cs.cmu.edu>              
Computer Science, Carnegie Mellon University    
Pittsburgh, PA 15213

merlyn@iwarp.intel.com (Randal Schwartz) (05/26/90)

In article <9423@pt.cs.cmu.edu>, mleone@f.gp (Mark Leone) writes:
| 
| On a related note, why doesn't split allow a case-insensitive pattern?
| I.e., split(/$word/i, $string).  It seems like all the other pattern
| operations support case-insensitivity!
| 
| Is there some other good way to do this when $word and $string are not
| known at compile-time?

Well, as a bit of a hack, I could suggest:

sub isplit { # &isplit(word,string) => array
	local($word,$_) = @_;
	local(@ind,@result,$start,$end);
	@ind = (0);
	# for each case-insensitive match, record where it starts and ends
	s#$word#push(@ind,length($`),length($`.$&)),$&#ieg;
	push(@ind,length);
	# @ind now has pairs of indices (0-origin) into $_
	# that bound the non-$word items; convert into result:
	while (@ind) {
		$start = shift(@ind);
		$end = shift(@ind);
		push(@result,substr($_,$start+$[,$end-$start));
	}
	@result;
}

print join(":",&isplit("z","foo z bar Z bletch")),"\n";
print join(":",&isplit("x*y","foo xy bar xxxy bletch XxXxY bug")),"\n";

Yup, it doesn't handle the equivalent of split(/(a)/i, $string), where
the matched delimiters are returned along with the fields.  Anyone care
to try that?  This is all I could do in the 20 minutes I had to play
with this today.
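For comparison, here is a short sketch of what split(/(a)/i, $string) returns in current Perl. When the pattern contains capturing parentheses, the text matched by each group is returned between the fields, so the delimiters come back in their original case. The sample string is the one from the first test above, and the variable names are illustrative.

```perl
use strict;
use warnings;

my $string = "foo z bar Z bletch";

# Without parentheses only the fields are returned.
my @fields = split /z/i, $string;

# With a capturing group, each matched delimiter is returned as well,
# interleaved with the fields and keeping its original case.
my @with_delims = split /(z)/i, $string;

print join(":", @fields), "\n";       # foo : bar : bletch
print join(":", @with_delims), "\n";  # foo :z: bar :Z: bletch
```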

print &isplit("z","JzuZsztZ zaZnzoZtzhZezrZ zPZezrZlz ZhzaZczkZezrZ,");
-- 
/=Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ==========\
| on contract to Intel's iWarp project, Beaverton, Oregon, USA, Sol III      |
| merlyn@iwarp.intel.com ...!any-MX-mailer-like-uunet!iwarp.intel.com!merlyn |
\=Cute Quote: "Welcome to Portland, Oregon, home of the California Raisins!"=/

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (05/26/90)

In article <9423@pt.cs.cmu.edu> mleone@f.gp.cs.cmu.edu (Mark Leone) writes:
: 
: On a related note, why doesn't split allow a case-insensitive pattern?
: I.e., split(/$word/i, $string).  It seems like all the other pattern
: operations support case-insensitivity!
: 
: Is there some other good way to do this when $word and $string are not
: known at compile-time?

Uh, I don't have any trouble with split(/$word/i,$string).  Are you sure
you didn't just try split(/x/i, $string) and generalize from that?  The
case of a single explicit split letter doesn't work right, but I can't
get anything else to misbehave here, including if $word is a single char.

By the way, the next patch will fix split(/x/i).
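A minimal sketch of split(/$word/i, $string) with values chosen only at run time, as checked against a current Perl rather than patchlevel 18. It reuses the inputs from Randal's second test and should yield the same fields his isplit produces. The \Q...\E part is an extra illustration for the case where $word is meant literally rather than as a regex.

```perl
use strict;
use warnings;

# $word and $string are only known at run time (hypothetical values here).
my ($word, $string) = ("x*y", "foo xy bar xxxy bletch XxXxY bug");

# The pattern is interpolated and compiled at run time; /i makes it
# case-insensitive, so "XxXxY" is a delimiter too.
my @parts = split /$word/i, $string;
print join(":", @parts), "\n";

# If $word should be taken literally rather than as a regex,
# \Q...\E (quotemeta) escapes its metacharacters.
my @literal = split /\Q$word\E/i, "a x*y b X*Y c";
print join(":", @literal), "\n";
```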

Larry

mleone@f.gp.cs.cmu.edu (Mark Leone) (05/28/90)

In article <8209@jpl-devvax.JPL.NASA.GOV> lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) writes:
>Uh, I don't have any trouble with split(/$word/i,$string).  Are you sure
>you didn't just try split(/x/i, $string) and generalize from that?  The
>case of a single explicit split letter doesn't work right, but I can't
>get anything else to misbehave here, including if $word is a single char.

Oops!  You're right, I leapt to the wrong conclusion by testing a single
explicit letter (and reading the documentation :-).  Thanks for setting
me straight.

- Mark