[comp.lang.perl] Question about split

victor@watson.ibm.com (Victor Miller) (05/29/91)

I'm a little confused about perl's behavior in split.  If you run (on
4.003) the code below

#!/usr/local/bin/perl

#Test how certain patterns split
sub test {
    ($a) = @_;
    @a = split(/:/,$a);
    print "split('$a')=(",join(',',@a),") count=",scalar(@a),"\n";
}

&test('a:b:c: ');
&test('a:b:c:');
&test('a:b:c');
#end of program


You get the results:

split('a:b:c: ')=(a,b,c, ) count=4
split('a:b:c:')=(a,b,c) count=3
split('a:b:c')=(a,b,c) count=3

Which I find a little counter-intuitive: I thought that perl should
distinguish between the second and third cases.  I would have thought
that the output of the second case should have been

split('a:b:c:')=(a,b,c,) count=4

Why is it done the way that it is?
--
			Victor S. Miller
			Vnet and Bitnet:  VICTOR at WATSON
			Internet: victor@watson.ibm.com
			IBM, TJ Watson Research Center

rahardj@ccu.umanitoba.ca (Budi Rahardjo) (05/29/91)

victor@watson.ibm.com writes:
: I'm a little confused about perl's behavior in split.  If you run (on
: 4.003) the code below

: #!/usr/local/bin/perl
: #Test how certain patterns split
: sub test {
:     ($a) = @_;
:     @a = split(/:/,$a);
:     print "split('$a')=(",join(',',@a),") count=",scalar(@a),"\n";
: }
: &test('a:b:c: ');
: &test('a:b:c:');
: &test('a:b:c');
: #end of program

Replace    @a = split(/:/,$a);
with       @a = split(/:/,$a,10);

You will get
split('a:b:c: ')=(a,b,c, ) count=4
split('a:b:c:')=(a,b,c,) count=4
split('a:b:c')=(a,b,c) count=3

: Why is it done the way that it is?

I don't know ...
Here is a problem that I have (similar to yours):

#!/usr/local/bin/perl
$pat2='a:b::::::::::::::x::::::::::::::::::::::::::::::::::';
# contains 50 :
@res = split(/:/,$pat2);
print "splitted (" . scalar(@res) . ")\n";
@res = split(/:/,$pat2,100);
print "splitted (" . scalar(@res) . ")\n";
# end of program

The second split (the one with '100') produces the result that
I wanted (an array of 50 entries)...
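
Picking a limit such as 100 works only while the record has fewer fields
than that.  In current Perl a negative limit is treated as arbitrarily
large, so every field is kept; whether 4.003 accepts one is not
established in this thread.  A minimal sketch, assuming a current Perl
(the $record value is a made-up example):

```perl
use strict;
use warnings;

# Hypothetical record: two leading fields, some empty ones, an 'x',
# then three empty trailing fields.
my $record = 'a:b::::x:::';

my @default = split(/:/, $record);      # no limit: trailing empty fields dropped
my @all     = split(/:/, $record, -1);  # negative limit: every field kept

print scalar(@default), "\n";   # prints 6
print scalar(@all), "\n";       # prints 9
```

Empty fields in the middle of the record survive in both cases; only the
trailing ones differ.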

-- budi

lwall@jpl-devvax.jpl.nasa.gov (Larry Wall) (06/01/91)

In article <VICTOR.91May28141138@irt.watson.ibm.com> victor@watson.ibm.com writes:
: I'm a little confused about perl's behavior in split.  If you run (on
: 4.003) the code below
: 
: #!/usr/local/bin/perl
: 
: #Test how certain patterns split
: sub test {
:     ($a) = @_;
:     @a = split(/:/,$a);
:     print "split('$a')=(",join(',',@a),") count=",scalar(@a),"\n";
: }
: 
: &test('a:b:c: ');
: &test('a:b:c:');
: &test('a:b:c');
: #end of program
: 
: 
: You get the results:
: 
: split('a:b:c: ')=(a,b,c, ) count=4
: split('a:b:c:')=(a,b,c) count=3
: split('a:b:c')=(a,b,c) count=3
: 
: Which I find a little counter-intuitive: I thought that perl should
: distinguish between the second and third cases.  I would have thought
: that the output of the second case should have been
: 
: split('a:b:c:')=(a,b,c,) count=4

A careful reading of the documentation for split will point out the fact
that null trailing fields are stripped if no limit is specified.
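
The rule can be checked directly.  A minimal sketch, written for a
current Perl rather than 4.003; count_fields is a hypothetical helper
introduced only for this illustration:

```perl
use strict;
use warnings;

# count_fields is a hypothetical helper, not part of Perl.
# It returns the number of fields split produces for $str,
# passing a limit to split only when one is supplied.
sub count_fields {
    my ($str, $limit) = @_;
    my @fields = defined $limit ? split(/:/, $str, $limit)
                                : split(/:/, $str);
    return scalar @fields;
}

print count_fields('a:b:c:'), "\n";      # prints 3: trailing null field stripped
print count_fields('a:b:c:', 10), "\n";  # prints 4: limit given, null field kept
print count_fields('a:b:c'), "\n";       # prints 3: no trailing separator at all
```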

: Why is it done the way that it is?

The primary reason is that it surprises people less frequently.  Especially
when they split on whitespace and there's trailing whitespace, such as
an unstripped newline.  Note that the semantics of individual fields is
much the same, since an undefined field evaluates to the same value as
a null field.  It's only of concern to people counting fields.  And you
can get the other behavior by supplying a limit.

Other than that, no particular reason...  :-)
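
The whitespace case described above can be sketched as follows, assuming
a current Perl; the $line value is a made-up example of input with an
unstripped newline:

```perl
use strict;
use warnings;

my $line = "alpha beta gamma\n";   # hypothetical input, newline not removed

my @stripped = split(/\s+/, $line);      # no limit: trailing null field dropped
my @kept     = split(/\s+/, $line, 10);  # limit given: trailing null field kept

print scalar(@stripped), "\n";   # prints 3
print scalar(@kept), "\n";       # prints 4

# Past the end of @stripped the element is undefined, and it compares
# equal to the null string that @kept holds in the same position.
{
    no warnings 'uninitialized';
    print "same\n" if $stripped[3] eq $kept[3];
}
```

Only the field count differs between the two calls; reading the fourth
field as a string gives the same value either way.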

Larry