[comp.lang.perl] can we ever compile perl?

tchrist@convex.COM (Tom Christiansen) (12/11/90)

In article <9592:Dec920:40:5190@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>1. Compile some large subset of the language to portable C code.

We usually say "well, but not evals of course."  I've a suspicion
that this rules out a lot of code.  For example, a user mailed
me recently with a problem that had a quick eval answer, and I'm
thinking that saying "no evals in compiled code" excludes a
sizable chunk of the language.  Here's the problem:

    I'm still working with the problem I was attempting to describe to you
    last night.  It involves a simple search and replace, but the
    delimiting of the search string will vary, and I would like the new
    string to maintain the same variable delimiters.  The strings can be
    internally delimited by a combination of underscores, spaces and
    newlines, and externally by newlines and commas.  I want to replace
    with the same delimiters.  For example:

    search string:  my_search_string        new string:     this_is_it

    may be matched by:                      replace should be:
    ------------------                      ------------------
    my search string                        this is it

    my_search_string                        this_is_it

    my_search                               this_is
    string                                  it

    my                                      this
    search string                           is it

    and so on.

    I know there must be some straightforward way to do it, but so far
    I have not figured it out.  I've got the general one-word case and a
    fixed number of words, but not a variable-number solution.


The code he was trying to use was this:

    ###########################################################################
    #!/usr/bin/perl
    #
    # gl - global replace for variable format strings
    $#ARGV == 3 || die "Invalid no. of arguments";
    ($infile, $outfile, $oldexp, $newexp) = @ARGV;
    @old = split(/[ _]/,$oldexp);
    @new = split(/[ _]/,$newexp);
    open(in,"$infile") || die "Can't open $infile: $!";
    open(out,">$outfile") || die "Can't open $outfile: $!";
    $foo = <in>;
    while ( <in> ) {
	    $foo .= $_;
	    }
    #First pass, single line to single line
    # new expression may contain underscores
    if (!$#old) {
	    # The following searches for the label making sure it begins
	    # and ends with a space, comma or newline and replaces the
	    # label and whatever separators it found around it.

	    $foo =~ s/(\d\n|,\n|,)([ ]*)$oldexp([ ]*)(,|\n,|\n\d)/\1\2$newexp\3\4/g;
	    print "Finished, output in $outfile.\n";
	    }
    # Multi-line to multi line, equal size
    # Need to parameterize for any size
    if ($#old) {
	    $test = $foo;
        $foo =~ s/(\d\n|,\n|,)([ ]*)$old[0]([ _\n])$old[1]([ _\n])$old[2]([ ]*)(,|\n,|\n\d)/\1\2$new[0]\3$new[1]\4$new[2]\5\6/g;
	    print "Finished 2, output in $outfile.\n";
	    }
    print out $foo;
    ###########################################################################


I found that pretty convoluted.  My solution was this:

    #!/usr/local/bin/perl
    # sanity checks first
    die "usage: $0 string1 string2 [files ...]" if @ARGV < 2;
    die "unbalanced underbars"
        unless ($count = $ARGV[0] =~ tr/_/_/) == ($ARGV[1] =~ tr/_/_/);
    die "too many underbars" unless $count < 10;

    ($find = shift) =~ s/[\s_]/([\\s_]+)/g;
    ($repl = shift) =~ s/[\s_]/'$'.++$i/eg;
    print STDERR "replacing all ``$find'' with ``$repl''\n";
    undef $/;
    $_ = <>;
    eval "s/$find/$repl/g";
    print;
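
To see what those two substitutions actually build, here's a quick trace
with the sample strings wired in (just an illustration -- the real script
takes them from @ARGV):

    #!/usr/local/bin/perl
    # trace of what $find, $repl, and the eval'd statement look like
    $i = 0;
    ($find = "my_search_string") =~ s/[\s_]/([\\s_]+)/g;
    ($repl = "this_is_it")       =~ s/[\s_]/'$'.++$i/eg;
    print "find: $find\n";       # find: my([\s_]+)search([\s_]+)string
    print "repl: $repl\n";       # repl: this$1is$2it
    # so the string handed to eval is effectively
    #     s/my([\s_]+)search([\s_]+)string/this$1is$2it/g;
    # and $1 and $2 put back whatever delimiters were actually matched.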


Notice that I've used not one but two evals in this little program.
Of course, this is too short to be worth compiling (unless someone
has motivations other than speed for compiling), but I
think it illustrates the problem: evals are just too darn convenient.
I don't really want to think about how I might do that if I couldn't
have an eval, but I don't know how to compile it with one either.

--tom
--
Tom Christiansen		tchrist@convex.com	convex!tchrist
"With a kernel dive, all things are possible, but it sure makes it hard
 to look at yourself in the mirror the next morning."  -me

pphillip@cs.ubc.ca (Peter Phillips) (12/12/90)

In article <110306@convex.convex.com> tchrist@convex.COM (Tom Christiansen) writes:
>In article <9592:Dec920:40:5190@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>>1. Compile some large subset of the language to portable C code.
>
>We usually say "well, but not evals of course."  I've a suspicion
>that this rules out a lot of code.  For example, a user mailed
>me recently with a problem that had a quick eval answer, and I'm
>thinking that saying "no evals in compiled code" excludes a
>sizable chunk of the language.  Here's the problem:

[ string replacing problem omitted ]

>Notice that I've used not one but two evals in this little program.
>Of course, this is too short to be worth compiling (unless someone
>has motivations other than speed for compiling), but I
>think it illustrates the problem: evals are just too darn convenient.
>I don't really want to think about how I might do that if I couldn't
>have an eval, but I don't know how to compile it with one either.

For some perl scripts, eval is indispensable.  The debugger wouldn't
work without it.  For other scripts, eval can be replaced by less
powerful operations.  Eval is often used to get at the regular
expression compiler built into perl.  If perl had a regular expression
variable and a regular expression compile function, code fragments
like:

    eval "s/$find/$repl/g";

could be replaced with a translatable-to-C version:

    $pat1 = &compile_pattern($find);
    $pat2 = &compile_pattern($repl);
    s/$pat1/$repl/g;

Something like this could be added to perl, I think.
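
For this particular case you can also dodge the eval with plain perl, by
capturing the delimiters and pasting the replacement together by hand
instead of eval'ing a string full of $1's.  A rough sketch only -- it
hard-wires the two-delimiter sample strings and replaces one occurrence
per pass instead of using /g:

    #!/usr/bin/perl
    # eval-free variant: build each replacement from the captured delimiters
    @new  = split(/[\s_]+/, "this_is_it");
    $find = 'my([\s_]+)search([\s_]+)string';
    $_    = "change my search\nstring here, and my_search_string too\n";
    while (/$find/) {
        @delim = ($1, $2);
        $repl  = $new[0];
        foreach $i (1 .. $#new) {
            $repl .= $delim[$i - 1] . $new[$i];
        }
        s/$find/$repl/;          # replace just this occurrence
    }
    print;                       # "change this is\nit here, and this_is_it too\n"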

There are other common uses for eval, like simulating references.
I think with the right modifications, most uses of eval could be
eliminated.  Perhaps the greatest and wisest perl hackers should
get together, examine their scripts which use eval, and decide
what reasonable extensions to perl would eliminate 90% of the
use for eval.

--
Peter Phillips, pphillip@cs.ubc.ca | "It's worse than that ... He has
{alberta,uunet}!ubc-cs!pphillip    | no brain." -- McCoy, "Spock's Brain"

tchrist@convex.COM (Tom Christiansen) (12/12/90)

In article <1990Dec12.064530.22356@cs.ubc.ca> pphillip@cs.ubc.ca (Peter Phillips) writes:

:There are other common uses for eval, like simulating references.

:I think with the right modifications, most uses of eval could be
:eliminated.  Perhaps the greatest and wisest perl hackers should
:get together, examine their scripts which use eval, and decide
:what reasonable extensions to perl would eliminate 90% of the
:use for eval.

Yes, although I think in many cases you can use the *foo notation,
and it will be faster, too.  I hope that wouldn't be barred as 
well, as it's far too useful.

Two other reasons for using eval are for dynamic formats and
for the creatures that h2ph creates, although as I show in h2pl,
these can often be reduced.

Plus don't forget that s///e counts as an eval also.
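
For anyone who has glossed over that one, a made-up one-liner: the
replacement side of s///e is perl code, not text --

    $_ = "total: 21";
    s/(\d+)/$1 * 2/e;            # the replacement is evaluated as an expression
    print "$_\n";                # prints: total: 42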

Let's keep a list here.  I also suspect that there'll be a fair number
of perl hackers at USENIX next month.  More at the end than at the
beginning if I have anything to do with it. :-)

--tom
--
Tom Christiansen		tchrist@convex.com	convex!tchrist
"With a kernel dive, all things are possible, but it sure makes it hard
 to look at yourself in the mirror the next morning."  -me

weisberg@hpcc01.HP.COM (Len Weisberg) (12/13/90)

Peter Phillips writes:
> For some perl scripts, eval is indispensable.  The debugger wouldn't
> work without it.  For other scripts, eval can be replaced by less
> powerful operations.  Eval is often used to get at the regular
> expression compiler built into perl. 
>    ... <some supporting details omitted> ...
> There are other common uses for eval, like simulating references.
> I think with the right modifications, most uses of eval could be
> eliminated.  Perhaps the greatest and wisest perl hackers should
> get together, examine their scripts which use eval, and decide
> what reasonable extensions to perl would eliminate 90% of the
> use for eval.

Hear, hear!!   My opinion exactly!!
Sorry for taking up bandwidth with this, but Peter has said it so well,
I just wanted to underline it.
I think the development outlined here would be a tremendous boost to
the usability and the use of perl.

- Len Weisberg - HP Corp Computing & Services - weisberg@corp.HP.COM

pvo@sapphire.OCE.ORST.EDU (Paul O'Neill) (12/13/90)

In article <110306@convex.convex.com> tchrist@convex.COM (Tom Christiansen) writes:
>    ..............
>    eval "s/$find/$repl/g";
>    ................
>

Gee, I've always glossed over this eval stuff.  Now that I'm paying attention
I'm befuddled.  Why is the eval needed, Tom?

Why does the substitution only half work without the eval?  The $find is
parsed and found, but the $repl gets shoved in literally.  I just hate it
when I don't have a model that will predict code's behavior and have to
"just try it" to see what it does.

>Notice that I've used not one but two evals in this little program.

Boy, I am dense.  Where's the other one?

Thanks.


Paul O'Neill                 pvo@oce.orst.edu		DoD 000006
Coastal Imaging Lab
OSU--Oceanography
Corvallis, OR  97331         503-737-3251

tneff@bfmny0.BFM.COM (Tom Neff) (12/13/90)

I won't be at USENIX but here are my thoughts on compiled Perl:

 1. Even with limited functionality it would be a godsend.

 2. For many of us, it would be enough to be able to make fast-loadable
    "Perl object files," i.e., write all data structures to disk after
    compilation & before execution.  The resulting "compiled scripts"
    would run faster because the parsing pass would be eliminated.
    Especially wonderful with large scripts!

 3. A lot of the really troublesome 'eval' examples are hacks for the
    purpose of coaxing a little faster performance out of the interpreter.
    Presumably in exchange for the inherent speed of a compiled script
    you could give some of that up.

 4. If the Perl 'eval' compiler were put into a shared library, compiled
    scripts could run and have access to a single, reentrant copy of the
    evaluator if they need it.  Scripts themselves could stay small.

-- 
Anthrax Rampant in Kirghizia:    Oo*oO      Tom Neff
  Izvestia Comment -- TASS      * *O* *     tneff@bfmny0.BFM.COM

tchrist@convex.COM (Tom Christiansen) (12/14/90)

From the keyboard of pvo@sapphire.OCE.ORST.EDU (Paul O'Neill):
:In article <110306@convex.convex.com> tchrist@convex.COM (Tom Christiansen) writes:
:>    eval "s/$find/$repl/g";
:Gee, I've always glossed over this eval stuff.  Now that I'm paying attention
:I'm befuddled.  Why is the eval needed, Tom?
:Why does the substitution 1/2 work w/o the eval?  The $find is parsed and
:found but the $repl gets shoved in literally.  I just hate it when I don't
:have a model that will predict code's behavior and have to "just try it" to
:see what it does.

Because perl only does one level of evaluation.  If you want
more, you have to ask for it.  There are $1 and $2 references
inside $repl, and they don't get expanded unless the whole
substitution goes back through eval.
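
Here it is in miniature, a four-line illustration with the strings
baked in:

    $_    = "my search_string";
    $find = 'my([\s_]+)search([\s_]+)string';
    $repl = 'this$1is$2it';
    s/$find/$repl/g;             # no eval: $repl is interpolated just once
    print "$_\n";                # prints: this$1is$2it
    # with eval "s/$find/$repl/g" you get "this is_it" instead,
    # because the $1 and $2 then get a second pass.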

:>Notice that I've used not one but two evals in this little program.
:Boy, I am dense.  Where's the other one?

It's hidden in the substitute that creates $repl:

    ($repl = shift) =~ s/[\s_]/'$'.++$i/eg;

--tom
--
Tom Christiansen		tchrist@convex.com	convex!tchrist
"With a kernel dive, all things are possible, but it sure makes it hard
 to look at yourself in the mirror the next morning."  -me

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (12/14/90)

In article <110306@convex.convex.com> tchrist@convex.COM (Tom Christiansen) writes:
> In article <9592:Dec920:40:5190@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> >1. Compile some large subset of the language to portable C code.
> We usually say "well, but not evals of course."

Even without evals this would make Perl a lot more useful. Of course,
half the advantage disappears if the Perl-in-C library isn't freely
redistributable---but at least that, unlike the entire language, can be
rewritten in pieces. The other half of the advantage stays in any case:
no parsing time, single executable, easy hand optimization, easy use of
fast calculation.

And there's no reason an eval can't be compiled. ``It's too much work to
stick the compiler into the library!'' you say. Well, most evals in
practice are just fixed operations applied to variable string arguments.
There's no reason your example couldn't be compiled into fixed code---
the only parsing left after compilation would be the regexp parsing.

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (12/14/90)

In article <1990Dec12.064530.22356@cs.ubc.ca> pphillip@cs.ubc.ca (Peter Phillips) writes:
> For some perl scripts, eval is indispensable.  The debugger wouldn't
> work without it.

I imagine that the debugger would remain one of the advantages of the
interpreted language.

> Perhaps the greatest and wisest perl hackers should
> get together, examine their scripts which use eval, and decide
> what reasonable extensions to perl would eliminate 90% of the
> use for eval.

This is a good idea for any language.

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (12/14/90)

In article <93725765@bfmny0.BFM.COM> tneff@bfmny0.BFM.COM (Tom Neff) writes:
>  2. For many of us, it would be enough to be able to make fast-loadable
>     "Perl object files," i.e., write all data structures to disk after
>     compilation & before execution.

Supposedly perl -u does that, but it doesn't work on many systems. As an
alternative I might suggest that you try to work my pmckpt checkpointer
into Perl. pmckpt 0.95 (which I just made available for anonymous ftp
from stealth.acf.nyu.edu) has been reported to work on (gasp) System V
machines, as well as my native environment. Both Larry and Tom seemed
slightly interested in the code a few weeks ago, but appear to have
abandoned it (sigh).

The reason pmckpt is so portable, btw, is that it doesn't use setjmp()
or longjmp(). Guess what it uses instead...

---Dan

tneff@bfmny0.BFM.COM (Tom Neff) (12/14/90)

In article <15591:Dec1323:30:2490@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>In article <93725765@bfmny0.BFM.COM> tneff@bfmny0.BFM.COM (Tom Neff) writes:
>>  2. For many of us, it would be enough to be able to make fast-loadable
>>     "Perl object files," i.e., write all data structures to disk after
>>     compilation & before execution.
>
>Supposedly perl -u does that, but it doesn't work on many systems. 

Perl -u is supposed to undump your core image to create a SELF CONTAINED
executable program.  Where this does work, the result is HUGE, bigger
than Perl itself (by definition).  What I want is to store JUST the
compiled script data, suitable for immediate interpretation by the
regular Perl program.  The results should be quite small, and you save
the parsing pass later on.

I think 'checkpointing' would be a good way to go if the results stored
compactly... haven't seen Dan's invention yet, maybe that qualifies.

-- 
"We plan absentee ownership.  I'll stick to       `o'   Tom Neff
 building ships." -- George Steinbrenner, 1973    o"o   tneff@bfmny0.BFM.COM

gee@client2.DRETOR.UUCP (Thomas Gee ) (12/15/90)

In article <1990Dec12.064530.22356@cs.ubc.ca> pphillip@cs.ubc.ca (Peter Phillips) writes:
>In article <110306@convex.convex.com> tchrist@convex.COM (Tom Christiansen) writes:
>>In article <9592:Dec920:40:5190@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>>>1. Compile some large subset of the language to portable C code.
>
>For some perl scripts, eval is indispensable.

A related point on perl compilation.  If I understand correctly, perl
"compiles" the input code into an internal representation and then
interprets the result.  This causes a significant pause at invocation
before the program (i.e., the perl script) begins executing.

Would it be possible to save the internal representation to which the script
is translated and feed that directly into the interpreter?  I have at least
one system that uses a "vast" number of perl scripts which execute in 
sequence, and the overhead for the initial translation is noticeable and
non-trivial.

I believe this suggestion did come up in the last "where's my perl compiler"
flood, but was never addressed.

Thanks,
	Tom.


-------------------------------------------------------------------------------
Thomas Gee       |
Aerospace Group  | a man in search of a quote
DCIEM, DND       |
Canada           | gee@dretor.dciem.dnd.ca
-------------------------------------------------------------------------------

allbery@NCoast.ORG (Brandon S. Allbery KB8JRR) (12/16/90)

As quoted from <93725765@bfmny0.BFM.COM> by tneff@bfmny0.BFM.COM (Tom Neff):
+---------------
|  2. For many of us, it would be enough to be able to make fast-loadable
|     "Perl object files," i.e., write all data structures to disk after
|     compilation & before execution.  The resulting "compiled scripts"
|     would run faster because the parsing pass would be eliminated.
|     Especially wonderful with large scripts!
+---------------

I mentioned this to Larry once; he pointed out that Perl's internal structures
aren't particularly easy to save/restore in a portable way.  Of course, it
might be possible to write(savefd, etext, sbrk(0) - etext), but this is also
nonportable.

+---------------
|  4. If the Perl 'eval' compiler were put into a shared library, compiled
|     scripts could run and have access to a single, reentrant copy of the
|     evaluator if they need it.  Scripts themselves could stay small.
+---------------

...and shared libraries are another nonportable feature.  Not to mention that
I have yet to make any sense out of the SVR3 version.  (Of course, that may
simply be *my* problem, not a problem with the shared library implementation.)

I may look into compiling a *subset* of Perl.  It wouldn't accept everything,
and it might not treat everything the same as the interpreter does (i.e. "do"
would be treated as an include request... although most uses of this are now
subsumed by "require"), but the speed increase would probably be worth the
loss in functionality, as you say.  Of course, I need to find time to do this
(grrr!).

++Brandon
-- 
Me: Brandon S. Allbery			    VHF/UHF: KB8JRR on 220, 2m, 440
Internet: allbery@NCoast.ORG		    Packet: KB8JRR @ WA8BXN
America OnLine: KB8JRR			    AMPR: KB8JRR.AmPR.ORG [44.70.4.88]
uunet!usenet.ins.cwru.edu!ncoast!allbery    Delphi: ALLBERY

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (12/17/90)

In article <1990Dec15.161911.27401@NCoast.ORG> allbery@ncoast.ORG (Brandon S. Allbery KB8JRR) writes:
> I mentioned this to Larry once; he pointed out that Perl's internal structures
> aren't particularly easy to save/restore in a portable way.  Of course, it
> might be possible to write(savefd, etext, sbrk(0) - etext), but this is also
> nonportable.

I wrote pmckpt exactly to prove that a checkpointer *can* be portable.
pmckpt assumes all the basic UNIX process structure. It doesn't make any
allowances for systems that don't conform (except that it automatically
figures out which way your stack grows). Yet people have reported pmckpt
working on several System V variants, as well as BSD. How much more
portable can you get?

---Dan

les@chinet.chi.il.us (Leslie Mikesell) (12/18/90)

In article <1990Dec15.161911.27401@NCoast.ORG> allbery@ncoast.ORG (Brandon S. Allbery KB8JRR) writes:
>As quoted from <93725765@bfmny0.BFM.COM> by tneff@bfmny0.BFM.COM (Tom Neff):
>+---------------
>|  2. For many of us, it would be enough to be able to make fast-loadable
>|     "Perl object files," i.e., write all data structures to disk after
>|     compilation & before execution.  The resulting "compiled scripts"
>|     would run faster because the parsing pass would be eliminated.
>|     Especially wonderful with large scripts!
>+---------------

>I mentioned this to Larry once; he pointed out that Perl's internal structures
>aren't particularly easy to save/restore in a portable way.  Of course, it
>might be possible to write(savefd, etext, sbrk(0) - etext), but this is also
>nonportable.

A reasonable solution is not to require the saved copy to be portable
or even explicitly saved.  Instead, add a statement and/or command-line
option that names a directory in which to cache the parsed output,
allowing the usual expansions of ~/, $HOME, etc., so you can choose
between saving in a public-writable directory and keeping a private
copy per user.  Then, if the directory exists and some quick checks
establish that the cached copy was written later than the script, on a
machine with the same variable types, the parsing pass could be
skipped.  Otherwise a parsed copy would be saved in that directory (if
permissions allow) for later runs to use.  I think this would be a big
help on machines with slow disks and demand-paged executables, since it
would likely avoid paging in the large part of the perl program that is
only needed for the compile pass.  It might chew up some disk space,
but probably nowhere near as much as perl -u does, and this way you
still keep the advantage of shared text when multiple copies of perl
are running.
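
The freshness check itself is cheap; something along these lines would
do (a sketch only -- the script and cache paths are made-up examples,
and a real version would also record the perl version and variable
sizes):

    #!/usr/bin/perl
    # sketch of the "is the cached parse still usable?" test
    $script   = "/usr/local/bin/report.pl";        # made-up script path
    $cachedir = "$ENV{'HOME'}/.perlcache";         # made-up cache directory
    $cache    = "$cachedir/report.pl";
    if (-e $cache && -M $cache < -M $script) {
        print "cache is newer than the script: skip the parsing pass\n";
    } else {
        print "cache missing or stale: reparse and rewrite $cache\n";
    }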

Les Mikesell
  les@chinet.chi.il.us

tneff@bfmny0.BFM.COM (Tom Neff) (12/18/90)

In article <1990Dec15.161911.27401@NCoast.ORG> allbery@ncoast.ORG (Brandon S. Allbery KB8JRR) writes:
>As quoted from <93725765@bfmny0.BFM.COM> by tneff@bfmny0.BFM.COM (Tom Neff):
>|  2. For many of us, it would be enough to be able to make fast-loadable
>|     "Perl object files," i.e., write all data structures to disk after
>|     compilation & before execution.  The resulting "compiled scripts"
>|     would run faster because the parsing pass would be eliminated.
>|     Especially wonderful with large scripts!
>
>I mentioned this to Larry once; he pointed out that Perl's internal structures
>aren't particularly easy to save/restore in a portable way.  Of course, it
                                               ^^^^^^^^
>might be possible to write(savefd, etext, sbrk(0) - etext), but this is also
>nonportable.

Is portability the issue here?  This would be a proposed speed optimization
for individual sites.  Precompiled scripts would not be inherently
portable across disparate OS's or machine architectures; but neither are
today's UNDUMP executables!  Also, precompiled scripts might not be
portable across major Perl versions even on the same platform; but it
would be fairly straightforward to record the version number at the
beginning of the precompiled script file, so that Perl could check for
incompatibilities before beginning execution.


-- 
"DO NOT, repeat, DO NOT blow the hatch!"  /)\   Tom Neff
"Roger....hatch blown!"                   \(/   tneff@bfmny0.BFM.COM

allbery@NCoast.ORG (Brandon S. Allbery KB8JRR) (12/20/90)

As quoted from <12432668@bfmny0.BFM.COM> by tneff@bfmny0.BFM.COM (Tom Neff):
+---------------
| In article <1990Dec15.161911.27401@NCoast.ORG> allbery@ncoast.ORG (Brandon S. Allbery KB8JRR) writes:
| >aren't particularly easy to save/restore in a portable way.  Of course, it
|                                                ^^^^^^^^
+---------------

"Portable" may not be the word.  I have used systems where this will fail
because a different execution of a program has a few things at different
addresses, so just restoring the data and bss from a file leaves pointers
dangling.  (Consider that stdio is already initialized by the time the data
and bss are loaded.)

++Brandon
-- 
Me: Brandon S. Allbery			    VHF/UHF: KB8JRR on 220, 2m, 440
Internet: allbery@NCoast.ORG		    Packet: KB8JRR @ WA8BXN
America OnLine: KB8JRR			    AMPR: KB8JRR.AmPR.ORG [44.70.4.88]
uunet!usenet.ins.cwru.edu!ncoast!allbery    Delphi: ALLBERY

flee@cs.psu.edu (Felix Lee) (12/21/90)

Everyone seems to be giving up too easily.  I'm nearly convinced that
Perl can be effectively compiled.  I've decided to attempt a Perl to
Scheme compiler in my copious spare time (tm).  Don't hold your
breath.
--
Felix Lee	flee@cs.psu.edu