E.Ireland@massey.ac.nz (Evan Ireland) (11/26/90)
Hi, I am currently implementing a kind of run-time strictness analysis for a Lazy Functional Abstract Machine, but I have been unable to find any references to similar work. My motivation for this is the apparent inefficiency caused by the use of higher-order functions in lazy languages such as Haskell. I have found papers describing techniques for strictness analysis by abstract interpretation in the presence of higher-order functions; however, these are compile-time techniques, and they fail to address certain efficiency problems. I would appreciate it if anyone could send me references or information on this topic.

As an example of the efficiency problems I am referring to, consider

    foldr :: (a -> b -> b) -> b -> [a] -> b
    foldr f z []     = z
    foldr f z (x:xs) = f x (foldr f z xs)

When compiling this, we must suspend the recursive application of foldr, since it may not terminate, and f may or may not be strict in its second argument, depending on the actual f that is provided at run-time.

We could compile two versions of foldr, one `conservative' and one `nice'. Then at compile time, for some calls to foldr, we may know that it is safe to call the `nice' version to avoid unnecessary suspensions, i.e. we know we have not inadvertently introduced non-termination. However, this approach may lead to excessive code generation, especially with functions taking several functions as arguments.

Another problem with compile-time strictness analysis, especially for languages with Haskell-like module systems, is that strictness information has no place in a module interface (except perhaps as annotations). Assume that module B imports module A, and that several implementations of module A are allowed (perhaps with differing strictness properties for some functions). Module B should only need recompilation if the interface for module A is changed, and not if we happen to change module A's implementation.
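The two-version idea above can be sketched directly in Haskell. The names foldrLazy and foldrStrict are illustrative, not from any real compiler: the `conservative' version is ordinary foldr, while the hypothetical `nice' version forces the recursive result before calling f, which is only safe when f is known to be strict in its second argument (and the list is finite).

```haskell
-- `Conservative' version: the recursive call is suspended, as required
-- when nothing is known about f.
foldrLazy :: (a -> b -> b) -> b -> [a] -> b
foldrLazy f z []     = z
foldrLazy f z (x:xs) = f x (foldrLazy f z xs)

-- Hypothetical `nice' version: evaluate the recursive result eagerly.
-- Safe only when f is strict in its second argument, since forcing r
-- then cannot introduce non-termination that f would not hit anyway.
foldrStrict :: (a -> b -> b) -> b -> [a] -> b
foldrStrict f z []     = z
foldrStrict f z (x:xs) = let r = foldrStrict f z xs in r `seq` f x r
```

For a strict f such as (+), both versions give the same result; the strict one simply avoids building a chain of suspensions for the accumulating computation.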
This constraint, if enforced, would limit the usefulness of compile-time strictness analysis to intra-module (and standard library) optimisations.

My suggested run-time approach is to have each closure (function + environment for free variables) carry strictness information, and at run-time either evaluate or suspend certain expressions depending on the strictness properties of the function they happen to be an argument of. Consider

    f x = g (h x) (x + 1)

Even if g is not strict in its second argument, we needn't delay the (+) call when x is already evaluated. I would imagine that few current implementations, if any, have optimisations of this sort.

*********************************************************************

A question for Haskell and other lazy f.l. implementors: what do you do about functions like foldl and foldr in the standard library? And are they treated any differently than if they were user-defined functions?

Evan Ireland (E.Ireland@massey.ac.nz),
School of Information Sciences, Massey University, Palmerston North, NZ.
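The run-time scheme proposed in the post above (closures carrying strictness information, consulted at application time) could be modelled with a small sketch. Everything here is my own toy model, not an actual abstract-machine interface: Value and Thunk stand in for the machine's heap objects, and apply consults one strictness flag per argument to decide whether to force the argument or pass the suspension through.

```haskell
-- Toy model of closures that carry per-argument strictness flags.
data Strictness = Strict | NonStrict

-- Machine values: evaluated integers, or suspended computations.
data Value = IntV Int | Thunk (() -> Value)

-- Evaluate a value to weak head normal form.
force :: Value -> Value
force (Thunk t) = force (t ())
force v         = v

-- A closure: one strictness flag per argument, plus the code body.
data Closure = Closure [Strictness] ([Value] -> Value)

-- Application consults the flags: strict positions are evaluated now,
-- non-strict positions keep their suspensions.
apply :: Closure -> [Value] -> Value
apply (Closure flags f) args = f (zipWith adjust flags args)
  where
    adjust Strict    v = force v
    adjust NonStrict v = v

-- Example: (+) is strict in both arguments, so a thunk passed to it
-- is evaluated at the call rather than suspended.
plusC :: Closure
plusC = Closure [Strict, Strict]
                (\[IntV a, IntV b] -> IntV (a + b))
```

In a real machine the flags would be filled in partly at compile time and partly at run time; the point of the model is only that the decision to evaluate or suspend is made per call, from information stored in the closure.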
peterson-john@cs.yale.edu (John C. Peterson) (11/28/90)
Evan Ireland (E.Ireland@massey.ac.nz) asks `lazy f.l. implementors' (parse that as you wish) about compile-time and run-time strictness issues and separate compilation. Here's what is going on at Yale right now.

First, let me comment that laziness is probably the primary performance obstacle we have to deal with, so this is certainly a problem that merits serious study. We have not had a chance to explore this problem very far, but we do have a few results.

Higher-order functions present the greatest problems; it is impossible to do much to a function like foldr due to separate compilation. Since the argument to foldr is completely unknown, the compiler must plan for the worst when generating the code for foldr. Our only solution (at the moment) is to aggressively inline calls to foldr, thus removing the higher-order function entirely. For small functions like foldl and foldr this is an acceptable solution.

We currently supply two entry points to every function: a general entry, which is completely curried and uses only non-strict (delayed) arguments, and an optimized entry, which is uncurried and takes strict arguments as recognized by compile-time strictness analysis. Right now we are using a flat domain analysis, so this affects only the top-level delay of a structure. The optimized entry is used only on direct calls with complete argument lists; all other calls (curried or higher-order) use the much less efficient standard entry. This works well for programs without higher-order functions and mainly scalar data objects.

We use our own interface files for separate compilation instead of the official Haskell interface files; these keep all information needed to do this optimisation across module boundaries. It should be remembered that interface files are not an essential component of the language, and implementations are free to deal with separate compilation issues in manners not described in the report.

We also avoid unneeded delays for simple expressions.
Expressions are associated with a complexity measure which gives an upper bound on the time taken to evaluate them. When this is below some threshold, defined by the overhead of creating a delay, the expression will be calculated rather than delayed. This picks up simple expressions like x+1 (assuming that + cannot cause a runtime error and that x is strict). There are also some more unusual cases involving structures, such as

    f [] = []
    f x  = g (tail x)

Here, f is strict in x. Even if g is not strict, it is easier to compute tail x (which yields a delay) than to delay the call to tail.

We feel that using multiple versions of a function is a `good thing' and fills needs beyond strictness problems, such as efficient overloaded functions. Partial evaluation (including evaluation with respect to strictness constraints) can yield versions of a function which are far more efficient than the most general case; the problem is to decide which partially evaluated copies of a function to create. This is a subject of ongoing research in our group. Selecting the best function dynamically based on run-time information is certainly a very interesting area.

John Peterson
peterson-john@cs.yale.edu
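The complexity measure described above can be sketched over a tiny expression language. All of the names and the particular cost numbers here are my own illustration, not Yale's actual heuristic: variables and literals cost nothing, a known-cheap primitive like + costs a little, and an unknown call is treated as unbounded, so only genuinely cheap expressions fall under the delay threshold.

```haskell
-- A toy expression language for the source program being compiled.
data Expr = Var String        -- variable reference
          | Lit Int           -- integer literal
          | Add Expr Expr     -- cheap primitive (assumed total, as in
                              --   the x+1 example above)
          | Call String [Expr] -- call to an unknown function

-- Upper bound on evaluation cost, in arbitrary units.
cost :: Expr -> Int
cost (Var _)    = 0
cost (Lit _)    = 0
cost (Add a b)  = 1 + cost a + cost b
cost (Call _ _) = 1000   -- unknown call: assume the worst

-- Cost of building a delay; illustrative value.
delayCost :: Int
delayCost = 5

-- Evaluate eagerly when that is no more expensive than suspending.
shouldEvaluate :: Expr -> Bool
shouldEvaluate e = cost e <= delayCost
```

Under this measure, x+1 is evaluated on the spot, while g (tail x) with g unknown is still delayed.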
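The inlining and partial-evaluation point can be made concrete with foldr itself. The names below are illustrative only (this is not compiler output): once a call site like foldr (+) 0 is inlined, the higher-order parameter disappears, and the residual first-order function is exactly the kind of specialised copy the partial-evaluation approach would generate.

```haskell
-- The general, higher-order call site: the compiler must assume
-- nothing about the function argument.
sumViaFoldr :: [Int] -> Int
sumViaFoldr = foldr (+) 0

-- What aggressive inlining of foldr at that call site effectively
-- yields: a first-order function in which (+) is visible, so its
-- strictness can be exploited directly.
sumInlined :: [Int] -> Int
sumInlined []     = 0
sumInlined (x:xs) = x + sumInlined xs
```

The open problem noted above is selection, not construction: deciding which call sites deserve such a specialised copy, since doing it everywhere reproduces the code-size blow-up Evan mentions.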