pfeiffer@herve.cs.wisc.edu (Phil Pfeiffer) (09/12/90)
In an earlier posting, Kwang-Keun Yi writes: > [In] "A Flexible Approach to Interprocedural Data flow Analysis > and Programs with Recursive Data Structures" by Neil Jones and Steven > Muchnick [POPL '82], they mentioned in the conclusion section that > > "Two areas of investigation that appear fruitful are to consider how > the computational complexity of the analyzer varies with the choice > of token set and the the feasibility of building an analyzer with > the choice of token set as a parameter." > > Can somebody help me to find papers on the investigations? This is a *big* question. There have been a series of papers since 1982 that amount to *specific* extensions of JM82; some of the authors have addressed complexity in their papers as well. Here's my stab at providing you with references and a (partial, incomplete) survey. Extensions to JM82 typically vary according to the proposed labeling technique and the strategy used to approximate arbitrarily large data structures (I'm using the word "label" in place of the word "token" because I like it better.) Jones and Muchnick, as I understand the paper, suggest labeling every structure x in a store s with a value such as [k] to show that x was created by the statement "at" program point [k]. Another labeling scheme that has been proposed tags x with the value ([k],n) to show that x was the nth object created at [k]. Also, more elaborate labeling schemes have been proposed that take a program's nesting structure into account -- e.g., if s is labeled ([k],m,n), then s was the nth object created during the mth iteration of statment [k]'s enclosing loop. This strategy is mentioned in a footnote in P. Hudak's "A semantic model of reference counting and its abstraction" in _Abstract Interpretation of Declarative Languages_ (pp. 45-62)). A variant of this is mentioned in a forthcoming paper by Francois Bodin (you should be able to get a copy of this from Dennis Gannon, who's also at CSRD) and in a 1989 Sigplan paper by Susan Horwitz, Phil Pfeiffer, and Tom Reps. The HPR89 paper also describes several other strategies for using labels to track a program's execution history. See also the Chase-Wegman-Zadeck paper in SIGPLAN 90, and the papers referenced in its related works section (more on CWZ90 below). Two different approximation mechanisms are discussed in JM82: one for stores, and a second for the stack. Many of the analyses that I've seen, however, simply "flatten" the stack: i.e., assume that a procedure can return to any of the procedures that can call it. More complex approximations to the stack than this are usually avoided because approximations to stores that contain dynamically-allocated objects are already quite expensive to compute. Both of the reports that I know of that *do* construct non-trivial approximations to the stack were "forced into it" by of the language under consideration. The one, Uwe Pleban's 1981 thesis (U.Kansas), describes a preexecution analysis for a dialect of Scheme that supports downward closures. The other, Jan Stransky's 1988 thesis (University de Paris-Sud, Centre d'Orsay, disser- tation in French) describes a preexecution analysis for the (dynamically- scoped language) Lisp. Different authors have given different estimates of the respective computational complexities of their analyses. One of the best references on the complexity of labeled analyses is a paper in SIGPLAN 1990 by Chase, Wegman, and Zadeck. CWZ90 deals specifically with the problem of computing an approximation to the set of stores that a program can generate, in the presence of allocation primitives (e.g., cons) and procedures. The authors also give a clever algorithm that runs much faster than the standard, "simple" store-building algorithm when a program doesn't create "too many edges". CWZ90 also quotes some complexity results from other people's theses that you might find helpful. (Their characterization of Ruggieri's work, however, is not quite right; Ruggieri, in the final chapter of her thesis, discusses implementation techniques (such as union-find) that can be used to improve the running-time of her analysis -- i.e., the one cited in the report). I'd also recommend Hendren's work on dag-like and tree-like stores (there's a reference a paper by her in the Chase-Wegman-Zadeck paper; her thesis is available from Cornell as technical report 90-1114. Hendren's work is actually based on matrix manipulation, but it is equivalent to yet another variation on store graphs if you stare at it, cross-eyed, for a suitably long time). Stransky actually built an analyzer, and experimented with different parameters for his analyzer. Stransky's thesis has some empirical observations on how well the analyzer ran for different labeling techniques and different "orders" of stack approximations. A six-word summation of Stransky's experience with his analyzer: "effective, but very slow and clumsy". This is one of the threads that runs through all of the literature on pointer analysis. The SIGPLAN paper that Susan Horwitz, Tom Reps, and I wrote also describes a technique for proving that labels are being used correctly. (It turns out that the same concern was brought up in Pleban's thesis, Stransky's thesis, and, apparently, in Cousot's thesis). -- Phil Pfeiffer UUCP: ...!{harvard,seismo,topaz,akgua,allegra,usbvax}!cs.wisc.edu!pfeiffer (608) 262-6625 -- Send compilers articles to compilers@esegue.segue.boston.ma.us {ima | spdcc | world}!esegue. Meta-mail to compilers-request@esegue.