PEREIRA@SRI-AI.ARPA (06/05/83)
Although the author of the USENET article burdvax.781 is correct in theory about all the tricky factors that enter into any LIPS evaluation, in practice (8 years of it) the small set of benchmarks in David Warren's "Implementing Prolog" technical reports (available from the Dept. of Artificial Intelligence of Edinburgh University) seems to provide a good estimate of the performance of a Prolog system for a large variety of tasks.

The figures of 40000 LIPS for DEC-20 Prolog and 1500 LIPS for C-Prolog on a VAX 780 come from timing "naive reverse", defined as follows:

    nrev([],[]).
    nrev([X|L],R) :- nrev(L,R0), conc(R0,[X],R).

    conc([],L,L).
    conc([X|L1],L2,[X|L3]) :- conc(L1,L2,L3).

Although this is very simple, it tests the shallow backtracking and structure crunching that are characteristic of complex Prolog programs such as compilers, natural-language parsers, term rewrite systems (e.g. algebraic simplifiers and equation solvers), theorem provers, etc. The program doesn't test deep backtracking and cut, but these seem to be comparatively fast in those systems that are good enough to be worth the trouble of measuring.

To get accurate figures for this kind of benchmark, we use a test program which calls nrev once to allocate space, page in, etc., then fails and goes into a failure-driven loop that calls nrev a number of times.

For the benefit of those who don't have access to the "Implementing Prolog" papers, I intend to submit the collection of benchmarks and results for DEC-20 Prolog on a 2060 and C-Prolog on a VAX 780 in the near future. I hope this will provide a "de facto" standard for the evaluation of Prolog systems.

Fernando Pereira
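
The timing procedure described above (one warm-up call, then a failure-driven loop) might be sketched roughly as follows. This is an illustration only, not the actual test program from the "Implementing Prolog" reports. The predicates bench/3, warm_up/1, run_loop/2, range/2 and repeat_n/1 are hypothetical names introduced here. The sketch assumes that statistics(runtime, [T|_]) reports CPU time in milliseconds and that // is integer division, as in many current Prolog systems. It also assumes the usual convention of counting one logical inference per procedure call, so that reversing a list of length N costs N+1 calls to nrev plus N(N+1)/2 calls to conc, i.e. (N+1)(N+2)/2 inferences in all.

```prolog
% Naive reverse, as defined in the post.
nrev([], []).
nrev([X|L], R) :- nrev(L, R0), conc(R0, [X], R).

conc([], L, L).
conc([X|L1], L2, [X|L3]) :- conc(L1, L2, L3).

% range(N, L): L is the list [1,...,N]. Hypothetical helper.
range(N, L) :- range(1, N, L).

range(I, N, []) :- I > N.
range(I, N, [I|T]) :- I =< N, I1 is I + 1, range(I1, N, T).

% repeat_n(N): succeeds exactly N times on backtracking. Hypothetical helper.
repeat_n(N) :- N > 0.
repeat_n(N) :- N > 1, N1 is N - 1, repeat_n(N1).

% warm_up(List): call nrev once so that space is allocated and
% code is paged in, then fail to discard the result.
warm_up(List) :- nrev(List, _), fail.
warm_up(_).

% run_loop(Iters, List): failure-driven loop calling nrev Iters times.
run_loop(Iters, List) :- repeat_n(Iters), nrev(List, _), fail.
run_loop(_, _).

% bench(Len, Iters, Lips): time Iters reversals of a Len-element list.
% Assumption: statistics/2 with key runtime gives milliseconds.
% Loop overhead is not subtracted, so the figure is a slight underestimate.
% Fails if the measured time is 0 ms; increase Iters in that case.
bench(Len, Iters, Lips) :-
    range(Len, List),
    warm_up(List),
    statistics(runtime, [T0|_]),
    run_loop(Iters, List),
    statistics(runtime, [T1|_]),
    Millis is T1 - T0,
    Millis > 0,
    Inferences is Iters * ((Len + 1) * (Len + 2) // 2),
    Lips is Inferences * 1000 / Millis.
```

A query such as bench(30, 10000, Lips) would then time 10000 reversals of a 30-element list, at 496 inferences per reversal under the counting convention above. The list length and iteration count here are arbitrary choices, not values taken from the reports.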