RISKS@CSL.SRI.COM (RISKS FORUM, Peter G. Neumann -- Coordinator) (12/01/86)
RISKS-LIST: RISKS-FORUM Digest, Sunday, 30 November 1986  Volume 4 : Issue 21

FORUM ON RISKS TO THE PUBLIC IN COMPUTER SYSTEMS
ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator

Contents:
  Risks of Computer Modeling and Related Subjects (Mike Williams--LONG MESSAGE)

The RISKS Forum is moderated. Contributions should be relevant, sound, in good taste, objective, coherent, concise, nonrepetitious. Diversity is welcome.
(Contributions to RISKS@CSL.SRI.COM, Requests to RISKS-Request@CSL.SRI.COM)
(Back issues Vol i Issue j available in CSL.SRI.COM:<RISKS>RISKS-i.j. MAXj: Summary Contents Vol 1: RISKS-1.46; Vol 2: RISKS-2.57; Vol 3: RISKS-3.92.)

----------------------------------------------------------------------

Date: Fri, 28 Nov 86 13:02 EST
From: "John Michael (Mike) Williams" <JWilliams@DOCKMASTER.ARPA>
To: RISKS@CSL.SRI.COM
Subject: Risks of Computer Modeling and Related Subjects (LONG MESSAGE)

Taking the meretricious "con" out of econometrics and computer modeling:
"Con"juring the Witch of Endor

John Michael Williams, Bethesda MD

Quite a few years ago, the Club of Rome perpetrated its "Limits to Growth" public relations exercise. Although not my field, I instinctively found it bordering on Aquarian numerology to assign a quantity, scalar or otherwise, to "Quality of Life," and a gross abuse of both scientific method and scientific responsibility to the culture at large. Well after the initial report's firestorm, I heard that a researcher at McGill proved the model was not even internally consistent, had serious typographical/syntactical errors that produced at least an order of magnitude error, and that when the errors were corrected, the model actually predicted an improving, not declining "Quality of Life." I called the publisher of "Limits to Growth," into its umpteenth edition, and asked if they intended to publish a correction or retraction.
They were not enthusiastic, what with Jerry Brown, as Governor and candidate for Presidential nomination, providing so much lucrative publicity. Jimmy Carter's "malaise" and other speeches suggest that these dangerously flawed theses also affected, and not for the better, both his campaign and administration. This shaman-esque misuse of computers embarrassed the computing community, but with no observable effect.

On 31 October 1986, Science ran a depressing article entitled: "Asking Impossible Questions About the Economy and Getting Impossible Answers" (Gina Kolata, Research News, Vol. 234, Issue 4776, pp. 545-546). The subtitle and the sidebar insert are informative:

Some economists say that large-scale computer models of the economy are no better at forecasting than economists who simply use their best judgment... "People are overly impressed by answers that come out of a computer"...

Additional pertinent citations (cited with permission):

"There are two things you would be better not seeing in the making--sausages and econometric estimates," says Edward Leamer, an economist at [UCLA]. These estimates are used by policymakers to decide, for example, how the new tax law will affect the economy or what would happen if a new oil import tax were imposed. They are also used by businesses to decide whether there is a demand for a new product. Yet the computer models that generate these estimates, say knowledgeable critics, have so many flaws that, in Leamer's words, it is time to take the "con out of econometrics."

...[E]ven the defenders of the models... [such as e]conomists Kenneth Arrow of Stanford and Stephen McNees of the Federal Reserve Board in Boston say they believe the models can be useful but also say that one reason the models are made and their predictions so avidly purchased is that people want answers to impossible questions and are overly impressed by answers that come out of a computer...

The problem, says statistician David Freedman of the University of California at Berkeley, is that "there is no economic theory that tells you exactly what the equations should look like." Some model builders do not even try to use economic theory...: most end up curve-fitting--a risky business since there are an infinite number of equations that will fit any particular data set...

"What you really have," says William Ascher of Duke University, "is a man-model system." And this system, say the critics, is hardly scientific. Wassily Leontief of New York University remarks, "I'm very much in favor of mathematics, but you can do silly things with mathematics as well as with anything else."

Defenders of the models point out that economists are just making the best of an impossible situation. Their theory is inadequate and it is impossible to write down a set of equations to describe the economy in any event...

But the critics of the models say that none of these defenses makes up for the fact that the models are, as Leontief says, "hot air." Very few of the models predict accurately, the economic theory behind the models is extremely weak if it exists at all, in many cases the data used to build the models are of such poor quality as to be essentially useless, and the model builders, with their subjective adjustments, produce what is, according to Leamer, "an uncertain mixture of data and judgment."

When David Stockman made "subjective adjustments," he was reviled for cooking the numbers. It seems they may have been hash to begin with.

[Douglas Hale, director of quality assurance at the (Federal) Energy Information Administration] whose agency is one of the few that regularly assess models to see how they are doing, reports that, "in many cases, the models are oversold.
The scholarship is very poor, the degree of testing and peer review is far from adequate by any scientific measure, and there is very little you can point to where one piece of work is a building block for the next."

For example, the Energy Information Administration looked at the accuracy of short-term forecasts for the cost of crude oil... At first glance, it looks as if they did not do too badly... But, says Hale, "what we are really interested in is how much does the price change over time. The error in predicting change is 91%."

This is about the same error, to the hour, as that of a stopped clock.

In the Washington Post for 23 November 1986, pg K1 et seq., in an interview entitled "In Defense of Public Choice," Assar Lindbeck, chairman of the Swedish Royal Academy's committee for selecting the Nobel Prize in economics, explains the committee's choice of Professor James M. Buchanan, and is asked by reporter Jane Seaberry:

It seems the economics profession has come into some disrepute. Economists forecast economic growth and forecasts are wrong. The Reagan administration has really downplayed advice from economists. What do you think about the economics profession today?

Chairman Lindbeck replies:

Well, there's something in what you say in the following sense, I think, that in the 1960s, it was a kind of hubris development in the economic profession ... in the sense that it was an overestimation of what research and scientific knowledge can provide about the possibilities of understanding the complex economic system. And also an overestimation about the abilities of economists to give good advice and an overestimation of the abilities of politicians and public administrators to pursue public policy according to that advice. The idea about fine tuning the economy was based on an oversimplified vision of the economy.
So from that point of view, for instance, economists engaged in forecasting--they are, in my opinion, very much overestimating the possibilities of making forecasts because the economic system is too complex to forecast. Buchanan has never been engaged in forecasting. He does not even give policy advice because he thinks it's quite meaningless...

What econometric computer model is not "an oversimplified vision of the economy?" When is forecasting an "economic system ... too complex to forecast" not fortune-telling?

To return to Kolata's article:

[Victor Zarnowitz of the University of Chicago] finds that "when you combine the forecasts from the large models, and take an average, they are no better than the average of forecasts from people who just use their best judgment and do not use a model."

I cannot resist noting that when a President used his own judgment, and pursued an economic policy that created the greatest Federal deficit in history but the lowest interest rates in more than a decade, the high priests of the dismal science called it "voodoo economics." It takes one to know one, I guess.

Ascher finds that "econometric models do a little bit worse than judgment. And for all the elaboration over the years they haven't gotten any better. Refining the models hasn't helped." Ascher says he finds it "somewhat surprising that the models perform worse than judgment since judgment is actually part of the models; it is incorporated in when modelers readjust their data to conform to their judgment."

Fascinating! Assuming the same persons are rendering "judgments," at different times perhaps, it implies that the elaboration and mathematical sophistry of the models actually cloud their judgment when expressed through the models: they appear to have lost sight of the real forest for the papier-mache trees.

Another way of assessing models is to ask whether you would be better off using them, or just predicting that next year will be like this year.
This is the approach taken by McNees... "I would argue that, if you average over all the periods [1974-1982] you would make smaller errors with the models [on GNP and inflation rates] than you would by simply assuming that next year will be just like this year," he says. "But the errors would not be tremendously smaller. We're talking about relatively small orders of improvement."

I seem to recall that this is the secret of the Farmer's Almanac's success in predicting weather, and that one will only be wrong 15% of the time if one predicts tomorrow's weather will be exactly like today's.

Other investigators are asking whether the models' results are reproducible... Surprisingly the answer seems to be no. "There is a real problem with scholarship in the profession," says Hale of the Energy Information Administration. "Models are rarely documented well enough so that someone else can get the same result..." [In one study, about two-thirds of the] 62 authors whose papers were published in the [J]ournal [of Money, Credit and Banking]... were unwilling to supply their data in enough detail for replication. In those cases where the data and equations were available, [the researchers] succeeded in replicating the original results only about half the time...

What a sorry testament! What has become of scientific method, peer review?

"Even if you think the models are complete garbage, until there is an obviously superior alternative, people will continue to use them," [McNees] says.

Saul, failing to receive a sign from Jehovah, consulted a fortune-teller on the eve of a major battle. The Witch of Endor's "model" was the wraith of Samuel, and it wasn't terribly good for the body politic either. I keep a sprig of laurel on my CRT, a "model" I gathered from the tree at Delphi, used to send the Oracle into trance, to speak Apollo's "truth." I do it as amusement and memento, not as talisman for public policy.
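
A minimal sketch of the yardstick McNees describes above: score a model's forecasts against the no-change rule that next year will be just like this year. The function names and the growth figures are invented for illustration and come neither from Kolata's article nor from any real model; mean absolute error is only one of several reasonable error measures.

```python
def mean_abs_error(forecasts, actuals):
    """Average absolute gap between paired forecasts and outcomes."""
    if len(forecasts) != len(actuals) or not forecasts:
        raise ValueError("need two non-empty series of equal length")
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(forecasts)


def naive_forecasts(actuals):
    """No-change rule: the forecast for each year is the previous year's value."""
    return list(actuals[:-1])


def compare_to_naive(model_forecasts, actuals):
    """Return (model_error, naive_error, ratio) over years 2..n.

    model_forecasts[i] is the model's forecast for actuals[i + 1].
    A ratio below 1 means the model beat the no-change rule.
    """
    targets = list(actuals[1:])
    model_err = mean_abs_error(list(model_forecasts), targets)
    naive_err = mean_abs_error(naive_forecasts(actuals), targets)
    ratio = model_err / naive_err if naive_err else float("inf")
    return model_err, naive_err, ratio


# Hypothetical growth rates in percent; NOT data from the article.
actual_growth = [3.0, 1.0, 4.0, 2.0, 5.0]
model_growth = [2.0, 3.0, 3.0, 4.0]  # made-up model forecasts for years 2-5

model_err, naive_err, ratio = compare_to_naive(model_growth, actual_growth)
print(model_err, naive_err, ratio)
```

A ratio below 1 says the model beat the no-change rule on this sample. The "relatively small orders of improvement" McNees reports would show up as a ratio only slightly below 1.
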
History and literature are filled with the mischief that superstition and fortune-telling have wrought, yet some economic and computer scientists, the latter apparently as inept as the Sorcerer's Apprentice, are perpetuating these ancient evils. Are Dynamo and its descendants serving as late-twentieth-century substitutes for I Ching sticks?

Is the problem restricted to econometrics, or is the abuse of computer modeling widespread? Who reproduces the results of weather models, for instance? Who regularly assesses and reports on, and culls the unworthy models? Weather models are interesting because they may be among the most easily "validated," yet there remains the institutional question: when the Washington Redskins buy a weather service, for example, to predict the next game's weather, how can they objectively predetermine that they are buying acceptable, "validated" modeling rather than snake oil? After all, even snake oil can be objectively graded SAE 10W-40, or not. A posteriori "invalidation" by losing while playing in the "wrong" weather is no answer, any more than invalidation by catastrophic engine failure would be in motor oils. The Society of Automotive Engineers at least has promulgated a viscosity standard: what have we done?

Where is scientific method at work in computer modeling? When peer review is necessarily limited by classification, in such applications as missile engagement modeling and war gaming, what body of standards may the closed community use to detect and eliminate profitable, or deadly, hokum?

Is this just one more instance of falsified data and experiments in science generally, of the sort reported on the front page of the Washington Post as or before it hits the journals? (See: "Harvard Researchers Retract Published Medical 'Discovery;'" Boyce Rensberger, Washington Post, 22 November 1986 pg 1 et seq.; and Science, Letters, 28 November 1986.)

Several reforms (based on the "publish or perish" practice that is itself in need of reform) immediately suggest themselves. I offer them both as a basis for discussion, and as a call to action, or we shall experience another aspect of Limits to Growth--widespread rejection of the contributions of computer science, as a suspect specialty:

 o Refusal to supply data to a peer for purposes of replication might result in the journal immediately disclaiming the article, and temporary or permanent prohibition from publication in the journal in question.

 o Discovery of falsified data in one publication might result in restriction from publication (except replies, clarification or retraction) in all publications of the affiliated societies. In computer science, this might be all IEEE publications at the first level, AFIPS, IFIP and so on.

 o Widespread and continuing publication of the identities of the authors, and in cases of multiple infractions, their sponsoring institutions, in those same journals, as a databank of refuseniks and frauds.

 o Prohibition of the use of computer models in public policymaking (as in sworn testimony before Congress) that have not been certified, or audited, much as financial statements of publicly traded companies must now be audited.

 o Licensing by the state of sale and conveyance of computer models of general economic or social significance, perhaps as defined and maintained by the National Academy of Sciences.

The last is extreme, of course, implying enormous bureaucracy and infrastructure to accomplish, and probably itself inevitably subject to abuse. The reforms are all distasteful in a free society. But if we do nothing to put our house in order, much worse is likely to come from the pen or word-processor of a technically naive legislator. In exchange for a profession's privileged status, society demands it be self-policing.
Doctors, lawyers, CPAs and the like are expected to discipline their membership and reform their methods when (preferably before) there are gross abuses. Although some of them have failed to do so in recent years, is that an excuse for us not to?

Finally, how can we ensure that McNees' prediction, that people will continue to re-engineer our society on models no better than garbage, will prove as false as the models he has described?

------------------------------

End of RISKS-FORUM Digest
************************
-------