[mod.risks] RISKS DIGEST 4.21

RISKS@CSL.SRI.COM (RISKS FORUM, Peter G. Neumann -- Coordinator) (12/01/86)

RISKS-LIST: RISKS-FORUM Digest,  Sunday, 30 November 1986  Volume 4 : Issue 21

           FORUM ON RISKS TO THE PUBLIC IN COMPUTER SYSTEMS 
   ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator

Contents:
  Risks of Computer Modeling and Related Subjects (Mike Williams--LONG MESSAGE)

The RISKS Forum is moderated.  Contributions should be relevant, sound, in good
taste, objective, coherent, concise, nonrepetitious.  Diversity is welcome. 
(Contributions to RISKS@CSL.SRI.COM, Requests to RISKS-Request@CSL.SRI.COM)
  (Back issues Vol i Issue j available in CSL.SRI.COM:<RISKS>RISKS-i.j.  MAXj:
  Summary Contents Vol 1: RISKS-1.46; Vol 2: RISKS-2.57; Vol 3: RISKS-3.92.)

----------------------------------------------------------------------

Date: Fri, 28 Nov 86 13:02 EST
From: "John Michael (Mike) Williams" <JWilliams@DOCKMASTER.ARPA>
To: RISKS@CSL.SRI.COM
Subject: Risks of Computer Modeling and Related Subjects (LONG MESSAGE)

  Taking the meretricious "con" out of econometrics and computer modeling:
                  "Con"juring the Witch of Endor
                John Michael Williams, Bethesda MD

Quite a few years ago, the Club of Rome perpetrated its "Limits to Growth"
public relations exercise.  Although not my field, I instinctively found it
bordering on Aquarian numerology to assign a quantity, scalar or otherwise,
to "Quality of Life," and a gross abuse of both scientific method and
scientific responsibility to the culture at large.  Well after the initial
report's firestorm, I heard that a researcher at McGill proved the model was
not even internally consistent, had serious typographical/syntactical errors
that produced at least an order of magnitude error, and that when the errors
were corrected, the model actually predicted an improving, not declining
"Quality of Life."  I called the publisher of "Limits to Growth," into its
umpteenth edition, and asked if they intended to publish a correction or
retraction.  They were not enthusiastic, what with Jerry Brown, as Governor
and candidate for Presidential nomination, providing so much lucrative
publicity.  Jimmy Carter's "malaise" and other speeches suggest that these
dangerously flawed theses also affected, and not for the better, both his
campaign and administration.

This shaman-esque misuse of computers embarrassed the computing
community, but with no observable effect.

On 31 October 1986, Science ran a depressing article entitled:  "Asking
Impossible Questions About the Economy and Getting Impossible Answers"
(Gina Kolata, Research News, Vol.  234, Issue 4776, pp.  545-546).  The
subtitle and the sidebar insert are informative:

  Some economists say that large-scale computer models of the economy are no
  better at forecasting than economists who simply use their best judgment...
  "People are overly impressed by answers that come out of a computer"...

Additional pertinent citations (cited with permission):

   "There are two things you would be better not seeing in the making--
   sausages and econometric estimates," says Edward Leamer, an economist at
   [UCLA].  These estimates are used by policymakers to decide, for example,
   how the new tax law will affect the economy or what would happen if a new
   oil import tax were imposed.  They are also used by businesses to decide
   whether there is a demand for a new product.  Yet the computer models that
   generate these estimates, say knowledgeable critics, have so many flaws
   that, in Leamer's words, it is time to take the "con out of econometrics."

   ...[E]ven the defenders of the models... [such as e]conomists Kenneth
   Arrow of Stanford and Stephen McNees of the Federal Reserve Board in Boston
   say they believe the models can be useful but also say that one reason the
   models are made and their predictions so avidly purchased is that people
   want answers to impossible questions and are overly impressed by answers
   that come out of a computer...

   The problem, says statistician David Freedman of the University of
   California at Berkeley, is that "there is no economic theory that tells you
   exactly what the equations should look like."  Some model builders do not
   even try to use economic theory...: most end up curve-fitting--a risky
   business since there are an infinite number of equations that will fit any
   particular data set...
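
Freedman's point about curve-fitting can be made concrete with a small
sketch.  The four data points below are invented for illustration, and the
function names are hypothetical.  Every candidate curve reproduces the
observations exactly, yet each gives a different "forecast" one step beyond
the data:

```python
# Invented observations (illustration only): four points lying on a line.
xs = [0, 1, 2, 3]
ys = [1, 2, 3, 4]


def linear(x):
    """The obvious fit: y = x + 1."""
    return x + 1


def make_rival(c):
    """Build a rival curve: the line plus c * x(x-1)(x-2)(x-3).

    The added term is zero at every observed x, so the rival still fits
    the data exactly, for any value of c.
    """
    def rival(x):
        return linear(x) + c * x * (x - 1) * (x - 2) * (x - 3)
    return rival


def fits_exactly(model):
    """True if the model reproduces every observed point."""
    return all(model(x) == y for x, y in zip(xs, ys))


candidates = {"linear": linear}
for c in (1, -1, 10):
    candidates[f"rival c={c}"] = make_rival(c)

all_fit = all(fits_exactly(m) for m in candidates.values())
forecasts = {name: m(4) for name, m in candidates.items()}

print("all candidates fit the data:", all_fit)
for name, value in forecasts.items():
    print(f"{name:12s} forecast at x=4: {value}")
```

Since c can be any number, there are infinitely many such curves; the data
alone cannot choose among their forecasts (5, 29, -19 and 245 here).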

   "What you really have," says William Ascher of Duke University, "is a man-
   model system."  And this system, say the critics, is hardly scientific.
   Wassily Leontief of New York University remarks, "I'm very much in favor of
   mathematics, but you can do silly things with mathematics as well as with
   anything else."

   Defenders of the models point out that economists are just making the best
   of an impossible situation.  Their theory is inadequate and it is
   impossible to write down a set of equations to describe the economy in any
   event... But the critics of the models say that none of these defenses
   makes up for the fact that the models are, as Leontief says, "hot air."
   Very few of the models predict accurately, the economic theory behind the
   models is extremely weak if it exists at all, in many cases the data used to
   build the models are of such poor quality as to be essentially useless, and
   the model builders, with their subjective adjustments, produce what is,
   according to Leamer, "an uncertain mixture of data and judgment."

When David Stockman made "subjective adjustments," he was reviled for
cooking the numbers.  It seems they may have been hash to begin with.

   [Douglas Hale, director of quality assurance at the (Federal) Energy
   Information Administration] whose agency is one of the few that regularly
   assess models to see how they are doing, reports that, "in many cases, the
   models are oversold.  The scholarship is very poor, the degree of testing
   and peer review is far from adequate by any scientific measure, and there
   is very little you can point to where one piece of work is a building block
   for the next."

   For example, the Energy Information Administration looked at the accuracy
   of short-term forecasts for the cost of crude oil...  At first glance, it
   looks as if they did not do too badly...  But, says Hale, "what we are
   really interested in is how much does the price change over time.  The
   error in predicting change is 91%"

This is about the same error, to the hour, as that of a stopped clock.
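
The arithmetic behind the stopped-clock comparison, as a minimal sketch.
It assumes a 12-hour dial and counts the clock as "right" whenever it shows
the correct hour; both assumptions are mine, and Hale's 91% measures
something different (error in predicted price change), so the match is
rhetorical rather than exact:

```python
HOURS_PER_DAY = 24

# Assumption: a stopped 12-hour dial shows one fixed hour.  The choice of
# hour is arbitrary; any value from 1 to 12 gives the same result.
stopped_hour = 3

# Convert each hour of the day (0..23) to its 12-hour dial reading
# (0 and 12 both read as 12) and count the matches.
correct_hours = sum(
    1 for h in range(HOURS_PER_DAY) if (h % 12 or 12) == stopped_hour
)

error_rate = 1 - correct_hours / HOURS_PER_DAY
print(f"right {correct_hours} hours in {HOURS_PER_DAY}; "
      f"error rate {error_rate:.1%}")
```

The dial is right for two hours out of twenty-four, so it is wrong for
22/24 of the day, or roughly 92%.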

In the Washington Post for 23 November 1986, pg K1 et seq., in an
interview entitled "In Defense of Public Choice," Assar Lindbeck,
chairman of the Swedish Royal Academy's committee for selecting the
Nobel Prize in economics, explains the committee's choice of Professor
James M. Buchanan, and is asked by reporter Jane Seaberry:

   It seems the economics profession has come into some disrepute.  Economists
   forecast economic growth and forecasts are wrong.  The Reagan administration
   has really downplayed advice from economists.  What do you think about the 
   economics profession today?

Chairman Lindbeck replies:

   Well, there's something in what you say in the following sense, I think,
   that in the 1960s, it was a kind of hubris development in the economic
   profession ... in the sense that it was an overestimation of what research
   and scientific knowledge can provide about the possibilities of
   understanding the complex economic system.  And also an overestimation
   about the abilities of economists to give good advice and an overestimation
   of the abilities of politicians and public administrators to pursue public
   policy according to that advice.

   The idea about fine tuning the economy was based on an oversimplified
   vision of the economy.  So from that point of view, for instance,
   economists engaged in forecasting--they are, in my opinion, very much
   overestimating the possibilities of making forecasts because the economic
   system is too complex to forecast.  Buchanan has never been engaged in
   forecasting.  He does not even give policy advice because he thinks it's
   quite meaningless...

What econometric computer model is not "an oversimplified vision of the
economy?" When is forecasting an "economic system ...  too complex to
forecast" not fortune-telling?

To return to Kolata's article:

   [Victor Zarnowitz of the University of Chicago] finds that "when you
   combine the forecasts from the large models, and take an average, they are
   no better than the average of forecasts from people who just use their best
   judgment and do not use a model."

I cannot resist noting that when a President used his own judgment, and
pursued an economic policy that created the greatest Federal deficit in
history but the lowest interest rates in more than a decade, the high
priests of the dismal science called it "voodoo economics." It takes one
to know one, I guess.

   Ascher finds that "econometric models do a little bit worse than judgment.
   And for all the elaboration over the years they haven't gotten any better.
   Refining the models hasn't helped."  Ascher says he finds it "somewhat
   surprising that the models perform worse than judgment since judgment is
   actually part of the models; it is incorporated in when modelers readjust
   their data to conform to their judgment."

Fascinating!  Assuming the same persons are rendering "judgments," at
different times perhaps, this implies that the elaboration and mathematical
sophistry of the models actually cloud their judgment when expressed through
the models:  they appear to have lost sight of the real forest for the
papier-mache trees.

   Another way of assessing models is to ask whether you would be better off
   using them, or just predicting that next year will be like this year.  This
   is the approach taken by McNees...  "I would argue that, if you average
   over all the periods [1974-1982] you would make smaller errors with the
   models [on GNP and inflation rates] than you would by simply assuming that
   next year will be just like this year," he says.  "But the errors would not
   be tremendously smaller.  We're talking about relatively small orders of
   improvement."

I seem to recall that this is the secret of the Farmer's Almanac's success
in predicting weather, and that one will be wrong only 15% of the time
if one predicts that tomorrow's weather will be exactly like today's.
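
McNees's yardstick, comparing a model against the naive forecast that "next
year will be just like this year," can be sketched as follows.  All figures
are invented for illustration (they are NOT real GNP or inflation data), and
the names are hypothetical; mean absolute error is one reasonable error
measure, not necessarily the one McNees used:

```python
def mean_abs_error(forecasts, actuals):
    """Average absolute difference between forecasts and outcomes."""
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(actuals)


# Invented annual growth figures, in percent (illustration only).
actual = [2.0, 3.0, 1.0, 4.0, 2.0]

# Invented "model" forecasts for years 2 through 5 (illustration only).
model_forecast = [2.0, 2.5, 2.0, 3.5]

# Naive forecast: each year is predicted to equal the year before it.
naive_forecast = actual[:-1]
targets = actual[1:]

naive_mae = mean_abs_error(naive_forecast, targets)
model_mae = mean_abs_error(model_forecast, targets)

# Relative improvement over the naive baseline: 0 means no better than
# "next year equals this year"; negative means worse.
skill = 1 - model_mae / naive_mae

print(f"naive MAE: {naive_mae:.2f}")
print(f"model MAE: {model_mae:.2f}")
print(f"improvement over naive: {skill:.0%}")
```

A model earns its keep only to the extent that this improvement is clearly
above zero on data it was not tuned to; with these invented numbers it comes
to 25%.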

   Other investigators are asking whether the models' results are
   reproducible...  Surprisingly the answer seems to be no.  "There is a real
   problem with scholarship in the profession," says Hale of the Energy
   Information Administration.  "Models are rarely documented well enough so
   that someone else can get the same result..."

   [In one study, about two-thirds of the] 62 authors whose papers were
   published in the [J]ournal [of Money, Credit and Banking]... were unwilling
   to supply their data in enough detail for replication.  In those cases
   where the data and equations were available, [the researchers] succeeded in
   replicating the original results only about half the time...

What a sorry testament!  What has become of scientific method, peer review?

   "Even if you think the models are complete garbage, until there is an
   obviously superior alternative, people will continue to use them," [McNees]
   says.

Saul, failing to receive a sign from Jehovah, consulted a fortune-teller on the
eve of a major battle.  The Witch of Endor's "model" was the wraith of Samuel, 
and it wasn't terribly good for the body politic either.  I keep a sprig of
laurel on my CRT, a "model" I gathered from the tree at Delphi, used to send
the Oracle into trance, to speak Apollo's "truth." I do it as amusement and
memento, not as talisman for public policy.  History and literature are filled 
with the mischief that superstition and fortune-telling have wrought, yet
some economic and computer scientists, the latter apparently as inept as the
Sorcerer's Apprentice, are perpetuating these ancient evils.  Are Dynamo and
its descendants serving as late-twentieth-century substitutes for I Ching sticks?

Is the problem restricted to econometrics, or is the abuse of computer
modeling widespread?  Who reproduces the results of weather models, for
instance?  Who regularly assesses and reports on the models, and culls the
unworthy ones?  Weather models are interesting because they may be among the
most easily "validated," yet there remains the institutional question:
when the Washington Redskins buy a weather service, for example, to
predict the next game's weather, how can they objectively predetermine
that they are buying acceptable, "validated" modeling rather than snake
oil?  After all, even snake oil can be objectively graded SAE 10W-40, or
not.  A posteriori "invalidation" by losing while playing in the "wrong"
weather is no answer, any more than invalidation by catastrophic engine
failure would be in motor oils.  The Society of Automotive Engineers at
least has promulgated a viscosity standard:  what have we done?

Where is scientific method at work in computer modeling?  When peer review
is necessarily limited by classification, in such applications as missile
engagement modeling and war gaming, what body of standards may the closed
community use to detect and eliminate profitable, or deadly, hokum?  Is this
just one more instance of falsified data and experiments in science
generally, of the sort reported on the front page of the Washington Post as,
or even before, it hits the journals?  (See:  "Harvard Researchers Retract
Published Medical 'Discovery;'" Boyce Rensberger, Washington Post, 22
November 1986 pg 1 et seq.; and Science, Letters, 28 November 1986.)

Several reforms (based on the "publish or perish" practice that is
itself in need of reform) immediately suggest themselves.  I offer them
both as a basis for discussion and as a call to action; otherwise we shall
experience another aspect of Limits to Growth--widespread rejection of
the contributions of computer science as a suspect specialty:

   o Refusal to supply data to a peer for purposes of replication might
result in the journal immediately disclaiming the article, and temporary
or permanent prohibition from publication in the journal in question.

   o Discovery of falsified data in one publication might result in
restriction from publication (except replies, clarifications or
retractions) in all publications of the affiliated societies.  In
computer science, this might be all IEEE publications at the first
level, then AFIPS, IFIP and so on.

   o Widespread and continuing publication of the identities of the authors,
and in cases of multiple infractions, their sponsoring institutions, in
those same journals, as a databank of refuseniks and frauds.

   o Prohibition of the use of computer models in public policymaking (as in
sworn testimony before Congress) that have not been certified, or audited,
much as financial statements of publicly traded companies must now be audited.

   o Licensing by the state of the sale and conveyance of computer models of
general economic or social significance, perhaps as defined and
maintained by the National Academy of Sciences.

The last is extreme, of course, implying enormous bureaucracy and
infrastructure to accomplish, and probably itself inevitably subject to
abuse.  The reforms are all distasteful in a free society.  But if we do
nothing to put our house in order, much worse is likely to come from the
pen or word-processor of a technically naive legislator.

In exchange for a profession's privileged status, society demands it be
self-policing.  Doctors, lawyers, CPAs and the like are expected to
discipline their membership and reform their methods when (preferably
before) there are gross abuses.  Although some of them have failed to do
so in recent years, is that an excuse for us not to?

Finally, how can we ensure that McNees' prediction, that people will
continue to re-engineer our society on models no better than garbage,
will prove as false as the models he has described?

------------------------------

End of RISKS-FORUM Digest
************************
-------