jkpachl@watdaisy.UUCP (Jan Pachl) (11/05/86)
Question:

--- How many people read an average research paper? ---

Has anyone considered (or even investigated) this question?

Obviously, the question is ill-posed; one would have to define "research paper", "average", and "read".  Has anyone tried to answer the question for _any_ reasonable definition of those terms?  Perhaps a good definition of a reader for this purpose would be "someone who has spent enough time on the paper to learn more than what could be found from a short abstract".

Other related questions (e.g. "how many research papers quote an average research paper?") are much easier to formulate precisely (and to answer), but they are not as interesting.

Jan Pachl, University of Waterloo
bogstad@brl-smoke.ARPA (William Bogstad ) (11/06/86)
In article <7966@watdaisy.UUCP> jkpachl@watdaisy.UUCP (Jan Pachl) writes:
>Question:
>--- How many people read an average research paper? ---

	I had a conversation a few days ago in which someone mentioned a study that had supposedly been done once.  (I'm afraid he either didn't know the details or I have forgotten them.)  In any case, papers that had recently been published in a journal were sent out again to the journal's current reviewers.  In only one instance did someone notice that the paper had already been published.  (The text remained the same; only the title and the author's name were changed.)  Note that this may be an apocryphal story.

Bill Bogstad
bogstad@hopkins-eecs-bravo.arpa
roy@phri.UUCP (Roy Smith) (11/06/86)
In article <7966@watdaisy.UUCP> jkpachl@watdaisy.UUCP (Jan Pachl) writes:
>	Other related questions (e.g. "how many research papers quote
> an average research paper?") are much easier to formulate
> precisely (and to answer), but they are not as interesting.

	Actually, this question has in fact been answered.  Science Citation Index (put out by ISI Press, I believe; the same people who bring you Current Contents) lists papers according to citations.  This is usually an excellent way to do library research -- start with a paper that you're interested in and trace out a chain of people who have cited that paper.  ISI lists each year the papers which get cited the most often.  That's really the only way to say "this was an important piece of work".  If more people have cited your paper than any other paper, it's probably the most important.
--
Roy Smith, {allegra,philabs}!phri!roy
System Administrator, Public Health Research Institute
455 First Avenue, New York, NY 10016
"you can't spell unix without deoxyribonucleic!"
gh@utai.UUCP (11/09/86)
In article <7966@watdaisy.UUCP> jkpachl@watdaisy.UUCP (Jan Pachl) writes:
>> Other related questions (e.g. "how many research papers quote
>> an average research paper?") are much easier to formulate
>> precisely (and to answer), but they are not as interesting.

In article <2483@phri.UUCP> roy@phri.UUCP (Roy Smith) writes:
> Actually, this question has in fact been answered.
> [blurb on Science Citation Index, but not The Answer!]

ISI found long ago that in any given year, the ratio of the number of citations they processed to the number of articles they processed was between 1.65 and 1.7.  Fewer than 25% of all articles published are cited more than 10 times by other articles.  See Eugene Garfield's article in /Current Contents/, 9 Feb 1976 (reprinted in his /Essays of an Information Scientist/, vol. 2).

Garfield has also reported on the question that started this discussion: how many people read the average article (excluding the writer, editor, referees, typesetter, etc.)?  The number was amazingly small; unfortunately, I have been unable to find the reference.  Perhaps with this clue someone else may succeed.

Finally, let me commend Roy for his blurb on the Science Citation Index.  Although it is well known and used in the hard sciences, fewer people in the computing and mathematical sciences (who, I assume, are a large slab of the readers of these groups) seem to know of it or use it; a great pity!
--
\\\\ Graeme Hirst    University of Toronto    Computer Science Department
//// utcsri!utai!gh / gh@ai.toronto.edu / 416-978-8747
larsen@brahms (Michael Larsen) (11/10/86)
In article <2483@phri.UUCP> roy@phri.UUCP (Roy Smith) writes:
>Science Citation Index (put out by ISI Press, I believe; the same people
>who bring you Current Contents)... lists each year the papers which get
>cited the most often.  That's really the only way to say "this was an
>important piece of work".  If more people have cited your paper than any
>other paper, it's probably the most important.
>--
>Roy Smith, {allegra,philabs}!phri!roy
>System Administrator, Public Health Research Institute
>455 First Avenue, New York, NY 10016

	This is an interesting theory.  Let's see how it stands up against a short trip through the math citation index (CMCI).  The following observations can be duplicated by anyone with access to the 1976-80 edition.

1. Most people would consider Newton's _Principia_ to be a work of some importance.  CMCI gives 2 references.

2. O.K., that example was unfair because the paper in question is fairly old.  It is hard to choose a contemporary mathematical work whose title most people will recognize.  Nevertheless, we can use Math Reviews as an indication of stature.  This usually staid index fairly gushed over Deligne's 1972 paper "La Conjecture de Weil pour les surfaces K3."  I quote:

	This paper is awe-inspiring.  Its powerful technique and arithmetic
	insight should recommend it to a larger audience, although its
	sophistication will cause trouble for almost all readers.

How many references did this masterpiece garner from 1976 to 1980?  Four.  And that includes one by the author himself.

3. How about Durbin and Watson, "Testing for Serial Correlation in Least Squares Regression"?  This paper, in which a theorem of von Neumann is rederived, has a reputation for being often cited.  It lives up to it in CMCI: 52 citations.

4. A random search through the columns of CMCI turned up a 1964 paper by one J. B. Kruskal which has 116 citations.  It is quite possible that I am merely exposing my ignorance, but I confess to having heard of neither the mathematician in question nor the work.

The idea that a reference count is an accurate indication of the quality of a scholar must have a strong appeal to the bureaucratic mind.  Unfortunately, the real world doesn't seem to work that way.

-larsen @ berkeley.edu.brahms
jin@hropus.UUCP (Jerry Natowitz) (11/11/86)
One thing that amazes my oldest sister, now an environmental mutagenicist, is how often her earliest research (1960s) is still quoted.  I guess it helps to do your dissertation on interferon ...
--
Jerry Natowitz (HASA - J division)
Bell Labs - HR 2A-214
201-615-5178 (no CORNET yet)
ihnp4!houxm!hropus!jin (official)
ihnp4!opus!jin (better)
bzs@bu-cs.UUCP (Barry Shein) (11/11/86)
When I was working in medical research at Harvard, I remember getting into a big argument with someone about the chances that, in a random sample of papers, some number of them were wrong and would later be disproved.  I threatened to run a t-test on the overturned findings in his group's papers and publish that I had rejected the null hypothesis and proven, with p << .001, that everything they had ever said or would say was wrong.

I don't speak to that person any more...

	-Barry Shein, Boston University
shor@sphinx.UChicago.UUCP (Melinda Shore) (11/11/86)
[]
Citation analysis is a popular research area among library science Ph.D.s.  Much of what they've found has been consistent with what you'd intuitively expect.  Researchers in the sciences don't cite as heavily as researchers in the social sciences and humanities, and are less likely to make obligatory citations to standard works.  One thing we're seeing more of is people citing themselves heavily, as citation count is beginning to be considered in tenure decisions.
--
Melinda Shore                              ..!ihnp4!gargoyle!sphinx!shor
University of Chicago Computation Center   XASSHOR@UCHIMVS1.Bitnet
dickey@ssc-vax.UUCP (Frederick J Dickey) (11/11/86)
> In article <2483@phri.UUCP> roy@phri.UUCP (Roy Smith) writes:
> >Science Citation Index (put out by ISI Press, I believe; the same people
> >who bring you Current Contents)... lists each year the papers which get
> >cited the most often.  That's really the only way to say "this was an
> >important piece of work".  If more people have cited your paper than any
> >other paper, it's probably the most important.
> >--
> >Roy Smith, {allegra,philabs}!phri!roy
> >System Administrator, Public Health Research Institute
> >455 First Avenue, New York, NY 10016

I read an interesting article (in Science, I think) a few years ago that is somewhat relevant to this discussion.  My recollection of it follows.

It dealt with the subject of LPUs: LPU = Least Publishable Unit.  Years ago someone suggested that the number of citations of a paper might be a way of measuring its significance.  At the time, it might have been.  However, many researchers said to themselves, "WOW!  I can increase the significance of my paper if it is cited a lot.  I can increase my own significance if I get lots of my papers cited."  So these people started splitting up their papers into atomic units (LPUs) so that they could get lots of citations.  They also insisted on being listed as a co-author if they made any contribution at all to a paper, however minute.  This is why you see papers with a zillion authors.  To ensure the papers got cited, they worked out deals: "I'll cite you if you cite me."  The upshot seems to be that the number of citations may reflect political rather than technical acumen.

---f.j. dickey
berman@psuvax1.UUCP (Piotr Berman) (11/12/86)
In article <236@cartan.Berkeley.EDU> larsen@brahms (Michael Larsen) writes:
>In article <2483@phri.UUCP> roy@phri.UUCP (Roy Smith) writes:
>
>>Science Citation Index (put out by ISI Press, I believe; the same people
>>who bring you Current Contents)... lists each year the papers which get
>>cited the most often.  That's really the only way to say "this was an
>>important piece of work".  If more people have cited your paper than any
>>other paper, it's probably the most important.
>
>	This is an interesting theory.  Let's see how it stands up against
>a short trip through the math citation index (CMCI).  The following
>observations can be duplicated by anyone with access to the 1976-80 edition.
>
>1. Most people would consider Newton's _Principia_ to be a work of some
>importance.  CMCI gives 2 references.
>
>2. O.K., that example was unfair because the paper in question is fairly old.
>It is hard to choose a contemporary mathematical work whose title most
>people will recognize.  Nevertheless, we can use Math Reviews as an
>indication of stature.  This usually staid index fairly gushed over
>Deligne's 1972 paper "La Conjecture de Weil pour les surfaces K3."  I quote:
>
>	This paper is awe-inspiring.  Its powerful technique and arithmetic
>	insight should recommend it to a larger audience, although its
>	sophistication will cause trouble for almost all readers.
>
>How many references did this masterpiece garner from 1976 to 1980?  Four.
>And that includes one by the author himself.
>
>3. How about Durbin and Watson, "Testing for Serial Correlation in Least
>Squares Regression"?  This paper, in which a theorem of von Neumann is
>rederived, has a reputation for being often cited.  It lives up to it in
>CMCI: 52 citations.
>
>4. A random search through the columns of CMCI turned up a 1964 paper by one
>J. B. Kruskal which has 116 citations.  It is quite possible that I am
>merely exposing my ignorance, but I confess to having heard of neither
>the mathematician in question nor the work.

I do not know the paper either, but the paper

	J.B. Kruskal, On the shortest spanning subtree of a graph and
	the travelling salesman problem

is cited by any textbook on data structures and algorithms.

In general, if someone has a very deep and difficult theorem which 'closes' a certain topic, it will not be cited very much.  On the other hand, even a weak paper which 'opens' an area of research that becomes very popular will be cited very often (often without being read, I suspect; many people simply copy citations from others).

>The idea that a reference count is an accurate indication of the quality of
>a scholar must have a strong appeal to the bureaucratic mind.  Unfortunately,
>the real world doesn't seem to work that way.
>
>-larsen @ berkeley.edu.brahms

Here is the catch: there is no such thing as a precise indication of quality or importance.  But imprecise indicators have their value, if used with care.

Piotr Berman
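For readers who, like Larsen, do not recognize the 1956 paper mentioned above: what the data-structures textbooks cite it for is the minimum-spanning-tree method now known as Kruskal's algorithm.  A minimal sketch in Python follows; the presentation is the modern textbook one rather than the paper's own, and the function name and example graph are purely illustrative.

    def kruskal_mst(n, edges):
        """Minimum spanning tree of an undirected graph.

        n     -- number of vertices, labeled 0..n-1
        edges -- iterable of (weight, u, v) tuples
        Returns the list of edges chosen for the tree.
        """
        parent = list(range(n))

        def find(x):                      # union-find with path compression
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        tree = []
        for w, u, v in sorted(edges):     # consider edges in order of weight
            ru, rv = find(u), find(v)
            if ru != rv:                  # keep the edge if it joins two components
                parent[ru] = rv
                tree.append((w, u, v))
        return tree

    # Example: a 4-vertex graph; the tree found has total weight 1 + 2 + 3 = 6.
    print(kruskal_mst(4, [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 0, 3), (5, 0, 2)]))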
mae@weitek.UUCP (Mike Ekberg) (11/13/86)
In article <236@cartan.Berkeley.EDU> larsen@brahms (Michael Larsen) writes:
>The idea that a reference count is an accurate indication of the quality of
>a scholar must have a strong appeal to the bureaucratic mind.  Unfortunately,
>the real world doesn't seem to work that way.
>
>-larsen @ berkeley.edu.brahms

I think the reference count does not necessarily indicate the quality of a scholar.  But reference counts may be used to determine the areas of current work in a given field.  As an example, you could generate a 'citation' index for usenet.  You might find that article <236@cartan.Berkeley.EDU> has been cited several times in the last two weeks.  Was it a good article?  Who knows?  But I do know that in this newsgroup there are several people pursuing the topic of citation indices.  Maybe I'll unsubscribe if this topic continues :-}.

mike - {cae780,turtlevax}/weitek/mae
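Ekberg's usenet 'citation' index is easy to prototype: follow-up articles cite earlier ones with attribution lines of the form "In article <message-id> ...", so tallying those message-IDs gives a rough citation count.  A hedged sketch in Python, assuming the articles have been saved as plain text files (the file handling and regular expression here are illustrative, not part of any actual news software):

    import re
    import sys
    from collections import Counter

    # Follow-ups reference earlier articles with attribution lines such as
    # "In article <236@cartan.Berkeley.EDU> larsen@brahms (Michael Larsen) writes:"
    CITATION = re.compile(r"In article\s+<([^>]+)>")

    def citation_index(paths):
        """Count how often each message-ID is referenced across the given articles."""
        counts = Counter()
        for path in paths:
            with open(path, errors="replace") as f:
                counts.update(CITATION.findall(f.read()))
        return counts

    if __name__ == "__main__":
        # Usage: python citation_index.py article1.txt article2.txt ...
        for msg_id, n in citation_index(sys.argv[1:]).most_common(10):
            print(f"{n:4d}  <{msg_id}>")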
braner@batcomputer.tn.cornell.edu (braner) (11/13/86)
[]
Another interesting twist to the glorified "citation index" is that if you write a provocative enough paper, you're bound to have it cited (as a bad example).  So a high index does not prove it was a GOOD paper!  Sort of like well-known political candidates having to respond to negative claims about them made by unknown candidates, thus giving the latter much-needed free publicity.  Of course, they (both) may deserve it!

- Moshe Braner
roy@phri.UUCP (Roy Smith) (11/13/86)
In article <937@ssc-vax.UUCP> dickey@ssc-vax.UUCP (Frederick J Dickey) writes:
> I read an interesting article (in Science, I think) a few years ago that
> is somewhat relevant to this discussion.  [...]  It dealt with the subject
> of LPUs: LPU = Least Publishable Unit.

	Quoting from a recent issue of a computer science journal (the names have been changed to protect the innocent and to protect me from lawsuits):

	J.P. Foobar received the Ph.D. degree in computer science from
	Random University in 1975. [...] Dr. Foobar has published over
	100 papers.

How does that strike you?  My initial impression was "Hmm, over 100 papers in 11 years?  That's like 1 every 6 weeks!  Something's fishy here."  Maybe I'm wrong (I haven't read most of Dr. Foobar's papers), but I just can't believe *anybody* can do something worth publishing every 6 weeks.

A common (and, in my opinion, disreputable) practice in biology is to get your name on a paper by providing some technical service, trumped up as a collaborative effort: "Sure, I'll give your sample to my technician and tell him to run it through my Amino Acid Sequenator if you make me a co-author on your paper."
--
Roy Smith, {allegra,cmcl2,philabs}!phri!roy
System Administrator, Public Health Research Institute
455 First Avenue, New York, NY 10016
"you can't spell deoxyribonucleic without unix!"
jbk@alice.UUCP (11/21/86)
In article <236@cartan.Berkeley.EDU> larsen@brahms.berkeley.edu (Michael Larsen) writes:
>4. A random search through the columns of CMCI turned up a 1964 paper by one
>J. B. Kruskal which has 116 citations.  It is quite possible that I am
>merely exposing my ignorance, but I confess to having heard of neither
>the mathematician in question nor the work.

In article <2326@psuvax1.UUCP> berman@psuvax1.UUCP (Piotr Berman) writes:
>I do not know the paper either, but the paper
>
>	J.B. Kruskal, On the shortest spanning subtree of a graph and
>	the travelling salesman problem
>
>is cited by any textbook on data structures and algorithms.
>In general, if someone has a very deep and difficult theorem which 'closes'
>a certain topic, it will not be cited very much.  On the other hand, even
>a weak paper which 'opens' an area of research that becomes very popular
>will be cited very often (often without being read, I suspect; many people
>simply copy citations from others).

To Michael Larsen: The 1964 paper you cite concerns non-metric multidimensional scaling.  This method has achieved routine use in psychology, marketing, and some other fields.  (And I haven't heard of you either, darling!)

To Piotr Berman: The 1956 "shortest spanning subtree" paper you cite was written while I was a graduate student at Princeton, and was only my second published paper.  (May you write many papers as weakly popular as this one.)

A finitized form of a theorem from my Ph.D. thesis was the first proposition of genuine mathematical interest to be demonstrated as undecidable in a formal system (work by Harvey Friedman, using the first new method for demonstrating undecidability since Goedel introduced the concept).

Sic transit gloria mundi.

Joseph B Kruskal
larsen@brahms (Michael Larsen) (11/21/86)
>In article <236@cartan.Berkeley.EDU>
>larsen@brahms.berkeley.edu (Michael Larsen) writes:
>
>>4. A random search through the columns of CMCI turned up a 1964 paper by one
>>J. B. Kruskal which has 116 citations.
>>  (*) It is quite possible that I am merely exposing my ignorance,
>>but I confess to having heard of neither the mathematician in question
>>nor the work.
>
>To Michael Larsen: The 1964 paper you cite concerns non-metric
>multidimensional scaling.  This method has achieved routine use in
>psychology, marketing, and some other fields.  (And I haven't heard of
>you either, darling!)

	My apologies to Dr. Kruskal for gratuitously bringing his name into a discussion of the merits of citation counting.  I can only attribute the selection to the immense popularity of his paper (his fault) and to the accident of my not recognizing his name (my fault).  I imagine he would be gratified by the number of sci.math subscribers who brought to my attention the validity of (*).

>Sic transit gloria mundi.
>Joseph B Kruskal

	Indeed.  Happy for Isaac Newton that he was long dead on the day that his 1687 treatise on the laws of motion (which has achieved routine use in physics, engineering, and some other fields) fell ingloriously before a paper on non-metric multidimensional scaling.

Michael J. Larsen @ berkeley.brahms.edu
berman@psuvax1.UUCP (Piotr Berman) (11/25/86)
>In article <236@cartan.Berkeley.EDU>
>larsen@brahms.berkeley.edu (Michael Larsen) writes:
>
>>4. A random search through the columns of CMCI turned up a 1964 paper by one
>>J. B. Kruskal which has 116 citations.  It is quite possible that I am
>>merely exposing my ignorance, but I confess to having heard of neither
>>the mathematician in question nor the work.
>
>In article <2326@psuvax1.UUCP> berman@psuvax1.UUCP (Piotr Berman) writes:
>
>>I do not know the paper either, but the paper
>>
>>	J.B. Kruskal, On the shortest spanning subtree of a graph and
>>	the travelling salesman problem
>>
>>is cited by any textbook on data structures and algorithms.
>>In general, if someone has a very deep and difficult theorem which 'closes'
>>a certain topic, it will not be cited very much.  On the other hand, even
>>a weak paper which 'opens' an area of research that becomes very popular
>>will be cited very often (often without being read, I suspect; many people
>>simply copy citations from others).
>
>To Michael Larsen: The 1964 paper you cite concerns non-metric
>multidimensional scaling.  This method has achieved routine use in
>psychology, marketing, and some other fields.  (And I haven't heard of
>you either, darling!)
>
>To Piotr Berman: The 1956 "shortest spanning subtree" paper you cite was
>written while I was a graduate student at Princeton, and was only my second
>published paper.  (May you write many papers as weakly popular as this one.)
>
>A finitized form of a theorem from my Ph.D. thesis was the first proposition
>of genuine mathematical interest to be demonstrated as undecidable in a
>formal system (work by Harvey Friedman, using the first new method for
>demonstrating undecidability since Goedel introduced the concept).
>
>Sic transit gloria mundi.
>
>Joseph B Kruskal

Sorry for a clumsy formulation.  I LIKE KRUSKAL'S ALGORITHM.  Any former student of Comp. Sc. must know it, so I was surprised that your name was unfamiliar to someone here.  But you must admit that it was not the most difficult of your results.  And if I were to cite you, I would do it by copying the reference from a textbook, without reading the paper.  I would conjecture that many people writing on applications of your 1964 paper read about the result and then requoted the reference.

This perhaps indicates that the question should be "how many people learn an average mathematical result" rather than "how many people read an average paper".  Very few people quote Pythagoras, for example.

Sorry that I am ignorant of your thesis; maybe I should read it over Christmas as a penance.

Piotr Berman