[net.sport.baseball] Lineup dependency

david@fisher.UUCP (David Rubin) (09/17/85)

You just KNEW I wasn't going to let Paul's demonstration pass without
challenge, didn't you??

First, let me say right off that while I disagree with what most of
Paul wrote, if I countered all his points,
	(a) this article would be another monster, and
	(b) general principles would be lost among specifics.

Many of Paul's arguments are anecdotal in nature: he brings up a case
which he believes supports his position, and concludes that, since his
explanation is CONSISTENT with his own observations, it must be TRUE.
As an example, he credits McGee's year to Coleman; he is satisfied
that since his explanation makes sense,

	(1) he may disregard alternate explanations of the event, and
	(2) he need not further investigate.


I shall limit myself, therefore, to the general comment (call it
Rubin's Law of Empirics, if you will) that HAVING A PLAUSIBLE
EXPLANATION FOR AN ANTICIPATED EFFECT IS NOT EVIDENCE THAT THAT EFFECT
HAS ACTUALLY OCCURRED.  All of Paul's explanations mean little,
therefore, until he establishes that what his explanations explain has
indeed happened!  Only in the case of Mattingly does he attempt to
actually demonstrate that a lineup effect exists, and I will therefore
concentrate on it.  Elsewhere, he merely shows lineup effects are
consistent with his selected observations without either showing other
explanations are inconsistent or that the observations would be
inexplicable without lineup effects. Unfortunately, he places far
more interpretive weight on his statistics than they can bear; odd,
considering his previously expressed worry that I, as a Statistician,
was likely to be taken in by a spurious (and superficial) correlation.

>                Mattingly's stats       Yankees record
>                 BA     Slugging         W    L     Pct.
>Batting 2nd     .402      .715          27    8    .771
>Batting 3rd/4th .303      .495          42   40    .525
>We can see that not only are personal stats highly dependent
>on the other people in the lineup, but also dependent on the order
>in which those people bat.

			CONFOUNDING?

We can only say this if we know the ONLY thing that is varying is the
lineup.  It may be that Mattingly has batted second ONLY against
right-handed pitching or that some OTHER factor is responsible for the
difference.  In other words, a simple breakdown such as this is
worthless (possibly even worse: it may be misleading) unless we also
know that the circumstances of the two categories (batting 2nd vs.
batting 3rd or 4th) are otherwise similar; otherwise, it may be some
other factor (such as lefty-righty, home-away, grass-turf, day-night,
etc.), strongly correlated with the categories, that is driving the
discrepancy (Statisticians refer to this confusion of one cause with
another as "confounding").
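To make the confounding point concrete, here is a small Python sketch using entirely hypothetical numbers (they are not Mattingly's). The hitter's average depends ONLY on pitcher handedness, never on lineup slot, yet because he happened to bat second mostly against right-handers, the slot split comes out around .326 vs .290.

```python
# Hypothetical illustration of confounding: this hitter's true average
# depends ONLY on pitcher handedness, never on his lineup slot.
TRUE_AVG = {"RHP": 0.340, "LHP": 0.240}

# Hypothetical at-bat counts by (lineup slot, pitcher handedness).
AT_BATS = {
    ("2nd", "RHP"): 120, ("2nd", "LHP"): 20,
    ("3rd/4th", "RHP"): 160, ("3rd/4th", "LHP"): 160,
}

def slot_average(slot):
    """Expected batting average in a slot, mixing over pitcher handedness."""
    ab = {hand: n for (s, hand), n in AT_BATS.items() if s == slot}
    hits = sum(TRUE_AVG[hand] * n for hand, n in ab.items())
    return hits / sum(ab.values())

for slot in ("2nd", "3rd/4th"):
    print(slot, round(slot_average(slot), 3))
```

A breakdown by slot alone would credit the lineup with a 36-point effect that is entirely due to who was pitching.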

			AMOUNT OF DATA?

Moreover, even if Paul COULD assure us that this was so, he does not
have nearly enough data.  Examine, in particular, the data for batting
second: it is based on 35 games, i.e. about 100-150 at bats.  Most
fans will not put much store in a player's average after 35 games
(early May), and for good reason: the player has not yet accumulated
enough at bats for us to form any reasonable opinion as to his likely
seasonal productivity.  We are talking about guessing whether a player
is hitting .300 or .400 based on that many at bats: it would not be at
all unusual for the difference (10 to 15 hits) to be due to a "hot" or
"cold" streak (what Statisticians conveniently label "random", but we
may understand as being that which is beyond our knowledge).  We would
need to have many more at bats (perhaps in a couple more seasons we
will) before we could say that the difference is due to the position
in the lineup rather than a propitious hot streak.  To put it another
way, if a lifetime .300 hitter were to have a .400 average on May 5th,
would you tentatively conclude (until further info was available) that
the man would bat .400 for the season?  Of course not.  You would
correctly conclude that he is more likely to hit .300 from June
through September than .400.  He may just have had a good April...
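A rough way to put numbers on the "not enough data" point is a pooled two-proportion z statistic. This sketch assumes about four at-bats per game (so roughly 140 AB batting second and 328 batting third or fourth); those counts are my assumption, not figures from TSN, and treating each at-bat as an independent trial with a fixed hit probability is itself a simplification.

```python
import math

def two_proportion_z(hits_a, ab_a, hits_b, ab_b):
    """Pooled two-proportion z statistic for a difference in batting average."""
    p_a, p_b = hits_a / ab_a, hits_b / ab_b
    pooled = (hits_a + hits_b) / (ab_a + ab_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / ab_a + 1 / ab_b))
    return (p_a - p_b) / se

# Assumed at-bats: ~4 per game over 35 and 82 games (not from the article).
ab_2nd, ab_34 = 35 * 4, 82 * 4
hits_2nd = round(0.402 * ab_2nd)   # about 56 hits
hits_34 = round(0.303 * ab_34)     # about 99 hits

z = two_proportion_z(hits_2nd, ab_2nd, hits_34, ab_34)
# Standard error of a true .300 hitter's average over the same 140 at-bats.
se_300 = math.sqrt(0.300 * 0.700 / ab_2nd)
print(f"z = {z:.2f}, SE of a .300 average over {ab_2nd} AB = {se_300:.3f}")
```

Under these assumptions z comes out near 2, right at the conventional 5% line for a single comparison planned in advance, and a .300 hitter's average over 140 at-bats has a standard error of about .039, so 40-point swings either way are routine. Neither number says anything about confounding.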

			LIMITED APPLICATION?

Even if it were established for Mattingly, it would hold only for
Don Mattingly with the current Yankees: to apply it to, say, Tony
Pena, it would have to be demonstrated for a wide variety of players on
a wide variety of teams.  Still, it would be quite a surprise to me if
anyone could get even that far.

			TSN BIAS!!!!

Finally, the selection is biased.  The Sporting News didn't say,
"Let's check on Mattingly's stats and publish regardless", as they
would have to if we were to have any hope that Mattingly was somehow
typical; they certainly perused all the available stats and published
the one(s) they considered most "interesting" or "newsworthy".  We can
be certain that the discrepancies in Mattingly's stats are therefore
unusually large.  If that is the greatest discrepancy available among
the 300 or so regular players, Mattingly's extra 10-15 hits and 20-30
extra bases in his 150 at bats, then I am very unimpressed: such
discrepancies would probably be as large in a similarly sized sample
broken down into phases of the moon.  Make no mistake: it is Sporting
News's job to publish discrepancies such as this because they are
among the largest, as their readers demand the unusual, not the
typical.
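The selection-bias point can also be put in numbers. Suppose none of the roughly 300 regulars had any real lineup effect and each were checked on one arbitrary split. The sketch below estimates how many would still show a difference at least two standard errors wide purely by chance. It assumes independent splits and approximately normal z statistics, both simplifications.

```python
import math

def two_sided_tail(z):
    """P(|Z| >= z) for a standard normal Z."""
    return math.erfc(z / math.sqrt(2))

players = 300
threshold = 2.0
expected = players * two_sided_tail(threshold)
print(f"Expected players with |z| >= {threshold}: {expected:.1f}")
```

Roughly 14 of 300 players would clear that bar by luck alone (more if several splits per player are examined), so the most striking split a paper can find is weak evidence on its own.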

			PENA-CARTER

Yes, this came up again, and I have to point out that
	
	(1) My arguments were based entirely on pre-1985, and 
	(2) Pre-1985, Pena's team was about as productive as Carter's

so that Paul's argument (again) about Hernandez and Strawberry being
responsible (again) for Carter's stats is irrelevant (again).  And
even if we WERE to consider them, why does Paul believe that Carter
has his stats inflated by Hernandez, Strawberry, and Foster when NONE
of those three show any substantial increase in production over last
year?  I suppose Paul believes Carter has a special dispensation: in
moving from the Expos to the Mets, he gains by being surrounded by
Keith, Darryl, and George, while those three do NOT gain from Gary's
presence.  The fact is, the production of all four has remained about
the same over the past two years, an argument AGAINST lineup effects.
My apologies for not being able to resist the anecdotal argument.

			CONCLUSION

For Paul to demonstrate lineup effects, he will need

	(1) More data (more players, more at bats),
	(2) Better data (some effort to exclude other factors;
	      however, it may suffice to simply have more data, so
	      that we may reasonably expect to have other factors
	      balance out), and
	(3) Unbiased data (players selected because, a priori, we
	      think their records will be most illuminating; a
	      posteriori selection, a la TSN, is invalid).

Neither Paul nor I has the time or the resources to do this.  Some people
do, and are supposedly doing it (the folks at SABR...).  They have, so
far, according to Pete Palmer, found "no evidence" of lineup effects.
This does not "disprove" lineup effects; however, it detracts from
human understanding to accept as truth all that is not disproven.

					David Rubin
			{allegra|astrovax|princeton}!fisher!david

dpb@philabs.UUCP (Paul Benjamin) (09/24/85)

Alright, folks, here's another exceedingly long posting for anyone
who cares to keep track of this argument over what baseball
statistics can and cannot mean. It consists of a point-by-point
rebuttal of a posting by David Rubin.

> First, let me say right off that while I disagree with what most of
> Paul wrote, if I countered all his points,
>       (a) this article would be another monster, and
>       (b) general principles would be lost among specifics.
> 
> Many of Paul's arguments are anecdotal in nature: he brings up a case
> which he believes supports his position, and concludes that, since his
> explanation is CONSISTENT with his own observations, it must be TRUE.
> As an example, he credits McGee's year to Coleman; he is satisfied
> that since his explanation makes sense,
> 
>       (1) he may disregard alternate explanations of the event, and
>       (2) he need not further investigate.

I wish that, for once, you would read what I wrote. The points I presented
were not of my own making. They are the opinions of, among others,
Billy Martin, and the author of the article. Have you read the article?

Also note that everything you have said above can be said about you!
You disregard my explanation of the events, and have not proven, in any
sense, that on-base average and slugging average are independent of
factors such as who is batting in front or behind you. Your evidence
is completely anecdotal. You embrace those stats without showing
that any strong correlation exists between them and scoring runs (or
more precisely, that a stronger correlation exists than for, say, the
stat R + RBI - HR.) It is not just my responsibility to
prove that lineup dependencies exist. It is also yours to prove that
they don't!

> I shall limit myself, therefore, to the general comment (call it
> Rubin's Law of Empirics, if you will) that HAVING A PLAUSIBLE
> EXPLANATION FOR AN ANTICIPATED EFFECT IS NOT EVIDENCE THAT THAT EFFECT
> HAS ACTUALLY OCCURRED.  

Perhaps you should take a course or two in prob&stat and learn the
actual laws, instead of making up your own.

> All of Paul's explanations mean little,
> therefore, until he establishes that what his explanations explain has
> indeed happened!  Only in the case of Mattingly does he attempt to
> actually demonstrate that a lineup effect exists, and I will therefore
> concentrate on it.  Elsewhere, he merely shows lineup effects are
> consistent with his selected observations without either showing other
> explanations are inconsistent or that the observations would be
> inexplicable without lineup effects. 
> In other words, a simple breakdown such as this is
> worthless (possibly even worse: it may be misleading) unless we also
> know that the circumstances of the two categories (batting 2nd vs.
> batting 3rd or 4th) are otherwise similar; otherwise, it may be some
> other factor (such as lefty-righty, home-away, grass-turf, day-night,
> etc.), strongly correlated with the categories, that is driving the
> discrepancy (Statisticians refer to this confusion of one cause with
> another as "confounding").

Marvelous! Perfect! I'm SO glad you said this. It's much easier to
shoot down someone's argument when he provides the ammunition
himself.

This is exactly the point I have been making for weeks. "it may be
misleading unless we know that the circumstances of the two categories 
are otherwise similar..." Two players for different teams do not
satisfy this criterion, and thus their stats are not directly
comparable. For example, many, including myself, like Guerrero for
the MVP, but I don't favor him because he leads the NL in slugging
and on-base average. Those stats are irrelevant, since you can't compare
them to, say, Dale Murphy's stats. Why not? It's simple. 18 times
a year, Dale Murphy has to face the great Dodger pitching staff, which
is clearly the best in the league, while Guerrero faces the Braves' staff,
which is one of the worst. That's over 11% of the season. This is in
addition to other differences, such as the number of day/night games,
the different stadiums they play in, the number of double-headers they
play in, the number of day games after night games, etc. They don't even
play the exact same other teams, either! After all, if a team played
most of its games against Philadelphia earlier in the year, they faced a
much easier opponent than a team whose schedule calls for them to
face Phila now. The reverse is true for the Cubs. Playing them
before all their starters were injured is different than playing
them afterwards.

So, unless you can correct for ALL these factors, and others, to
ensure that your circumstances are similar, all the analyses that
you have posted are "worthless (possibly even worse: (they) may be
misleading)".

The only attempt you have made to correct your stats is to include
a ratio which takes into account the differences between stadiums,
and how hard they are for hitters. But even this attempt showed your
statistical inexperience. Saying, for example, that park A is 10% harder
to hit in than park B because the overall averages (of, say, slugging)
are 10% lower is a valuable and meaningful stat when applied to the whole
group of hitters - it provides information on the park to its owners.
But it is TOTALLY MEANINGLESS to apply this stat to individual batters
in this park. One must also know the shape of the distribution. It could
be that almost nobody hits 10% worse in that park: many hit much worse
or better, and it averages out to 10%. For example, if a country's families
have 2.3 children on average, it doesn't mean that anyone has 2.3
children, or even that most families have 2 or 3 children. Bimodal
distributions are not uncommon, and in these, almost no one is near
the mean.
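Paul's point about the shape of the distribution is easy to illustrate. The per-hitter park effects below are invented: they average to exactly a 10% drop, yet no hitter's own effect is within five points of that average, so applying the park-wide figure to any one of them would mislead.

```python
from statistics import mean

# Hypothetical percentage change in slugging for ten hitters in "park A"
# (invented numbers chosen to average exactly -10%).
park_effect = [-25, -24, -22, -21, -20, 0, 1, 2, 4, 5]

avg = mean(park_effect)
# Hitters whose own effect lies within 5 points of the park-wide average.
near_avg = [x for x in park_effect if abs(x - avg) <= 5]
print(f"average effect: {avg}%, hitters within 5 points of it: {len(near_avg)}")
```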

Furthermore, the reason that I use only Mattingly is that these stats
are rarely available. It's much easier to compute personal averages
such as batting average, slugging average, runs, RBI, etc. than to
compute how much a batter tends to improve the stats of those batting
ahead of him or behind him, etc. We almost never see these stats. We
don't often enough see stats such as batting average with runners in
scoring position, etc. You criticize me for the deficiencies of baseball
statisticians everywhere. It's not my fault, so don't criticize me
for it.

> Moreover, even if Paul COULD assure us that this was so, he does not
> have nearly enough data.  Examine, in particular, the data for batting
> second: it is based on 35 games, i.e. about 100-150 at bats.  Most
> fans will not put much store in a player's average after 35 games
> (early May), and for good reason: the player has not yet accumulated
> enough at bats for us to form any reasonable opinion as to his likely
> seasonal productivity.  We are talking about guessing whether a player
> is hitting .300 or .400 based on that many at bats: it would not be at
> all unusual for the difference (10 to 15 hits) to be due to a "hot" or
> "cold" streak (what Statisticians conveniently label "random", but we
> may understand as being that which is beyond our knowledge).  We would
> need to have many more at bats (perhaps in a couple more seasons we
> will) before we could say that the difference is due to the position
> in the lineup rather than a propitious hot streak.  To put it another
> way, if a lifetime .300 hitter were to have a .400 average on May 5th,
> would you tentatively conclude (until further info was available) that
> the man would bat .400 for the season?  Of course not.  You would
> correctly conclude that he is more likely to hit .300 from June
> through September than .400.  He may just have had a good April...

Again I wish you would actually read the article before you respond to
it! Of course, I know you already know everything :-) Mattingly's hot
stats for the second position were not compiled in one streak. He started
the season batting 3-4 and was moved to 2 in May for 17 games. He was
then moved back to 3-4, but occasionally in June and July batted 2. The
article does not give stats for those instances alone, but states that
it "worked like a charm". He was still usually batting 3-4, but was
moved to 2 on August 5, when Martin became aware of the stats for his
earlier production in the 2 spot. So, it is NOT the result of a hot streak. 
As for right-handed vs. left-handed opposition, I checked the games from
August 5 on. There were both right-handed and left-handed opponents.
He is playing full-time in that spot, so he faces all types of pitching.
Martin moved him to 2 on August 5 because of his excellent production
in that spot before.

> Even if it were established for Mattingly, it would hold only for
> Don Mattingly with the current Yankees: to apply it to, say, Tony
> Pena, it would have to be demonstrated for a wide variety of players on
> a wide variety of teams.  Still, it would be quite a surprise to me if
> anyone could get even that far.

I see! Whenever I come up with evidence, it counts only for that
case. Yet you have never detailed an instance of a player changing,
say, his lineup position and keeping the same OBA and slugging pct.,
and I am still supposed to swallow your arguments!

It's interesting. When I respond to your postings, I feel like I'm trying
to explain baseball to a Martian. You know so little about the game!
EVERYBODY knows that lineups are interdependent! Try watching a game
sometime (instead of just reading numbers). You'll see that when a runner
is on base, it affects (among other things):

    1) the way the pitcher throws. Using the stretch instead of a full
       windup definitely hurts most pitchers' performances. Otherwise, 
       there would be no need for anyone to ever wind up.

    2) the pitch selection;

    3) the defensive alignment.

Thus, if the batter ahead of, say Mattingly, gets on base more often,
is a threat to steal, and gets in scoring position more often, he
can (and does) affect whether Mattingly gets a hit or not.
Perhaps we should just forget this whole argument. You will continue
to emphasize the individual aspects of the game, and I will continue
to emphasize the team aspects. After all, if we both enjoy the game,
that's the purpose of baseball anyway.

By the way, if you still doubt the existence of lineup dependency (which
you undoubtedly still do) then answer the following question:

    If there were no lineup interaction, then all managers would bat their
    best hitter first, then their second-best, etc. to give them the
    most opportunities to hit. Thus, according to your criteria (OBA and
    slugging pct), the way to optimize the team's OBA and slugging pct
    is to bat the best in these categories first, the next-best second,
    etc. We would see Carter batting leadoff for the Mets, and Coleman
    would not be the leadoff hitter for St. Louis; McGee would be, followed
    by Clark. Coleman would be somewhere around 6 or 7. Come to think of it,
    since Cedeno has been playing for the Cards and the way he has been
    hitting, he would be batting leadoff. Also, Guerrero would be
    hitting leadoff for LA (absurd!).

    As ANY real baseball fan knows, managers carefully
    pick the order to help run production, e.g. alternating left-handed
    and right-handed batters, and putting speedsters in front of hitters
    who hit well with men in scoring position. WHY WOULD THEY BOTHER TO 
    DO THIS IF THERE WERE NO LINEUP INTERACTION??? Why not bat Mattingly
    leadoff, to get him more at-bats? Maybe the fact that he would be
    batting behind a much weaker hitter just MIGHT have a teeny-weeny
    little bit to do with it?!

    Thus, we see that some excellent managers, such as Whitey Herzog,
    deliberately put a player like Coleman, who has a lower OBA and
    slugging average than McGee, in the spot where he will get the most 
    at-bats, thus effectively reducing the overall OBA and slugging pct of 
    his team. Do you really think he is deliberately reducing the run-scoring
    ability of his team? Or do you just think that all these baseball
    professionals are sadly misguided? The only other alternative is
    that TEAM RUN-SCORING ABILITY IS NOT DIRECTLY CORRELATED WITH
    TEAM OBA OR SLUGGING, i.e., these stats aren't all you crack them
    up to be. There must be other factors, e.g., speed.

    To rephrase this point, so that you will have less chance of
    misinterpreting it, if Guerrero's slugging avg and OBA are what
    are most important to the Dodgers, then he should bat leadoff,
    so as to maximize the team's slugging avg and OBA. He doesn't,
    and the very idea seems preposterous. Either Lasorda doesn't
    understand the game as you do, or your emphasis on OBA and
    slugging is wrong. Which is it?

The lineup can even affect the selection of relief pitchers. And haven't
you ever heard a manager say that what he really needs is a left-handed
power-hitter (or more speed in the lineup, etc.)? Why are these things
important to managers if the players in lineups don't interact?

> And even if we WERE to consider them, why does Paul believe that Carter
> has his stats inflated by Hernandez, Strawberry, and Foster when NONE
> of those three show any substantial increase in production over last
> year?  

WRONG. Strawberry is having a much better year than last year. Note that
Strawberry bats directly behind Carter, just as Mattingly bats directly
behind Henderson. See below.

> I suppose Paul believes Carter has a special dispensation: in
> moving from the Expos to the Mets, he gains by being surrounded by
> Keith, Darryl, and George, while those three do NOT gain from Gary's
> presence.  The fact is, the production of all four has remained about
> the same over the past two years, an argument AGAINST lineup effects.

Or an argument that Carter is about as productive as Hubie Brooks is.
Hubie Brooks is a very productive hitter, and is having a fine year
batting cleanup for Montreal. And I never said Carter didn't help the 
others. Don't put words in my mouth and then criticize me for saying them. 
But you are completely wrong about the production of all four Met players.

Strawberry is having a better year, and the other three are all down,
except for Carter's HR rate:
(these 1985 stats are as of 9/12; the 1985f stats are approximations to
what they would have at the end of the season if their current averages
continue)

              BA     HR    RBI  
Strawberry:
     1984    .251    26    97
     1985    .282    23    66
(don't forget he missed 7 weeks injured; prorating his stats over that
time gives him about 34 HRs and 95+ RBI already)
     1985f   .282    27    77  (.282    38    113)

Hernandez:
     1984    .311    15    94
     1985    .291    10    79
     1985f   .291    12    92

Foster:
     1984    .269    24    86
     1985    .254    17    66
     1985f   .254    20    77

Carter:
     1984    .294    27   106
     1985    .281    26    77
     1985f   .281    30    90
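The "1985f" rows appear to be straight-line proration of the counting stats. A minimal sketch, assuming the Mets had played about 139 of 162 games as of 9/12 (an inference chosen because it fits the table, not a number given in the post), reproduces the 1985f HR and RBI figures for all four players; it does not reproduce the injury-adjusted figures in parentheses, and batting average is not prorated.

```python
def prorate(current, games_played, season_games=162):
    """Project a counting stat to a full season at the current per-game rate."""
    return round(current * season_games / games_played)

# Assumption: about 139 team games played as of 9/12/85.
GAMES_PLAYED = 139
as_of_9_12 = {          # (HR, RBI) from the table above
    "Strawberry": (23, 66),
    "Hernandez": (10, 79),
    "Foster": (17, 66),
    "Carter": (26, 77),
}
for name, (hr, rbi) in as_of_9_12.items():
    print(name, prorate(hr, GAMES_PLAYED), prorate(rbi, GAMES_PLAYED))
```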

franka@mmintl.UUCP (Frank Adams) (09/24/85)

In article <453@philabs.UUCP> dpb@philabs.UUCP (Paul Benjamin) writes:
>By the way, if you still doubt the existence of lineup dependency (which
>you undoubtedly still do) then answer the following question:
>
>    If there were no lineup interaction, then all managers would bat their
>    best hitter first, then their second-best, etc. to give them the
>    most opportunities to hit. [...]

You are mixing apples and oranges here.  Of course lineup order in this
sense matters: a walk followed by a home run is two runs, while a home
run followed by a walk is one run.  I thought (up to this point) that
the discussion was about whether players hit better depending on where
they bat in the order.  I don't *think* anyone ever claimed that OBA and
slugging pct give a complete description of a team's offensive abilities;
just that they are the two best readily available statistics.  Which they
are.
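Frank's walk/home-run example can be checked with a toy base-state model. This sketch handles only walks and home runs (no outs or other events), with a walk advancing only forced runners:

```python
def runs_scored(events):
    """Count runs for a sequence of 'BB' and 'HR' events, starting with bases empty."""
    first = second = third = False
    runs = 0
    for event in events:
        if event == "HR":
            # Batter plus every runner on base scores; bases are cleared.
            runs += 1 + first + second + third
            first = second = third = False
        elif event == "BB":
            # A walk advances only forced runners.
            if first and second and third:
                runs += 1
            elif first and second:
                third = True
            elif first:
                second = True
            first = True
        else:
            raise ValueError(f"unsupported event: {event}")
    return runs

print(runs_scored(["BB", "HR"]), runs_scored(["HR", "BB"]))  # 2 1
```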

I am unconvinced by the Mattingly data.  There is just not enough there
to be statistically significant.

On the other hand, batters definitely DO hit better with men on base.
The book put out by the Elias Sports Bureau (it has their name in the
title) has statistics on this for the entire major leagues last year.
As I remember (the book is not here) the effect was about 20 points in
terms of batting average.  So clearly there is an advantage to batting
after a player who gets on base a lot.  Although the statistics for it
are not available, it seems likely that this is enhanced when batting
after good base stealers.
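The step from "batters hit better with men on" to "there is an advantage to batting after a player who gets on base a lot" is a mixture calculation. If a hitter averages b with the bases empty and b + d with men on, and a fraction f of his at-bats come with men on, his overall average is b + d*f. The 20-point d follows Frank's recollection of the Elias figure; the base average and the values of f are hypothetical.

```python
def overall_average(bases_empty_avg, men_on_gain, frac_men_on):
    """Batting average as an at-bat-weighted mix of bases-empty and men-on at-bats."""
    return bases_empty_avg + men_on_gain * frac_men_on

BASE = 0.270   # hypothetical bases-empty average
GAIN = 0.020   # ~20 points with men on, per Frank's recollection of Elias
for frac in (0.40, 0.45, 0.50):   # hypothetical shares of at-bats with men on
    print(f"{frac:.2f} of AB with men on -> {overall_average(BASE, GAIN, frac):.3f}")
```

On these assumptions, each extra 10% of at-bats taken with men on adds about .002 to the hitter's average through this channel alone.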

I am much more dubious about the claimed advantages of batting *before* a
good hitter.  This very likely affects the number of walks a player gets
(certainly the number of intentional walks, but probably others as well).
I doubt it much affects the overall performance.

Frank Adams                           ihpn4!philabs!pwa-b!mmintl!franka
Multimate International    52 Oakland Ave North    E. Hartford, CT 06108

dpb@philabs.UUCP (Paul Benjamin) (09/25/85)

Frank Adams writes:

>In article <453@philabs.UUCP> dpb@philabs.UUCP (Paul Benjamin) writes:
>>By the way, if you still doubt the existence of lineup dependency (which
>>you undoubtedly still do) then answer the following question:

>>    If there were no lineup interaction, then all managers would bat their
>>    best hitter first, then their second-best, etc. to give them the
>>    most opportunities to hit. [...]
>
>You are mixing apples and oranges here.  Of course lineup order in this
>sense matters: a walk followed by a home run is two runs, while a home
>run followed by a walk is one run.  I thought (up to this point) that
>the discussion was about whether players hit better depending on where
>they bat in the order.  

Not just where in the order, but who is batting ahead of them and
behind them. And as you point out later in this posting, batters
do bat better when men are on base, so I'm not really mixing apples
and oranges at all - lineup order can affect personal stats.

>I don't *think* anyone ever claimed that OBA and
>slugging pct give a complete description of a team's offensive abilities;
>just that they are the two best readily available statistics.  Which they
>are.

You contradict yourself! You state that "of course lineup order matters
in this sense", and then state that personal stats such as OBA and
slugging are the best. How about a stat such as "how many runs you
contribute to", measured by runs you score, drive in, or help advance
the runners, or even better, how much better you are at that than
others batting in similar positions? These stats are virtually impossible
to compute from box scores, because so much information is lost, such as
if anyone was in scoring position when a player made an out, or whether
an out advanced a runner. In this sense, I can agree with you that
OBA and slugging may be the best available, but I'm saying that this
means that the available stats are not very good (we need some new
categories).

>I am unconvinced by the Mattingly data.  There is just not enough there
>to be statistically significant.

Of course. But I didn't say that this proved conclusively that all
players' stats are highly order-dependent. I just showed the existence
of stats that support the belief in lineup dependency. Again, it is not
my fault that these stats are not often kept.

It's the old "garbage in, garbage out" phenomenon. If you only input
personal stats into your model-generation process, then you will
produce only models which emphasize individual performances, and
of course, you will be able to find no evidence of interdependencies.
To be able to find interdependency, you must consider stats which
can reflect it.

>On the other hand, batters definitely DO hit better with men on base.
>The book put out by the Elias Sports Bureau (it has their name in the
>title) has statistics on this for the entire major leagues last year.
>As I remember (the book is not here) the effect was about 20 points in
>terms of batting average.  So clearly there is an advantage to batting
>after a player who gets on base a lot.  Although the statistics for it
>are not available, it seems likely that this is enhanced when batting
>after good base stealers.

Great. I'd love to see what the effects are on slugging, RBI, R, etc.

>I am much more dubious about the claimed advantages of batting *before* a
>good hitter.  This very likely affects the number of walks a player gets
>(certainly the number of intentional walks, but probably others as well).
>I doubt it much affects the overall performance.

Again, one case I cite is the Pirates of the late 70's. Nobody wanted
to pitch to Stargell with men on base, so, as you say, people in front
of him were rarely walked. But this means that they saw more fastballs,
and less nibbling around the corner of the plate. That gave good
fastball hitters, like Madlock, more fat pitches. Note that the
difference need only be quite small to produce a good effect. Over,
say, 500 at-bats, or about 3000 pitches a season, a hitter in such a
nice spot might get only 30 more fat pitches to hit (1%). This could
lead to several HRs, doubles, more RBI, more R, and higher OBA and
slugging.
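Paul's arithmetic can be written out. The 500 at-bats, 3000 pitches and 1% come from his post; the share of extra "fat" pitches that turn into hits is a hypothetical placeholder, since no such figure is given.

```python
AT_BATS = 500
PITCHES = 3000
EXTRA_FAT_SHARE = 0.01     # Paul's 1%
HIT_RATE_PER_FAT = 0.15    # hypothetical: share of extra fat pitches that become hits

extra_fat = PITCHES * EXTRA_FAT_SHARE          # about 30 pitches
extra_hits = extra_fat * HIT_RATE_PER_FAT      # about 4.5 hits
print(f"{extra_fat:.0f} fat pitches -> {extra_hits:.1f} hits "
      f"-> +{extra_hits / AT_BATS:.3f} batting average")
```

Under that placeholder rate, 30 extra fat pitches would be worth roughly nine points of batting average; the conclusion scales directly with whatever hit rate one assumes.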

Again, I have no printed stats for the Madlock case. It is based on
my personal observation at the time, which was that Madlock was 
put into the 6 spot when he was acquired, and became a steady .280
hitter. He was quoted at the time as saying he didn't care, as long
as the team won. When he was moved to 3 (in front of Stargell) he
immediately became the .320+ hitter he had been before.

This is not the only case I know of. Repeatedly, in reading quotes
of managers, I have run across things like "...he is a fastball hitter,
so I put him in front of (big slugger), so he'll see more fastballs".
Now, I haven't clipped and saved all these quotes, because I never
saw myself getting into an argument about it, but my memory is actually
quite good, and I'm sure that, if we keep our eyes open, we'll see
more quotes like this.

Finally (sigh!) note that if hitting can be affected by the player
in front, then it means that it can be affected by the player behind, too.
After all, if player A bats in front of player B, and B is known to
hit much better when men are on base, then the pitcher can be expected
to try very hard to keep A off the bases. Thus, B's presence affects
A's stats. This is as opposed to the
situation in which a strong hitter bats in front of a weaker hitter.
The pitcher might not care whether the strong one is walked, because
he is not afraid of the weak hitter, particularly if there are two out, 
so he avoids giving the strong hitter anything too good to hit.

franka@mmintl.UUCP (Frank Adams) (09/26/85)

In article <455@philabs.UUCP> dpb@philabs.UUCP (Paul Benjamin) writes:
>>You are mixing apples and oranges here.  Of course lineup order in this
>>sense matters: a walk followed by a home run is two runs, while a home
>>run followed by a walk is one run.  I thought (up to this point) that
>>the discussion was about whether players hit better depending on where
>>they bat in the order.  
>
>Not just where in the order, but who is batting ahead of them and
>behind them. And as you point out later in this posting, batters
>do bat better when men are on base, so I'm not really mixing apples
>and oranges at all - lineup order can affect personal stats.

Yes, lineup order can affect personal stats -- but the paragraph above
is NOT an argument to that effect.

>>I don't *think* anyone ever claimed that OBA and
>>slugging pct give a complete description of a team's offensive abilities;
>>just that they are the two best readily available statistics.  Which they
>>are.
>
>You contradict yourself! You state that "of course lineup order matters
>in this sense", and then state that personal stats such as OBA and
>slugging are the best. How about a stat such as "how many runs you
>contribute to", measured by runs you score, drive in, or help advance
>the runners, or even better, how much better you are at that than
>others batting in similar positions? These stats are virtually impossible
>to compute from box scores, because so much information is lost, such as
>if anyone was in scoring position when a player made an out, or whether
>an out advanced a runner. In this sense, I can agree with you that
>OBA and slugging may be the best available, but I'm saying that this
>means that the available stats are not very good (we need some new
>categories).

Well yes, that's what I said.  "The two best readily available statistics."
In particular, if I know a player's on base and slugging averages, I don't
much care what his batting average is.  In fact, it is better if the batting
average is lower, with the same on base and slugging averages.

Yes, we do need better stats.  I have my doubts about your proposal; it is
highly lineup dependent.  A batter will participate in more runs on a good
offensive team than on a poor one -- this is the main problem with RBIs.
(That assessment sounds harsher than I really mean it to be.  This would
be a useful statistic -- certainly better than "runs produced" or "game
winning RBI."  (I don't really understand why the statistic isn't "go ahead
RBI" instead -- the batter putting his team ahead cannot be affected by
whether they will stay ahead.)  But if I could only get one statistic about
a player, I would rather know the sum of his on base and slugging, than
the number of runs (total, per game, or per at bat, your choice) that he
contributed to.)

If you can get a copy, do look at the Elias book (_The_1985_Elias_Baseball_
Analyst_).  It has batting and pitching statistics broken down by whether
the bases are empty, leading off an inning, with runners on base, with
runners in scoring position, and with runners in scoring position with
two out.  It also has statistics for batting in late inning pressure
situations (defined as the seventh inning or later, with the player's
team tied, behind by not more than three runs, or behind by four runs
with the bases loaded), broken down similarly.  It also has home vs
away, grass vs turf, and day vs night breakdowns.

Some other statistics I would like to see: how often does a runner take
an extra base on a hit?  And how often is he out trying to do so?  Another
interesting statistic would be bases advanced out of the number possible
(a grand slam is ten out of ten; a bases empty single is one out of four).
The ratio of bases advanced to outs made would also be interesting.
None of these statistics is perfect, of course.
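The "bases advanced out of the number possible" idea can be made concrete. This is a minimal sketch under two assumptions: only advancement on the batter's own plate appearance counts, and each runner is identified by the base (1, 2 or 3) he occupies. The function names are hypothetical.

```python
from typing import Sequence

def bases_possible(runners_on: Sequence[int]) -> int:
    """Bases available on one plate appearance: 4 for the batter plus,
    for each runner, the bases left between his base (1, 2 or 3) and home."""
    return 4 + sum(4 - base for base in runners_on)

def advance_ratio(bases_advanced: int, runners_on: Sequence[int]) -> float:
    """Fraction of the available bases actually gained on the play."""
    possible = bases_possible(runners_on)
    if not 0 <= bases_advanced <= possible:
        raise ValueError("bases advanced must lie between 0 and the bases possible")
    return bases_advanced / possible

def season_ratio(plays: Sequence[tuple[int, Sequence[int]]]) -> float:
    """Pool a season: total bases advanced over total bases possible."""
    advanced = sum(adv for adv, _ in plays)
    possible = sum(bases_possible(runners) for _, runners in plays)
    return advanced / possible

# Grand slam: batter gains 4, runners on 1st/2nd/3rd gain 3, 2 and 1.
print(advance_ratio(10, [1, 2, 3]))  # 1.0
# Bases-empty single.
print(advance_ratio(1, []))          # 0.25
```

Pooling over the season (rather than averaging per-play ratios) weights plays with runners on more heavily, since more bases are at stake on them.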

>>I am unconvinced by the Mattingly data.  There is just not enough there
>>to be statistically significant.
>
>Of course. But I didn't say that this proved conclusively that all
>players' stats are highly order-dependent. I just showed the existence
>of stats that support the belief in lineup dependency. Again, just
>because these stats are not often kept is not my fault.

Let me put that a bit differently.  While Mattingly undoubtedly hits
better batting after Henderson (who has a very good on base percentage
and fantastic speed), it is unlikely that the effect is anywhere near
as large as in those sample statistics.  And whoever hits after Henderson
can expect an improvement.

>Again, one case I cite is the Pirates of the late 70's. Nobody wanted
>to pitch to Stargell with men on base, so, as you say, people in front
>of him were rarely walked. But this means that they saw more fastballs,
>and less nibbling around the corner of the plate. That gave good
>fastball hitters, like Madlock, more fat pitches. Note that the
>difference need only be quite small to still produce a good effect. Over,
>say 500 atbats, say about 3000 pitches a season, a hitter in such a
>nice spot might get only 30 more fat pitches to hit (1%). This could
>lead to several HRs, doubles, more RBI, more R, and higher OBA and
>slugging.

The player will have more strikes thrown at him (which tends to mean
more fastballs).  Since he is getting more good pitches, he is likely
to hit for better average and power.  But he can be expected to walk
*less*, and thus have a lower on base average.  If the on base average
is truly higher in such a case, the opposing pitchers are making a
mistake -- they should be pitching to the batter the same as they
normally would.  I am unconvinced that batters do significantly better
on balance in such situations.

This is one reason the on base and slugging averages make such a good pair.
When a player is pitched to cautiously, the on base average goes up and
the slugging average goes down.  In the reverse case, the opposite happens.

>Again, I have no printed stats for the Madlock case. It is based on
>my personal observation at the time, which was that Madlock was 
>put into the 6 spot when he was acquired, and became a steady .280
>hitter. He was quoted at the time as saying he didn't care, as long
>as the team won. When he was moved to 3 (in front of Stargell) he
>immediately became the .320+ hitter he had been before.

Again, the batting average I would expect to be affected.  What happened
to his on base average?  It is certainly true that batting average is
overemphasized in the baseball world as a whole.  There are probably
a good many managers and players who make this mistake.  (Earl Weaver
doesn't.  Bill Madlock probably does.)

Frank Adams                           ihpn4!philabs!pwa-b!mmintl!franka
Multimate International    52 Oakland Ave North    E. Hartford, CT 06108

david@fisher.UUCP (David Rubin) (09/26/85)

[">>"," " = me,">" = (shudder) him :-)]

Don't panic, folks!  Only about half of it is new material!
 
>You disregard my explanation of the events, and have not proven, in any
>sense, that on-base average and slugging average are independent of
>factors such as who is batting in front or behind you.

I can demonstrate that OBA and SA together will predict very well the
run production of a team.  All I have demanded is the same standard of
evidence be applied to lineup effects: that you demonstrate that the
consideration of such an effect improves our ability to project or
predict run production, and that you provide some rationale for it.
You've done the latter without doing the former.
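One well-known way to see how OBA and SA jointly track run production is Bill James's basic Runs Created formula. It amounts to on-base average times total bases. This is not the analysis Rubin promises to post, only an illustrative sketch. The team totals are hypothetical, and the simplified OBA ignores hit batsmen and sacrifice flies.

```python
def on_base_average(h: int, bb: int, ab: int) -> float:
    # Simplified OBA: ignores hit batsmen and sacrifice flies (an assumption).
    return (h + bb) / (ab + bb)

def slugging_average(tb: int, ab: int) -> float:
    return tb / ab

def basic_runs_created(h: int, bb: int, ab: int, tb: int) -> float:
    # Basic Runs Created: (H + BB) * TB / (AB + BB), i.e. OBA times total bases.
    return (h + bb) * tb / (ab + bb)

# Hypothetical team season totals, for illustration only.
h, bb, ab, tb = 1400, 500, 5500, 2100
rc = basic_runs_created(h, bb, ab, tb)
# The same estimate written in terms of the two averages: OBA * SA * AB.
alt = on_base_average(h, bb, ab) * slugging_average(tb, ab) * ab
print(round(rc, 1), round(alt, 1))  # 665.0 665.0
```

The product form shows why neither average alone suffices: doubling either one doubles the estimate.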

>................ You embrace those stats without showing
>that any strong correlation exists between them and scoring runs (or
>more precisely, that a stronger correlation exists than for, say, the
>stat R + RBI - HR.)

I can demonstrate such a correlation for OBA and SA; for your benefit,
I will post them (this weekend, probably).

As for R+RBI-HR, please note that it adds nothing to our understanding
of run production to predict runs using runs!!!  We have already noted
that R's and RBI's are heavily and DIRECTLY dependent on one's
teammates' actions: if the question we are considering is how does an
individual player contribute, we must free him from the
burden/benefit of his teammates and figure out how much each of the
events he could contribute ALONE (outs, walks, hits, etc.) would
contribute to producing runs on some "typical" team.  You, too,
recognize this principle, as it is the rationale for your argument for
"lineup effects".  The problem is not one of goals, but of methods.
"Lineup effects" are, I suggest,  illusions caused by using the wrong
statistics to evaluate offensive performance.  It is because you are
tied to measuring INDIVIDUAL performance with tools meant to evaluate
TEAM performance that you must deal with something as arcane and diffuse
as lineup effects; when one considers statistics that are not directly
influenced by one's teammates, one finds that there is no discernible
lineup effect.

In other words, if you took a player who remains with one team over
the course of his career's prime, you would likely find the player's
RBI and R totals fluctuating with the team's fortunes (strongly 
correlated), but his SA and OBA fluctuating "randomly" (weakly
correlated).  If you were to focus on the RBI's, you would persuade
yourself there was such a thing as "lineup effect", because RBI's
measure what the guys in front of you did as well as what you did.  If
you looked at SA, you would remain agnostic concerning lineup effects,
because SA appears to fluctuate with little regard to the quality of
the team.

>.......................................It is not just my responsibility to
>prove that lineup dependencies exist. It is also yours to prove that
>they don't!

You can never prove something doesn't exist (how does one proceed in a
disproof of existence?).  It is considered sensible in most circles to
keep one's explanation of events as simple as possible: we need only
consider new factors if they somehow improve our understanding of
events.  It is therefore the burden of one who wishes to include an
"effect" to show its inclusion improves our knowledge or understanding,
for if we can do as well without it, we have no reason to use it.

>> I shall limit myself, therefore, to the general comment (call it
>> Rubin's Law of Empirics, if you will) that HAVING A PLAUSIBLE
>> EXPLANATION FOR AN ANTICIPATED EFFECT IS NOT EVIDENCE THAT THAT EFFECT
>> HAS ACTUALLY OCCURRED.  

>Perhaps you should take a course or two in prob&stat and learn the
>actual laws, instead of making up your own.

Are you serious?  Taking shots at my statistical sophistication is
inappropriate (as well as incorrect), and serves only as an excuse
for ignoring the truth of the statement that I referred to,
tongue-in-cheek, as Rubin's Law of Empirics.  

>> All of Paul's explanations mean little,
>> therefore, until he establishes that what his explanations explain has
>> indeed happened!  Only in the case of Mattingly does he attempt to
>> actually demonstrate that a lineup effect exists, and I will therefore
>> concentrate on it.  Elsewhere, he merely shows lineup effects are
>> consistent with his selected observations without either showing other
>> explanations are inconsistent or that the observations would be
>> inexplicable without lineup effects. 

>This is exactly the point I have been making for weeks. "it may be
>misleading unless we know that the circumstances of the two categories 
>are otherwise similar..." Two players for different teams do not
>satisfy this criterion, and thus their stats are not directly
>comparable. For example, many, including myself, like Guerrero for
>the MVP, but I don't favor him because he leads the NL in slugging
>and on-base average. Those stats are irrelevant, since you can't compare
>them to, say, Dale Murphy's stats. Why not? It's simple. 18 times
>a year, Dale Murphy has to face the great Dodger pitching staff, which
>is clearly the best in the league, while Guerrero faces the Braves' staff,
>which is one of the worst. That's over 11% of the season. This is in
>addition to other differences, such as the number of day/night games,
>the different stadiums they play in, the number of double-headers they
>play in, the number of day games after night games, etc. They don't even
>play the exact same other teams, either! After all, if a team played
>most of its games against Philadelphia earlier in the year, they faced a
>much easier opponent than a team whose schedule calls for them to
>face Phila now. The reverse is true for the Cubs. Playing them
>before all their starters were injured is different than playing
>them afterwards.

Strange, but when I did try to adjust for these effects in the
Carter-Pena discussion, you protested vociferously!  I am all for
adjusting for effects whose existence is demonstrable, and thus had
called earlier for the inclusion of Palmer's "Park Factor", which
considered, explicitly or implicitly, park dimensions, day/night
balance at the home field, and the quality of the hitter's own
pitching staff.  If it could be shown that some complex scheme to
correct for the changing quality of the opposition is necessary (most
teams remain about as talented in August as they were in May, and the
ones that don't may not have a substantial effect), I would certainly
entertain that correction.  At the time I first brought up the matter
of such adjustments, you held your hands up to your ears and screamed
that you didn't want to hear about such stuff; as those factors did not
strongly affect the relative offensive merits of Carter and Pena, I
didn't press the issue then.  Naturally, I'm stunned by your reversal;
stunned, but not surprised.

Incidentally, it is likely that Murphy derives more benefit from
Fulton County Stadium than Guerrero derives from not having to face
his own staff; adjusted statistics will likely favor Guerrero even
more than the unadjusted ones do!

>So, unless you can correct for ALL these factors, and others, to
>ensure that your circumstances are similar, all the analyses that
>you have posted are "worthless (possibly even worse: (they) may be
>misleading".

I will adjust for all the factors that can be demonstrated;
"adjusting" for a factor that has not been demonstrated (and therefore
cannot be quantified) is a theological exercise.  Rather than asking
ourselves how much a factor affects our statistics, we wind up asking
ourselves how much we BELIEVE a factor affects our statistics.

>The only attempt you have made to correct your stats is to include
>a ratio which takes into account the differences between stadiums,
>and how hard they are for hitters.

You did not read, then, how the "Park Factor" was derived.  Tsk, tsk.
It measured how difficult it was to produce runs in a particular park,
and therefore implicitly considered dimensions, elevation, day/night
games, etc, etc, and corrected for the prowess (or lack thereof) of
the home staff.
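A park factor of this kind compares scoring at home with scoring on the road. The following is a much simplified sketch with hypothetical inputs and function names. It is not Palmer's actual formula and omits the further corrections described above. It assumes road parks average out to neutral and that a hitter plays about half his games at home.

```python
def park_factor(home_rs: int, home_ra: int, home_g: int,
                road_rs: int, road_ra: int, road_g: int) -> float:
    """Runs per game (both teams) at home divided by runs per game on the
    road; above 1.0 means the home park is easier to score in."""
    home_rate = (home_rs + home_ra) / home_g
    road_rate = (road_rs + road_ra) / road_g
    return home_rate / road_rate

def adjust_for_park(stat: float, pf: float) -> float:
    # Assumes half of a hitter's games are at home and road parks are
    # neutral on average, so divide by the mean of pf and 1.0.
    return stat / ((pf + 1.0) / 2.0)

pf = park_factor(400, 380, 81, 360, 340, 81)   # hypothetical team totals
print(round(pf, 3))                             # 1.114
print(round(adjust_for_park(0.500, pf), 3))     # 0.473
```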

>................................But even this attempt showed your
>statistical inexperience. Saying, for example, that park A is 10%
>harder to hit in than park B because the overall averages (of say, slugging)
>are 10% lower, is a valuable and meaningful stat when applied to the whole
>group of hitters - it provides information on the park to its owners.
>But it is TOTALLY MEANINGLESS to apply this stat to individual batters
>in this park. One must also know the shape of the distribution. It could
>be that almost nobody hits 10% worse in that park - that many hit much worse
>or better, and it averages out to 10%. For example, if a country's families
>have 2.3 children on the average, it doesn't mean that anyone has 2.3
>children, or even that most families have 2 or 3 children. Bivariate
>distributions are not uncommon, and in these, almost no one is around
>the mean.

You are correct, but need not worry.  It is necessary to check that
the detriment/advantage supplied by a home park affects the players
equally (or that deviations from equality are random, rather than
systematic).  You will be pleased, therefore, to hear that such
deviations are binomially/normally distributed, and that where
individual players fall on these distributions appears random, and
that the distribution of ALL players is tighter once these effects are
taken out.

>Furthermore, the reason that I use only Mattingly is that these stats
>are rarely available. It's much easier to compute personal averages
>such as batting average, slugging average, runs, RBI, etc. than to
>compute how much a batter tends to improve the stats of those batting
>ahead of him or behind him, etc. We almost never see these stats. We
>don't often enough see stats such as batting average with runners in
>scoring position, etc. You criticize me for the deficiencies of baseball
>statisticians everywhere. It's not my fault, so don't criticize me
>for it.

I know it's not your fault.  I'd love to see such breakdowns myself;
supposedly, that's what Bill James's "Project Scoresheet" is in the
process of doing.  If this enlarged data base should provide someone
with the means to prove a "lineup effect", I will change my tune,
naturally.  However, it is difficult (and unwise) to believe a
dramatic effect could "hide" in currently available data; if there is
some "lineup effect", it is likely to be far smaller (and possibly even of a
far different nature) than you believe.

>> Moreover, even if Paul COULD assure us that this was so, he does not
>> have nearly enough data.  Examine, in particular, the data for batting
>> second: it is based on 35 games, i.e. about 100-150 at bats.  Most
>> fans will not put much store in a player's average after 35 games
>> (early May), and for good reason: the player has not yet accumulated
>> enough at bats for us to form any reasonable opinion as to his likely
>> seasonal productivity.  We are talking about guessing whether a player
>> is hitting .300 or .400 based on that many at bats: it would not be at
>> all unusual for the difference (10 to 15 hits) to be due to a "hot" or
>> "cold" streak (what Statisticians conveniently label "random", but we
>> may understand as being that which is beyond our knowledge).  We would
>> need to have many more at bats (perhaps in a couple of more seasons we
>> will) before we could say that the difference is due to the position
>> in the lineup rather than a propitious hot streak.  To put it another
>> way, if a lifetime .300 hitter were to have a .400 average on May 5th,
>> would you tentatively conclude (until further info was available) that
>> the man would bat .400 for the season?  Of course not.  You would
>> correctly conclude that he is more likely to hit .300 from June
>> through September than .400.  He may just have had a good April...

>..............................................Mattingly's hot
>stats for the second position were not compiled in one streak...
>....................................So, it is NOT the result of a hot streak. 

You misunderstand what I mean by a "hot" streak.  I mean any random
fluctuation caused by the smallness of the sample.  Perhaps I should
have clarified as follows: pick, at random, ANY 120 of Mattingly's
AB's.  Calculate his BA.  For all his AB's, his BA is about .320 (last
time I looked).  You will find, if you do this, say, 100 times, that
five to ten times you will get a .400+ BA for Mattingly.  In other
words, there's a greater than 5% chance that any particular .320
hitter will hit over .400 for 120 RANDOM at bats; if we check out,
say, 20 major leaguers who are batting between .300 and .340, and pick
120 at bats for each of them at random, we would only be surprised if
NONE of them hit over .400 during that span.  Since it is likely this
is what TSN did, intentionally or unintentionally, the statistic
carried nothing to contradict my inclination that the .400 average
meant nothing.
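The thought experiment above is a binomial calculation. This minimal sketch assumes every at bat is an independent trial, with the player's true average as the hit probability (a simplification). It gives both the exact tail probability and a simulation of repeatedly drawing 120 random at bats. The printed figures depend on the assumed true average, so they can be set against the "five to ten times in 100" estimate in the text. The function names are hypothetical.

```python
import math
import random

def prob_at_least(hits: int, at_bats: int, true_avg: float) -> float:
    """Exact binomial probability of `hits` or more hits in `at_bats`
    independent at bats, each a hit with probability `true_avg`."""
    return sum(
        math.comb(at_bats, k) * true_avg**k * (1 - true_avg) ** (at_bats - k)
        for k in range(hits, at_bats + 1)
    )

def simulate(hits: int, at_bats: int, true_avg: float,
             trials: int = 10_000, seed: int = 1) -> float:
    """Monte Carlo version: draw `at_bats` random at bats `trials` times
    and count how often the sample reaches `hits`."""
    rng = random.Random(seed)
    reached = 0
    for _ in range(trials):
        sample = sum(rng.random() < true_avg for _ in range(at_bats))
        reached += sample >= hits
    return reached / trials

# .400 over 120 at bats means 48 or more hits.
for avg in (0.300, 0.320, 0.340):
    p = prob_at_least(48, 120, avg)
    at_least_one = 1 - (1 - p) ** 20   # 20 independent hitters of that skill
    print(f"true {avg:.3f}: P(.400+) = {p:.3f}, "
          f"P(some of 20 hit .400+) = {at_least_one:.2f}")
```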

>As for right-handed vs. left-handed opposition, I checked the games from
>August 5 on. There were both right-handed and left-handed opponents.
>He is playing full-time in that spot, so he faces all types of pitching.
>Martin moved him to 2 on August 5 because of his excellent production
>in that spot before.

The question, though, is whether Mattingly faced lefty/righty pitching
in the same proportion as a #2 hitter as he did as a #3 hitter.
Moreover, I don't think he is REGULARLY batting #2, as every time I
watch the Yankees (about half a dozen times in the past month), he's
batting third.  Too bad, too: if he batted second the rest of the way,
we might have had enough data to say something about Mattingly in '85.

>> Even if it were established for Mattingly, it would hold only for
>> Don Mattingly with the current Yankees: to apply it to, say, Tony
>> Pena, it would have to be demonstrated for a wide variety of players on
>> a wide variety of teams.  Still, it would be quite a surprise to me if
>> anyone could get even that far.

>I see! Whenever I come up with evidence, it counts only for that
>case, but you have never detailed an instance of a player changing,
>say, his lineup position and keeping the same OBA and slugging pct.,

I could come up with a wide variety of such instances.  Shall I spend
an hour with my Baseball Encyclopedia?  To establish a general
principle, I won't require that you prove it in every instance, but I
don't feel I'd be unreasonable to remain dubious even if it were
proved for one.

>EVERYBODY knows that lineups are interdependent!

"Everybody" "knows" this, because "everybody" evaluates players on the
basis of RBI's and R's.  Yes, lineups are interdependent in scoring
runs.  No, lineups do not substantially affect INDIVIDUAL performance.
That Fred Xyzz is more likely to bat in a run with a runner on first
is what "everybody" DOES know; that Fred Xyzz is more likely to hit a
double with a runner on first is something that is not known by
"everybody"; certainly, it is not yet known by me.  

>.......................................................Try watching a game
>sometime (instead of just reading numbers). 

I watch, on TV or in person, about 100 games a season.  

>.................................................You'll see that when a runner
>is on base, it affects (among other things):
>    1) the way the pitcher throws. Using the stretch instead of a full
>       windup definitely hurts most pitchers' performances. Otherwise, 
>       there would be no need for anyone to ever windup.
>    2) the pitch selection;
>    3) the defensive alignment.

No doubt, but none of these things is done often enough to
substantially affect a player's OB or SA.  Let's say, for example,
that a player gets 500 AB's.  On a really lousy team, there's a runner
on when he bats, say (these are only guesses; if you have the real
numbers, go ahead and substitute, as I doubt that I am SO far off as
to invalidate my argument) 25% of the time, while with a really good 
team, it might be 50% of the time.  The lucky player gets an extra 125
AB's with runners on.  Consider #3;  this lucky player, if he's a
right-handed pull or straight-away hitter gets the second baseman in a
position where the second baseman is less likely to make the play.
Let's say he is a contact hitter who NEVER strikes out, and he hits
lots of ground balls, with few down the line.  Then he might hit
a ground ball toward the second baseman about 20% of the time, and the
second baseman may now convert only two thirds of them, rather than 
three quarters of them, into outs.  So 125*.2*(.75-.67) gives an
extra two or so singles over the course of the season.  If the batter
in question strikes out some, or hits a lot of fly balls, then the
difference is even less.  Of course, with 125 extra shots at an RBI,
THAT total will rise substantially.
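This estimate is a product of guessed rates, so it is easy to rerun with other guesses. The sketch below uses the numbers in the paragraph above (the function and parameter names are hypothetical). With those inputs, 125 extra at bats with runners on come to about two extra singles.

```python
def extra_hits(at_bats: int, on_base_share_poor: float, on_base_share_good: float,
               grounder_share: float, out_rate_normal: float,
               out_rate_shifted: float) -> float:
    """Extra hits from a defensive shift, following the back-of-envelope
    argument: extra AB with runners on * grounders to second * lost outs."""
    extra_ab_with_runners = at_bats * (on_base_share_good - on_base_share_poor)
    grounders_to_second = extra_ab_with_runners * grounder_share
    return grounders_to_second * (out_rate_normal - out_rate_shifted)

# Inputs from the post: 500 AB, runners on 25% vs 50% of the time,
# 20% of balls toward second, outs on 3/4 of them normally vs 2/3.
print(f"{extra_hits(500, 0.25, 0.50, 0.20, 3/4, 2/3):.2f}")  # 2.08
```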

I could argue similarly on the other points.

My point is not that these things are fiction, only that it is
unlikely that they SUBSTANTIALLY affect a player's SA or OBA.  The
numbers I use are unimportant; what is important is the plausibility
of my argument that we ought to be careful not to confuse existence
with significance.

>By the way, if you still doubt the existence of lineup dependency (which
>you undoubtedly still do) then answer the following question:
>    If there were no lineup interaction, then all managers would bat their
>    best hitter first, then their second-best, etc. to give them the
>    most opportunities to hit. Thus, according to your criteria (OBA and
>    slugging pct), the way to optimize the team's OBA and slugging pct
>    is to bat the best in these categories first, the next-best second,
>    etc. We would see Carter batting leadoff for the Mets, and Coleman
>    would not be the leadoff hitter for St. Louis, McGee would be, followed
>    by Clark. Coleman would be somewhere around 6 or 7. Come to think of it,
>    since Cedeno has been playing for the Cards and the way he has been
>    hitting, he would be batting leadoff. Also, Guerrero would be
>    hitting leadoff for LA (absurd!).

There is lineup interaction on a team's run production; I only deny
its significance in judging individual performance.  Using my
criteria, a manager would be disposed to bat his top OBA men near the top
of the order and his top SA men in the middle.  I would not have
Carter bat leadoff (you are a silly one, aren't you?), but I would
drop Wilson from the top of the order.  I would also switch Coleman
and McGee around, but as long as we were going with two table setters,
both would be secure near the top of the order.  You are confused: I
do NOT say that lineup doesn't affect a team's performance, only that
it has precious little effect on an individual's performance.

>    As ANY real baseball fan knows, managers carefully
>    pick the order to help run production, e.g. alternating left-handed
>    and right-handed batters, and putting speedsters in front of hitters
>    who hit well with men in scoring position. WHY WOULD THEY BOTHER TO 
>    DO THIS IF THERE WERE NO LINEUP INTERACTION??? Why not bat Mattingly
>    leadoff, to get him more atbats? Maybe the fact that he would be
>    batting behind a much weaker hitter just MIGHT have a teeny-weeny
>    little bit to do with it?!

Nyahh.  The reason that we don't bat Mattingly lead-off is not that we
fear his production will drop, but because we fear his production will
be wasted.  There is a difference.

>    Thus, we see that some excellent managers, such as Whitey Herzog,
>    deliberately put a player like Coleman, who has a lower OBA and
>    slugging average than McGee, in the spot where he will get the most 
>    at-bats, thus effectively reducing the overall OBA and slugging pct of 
>    his team. Do you really think he is deliberately reducing the run-scoring
>    ability of his team? Or do you just think that all these baseball
>    professionals are sadly misguided?

I think Herzog is making a mistake.  Not a big one, but probably one
that will cost him a few runs over the course of the season.  Herzog
is not sadly misguided, just slightly in error.  Herzog makes
mistakes, Benjamin makes mistakes, even Rubin makes mistakes!  That we
HOPE that Herzog makes them less frequently is no guarantee of his
infallibility.  I vaguely recall Herzog being fired from a couple of
jobs.  Perhaps he did make mistakes...or do you believe that the
professionals running the Rangers and the Royals did??  Some of these
professionals must have erred if a firing was necessary.....Of course,
you will argue that Herzog knows so much, I cannot question him.  Thus
I ask you: if there are thirty professional managers who, in a given
situation, would do ten different things, does that make most of them "sadly
misguided"?  Of course not; men of good faith can disagree without
calling one another idiots.  I reserve the phrase "sadly misguided"
for those who will not even examine alternatives.  Maybe Benjamin
would call me sadly misguided for batting McGee ahead of Coleman, but
I doubt that Herzog would do so.  As for team OBA/SA vs. individual
OBA/SA, see below.

>    The only other alternative is
>    that TEAM RUN-SCORING ABILITY IS NOT DIRECTLY CORRELATED WITH
>    TEAM OBA OR SLUGGING, i.e., these stats aren't all you crack them
>    up to be. There must be other factors, e.g., speed.

There are other factors.  They just don't provide many runs that are
not already accounted for by OB and SA.  Coleman has stolen 100+
bases, and has been caught, say, 30 times.  By the best estimate
available, Coleman's base stealing has given the Cardinals an extra
.3*(100-2*30)= 12 runs. (The formula was empirically derived;  it is
how many runs an average team would gain if a player had 100 SB, 30 CS
rather than just wait on first for the next player to put the ball in
play.  How many runs he has meant to the Cards this year may be
somewhat higher (or lower), but we who do not have score sheets for
all Card games cannot otherwise make a better guess.  Of course,
anyone who knows how many CS Coleman has can improve matters by
substituting for my guess)
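The stolen-base estimate is a two-term linear weight. The sketch below follows the formula as given in the post: .3 runs per steal, and twice that lost per caught stealing. The constant and function names are hypothetical. The break-even rate follows directly from the formula: stealing adds runs only when SB exceeds 2*CS, that is, at a success rate above two thirds.

```python
SB_RUNS = 0.3   # run value of a stolen base, per the formula in the post
CS_RUNS = -0.6  # a caught stealing costs twice as much

def steal_runs(sb: int, cs: int) -> float:
    """Net runs from base stealing: .3 * (SB - 2 * CS)."""
    return SB_RUNS * sb + CS_RUNS * cs

def break_even_rate() -> float:
    # Net runs are zero when SB = 2 * CS, i.e. a success rate of 2/3.
    return -CS_RUNS / (SB_RUNS - CS_RUNS)

print(f"{steal_runs(100, 30):.1f}")   # 12.0
print(f"{steal_runs(100, 40):.1f}")   # 6.0
print(f"{break_even_rate():.3f}")     # 0.667
```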

>    To rephrase this point, so that you will have less chance of
>    misinterpreting it, if Guerrero's slugging avg and OBA are what
>    are most important to the Dodgers, then he should bat leadoff,
>    so as to maximize the team's slugging avg and OBA. He doesn't,
>    and the very idea seems preposterous. Either Lasorda doesn't
>    understand the game as you do, or your emphasis on OBA and
>    slugging is wrong. Which is it?

Lasorda and I both agree that much of Guerrero's SA will be wasted if
there are no men on base.  As Lasorda and I have found that it's far
easier to scrape someone up who has a decent OBA than it is to get
someone who has a good SA, we both place a greater premium on
Guerrero's power.  Certainly, it is NOT true that maximizing a team's OBA
and/or SA is the SAME as maximizing the team's run production, and I
have never said that it was.  I have suggested it's pretty darn close,
though.  The relationship between team OBA, SA, and run production is
close, but not exact.  It would cost the Dodgers some runs to bat
Guerrero lead-off, but not because Guerrero wouldn't be a good lead off
man.  You've merely shown that OBA, SA, and runs are not identical: 
another straw man bites the dust!

>The lineup can even affect the selection of relief pitchers. And haven't
>you ever heard a manager say that what he really needs is a left-handed
>power-hitter (or more speed in the lineup, etc.)? Why are these things
>important to managers if the players in lineups don't interact?

Again, you misunderstand what I am saying.  The new left-handed power
hitter may see big changes in his RBI totals, and his new team may see
a surge in runs scored, but the new player is unlikely to see any
substantial change in his OBA and SA, once those two are properly
adjusted.

>> I suppose Paul believes Carter has a special dispensation: in
>> moving from the Expos to the Mets, he gains by being surrounded by
>> Keith, Darryl, and George, while those three do NOT gain from Gary's
>> presence.  The fact is, the production of all four has remained about
>> the same over the past two years, an argument AGAINST lineup effects.

>Or an argument that Carter is about as productive as Hubie Brooks is.

Correct.  It says a lot about lineup effects if they indicate that
Carter is about the same hitter as Brooks is.  It says just how off
the wall they are...

Of course, I should have expected this.  Brooks is about as productive
a player as Pena, and so Paul must assert that Brooks is about on par
with Carter.  That is, of course, why the Mets were obliged to throw
Youmans, Fitzgerald, and Winningham into a deal involving players
of equal value.  Well, Paul, if you're right, the Mets and Expos
managements must be mistaken about Carter's value vis a vis Brooks.
So you, too, find yourself in contradiction with baseball "authority".
Let us all savor this moment: it is as if the Pope were found guilty of
heresy!

>Strawberry is having a better year, and all the other three are down,
>except for Carter's HR rate:
>(these 1985 stats are as of 9/12; the 1985f stats are approximations to
>what they would have at the end of the season if their current averages 
>continue)
>              BA     HR    RBI  
>Strawberry:
>     1984    .251    26    97
>     1985    .282    23    66
>     1985f   .282    27    77  (.282    38    113)
>Hernandez:
>     1984    .311    15    94
>     1985    .291    10    79
>     1985f   .291    12    92
>Foster:
>     1984    .269    24    86
>     1985    .254    17    66
>     1985f   .254    20    77
>Carter:
>     1984    .294    27   106
>     1985    .281    26    77
>     1985f   .281    30    90
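The "1985f" rows are straight-line projections. The table does not say how many games the Mets had played by 9/12. Scaling by 162/139 reproduces every HR and RBI projection in it, so the 139 below is an inference from the table, not a reported figure. Strawberry's parenthetical line is not reproduced by this sketch, and the function name is hypothetical.

```python
def prorate(total: int, games_played: int, season_games: int = 162) -> int:
    """Project a counting stat to a full season at the current per-game rate."""
    return round(total * season_games / games_played)

GAMES_SO_FAR = 139  # inferred from the table, not a reported figure

# (player, HR so far, RBI so far) as of 9/12, from the table above.
current = [("Strawberry", 23, 66), ("Hernandez", 10, 79),
           ("Foster", 17, 66), ("Carter", 26, 77)]
for name, hr, rbi in current:
    print(name, prorate(hr, GAMES_SO_FAR), prorate(rbi, GAMES_SO_FAR))
```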

Read my lips:

I HAVE NEVER NEVER NEVER NEVER DENIED THAT LINEUPS AFFECT RBI'S!!!!!

To show an increase in RBI's shows the TEAM has had a better (or
worse) year, not that the player has had a better or worse year.  

Looking at the specifics, you'll find that Hernandez, Carter, and
Foster all show something of a drop-off from the last two years and
Strawberry shows a definite improvement (this is apparent when looking
at SA and OBA; BA and HR give us a glimpse of it).  You would argue
that Carter's introduction strengthened/weakened the Mets' lineup, but
this would lead to general rise/fall in INDIVIDUAL production.  There
is no such general rise/fall; as a GROUP, one would be hard pressed to
say the four were doing better or worse than last year.  What we do see is

	(1) Carter and Hernandez are having "typical" seasons.  Their
	    slight drop is due to the fact they both had outstanding
	    seasons the previous year.
	(2) Foster fell off a bit.  This is expected from 36-year-olds.
	(3) Strawberry has improved.  He was expected to, with or
	    without Carter, and will likely further improve next year. 

Fact is, these are the kinds of numbers we would have expected from all
four had Carter remained in Montreal...

					David Rubin
			{allegra|astrovax|princeton}!fisher!david

P.S.  Remember, Paul, that I deny lineup effects only with regard to
SA and OBA, and that our argument is over INDIVIDUAL, not team,
production.  Repeat this to yourself five times before you write a
rebuttal.

P.P.S.  Paul, you also left out a lot of smiley faces, e.g. when you
declared me statistically naive and said I understand baseball about as
well as a Martian.  Fortunately, I KNOW you didn't mean to insult, but
shouldn't you be more careful for the sake of others who are not as
intimately familiar with your tolerant nature?

P.P.P.S.  There is something called Linear Weights that tracks run
production even better than OBA and SA do; it includes things you
object to having left out, such as SB's.  It is a SLIGHT improvement,
while being a GREAT increase in complexity.  The increased complexity,
in my view, is too great to be justified by this slight improvement.
You may well think otherwise.
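To make the complexity trade-off concrete, here is a minimal sketch that computes OBA and SA (two ratios of ordinary counting stats) next to a Linear Weights style estimate (a weighted sum over many event types). The weights are illustrative assumptions of roughly the magnitude usually quoted, not an official table; real Linear Weights values vary by league and era. The stat line and all names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class StatLine:
    ab: int        # at-bats
    h: int         # hits
    doubles: int
    triples: int
    hr: int
    bb: int        # walks
    hbp: int = 0   # hit by pitch
    sf: int = 0    # sacrifice flies
    sb: int = 0    # stolen bases
    cs: int = 0    # caught stealing

    @property
    def singles(self) -> int:
        return self.h - self.doubles - self.triples - self.hr

def oba(s: StatLine) -> float:
    """On-base average: times on base over the plate appearances that count."""
    return (s.h + s.bb + s.hbp) / (s.ab + s.bb + s.hbp + s.sf)

def sa(s: StatLine) -> float:
    """Slugging average: total bases per at-bat."""
    total_bases = s.singles + 2 * s.doubles + 3 * s.triples + 4 * s.hr
    return total_bases / s.ab

# Illustrative run values per event (assumed, not an official table).
WEIGHTS = {"1B": 0.46, "2B": 0.80, "3B": 1.02, "HR": 1.40,
           "BB_HBP": 0.33, "SB": 0.30, "CS": -0.60, "OUT": -0.25}

def linear_weights_runs(s: StatLine) -> float:
    """Weighted sum of events; with a calibrated out weight this reads
    as runs above an average hitter."""
    outs = s.ab - s.h  # batting outs only, a simplification
    return (WEIGHTS["1B"] * s.singles + WEIGHTS["2B"] * s.doubles
            + WEIGHTS["3B"] * s.triples + WEIGHTS["HR"] * s.hr
            + WEIGHTS["BB_HBP"] * (s.bb + s.hbp)
            + WEIGHTS["SB"] * s.sb + WEIGHTS["CS"] * s.cs
            + WEIGHTS["OUT"] * outs)

line = StatLine(ab=500, h=150, doubles=30, triples=5, hr=20, bb=60)
print(f"OBA {oba(line):.3f}  SA {sa(line):.3f}  LW {linear_weights_runs(line):+.1f}")
```

Two ratios built from box-score counts, against eight weights that (under the assumption above) shift with league and era, is the kind of complexity gap the P.P.P.S. has in mind.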

dpb@philabs.UUCP (Paul Benjamin) (09/27/85)

> >How about a stat such as "how many runs you
> >contribute to", measured by runs you score, drive in, or help advance
> >the runners, or even better, how much better you are at that than
> >others batting in similar positions? These stats are virtually impossible
> >to compute from box scores, because so much information is lost, such as
> >if anyone was in scoring position when a player made an out, or whether
> >an out advanced a runner. In this sense, I can agree with you that
> >OBA and slugging may be the best available, but I'm saying that this
> >means that the available stats are not very good (we need some new
> >categories).
> 
> Well yes, that's what I said.  "The two best readily available statistics."
> In particular, if I know a player's on base and slugging averages, I don't
> much care what his batting average is.  In fact, it is better if the batting
> average is lower, with the same on base and slugging averages.

Well, I think that is lineup dependent! Specifically, a cleanup hitter should
get a lot of hits - his OBA is not terribly important. He's supposed to
be driving in runs. A perfect example of this is Jason Thompson. His OBA
is among the best in the league, but his BA is low. I would much rather
see a higher BA, even at the cost of a lower OBA. He just doesn't drive
in runners. So who cares if he walks that much (he was leading the NL the
last time I saw the numbers) - that just passes the RBI duty along to #5,
and the Pirates haven't had a good #5 in a long time (George Hendrick??).
But I basically agree with you.

> Yes, we do need better stats.  I have my doubts about your proposal; it is
> highly lineup dependent.  A batter will participate in more runs on a good
> offensive team than on a poor one -- this is the main problem with RBIs.
> (That assessment sounds harsher than I really mean it to be.  This would
> be a useful statistic -- certainly better than "runs produced" or "game
> winning RBI".  (I don't really understand why the statistic isn't "go ahead
> RBI" instead -- the batter putting his team ahead cannot be affected by
> whether they will stay ahead.)  But if I could only get one statistic about
> a player, I would rather know the sum of his on base and slugging, than
> the number of runs (total, per game, or per at bat, your choice) that he
> contributed to.)

Specifically, what I had mentioned in previous postings but did not repeat
(the posting was long enough!) was the percentage of a team's runs that a
player figures in. I think this might be a very meaningful stat, particularly
where MVP awards are being discussed. This would parallel the hockey stat,
in which we can see, for instance, someone like Gretzky participating in
a very high percentage of his team's goals.
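One simple way to compute the percentage described above is sketched here, assuming "figures in" means runs the player scored or drove in, with home runs subtracted once so his own run is not counted twice (the same R + RBI - HR total usually called "runs produced"). The player and team numbers are hypothetical. As Frank points out above, a share like this still depends on the lineup around the player.

```python
def runs_figured_in(runs: int, rbi: int, hr: int) -> int:
    """Runs scored plus runs batted in, counting each home run's run once."""
    return runs + rbi - hr

def share_of_team_runs(runs: int, rbi: int, hr: int, team_runs: int) -> float:
    """Fraction of the team's runs the player scored or drove in."""
    if team_runs <= 0:
        raise ValueError("team_runs must be positive")
    return runs_figured_in(runs, rbi, hr) / team_runs

# Hypothetical player: 90 R, 100 RBI, 30 HR on a team that scored 700 runs.
print(f"{share_of_team_runs(90, 100, 30, 700):.1%}")
```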
> 
> If you can get a copy, do look the Elias book (_The_1985_Elias_Baseball_
> Analyst_).  It has batting and pitching statistics broken down by whether
> the bases are empty, leading off an inning, with runners on base, with
> runners in scoring position, and with runners in scoring position with
> two out.  It also has statistics for batting in late inning pressure
> situations (defined as the seventh inning or later, with the player's
> team tied, behind by not more than three runs, or behind by four runs
> with the bases loaded), broken down similarly.  It also has home vs
> away, grass vs turf, and day vs night breakdowns.

Thanks for the suggestion.

> Some other statistics I would like to see: how often does a runner take
> an extra base on a hit?  And how often is he out trying to do so?  Another
> interesting statistic would be bases advanced out of the number possible
> (a grand slam is ten out of ten; a bases empty single is one out of four).
> The ratio of bases advanced to outs made would also be interesting.
> None of these statistics is perfect, of course.

No, no stat is. But I like the way you think. The idea of bases advanced
seems nice to me. I am particularly thinking about people who get singles
with the bases loaded instead of solo HRs. I have always felt that they
got the short end of the stick statistically, particularly from people like
David Rubin, who ignore such things as R and RBI. The single with the
bases loaded contributes, say, 5 bases, and the solo HR contributes 4.
It doesn't seem quite fair to give the HR 4 bases and the single 1 (in
slugging avg), when the TIMING of the hit is all important. By the way,
when you read "solo HR" above, read "Gary Carter". Almost all of his HR's
are solo - thus his low RBI total (he doesn't get many RBI singles, either).
In his recent HR binge, he hit 9 HR, with 15 RBI. Actually, he had 3 HR,
6 RBI in one game, and 6 HR, 9 RBI during the rest of the streak. If you
look around the league, you will find other players who go on RBI binges,
and do it without that many HRs. But they don't get the slugging average
boost that HRs give - Carter picked up about 45 points of slugging average
in 2 games (5 HRs)!
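The bases-advanced idea, and the bases-loaded single versus solo HR comparison, can be checked with a short sketch. It assumes the batter can advance 4 bases and a runner on base b can advance 4 - b, which gives Frank's ten-of-ten grand slam and one-of-four bases-empty single. The bases-loaded single is assumed to score the runners from third and second while the runner from first stops at second, one way to get Paul's "say, 5 bases". Function names are hypothetical.

```python
from fractions import Fraction

HOME = 4  # reaching home counts as base 4

def bases_possible(runners):
    """Batter can advance 4 bases; a runner on base b can advance 4 - b."""
    return HOME + sum(HOME - b for b in runners)

def bases_advanced(batter_end, runner_moves):
    """batter_end: base the batter reaches (4 = scored).
    runner_moves: {start_base: end_base} for each runner on at the pitch."""
    return batter_end + sum(end - start for start, end in runner_moves.items())

def advance_ratio(batter_end, runner_moves):
    """Bases advanced as a fraction of bases possible."""
    return Fraction(bases_advanced(batter_end, runner_moves),
                    bases_possible(runner_moves.keys()))

plays = {
    "grand slam": (4, {1: 4, 2: 4, 3: 4}),
    "bases-empty single": (1, {}),
    "bases-loaded single": (1, {1: 2, 2: 4, 3: 4}),
    "solo home run": (4, {}),
}
for name, (batter_end, moves) in plays.items():
    print(f"{name:<20} {bases_advanced(batter_end, moves):>2} bases, "
          f"ratio {advance_ratio(batter_end, moves)}")
```

By raw bases advanced the bases-loaded single (5) beats the solo HR (4), while by ratio the HR (1) beats the single (1/2), so the two versions of the stat can rank the same hits differently.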

> >>I am unconvinced by the Mattingly data.  There is just not enough there
> >>to be statistically significant.
> >
> >Of course. But I didn't say that this proved conclusively that all
> >players' stats are highly order-dependent. I just showed the existence
> >of stats that support the belief in lineup dependency. Again, just
> >because these stats are not often kept is not my fault.
> 
> Let me put that a bit differently.  While Mattingly undoubtedly hits
> better batting after Henderson (who has a very good on base percentage
> and fantastic speed), it is unlikely that the effect is anywhere near
> as large as in those sample statistics.  And whoever hits after Henderson
> can expect an improvement.

I stated in that posting that the magnitude of the difference was undoubtedly
exceptional. I agree with you that whoever hits after Henderson (or Coleman,
etc.) can expect an improvement.

> This is one reason the on base and slugging averages make such a good pair.
> When a player is pitched to cautiously, the on base average goes up and
> the slugging average goes down.  In the reverse case, the opposite happens.

It's true that they complement each other well. But they are both terribly
inadequate to begin with, so who cares?

franka@mmintl.UUCP (Frank Adams) (10/01/85)

[Not food]

In article <458@philabs.UUCP> dpb@philabs.UUCP (Paul Benjamin) writes:
>> >How about a stat such as "how many runs you
>> >contribute to", measured by runs you score, drive in, or help advance
>> >the runners,

One question.  Suppose the leadoff batter singles and steals second.
The next two batters make outs.  The cleanup hitter walks.  The number
five hitter singles, bringing in the runner from second.  Finally, the
number six hitter strikes out.  Shouldn't the cleanup hitter get credit
for "contributing" to the run?  If he had made an out, it wouldn't have
scored.
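The counterfactual above can be replayed with a tiny base-out tracker. This is a minimal sketch under simplifying assumptions: outs never advance runners, a single scores runners from second and third and moves a runner from first to second, a walk advances only forced runners, and a steal moves the runner on first to second. The function and event names are hypothetical.

```python
def play_inning(events):
    """Return the runs scored by a sequence of plate-appearance events.

    Simplifying assumptions: outs never advance runners; a single scores
    runners from second and third and moves a runner from first to second;
    a walk advances only forced runners; a steal moves the runner on first
    to second. Events after the third out are never played.
    """
    first = second = third = False
    outs = runs = 0
    for event in events:
        if outs == 3:
            break
        if event == "out":
            outs += 1
        elif event == "walk":
            # Forced advances only: a runner moves up only if every base
            # behind him is occupied.
            if first:
                if second:
                    if third:
                        runs += 1
                    third = True
                second = True
            first = True
        elif event == "single":
            runs += int(second) + int(third)
            second, third = first, False
            first = True
        elif event == "steal_second":
            if not first or second:
                raise ValueError("steal_second needs a runner on first and second open")
            first, second = False, True
        else:
            raise ValueError(f"unknown event: {event}")
    return runs

# The inning above: leadoff single and steal, two outs, cleanup walk,
# RBI single by the number five hitter, strikeout.
actual = ["single", "steal_second", "out", "out", "walk", "single", "out"]
# Same inning with the cleanup hitter making an out instead of walking.
counterfactual = ["single", "steal_second", "out", "out", "out", "single", "out"]

print(play_inning(actual), play_inning(counterfactual))
```

With the walk the inning produces one run; with an out in its place the third out comes before the number five hitter ever bats.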

>> In particular, if I know a player's on base and slugging averages, I don't
>> much care what his batting average is.  In fact, it is better if the batting
>> average is lower, with the same on base and slugging averages.
>
>Well, I think that is lineup dependent! Specifically, a cleanup hitter should
>get a lot of hits - his OBA is not terribly important.

Yes, but if you fix the OBA and SA, and decrease the BA, he gets more extra
base hits.  This is likely to mean more RBI, not fewer.
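This can be checked with two hypothetical hitters who have the same 570 plate appearances, the same OBA, and the same SA, but different batting averages (HBP and sacrifice flies are assumed zero to keep the arithmetic simple). A minimal sketch:

```python
from fractions import Fraction

def rates(ab, h, bb, tb):
    """Return (BA, OBA, SA) as exact fractions, ignoring HBP and SF."""
    return Fraction(h, ab), Fraction(h + bb, ab + bb), Fraction(tb, ab)

# Hypothetical stat lines with equal plate appearances (AB + BB = 570).
high_ba = dict(ab=500, h=150, bb=70, tb=250)
low_ba = dict(ab=480, h=130, bb=90, tb=240)

for label, line in (("higher BA", high_ba), ("lower BA", low_ba)):
    ba, oba, sa = rates(**line)
    extra_bases = line["tb"] - line["h"]  # bases gained beyond first
    print(f"{label}: BA {float(ba):.3f} OBA {float(oba):.3f} "
          f"SA {float(sa):.3f} extra bases {extra_bases}")
```

Both lines have OBA .386 and SA .500, but the lower-average hitter collects 110 extra bases to the other's 100: with OBA and SA held fixed, fewer hits means more of each hit's value comes from extra bases.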

>He's supposed to
>be driving in runs. A perfect example of this is Jason Thompson. His OBA
>is among the best in the league, but his BA is low. I would much rather
>see a higher BA, even at the cost of a lower OBA. He just doesn't drive
>in runners. So who cares if he walks that much (he was leading the NL the
>last time I saw the numbers) - that just passes the RBI duty along to #5,
>and the Pirates haven't had a good #5 in a long time (George Hendrick??).

If your number five hitter can't drive in runs, don't blame it on the
cleanup hitter.  And do you really want a higher BA with the same SA?
Also, I don't have the statistics handy, but I believe Thompson scores
a fair number of runs.  It doesn't matter whether they are scored the
way they are "supposed" to be.

>It's true that they complement each other well. But they are both terribly
>inadequate to begin with, so who cares?

This is where we disagree.  I would say "reasonable but not ideal", not
"terribly inadequate".

By the way, an interesting statistic from the Elias book: looking at all
teams in the majors, the most runs per inning and the greatest chance of
scoring in an inning occur when the number 3 hitter leads off the inning.
This suggests that the "traditional" batting order may not be the best
after all.

Frank Adams                           ihpn4!philabs!pwa-b!mmintl!franka
Multimate International    52 Oakland Ave North    E. Hartford, CT 06108

david@fisher.UUCP (David Rubin) (10/03/85)

Frank Adams has kept the issues clear.  I'd like to comment on one of
his contributions:

> On the other hand, batters definitely DO hit better with men on base.
> The book put out by the Elias Sports Bureau (it has their name in the
> title) has statistics on this for the entire major leagues last year.
> As I remember (the book is not here) the effect was about 20 points in
> terms of batting average.  So clearly there is an advantage to batting
> after a player who gets on base a lot.  Although the statistics for it
> are not available, it seems likely that this is enhanced when batting
> after good base stealers.

What this says is that if a player played on a team that had a runner
on every time he hit, he could expect to hit 20 points better than if
he never had a runner on.  Applying this to my rough guess that the
best teams have runners on about half the time, and the worst about a
quarter of the time, the advantage to be gained is no more than
20*(.5-.25) = 5 BA points.  Exactly what I mean when I suggest that
the difference is not something we ought to lose sleep over...
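The estimate treats the runners-on advantage as proportional to how often a hitter comes up with runners on. A sketch of that arithmetic, with the 20-point split and the one-half and one-quarter fractions taken from the post; the .270 base average is a hypothetical value:

```python
def expected_ba_points(base_points, split_points, frac_with_runners_on):
    """Batting average in points (thousandths) if the hitter gains
    split_points whenever runners are on, which happens in the given
    fraction of his at-bats."""
    return base_points + split_points * frac_with_runners_on

SPLIT = 20   # runners-on advantage from the Elias figure, in BA points
BASE = 270   # hypothetical bases-empty average (.270)

best_team = expected_ba_points(BASE, SPLIT, 0.50)
worst_team = expected_ba_points(BASE, SPLIT, 0.25)
print(f"best team {best_team:.0f}, worst team {worst_team:.0f}, "
      f"difference {best_team - worst_team:.0f} points")
```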

						David Rubin

dpb@philabs.UUCP (Paul Benjamin) (10/03/85)

Frank Adams writes:

> In article <458@philabs.UUCP> dpb@philabs.UUCP (Paul Benjamin) writes:
> >> >How about a stat such as "how many runs you
> >> >contribute to", measured by runs you score, drive in, or help advance
> >> >the runners,
> 
> One question.  Suppose the leadoff batter singles and steals second.
> The next two batters make outs.  The cleanup hitter walks.  The number
> five hitter singles, bringing in the runner from second.  Finally, the
> number six hitter strikes out.  Shouldn't the cleanup hitter get credit
> for "contributing" to the run?  If he had made an out, it wouldn't have
> scored.

But it's an awfully small contribution. It's more of a non-negative
act than a positive one, so I can't really see giving him a positive stat.
But you are right in general - evaluating individual contributions and
separating them from team scoring is difficult.

>>> In particular, if I know a player's on base and slugging averages, I don't
>>> much care what his batting average is. In fact, it is better if the batting
>>> average is lower, with the same on base and slugging averages.
>>
>>Well, I think that is lineup dependent! Specifically, a cleanup hitter should
>>get a lot of hits - his OBA is not terribly important.
> 
> Yes, but if you fix the OBA and SA, and decrease the BA, he gets more extra
> base hits.  This is likely to mean more RBI, not fewer.
> 
Maybe. In some situations, yes, but sometimes, no. We are actually talking
about different things, though. I was making the statement that I'd
prefer to see a cleanup hitter have a higher BA at the expense of his OBA.

> >He's supposed to
> >be driving in runs. A perfect example of this is Jason Thompson. His OBA
> >is among the best in the league, but his BA is low. I would much rather
> >see a higher BA, even at the cost of a lower OBA. He just doesn't drive
> >in runners. So who cares if he walks that much (he was leading the NL the
> >last time I saw the numbers) - that just passes the RBI duty along to #5,
> >and the Pirates haven't had a good #5 in a long time (George Hendrick??).
> 
> If your number five hitter can't drive in runs, don't blame it on the
> cleanup hitter.  And do you really want a higher BA with the same SA?
> Also, I don't have the statistics handy, but I believe Thompson scores
> a fair number of runs.  It doesn't matter whether they are scored the
> way they are "supposed" to be.
> 
There's more to the Jason Thompson story than this, though. He is guilty
of failing to drive in runs in many situations where he has the opportunity.
He strikes out or pops up when the runners are in scoring position, and
gets his hits when the bases are empty. I am saying that he doesn't make
up for this lack of production, year after year, by drawing walks (he has
a great batting eye). He may have a great batting eye, and not swing at
many bad pitches, but he doesn't do too well with the good ones. The walks
may be crucial for the 1, 2, or 3 hitters, but you like to see your 4-5-6
hitters driving in the runs. He doesn't do this very often, certainly
not as often as his OBA+SA would suggest.

> >It's true that they complement each other well. But they are both terribly
> >inadequate to begin with, so who cares?
> 
> This is where we disagree.  I would say "reasonable but not ideal", not
> "terribly inadequate".
> 
> By the way, an interesting statistic from the Elias book: looking at all
> teams in the majors, the most runs per inning and the greatest chance of
> scoring in an inning occur when the number 3 hitter leads off the inning.
> This suggests that the "traditional" batting order may not be the best
> after all.
> 
But if you move the 1 and 2 hitters to 7 and 8, say, and improve the
runs scored in the innings in which 3 leads off, you may lose runs in
the innings in which 1 or 2 lead off. The net may be worse than originally.
You have to be VERY careful with this kind of stat. For instance, I
remember reading the stat that more runs score with 1 or 2 outs than
with no outs (obvious when you think about it). This shouldn't be
interpreted as meaning that, for example, a batter should intentionally
strike out with the bases loaded and no outs, to improve his team's
odds of scoring!

This is just another example of the dangers of trying to separate stats
from their context. Just because a stat can be computed (in this case,
the odds of scoring when a particular batting position leads off the inning)
and just because it has a correlation with team scoring, doesn't mean 
that it corresponds to anything in the real world.

Interpretation is everything with statistics. Unfortunately (or
fortunately, depending on your philosophical inclination) interpretation
is a subjective art. A person without any real baseball knowledge
might reasonably infer that the batter in the above situation should
intentionally strike out. It requires knowledge of the real world to
see why this stat occurred, and to understand the situation.

A similar case leaps to mind. Several years ago, someone published an
analysis of football in Sports Illustrated. I think his name might
have been Goode, or something. Anyway, he showed that the single stat
with the highest correlation to winning was the number of rushing
attempts per game. Thus, he concluded, the running game was the most
important aspect of football, and furthermore, it wasn't so much the
yardage gained, as the number of attempts that mattered. But given a
little knowledge about the real world of football, another interpretation
is easily possible: teams that already have a game wrapped up tend to 
run the clock out by running the football. They don't care at this
point about yardage, first downs, etc. This inflates the rushing
attempts, and could account for the high correlation with winning,
since teams who are losing won't resort to this strategy. But this
says nothing about how the winning teams got so far ahead. They might
not have done it with a strict running game. They might have mixed
things up a lot. So, the analyst should have recomputed his data,
ignoring what happened after a team had already built a good lead.
This may have led to different results.

Now, this is strongly reminiscent of attempts to come up with one stat,
say OBA+SA, and correlate it with team runs (even though the correlation
has not been mathematically shown yet). This example also shows why I
insist on depending upon expert advice, rather than our own
interpretation of the stats. The baseball experts know much more
than we do, and can possibly give completely different interpretations
to the numbers. I am not a baseball expert, since I have never played
or coached professionally. Neither is anyone else on this net, to my
knowledge.

dpb@philabs.UUCP (Paul Benjamin) (10/09/85)

> Frank Adams has kept the issues clear.  I'd like to comment on one of
> his contributions:
> 
> > On the other hand, batters definitely DO hit better with men on base.
> > The book put out by the Elias Sports Bureau (it has their name in the
> > title) has statistics on this for the entire major leagues last year.
> > As I remember (the book is not here) the effect was about 20 points in
> > terms of batting average.  So clearly there is an advantage to batting
> > after a player who gets on base a lot.  Although the statistics for it
> > are not available, it seems likely that this is enhanced when batting
> > after good base stealers.
> 
> What this says is that if a player played on a team that had a runner
> on every time he hit, he could expect to hit 20 points better than if
> he never had a runner on.  Applying this to my rough guess that the
> best teams have runners on about half the time, and the worst about a
> quarter of the time, the advantage to be gained is no more than
> 20*(.5-.25) = 5 BA points.  Exactly what I mean when I suggest that
> the difference is not something we ought to lose sleep over...
> 
> 						David Rubin

Another case of bad reasoning. This may be the average over all players,
but certain players bat much more than 20 points better with men on
base. For example, Boggs batted .418 this year with runners in scoring
position. This is about 45 points above his overall average, and about
55-60 points above his average with the bases empty. I certainly don't
lose sleep over this, but it is significant.
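Whether a split like this is more than sampling noise depends on how many at-bats lie behind each average, which the post does not give. A two-proportion z-test is one standard way to check. The sketch below uses hypothetical counts (46 for 110 in one situation, 240 for 660 in the other) purely to show the calculation; they are not Boggs's actual record.

```python
import math

def two_proportion_z(hits_a, ab_a, hits_b, ab_b):
    """z statistic for the difference between two batting averages,
    using the pooled average under the hypothesis of no real difference."""
    p_a, p_b = hits_a / ab_a, hits_b / ab_b
    pooled = (hits_a + hits_b) / (ab_a + ab_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / ab_a + 1 / ab_b))
    return (p_a - p_b) / se

# Hypothetical at-bat counts, chosen only to illustrate the calculation.
z = two_proportion_z(46, 110, 240, 660)
print(f"z = {z:.2f}")  # |z| >= 1.96 is the conventional two-sided 5% threshold
```

With these made-up counts a 54-point gap is only about one standard error; at the same averages with four times as many at-bats, the same gap would clear the usual threshold. A verdict on Boggs himself would need his real counts.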

dday@gymble.UUCP (Dennis Doubleday) (10/15/85)

In article <472@philabs.UUCP> dpb@philabs.UUCP (Paul Benjamin) writes:
>> he never had a runner on.  Applying this to my rough guess that the
>> best teams have runners on about half the time, and the worst about a
>> quarter of the time, the advantage to be gained is no more than
>> 20*(.5-.25) = 5 BA points.  Exactly what I mean when I suggest that
>> the difference is not something we ought to lose sleep over...
>> 
>> 						David Rubin
>
>Another case of bad reasoning. This may be the average over all players,
>but certain players bat much more than 20 points better with men on
>base. For example, Boggs batted .418 this year with runners in scoring
>position. This is about 45 points above his overall average, and about
>55-60 points above his average with the bases empty. I certainly don't
>lose sleep over this, but it is significant.

I hesitate to stick my nose into this (and I am not taking sides) but 
let me make one point about men on base.  If Wade Boggs did the above
(I don't question it), couldn't this say at least as much about the 
pitchers he faced in those situations as it does about him?  Boggs
is much more likely to come to the plate with men on base against, say,
Dennis Martinez than he is against, say, Ron Guidry.  The simple reason
is that *everybody* on the Red Sox is likely to get more hits (and thus
be on base more) against the inferior pitchers. And so Wade Boggs is
more likely to bat with men on against Dennis Martinez and with the 
bases empty against Ron Guidry.  This might go a long way toward
explaining the differential.
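This is a selection effect, and a toy model shows it can produce a split on its own. Assume (all numbers hypothetical) a hitter with no special ability in the clutch who faces weak and strong pitchers equally often, hits .330 against the weak ones and .250 against the strong ones, and comes up with men on 55% of the time against weak pitchers but only 35% against strong ones. Conditioning on the situation (Bayes' rule) gives his expected average with runners on and with the bases empty.

```python
def split_from_selection(p_weak, ba_weak, ba_strong, on_vs_weak, on_vs_strong):
    """Expected BA with runners on and with bases empty when the only
    difference between situations is which pitchers the hitter faces."""
    p_strong = 1 - p_weak
    # Joint probabilities: P(pitcher type and runners on / bases empty).
    on_weak, on_strong = p_weak * on_vs_weak, p_strong * on_vs_strong
    empty_weak = p_weak * (1 - on_vs_weak)
    empty_strong = p_strong * (1 - on_vs_strong)

    ba_on = (on_weak * ba_weak + on_strong * ba_strong) / (on_weak + on_strong)
    ba_empty = ((empty_weak * ba_weak + empty_strong * ba_strong)
                / (empty_weak + empty_strong))
    return ba_on, ba_empty

ba_on, ba_empty = split_from_selection(0.5, 0.330, 0.250, 0.55, 0.35)
print(f"runners on {ba_on:.3f}, bases empty {ba_empty:.3f}, "
      f"gap {1000 * (ba_on - ba_empty):.0f} points")
```

The hitter's ability never changes with the situation, yet the model shows a gap of about 16 points, the kind of differential that could come from which pitchers he faces rather than from the hitter.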

-- 

UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!dday    Dept. of Computer Science
CSNet:	dday@umcp-cs				 University of Maryland
ARPA:	dday@maryland				 College Park, MD 20742
						 (301) 454-4247

dpb@philabs.UUCP (Paul Benjamin) (10/16/85)

> In article <472@philabs.UUCP> dpb@philabs.UUCP (Paul Benjamin) writes:
> >> he never had a runner on.  Applying this to my rough guess that the
> >> best teams have runners on about half the time, and the worst about a
> >> quarter of the time, the advantage to be gained is no more than
> >> 20*(.5-.25) = 5 BA points.  Exactly what I mean when I suggest that
> >> the difference is not something we ought to lose sleep over...
> >> 
> >> 						David Rubin
> >
> >Another case of bad reasoning. This may be the average over all players,
> >but certain players bat much more than 20 points better with men on
> >base. For example, Boggs batted .418 this year with runners in scoring
> >position. This is about 45 points above his overall average, and about
> >55-60 points above his average with the bases empty. I certainly don't
> >lose sleep over this, but it is significant.
>  
> I hesitate to stick my nose into this (and I am not taking sides) but 
> let me make one point about men on base.  If Wade Boggs did the above
> (I don't question it), couldn't this say at least as much about the 
> pitchers he faced in those situations as it does about him?  Boggs
> is much more likely to come to the plate with men on base against, say,
> Dennis Martinez than he is against, say, Ron Guidry.  The simple reason
> is that *everybody* on the Red Sox is likely to get more hits (and thus
> be on base more) against the inferior pitchers. And so Wade Boggs is
> more likely to bat with men on against Dennis Martinez and with the 
> bases empty against Ron Guidry.  This might go a long way toward
> explaining the differential.

I agree completely. These stats, as well as others, are influenced by
the difference between starters and relievers. This has often been
noted to be one of the major differences between modern baseball and
that of previous eras - the emergence of the relief specialist. There
is no question that everybody tends to face different pitchers in hot
spots than during the rest of a game. But this could actually make it
harder to hit with runners in scoring position than it otherwise would
be, since managers usually try to bring in relievers who are best suited 
to face specific batters, e.g., lefties against lefties.

But then, your point about facing Guidry with the bases empty can be
rephrased as, "When a pitcher is doing very well, then batters will tend
to face him less often with runners in scoring position, so that the
situations with runners in scoring position will often be against starters
who are in trouble." This makes good sense to me.

So we have two opposing tendencies. The net result is that (as someone posted)
hitters tend to bat about 20 points better with men on. This could well
reflect that the second tendency outweighs the first. However, this is
not directly relevant to what the original argument was about, since I
was stating that there are individuals who consistently perform above the
average with men on base, and there are those who consistently perform
worse, so that this factor cannot be dismissed by saying "the effect is
randomly distributed", or words to that effect. The individual
differences between players' performances need to be taken into
consideration.