[net.sport.baseball] playoff slugging + onbase avg.

dpb@philabs.UUCP (Paul Benjamin) (10/23/85)

This is just a short (thank God!) note on team slugging and
on-base averages. The Cards did beat LA in those stats, as well
as on the field, but the opposite is true for KC vs. Tor, and
so far in the World Series.

         BA      SA     OBA     SA+OBA

KC     .225     .366   .294     .660
Tor    .269     .372   .319     .681

After two games of the Series, we have:

KC     .270     .393   .343     .736
StL    .203     .313   .282     .595

These are based on data in the NY Times of 10/22. By the way, KC has
outstolen StL 2 to 1! But there is another speed-related stat that
is VERY important: StL has turned 4 DPs, KC has turned none. That
sure does cancel KC's edge in OBA. In fact, it cancels KC's edge in
BA, which is counted twice (in both OBA and SA), and so cancels
the entire difference in SA+OBA! (The two teams really have the same
number of extra bases (7), so the difference in SA+OBA is just
about twice the difference in BA.)
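As a quick check of the arithmetic above, here is a small Python sketch that recomputes SA+OBA from the posted BA, SA and OBA figures and compares the Series gap in SA+OBA with the gap in BA. The numbers are the ones in the two tables; the helper names are only for illustration. (Toronto's sum comes out at .691, which matches the correction in the P.S. later in the thread.)

```python
# Team lines as posted: (BA, SA, OBA).
ALCS = {"KC": (0.225, 0.366, 0.294), "Tor": (0.269, 0.372, 0.319)}
SERIES = {"KC": (0.270, 0.393, 0.343), "StL": (0.203, 0.313, 0.282)}


def sa_plus_oba(line):
    """Slugging plus on-base average, rounded to three places."""
    _ba, sa, oba = line
    return round(sa + oba, 3)


def gap_ratio(a, b):
    """How many times larger the SA+OBA gap is than the BA gap."""
    ba_gap = a[0] - b[0]
    return (sa_plus_oba(a) - sa_plus_oba(b)) / ba_gap


for label, table in (("ALCS", ALCS), ("Series", SERIES)):
    for team, line in table.items():
        print(label, team, f"{sa_plus_oba(line):.3f}")

print("Series SA+OBA gap / BA gap:",
      round(gap_ratio(SERIES["KC"], SERIES["StL"]), 2))
```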

Also note that KC is doing much better in the Series than in the ALCS,
according to this stat. In reality, they are scoring 1.5 runs per game,
as compared with 3.71 runs per game in the ALCS.

Doesn't look like team OBA+SA is so important, does it?

                                 Paul Benjamin

abgamble@water.UUCP (Bruce Gamble) (10/27/85)

> This is just a short (thank God!) note on team slugging and
> on-base averages. The Cards did beat LA in those stats, as well
> as on the field, but the opposite is true for KC vs. Tor, and
> so far in the World Series.
> 
>          BA      SA     OBA     SA+OBA
> 
> KC     .225     .366   .294     .660
> Tor    .269     .372   .319     .681

> 
> Doesn't look like team OBA+SA is so important, does it?
> 
>                                  Paul Benjamin

Two quick observations.

1)   The results of one seven-game series are not going to convince
   the average person of anything. You may remember that in the 1960
   W.S. the Yankees outscored the Pirates by about 30 runs,
   yet Pittsburgh won it in seven games. By your reasoning we could
   conclude that scoring runs isn't so important.

2)   I don't believe that anyone has suggested that we should actually
   look at the sum of OBA and SA (someone please correct me if I'm
   wrong). I was under the impression that OBA+SA was intended to
   mean "OBA and SA", not "OBA plus SA". Combining the two into one
   number loses much of the information that they contain.
     If anyone insists, however, on combining them into one number, it
   would make a lot more sense to multiply them rather than add them.
   This would give a more accurate measure of a player's offensive
   value.
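To see what multiplying instead of adding changes in practice, here is a small Python sketch that scores the two ALCS team lines from Paul's table both ways. It only compares the two combinations; it does not test which one is the better measure of offense.

```python
# ALCS team lines from Paul's table: (SA, OBA).
TEAMS = {"KC": (0.366, 0.294), "Tor": (0.372, 0.319)}


def combined(sa, oba, how):
    """Combine SA and OBA into one number, by adding ("sum") or multiplying."""
    return sa + oba if how == "sum" else sa * oba


for how in ("sum", "product"):
    kc = combined(*TEAMS["KC"], how)
    tor = combined(*TEAMS["Tor"], how)
    # Toronto's relative edge over Kansas City under each combination.
    print(f"{how:8s} KC={kc:.4f} Tor={tor:.4f} Tor edge={tor / kc - 1:.1%}")
```

For small differences, the relative change in a product is roughly the sum of the relative changes in its factors, while a sum only averages them, so the product here spreads the two teams about twice as far apart.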
-- 

                          - Bruce Gamble  (abgamble@water.UUCP)

dpb@philabs.UUCP (Paul Benjamin) (10/29/85)

> > This is just a short (thank God!) note on team slugging and
> > on-base averages. The Cards did beat LA in those stats, as well
> > as on the field, but the opposite is true for KC vs. Tor, and
> > so far in the World Series.
> > 
> >          BA      SA     OBA     SA+OBA
> > 
> > KC     .225     .366   .294     .660
> > Tor    .269     .372   .319     .681
> 
> > 
> > Doesn't look like team OBA+SA is so important, does it?
> > 
> >                                  Paul Benjamin
> 
> Two quick observations.
> 
> 1)   The results of one seven game series are not going to convince
>    the average person of anything. You may remember that in the 1960
>    W.S. the Yankees outscored the Pirates by about 30 or so runs,
>    yet Pittsburgh won it in seven games. By your reasoning we could
>    conclude that scoring runs isn't so important.

Actually, I would agree that just the total of runs is not important.
What counts is when they are scored. The NY-Pitt series you mention is
the most extreme example of this. But it is definitely true that the
gross total (or differential) of runs is not a strong indicator of 
winning games. The same holds in other sports, such as tennis, where it
is often the case that the winner has won fewer games, but won more sets.
This is particularly true between strong players.

So, no, we cannot conclude by my reasoning that scoring runs is not
important, because it is awfully hard to win without scoring runs! But
we can conclude that just scoring more runs over the course of the series
is not important. A 12-1 win is no more important than a 2-1 win.

Also note that you omit the stats that I consider most important. The
whole point of my posting is that run scoring is not at all dependent
on SA+OBA. After all, KC was actually doing better in SA+OBA in the series 
than against Toronto, but scoring fewer runs. And this is not just the
results of one seven-game series. You mention another, so that makes
two. It may be the case that there are a number of others (just find
a series in which the losing team had a lopsided win).

> 2)   I don't believe that anyone has suggested that we should actually
>    look at the sum of OBA and SA (someone please correct me if I'm
>    wrong). I was under the impression that OBA+SA was intended to
>    mean "OBA and SA", not "OBA plus SA". Combining the two into one
>    number loses much of the information that they contain.
>      If anyone insists, however, on combining them into one number, it
>    would make a lot more sense to multiply them rather than add them.
>    This would give a more accurate measure of a player's offensive
>    value.
>                           - Bruce Gamble  (abgamble@water.UUCP)

No. The stat espoused by David Rubin is OBA plus SA. This is the
stat printed in the NY Times on occasion. 

				Paul Benjamin

P.S. There's a typo in my stats. The Tor SA+OBA total is .691, not .681.

franka@mmintl.UUCP (Frank Adams) (11/02/85)

In article <489@philabs.UUCP> dpb@philabs.UUCP (Paul Benjamin) writes:
>> > This is just a short (thank God!) note on team slugging and
>> > on-base averages. The Cards did beat LA in those stats, as well
>> > as on the field, but the opposite is true for KC vs. Tor, and
>> > so far in the World Series.
>> > 
>> >          BA      SA     OBA     SA+OBA
>> > 
>> > KC     .225     .366   .294     .660
>> > Tor    .269     .372   .319     .681
>> 
>> > 
>> > Doesn't look like team OBA+SA is so important, does it?
>> > 
>> >                                  Paul Benjamin
>> 
>> Two quick observations.
>> 
>> 1)   The results of one seven game series are not going to convince
>>    the average person of anything. You may remember that in the 1960
>>    W.S. the Yankees outscored the Pirates by about 30 or so runs,
>>    yet Pittsburgh won it in seven games. By your reasoning we could
>>    conclude that scoring runs isn't so important.
>
>Actually, I would agree that just the total of runs is not important.
>What counts is when they are scored. The NY-Pitt series you mention is
>the most extreme example of this. But it is definitely true that the
>gross total (or differential) of runs is not a strong indicator of 
>winning games. The same holds in other sports, such as tennis, where it
>is often the case that the winner has won fewer games, but won more sets.
>This is particularly true between strong players.

The relation between statistics like SA+OBA and scoring runs is pretty
much like that between scoring runs and winning games.  I think we are
getting close to the meat of the argument here.

Let's look at the '60 Series here.  The question, I believe, is the following:
(to put it baldly) was the outcome the result of luck or skill?  That is,
which of the following descriptions of the series is more accurate:

(1) The Pirates proved themselves the better team by their ability to score
    runs in clutch situations.  Although the Yankees were better at getting
    men to cross the plate, Pittsburgh got them when they needed them.

(2) Although it is hard to tell from such a short series, the Yankees'
    dominance in all statistical departments makes it seem quite likely
    that they had the better team.  However, the Pirates were fortunate
    enough to win all the close ones and only lose blowouts, and so won
    the series.

Let me paint the second picture in more extreme terms.  From this point of
view, all that matters in each at bat is the talents of the pitcher, the
batter, and the fielders.  Everything else is randomness.  Sometimes the
batter gets lucky, sometimes the pitcher does.  (I have left base stealing
out for simplicity; this point of view would hold that the chances of the
runner stealing depend on the runner and the relevant fielders, and that
whether the runner attempts a steal or not, and whether he is successful
or not, does not affect the batter.)

A similarly extreme picture of the first option holds that nothing is random.
Everything happens as it must happen, given the players and the situation
they are involved in.

Now, I think it is obvious that neither of these extremes is correct.  But
I think that number 2 is closer to the truth than number 1.

The main argument for randomness is that it suffices to explain the kind of
effects that are being talked about.  Some fraction of the time, one team
will score more runs than the other, yet lose the series.  Some fraction of
the time, a team will get more men on base and slug better, yet score fewer
runs.  These things don't just happen occasionally, either; they are fairly
common JUST ON THE ASSUMPTION OF RANDOMNESS.
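The claim that randomness alone makes such outcomes common can be checked by simulation. The Python sketch below works only under assumed conditions: each team's runs per game are drawn from a Poisson distribution with hypothetical means (4.5 and 4.0 here, not taken from any real team), a tied game is simply replayed, and a series ends when one team wins four games. It estimates how often the series winner was outscored over the series.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson variate by Knuth's multiplication method."""
    limit, k, prod = math.exp(-mean), 0, 1.0
    while True:
        prod *= rng.random()
        if prod < limit:
            return k
        k += 1


def play_series(rng, mean_a, mean_b, wins_needed=4):
    """Return (winner, runs_a, runs_b) for one best-of-seven series."""
    wins = {"A": 0, "B": 0}
    runs_a = runs_b = 0
    while max(wins.values()) < wins_needed:
        a, b = poisson(rng, mean_a), poisson(rng, mean_b)
        while a == b:  # assumption: a tied game is replayed from scratch
            a, b = poisson(rng, mean_a), poisson(rng, mean_b)
        runs_a += a
        runs_b += b
        wins["A" if a > b else "B"] += 1
    winner = "A" if wins["A"] == wins_needed else "B"
    return winner, runs_a, runs_b


def outscored_winner_rate(mean_a, mean_b, n_series=20000, seed=1):
    """Fraction of simulated series won by the team that scored fewer runs."""
    rng = random.Random(seed)
    upsets = 0
    for _ in range(n_series):
        winner, runs_a, runs_b = play_series(rng, mean_a, mean_b)
        if (winner == "A" and runs_a < runs_b) or (winner == "B" and runs_b < runs_a):
            upsets += 1
    return upsets / n_series


print(f"{outscored_winner_rate(4.5, 4.0):.1%} of series won by the outscored team")
```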

Now, in principle, non-randomness could either increase or decrease the
frequency of such events.  But all the kinds of non-randomness I have seen
proposed (some players or teams perform better in certain kinds of
situations) will in fact increase this frequency.  So in principle, a
statistical analysis should be able to determine to what extent such
factors are present.

But a fairly large sample is required for such a study.  Even all the World
Series ever played are not nearly enough for a study of runs scored vs. winning
the series to be statistically significant.  They *might* be enough for a
study of OBA+SA vs. runs scored, but they might not.  (The effective sample
size is higher in the latter case, being approximately the number of games
played, whereas in the former it is the number of Series played.)  (It is not
at all clear what number of runs per game to predict from a given OBA+SA; the
prediction of wins from runs is simpler, but also non-trivial.  For hockey,
the last calculation is fairly easy, but runs in baseball are not always scored
one at a time.)  LOOKING JUST AT THREE OR FOUR SERIES IS COMPLETELY
MEANINGLESS.

A complete solution to the expected number of runs scored given certain
probabilities for each event involves solving a system of 24 simultaneous
equations (8 possible states of having runners on base times 3 possible
numbers of outs).  Doing so requires some numbers not generally available,
such as the chance that a runner will advance from first to third on a
single.
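Here is a minimal Python sketch of that 24-equation system, under loudly simplified and hypothetical assumptions: the event probabilities are made-up round numbers rather than real league figures, every runner moves two bases on a single or a double, a DP ball removes the runner on first only when there is one and fewer than two outs, and otherwise outs never move runners. It solves the equations by repeated substitution rather than elimination: after k sweeps, each entry holds the expected runs over the first k plate appearances, so 200 sweeps is effectively exact.

```python
# Hypothetical per-plate-appearance probabilities (illustrative, not league data).
PROBS = {"walk": 0.085, "single": 0.160, "double": 0.045, "triple": 0.007,
         "homer": 0.025, "dp_ball": 0.050, "out": 0.628}
# Hits: (bases every runner moves, base the batter reaches; 4 = home).
# Assumption: runners move two bases on a single or a double.
HITS = {"single": (2, 1), "double": (2, 2), "triple": (3, 3), "homer": (4, 4)}


def walk(bases):
    """Bitmask bases (1 = first, 2 = second, 4 = third); force runners ahead."""
    if not bases & 1:
        return bases | 1, 0
    if not bases & 2:
        return bases | 3, 0
    if not bases & 4:
        return 7, 0
    return 7, 1  # bases loaded: a run is forced in


def advance(bases, runner_move, batter_base):
    """Move every runner ahead `runner_move` bases, then place the batter."""
    new, runs = 0, 0
    for i in range(3):
        if bases & (1 << i):
            dest = i + 1 + runner_move
            if dest >= 4:
                runs += 1
            else:
                new |= 1 << (dest - 1)
    if batter_base >= 4:
        return new, runs + 1
    return new | (1 << (batter_base - 1)), runs


def step(outs, bases, event):
    """Return (outs, bases, runs) after one plate appearance."""
    if event == "walk":
        return (outs, *walk(bases))
    if event in HITS:
        return (outs, *advance(bases, *HITS[event]))
    if event == "dp_ball" and bases & 1 and outs < 2:
        return outs + 2, bases & 0b110, 0  # runner on first and batter both out
    return outs + 1, bases, 0  # plain out, or a DP ball with no DP possible


def expected_runs(probs, sweeps=200):
    """Fixed-point solve of E[o, b] = sum_e p_e * (runs + E[next state])."""
    E = {(o, b): 0.0 for o in range(3) for b in range(8)}
    for _ in range(sweeps):
        new_E = {}
        for o, b in E:
            total = 0.0
            for event, p in probs.items():
                o2, b2, runs = step(o, b, event)
                total += p * (runs + (E[(o2, b2)] if o2 < 3 else 0.0))
            new_E[(o, b)] = total
        E = new_E
    return E


E = expected_runs(PROBS)
print(f"expected runs, none out, bases empty:  {E[(0, 0)]:.3f}")
print(f"expected runs, none out, bases loaded: {E[(0, 7)]:.3f}")
```

Real figures for things like a runner going from first to third on a single would replace the fixed advancement rule in `HITS`.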

*** begin digression ***

I did this a few years ago, using typical major league numbers
for available statistics, and guessing at those that were unavailable.
By taking derivatives, one can get estimates of the values of each possible
result in the context of a typical offense.
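The "taking derivatives" step can be sketched numerically. The Python below is an illustration in the spirit of the post, not a reconstruction of the original computation: it reuses the same hypothetical probabilities and simplified advancement rules as a 24-state model, shifts a tiny slice of probability from plain outs to each event, and measures the change in expected runs per inning (a finite-difference derivative). An out is therefore worth 0 by construction, and the results are rescaled so a single is worth 10.

```python
# Hypothetical per-plate-appearance probabilities (illustrative, not league data).
PROBS = {"walk": 0.085, "single": 0.160, "double": 0.045, "triple": 0.007,
         "homer": 0.025, "dp_ball": 0.050, "out": 0.628}
# Hits: (bases every runner moves, base the batter reaches; 4 = home).
HITS = {"single": (2, 1), "double": (2, 2), "triple": (3, 3), "homer": (4, 4)}


def walk(bases):
    """Bitmask bases (1 = first, 2 = second, 4 = third); force runners ahead."""
    if not bases & 1:
        return bases | 1, 0
    if not bases & 2:
        return bases | 3, 0
    if not bases & 4:
        return 7, 0
    return 7, 1


def advance(bases, runner_move, batter_base):
    """Move every runner ahead `runner_move` bases, then place the batter."""
    new, runs = 0, 0
    for i in range(3):
        if bases & (1 << i):
            dest = i + 1 + runner_move
            if dest >= 4:
                runs += 1
            else:
                new |= 1 << (dest - 1)
    if batter_base >= 4:
        return new, runs + 1
    return new | (1 << (batter_base - 1)), runs


def step(outs, bases, event):
    """Return (outs, bases, runs) after one plate appearance."""
    if event == "walk":
        return (outs, *walk(bases))
    if event in HITS:
        return (outs, *advance(bases, *HITS[event]))
    if event == "dp_ball" and bases & 1 and outs < 2:
        return outs + 2, bases & 0b110, 0
    return outs + 1, bases, 0


def runs_per_inning(probs, sweeps=200):
    """Expected runs from (0 out, bases empty) via the 24-state equations."""
    E = {(o, b): 0.0 for o in range(3) for b in range(8)}
    for _ in range(sweeps):
        new_E = {}
        for o, b in E:
            total = 0.0
            for event, p in probs.items():
                o2, b2, runs = step(o, b, event)
                total += p * (runs + (E[(o2, b2)] if o2 < 3 else 0.0))
            new_E[(o, b)] = total
        E = new_E
    return E[(0, 0)]


def event_values(probs, eps=1e-4):
    """Finite-difference d(runs per inning)/d(p_event), with the extra
    probability taken from plain outs."""
    base = runs_per_inning(probs)
    values = {}
    for event in probs:
        if event == "out":
            continue  # the baseline event
        bumped = dict(probs)
        bumped[event] += eps
        bumped["out"] -= eps
        values[event] = (runs_per_inning(bumped) - base) / eps
    return values


values = event_values(PROBS)
scale = 10 / values["single"]  # rescale so a single is worth 10
for event, v in values.items():
    print(f"{event:8s} {v * scale:6.1f}")
```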

The raw results of this computation are not currently available to me.  I do
remember some scaled and rounded results, which are as follows:

Walk:    8
Single: 10
Double: 14
Triple: 17
Homer:  22
DP ball:-1
Out:     0

(A DP ball is a ball which will result in a DP if there is a runner on first
and zero or one out.  Otherwise it is a ground out.  An "Out" is an out which
does not change the positions of base runners.)

By scaled, I mean that if you multiply the frequency with which a batter does
each of these things (as well as others, e.g., hitting a possible sacrifice fly)
by the factor above, add up the products, multiply the sum by an appropriate
constant, and add (actually subtract) an appropriate constant, you get an
estimate of how many runs that batter will produce per game.
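The scaling just described is a weighted sum followed by a linear transform. The Python sketch below uses the rounded weights from the table; the two "appropriate constants" are not given in the post, so they appear only as parameters (`scale`, `offset`), and the batter's event frequencies are made up for illustration. Events without a listed factor, such as a possible sacrifice fly, are simply left out.

```python
# Scaled, rounded weights as listed in the post.
WEIGHTS = {"walk": 8, "single": 10, "double": 14, "triple": 17,
           "homer": 22, "dp_ball": -1, "out": 0}


def raw_score(freqs):
    """Sum of (frequency per plate appearance x weight) over the listed events.
    Events not in WEIGHTS are ignored, since the post gives no factor for them."""
    return sum(freqs.get(event, 0.0) * w for event, w in WEIGHTS.items())


def runs_per_game(freqs, scale, offset):
    """Linear transform to runs per game; `scale` and `offset` stand in for
    the post's unstated 'appropriate constants'."""
    return scale * raw_score(freqs) - offset


# Hypothetical batter: event frequencies per plate appearance (illustrative only).
batter = {"walk": 0.09, "single": 0.17, "double": 0.05, "triple": 0.005,
          "homer": 0.03, "dp_ball": 0.04, "out": 0.615}
print(f"raw weighted score per PA: {raw_score(batter):.3f}")
```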

Note that this is approximated fairly well by 2*SA+3*OBA (with scaling),
except that walks are underestimated thereby.  A better approximation is
2*SA+4*OBA-BA.
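The comparison can be made explicit with a little arithmetic. The Python sketch below works out the weight each event receives in a*SA + b*OBA + c*BA, under the simplifying assumption (ours, not the post's) that at-bats and plate appearances are close enough to treat the three averages as sharing one denominator. It then rescales so a single is worth 10, matching the table. Under that assumption, 2*SA+3*OBA gives a walk 6 instead of 8, while 2*SA+4*OBA-BA gives it 8; both give a triple 18 instead of 17.

```python
from fractions import Fraction

TOTAL_BASES = {"walk": 0, "single": 1, "double": 2, "triple": 3, "homer": 4}
TABLE = {"walk": 8, "single": 10, "double": 14, "triple": 17, "homer": 22}


def implied_weights(a_sa, b_oba, c_ba):
    """Weight of each event in a*SA + b*OBA + c*BA, assuming AB ~= PA so the
    three averages share a denominator; scaled so that a single = 10."""
    raw = {}
    for event, tb in TOTAL_BASES.items():
        hit = 0 if event == "walk" else 1  # a walk reaches base but is not a hit
        raw[event] = a_sa * tb + b_oba + c_ba * hit
    scale = Fraction(10, raw["single"])
    return {event: raw[event] * scale for event in raw}


for label, coeffs in (("2*SA+3*OBA", (2, 3, 0)), ("2*SA+4*OBA-BA", (2, 4, -1))):
    weights = implied_weights(*coeffs)
    print(label, {e: float(v) for e, v in weights.items()}, "vs", TABLE)
```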

*** end digression ***

This method could be expanded on a bit to compute standard deviations in
number of expected runs for an offense, as well as the means.  It would be
interesting to see such a study done for the entire history of the World
Series, comparing expected and actual runs scored.
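Short of solving for the variance exactly, a Monte Carlo run over the same kind of model gives both the mean and the standard deviation of runs per game. The Python sketch below again uses hypothetical event probabilities and the simplified advancement rules of a 24-state model, treats every batter as identical, and plays nine full innings per game (ignoring, for instance, a home team that does not bat in the ninth).

```python
import itertools
import random
import statistics

# Hypothetical per-plate-appearance probabilities (illustrative, not league data).
PROBS = {"walk": 0.085, "single": 0.160, "double": 0.045, "triple": 0.007,
         "homer": 0.025, "dp_ball": 0.050, "out": 0.628}
# Hits: (bases every runner moves, base the batter reaches; 4 = home).
HITS = {"single": (2, 1), "double": (2, 2), "triple": (3, 3), "homer": (4, 4)}


def walk(bases):
    """Bitmask bases (1 = first, 2 = second, 4 = third); force runners ahead."""
    if not bases & 1:
        return bases | 1, 0
    if not bases & 2:
        return bases | 3, 0
    if not bases & 4:
        return 7, 0
    return 7, 1  # bases loaded: a run is forced in


def advance(bases, runner_move, batter_base):
    """Move every runner ahead `runner_move` bases, then place the batter."""
    new, runs = 0, 0
    for i in range(3):
        if bases & (1 << i):
            dest = i + 1 + runner_move
            if dest >= 4:
                runs += 1
            else:
                new |= 1 << (dest - 1)
    if batter_base >= 4:
        return new, runs + 1
    return new | (1 << (batter_base - 1)), runs


def step(outs, bases, event):
    """Return (outs, bases, runs) after one plate appearance."""
    if event == "walk":
        return (outs, *walk(bases))
    if event in HITS:
        return (outs, *advance(bases, *HITS[event]))
    if event == "dp_ball" and bases & 1 and outs < 2:
        return outs + 2, bases & 0b110, 0
    return outs + 1, bases, 0


def simulate_game(rng, probs, innings=9):
    """Runs scored in one simulated game of `innings` complete innings."""
    events = list(probs)
    cum = list(itertools.accumulate(probs.values()))
    runs = 0
    for _ in range(innings):
        outs, bases = 0, 0
        while outs < 3:
            event = rng.choices(events, cum_weights=cum)[0]
            outs, bases, scored = step(outs, bases, event)
            runs += scored
    return runs


rng = random.Random(1985)
games = [simulate_game(rng, PROBS) for _ in range(5000)]
print(f"runs per game: mean {statistics.mean(games):.2f}, "
      f"sd {statistics.stdev(games):.2f}")
```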

Of course, this calculation is still not fully what the "randomness" theory
predicts, since it assumes each player has the same chance of producing
each result.  A more accurate calculation would have 216 equations (24*9),
one for each situation and each hitter.  This still pretends all pitchers are
the same, and ignores pinch-hitting, platooning, and other lineup changes.
It also ignores the different stealing abilities of different runners.

---------------------

There are two established deviations from the randomness theory.  One is
that left-handed batters hit better against right-handed pitchers, and
right-handed batters hit better against left-handed pitchers.  Another is
that batters hit better with runners on base.  The former effect is fairly
significant, and seems to be different for different players.  (So that it
would be more accurate to talk about a player's hitting or pitching ability
vs. lefties and vs. righties separately, rather than together.)  The
latter is comparable in size, perhaps a bit smaller.  I do not believe it
has been established that the effect depends on the individuals, or how
large that effect might be.

It is not yet well established that some players are better in the clutch,
but based on the Elias data, it appears that this is the case.  The size of
this effect appears to be about .020 to .040 (20 to 40 points), measured in
terms of batting average, for the most extreme players.  It would take half a
dozen such players to significantly affect a team's winning probabilities.

This ran on much longer than I intended for it to.  Thank you to those of
you who read it all.

Frank Adams                           ihpn4!philabs!pwa-b!mmintl!franka
Multimate International    52 Oakland Ave North    E. Hartford, CT 06108

dpb@philabs.UUCP (Paul Benjamin) (11/04/85)

> In article <489@philabs.UUCP> dpb@philabs.UUCP (Paul Benjamin) writes:
> >> > This is just a short (thank God!) note on team slugging and
> >> > on-base averages. The Cards did beat LA in those stats, as well
> >> > as on the field, but the opposite is true for KC vs. Tor, and
> >> > so far in the World Series.
> >> > 
> >> >          BA      SA     OBA     SA+OBA
> >> > 
> >> > KC     .225     .366   .294     .660
> >> > Tor    .269     .372   .319     .681
> >> 
> >> > 
> >> > Doesn't look like team OBA+SA is so important, does it?
> >> > 
> >> >                                  Paul Benjamin
> >> 
> >> Two quick observations.
> >> 
> >> 1)   The results of one seven game series are not going to convince
> >>    the average person of anything. You may remember that in the 1960
> >>    W.S. the Yankees outscored the Pirates by about 30 or so runs,
> >>    yet Pittsburgh won it in seven games. By your reasoning we could
> >>    conclude that scoring runs isn't so important.
> >
> >Actually, I would agree that just the total of runs is not important.
> >What counts is when they are scored. The NY-Pitt series you mention is
> >the most extreme example of this. But it is definitely true that the
> >gross total (or differential) of runs is not a strong indicator of 
> >winning games. The same holds in other sports, such as tennis, where it
> >is often the case that the winner has won fewer games, but won more sets.
> >This is particularly true between strong players.
> 
> The relation between statistics like SA+OBA and scoring runs is pretty
> much like that between scoring runs and winning games.  I think we are
> getting close to the meat of the argument here.
> 
> Let's look at the 60 series here.  The question, I believe, is the following:
> (to put it baldly) was the outcome the result of luck or skill?  That is,
> which of the following descriptions of the series is more accurate:
> 
> (1) The Pirates proved themselves the better team by their ability to score
>     runs in clutch situations.  Although the Yankees were better at getting
>     men to cross the plate, Pittsburgh got them when they needed them.
> 
> (2) Although it is hard to tell from such a short series, the Yankees
>     dominance in all statistical departments makes it seem quite likely
>     that they have the better team.  However, the Pirates were fortunate
>     enough to win all the close ones and only lose blowouts, and so won
>     the series.
> 
		...
> 
> Now, I think it is obvious that neither of these extremes is correct.  But
> I think that number 2 is closer to the truth than number 1.
> 
		...
>
> LOOKING JUST AT THREE OR FOUR SERIES IS COMPLETELY
> MEANINGLESS.
> 

I disagree completely. Just consider the numbers posted by Dave Van Handel:

"Also, regarding the (SA+OB) argument, I looked it up for all World Series
from 1940-1981 (the year of my Baseball Encyclopedia).  The results follow:

1940's :  7-3
1950's :  7-3
1960's :  5-5
1970's :  5-5
80 & 81:  0-2
-------------
42 years 24-18

The team with the greater (SA+OB) has won 24/42 of the series.  I was
very surprised that it wasn't 30 or 35/42.

It *appears* that since the return of stolen bases and the advent of
relief pitchers, (SA+OB) is no longer a good indicator of winning.
The verdict is still out on whether or not it is a good indicator of
run production.

Dave Van Handel"

In recent years (last 20+), SA+OBA has been totally independent of winning. So
much for the luck theory. This considers much more than 3 or 4 series. Also
note the circular nature of your argument #2. You state that the Yankees
dominated in all statistical departments. This applies only to those stats
in which the Yankees dominated! They did not dominate in such stats as
hitting with men in scoring position with the score tied. When you realize
this, then you see the circular nature of these arguments with statistics.
If you compute only certain statistics, then you can always explain away
contrary results as "luck". But you can always compute new stats that
match the results perfectly, i.e., you can retrofit the stats to the data.
This points out the futility of statistical arguments.

The only thing we can say with certainty is that SA+OBA clearly does not
correlate with winning a short series in the last 20 or so years (since
artificial turf, night baseball, etc.). This casts doubt on its importance
in evaluating players and their contributions to their teams.

					Paul Benjamin

franka@mmintl.UUCP (Frank Adams) (11/05/85)

In article <495@philabs.UUCP> dpb@philabs.UUCP (Paul Benjamin) writes:
>consider the numbers posted by Dave Van Handel:
>
>"Also, regarding the (SA+OB) argument, I looked it up for all World Series
>from 1940-1981 (the year of my Baseball Encyclopedia).  The results follow:
>
>>1940's :  7-3
>>1950's :  7-3
>>1960's :  5-5
>>1970's :  5-5
>>80 & 81:  0-2
>>-------------
>>42 years 24-18
>>
>
>In recent years (last 20+), SA+OBA has been totally independent of winning. So
>much for the luck theory. This considers much more than 3 or 4 series.

Did you read the rest of what I wrote?  42 series aren't statistically
significant either.  It takes hundreds.

>Also
>note the circular nature of your argument #2. You state that the Yankees
>dominated in all statistical departments. This applies only to those stats
>in which the Yankees dominated! They did not dominate in such stats as
>hitting with men in scoring position with the score tied. When you realize
>this, then you see the circular nature of these arguments with statistics.
>If you compute only certain statistics, then you can always explain away
>contrary results as "luck". But you can always compute new stats that
>match the results perfectly, i.e., you can retrofit the stats to the data.
>This points out the futility of statistical arguments.

But batting average, slugging average, on base average, earned run average,
and runs scored weren't retrofitted to the data.  These are standard
statistics which are generally applied.  Since the measures are pre-selected,
the argument is not circular.

I tried to give some suggestions about how one could actually measure luck
vs. clutch hitting and related factors.  Until such measures are actually
made, we can only speculate.  Arguments from insufficient data can only
confuse the issue.

As to luck, do you really think that if the Yankees and Pirates in 1960 had
taken a few days off, then played another series just as important as the
first, the Yankees could be expected to outscore the Pirates but lose
the series?

>The only thing we can say with certainty is that SA+OBA clearly does not
>correlate with winning a short series in the last 20 or so years (since
>artificial turf, night baseball, etc.).

The only thing we can say with certainty is that we don't know.

Frank Adams                           ihpn4!philabs!pwa-b!mmintl!franka
Multimate International    52 Oakland Ave North    E. Hartford, CT 06108

dpb@philabs.UUCP (Paul Benjamin) (11/07/85)

> 
> In article <495@philabs.UUCP> dpb@philabs.UUCP (Paul Benjamin) writes:
> >consider the numbers posted by Dave Van Handel:
> >
> >"Also, regarding the (SA+OB) argument, I looked it up for all World Series
> >from 1940-1981 (the year of my Baseball Encyclopedia).  The results follow:
> >
> >>1940's :  7-3
> >>1950's :  7-3
> >>1960's :  5-5
> >>1970's :  5-5
> >>80 & 81:  0-2
> >>-------------
> >>42 years 24-18
> >>
> >
> >In recent years (last 20+), SA+OBA has been totally independent of winning. So
> >much for the luck theory. This considers much more than 3 or 4 series.
> 
> Did you read the rest of what I wrote?  42 series aren't statistically
> significant either.  It takes hundreds.

42 Series aren't significant?!?! That's over an entire season's worth of
games! Perhaps you should look up the definition of statistically
significant. If we ignore these stats, we might as well ignore all season
stats.

> >Also
> >note the circular nature of your argument #2. You state that the Yankees
> >dominated in all statistical departments. This applies only to those stats
> >in which the Yankees dominated! They did not dominate in such stats as
> >hitting with men in scoring position with the score tied. When you realize
> >this, then you see the circular nature of these arguments with statistics.
> >If you compute only certain statistics, then you can always explain away
> >contrary results as "luck". But you can always compute new stats that
> >match the results perfectly, i.e., you can retrofit the stats to the data.
> >This points out the futility of statistical arguments.
> 
> But batting average, slugging average, on base average, earned run average,
> and runs scored weren't retrofitted to the data.  These are standard
> statistics which are generally applied.  Since the measures are pre-selected,
> the argument is not circular.

Think again. They dominated only in the stats in which they dominated. Also
please note that those "standard" stats are highly redundant - they all
are different ways of saying similar things. For example, team runs and the
opposing team's ERA are very similar. And note that there are stats in which
the Pirates led, such as game-winning RBI.

Also realize that these statistical categories were not handed down by
God. They arose because they were retrofitted at one time to previous data.
Thus, they were never pre-selected. BA and ERA did not exist before
baseball!

> >The only thing we can say with certainty is that SA+OBA clearly does not
> >correlate with winning a short series in the last 20 or so years (since
> >artificial turf, night baseball, etc.).
> 
> The only thing we can say with certainty is that we don't know.

No. We DO know that SA+OBA does not correlate with winning a short series
in the last 20 years or so, which is EXACTLY what I said. Thus, what
evidence there is does NOT support the position that SA+OBA is a
great statistic. Note that I have been very careful here. I have NOT
said that SA+OBA has been proven to be bad, nor that it may not be a
useful stat at times. All I said in the original posting is that the
evidence does not support those who worship this stat. If you have been
following the discussion over this stat, then you know that some people
feel it is the best stat available. Showing that the evidence does not
support this in series play is sufficient to discredit this.

					Paul Benjamin

bob@pedsgd.UUCP (Robert A. Weiler) (11/11/85)

Organization : Perkin-Elmer DSG, Tinton Falls NJ

In article <778@mmintl.UUCP> franka@mmintl.UUCP (Frank Adams) writes:
{the continuing saga of OBA + SA versus gut feeling in predicting runs scored}
>
>Did you read the rest of what I wrote?  42 series aren't statistically
>significant either.  It takes hundreds.
>
{ bunch of stuff deleted }
>
{ >> = P. Benjamin ? }
>>The only thing we can say with certainty is that SA+OBA clearly does not
>>correlate with winning a short series in the last 20 or so years (since
>>artificial turf, night baseball, etc.).
>
>The only thing we can say with certainty is that we don't know.
>
>Frank Adams                           ihpn4!philabs!pwa-b!mmintl!franka
>Multimate International    52 Oakland Ave North    E. Hartford, CT 06108

I have to agree with Frank on this: there just isn't enough data to draw any
conclusions. However, I think we can speculate that SA+OBA is irrelevant
in modern series play due to current pitching practice. It seems logical
that, in an era when starting pitchers worked every 3 days and usually went
the distance, the team with the better offensive statistics was probably
the winner. The deviation in pitching quality was small. In modern baseball
the situation is quite different. It is entirely possible that a team with
2 excellent starters and an otherwise mediocre pitching staff could win
4 low scoring games and get blown out in the other 3.

None of this has any bearing on whether OBA + SA is a good statistic for
rating an individual's contribution to his team. In fact, when I looked
at the final statistics for the year, OBA + SA DID correlate very
strongly with my gut feeling as to how important individuals were to their
team.

As an aside to Paul B. who wondered some months ago why Hubie Brooks had
about the same number of RBI's as G. Carter despite a 10% worse team
batting average, Brooks also had about 10% more at bats. In addition,
Montreal has perhaps the best lead-off man in the NL, Tim Raines, who
along with stealing 50 bases has, surprise, one of the best OBA's in
the league. So what we see in this case is exactly what D. Rubin has
claimed: RBI's are influenced by lineup effects, OBA and SA are not.

Just trying to generate some heat for the winter.

Bob Weiler

franka@mmintl.UUCP (Frank Adams) (11/15/85)

In article <500@philabs.UUCP> dpb@philabs.UUCP (Paul Benjamin) writes:
>> >"Also, regarding the (SA+OB) argument, I looked it up for all World Series
>> >from 1940-1981 (the year of my Baseball Encyclopedia).  The results follow:
>> >
>> >>1940's :  7-3
>> >>1950's :  7-3
>> >>1960's :  5-5
>> >>1970's :  5-5
>> >>80 & 81:  0-2
>> >>-------------
>> >>42 years 24-18
>>
>> Did you read the rest of what I wrote?  42 series aren't statistically
>> significant either.  It takes hundreds.
>
>42 Series aren't significant?!?! That's over an entire season's worth of
>games! Perhaps you should look up the definition of statistically
>significant. If we ignore these stats, we might as well ignore all season
>stats.

If you are looking only at who wins the series, you only have 42 cases.  If
you want the results to reflect the number of games, you have to have the
statistics by game, not by series.

Also, the statistics for the 42 series *do* tend to support the importance
of the statistic.  Not as strongly as I would have expected, but well
within the normal range of variation.  If the expected number is 30 out
of 42, the standard deviation is about 2.9.  Thus 24 is not much more than
two standard deviations away.  About a one in twenty shot.
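The figures in this paragraph follow from the binomial distribution, taking the 30-of-42 expectation as the assumption it is stated to be. The Python sketch below computes the standard deviation (about 2.93, in line with the 2.9 above), how many standard deviations 24 lies below 30, and the exact binomial probability of 24 or fewer, which can be set beside the "one in twenty" estimate.

```python
from math import comb, sqrt


def binomial_sd(n, p):
    """Standard deviation of a Binomial(n, p) count."""
    return sqrt(n * p * (1 - p))


def lower_tail(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n, p, observed = 42, 30 / 42, 24  # assumed expectation: 30 wins out of 42
sd = binomial_sd(n, p)
z = (observed - n * p) / sd
print(f"sd = {sd:.2f}, z = {z:.2f}, "
      f"exact P(X <= {observed}) = {lower_tail(observed, n, p):.3f}")
```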

As for season stats, most of the variation in a player's batting average
from season to season is explainable by statistical fluctuation.
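To put a number on that, here is a Python sketch that treats each at-bat as an independent trial with a fixed chance of a hit (a binomial model, which is itself an assumption). For a hypothetical .280 hitter with 500 at-bats, chance alone spreads a season's average by about .020, and the difference between two such seasons by about .028.

```python
from math import sqrt


def batting_average_sd(true_average, at_bats):
    """Chance spread of one season's average for a hitter with a fixed true rate."""
    return sqrt(true_average * (1 - true_average) / at_bats)


one_season = batting_average_sd(0.280, 500)  # hypothetical .280 hitter, 500 AB
season_gap = sqrt(2) * one_season  # SD of the difference between two seasons
print(f"one season: +/-{one_season:.3f}; season-to-season: +/-{season_gap:.3f}")
```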

>> >Also
>> >note the circular nature of your argument #2. You state that the Yankees
>> >dominated in all statistical departments. This applies only to those stats
>> >in which the Yankees dominated!
>> But batting average, slugging average, on base average, earned run average,
>> and runs scored weren't retrofitted to the data.  These are standard
>> statistics which are generally applied.  Since the measures are pre-
>> selected, the argument is not circular.
>
>Think again. They dominated only in the stats in which they dominated. Also
>please note that those "standard" stats are highly redundant - they all
>are different ways of saying similar things. For example, team runs and the
>opposing team's ERA are very similar. And note that there are stats in which
>the Pirates led, such as game-winning RBI.
>
>Also realize that these statistical categories were not handed down by
>God. They arose because they were retrofitted at one time to previous data.
>Thus, they were never pre-selected. BA and ERA did not exist before
>baseball!

They were pre-selected *for that series*.  That is, they were the established
criteria by which the play in the series would be judged, when it was played.
Game winning RBI, by contrast, is a retro-fit for that series.  (It also
bears such a trivial relationship to winning that one can hardly regard it
as a *predictor* of victory.  Any more than pitcher's win/loss records are.)

>> >The only thing we can say with certainty is that SA+OBA clearly does not
>> >correlate with winning a short series in the last 20 or so years (since
>> >artificial turf, night baseball, etc.).
>> 
>> The only thing we can say with certainty is that we don't know.
>
>No. We DO know that SA+OBA does not correlate with winning a short series
>in the last 20 years or so, which is EXACTLY what I said.

But that data is not statistically significant, so we don't know; which is
EXACTLY what I said.  (By the way, night baseball goes back to the 30's.)

Frank Adams                           ihpn4!philabs!pwa-b!mmintl!franka
Multimate International    52 Oakland Ave North    E. Hartford, CT 06108

dpb@philabs.UUCP (Paul Benjamin) (11/15/85)

> >> 1940's :  7-3
> >> 1950's :  7-3
> >> 1960's :  5-5
> >> 1970's :  5-5
> >> 80 & 81:  0-2
> >> -------------
> >> 42 years 24-18
> >>
> >42 Series aren't significant?!?! That's over an entire season's worth of
> >games! Perhaps you should look up the definition of statistically
> >significant. If we ignore these stats, we might as well ignore all season
> >stats.
> 
> If you are looking only at who wins the series, you only have 42 cases.  If
> you want the results to reflect the number of games, you have to have the
> statistics by game, not by series.
> 
> Also, the statistics for the 42 series *do* tend to support the importance
> of the statistic.  Not as strongly as I would have expected, but well
> within the normal range of variation.  If the expected number is 30 out
> of 42, the standard deviation is about 2.9.  Thus 24 is not much more than
> two standard deviations away.  About a one in twenty shot.

But it's closer to 21 out of 42 than to 30 out of 42. If you really like
statistical arguments, how can you prefer an expectation of 30/42 to 21/42 
unless you are previously biased?
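One way to make this comparison concrete is a likelihood ratio: how much more probable a 24-18 split is if the SA+OBA leader wins half the time than if it wins 30 of 42 on average. The Python sketch below computes it (about 4.5 in favor of the coin-flip rate). The ratio by itself does not pick a winner; turning it into a verdict still requires some prior belief about each hypothesis.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


coin = binom_pmf(24, 42, 0.5)        # SA+OBA leader wins half the time
strong = binom_pmf(24, 42, 30 / 42)  # SA+OBA leader wins 30 of 42 on average
print(f"likelihood ratio (0.5 vs 30/42): {coin / strong:.2f}")
```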

> >> >Also
> >> >note the circular nature of your argument #2. You state that the Yankees
> >> >dominated in all statistical departments. This applies only to those stats
> >> >in which the Yankees dominated!
> >> But batting average, slugging average, on base average, earned run average,
> >> and runs scored weren't retrofitted to the data.  These are standard
> >> statistics which are generally applied.  Since the measures are pre-
> >> selected, the argument is not circular.
> >
> >Think again. They dominated only in the stats in which they dominated. Also
> >please note that those "standard" stats are highly redundant - they all
> >are different ways of saying similar things. For example, team runs and the
> >opposing team's ERA are very similar. And note that there are stats in which
> >the Pirates led, such as game-winning RBI.
> >
> >Also realize that these statistical categories were not handed down by
> >God. They arose because they were retrofitted at one time to previous data.
> >Thus, they were never pre-selected. BA and ERA did not exist before
> >baseball!
> 
> They were pre-selected *for that series*.  That is, they were the established
> criteria by which the play in the series would be judged, when it was played.
> Game winning RBI, by contrast, is a retro-fit for that series.  (It also
> bears such a trivial relationship to winning that one can hardly regard it
> as a *predictor* of victory.  Any more than pitcher's win/loss records are.)

You're missing the point. The '60 Yankees dominated in stats which the
papers find easy to compute from box scores. These stats are highly redundant.
There exist many other stats which could be computed. I am not talking about
game-winning hits. I am referring to things like "BA with men in scoring
position", "BA when your team is losing or tied or 1 run ahead", etc. Stats
like this reduce the impact of blowouts. After all, a HR when your team is
8 runs ahead in the late innings is worth less than a single when the
score is tied. I have always objected, and always will object, to
simple-minded statistics. Your postings reveal that you understand more than a little
about statistics - you know about standard deviations, etc. Why do you
like a simple average like SA+OBA so much? If you were to try to build a
mathematical model of the game, would you include only statistical means,
or would you include more complex statistics? The papers aren't going to
try to compute things like "BA with team behind, tied, or ahead by 1 run"
or a more complicated nonlinear scheme, such as weighting runs by the
probability that the other team will come from behind. Does this mean that
the stats the papers publish are the best?
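A situational split like "BA with team behind, tied, or ahead by 1 run" can be sketched as a filter over at-bat records. This is a hypothetical illustration. The `AtBat` record, its `run_diff` field (the batter's team's score minus the opponent's, before the at-bat), and the sample data are all invented. The sketch also ignores the inning, so it only approximates the "late innings" case. The made-up log has four hits in eight at-bats, but three of those hits come with a lead of eight or more. As a result, the close-game average drops from .500 to .200.

```python
from dataclasses import dataclass

@dataclass
class AtBat:
    hit: bool
    run_diff: int  # batter's team score minus opponent's, before the at-bat

def batting_average(at_bats: list[AtBat]) -> float:
    """Hits divided by at-bats; 0.0 for an empty list."""
    return sum(ab.hit for ab in at_bats) / len(at_bats) if at_bats else 0.0

def close_game_average(at_bats: list[AtBat]) -> float:
    """BA when the batter's team is behind, tied, or ahead by one run."""
    return batting_average([ab for ab in at_bats if ab.run_diff <= 1])

# Hypothetical at-bats: three hits come with a big lead, one in a tie game.
log = [
    AtBat(True, 8), AtBat(True, 8), AtBat(True, 9),
    AtBat(True, 0), AtBat(False, 0), AtBat(False, -1),
    AtBat(False, 1), AtBat(False, -2),
]
print(f"overall {batting_average(log):.3f}, close games {close_game_average(log):.3f}")
```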

> >> >The only thing we can say with certainty is that SA+OBA clearly does not
> >> >correlate with winning a short series in the last 20 or so years (since
> >> >artificial turf, night baseball, etc.).
> >> 
> >> The only thing we can say with certainty is that we don't know.
> >
> >No. We DO know that SA+OBA does not correlate with winning a short series
> >in the last 20 years or so, which is EXACTLY what I said.
> 
> But that data is not statistically significant, so we don't know; which is
> EXACTLY what I said.  (By the way, night baseball goes back to the 30's.)

But what you said was in response to my statement that the correlation does
not exist. EXACTLY what I said is "SA+OBA clearly does not correlate with
winning a short series in the last 20 or so years." I did not state a
negative correlation. I stated that the correlation doesn't exist for those
20 years. It doesn't matter whether the data is significant or not! If the
data were insignificant, an existing correlation could be put in doubt,
but since the correlation does not exist, there is no evidence to
support SA+OBA from short series results. Again, I have not stated that
this disproves the importance of SA+OBA, I have only said that it means that
there is no evidence to support SA+OBA. That is all I have to show. Those
who wish to proclaim the importance of a stat must provide evidence for it.
In a sense, we are both right, because we are saying different things. I
am saying that there is no evidence for SA+OBA from recent short series
results, and you are saying that there aren't enough data points to make
any evidence either way - which still means that there is no correlation,
based upon the data, to support SA+OBA.

					Paul Benjamin

franka@mmintl.UUCP (Frank Adams) (11/18/85)

In article <513@philabs.UUCP> dpb@philabs.UUCP (Paul Benjamin) writes:
>But it's closer to 21 out of 42 than to 30 out of 42. If you really like
>statistical arguments, how can you prefer an expectation of 30/42 to 21/42 
>unless you are previously biased?

I never claimed to be unbiased.  I claimed that the results are consistent
with my belief.  A statistically insignificant test proves nothing.  If
you do not expect a positive result, it lets you go on not expecting a
positive result.  But don't use that test to back up your argument; it is
irrelevant.

dpb@philabs.UUCP (Paul Benjamin) (11/19/85)

Frank Adams writes:
> 
> In article <513@philabs.UUCP> dpb@philabs.UUCP (Paul Benjamin) writes:
> >But it's closer to 21 out of 42 than to 30 out of 42. If you really like
> >statistical arguments, how can you prefer an expectation of 30/42 to 21/42 
> >unless you are previously biased?
> 
> I never claimed to be unbiased.  

Obviously.

> I claimed that the results are consistent with my belief.  

If you are willing to go enough standard deviations away, ANY results are
consistent with any belief. Just the use of the word "belief" illustrates
the difference between our approaches to statistics. You choose those that
back up your preconceptions; I try to fit my ideas as closely as possible
to the observations.

> A statistically insignificant test proves nothing.  

You haven't shown it is statistically insignificant. You try to draw
such a fine line between evidence supporting my position, and evidence
supporting no one, but you obviously don't apply such rigorous standards
to yourself. Where is your evidence that the results of the last 25 series
do not constitute enough data? And don't forget, we can always go to the
game level. In other words, during the last series, StL won the first
two games in spite of KC having a higher SA+OBA in those two games. Since
the 25 series' results do show a lot of teams with higher SA+OBA losing
the series, there are probably a good deal of games in which the team with
higher SA+OBA lost. The number of games in those 25 series is nearly a
whole season's games, which is unlikely to be insignificant.
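A game-level test of the kind proposed here would tally how often the team with the higher SA+OBA lost each game. The sketch below assumes a hypothetical per-game batting record. It computes slugging as TB / AB and uses the common on-base formula (H + BB + HBP) / (AB + BB + HBP + SF). Both formulas are assumptions about how the averages would be computed. The two sample games are invented and are not taken from any actual Series.

```python
from dataclasses import dataclass

@dataclass
class BattingLine:
    """One team's batting totals for one game (hypothetical record format)."""
    ab: int    # at-bats
    h: int     # hits
    tb: int    # total bases
    bb: int    # walks
    hbp: int   # hit by pitch
    sf: int    # sacrifice flies
    runs: int  # runs scored

def sa_plus_oba(line: BattingLine) -> float:
    slugging = line.tb / line.ab
    on_base = (line.h + line.bb + line.hbp) / (line.ab + line.bb + line.hbp + line.sf)
    return slugging + on_base

def count_upsets(games: list[tuple[BattingLine, BattingLine]]) -> int:
    """Count games lost by the team with the higher SA+OBA (equal SA+OBA is skipped)."""
    upsets = 0
    for a, b in games:
        score_a, score_b = sa_plus_oba(a), sa_plus_oba(b)
        if score_a == score_b:
            continue
        leader, other = (a, b) if score_a > score_b else (b, a)
        if leader.runs < other.runs:
            upsets += 1
    return upsets

# Invented example: the first team leads in SA+OBA both times but loses game one.
games = [
    (BattingLine(30, 8, 12, 3, 0, 1, 2), BattingLine(32, 7, 10, 2, 1, 0, 4)),
    (BattingLine(34, 11, 17, 4, 0, 0, 5), BattingLine(31, 6, 8, 1, 0, 1, 1)),
]
print(count_upsets(games), "of", len(games))
```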

> If you do not expect a positive result, it lets you go on not expecting a
> positive result.  But don't use that test to back up your argument; it is
> irrelevant.

Not if my argument is that those who expect a positive result have no
evidence. Do you remember the original postings, in which David Rubin
posted screen after screen of numbers which supposedly showed that
one player was superior to another on the basis of SA+OBA? My argument
is that there is no evidence to support SA+OBA as such an important
determinant of quality. The lack of a positive result in the last 25
World Series IS evidence to support my position.

Since you seem to be having trouble grasping this logical concept, let's
make an analogy. Suppose someone were to try to convince you that aliens
have been visiting the earth in flying saucers. Now, we have NO known
verified sightings of saucers (fragments in museums, etc.), so there
is no positive evidence. This does not constitute negative evidence,
i.e., we have no evidence that there haven't been flying saucers, just a
lack of positive evidence. You are stating that this means it is OK to
believe in flying saucers, since "we don't know." I am stating that it
is not OK to believe in them, and particularly not OK to base decisions
on this belief, since there is no positive evidence for that belief.

If you were trying to program a computer to think, e.g., to be able to
examine data and form hypotheses, would you want it to form conclusions
which had no evidence backing them up?

					Paul Benjamin

franka@mmintl.UUCP (Frank Adams) (11/22/85)

In article <516@philabs.UUCP> dpb@philabs.UUCP (Paul Benjamin) writes:
>> A statistically insignificant test proves nothing.  [Me]
>
>You haven't shown it is statistically insignificant. You try to draw
>such a fine line between evidence supporting my position, and evidence
>supporting noone, but you obviously don't apply such rigorous standards
>to yourself. Where is your evidence that the results of the last 25 series
>do not constitute enough data? And don't forget, we can always go to the
>game level. In other words, during the last series, StL won the first
>two games in spite of KC having a higher SA+OBA in those two games. Since
>the 25 series' results do show a lot of teams with higher SA+OBA losing
>the series, there are probably a good deal of games in which the team with
>higher SA+OBA lost. The number of games in those 25 series is nearly a
>whole season's games, which is unlikely to be insignificant.

Sorry, if you want to argue from the game by game evidence, you will have to
actually present it.  The inference from series to games doesn't wash -- the
statistical validity of the sample depends on the size of the sample, not
on the size of any underlying statistic.

>> If you do not expect a positive result, it lets you go on not expecting a
>> positive result.  But don't use that test to back up your argument; it is
>> irrelevant.
>
>Not if my argument is that those who expect a positive result have no
>evidence. Do you remember the original postings, in which David Rubin
>posted screen after screen of numbers which supposedly showed that
>one player was superior to another on the basis of SA+OBA? My argument
>is that there is no evidence to support SA+OBA as such an important
>determinant of quality. The lack of a positive result in the last 25
>World Series IS evidence to support my position.

No, because the evidence for the use of the statistic *isn't statistical*.
It is based on theoretical analysis, based on the assumption that situational
variations in performance are mostly random; that any personal differences
from this (clutch hitters, etc.) are small if they exist at all.  *This is
a reasonable, although unproven, assumption.*  Reference to statistical
results is made primarily to show that they do not invalidate the hypothesis.
Pointing out that those statistics don't support the conclusion is no reason
to reject it, because there are other reasons for believing it.  There are,
by contrast, no reasons for believing in aliens in flying saucers.

I note that although the Elias Sports Bureau did find statistically
significant differences in clutch performance, those differences are not
large enough to invalidate the analysis.

dpb@philabs.UUCP (Paul Benjamin) (12/05/85)

> Sorry, if you want to argue from the game by game evidence, you will have to
> actually present it.  The inference from series to games doesn't wash -- the
> statistical validity of the sample depends on the size of the sample, not
> on the size of any underlying statistic.

I will present it. It's interesting that I have to present evidence to
argue from it, whereas you state below that YOUR point of view is reasonable
though unproven. Double standard, eh? No wonder I can't take your arguments
seriously.

> >> If you do not expect a positive result, it lets you go on not expecting a
> >> positive result.  But don't use that test to back up your argument; it is
> >> irrelevant.
> >
> >Not if my argument is that those who expect a positive result have no
> >evidence. Do you remember the original postings, in which David Rubin
> >posted screen after screen of numbers which supposedly showed that
> >one player was superior to another on the basis of SA+OBA? My argument
> >is that there is no evidence to support SA+OBA as such an important
> >determinant of quality. The lack of a positive result in the last 25
> >World Series IS evidence to support my position.
> 
> No, because the evidence for the use of the statistic *isn't statistical*.
> It is based on theoretical analysis, based on the assumption that situational
> variations in performance are mostly random; that any personal differences
> from this (clutch hitters, etc.) are small if they exist at all.  *This is
> a reasonable, although unproven, assumption.*  Reference to statistical
> results is made primarily to show that they do not invalidate the hypothesis.
> Pointing out that those statistics don't support the conclusion is no reason
> to reject it, because there are other reasons for believing it.  There are,
> by contrast, no reasons for believing in aliens in flying saucers.

Take a basic logic course! The evidence for the use of the statistic
isn't statistical?!?! It's based on theoretical arguments?!?!?!?! WHAT
theoretical arguments? I have yet to see a mathematical model of the
game of baseball that WORKS, i.e., that predicts winners. Don't expect
a mathematician to swallow handwaving arguments.

And you even invalidate your own arguments! How can I take this
seriously? (I don't.) You invalidate your argument by stating "It is based
on the assumption that situational variations in performance are mostly
random". This is what I have been disagreeing with all along! If you 
are going to assume that I am wrong, then of course it is easy to 
show that I am wrong! What DRIVEL!!!

franka@mmintl.UUCP (Frank Adams) (12/08/85)

In article <528@philabs.UUCP> dpb@philabs.UUCP (Paul Benjamin) writes:
>> Sorry, if you want to argue from the game by game evidence, you will have to
>> actually present it.  The inference from series to games doesn't wash -- the
>> statistical validity of the sample depends on the size of the sample, not
>> on the size of any underlying statistic.
>
>I will present it. It's interesting that I have to present evidence to
>argue from it, whereas you state below that YOUR point of view is reasonable
>though unproven. Double standard, eh? No wonder I can't take your arguments
>seriously.

No, your point of view IS reasonable though unproven.  I have never presented
evidence which claims to refute it.  You HAVE presented evidence which claims
to refute my point of view, but doesn't.  Until better studies are done,
we won't really know.

>Take a basic logic course! The evidence for the use of the statistic
>isn't statistical?!?! It's based on theoretical arguments?!?!?!?! WHAT
>theoretical arguments? I have yet to see a mathematical model of the
>game of baseball that WORKS, i.e., that predicts winners. Don't expect
>a mathematician to swallow handwaving arguments.

Your condition for a model that "WORKS" presupposes your conclusion.  If
my position is correct, the winner of a seven game playoff is essentially
a 50-50 proposition.  Which of several good teams actually wins a pennant
race is mostly random.  You CAN'T predict winners.
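The claim that a seven-game playoff is close to a coin flip can be illustrated by converting a per-game edge into a series probability. This sketch assumes that every game is independent and that one team wins each game with the same probability p. It ignores home field and pitching rotations. Under those assumptions, a 55 percent per-game edge becomes only about a 61 percent series edge, and a 60 percent per-game edge yields about 71 percent.

```python
from math import comb

def best_of_seven(p: float) -> float:
    """Probability of winning a best-of-seven series, assuming each game is
    won independently with the same probability p."""
    q = 1 - p
    # Win the fourth game after losing exactly k of the first 3 + k games.
    return sum(comb(3 + k, k) * p**4 * q**k for k in range(4))

for p in (0.50, 0.55, 0.60):
    print(f"per-game {p:.2f} -> series {best_of_seven(p):.3f}")
```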

I would say, if anything, the absence of such a model supports my position.
(But not strongly enough that I advance this as a serious argument.)

By the way, I am a mathematician.

>And you even invalidate your own arguments! How can I take this
>seriously? (I don't.) You invalidate your argument by stating "It is based
>on the assumption that situational variations in performance is mostly
>random". This is what I have been disagreeing with all along! If you 
>are going to assume that I am wrong, then of course it is easy to 
>show that I am wrong! What DRIVEL!!!

OK.  This is what we are disagreeing about.  If you want to present
statistical evidence for your point of view, feel free to do so.  But
don't present non-evidence and claim it is evidence.

The only evidence I have presented for my position is my claim that
my assumption accounts for the variation in results.  I will admit
that I have no hard statistics to support this point of view.

I am getting tired of this.  Unless you have something new to say, I will
not make any further postings on the subject.
