dpb@philabs.UUCP (Paul Benjamin) (10/23/85)
This is just a short (thank God!) note on team slugging and on-base averages. The Cards did beat LA in those stats, as well as on the field, but the opposite is true for KC vs. Tor, and so far in the World Series.

        BA    SA    OBA   SA+OBA

KC     .225  .366  .294  .660
Tor    .269  .372  .319  .681

After two games of the Series, we have:

KC     .270  .393  .343  .736
StL    .203  .313  .282  .595

These are based on data in the NY Times of 10/22.

By the way, KC has outstolen StL 2 to 1! But there is another speed-related stat that is VERY important - StL has turned 4 DPs, KC has turned none. That sure does cancel that edge in OBA. It actually cancels the edge in BA, which is reflected twice - in both OBA and SA, thus cancelling the entire difference in SA+OBA! (The two teams really have the same number of extra bases (7) so that the difference in SA+OBA is just about twice the difference in BA.)

Also note that KC is doing much better in the Series than in the ALCS, according to this stat. In reality, they are scoring 1.5 runs per game, as compared with 3.71 runs per game in the ALCS.

Doesn't look like team OBA+SA is so important, does it?

Paul Benjamin
abgamble@water.UUCP (Bruce Gamble) (10/27/85)
> This is just a short (thank God!) note on team slugging and
> on-base averages. The Cards did beat LA in those stats, as well
> as on the field, but the opposite is true for KC vs. Tor, and
> so far in the World Series.
>
>         BA    SA    OBA   SA+OBA
>
> KC     .225  .366  .294  .660
> Tor    .269  .372  .319  .681
>
> Doesn't look like team OBA+SA is so important, does it?
>
> Paul Benjamin

Two quick observations.

1) The results of one seven game series are not going to convince the average person of anything. You may remember that in the 1960 W.S. the Yankees outscored the Pirates by about 30 or so runs, yet Pittsburgh won it in seven games. By your reasoning we could conclude that scoring runs isn't so important.

2) I don't believe that anyone has suggested that we should actually look at the sum of OBA and SA (someone please correct me if I'm wrong). I was under the impression that OBA+SA was intended to mean "OBA and SA", not "OBA plus SA". Combining the two into one number loses much of the information that they contain. If anyone insists, however, on combining them into one number, it would make a lot more sense to multiply them rather than add them. This would give a more accurate measure of a player's offensive value.

--
- Bruce Gamble (abgamble@water.UUCP)
dpb@philabs.UUCP (Paul Benjamin) (10/29/85)
> > This is just a short (thank God!) note on team slugging and > > on-base averages. The Cards did beat LA in those stats, as well > > as on the field, but the opposite is true for KC vs. Tor, and > > so far in the World Series. > > > > BA SA OBA SA+OBA > > > > KC .225 .366 .294 .660 > > Tor .269 .372 .319 .681 > > > > > Doesn't look like team OBA+SA is so important, does it? > > > > Paul Benjamin > > Two quick observations. > > 1) The results of one seven game series are not going to convince > the average person of anything. You may remember that in the 1960 > W.S. the Yankees outscored the Pirates by about 30 or so runs, > yet Pittsburgh won it in seven games. By your reasoning we could > conclude that scoring runs isn't so important. Actually, I would agree that just the total of runs is not important. What counts is when they are scored. The NY-Pitt series you mention is the most extreme example of this. But it is definitely true that the gross total (or differential) of runs is not a strong indicator of winning games. The same holds in other sports, such as tennis, where it is often the case that the winner has won fewer games, but won more sets. This is particularly true between strong players. So, no we cannot conclude by my reasoning that scoring runs is not important, because it is awfully hard to win without scoring runs! But we can conclude that just scoring more runs over the course of the series is not important. A 12-1 win is no more important than a 2-1 win. Also note that you omit the stats that I consider most important. The whole point of my posting is that run scoring is not at all dependent on SA+OBA. After all, KC was actually doing better in SA+OBA in the series than against Toronto, but scoring fewer runs. And this is not just the results of one seven-game series. You mention another, so that makes two. It may be the case that there are a number of others (just find a series in which the losing team had a lopsided win.) 
> 2) I don't believe that anyone has suggested that we should actually
> look at the sum of OBA and SA (someone please correct me if I'm
> wrong). I was under the impression that OBA+SA was intended to
> mean "OBA and SA", not "OBA plus SA". Combining the two into one
> number loses much of the information that they contain.
> If anyone insists, however, on combining them into one number, it
> would make a lot more sense to multiply them rather than add them.
> This would give a more accurate measure of a player's offensive
> value.
> - Bruce Gamble (abgamble@water.UUCP)

No. The stat espoused by David Rubin is OBA plus SA. This is the stat printed in the NY Times on occasion.

Paul Benjamin

P.S. There's a typo in my stats. The Tor total is .691, not .681.
franka@mmintl.UUCP (Frank Adams) (11/02/85)
In article <489@philabs.UUCP> dpb@philabs.UUCP (Paul Benjamin) writes: >> > This is just a short (thank God!) note on team slugging and >> > on-base averages. The Cards did beat LA in those stats, as well >> > as on the field, but the opposite is true for KC vs. Tor, and >> > so far in the World Series. >> > >> > BA SA OBA SA+OBA >> > >> > KC .225 .366 .294 .660 >> > Tor .269 .372 .319 .681 >> >> > >> > Doesn't look like team OBA+SA is so important, does it? >> > >> > Paul Benjamin >> >> Two quick observations. >> >> 1) The results of one seven game series are not going to convince >> the average person of anything. You may remember that in the 1960 >> W.S. the Yankees outscored the Pirates by about 30 or so runs, >> yet Pittsburgh won it in seven games. By your reasoning we could >> conclude that scoring runs isn't so important. > >Actually, I would agree that just the total of runs is not important. >What counts is when they are scored. The NY-Pitt series you mention is >the most extreme example of this. But it is definitely true that the >gross total (or differential) of runs is not a strong indicator of >winning games. The same holds in other sports, such as tennis, where it >is often the case that the winner has won fewer games, but won more sets. >This is particularly true between strong players. The relation between statistics like SA+OBA and scoring runs is pretty much like that between scoring runs and winning games. I think we are getting close to the meat of the argument here. Let's look at the 60 series here. The question, I believe, is the following: (to put it baldly) was the outcome the result of luck or skill? That is, which of the following descriptions of the series is more accurate: (1) The Pirates proved themselves the better team by their ability to score runs in clutch situations. Although the Yankees were better at getting men to cross the plate, Pittsburgh got them when they needed them. 
(2) Although it is hard to tell from such a short series, the Yankees dominance in all statistical departments makes it seem quite likely that they have the better team. However, the Pirates were fortunate enough to win all the close ones and only lose blowouts, and so won the series. Let me paint the second picture in more extreme terms. From this point of view, all that matters in each at bat is the talents of the pitcher, the batter, and the fielders. Everything else is randomness. Sometimes the batter gets lucky, sometimes the pitcher does. (I have left base stealing out for simplicity; this point of view would hold that the chances of the runner stealing depend on the runner and the relevant fielders, and that whether the runner attempts a steal or not, and whether he is successful or not, does not affect the batter.) A similarly extreme picture of the first option holds that nothing is random. Everything happens as it must happen, given the players and the situation they are involved in. Now, I think it is obvious that neither of these extremes is correct. But I think that number 2 is closer to the truth than number 1. The main argument for randomness is that it suffices to explain the kind of effects that are being talked about. Some fraction of the time, one team will score more runs than the other, yet lose the series. Some fraction of the time, a team will get more men on base and slug better, yet score fewer runs. These things don't just happen occasionally, either; they are fairly common JUST ON THE ASSUMPTION OF RANDOMNESS. Now, in principle, non-randomness could either increase or decrease the frequency of such events. But all the kinds of non-randomness I have seen proposed (some players or teams perform better in certain kinds of situations) will in fact increase this frequency. So in principle, a statistical analysis should be able to determine to what extent such factors are present. But a fairly large sample is required for such a study. 
All the World Series played are not nearly enough for a study of runs scored vs winning the series to be statistically significant. That *might* be enough for a study of OBA+SA vs. runs scored, but it might not. (The effective sample size is higher in the latter case, being approximately the number of games played, whereas in the former, it is the number of Series played.) (It is not at all clear what number of runs per game to predict from a given OBA+SA; the prediction of wins from runs is simpler, but also non-trivial. For hockey, the last calculation is fairly easy, but runs in baseball are not always scored one at a time.)

LOOKING JUST AT THREE OR FOUR SERIES IS COMPLETELY MEANINGLESS.

A complete solution to the expected number of runs scored given certain probabilities for each event involves solving a system of 24 simultaneous equations (8 possible states of having runners on base times 3 possible numbers of outs). Doing so requires some numbers not generally available, such as the chance that a runner will advance from first to third on a single.

*** begin digression ***

I did this a few years ago, using typical major league numbers for available statistics, and guessing at those that were unavailable. By taking derivatives, one can get estimates of the values of each possible result in the context of a typical offense. The raw results of this computation are not currently available to me. I do remember some scaled and rounded results, which are as follows:

Walk:     8
Single:  10
Double:  14
Triple:  17
Homer:   22
DP ball: -1
Out:      0

(A DP ball is a ball which will result in a DP if there is a runner on first and zero or one out. Otherwise it is a ground out. An "Out" is an out which does not change the positions of base runners.)
By scaled, I mean that if you take the frequency with which a batter does each of these things (as well as others, e.g., hit a possible sacrifice fly) times the factor above, multiply by an appropriate constant, and add (actually subtract) an appropriate constant, you get an estimate of how many runs that batter will produce per game. Note that this is approximated fairly well by 2*SA+3*OBA (with scaling), except that walks are underestimated thereby. A better approximation is 2*SA+4*OBA-BA.

*** end digression ***

This method could be expanded on a bit to compute standard deviations in number of expected runs for an offense, as well as the means. It would be interesting to see such a study done for the entire history of the World Series, comparing expected and actual runs scored.

Of course, this calculation is still not fully what the "randomness" theory predicts, since it assumes each player has the same chance of producing each result. A more accurate calculation would have 216 equations (24*9), for each situation and each hitter. This still pretends all pitchers are the same, and ignores pinch-hitting, platooning, and other lineup changes. It also ignores the different stealing abilities of different runners.

---------------------

There are two established variances from the randomness theory. One is that left-handed batters hit better against right-handed pitchers, and right-handed batters hit better against left-handed pitchers. Another is that batters hit better with runners on base.

The former effect is fairly significant, and seems to be different for different players. (So that it would be more accurate to talk about a player's hitting or pitching ability vs. lefties and vs. righties separately, rather than together.) The latter is comparable in size, perhaps a bit smaller. I do not believe it has been established that the effect depends on the individuals, or how large that effect might be.
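As a rough check on the two approximations given in the digression above, the sketch below works out what weight each formula implicitly assigns to a walk, single, double, triple, and homer. Two things here are assumptions made only for illustration and are not from the post: BA, SA, and OBA are all treated as per-plate-appearance rates (in practice BA and SA are per at-bat), and each formula is multiplied by 2 so that it lines up with the remembered weights. The names `EVENT_STATS` and `implied_weights` are hypothetical.

```python
# Remembered run values per event, as quoted in the post.
POSTED = {"walk": 8, "single": 10, "double": 14, "triple": 17, "homer": 22}

# What one event adds to each rate stat, ASSUMING a common
# per-plate-appearance denominator (a simplification: BA and SA are
# really per at-bat). Tuple order: (BA, SA, OBA).
EVENT_STATS = {
    "walk":   (0, 0, 1),
    "single": (1, 1, 1),
    "double": (1, 2, 1),
    "triple": (1, 3, 1),
    "homer":  (1, 4, 1),
}


def implied_weights(c_ba, c_sa, c_oba, scale=2):
    """Per-event weight implied by c_ba*BA + c_sa*SA + c_oba*OBA, times scale."""
    return {
        event: scale * (c_ba * ba + c_sa * sa + c_oba * oba)
        for event, (ba, sa, oba) in EVENT_STATS.items()
    }


simple = implied_weights(0, 2, 3)    # 2*SA + 3*OBA
better = implied_weights(-1, 2, 4)   # 2*SA + 4*OBA - BA

for event in POSTED:
    print(f"{event:7s} posted={POSTED[event]:3d}  "
          f"2SA+3OBA={simple[event]:3d}  2SA+4OBA-BA={better[event]:3d}")
```

Under these assumptions the first formula gives a walk a weight of 6 against the posted 8, while the second gives 8; both give a triple 18 against the posted 17. That is consistent with the remark that walks are underestimated by 2*SA+3*OBA.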
It is not yet well established that some players are better in the clutch, but based on the Elias data, it appears that this is the case. The size of this effect appears to be about .020 to .040 points, measured in terms of batting average, for the most extreme players. It would take half a dozen such players to significantly affect a team's winning probabilities.

This ran on much longer than I intended for it to. Thank you to those of you who read it all.

Frank Adams    ihpn4!philabs!pwa-b!mmintl!franka
Multimate International    52 Oakland Ave North    E. Hartford, CT 06108
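The 24-equation system mentioned in the post above (8 base states times 3 out counts) can be sketched in a few lines. This is only a toy version under loudly simplifying assumptions that are not from the post: every runner advances exactly as many bases as the batter on a hit, walks move only forced runners, every out leaves the runners where they are, and there are no double plays, steals, or sacrifice flies. The event probabilities in `ILLUSTRATIVE` are made-up round numbers, not real league figures, and the system is solved by repeated sweeps rather than by direct elimination.

```python
from itertools import product

HITS = {"single": 1, "double": 2, "triple": 3, "homer": 4}


def advance(bases, n):
    """Move the batter and every runner n bases; return (new_bases, runs).

    bases is a 3-bit mask: bit 0 = first, bit 1 = second, bit 2 = third.
    ASSUMPTION: every runner advances exactly as far as the batter.
    """
    shifted = ((bases << 1) | 1) << (n - 1)  # batter enters, then all move on
    runs = bin(shifted >> 3).count("1")      # anything past third has scored
    return shifted & 0b111, runs


def walk(bases):
    """Batter takes first; runners advance only if forced."""
    if not (bases & 0b001):
        return bases | 0b001, 0
    if not (bases & 0b010):
        return bases | 0b010, 0
    if not (bases & 0b100):
        return 0b111, 0
    return 0b111, 1                          # bases loaded: a run is forced in


def expected_runs(p, sweeps=500):
    """Expected runs to the end of the half-inning from each (bases, outs).

    p maps 'out', 'walk', 'single', 'double', 'triple', 'homer' to
    probabilities that sum to 1. The 24 linear equations are solved by
    repeated substitution, which converges whenever p['out'] > 0.
    """
    E = {(b, o): 0.0 for b, o in product(range(8), range(3))}
    for _ in range(sweeps):
        for (b, o) in E:
            # An out leaves runners in place; the third out ends the inning.
            total = p["out"] * (E[(b, o + 1)] if o < 2 else 0.0)
            nb, r = walk(b)
            total += p["walk"] * (r + E[(nb, o)])
            for name, n in HITS.items():
                nb, r = advance(b, n)
                total += p[name] * (r + E[(nb, o)])
            E[(b, o)] = total
    return E


# Made-up probabilities for illustration only.
ILLUSTRATIVE = {"out": 0.68, "walk": 0.08, "single": 0.16,
                "double": 0.045, "triple": 0.005, "homer": 0.03}

table = expected_runs(ILLUSTRATIVE)
print(f"expected runs, bases empty, none out: {table[(0, 0)]:.3f}")
```

Changing one probability slightly and re-running gives a numerical version of the "taking derivatives" step described in the post, though the values obtained depend entirely on the advancement assumptions chosen here.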
dpb@philabs.UUCP (Paul Benjamin) (11/04/85)
> In article <489@philabs.UUCP> dpb@philabs.UUCP (Paul Benjamin) writes: > >> > This is just a short (thank God!) note on team slugging and > >> > on-base averages. The Cards did beat LA in those stats, as well > >> > as on the field, but the opposite is true for KC vs. Tor, and > >> > so far in the World Series. > >> > > >> > BA SA OBA SA+OBA > >> > > >> > KC .225 .366 .294 .660 > >> > Tor .269 .372 .319 .681 > >> > >> > > >> > Doesn't look like team OBA+SA is so important, does it? > >> > > >> > Paul Benjamin > >> > >> Two quick observations. > >> > >> 1) The results of one seven game series are not going to convince > >> the average person of anything. You may remember that in the 1960 > >> W.S. the Yankees outscored the Pirates by about 30 or so runs, > >> yet Pittsburgh won it in seven games. By your reasoning we could > >> conclude that scoring runs isn't so important. > > > >Actually, I would agree that just the total of runs is not important. > >What counts is when they are scored. The NY-Pitt series you mention is > >the most extreme example of this. But it is definitely true that the > >gross total (or differential) of runs is not a strong indicator of > >winning games. The same holds in other sports, such as tennis, where it > >is often the case that the winner has won fewer games, but won more sets. > >This is particularly true between strong players. > > The relation between statistics like SA+OBA and scoring runs is pretty > much like that between scoring runs and winning games. I think we are > getting close to the meat of the argument here. > > Let's look at the 60 series here. The question, I believe, is the following: > (to put it baldly) was the outcome the result of luck or skill? That is, > which of the following descriptions of the series is more accurate: > > (1) The Pirates proved themselves the better team by their ability to score > runs in clutch situations. 
> Although the Yankees were better at getting
> men to cross the plate, Pittsburgh got them when they needed them.
>
> (2) Although it is hard to tell from such a short series, the Yankees
> dominance in all statistical departments makes it seem quite likely
> that they have the better team. However, the Pirates were fortunate
> enough to win all the close ones and only lose blowouts, and so won
> the series.
> ...
>
> Now, I think it is obvious that neither of these extremes is correct. But
> I think that number 2 is closer to the truth than number 1.
> ...
>
> LOOKING JUST AT THREE OR FOUR SERIES IS COMPLETELY
> MEANINGLESS.
>

I disagree completely. Just consider the numbers posted by Dave Van Handel:

"Also, regarding the (SA+OB) argument, I looked it up for all World Series from 1940-1981 (the year of my Baseball Encyclopedia). The results follow:

1940's : 7-3
1950's : 7-3
1960's : 5-5
1970's : 5-5
80 & 81: 0-2
-------------
42 years 24-18

The team with the greater (SA+OB) has won 24/42 of the series. I was very surprised that it wasn't 30 or 35/42. It *appears* that since the return of stolen bases and the advent of relief pitchers, (SA+OB) is no longer a good indicator of winning. The verdict is still out on whether or not it is a good indicator of run production.

Dave Van Handel"

In recent years (last 20+), SA+OBA has been totally independent of winning. So much for the luck theory. This considers much more than 3 or 4 series.

Also note the circular nature of your argument #2. You state that the Yankees dominated in all statistical departments. This applies only to those stats in which the Yankees dominated! They did not dominate in such stats as hitting with men in scoring position with the score tied. When you realize this, then you see the circular nature of these arguments with statistics. If you compute only certain statistics, then you can always explain away contrary results as "luck".
But you can always compute new stats that match the results perfectly, i.e., you can retrofit the stats to the data. This points out the futility of statistical arguments. The only thing we can say with certainty is that SA+OBA clearly does not correlate with winning a short series in the last 20 or so years (since artificial turf, night baseball, etc.). This casts doubt on its importance in evaluating players and their contributions to their teams. Paul Benjamin
franka@mmintl.UUCP (Frank Adams) (11/05/85)
In article <495@philabs.UUCP> dpb@philabs.UUCP (Paul Benjamin) writes: >consider the numbers posted by Dave Van Handel: > >"Also, regarding the (SA+OB) argument, I looked it up for all World Series >from 1940-1981 (the year of my Baseball Encyclopedia). The results follow: > >>1940's : 7-3 >>1950's : 7-3 >>1960's : 5-5 >>1970's : 5-5 >>80 & 81: 0-2 >>------------- >>42 years 24-18 >> > >In recent years (last 20+), SA+OBA has been totally independent of winning. So >much for the luck theory. This considers much more than 3 or 4 series. Did you read the rest of what I wrote? 42 series aren't statistically significant either. It takes hundreds. >Also >note the circular nature of your argument #2. You state that the Yankees >dominated in all statistical departments. This applies only to those stats >in which the Yankees dominated! They did not dominate in such stats as >hitting with men in scoring position with the score tied. When you realize >this, then you see the circular nature of these arguments with statistics. >If you compute only certain statistics, then you can always explain away >contrary results as "luck". But you can always compute new stats that >match the results perfectly, i.e., you can retrofit the stats to the data. >This points out the futility of statistical arguments. But batting average, slugging average, on base average, earned run average, and runs scored weren't retrofitted to the data. These are standard statistics which are generally applied. Since the measures are pre-selected, the argument is not circular. I tried to give some suggestions about how one could actually measure luck vs. clutch hitting and related factors. Until such measures are actually made, we can only speculate. Arguments from insufficient data can only confuse the issue. 
As to luck, do you really think that if the Yankees and Pirates in 1960 had taken a few days off, then played another series just as important as the first, that the Yankees could be expected to outscore the Pirates but lose the series? >The only thing we can say with certainty is that SA+OBA clearly does not >correlate with winning a short series in the last 20 or so years (since >artificial turf, night baseball, etc.). The only thing we can say with certainty is that we don't know. Frank Adams ihpn4!philabs!pwa-b!mmintl!franka Multimate International 52 Oakland Ave North E. Hartford, CT 06108
dpb@philabs.UUCP (Paul Benjamin) (11/07/85)
> > In article <495@philabs.UUCP> dpb@philabs.UUCP (Paul Benjamin) writes: > >consider the numbers posted by Dave Van Handel: > > > >"Also, regarding the (SA+OB) argument, I looked it up for all World Series > >from 1940-1981 (the year of my Baseball Encyclopedia). The results follow: > > > >>1940's : 7-3 > >>1950's : 7-3 > >>1960's : 5-5 > >>1970's : 5-5 > >>80 & 81: 0-2 > >>------------- > >>42 years 24-18 > >> > > > >In recent years (last 20+), SA+OBA has been totally independent of winning. So > >much for the luck theory. This considers much more than 3 or 4 series. > > Did you read the rest of what I wrote? 42 series aren't statistically > significant either. It takes hundreds. 42 Series aren't significant?!?! That's over an entire season's worth of games! Perhaps you should look up the definition of statistically significant. If we ignore these stats, we might as well ignore all season stats. > >Also > >note the circular nature of your argument #2. You state that the Yankees > >dominated in all statistical departments. This applies only to those stats > >in which the Yankees dominated! They did not dominate in such stats as > >hitting with men in scoring position with the score tied. When you realize > >this, then you see the circular nature of these arguments with statistics. > >If you compute only certain statistics, then you can always explain away > >contrary results as "luck". But you can always compute new stats that > >match the results perfectly, i.e., you can retrofit the stats to the data. > >This points out the futility of statistical arguments. > > But batting average, slugging average, on base average, earned run average, > and runs scored weren't retrofitted to the data. These are standard > statistics which are generally applied. Since the measures are pre-selected, > the argument is not circular. Think again. They dominated only in the stats in which they dominated. 
Also please note that those "standard" stats are highly redundant - they all are different ways of saying similar things. For example, team runs and the opposing team's ERA are very similar. And note that there are stats in which the Pirates led, such as game-winning RBI. Also realize that these statistical categories were not handed down by God. They arose because they were retrofitted at one time to previous data. Thus, they were never pre-selected. BA and ERA did not exist before baseball! > >The only thing we can say with certainty is that SA+OBA clearly does not > >correlate with winning a short series in the last 20 or so years (since > >artificial turf, night baseball, etc.). > > The only thing we can say with certainty is that we don't know. No. We DO know that SA+OBA does not correlate with winning a short series in the last 20 years or so, which is EXACTLY what I said. Thus, what evidence there is does NOT support the position that SA+OBA is a great statistic. Note that I have been very careful here. I have NOT said that SA+OBA has been proven to be bad, nor that it may not be a useful stat at times. All I said in the original posting is that the evidence does not support those who worship this stat. If you have been following the discussion over this stat, then you know that some people feel it is the best stat available. Showing that the evidence does not support this in series play is sufficient to discredit this. Paul Benjamin
bob@pedsgd.UUCP (Robert A. Weiler) (11/11/85)
Organization : Perkin-Elmer DSG, Tinton Falls NJ

In article <778@mmintl.UUCP> franka@mmintl.UUCP (Frank Adams) writes:
{the continuing saga of OBA + SA versus gut feeling in predicting runs scored}
>
>Did you read the rest of what I wrote? 42 series aren't statistically
>significant either. It takes hundreds.
>
{ bunch of stuff deleted }
>
{ >> = P. Benjamin ? }
>>The only thing we can say with certainty is that SA+OBA clearly does not
>>correlate with winning a short series in the last 20 or so years (since
>>artificial turf, night baseball, etc.).
>
>The only thing we can say with certainty is that we don't know.
>
>Frank Adams    ihpn4!philabs!pwa-b!mmintl!franka
>Multimate International    52 Oakland Ave North    E. Hartford, CT 06108

I have to agree with Frank on this; there just isn't enough data to draw any conclusions. However, I think we can speculate that SA+OBA is irrelevant in modern series play due to current pitching practice. It seems logical that in an era when starting pitchers worked every 3 days and usually went the distance that the team with the better offensive statistics was probably the winner. The deviation in pitching quality was small. In modern baseball the situation is quite different. It is entirely possible that a team with 2 excellent starters and an otherwise mediocre pitching staff could win 4 low scoring games and get blown out in the other 3.

None of this has any bearing on whether OBA + SA is a good statistic for rating an individual's contribution to his team. In fact, when I looked at the final statistics for the year, OBA + SA DID correlate very strongly with my gut feeling as to how important individuals were to their team.

As an aside to Paul B. who wondered some months ago why Hubie Brooks had about the same number of RBI's as G. Carter despite a 10% worse team batting average, Brooks also had about 10% more at bats.
In addition, Montreal has perhaps the best lead-off man in the NL, Tim Raines, who along with stealing 50 bases has, surprise, one of the best OBA's in the league. So what we see in this case is exactly what D. Rubin has claimed; RBI's are influenced by lineup effects, OBA and SA are not. Just trying to generate some heat for the winter. Bob Weiler
franka@mmintl.UUCP (Frank Adams) (11/15/85)
In article <500@philabs.UUCP> dpb@philabs.UUCP (Paul Benjamin) writes: >> >"Also, regarding the (SA+OB) argument, I looked it up for all World Series >> >from 1940-1981 (the year of my Baseball Encyclopedia). The results follow: >> > >> >>1940's : 7-3 >> >>1950's : 7-3 >> >>1960's : 5-5 >> >>1970's : 5-5 >> >>80 & 81: 0-2 >> >>------------- >> >>42 years 24-18 >> >> Did you read the rest of what I wrote? 42 series aren't statistically >> significant either. It takes hundreds. > >42 Series aren't significant?!?! That's over an entire season's worth of >games! Perhaps you should look up the definition of statistically >significant. If we ignore these stats, we might as well ignore all season >stats. If you are looking only at who wins the series, you only have 42 cases. If you want the results to reflect the number of games, you have to have the statistics by game, not by series. Also, the statistics for the 42 series *do* tend to support the importance of the statistic. Not as strongly as I would have expected, but well within the normal range of variation. If the expected number is 30 out of 42, the standard deviation is about 2.9. Thus 24 is not much more than two standard deviations away. About a one in twenty shot. As for season stats, most of the variation in a player's batting average from season to season is explainable by statistical fluctuation. >> >Also >> >note the circular nature of your argument #2. You state that the Yankees >> >dominated in all statistical departments. This applies only to those stats >> >in which the Yankees dominated! >> But batting average, slugging average, on base average, earned run average, >> and runs scored weren't retrofitted to the data. These are standard >> statistics which are generally applied. Since the measures are pre- >> selected, the argument is not circular. > >Think again. They dominated only in the stats in which they dominated. 
Also >please note that those "standard" stats are highly redundant - they all >are different ways of saying similar things. For example, team runs and the >opposing team's ERA are very similar. And note that there are stats in which >the Pirates led, such as game-winning RBI. > >Also realize that these statistical categories were not handed down by >God. They arose because they were retrofitted at one time to previous data. >Thus, they were never pre-selected. BA and ERA did not exist before >baseball! They were pre-selected *for that series*. That is, they were the established criteria by which the play in the series would be judged, when it was played. Game winning RBI, by contrast, is a retro-fit for that series. (It also bears such a trivial relationship to winning that one can hardly regard it as a *predictor* of victory. Any more than pitcher's win/loss records are.) >> >The only thing we can say with certainty is that SA+OBA clearly does not >> >correlate with winning a short series in the last 20 or so years (since >> >artificial turf, night baseball, etc.). >> >> The only thing we can say with certainty is that we don't know. > >No. We DO know that SA+OBA does not correlate with winning a short series >in the last 20 years or so, which is EXACTLY what I said. But that data is not statistically significant, so we don't know; which is EXACTLY what I said. (By the way, night baseball goes back to the 30's.) Frank Adams ihpn4!philabs!pwa-b!mmintl!franka Multimate International 52 Oakland Ave North E. Hartford, CT 06108
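The standard deviation arithmetic quoted in the post above can be reproduced with the usual binomial formula. The sketch below is only a check under the assumption that the 42 Series are independent trials with a fixed win probability; the helper name `binomial_z` is hypothetical. It evaluates the observed 24 wins against two candidate expectations, 30 of 42 and an even 21 of 42.

```python
import math


def binomial_z(successes, trials, p):
    """Return (expected, std_dev, z) for a binomial count with win probability p."""
    expected = trials * p
    std_dev = math.sqrt(trials * p * (1 - p))
    return expected, std_dev, (successes - expected) / std_dev


# 24 of 42 Series won by the team with the higher SA+OBA (figures from the thread).
results = {}
for label, p in (("30 of 42", 30 / 42), ("21 of 42", 0.5)):
    results[label] = binomial_z(24, 42, p)
    expected, sd, z = results[label]
    print(f"expected {label}: mean={expected:.1f}, sd={sd:.2f}, z={z:+.2f}")
```

With these inputs the observed count sits roughly two standard deviations below an expectation of 30 and a little under one standard deviation above an expectation of 21, which is the sense in which a sample of 42 struggles to separate the two hypotheses.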
dpb@philabs.UUCP (Paul Benjamin) (11/15/85)
> >> 1940's : 7-3 > >> 1950's : 7-3 > >> 1960's : 5-5 > >> 1970's : 5-5 > >> 80 & 81: 0-2 > >> ------------- > >> 42 years 24-18 > >> > >42 Series aren't significant?!?! That's over an entire season's worth of > >games! Perhaps you should look up the definition of statistically > >significant. If we ignore these stats, we might as well ignore all season > >stats. > > If you are looking only at who wins the series, you only have 42 cases. If > you want the results to reflect the number of games, you have to have the > statistics by game, not by series. > > Also, the statistics for the 42 series *do* tend to support the importance > of the statistic. Not as strongly as I would have expected, but well > within the normal range of variation. If the expected number is 30 out > of 42, the standard deviation is about 2.9. Thus 24 is not much more than > two standard deviations away. About a one in twenty shot. But it's closer to 21 out of 42 than to 30 out of 42. If you really like statistical arguments, how can you prefer an expectation of 30/42 to 21/42 unless you are previously biased? > >> >Also > >> >note the circular nature of your argument #2. You state that the Yankees > >> >dominated in all statistical departments. This applies only to those stats > >> >in which the Yankees dominated! > >> But batting average, slugging average, on base average, earned run average, > >> and runs scored weren't retrofitted to the data. These are standard > >> statistics which are generally applied. Since the measures are pre- > >> selected, the argument is not circular. > > > >Think again. They dominated only in the stats in which they dominated. Also > >please note that those "standard" stats are highly redundant - they all > >are different ways of saying similar things. For example, team runs and the > >opposing team's ERA are very similar. And note that there are stats in which > >the Pirates led, such as game-winning RBI. 
> >
> >Also realize that these statistical categories were not handed down by
> >God. They arose because they were retrofitted at one time to previous data.
> >Thus, they were never pre-selected. BA and ERA did not exist before
> >baseball!
>
> They were pre-selected *for that series*. That is, they were the established
> criteria by which the play in the series would be judged, when it was played.
> Game winning RBI, by contrast, is a retro-fit for that series. (It also
> bears such a trivial relationship to winning that one can hardly regard it
> as a *predictor* of victory. Any more than pitcher's win/loss records are.)

You're missing the point. The '60 Yankees dominated in stats which the papers find easy to compute from boxscores. These stats are highly redundant. There exist many other stats which could be computed. I am not talking about game-winning hits. I am referring to things like "BA with men in scoring position", "BA when your team is losing or tied or 1 run ahead", etc. Stats like this reduce the impact of blowouts. After all, a HR when your team is 8 runs ahead in the late innings is worth less than a single when the score is tied.

I have always, and will always object to simple-minded statistics. Your postings reveal that you understand more than a little about statistics - you know about standard deviations, etc. Why do you like a simple average like SA+OBA so much? If you were to try to build a mathematical model of the game, would you include only statistical means, or would you include more complex statistics? The papers aren't going to try to compute things like "BA with team behind, tied, or ahead by 1 run" or a more complicated nonlinear scheme, such as weighting runs by the probability that the other team will come from behind. Does this mean that the stats the papers publish are the best?
> >> >The only thing we can say with certainty is that SA+OBA clearly does not
> >> >correlate with winning a short series in the last 20 or so years (since
> >> >artificial turf, night baseball, etc.).
> >>
> >> The only thing we can say with certainty is that we don't know.
> >
> >No. We DO know that SA+OBA does not correlate with winning a short series
> >in the last 20 years or so, which is EXACTLY what I said.
>
> But that data is not statistically significant, so we don't know; which is
> EXACTLY what I said. (By the way, night baseball goes back to the 30's.)

But what you said was in response to my statement that the correlation does not exist. EXACTLY what I said is "SA+OBA clearly does not correlate with winning a short series in the last 20 or so years." I did not state a negative correlation. I stated that the correlation doesn't exist for those 20 years. It doesn't matter if the data is insignificant or not! If the data is insignificant, then an existing correlation could be put in doubt, but since the correlation does not exist, then there is no evidence to support SA+OBA from short series results.

Again, I have not stated that this disproves the importance of SA+OBA, I have only said that it means that there is no evidence to support SA+OBA. That is all I have to show. Those who wish to proclaim the importance of a stat must provide evidence for it.

In a sense, we are both right, because we are saying different things. I am saying that there is no evidence for SA+OBA from recent short series results, and you are saying that there aren't enough data points to make any evidence either way - which still means that there is no correlation, based upon the data, to support SA+OBA.

Paul Benjamin
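The standard-deviation figures traded in this post (30 of 42 expected, a standard deviation of about 2.9, and the observed 24) can be checked with a short calculation. The following is only a sketch: it assumes each of the 42 Series is an independent trial with a fixed probability of the favored team winning, and the function names `binomial_sd` and `z_score` are made up for illustration. It measures how far 24 wins lies from each of the two competing expectations, in standard deviations.

```python
import math

def binomial_sd(n, p):
    """Standard deviation of the number of successes in n independent trials."""
    return math.sqrt(n * p * (1 - p))

def z_score(observed, n, p):
    """Distance of the observed count from the expected count, in standard deviations."""
    return (observed - n * p) / binomial_sd(n, p)

n, wins = 42, 24  # 42 Series, 24 won by the team leading in the statistic

# Assumed hypotheses: the two expectations named in the thread.
for label, p in [("30 of 42 expected", 30 / 42), ("21 of 42 expected", 21 / 42)]:
    print(f"{label}: sd = {binomial_sd(n, p):.2f}, z = {z_score(wins, n, p):+.2f}")
```

Under these assumptions the observed 24 sits roughly two standard deviations below an expectation of 30 and roughly one standard deviation above an expectation of 21, which is the arithmetic both posters are arguing over.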
franka@mmintl.UUCP (Frank Adams) (11/18/85)
In article <513@philabs.UUCP> dpb@philabs.UUCP (Paul Benjamin) writes:
>But it's closer to 21 out of 42 than to 30 out of 42. If you really like
>statistical arguments, how can you prefer an expectation of 30/42 to 21/42
>unless you are previously biased?

I never claimed to be unbiased. I claimed that the results are consistent with my belief. A statistically insignificant test proves nothing. If you do not expect a positive result, it lets you go on not expecting a positive result. But don't use that test to back up your argument; it is irrelevant.

Frank Adams ihpn4!philabs!pwa-b!mmintl!franka
Multimate International 52 Oakland Ave North E. Hartford, CT 06108
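Whether 24 of 42 is "statistically insignificant" can be made concrete with exact binomial tail probabilities rather than a normal approximation. This is a hedged sketch, not anything either poster computed: it assumes independent Series with a fixed win probability, and the helper names `binom_pmf`, `lower_tail` and `upper_tail` are illustrative only.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def lower_tail(k, n, p):
    """P(X <= k): chance of k or fewer successes."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

def upper_tail(k, n, p):
    """P(X >= k): chance of k or more successes."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

n, wins = 42, 24

# How surprising is a result this low if 30 of 42 is the true expectation?
print(f"P(X <= {wins}) with p = 30/42: {lower_tail(wins, n, 30 / 42):.3f}")

# How surprising is a result this high if the statistic has no effect (p = 1/2)?
print(f"P(X >= {wins}) with p = 21/42: {upper_tail(wins, n, 21 / 42):.3f}")
```

Comparing the two printed tails shows how well 42 observations can separate the two hypotheses, which is the point in dispute.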
dpb@philabs.UUCP (Paul Benjamin) (11/19/85)
Frank Adams writes:
>
> In article <513@philabs.UUCP> dpb@philabs.UUCP (Paul Benjamin) writes:
> >But it's closer to 21 out of 42 than to 30 out of 42. If you really like
> >statistical arguments, how can you prefer an expectation of 30/42 to 21/42
> >unless you are previously biased?
>
> I never claimed to be unbiased.

Obviously.

> I claimed that the results are consistent with my belief.

If you are willing to go enough standard deviations away, ANY results are consistent with any belief. Just the use of the word "belief" illustrates the difference between our approaches to statistics. You choose those that back up your preconceptions; I try to fit my ideas as closely as possible to the observations.

> A statistically insignificant test proves nothing.

You haven't shown it is statistically insignificant. You try to draw such a fine line between evidence supporting my position, and evidence supporting no one, but you obviously don't apply such rigorous standards to yourself. Where is your evidence that the results of the last 25 series do not constitute enough data? And don't forget, we can always go to the game level. In other words, during the last series, StL won the first two games in spite of KC having a higher SA+OBA in those two games. Since the 25 series' results do show a lot of teams with higher SA+OBA losing the series, there are probably a good deal of games in which the team with higher SA+OBA lost. The number of games in those 25 series is nearly a whole season's games, which is not likely insignificant.

> If you do not expect a positive result, it lets you go on not expecting a
> positive result. But don't use that test to back up your argument; it is
> irrelevant.

Not if my argument is that those who expect a positive result have no evidence. Do you remember the original postings, in which David Rubin posted screen after screen of numbers which supposedly showed that one player was superior to another on the basis of SA+OBA?
My argument is that there is no evidence to support SA+OBA as such an important determinant of quality. The lack of a positive result in the last 25 World Series IS evidence to support my position.

Since you seem to be having trouble grasping this logical concept, let's make an analogy. Suppose someone were to try to convince you that aliens have been visiting the earth in flying saucers. Now, we have NO known verified sightings of saucers (fragments in museums, etc.) so that there is no positive evidence. This does not constitute negative evidence, i.e., we have no evidence that there haven't been flying saucers, just a lack of positive evidence. You are stating that this means it is OK to believe in flying saucers, since "we don't know." I am stating that it is not OK to believe in them, and particularly not OK to base decisions on this belief, since there is no positive evidence for that belief.

If you were trying to program a computer to think, e.g., to be able to examine data and form hypotheses, would you want it to form conclusions which had no evidence backing them up?

Paul Benjamin
franka@mmintl.UUCP (Frank Adams) (11/22/85)
In article <516@philabs.UUCP> dpb@philabs.UUCP (Paul Benjamin) writes:
>> A statistically insignificant test proves nothing. [Me]
>
>You haven't shown it is statistically insignificant. You try to draw
>such a fine line between evidence supporting my position, and evidence
>supporting no one, but you obviously don't apply such rigorous standards
>to yourself. Where is your evidence that the results of the last 25 series
>do not constitute enough data? And don't forget, we can always go to the
>game level. In other words, during the last series, StL won the first
>two games in spite of KC having a higher SA+OBA in those two games. Since
>the 25 series' results do show a lot of teams with higher SA+OBA losing
>the series, there are probably a good deal of games in which the team with
>higher SA+OBA lost. The number of games in those 25 series is nearly a
>whole season's games, which is not likely insignificant.

Sorry, if you want to argue from the game by game evidence, you will have to actually present it. The inference from series to games doesn't wash -- the statistical validity of the sample depends on the size of the sample, not on the size of any underlying statistic.

>> If you do not expect a positive result, it lets you go on not expecting a
>> positive result. But don't use that test to back up your argument; it is
>> irrelevant.
>
>Not if my argument is that those who expect a positive result have no
>evidence. Do you remember the original postings, in which David Rubin
>posted screen after screen of numbers which supposedly showed that
>one player was superior to another on the basis of SA+OBA? My argument
>is that there is no evidence to support SA+OBA as such an important
>determinant of quality. The lack of a positive result in the last 25
>World Series IS evidence to support my position.

No, because the evidence for the use of the statistic *isn't statistical*.
It is based on theoretical analysis, based on the assumption that situational variations in performance are mostly random; that any personal differences from this (clutch hitters, etc.) are small if they exist at all. *This is a reasonable, although unproven, assumption.* Reference to statistical results is made primarily to show that they do not invalidate the hypothesis. Pointing out that those statistics don't support the conclusion is no reason to reject it, because there are other reasons for believing it. There are, by contrast, no reasons for believing in aliens in flying saucers.

I note that although the Elias Sports Bureau did find statistically significant differences in clutch performance, those differences are not large enough to invalidate the analysis.

Frank Adams ihpn4!philabs!pwa-b!mmintl!franka
Multimate International 52 Oakland Ave North E. Hartford, CT 06108
dpb@philabs.UUCP (Paul Benjamin) (12/05/85)
> Sorry, if you want to argue from the game by game evidence, you will have to
> actually present it. The inference from series to games doesn't wash -- the
> statistical validity of the sample depends on the size of the sample, not
> on the size of any underlying statistic.

I will present it. It's interesting that I have to present evidence to argue from it, whereas you state below that YOUR point of view is reasonable though unproven. Double standard, eh? No wonder I can't take your arguments seriously.

> >> If you do not expect a positive result, it lets you go on not expecting a
> >> positive result. But don't use that test to back up your argument; it is
> >> irrelevant.
> >
> >Not if my argument is that those who expect a positive result have no
> >evidence. Do you remember the original postings, in which David Rubin
> >posted screen after screen of numbers which supposedly showed that
> >one player was superior to another on the basis of SA+OBA? My argument
> >is that there is no evidence to support SA+OBA as such an important
> >determinant of quality. The lack of a positive result in the last 25
> >World Series IS evidence to support my position.
>
> No, because the evidence for the use of the statistic *isn't statistical*.
> It is based on theoretical analysis, based on the assumption that situational
> variations in performance are mostly random; that any personal differences
> from this (clutch hitters, etc.) are small if they exist at all. *This is
> a reasonable, although unproven, assumption.* Reference to statistical
> results is made primarily to show that they do not invalidate the hypothesis.
> Pointing out that those statistics don't support the conclusion is no reason
> to reject it, because there are other reasons for believing it. There are,
> by contrast, no reasons for believing in aliens in flying saucers.

Take a basic logic course! The evidence for the use of the statistic isn't statistical?!?! It's based on theoretical arguments?!?!?!?!
WHAT theoretical arguments? I have yet to see a mathematical model of the game of baseball that WORKS, i.e., that predicts winners. Don't expect a mathematician to swallow handwaving arguments.

And you even invalidate your own arguments! How can I take this seriously? (I don't.) You invalidate your argument by stating "It is based on the assumption that situational variations in performance are mostly random". This is what I have been disagreeing with all along! If you are going to assume that I am wrong, then of course it is easy to show that I am wrong!

What DRIVEL!!!
franka@mmintl.UUCP (Frank Adams) (12/08/85)
In article <528@philabs.UUCP> dpb@philabs.UUCP (Paul Benjamin) writes:
>> Sorry, if you want to argue from the game by game evidence, you will have to
>> actually present it. The inference from series to games doesn't wash -- the
>> statistical validity of the sample depends on the size of the sample, not
>> on the size of any underlying statistic.
>
>I will present it. It's interesting that I have to present evidence to
>argue from it, whereas you state below that YOUR point of view is reasonable
>though unproven. Double standard, eh? No wonder I can't take your arguments
>seriously.

No, your point of view IS reasonable though unproven. I have never presented evidence which claims to refute it. You HAVE presented evidence which claims to refute my point of view, but doesn't. Until better studies are done, we won't really know.

>Take a basic logic course! The evidence for the use of the statistic
>isn't statistical?!?! It's based on theoretical arguments?!?!?!?! WHAT
>theoretical arguments? I have yet to see a mathematical model of the
>game of baseball that WORKS, i.e., that predicts winners. Don't expect
>a mathematician to swallow handwaving arguments.

Your condition for a model that "WORKS" presupposes your conclusion. If my position is correct, the winner of a seven game playoff is essentially a 50-50 proposition. Which of several good teams actually wins a pennant race is mostly random. You CAN'T predict winners. I would say, if anything, the absence of such a model supports my position. (But not strongly enough that I advance this as a serious argument.)

By the way, I am a mathematician.

>And you even invalidate your own arguments! How can I take this
>seriously? (I don't.) You invalidate your argument by stating "It is based
>on the assumption that situational variations in performance are mostly
>random". This is what I have been disagreeing with all along! If you
>are going to assume that I am wrong, then of course it is easy to
>show that I am wrong!
>What DRIVEL!!!

OK. This is what we are disagreeing about. If you want to present statistical evidence for your point of view, feel free to do so. But don't present non-evidence and claim it is evidence. The only evidence I have presented for my position is that I have claimed that my assumption accounts for the variation in results. I will admit that I have no hard statistics to support this point of view.

I am getting tired of this. Unless you have something new to say, I will not make any further postings on the subject.

Frank Adams ihpn4!philabs!pwa-b!mmintl!franka
Multimate International 52 Oakland Ave North E. Hartford, CT 06108
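The claim that a seven game playoff is "essentially a 50-50 proposition" can be illustrated with a small calculation. This is a sketch under assumptions that do not come from the thread: every game is independent, the better team wins each game with the same probability `p`, and the per-game edges tried below are hypothetical. The function name `series_win_prob` is likewise made up for illustration.

```python
from math import comb

def series_win_prob(p, wins_needed=4):
    """Probability that a team with per-game win probability p takes a series.

    The team wins the series by taking its final (clinching) game after
    losing k of the earlier games, for k = 0 .. wins_needed - 1.
    """
    q = 1 - p
    return sum(
        comb(wins_needed - 1 + k, k) * p**wins_needed * q**k
        for k in range(wins_needed)
    )

# Hypothetical per-game edges, chosen only for illustration.
for p in (0.50, 0.55, 0.60):
    print(f"per-game p = {p:.2f} -> best-of-seven win probability = {series_win_prob(p):.3f}")
```

Under these assumptions a modest per-game edge turns into only a modest series edge, so the weaker team wins a fair share of short series. That is consistent with the 50-50 remark, though it does not by itself settle whether SA+OBA measures the edge.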