david@fisher.UUCP (David Rubin) (09/17/85)
You just KNEW I wasn't going to let Paul's demonstration pass without challenge, didn't you??

First, let me say right off that while I disagree with what most of Paul wrote, if I countered all his points,

(a) this article would be another monster, and
(b) general principles would be lost among specifics.

Much of Paul's arguments are anecdotal in nature: he brings up a case which he believes supports his position, and concludes that, since his explanation is CONSISTENT with his own observations, it must be TRUE. As an example, he credits McGee's year to Coleman; he is satisfied that since his explanation makes sense,

(1) he may disregard alternate explanations of the event, and
(2) he need not further investigate.

I shall limit myself, therefore, to the general comment (call it Rubin's Law of Empirics, if you will) that HAVING A PLAUSIBLE EXPLANATION FOR AN ANTICIPATED EFFECT IS NOT EVIDENCE THAT THAT EFFECT HAS ACTUALLY OCCURRED. All of Paul's explanations mean little, therefore, until he establishes that what his explanations explain has indeed happened! Only in the case of Mattingly does he attempt to actually demonstrate that a lineup effect exists, and I will therefore concentrate on it. Elsewhere, he merely shows lineup effects are consistent with his selected observations without either showing other explanations are inconsistent or that the observations would be inexplicable without lineup effects.

Unfortunately, he places far more interpretive weight on his statistics than they can bear; odd, considering his previously expressed worry that I, as a Statistician, was likely to be taken in by a spurious (and superficial) correlation.

>                 Mattingly's stats     Yankees record
>                 BA      Slugging      W    L    Pct.
>Batting 2nd      .402    .715          27    8   .771
>Batting 3rd      .303    .495          42   40   .525
>  or 4th
>We can see that not only are personal stats highly dependent
>on the other people in the lineup, but also dependent on the order
>in which those people bat.

CONFOUNDMENT?
We can only say this if we know the ONLY thing that is varying is the lineup. It may be that Mattingly has batted second ONLY against right-handed pitching or that some OTHER factor is responsible for the difference. In other words, a simple breakdown such as this is worthless (possibly even worse: it may be misleading) unless we also know that the circumstances of the two categories (batting 2nd vs. batting 3rd or 4th) are otherwise similar; otherwise, it may be some other factor (such as lefty-righty, home-away, grass-turf, day-night, etc.), strongly correlated with the categories, that is driving the discrepancy (Statisticians refer to this confusion of one cause with another as "confounding"). AMOUNT OF DATA? Moreover, even if Paul COULD assure us that this was so, he does not have nearly enough data. Examine, in particular, the data for batting second: it is based on 35 games, i.e. about 100-150 at bats. Most fans will not put much store in a player's average after 35 games (early May), and for good reason: the player has not yet accumulated enough at bats for us to form any reasonable opinion as to his likely seasonal productivity. We are talking about guessing whether a player is hitting .300 or .400 based on that many at bats: it would not be at all unusual for the difference (10 to 15 hits) to be due to a "hot" or "cold" streak (what Statisticians conveniently label "random", but we may understand as being that which is beyond our knowledge). We would need to have many more at bats (perhaps in a couple of more seasons we will) before we could say that the difference is due to the position in the lineup rather than a propitious hot streak. To put it another way, if a lifetime .300 hitter were to have a .400 average on May 5th, would you tentatively conclude (until further info was available) that the man would bat .400 for the season? Of course not. You would correctly conclude that he is more likely to hit .300 from June through September than .400. 
He may just have had a good April... LIMITED APPLICATION? Even if it were established for Mattingly, it would hold only for Don Mattingly with the current Yankees: to apply it to, say, Tony Pena, it would have to be demonstrated for a wide variety of players on a wide variety of teams. Still, it would be quite a surprise to me if anyone could get even that far. TSN BIAS!!!! Finally, the selection is biased. The Sporting News didn't say, "Let's check on Mattingly's stats and publish regardless", as they would have to if we were to have any hope that Mattingly was somehow typical; they certainly perused all the available stats and published the one(s) they considered most "interesting" or "newsworthy". We can be certain that the discrepancy in Mattingly's stats are therefore unusually large. If that is the greatest discrepancy available among the 300 or so regular players, Mattingly's extra 10-15 hits and 20-30 extra bases in his 150 at bats, then I am very unimpressed: such discrepancies would probably be as large in a similarly sized sample broken down into phases of the moon. Make no mistake: it is Sporting News's job to publish discrepancies such as this because they are among the largest, as their readers demand the unusual, not the typical. PENA-CARTER Yes, this came up again, and I have to point out that (1) My arguments were based entirely on pre-1985, and (2) Pre-1985, Pena's team was about as productive as Carter's so that Paul's argument (again) about Hernandez and Strawberry being responsible (again) for Carter's stats are irrelevant (again). And even if we WERE to consider them, why does Paul believe that Carter has his stats inflated by Hernandez, Strawberry, and Foster when NONE of those three show any substantial increase in production over last year? 
I suppose Paul believes Carter has a special dispensation: in moving from the Expos to the Mets, he gains by being surrounded by Keith, Darryl, and George, while those three do NOT gain from Gary's presence. The fact is, the production of all four has remained about the same over the past two years, an argument AGAINST lineup effects. My apologies for not being able to resist the anecdotal argument.. CONCLUSION For Paul to demonstrate lineup effects, he will need (1) More data (more players, more at bats), (2) Better data (some effort to exclude other factors; however, it may suffice to simply have more data, so that we may reasonably expect to have other factors balance out), and (3) Unbiased data (players selected because, a priori, we think their records will be most illuminating; a posteriori selection, a la TSN, is invalid). Neither Paul nor I has the time nor resources to do this. Some people do, and are supposedly doing it (the folks at SABR...). They have, so far, according to Pete Palmer, found "no evidence" of lineup effects. This does not "disprove" lineup effects; however, it detracts from human understanding to accept as truth all that is not disproven. David Rubin {allegra|astrovax|princeton}!fisher!david
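A rough check of the sample-size point in the post above, as a minimal sketch rather than a real analysis. It assumes 125 at bats (the midpoint of the "100-150" estimate), treats every at bat as an independent trial for a true .300 hitter, and uses the "300 or so regular players" figure from the post; the function name `prob_at_least` is made up for illustration.

```python
from math import comb

def prob_at_least(hits, at_bats, true_avg):
    """P(X >= hits) when X ~ Binomial(at_bats, true_avg)."""
    return sum(
        comb(at_bats, k) * true_avg**k * (1 - true_avg) ** (at_bats - k)
        for k in range(hits, at_bats + 1)
    )

AT_BATS = 125        # assumption: midpoint of the 100-150 at bats estimated above
TRUE_AVG = 0.300     # the hitter's assumed "real" level
HITS_FOR_400 = 50    # 50 / 125 = .400
REGULARS = 300       # "the 300 or so regular players" from the post

p_one = prob_at_least(HITS_FOR_400, AT_BATS, TRUE_AVG)
print(f"one pre-chosen .300 hitter at .400 or better: {p_one:.4f}")
print(f"expected number among {REGULARS} such hitters: {REGULARS * p_one:.1f}")
```

Under these assumptions the chance should come out small for any single hitter picked in advance, while the expected count across all regulars is what matters when the most striking case is selected after the fact, which is the selection-bias point made above.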
dpb@philabs.UUCP (Paul Benjamin) (09/24/85)
Alright, folks, here's another exceedingly long posting for anyone who cares to keep track of this argument over what baseball statistics can and cannot mean. It consists of a point-by-point rebuttal of a posting by David Rubin. > First, let me say right off that while I disagree with what most of > Paul wrote, if I countered all his points, > (a) this article would be another monster, and > (b) general principles would be lost among specifics. > > Much of Paul's arguments are anecdotal in nature: he brings up a case > which he believes supports his position, and concludes that, since his > explanation is CONSISTENT with his own observations, it must be TRUE. > As an example, he credits McGee's year to Coleman; he is satisfied > that since his explanation makes sense, > > (1) he may disregard alternate explanations of the event, and > (2) he need not further investigate. I wish that, for once, you would read what I wrote. The points I presented were not of my own making. They are the opinions of, among others, Billy Martin, and the author of the article. Have you read the article? Also note that everything you have said above can be said about you! You disregard my explanation of the events, and have not proven, in any sense, that on-base average and slugging average are independent of factors such as who is batting in front or behind you. Your evidence is completely anecdotal. You embrace those stats without showing that any strong correlation exists between them and scoring runs (or more precisely, that a stronger correlation exists than for, say, the stat R + RBI - HR.) It is not just my responsibility to prove that lineup dependencies exist. It is also yours to prove that they don't! > I shall limit myself, therefore, to the general comment (call it > Rubin's Law of Empirics, if you will) that HAVING A PLAUSIBLE > EXPLANATION FOR AN ANTICIPATED EFFECT IS NOT EVIDENCE THAT THAT EFFECT > HAS ACTUALLY OCCURRED. 
Perhaps you should take a course or two in prob&stat and learn the actual laws, instead of making up your own. > All of Paul's explanations mean little, > therefore, until he establishes that what his explanations explain has > indeed happened! Only in the case of Mattingly does he attempt to > actually demonstrate that a lineup effect exists, and I will therefore > concentrate on it. Elsewhere, he merely shows lineup effects are > consistent with his selected observations without either showing other > explanations are inconsistent or that the observations would be > inexplicable without lineup effects. > In other words, a simple breakdown such as this is > worthless (possibly even worse: it may be misleading) unless we also > know that the circumstances of the two categories (batting 2nd vs. > batting 3rd or 4th) are otherwise similar; otherwise, it may be some > other factor (such as lefty-righty, home-away, grass-turf, day-night, > etc.), strongly correlated with the categories, that is driving the > discrepancy (Statisticians refer to this confusion of one cause with > another as "confounding"). Marvelous! Perfect! I'm SO glad you said this. It's much easier to shoot down someone's argument when he provides the ammunition himself. This is exactly the point I have been making for weeks. "it may be misleading unless we know that the circumstances of the two categories are otherwise similar..." Two players for different teams do not satisfy this criterion, and thus their stats are not directly comparable. For example, many, including myself, like Guerrero for the MVP, but I don't favor him because he leads the NL in slugging and on-base average. Those stats are irrelevant, since you can't compare them to, say, Dale Murphy's stats. Why not? It's simple. 18 times a year, Dale Murphy has to face the great Dodger pitching staff, which is clearly the best in the league, while Guerrero faces the Braves' staff, which is one of the worst. That's over 11% of the season. 
This is in addition to other differences, such as the number of day/night games, the different stadiums they play in, the number of double-headers they play in, the number of day games after night games, etc. They don't even play the exact same other teams, either! After all, if a team played most of its games against Philadelphia earlier in the year, they faced a much easier opponent than a team whose schedule calls for them to face Phila now. The reverse is true for the Cubs. Playing them before all their starters were injured is different than playing them afterwards.

So, unless you can correct for ALL these factors, and others, to ensure that your circumstances are similar, all the analyses that you have posted are "worthless (possibly even worse: (they) may be misleading)".

The only attempt you have made to correct your stats is to include a ratio which takes into account the differences between stadiums, and how hard they are for hitters. But even this attempt showed your statistical inexperience. Saying, for example, that park A is 10% harder to hit in than park B because the overall averages (of, say, slugging) are 10% lower, is a valuable and meaningful stat when applied to the whole group of hitters - it provides information on the park to its owners. But it is TOTALLY MEANINGLESS to apply this stat to individual batters in this park. One must also know the shape of the distribution. It could be that almost nobody hits 10% worse in that park - that many hit much worse or better, and it averages out to 10%. For example, if a country's families have 2.3 children on the average, it doesn't mean that anyone has 2.3 children, or even that most families have 2 or 3 children. Bimodal distributions are not uncommon, and in these, almost no one is around the mean.

Furthermore, the reason that I use only Mattingly is that these stats are rarely available. It's much easier to compute personal averages such as batting average, slugging average, runs, RBI, etc.
than to compute how much a batter tends to improve the stats of those batting ahead of him or behind him, etc. We almost never see these stats. We don't often enough see stats such as batting average with runners in scoring position, etc. You criticize me for the deficiencies of baseball statisticians everywhere. It's not my fault, so don't criticize me for it.

> Moreover, even if Paul COULD assure us that this was so, he does not
> have nearly enough data. Examine, in particular, the data for batting
> second: it is based on 35 games, i.e. about 100-150 at bats. Most
> fans will not put much store in a player's average after 35 games
> (early May), and for good reason: the player has not yet accumulated
> enough at bats for us to form any reasonable opinion as to his likely
> seasonal productivity. We are talking about guessing whether a player
> is hitting .300 or .400 based on that many at bats: it would not be at
> all unusual for the difference (10 to 15 hits) to be due to a "hot" or
> "cold" streak (what Statisticians conveniently label "random", but we
> may understand as being that which is beyond our knowledge). We would
> need to have many more at bats (perhaps in a couple of more seasons we
> will) before we could say that the difference is due to the position
> in the lineup rather than a propitious hot streak. To put it another
> way, if a lifetime .300 hitter were to have a .400 average on May 5th,
> would you tentatively conclude (until further info was available) that
> the man would bat .400 for the season? Of course not. You would
> correctly conclude that he is more likely to hit .300 from June
> through September than .400. He may just have had a good April...

Again, I wish you would actually read the article before you respond to it! Of course, I know you already know everything :-) Mattingly's hot stats for the second position were not compiled in one streak. He started the season batting 3-4, then was moved to 2 in May for 17 games.
He was then moved back to 3-4, but occasionally in June and July batted 2. The article does not give stats for those instances alone, but states that it "worked like a charm". He was still usually batting 3-4, but was moved to 2 on August 5, when Martin became aware of the stats for his earlier production in the 2 spot. So, it is NOT the result of a hot streak. As for right-handed vs. left-handed opposition, I checked the games from August 5 on. There were both right-handed and left-handed opponents. He is playing full-time in that spot, so he faces all types of pitching. Martin moved him to 2 on August 5 because of his excellent production in that spot before. > Even if it were established for Mattingly, it would hold only for > Don Mattingly with the current Yankees: to apply it to, say, Tony > Pena, it would have to be demonstrated for a wide variety of players on > a wide variety of teams. Still, it would be quite a surprise to me if > anyone could get even that far. I see! Whenever I come up with evidence, it counts only for that case, but you have never detailed an instance of a player changing, say, his lineup position and keeping the same OBA and slugging pct., but I am supposed to swallow your arguments! It's interesting. When I respond to your postings, I feel like I'm trying to explain baseball to a Martian. You know so little about the game! EVERYBODY knows that lineups are interdependent! Try watching a game sometime (instead of just reading numbers). You'll see that when a runner is on base, it affects (among other things): 1) the way the pitcher throws. Using the stretch instead of a full windup definitely hurts most pitchers' performances. Otherwise, there would be no need for anyone to ever windup. 2) the pitch selection; 3) the defensive alignment. Thus, if the batter ahead of, say Mattingly, gets on base more often, is a threat to steal, and gets in scoring position more often, he can (and does) affect whether Mattingly gets a hit or not. 
Perhaps we should just forget this whole argument. You will continue to emphasize the individual aspects of the game, and I will continue to emphasize the team aspects. After all, if we both enjoy the game, that's the purpose of baseball anyway. By the way, if you still doubt the existence of lineup dependency (which you undoubtedly still do) then answer the following question: If there were no lineup interaction, then all managers would bat their best hitter first, then their second-best, etc. to give them the most opportunities to hit. Thus, according to your criteria (OBA and slugging pct), the way to optimize the team's OBA and slugging pct is to bat the best in these categories first, the next-best second, etc. We would see Carter batting leadoff for the Mets, and Coleman would not be the leadoff hitter for St. Louis, McGee would be, followed by Clark. Coleman would be somewhere around 6 or 7. Come to think of it, since Cedeno has been playing for the Cards and the way he has been hitting, he would be batting leadoff. Also, Guerrero would be hitting leadoff for LA (absurd!). As ANY real baseball fan knows, managers carefully pick the order to help run production, e.g. alternating left-handed and right-handed batters, and putting speedsters in front of hitters who hit well with men in scoring position. WHY WOULD THEY BOTHER TO DO THIS IF THERE WERE NO LINEUP INTERACTION??? Why not bat Mattingly leadoff, to get him more atbats? Maybe the fact that he would be batting behind a much weaker hitter just MIGHT have a teeny-weeny little bit to do with it?! Thus, we see that some excellent managers, such as Whitey Herzog, deliberately put a player like Coleman, who has a lower OBA and slugging average than McGee, in the spot where he will get the most at-bats, thus effectively reducing the overall OBA and slugging pct of his team. Do you really think he is deliberately reducing the run-scoring ability of his team? 
Or do you just think that all these baseball professionals are sadly misguided? The only other alternative is that TEAM RUN-SCORING ABILITY IS NOT DIRECTLY CORRELATED WITH TEAM OBA OR SLUGGING, i.e., these stats aren't all you crack them up to be. There must be other factors, e.g., speed. To rephrase this point, so that you will have less chance of misinterpreting it, if Guerrero's slugging avg and OBA are what are most important to the Dodgers, then he should bat leadoff, so as to maximize the team's slugging avg and OBA. He doesn't, and the very idea seems preposterous. Either Lasorda doesn't understand the game as you do, or your emphasis on OBA and slugging is wrong. Which is it? The lineup can even affect the selection of relief pitchers. And haven't you ever heard a manager say that what he really needs is a left-handed power-hitter (or more speed in the lineup, etc.)? Why are these things important to managers if the players in lineups don't interact? > And even if we WERE to consider them, why does Paul believe that Carter > has his stats inflated by Hernandez, Strawberry, and Foster when NONE > of those three show any substantial increase in production over last > year? WRONG. Strawberry is having a much better year than last year. Note that Strawberry bats directly behind Carter, just as Mattingly bats directly behind Henderson. See below. > I suppose Paul believes Carter has a special dispensation: in > moving from the Expos to the Mets, he gains by being surrounded by > Keith, Darryl, and George, while those three do NOT gain from Gary's > presence. The fact is, the production of all four has remained about > the same over the past two years, an argument AGAINST lineup effects. Or an argument that Carter is about as productive as Hubie Brooks is. Hubie Brooks is a very productive hitter, and is having a fine year batting cleanup for Montreal. And I never said Carter didn't help the others. 
Don't put words in my mouth and then criticize me for saying them.

But you are completely wrong about the production of all four Met players. Strawberry is having a better year, and all the other three are down, except for Carter's HR rate:

(these 1985 stats are as of 9/12; the 1985f stats are approximations to what they would have at the end of the season if their current averages continue)

                      BA    HR   RBI
Strawberry:  1984    .251   26    97
             1985    .282   23    66   (don't forget he missed 7 weeks
                                        injured; prorating his stats over
                                        that time gives him about 34 HRs
                                        and 95+ RBI already)
             1985f   .282   27    77   (.282  38  113)

Hernandez:   1984    .311   15    94
             1985    .291   10    79
             1985f   .291   12    92

Foster:      1984    .269   24    86
             1985    .254   17    66
             1985f   .254   20    77

Carter:      1984    .294   27   106
             1985    .281   26    77
             1985f   .281   30    90
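The "1985f" projections in the post above can be reproduced with a simple proration, sketched here under one loud assumption: the scale factor of 7/6 is not stated anywhere in the post. It is inferred from the posted numbers themselves (for example 66 RBI becoming 77) and roughly corresponds to six-sevenths of the season having been played by 9/12. The names `SCALE` and `project` are hypothetical.

```python
from fractions import Fraction

# Assumption: inferred from the posted figures, not given in the post.
SCALE = Fraction(7, 6)

def project(count_so_far):
    """Prorate a counting stat (HR, RBI) to a full season at the current rate."""
    return round(count_so_far * SCALE)

# (HR, RBI) as of 9/12, copied from the post
through_sept_12 = {
    "Strawberry": (23, 66),
    "Hernandez": (10, 79),
    "Foster": (17, 66),
    "Carter": (26, 77),
}

for player, (hr, rbi) in through_sept_12.items():
    print(f"{player:<11} 1985f  HR {project(hr):>3}  RBI {project(rbi):>4}")
```

Rate stats such as batting average are left alone, since "if their current averages continue" means they carry over unchanged. The separate parenthetical figures for Strawberry (adjusting for the seven weeks he missed) use a different adjustment that this sketch does not attempt.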
franka@mmintl.UUCP (Frank Adams) (09/24/85)
In article <453@philabs.UUCP> dpb@philabs.UUCP (Paul Benjamin) writes: >By the way, if you still doubt the existence of lineup dependency (which >you undoubtedly still do) then answer the following question: > > If there were no lineup interaction, then all managers would bat their > best hitter first, then their second-best, etc. to give them the > most opportunities to hit. [...] You are mixing apples and oranges here. Of course lineup order in this sense matters: a walk followed by a home run is two runs, while a home run followed by a walk is one run. I thought (up to this point) that the discussion was about whether players hit better depending on where they bat in the order. I don't *think* anyone ever claimed that OBA and slugging pct give a complete description of a team's offensive abilities; just that they are the two best readily available statistics. Which they are. I am unconvinced by the Mattingly data. There is just not enough there to be statistically significant. On the other hand, batters definitely DO hit better with men on base. The book put out by the Elias Sports Bureau (it has their name in the title) has statistics on this for the entire major leagues last year. As I remember (the book is not here) the effect was about 20 points in terms of batting average. So clearly there is an advantage to batting after a player who gets on base a lot. Although the statistics for it are not available, it seems likely that this is enhanced when batting after good base stealers. I am much more dubious about the claimed advantages of batting *before* a good hitter. This very likely affects the number of walks a player gets (certainly the number of intentional walks, but probably others as well). I doubt it much affects the overall performance. Frank Adams ihpn4!philabs!pwa-b!mmintl!franka Multimate International 52 Oakland Ave North E. Hartford, CT 06108
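The walk/home-run example in the post above can be made concrete with a toy base-state model. This is only an illustrative sketch: it handles just two events ("BB" for a walk, "HR" for a home run), ignores outs entirely, and the function name `runs_scored` is made up here.

```python
def runs_scored(events):
    """Count runs from a sequence of 'BB' (walk) and 'HR' (home run) events."""
    bases = [False, False, False]  # occupancy of first, second, third
    runs = 0
    for event in events:
        if event == "HR":
            # Batter and every runner on base score; bases are cleared.
            runs += 1 + sum(bases)
            bases = [False, False, False]
        elif event == "BB":
            # Runners advance only when forced by the batter taking first.
            if bases[0]:
                if bases[1]:
                    if bases[2]:
                        runs += 1      # bases loaded: runner on third is forced home
                    bases[2] = True
                bases[1] = True
            bases[0] = True
        else:
            raise ValueError(f"unsupported event: {event!r}")
    return runs

print(runs_scored(["BB", "HR"]))  # walk, then home run
print(runs_scored(["HR", "BB"]))  # home run, then walk
```

The same two events produce different run totals depending on order, which is the sense of "lineup order matters" that the post distinguishes from the question of whether a batter hits better in one slot than another.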
dpb@philabs.UUCP (Paul Benjamin) (09/25/85)
Frank Adams writes: >In article <453@philabs.UUCP> dpb@philabs.UUCP (Paul Benjamin) writes: >>By the way, if you still doubt the existence of lineup dependency (which >>you undoubtedly still do) then answer the following question: >> If there were no lineup interaction, then all managers would bat their >> best hitter first, then their second-best, etc. to give them the >> most opportunities to hit. [...] > >You are mixing apples and oranges here. Of course lineup order in this >sense matters: a walk followed by a home run is two runs, while a home >run followed by a walk is one run. I thought (up to this point) that >the discussion was about whether players hit better depending on where >they bat in the order. Not just where in the order, but who is batting ahead of them and behind them. And as you point out later in this posting, batters do bat better when men are on base, so I'm not really mixing apples and oranges at all - lineup order can affect personal stats. >I don't *think* anyone ever claimed that OBA and >slugging pct give a complete description of a team's offensive abilities; >just that they are the two best readily available statistics. Which they >are. You contradict yourself! You state that "of course lineup order matters in this sense", and then state that personal stats such as OBA and slugging are the best. How about a stat such as "how many runs you contribute to", measured by runs you score, drive in, or help advance the runners, or even better, how much better you are at that than others batting in similar positions? These stats are virtually impossible to compute from box scores, because so much information is lost, such as if anyone was in scoring position when a player made an out, or whether an out advanced a runner. In this sense, I can agree with you that OBA and slugging may be the best available, but I'm saying that this means that the available stats are not very good (we need some new categories). 
>I am unconvinced by the Mattingly data. There is just not enough there >to be statistically significant. Of course. But I didn't say that this proved conclusively that all players' stats are highly order-dependent. I just showed the existence of stats that support the belief in lineup dependency. Again, just because these stats are not often kept is not my fault. It's the old "garbage in, garbage out" phenomenon. If you only input personal stats into your model-generation process, then you will produce only models which emphasize individual performances, and of course, you will be able to find no evidence of interdependencies. To be able to find interdependency, you must consider stats which can reflect it. >On the other hand, batters definitely DO hit better with men on base. >The book put out by the Elias Sports Bureau (it has their name in the >title) has statistics on this for the entire major leagues last year. >As I remember (the book is not here) the effect was about 20 points in >terms of batting average. So clearly there is an advantage to batting >after a player who gets on base a lot. Although the statistics for it >are not available, it seems likely that this is enhanced when batting >after good base stealers. Great. I'd love to see what the effects are on slugging, RBI, R, etc. >I am much more dubious about the claimed advantages of batting *before* a >good hitter. This very likely affects the number of walks a player gets >(certainly the number of intentional walks, but probably others as well). >I doubt it much affects the overall performance. Again, one case I cite is the Pirates of the late 70's. Nobody wanted to pitch to Stargell with men on base, so, as you say, people in front of him were rarely walked. But this means that they saw more fastballs, and less nibbling around the corner of the plate. That gave good fastball hitters, like Madlock, more fat pitches. Note that the difference need be quite small to still produce a good effect. 
Over, say 500 atbats, say about 3000 pitches a season, a hitter in such a nice spot might get only 30 more fat pitches to hit (1%). This could lead to several HRs, doubles, more RBI, more R, and higher OBA and slugging. Again, I have no printed stats for the Madlock case. It is based on my personal observation at the time, which was that Madlock was put into the 6 spot when he was acquired, and became a steady .280 hitter. He was quoted at the time as saying he didn't care, as long as the team won. When he was moved to 3 (in front of Stargell) he immediately became the .320+ hitter he had been before. This is not the only case I know of. Repeatedly, in reading quotes of managers, I have run across things like "...he is a fastball hitter, so I put him in front of (big slugger), so he'll see more fastballs". Now, I haven't clipped and saved all these quotes, because I never saw myself getting into an argument about it, but my memory is actually quite good, and I'm sure that, if we keep our eyes open, we'll see more quotes like this. Finally (sigh!) note that if hitting can be affected by the player in front, then it means that it can be affected by the player behind, too. After all, if player A bats in front of player B, and B is known to hit much better when men are on base, then the pitcher can be expected to try very hard to keep A off the bases. Thus, B's presence affects A's stats. This is as opposed to the situation in which a strong hitter bats in front of a weaker hitter. The pitcher might not care whether the strong one is walked, because he is not afraid of the weak hitter, particularly if there are two out, so he avoids giving the strong hitter anything too good to hit.
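For scale, the figures quoted in the post above can be put side by side. This is plain arithmetic on the post's own round numbers (500 at bats, about 3000 pitches, 30 extra "fat" pitches, and the .280 versus .320 Madlock recollection); the helper name `extra_hits` is hypothetical and nothing here is measured data.

```python
AT_BATS = 500             # from the post
PITCHES = 3000            # from the post
EXTRA_FAT_PITCHES = 30    # from the post

def extra_hits(at_bats, avg_before, avg_after):
    """Additional hits needed to move from one batting average to another."""
    return round(at_bats * (avg_after - avg_before))

share = EXTRA_FAT_PITCHES / PITCHES
needed = extra_hits(AT_BATS, 0.280, 0.320)

print(f"extra fat pitches as a share of all pitches: {share:.1%}")
print(f"extra hits needed to go from .280 to .320 in {AT_BATS} at bats: {needed}")
```

Comparing the number of extra hits required with the number of extra good pitches assumed gives a feel for how much of a .280 to .320 jump such an effect could plausibly account for on its own.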
franka@mmintl.UUCP (Frank Adams) (09/26/85)
In article <455@philabs.UUCP> dpb@philabs.UUCP (Paul Benjamin) writes: >>You are mixing apples and oranges here. Of course lineup order in this >>sense matters: a walk followed by a home run is two runs, while a home >>run followed by a walk is one run. I thought (up to this point) that >>the discussion was about whether players hit better depending on where >>they bat in the order. > >Not just where in the order, but who is batting ahead of them and >behind them. And as you point out later in this posting, batters >do bat better when men are on base, so I'm not really mixing apples >and oranges at all - lineup order can affect personal stats. Yes, lineup order can affect personal stats -- but the paragraph above is NOT an argument to that effect. >>I don't *think* anyone ever claimed that OBA and >>slugging pct give a complete description of a team's offensive abilities; >>just that they are the two best readily available statistics. Which they >>are. > >You contradict yourself! You state that "of course lineup order matters >in this sense", and then state that personal stats such as OBA and >slugging are the best. How about a stat such as "how many runs you >contribute to", measured by runs you score, drive in, or help advance >the runners, or even better, how much better you are at that than >others batting in similar positions? These stats are virtually impossible >to compute from box scores, because so much information is lost, such as >if anyone was in scoring position when a player made an out, or whether >an out advanced a runner. In this sense, I can agree with you that >OBA and slugging may be the best available, but I'm saying that this >means that the available stats are not very good (we need some new >categories). Well yes, that's what I said. "The two best readily available statistics." In particular, if I know a player's on base and slugging averages, I don't much care what his batting average is. 
In fact, it is better if the batting average is lower, with the same on base and slugging averages.

Yes, we do need better stats. I have my doubts about your proposal; it is highly lineup dependent. A batter will participate in more runs on a good offensive team than on a poor one -- this is the main problem with RBIs.

(That assessment sounds harsher than I really mean it to be. This would be a useful statistic -- certainly better than "runs produced" or "game winning RBI". (I don't really understand why the statistic isn't "go ahead RBI" instead -- the batter putting his team ahead cannot be affected by whether they will stay ahead.) But if I could only get one statistic about a player, I would rather know the sum of his on base and slugging, than the number of runs (total, per game, or per at bat, your choice) that he contributed to.)

If you can get a copy, do look at the Elias book (_The_1985_Elias_Baseball_Analyst_). It has batting and pitching statistics broken down by whether the bases are empty, leading off an inning, with runners on base, with runners in scoring position, and with runners in scoring position with two out. It also has statistics for batting in late inning pressure situations (defined as the seventh inning or later, with the player's team tied, behind by not more than three runs, or behind by four runs with the bases loaded), broken down similarly. It also has home vs away, grass vs turf, and day vs night breakdowns.

Some other statistics I would like to see: how often does a runner take an extra base on a hit? And how often is he out trying to do so? Another interesting statistic would be bases advanced out of the number possible (a grand slam is ten out of ten; a bases empty single is one out of four). The ratio of bases advanced to outs made would also be interesting.

None of these statistics is perfect, of course.

>>I am unconvinced by the Mattingly data. There is just not enough there
>>to be statistically significant.
>
>Of course. But I didn't say that this proved conclusively that all
>players' stats are highly order-dependent. I just showed the existence
>of stats that support the belief in lineup dependency. Again, just
>because these stats are not often kept is not my fault.

Let me put that a bit differently. While Mattingly undoubtedly hits better batting after Henderson (who has a very good on base percentage and fantastic speed), it is unlikely that the effect is anywhere near as large as in those sample statistics. And whoever hits after Henderson can expect an improvement.

>Again, one case I cite is the Pirates of the late 70's. Nobody wanted
>to pitch to Stargell with men on base, so, as you say, people in front
>of him were rarely walked. But this means that they saw more fastballs,
>and less nibbling around the corner of the plate. That gave good
>fastball hitters, like Madlock, more fat pitches. Note that the
>difference need be quite small to still produce a good effect. Over,
>say 500 atbats, say about 3000 pitches a season, a hitter in such a
>nice spot might get only 30 more fat pitches to hit (1%). This could
>lead to several HRs, doubles, more RBI, more R, and higher OBA and
>slugging.

The player will have more strikes thrown at him (which tends to mean more fastballs). Since he is getting more good pitches, he is likely to hit for better average and power. But he can be expected to walk *less*, and thus have a lower on base average. If the on base average is truly higher in such a case, the opposing pitchers are making a mistake -- they should be pitching to the batter the same as they normally would. I am unconvinced that batters do significantly better on balance in such situations.

This is one reason the on base and slugging averages make such a good pair. When a player is pitched to cautiously, the on base average goes up and the slugging average goes down. In the reverse case, the opposite happens.

>Again, I have no printed stats for the Madlock case. It is based on
>my personal observation at the time, which was that Madlock was
>put into the 6 spot when he was acquired, and became a steady .280
>hitter. He was quoted at the time as saying he didn't care, as long
>as the team won. When he was moved to 3 (in front of Stargell) he
>immediately became the .320+ hitter he had been before.

Again, the batting average I would expect to be affected. What happened to his on base average?

It is certainly true that batting average is overemphasized in the baseball world as a whole. There are probably a good many managers and players who make this mistake. (Earl Weaver doesn't. Bill Madlock probably does.)

Frank Adams                           ihpn4!philabs!pwa-b!mmintl!franka
Multimate International    52 Oakland Ave North    E. Hartford, CT 06108
david@fisher.UUCP (David Rubin) (09/26/85)
[">>"," " = me,">" = (shudder) him :-)] Don't panic, folks! Only about half of it is new material! >You disregard my explanation of the events, and have not proven, in any >sense, that on-base average and slugging average are independent of >factors such as who is batting in front or behind you. I can demonstrate that OBA and SA together will predict very well the run production of a team. All I have demanded is the same standard of evidence be applied to lineup effects: that you demonstrate that the consideration of such an effect improves our ability to project or predict run production, and that you provide some rationale for it. You've done the latter without doing the former. >................ You embrace those stats without showing >that any strong correlation exists between them and scoring runs (or >more precisely, that a stronger correlation exists than for, say, the >stat R + RBI - HR.) I can demonstrate such a correlation for OBA and SA; for your benefit, I will post them (this weekend, probably). As for R+RBI-HR, please note that it adds nothing to our understanding of run production to predict runs using runs!!! We have already noted that R's and RBI's are heavily and DIRECTLY dependent on one's teammates' actions: if the question we are considering is how does an individual player contribute, we must free him from the burden/benefit of his teammates and figure out how much each of the events he could contribute ALONE (outs, walks, hits, etc.) would contribute to producing runs on some "typical" team. You, too, recognize this principle, as it is the rationale for your argument for "lineup effects". The problem is not one of goals, but of methods. "Lineup effects" are, I suggest, illusions caused by using the wrong statistics to evaluate offensive performance. 
It is because you are tied to measuring INDIVIDUAL performance with tools meant to evaluate TEAM performance that you must deal with something as arcane and diffuse as lineup effects; when one considers statistics that are not directly influenced by one's teammates, one finds that there is no discernible lineup effect. In other words, if you took a player who remains with one team over the course of his career's prime, you would likely find the player's RBI and R totals fluctuating with the team's fortunes (strongly correlated), but his SA and OBA fluctuating "randomly" (weakly correlated). If you were to focus on the RBI's, you would persuade yourself there was such a thing as "lineup effect", because RBI's measure what the guys in front of you did as well as what you did. If you looked at SA, you would remain agnostic concerning lineup effects, because SA appears to fluctuate with little regard to the quality of the team. >.......................................It is not just my responsibility to >prove that lineup dependencies exist. It is also yours to prove that >they don't! You can never prove something doesn't exist (how does one proceed in a disproof of existence?). It is considered sensible in most circles to keep one's explanation of events as simple as possible: we need only consider new factors if they somehow improve our understanding of events. It is therefore the burden of one who wishes to include an "effect" to show its inclusion improves our knowledge or understanding, for if we can do as well without it, we have no reason to use it. >> I shall limit myself, therefore, to the general comment (call it >> Rubin's Law of Empirics, if you will) that HAVING A PLAUSIBLE >> EXPLANATION FOR AN ANTICIPATED EFFECT IS NOT EVIDENCE THAT THAT EFFECT >> HAS ACTUALLY OCCURRED. >Perhaps you should take a course or two in prob&stat and learn the >actual laws, instead of making up your own. Are you serious? 
Taking shots at my statistical sophistication is inappropriate (as well as incorrect), and serves only as an excuse for ignoring the truth of the statement that I referred to, tongue-in-cheek, as Rubin's Law of Empirics. >> All of Paul's explanations mean little, >> therefore, until he establishes that what his explanations explain has >> indeed happened! Only in the case of Mattingly does he attempt to >> actually demonstrate that a lineup effect exists, and I will therefore >> concentrate on it. Elsewhere, he merely shows lineup effects are >> consistent with his selected observations without either showing other >> explanations are inconsistent or that the observations would be >> inexplicable without lineup effects. >This is exactly the point I have been making for weeks. "it may be >misleading unless we know that the circumstances of the two categories >are otherwise similar..." Two players for different teams do not >satisfy this criterion, and thus their stats are not directly >comparable. For example, many, including myself, like Guerrero for >the MVP, but I don't favor him because he leads the NL in slugging >and on-base average. Those stats are irrelevant, since you can't compare >them to, say, Dale Murphy's stats. Why not? It's simple. 18 times >a year, Dale Murphy has to face the great Dodger pitching staff, which >is clearly the best in the league, while Guerrero faces the Braves' staff, >which is one of the worst. That's over 11% of the season. This is in >addition to other differences, such as the number of day/night games, >the different stadiums they play in, the number of double-headers they >play in, the number of day games after night games, etc. They don't even >play the exact same other teams, either! After all, if a team played >most of its games against Philadelphia earlier in the year, they faced a >much easier opponent than a team whose schedule calls for them to >face Phila now. The reverse is true for the Cubs. 
Playing them >before all their starters were injured is different than playing >them afterwards. Strange, but when I did try to adjust for these effects in the Carter-Pena discussion, you protested vociferously! I am all for adjusting for effects whose existence is demonstrable, and thus had called earlier for the inclusion of Palmer's "Park Factor", which considered, explicitly or implicitly, park dimensions, day/night balance at the home field, and the quality of the hitter's own pitching staff. If it could be shown that some complex scheme to correct for the changing quality of the opposition is necessary (most teams remain about as talented in August as they were in May, and the ones that don't may not have a substantial effect), I would certainly entertain that correction. At the time I first brought up the matter of such adjustments, you held your hands up to your ears and screamed that you didn't want to hear about such stuff; as those factors did not strongly affect the relative offensive merits of Carter and Pena, I didn't press the issue then. Naturally, I'm stunned by your reversal; stunned, but not surprised. Incidentally, it is likely that Murphy derives more benefit from Fulton County Stadium than Guerrero derives from not having to face his own staff; adjusted statistics will likely favor Guerrero even more than the unadjusted ones do! >So, unless you can correct for ALL these factors, and others, to >ensure that your circumstances are similar, all the analyses that >you have posted are "worthless (possibly even worse: (they) may be >misleading". I will adjust for all the factors that can be demonstrated; "adjusting" for a factor that has not been demonstrated (and therefore cannot be quantified) is a theological exercise. Rather than asking ourselves how much a factor affects our statistics, we wind up asking ourselves how much we BELIEVE a factor affects our statistics. 
>The only attempt you have made to correct your stats is to include >a ratio which takes into account the differences between stadiums, >and how hard they are for hitters. You did not read, then, how the "Park Factor" was derived. Tsk, tsk. It measured how difficult it was to produce runs in a particular park, and therefore implicitly considered dimensions, elevation, day/night games, etc, etc, and corrected for the prowess (or lack thereof) of the home staff. >................................But even this attempt showed your >statistical inexperience. Saying, for example, that park A is 10% percent >harder to hit in than park B because the overall averages (of say, slugging) >are 10% lower, is a valuable and meaningful stat when applied to the whole >group of hitters - it provides information on the park to its owners. >But it is TOTALLY MEANINGLESS to apply this stat to individual batters >in this park. One must also know the shape of the distribution. It could >be that almost nobody hits 10% worse in that park - that many hit much worse >or better, and it averages out to 10%. For example, if a country's families >have 2.3 children on the average, it doesn't mean that anyone has 2.3 >children, or even that most families have 2 or 3 children. Bivariate >distributions are not uncommon, and in these, almost noone is around >the mean. You are correct, but need not worry. It is necessary to check that the detriment/advantage supplied by a home park affects the players equally (or that deviations from equality are random, rather than systematic). You will be pleased, therefore, to hear that such deviations are binomially/normally distributed, and that where individual players fall on these distributions appears random, and that the distribution of ALL players is tighter once these effects are taken out. >Furthermore, the reason that I use only Mattingly is that these stats >are rarely available. 
It's much easier to compute personal averages >such as batting average, slugging average, runs, RBI, etc. than to >compute how much a batter tends to improve the stats of those batting >ahead of him or behind him, etc. We almost never see these stats. We >don't often enough see stats such as batting average with runners in >scoring position, etc. You criticize me for the deficiencies of baseball >statisticians everywhere. It's not my fault, so don't criticize me >for it. I know it's not your fault. I'd love to see such breakdowns myself; supposedly, that's what Bill James's "Project Scoresheet" is in the process of doing. If this enlarged data base should provide someone with the means to prove a "lineup effect", I will change my tune, naturally. However, it is difficult (and unwise) to believe a dramatic effect could "hide" in currently available data; if there is some "lineup effect", it is likely to be far smaller (and possibly even of a far different nature) than you believe. >> Moreover, even if Paul COULD assure us that this was so, he does not >> have nearly enough data. Examine, in particular, the data for batting >> second: it is based on 35 games, i.e. about 100-150 at bats. Most >> fans will not put much store in a player's average after 35 games >> (early May), and for good reason: the player has not yet accumulated >> enough at bats for us to form any reasonable opinion as to his likely >> seasonal productivity. We are talking about guessing whether a player >> is hitting .300 or .400 based on that many at bats: it would not be at >> all unusual for the difference (10 to 15 hits) to be due to a "hot" or >> "cold" streak (what Statisticians conveniently label "random", but we >> may understand as being that which is beyond our knowledge). We would >> need to have many more at bats (perhaps in a couple of more seasons we >> will) before we could say that the difference is due to the position >> in the lineup rather than a propitious hot streak. 
To put it another >> way, if a lifetime .300 hitter were to have a .400 average on May 5th, >> would you tentatively conclude (until further info was available) that >> the man would bat .400 for the season? Of course not. You would >> correctly conclude that he is more likely to hit .300 from June >> through September than .400. He may just have had a good April... >..............................................Mattingly's hot >stats for the second position were not compiled in one streak... >....................................So, it is NOT the result of a hot streak. You misunderstand what I mean by a "hot" streak. I mean any random fluctuation caused by the smallness of the sample. Perhaps I should have clarified as follows: pick, at random, ANY 120 of Mattingly's AB's. Calculate his BA. For all his AB's, his BA is about .320 (last time I looked). You will find, if you do this, say, 100 times, that five to ten times you will get a .400+ BA for Mattingly. In other words, there's a greater than 5% chance that any particular .320 hitter will hit over .400 for 120 RANDOM at bats; if we check out, say, 20 major leaguers who are batting between .300 and .340, and pick 120 at bats for each of them at random, we would only be surprised if NONE of them hit over .400 during that span. Since it is likely this is what TSN did, intentionally or unintentionally, the statistic carried nothing to contradict my inclination that the .400 average meant nothing. >As for right-handed vs. left-handed opposition, I checked the games from >August 5 on. There were both right-handed and left-handed opponents. >He is playing full-time in that spot, so he faces all types of pitching. >Martin moved him to 2 on August 5 because of his excellent production >in that spot before. The question, though, is whether Mattingly faced lefty/righty pitching in the same proportion as a #2 hitter as he did as a #3 hitter. 
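The 120 at bat experiment described above can be estimated without actually drawing samples. What follows is only a minimal sketch, assuming each at bat is an independent trial with a .320 chance of a hit (a binomial model, which simplifies drawing 120 at bats out of one season's real total); the function names are illustrative and not taken from any source. It computes the chance of 48 or more hits in 120 at bats (.400 or better), and the chance that at least one of 20 such hitters manages it.

```python
from math import comb

def prob_at_least(hits, at_bats, avg):
    """Exact binomial tail: P(X >= hits) for X ~ Binomial(at_bats, avg)."""
    return sum(
        comb(at_bats, k) * avg**k * (1 - avg) ** (at_bats - k)
        for k in range(hits, at_bats + 1)
    )

def prob_any(p_single, players):
    """Chance that at least one of `players` independent hitters succeeds."""
    return 1 - (1 - p_single) ** players

# Assumed inputs: a true .320 hitter, 120 at bats, .400 means 48+ hits.
p_one = prob_at_least(48, 120, 0.320)
p_any = prob_any(p_one, 20)

print(f"one hitter at .400+ over 120 AB: {p_one:.3f}")
print(f"at least one of 20 hitters:      {p_any:.3f}")
```

Under these assumptions the single-hitter chance comes out at a few percent and the twenty-hitter chance at roughly even odds; both figures move noticeably if the assumed true average is shifted between .300 and .340.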
Moreover, I don't think he is REGULARLY batting #2, as every time I watch the Yankees (about half a dozen times in the past month), he's batting third. Too bad, too: if he batted second the rest of the way, we might have had enough data to say something about Mattingly in '85. >> Even if it were established for Mattingly, it would hold only for >> Don Mattingly with the current Yankees: to apply it to, say, Tony >> Pena, it would have to be demonstrated for a wide variety of players on >> a wide variety of teams. Still, it would be quite a surprise to me if >> anyone could get even that far. >I see! Whenever I come up with evidence, it counts only for that >case, but you have never detailed an instance of a player changing, >say, his lineup position and keeping the same OBA and slugging pct., I could come up with a wide variety of such instances. Shall I spend an hour with my Baseball Encyclopedia? To establish a general principle, I won't require that you prove it in every instance, but I don't feel I'd be unreasonable to remain dubious even if it were proved for one. >EVERYBODY knows that lineups are interdependent! "Everybody" "knows" this, because "everybody" evaluates players on the basis of RBI's and R's. Yes, lineups are interdependent in scoring runs. No, lineups do not substantially affect INDIVIDUAL performance. That Fred Xyzz is more likely to bat in a run with a runner on first is what "everybody" DOES know; that Fred Xyzz is more likely to hit a double with a runner on first is something that is not known by "everybody"; certainly, it is not yet known by me. >.......................................................Try watching a game >sometime (instead of just reading numbers). I watch, on TV or in person, about 100 games a season. >.................................................You'll see that when a runner >is on base, it affects (among other things): > 1) the way the pitcher throws. 
Using the stretch instead of a full > windup definitely hurts most pitchers' performances. Otherwise, > there would be no need for anyone to ever windup. > 2) the pitch selection; > 3) the defensive alignment. No doubt, but none of these things is done often enough to substantially affect a player's OBA or SA. Let's say, for example, that a player gets 500 AB's. On a really lousy team, there's a runner on when he bats, say (these are only guesses; if you have the real numbers, go ahead and substitute, as I doubt that I am SO far off as to invalidate my argument) 25% of the time, while with a really good team, it might be 50% of the time. The lucky player gets an extra 125 AB's with runners on. Consider #3; this lucky player, if he's a right-handed pull or straight-away hitter gets the second baseman in a position where the second baseman is less likely to make the play. Let's say he is a contact-hitter who NEVER strikes out, and he hits lots of groundballs, with few down the line. Then he might hit a groundball toward the second baseman about 20% of the time, and the second baseman may now convert only two thirds of them, rather than three quarters of them, into outs. So we have 125*.2*(.75-.67) is an extra two or so singles over the course of the season. If the batter in question strikes out some, or hits a lot of fly balls, then the difference is even less. Of course, with 125 extra shots at an RBI, THAT total will rise substantially. I could argue similarly on the other points. My point is not that these things are fiction, only that it is unlikely that they SUBSTANTIALLY affect a player's SA or OBA. The numbers I use are unimportant; what is important is the plausibility of my argument that we ought to be careful not to confuse existence with significance. 
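The back-of-envelope figure above is easy to recompute with other guesses. This is only a sketch using the guessed inputs from the paragraph above (500 at bats, a runner on 25% of the time on a poor team and 50% on a good one, 20% of those at bats ending in a grounder toward second, and the second baseman converting two thirds rather than three quarters of them); the function and variable names are illustrative.

```python
def extra_singles(extra_ab, share_toward_second, out_rate_normal, out_rate_held):
    """Singles gained because the second baseman converts fewer grounders
    into outs when he is holding a runner."""
    return extra_ab * share_toward_second * (out_rate_normal - out_rate_held)

at_bats = 500
extra_ab = at_bats * (0.50 - 0.25)   # extra at bats with a runner on: 125
singles = extra_singles(extra_ab, 0.20, 3 / 4, 2 / 3)
avg_gain = singles / at_bats         # effect on batting average

print(f"extra singles per season: {singles:.2f}")
print(f"batting average gain:     {avg_gain:.4f}")
```

With these inputs the product comes to roughly two singles, about four points of batting average over 500 at bats, which is a small number whichever guesses are substituted within reason.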
>By the way, if you still doubt the existence of lineup dependency (which >you undoubtedly still do) then answer the following question: > If there were no lineup interaction, then all managers would bat their > best hitter first, then their second-best, etc. to give them the > most opportunities to hit. Thus, according to your criteria (OBA and > slugging pct), the way to optimize the team's OBA and slugging pct > is to bat the best in these categories first, the next-best second, > etc. We would see Carter batting leadoff for the Mets, and Coleman > would not be the leadoff hitter for St. Louis, McGee would be, followed > by Clark. Coleman would be somewhere around 6 or 7. Come to think of it, > since Cedeno has been playing for the Cards and the way he has been > hitting, he would be batting leadoff. Also, Guerrero would be > hitting leadoff for LA (absurd!). There is lineup interaction on a team's run production; I only deny its significance in judging individual performance. Using my criteria, a manager would be disposed to bat his top OBA men near the top of the order and his top SA men in the middle. I would not have Carter bat leadoff (you are a silly one, aren't you?), but I would drop Wilson from the top of the order. I would also switch Coleman and McGee around, but as long as we were going with two table setters, both would be secure near the top of the order. You are confused: I do NOT say that lineup doesn't affect a team's performance, only that it has precious little effect on an individual's performance. > As ANY real baseball fan knows, managers carefully > pick the order to help run production, e.g. alternating left-handed > and right-handed batters, and putting speedsters in front of hitters > who hit well with men in scoring position. WHY WOULD THEY BOTHER TO > DO THIS IF THERE WERE NO LINEUP INTERACTION??? Why not bat Mattingly > leadoff, to get him more atbats? 
Maybe the fact that he would be > batting behind a much weaker hitter just MIGHT have a teeny-weeny > little bit to do with it?! Nyahh. The reason that we don't bat Mattingly lead-off is not that we fear his production will drop, but because we fear his production will be wasted. There is a difference. > Thus, we see that some excellent managers, such as Whitey Herzog, > deliberately put a player like Coleman, who has a lower OBA and > slugging average than McGee, in the spot where he will get the most > at-bats, thus effectively reducing the overall OBA and slugging pct of > his team. Do you really think he is deliberately reducing the run-scoring > ability of his team? Or do you just think that all these baseball > professionals are sadly misguided? I think Herzog is making a mistake. Not a big one, but probably one that will cost him a few runs over the course of the season. Herzog is not sadly misguided, just slightly in error. Herzog makes mistakes, Benjamin makes mistakes, even Rubin makes mistakes! That we HOPE that Herzog makes them less frequently is no guarantee of his infallibility. I vaguely recall Herzog being fired from a couple of jobs. Perhaps he did make mistakes...or do you believe that the professionals running the Rangers and the Royals did?? Some of these professionals must have erred if a firing was necessary.....Of course, you will argue that Herzog knows so much, I cannot question him. Thus I ask you: if there are thirty professional managers who, in a given situation, would do ten different things, does that make most of them "sadly misguided"? Of course not; men of good faith can disagree without calling one another idiots. I reserve the phrase "sadly misguided" for those who will not even examine alternatives. Maybe Benjamin would call me sadly misguided for batting McGee ahead of Coleman, but I doubt that Herzog would do so. As for team OBA/SA vs. individual OBA/SA, see below. 
> The only other alternative is > that TEAM RUN-SCORING ABILITY IS NOT DIRECTLY CORRELATED WITH > TEAM OBA OR SLUGGING, i.e., these stats aren't all you crack them > up to be. There must be other factors, e.g., speed. There are other factors. They just don't provide many runs that are not already accounted for by OBA and SA. Coleman has stolen 100+ bases, and has been caught, say, 30 times. By the best estimate available, Coleman's base stealing has given the Cardinals an extra .3*(100-2*30) = 12 runs. (The formula was empirically derived; it is how many runs an average team would gain if a player had 100 SB and 30 CS rather than just wait on first for the next player to put the ball in play. How many runs he has meant to the Cards this year may be somewhat higher (or lower), but we who do not have score sheets for all Card games cannot otherwise make a better guess. Of course, anyone who knows how many CS Coleman has can improve matters by substituting for my guess.) > To rephrase this point, so that you will have less chance of > misinterpreting it, if Guerrero's slugging avg and OBA are what > are most important to the Dodgers, then he should bat leadoff, > so as to maximize the team's slugging avg and OBA. He doesn't, > and the very idea seems preposterous. Either Lasorda doesn't > understand the game as you do, or your emphasis on OBA and > slugging is wrong. Which is it? Lasorda and I both agree that much of Guerrero's SA will be wasted if there are no men on base. As Lasorda and I have found that it's far easier to scrape someone up who has a decent OBA than it is to get someone who has a good SA, we both place a greater premium on Guerrero's power. Certainly, it is NOT true that maximizing a team's OBA and/or SA is the SAME as maximizing the team's run production, and I have never said that it was. I have suggested it's pretty darn close, though. The relationship between team OBA, SA, and run production is close, but not exact. 
It would cost the Dodgers some runs to bat Guerrero lead-off, but not because Guerrero wouldn't be a good lead off man. You've merely shown that OBA, SA, and runs are not identical: another straw man bites the dust! >The lineup can even affect the selection of relief pitchers. And haven't >you ever heard a manager say that what he really needs is a left-handed >power-hitter (or more speed in the lineup, etc.)? Why are these things >important to managers if the players in lineups don't interact? Again, you misunderstand what I am saying. The new left-handed power hitter may see big changes in his RBI totals, and his new team may see a surge in runs scored, but the new player is unlikely to see any substantial change in his OBA and SA, once those two are properly adjusted. >> I suppose Paul believes Carter has a special dispensation: in >> moving from the Expos to the Mets, he gains by being surrounded by >> Keith, Darryl, and George, while those three do NOT gain from Gary's >> presence. The fact is, the production of all four has remained about >> the same over the past two years, an argument AGAINST lineup effects. >Or an argument that Carter is about as productive as Hubie Brooks is. Correct. It says a lot about lineup effects if they indicate that Carter is about the same hitter as Brooks is. It says just how off the wall they are... Of course, I should have expected this. Brooks is about as productive a player as Pena, and so Paul must assert that Brooks is about on par with Carter. That is, of course, why the Mets were obliged to throw in Youmans, Fitzgerald, and Winningham into a deal involving players of equal value. Well, Paul, if you're right, the Mets and Expos managements must be mistaken about Carter's value vis a vis Brooks. So you, too, find yourself in contradiction with baseball "authority". Let us all savor this moment: it is as if the Pope were found guilty of heresy! 
>Strawberry is having a better year, and all the other three are down,
>except for Carter's HR rate:
>(these 1985 stats are as of 9/12; the 1985f stats are approximations to
>what they would have at the end of the season if their current averages
>continue)
>               BA    HR   RBI
>Strawberry:
>       1984   .251   26    97
>       1985   .282   23    66
>       1985f  .282   27    77   (.282 38 113)
>Hernandez:
>       1984   .311   15    94
>       1985   .291   10    79
>       1985f  .291   12    92
>Foster:
>       1984   .269   24    86
>       1985   .254   17    66
>       1985f  .254   20    77
>Carter:
>       1984   .294   27   106
>       1985   .281   26    77
>       1985f  .281   30    90
Read my lips: I HAVE NEVER NEVER NEVER NEVER DENIED THAT LINEUPS AFFECT RBI'S!!!!! To show an increase in RBI's shows the TEAM has had a better (or worse) year, not that the player has had a better or worse year. Looking at the specifics, you'll find that Hernandez, Carter, and Foster all show something of a drop-off from the last two years and Strawberry shows a definite improvement (this is apparent when looking at SA and OBA; BA and HR give us a glimpse of it). You would argue that Carter's introduction strengthened/weakened the Mets' lineup, but this would lead to a general rise/fall in INDIVIDUAL production. There is no such general rise/fall; as a GROUP, one would be hard pressed to say the four were doing better or worse than last year. What we do see is (1) Carter and Hernandez are having "typical" seasons. Their slight drop is due to the fact they both had outstanding seasons the previous year. (2) Foster fell off a bit. This is expected from 36 year olds. (3) Strawberry has improved. He was expected to, with or without Carter, and will likely further improve next year. Fact is, these are the kind of outputs we would have expected from all four had Carter remained in Montreal....... David Rubin {allegra|astrovax|princeton}!fisher!david P.S. Remember, Paul, that I deny lineup effects only with regard to SA and OBA, and that our argument is over INDIVIDUAL, not team, production. 
Repeat this to yourself five times before you write a rebuttal. P.P.S. Paul, you also dropped a lot of smiley faces, e.g. when you declared me to be statistically naive and understanding baseball as well as a Martian. Fortunately, I KNOW you didn't mean to insult, but shouldn't you be more careful for the sake of others who are not as intimately familiar with your tolerant nature? P.P.P.S. There is something called Linear Weights that does even better with run production than OBA and SA; it includes things you object to having left out, such as SB's. It is a SLIGHT improvement, while being a GREAT increase in complexity. The increased complexity, in my view, is too great to be justified by this slight improvement. You may well think otherwise.
dpb@philabs.UUCP (Paul Benjamin) (09/27/85)
> >How about a stat such as "how many runs you > >contribute to", measured by runs you score, drive in, or help advance > >the runners, or even better, how much better you are at that than > >others batting in similar positions? These stats are virtually impossible > >to compute from box scores, because so much information is lost, such as > >if anyone was in scoring position when a player made an out, or whether > >an out advanced a runner. In this sense, I can agree with you that > >OBA and slugging may be the best available, but I'm saying that this > >means that the available stats are not very good (we need some new > >categories). > > Well yes, that's what I said. "The two best readily available statistics." > In particular, if I know a player's on base and slugging averages, I don't > much care what his batting average is. In fact, it is better if the batting > average is lower, with the same on base and slugging averages. Well, I think that is lineup dependent! Specifically, a cleanup hitter should get a lot of hits - his OBA is not terribly important. He's supposed to be driving in runs. A perfect example of this is Jason Thompson. His OBA is among the best in the league, but his BA is low. I would much rather see a higher BA, even at the cost of a lower OBA. He just doesn't drive in runners. So who cares if he walks that much (he was leading the NL the last time I saw the numbers) - that just passes the RBI duty along to #5, and the Pirates haven't had a good #5 in a long time (George Hendrick??). But I basically agree with you. > Yes, we do need better stats. I have my doubts about your proposal; it is > highly lineup dependent. A batter will participate in more runs on a good > offensive team than on a poor one -- this is the main problem with RBIs. > (That assessment sounds harsher than I really mean it to be. This would > be a useful statistic -- certainly better than "runs produced" or "game > winning RBI. 
> (I don't really understand why the statistic isn't "go ahead
> RBI" instead -- the batter putting his team ahead cannot be affected by
> whether they will stay ahead.) But if I could only get one statistic about
> a player, I would rather know the sum of his on base and slugging, than
> the number of runs (total, per game, or per at bat, your choice) that he
> contributed to.)

Specifically, what I had mentioned in previous postings but had not repeated (the posting was long enough!) was the percentage of a team's runs that a player figures in. I think this might be a very meaningful stat, particularly where MVP awards are being discussed. This would parallel the hockey stat, in which we can see, for instance, someone like Gretzky participating in a very high percentage of his team's goals.

> If you can get a copy, do look at the Elias book (_The_1985_Elias_Baseball_
> Analyst_). It has batting and pitching statistics broken down by whether
> the bases are empty, leading off an inning, with runners on base, with
> runners in scoring position, and with runners in scoring position with
> two out. It also has statistics for batting in late inning pressure
> situations (defined as the seventh inning or later, with the player's
> team tied, behind by not more than three runs, or behind by four runs
> with the bases loaded), broken down similarly. It also has home vs
> away, grass vs turf, and day vs night breakdowns.

Thanks for the suggestion.

> Some other statistics I would like to see: how often does a runner take
> an extra base on a hit? And how often is he out trying to do so? Another
> interesting statistic would be bases advanced out of the number possible
> (a grand slam is ten out of ten; a bases empty single is one out of four).
> The ratio of bases advanced to outs made would also be interesting.
> None of these statistics is perfect, of course.

No, no stat is. But I like the way you think. The idea of bases advanced seems nice to me.
I am particularly thinking about people who get singles with the bases loaded instead of solo HRs. I have always felt that they got the short end of the stick statistically, particularly from people like David Rubin, who ignore such things as R and RBI. The single with the bases loaded contributes, say, 5 bases, and the solo HR contributes 4. It doesn't seem quite fair to give the HR 4 bases and the single 1 (in slugging avg), when the TIMING of the hit is all important.

By the way, when you read "solo HR" above, read "Gary Carter". Almost all of his HRs are solo - thus his low RBI total (he doesn't get many RBI singles, either). In his recent HR binge, he hit 9 HR, with 15 RBI. Actually, he had 3 HR, 6 RBI in one game, and 6 HR, 9 RBI during the rest of the streak. If you look around the league, you will find other players who go on RBI binges, and do it without that many HRs. But they don't get the slugging average boost that HRs give - Carter picked up about 45 points on his season average in 2 games! (5 HRs)

> >>I am unconvinced by the Mattingly data. There is just not enough there
> >>to be statistically significant.
> >
> >Of course. But I didn't say that this proved conclusively that all
> >players' stats are highly order-dependent. I just showed the existence
> >of stats that support the belief in lineup dependency. Again, just
> >because these stats are not often kept is not my fault.
>
> Let me put that a bit differently. While Mattingly undoubtedly hits
> better batting after Henderson (who has a very good on base percentage
> and fantastic speed), it is unlikely that the effect is anywhere near
> as large as in those sample statistics. And whoever hits after Henderson
> can expect an improvement.

I stated in that posting that the magnitude of the difference was undoubtedly exceptional. I agree with you that whoever hits after Henderson (Coleman, etc.) can expect an improvement.
> This is one reason the on base and slugging averages make such a good pair.
> When a player is pitched to cautiously, the on base average goes up and
> the slugging average goes down. In the reverse case, the opposite happens.

It's true that they complement each other well. But they are both terribly inadequate to begin with, so who cares?
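
The bases-advanced idea discussed in this post can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the way the runners move on the bases-loaded single (runners from second and third score, the runner from first stops at second) is an assumption chosen to match the "say, 5 bases" figure, since the post does not spell it out.

```python
# Sketch of the "bases advanced out of the number possible" stat.
# Bases are numbered 1-3 and home counts as 4.
# All names here are hypothetical, not taken from any real scoring system.

def bases_possible(runners):
    """Most bases that can be gained on one plate appearance.

    The batter can gain 4; a runner starting on base b can gain 4 - b.
    """
    return 4 + sum(4 - b for b in runners)


def bases_advanced(batter_end, runner_moves):
    """Bases actually gained on the play.

    batter_end: base the batter reaches (0 if he is out).
    runner_moves: (start_base, end_base) pair for each runner.
    """
    return batter_end + sum(end - start for start, end in runner_moves)


plays = {
    # Grand slam: batter and all three runners score.
    "grand slam": (bases_advanced(4, [(1, 4), (2, 4), (3, 4)]),
                   bases_possible([1, 2, 3])),
    # Single with nobody on.
    "bases-empty single": (bases_advanced(1, []), bases_possible([])),
    # Home run with nobody on.
    "solo HR": (bases_advanced(4, []), bases_possible([])),
    # ASSUMPTION: runners on second and third score, runner on first
    # stops at second.
    "bases-loaded single": (bases_advanced(1, [(1, 2), (2, 4), (3, 4)]),
                            bases_possible([1, 2, 3])),
}

for name, (gained, possible) in plays.items():
    print(f"{name}: {gained} of {possible}")
```

Under these assumptions the bases-loaded single is credited with 5 bases and the solo HR with 4, whereas slugging average counts them as 1 and 4.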
franka@mmintl.UUCP (Frank Adams) (10/01/85)
[Not food]

In article <458@philabs.UUCP> dpb@philabs.UUCP (Paul Benjamin) writes:
>> >How about a stat such as "how many runs you
>> >contribute to", measured by runs you score, drive in, or help advance
>> >the runners,

One question. Suppose the lead off batter singles and steals second. The next two batters make outs. The cleanup hitter walks. The number five hitter singles, bringing in the runner from second. Finally, the number six hitter strikes out. Shouldn't the cleanup hitter get credit for "contributing" to the run? If he had made an out, it wouldn't have scored.

>> In particular, if I know a player's on base and slugging averages, I don't
>> much care what his batting average is. In fact, it is better if the batting
>> average is lower, with the same on base and slugging averages.
>
>Well, I think that is lineup dependent! Specifically, a cleanup hitter should
>get a lot of hits - his OBA is not terribly important.

Yes, but if you fix the OBA and SA, and decrease the BA, he gets more extra base hits. This is likely to mean more RBI, not fewer.

>He's supposed to
>be driving in runs. A perfect example of this is Jason Thompson. His OBA
>is among the best in the league, but his BA is low. I would much rather
>see a higher BA, even at the cost of a lower OBA. He just doesn't drive
>in runners. So who cares if he walks that much (he was leading the NL the
>last time I saw the numbers) - that just passes the RBI duty along to #5,
>and the Pirates haven't had a good #5 in a long time (George Hendrick??).

If your number five hitter can't drive in runs, don't blame it on the cleanup hitter. And do you really want a higher BA with the same SA? Also, I don't have the statistics handy, but I believe Thompson scores a fair number of runs. It doesn't matter whether they are scored the way they are "supposed" to be.

>It's true that they complement each other well. But they are both terribly
>inadequate to begin with, so who cares?

This is where we disagree.
I would say "reasonable but not ideal", not "terribly inadequate".

By the way, an interesting statistic from the Elias book: looking at all teams in the majors, the most runs per inning and the greatest chance of scoring in an inning occur when the number 3 hitter leads off the inning. This suggests that the "traditional" batting order may not be the best after all.

Frank Adams                           ihpn4!philabs!pwa-b!mmintl!franka
Multimate International    52 Oakland Ave North    E. Hartford, CT 06108
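
The argument in this post about fixing SA and lowering BA can be checked with simple arithmetic. The Python sketch below uses two hypothetical stat lines (not real players) with equal at bats and total bases. It compares total extra bases (total bases minus hits), which is a slightly different quantity from the count of extra-base hits, and it does not model the extra walks that holding OBA fixed would require.

```python
# Two HYPOTHETICAL hitters with the same at bats and total bases
# (so the same slugging average) but different hit totals.
hitters = {
    "higher BA": {"ab": 500, "hits": 150, "total_bases": 225},
    "lower BA": {"ab": 500, "hits": 125, "total_bases": 225},
}


def summarize(line):
    """Return BA, SA and bases gained beyond first base on hits."""
    ba = line["hits"] / line["ab"]
    sa = line["total_bases"] / line["ab"]
    extra_bases = line["total_bases"] - line["hits"]
    return {"ba": ba, "sa": sa, "extra_bases": extra_bases}


summary = {name: summarize(line) for name, line in hitters.items()}

for name, s in summary.items():
    print(f"{name}: BA {s['ba']:.3f}  SA {s['sa']:.3f}  "
          f"extra bases {s['extra_bases']}")
```

With SA held at .450, dropping BA from .300 to .250 raises the extra bases from 75 to 100 in this made-up example, which is the direction the post describes.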
david@fisher.UUCP (David Rubin) (10/03/85)
Frank Adams has kept the issues clear. I'd like to comment on one of his contributions:

> On the other hand, batters definitely DO hit better with men on base.
> The book put out by the Elias Sports Bureau (it has their name in the
> title) has statistics on this for the entire major leagues last year.
> As I remember (the book is not here) the effect was about 20 points in
> terms of batting average. So clearly there is an advantage to batting
> after a player who gets on base a lot. Although the statistics for it
> are not available, it seems likely that this is enhanced when batting
> after good base stealers.

What this says is that if a player played on a team that had a runner on every time he hit, he could expect to hit 20 points better than if he never had a runner on. Applying this to my rough guess that the best teams have runners on about half the time, and the worst about a quarter of the time, the advantage to be gained is no more than 20*(.5-.25) = 5 BA points. Exactly what I mean when I suggest that the difference is not something we ought to lose sleep over...

David Rubin
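
The estimate above is a one-line calculation, written out here as a Python sketch. The function name is hypothetical, the 20-point effect is the figure recalled from the Elias book, and the one-half and one-quarter shares are the rough guesses stated in the post. The calculation also assumes every hitter gains the same number of points with men on.

```python
def expected_gain_points(effect_points, share_on_best, share_on_worst):
    """BA points separating the same hitter on the best and worst teams.

    ASSUMES a uniform effect: every hitter gains effect_points with
    runners on, and the teams differ only in how often runners are on.
    """
    return effect_points * (share_on_best - share_on_worst)


gain = expected_gain_points(20, 0.5, 0.25)
print(gain)  # 5.0
```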
dpb@philabs.UUCP (Paul Benjamin) (10/03/85)
Frank Adams writes:
> In article <458@philabs.UUCP> dpb@philabs.UUCP (Paul Benjamin) writes:
> >> >How about a stat such as "how many runs you
> >> >contribute to", measured by runs you score, drive in, or help advance
> >> >the runners,
>
> One question. Suppose the lead off batter singles and steals second.
> The next two batters make outs. The cleanup hitter walks. The number
> five hitter singles, bringing in the runner from second. Finally, the
> number six hitter strikes out. Shouldn't the cleanup hitter get credit
> for "contributing" to the run? If he had made an out, it wouldn't have
> scored.

But it's an awfully small contribution. It's more of a non-negative act, not a positive one, so I can't see really giving him a positive stat. But you are right in general - evaluating individual contributions and separating them from team scoring is difficult.

>>> In particular, if I know a player's on base and slugging averages, I don't
>>> much care what his batting average is. In fact, it is better if the batting
>>> average is lower, with the same on base and slugging averages.
>>
>>Well, I think that is lineup dependent! Specifically, a cleanup hitter should
>>get a lot of hits - his OBA is not terribly important.
>
> Yes, but if you fix the OBA and SA, and decrease the BA, he gets more extra
> base hits. This is likely to mean more RBI, not fewer.

Maybe. In some situations, yes, but sometimes, no. We are actually talking about different things, though. I was making the statement that I'd prefer to see a cleanup hitter have a higher BA at the expense of his OBA.

> >He's supposed to
> >be driving in runs. A perfect example of this is Jason Thompson. His OBA
> >is among the best in the league, but his BA is low. I would much rather
> >see a higher BA, even at the cost of a lower OBA. He just doesn't drive
> >in runners.
> >So who cares if he walks that much (he was leading the NL the
> >last time I saw the numbers) - that just passes the RBI duty along to #5,
> >and the Pirates haven't had a good #5 in a long time (George Hendrick??).
>
> If your number five hitter can't drive in runs, don't blame it on the
> cleanup hitter. And do you really want a higher BA with the same SA?
> Also, I don't have the statistics handy, but I believe Thompson scores
> a fair number of runs. It doesn't matter whether they are scored the
> way they are "supposed" to be.

There's more to the Jason Thompson story than this, though. He is guilty of failing to drive in runs in many situations where he has the opportunity. He strikes out or pops up when the runners are in scoring position, and gets his hits when the bases are empty. I am saying that he doesn't make up for this annual lack of production by getting walks (he has a great batting eye). He may have the great batting eye, and not swing at many bad pitches, but he doesn't do too well with the good ones. The walks may be crucial for 1 or 2 or 3 hitters, but you like to see your 4-5-6 hitters driving in the runs. He doesn't do this too often, certainly not in correlation with his OBA+SA.

> >It's true that they complement each other well. But they are both terribly
> >inadequate to begin with, so who cares?
>
> This is where we disagree. I would say "reasonable but not ideal", not
> "terribly inadequate".
>
> By the way, an interesting statistic from the Elias book: looking at all
> teams in the majors, the most runs per inning and the greatest chance of
> scoring in an inning occurs when the number 3 hitter leads off the inning.
> This suggests that the "traditional" batting order may not be the best
> after all.

But if you move the 1 and 2 hitters to 7 and 8, say, and improve the runs scored in the innings in which 3 leads off, you may lose runs in the innings in which 1 or 2 lead off. The net may be worse than originally.
You have to be VERY careful of this kind of stat. For instance, I remember reading the obvious stat that more runs score with 1 or 2 outs than with no outs (obvious when you think about it). This shouldn't be interpreted as meaning that, for example, a batter should intentionally strike out with the bases loaded and no outs, to improve his team's odds of scoring! This is just another example of the dangers of trying to separate stats from their context. Just because a stat can be computed (in this case, the odds of scoring when a particular batting position leads off the inning) and just because it has a correlation with team scoring, doesn't mean that it corresponds to anything in the real world. Interpretation is everything with statistics. Unfortunately (or fortunately, depending on your philosophical inclination) interpretation is a subjective art. A person without any real baseball knowledge might reasonably infer that the batter in the above situation should intentionally strike out. It requires knowledge of the real world to see why this stat occurred, and to understand the situation. A similar case leaps to mind. Several years ago, someone published an analysis of football in Sports Illustrated. I think his name might have been Goode, or something. Anyway, he showed that the single stat with the highest correlation to winning was the number of rushing attempts per game. Thus, he concluded, the running game was the most important aspect of football, and furthermore, it wasn't so much the yardage gained, as the number of attempts that mattered. But given a little knowledge about the real world of football, another interpretation is easily possible: teams that already have a game wrapped up tend to run the clock out by running the football. They don't care at this point about yardage, first downs, etc. This inflates the rushing attempts, and could account for the high correlation with winning, since teams who are losing won't resort to this strategy. 
But this says nothing about how the winning teams got so far ahead. They might not have done it with a strict running game. They might have mixed things up a lot. So, the analyst should have recomputed his data, ignoring what happened after a team had already built a good lead. This may have led to different results. Now, this is strongly reminiscent of attempts to come up with one stat, say OBA+SA, and correlate it with team runs (even though the correlation has not been mathematically shown yet.) This example also shows why I insist on depending upon expert advice, rather than our own interpretation of the stats. The baseball experts know much more than we do, and can possibly give completely different interpretations to the numbers. I am not a baseball expert, since I have never played or coached professionally. Neither is anyone else on this net, to my knowledge.
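
The recomputation suggested above for the football example can be sketched in Python. The four games below are invented numbers, not data from the Sports Illustrated analysis, and the split into "before" and "after" the game was effectively decided is a hypothetical field that a real play-by-play record would have to supply.

```python
# HYPOTHETICAL rushing attempts per game, split at the point where the
# game was effectively decided. These numbers are made up for illustration.
# Each row: (winner_before, winner_after, loser_before, loser_after)
games = [
    (22, 16, 24, 4),
    (25, 14, 23, 5),
    (20, 18, 21, 3),
    (24, 12, 25, 6),
]


def mean(values):
    return sum(values) / len(values)


# Whole-game totals: the kind of figure a naive correlation would use.
winner_total = mean([wb + wa for wb, wa, lb, la in games])
loser_total = mean([lb + la for wb, wa, lb, la in games])

# Attempts made while the outcome was still in doubt.
winner_before = mean([wb for wb, wa, lb, la in games])
loser_before = mean([lb for wb, wa, lb, la in games])

print(f"whole game:     winners {winner_total}, losers {loser_total}")
print(f"before decided: winners {winner_before}, losers {loser_before}")
```

In this invented data the winners average ten more rushing attempts over the whole game, yet slightly fewer than the losers while the game was still in doubt. That is the pattern the clock-killing explanation would predict; whether real games look like this is exactly what the recomputation would have to show.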
dpb@philabs.UUCP (Paul Benjamin) (10/09/85)
> Frank Adams has kept the issues clear. I'd like to comment on one of
> his contributions:
>
> > On the other hand, batters definitely DO hit better with men on base.
> > The book put out by the Elias Sports Bureau (it has their name in the
> > title) has statistics on this for the entire major leagues last year.
> > As I remember (the book is not here) the effect was about 20 points in
> > terms of batting average. So clearly there is an advantage to batting
> > after a player who gets on base a lot. Although the statistics for it
> > are not available, it seems likely that this is enhanced when batting
> > after good base stealers.
>
> What this says is that if a player played on a team that had a runner
> on every time he hit, he could expect to hit 20 points better than if
> he never had a runner on. Applying this to my rough guess that the
> best teams have runners on about half the time, and the worst about a
> quarter of the time, the advantage to be gained is no more than
> 20*(.5-.25) = 5 BA points. Exactly what I mean when I suggest that
> the difference is not something we ought to lose sleep over...
>
> David Rubin

Another case of bad reasoning. This may be the average over all players, but certain players bat much more than 20 points better with men on base. For example, Boggs batted .418 this year with runners in scoring position. This is about 45 points above his overall average, and about 55-60 points above his average with the bases empty. I certainly don't lose sleep over this, but it is significant.
dday@gymble.UUCP (Dennis Doubleday) (10/15/85)
In article <472@philabs.UUCP> dpb@philabs.UUCP (Paul Benjamin) writes:
>> he never had a runner on. Applying this to my rough guess that the
>> best teams have runners on about half the time, and the worst about a
>> quarter of the time, the advantage to be gained is no more than
>> 20*(.5-.25) = 5 BA points. Exactly what I mean when I suggest that
>> the difference is not something we ought to lose sleep over...
>>
>> David Rubin
>
>Another case of bad reasoning. This may be the average over all players,
>but certain players bat much more than 20 points better with men on
>base. For example, Boggs batted .418 this year with runners in scoring
>position. This is about 45 points above his overall average, and about
>55-60 points above his average with the bases empty. I certainly don't
>lose sleep over this, but it is significant.

I hesitate to stick my nose into this (and I am not taking sides) but let me make one point about men on base. If Wade Boggs did the above (I don't question it), couldn't this say at least as much about the pitchers he faced in those situations as it does about him? Boggs is much more likely to come to the plate with men on base against, say, Dennis Martinez than he is against, say, Ron Guidry. The simple reason is that *everybody* on the Red Sox is likely to get more hits (and thus be on base more) against the inferior pitchers. And so Wade Boggs is more likely to bat with men on against Dennis Martinez and with the bases empty against Ron Guidry. This might go a long way toward explaining the differential.

--
UUCP: {seismo,allegra,brl-bmd}!umcp-cs!dday    Dept. of Computer Science
CSNet: dday@umcp-cs                            University of Maryland
ARPA: dday@maryland                            College Park, MD 20742
                                               (301) 454-4247
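
The selection effect described in this post can be illustrated with a small Python sketch. All of the at-bat counts are made up. The hitter is deliberately given the same average against each type of pitcher in both situations (.400 against the weaker pitcher, .300 against the stronger one), so any gap between the two situations comes only from which pitchers he faces when.

```python
# HYPOTHETICAL at bats for one hitter, as (at_bats, hits), split by
# situation and by pitcher quality. The hitter bats .400 against the
# weak pitcher and .300 against the strong one in BOTH situations,
# so there is no true men-on effect in this data.
at_bats = {
    "men on": {"weak": (70, 28), "strong": (30, 9)},
    "bases empty": {"weak": (30, 12), "strong": (70, 21)},
}


def batting_average(split):
    """Pooled batting average over all pitcher types in one situation."""
    total_ab = sum(ab for ab, _ in split.values())
    total_hits = sum(hits for _, hits in split.values())
    return total_hits / total_ab


averages = {situation: batting_average(split)
            for situation, split in at_bats.items()}

for situation, ba in averages.items():
    print(f"{situation}: {ba:.3f}")
```

In this toy case the pooled averages come out 40 points apart even though the hitter is no better with men on against either pitcher. It shows only that the pitcher mix could produce such a gap, not that it explains any real player's split.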
dpb@philabs.UUCP (Paul Benjamin) (10/16/85)
> In article <472@philabs.UUCP> dpb@philabs.UUCP (Paul Benjamin) writes:
> >> he never had a runner on. Applying this to my rough guess that the
> >> best teams have runners on about half the time, and the worst about a
> >> quarter of the time, the advantage to be gained is no more than
> >> 20*(.5-.25) = 5 BA points. Exactly what I mean when I suggest that
> >> the difference is not something we ought to lose sleep over...
> >>
> >> David Rubin
> >
> >Another case of bad reasoning. This may be the average over all players,
> >but certain players bat much more than 20 points better with men on
> >base. For example, Boggs batted .418 this year with runners in scoring
> >position. This is about 45 points above his overall average, and about
> >55-60 points above his average with the bases empty. I certainly don't
> >lose sleep over this, but it is significant.
>
> I hesitate to stick my nose into this (and I am not taking sides) but
> let me make one point about men on base. If Wade Boggs did the above
> (I don't question it), couldn't this say at least as much about the
> pitchers he faced in those situations as it does about him? Boggs
> is much more likely to come to the plate with men on base against, say,
> Dennis Martinez than he is against, say, Ron Guidry. The simple reason
> is that *everybody* on the Red Sox is likely to get more hits (and thus
> be on base more) against the inferior pitchers. And so Wade Boggs is
> more likely to bat with men on against Dennis Martinez and with the
> bases empty against Ron Guidry. This might go a long way toward
> explaining the differential.
>
> UUCP: {seismo,allegra,brl-bmd}!umcp-cs!dday    Dept. of Computer Science
> CSNet: dday@umcp-cs                            University of Maryland
> ARPA: dday@maryland                            College Park, MD 20742
>                                                (301) 454-4247

I agree completely. These stats, as well as others, are influenced by the difference between starters and relievers.
This has often been noted to be one of the major differences between modern baseball and that of previous eras - the emergence of the relief specialist. There is no question that everybody tends to face different pitchers in hot spots than during the rest of a game. But this could actually make it harder to hit with runners in scoring position than it otherwise would be, since managers usually try to bring in relievers who are best suited to face specific batters, e.g., lefties against lefties. But then, your point about facing Guidry with the bases empty can be rephrased as, "When a pitcher is doing very well, then batters will tend to face him less often with runners in scoring position, so that the situations with runners in scoring position will often be against starters who are in trouble." This makes good sense to me. So we have two opposing tendencies. The net is that (as someone posted) hitters tend to bat about 20 points better with men on. This could well reflect that the second tendency outweighs the first. However, this is not directly relevant to what the original argument was about, since I was stating that there are individuals who consistently perform above the average with men on base, and there are those who consistently perform worse, so that this factor cannot be dismissed by saying "the effect is randomly distributed", or something else to that effect. The individual differences between players' performances need to be taken into consideration.