david@fisher.UUCP (David Rubin) (08/07/85)
THE RECORD

We begin with an unedited reproduction of Pena's and Carter's career offensive statistics, taken from the Baseball Encyclopedia. To make the strike-shortened 1981 season comparable to others, I added 50% to all of Carter's totals and 60% to all of Pena's (the Expos played 108 games, or 2/3 of 162; the Pirates played 102 games, about 5/8 of 162). These strike-adjusted figures are listed parenthetically under the unadjusted figures and were used in the latter's place in finding a "typical" year. I also added an important stat that is not included in the official listings, on-base percentage (or on-base average), OB. Finally, bank rounding (round half to even) was used.

Carter, Gary Edmund

 Year    G   AB    H  2B  3B  HR  HR%   R  RBI  BB   SO  SB    BA    SA    OB
 1974    9   27   11   0   1   1  3.7   5    6   1    2   2  .407  .593  .429
 1975  144  503  136  20   1  17  3.4  58   68  72   83   5  .270  .416  .362
 1976   91  311   68   8   1   6  1.9  31   38  30   43   0  .219  .309  .287
 1977  154  522  148  29   2  31  5.9  86   84  58  103   5  .284  .525  .355
 1978  157  533  136  27   1  20  3.8  76   72  62   70  10  .255  .422  .333
 1979  141  505  143  26   5  22  4.4  74   75  40   62   3  .283  .485  .336
 1980  154  549  145  25   5  29  5.3  76  101  58   78   3  .264  .486  .334
 1981  100  374   94  20   2  16  4.3  48   68  35   35   1  .251  .444  .315
(1981  150  561  141  30   3  24    -  72  102  52   52   2     -     -     -)
 1982  154  557  163  32   1  29  5.2  91   97  78   64   2  .293  .510  .380
 1983  145  541  146  37   3  17  3.1  63   79  51   57   1  .270  .444  .333
 1984  159  596  175  32   1  27  4.5  75  106  64   57   2  .294  .487  .362

Pena, Antonio Francesco

 Year    G   AB    H  2B  3B  HR  HR%   R  RBI  BB   SO  SB    BA    SA    OB
 1980    8   21    9   1   1   0  0.0   1    1   0    4   0  .429  .571  .429
 1981   66  210   63   9   1   2  1.0  16   17   8   23   1  .300  .381  .326
(1981  106  336  101  14   2   3    -  26   27  13   37   2     -     -     -)
 1982  138  497  147  28   4  11  2.2  53   63  17   57   2  .296  .435  .319
 1983  151  542  163  22   3  15  2.8  51   70  31   73   6  .301  .435  .339
 1984  147  546  156  27   2  15  2.7  77   78  36   79  12  .286  .425  .330

Ok, then, we now have a whole slew of numbers. What to make of them?
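
For anyone who wants to check the parenthetical 1981 lines, the strike adjustment and the bank rounding can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `strike_adjust` and the dictionary layout are hypothetical, and exact fractions (3/2 for "add 50%", 8/5 for "add 60%") are used so that a true half is rounded to the even neighbor rather than being nudged by floating point.

```python
from fractions import Fraction

def strike_adjust(totals, factor):
    """Scale each counting stat by `factor` and round half to even.

    `factor` should be a Fraction; Python's round() on a Fraction
    rounds exact halves to the nearest even integer (bank rounding).
    """
    return {stat: round(value * factor) for stat, value in totals.items()}

# Unadjusted 1981 totals, copied from the tables above.
carter_1981 = {"G": 100, "AB": 374, "H": 94, "2B": 20, "3B": 2, "HR": 16,
               "R": 48, "RBI": 68, "BB": 35, "SO": 35, "SB": 1}
pena_1981 = {"G": 66, "AB": 210, "H": 63, "2B": 9, "3B": 1, "HR": 2,
             "R": 16, "RBI": 17, "BB": 8, "SO": 23, "SB": 1}

carter_adj = strike_adjust(carter_1981, Fraction(3, 2))  # add 50%
pena_adj = strike_adjust(pena_1981, Fraction(8, 5))      # add 60%

for name, line in (("Carter", carter_adj), ("Pena", pena_adj)):
    print(name, line)
```

The rounding rule only matters where the scaled value is an exact half: Carter's 35 walks and 35 strikeouts each become 52.5 and round to 52, while his single stolen base becomes 1.5 and rounds to 2.
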
Since most fans can most easily consider the record of a single season, I next summarized the two catchers' careers. But how? Simply averaging across all the years for each category is too easily swayed by a particularly bad or particularly good year. It's safer to take medians. Also, we really don't want to consider years before the principals established themselves in their respective teams' starting lineups, so we consider Carter in 1975 and 1977-1984 (in 1975, Carter spent most of his playing time in the outfield, generally as the starting left fielder, while serving as the #2 catcher; in 1976, he was just the back-up catcher) and Pena from 1981-1984.

So, taking medians of all the totals (G,AB,H,2B,3B,HR,R,RBI,BB,SO,SB) and calculating from the result the appropriate rates (HR%,BA,SA,OB), we get:

 Plyr    G   AB    H  2B  3B  HR  HR%   R  RBI  BB   SO  SB    BA    SA    OB
 Crtr  154  541  145  29   2  24  4.4  75   84  58   64   3  .268  .462  .339
 Pena  142  520  152  24   2  13  2.5  52   66  24   65   4  .292  .421  .324

WHAT TO LOOK AT

It is widely recognized that the purpose of the offense is run production, and there are two distinct ways in which a hitter may contribute to it. The first is to score runs, the second to drive them in. Thus, traditionally, fans have placed great store in the most obvious measures of that production, runs scored and runs batted in.

Unfortunately, those traditional measures are heavily dependent on circumstances beyond the hitter's control: how well his teammates fare in doing THEIR job. You can't score if no one drives you in, and you can't drive someone in if no one is on base. If we are to evaluate individual performance, we must look at statistics that are NOT dependent on the action of anyone save the individual in question.

Other than knocking it out of the park every time he comes up, what is the most helpful thing that a batter can do to aid in (a) the scoring of runs and (b) the driving in of runs?
The answer to the first is to place oneself in a position where a teammate can drive you in, that is, to get on base. All in all, it is better for a hitter to place himself on second base than first base, but the critical first step is to get to ANY base safely. Thus, we look at On-Base Percentage (a.k.a. Average) to evaluate how well a player performs this function.

The answer to the second is to perform some function that will advance any teammates who might be on base. One might do this by drawing a walk, but walks are much more likely to be drawn in situations where no runner would be advanced (first base open) than otherwise. We therefore consider only hits, and as runners are far more effectively advanced by extra base hits than singles, we account for this by considering total bases, that is, Slugging Average (a.k.a. Percentage).

What of Batting Average? It is a statistic of overestimated importance, as it does a poor job of measuring either facet of run production. In fact, what it does measure (the percentage of at bats that were converted to hits) has little direct bearing on the crucial offensive issue: run production.

An interesting statistic that has been proposed to measure power is "isolated power", defined to be the difference between the Slugging Average and Batting Average; it can also be calculated as (Total Bases - Hits) / At Bats. In other words, it considers ONLY extra bases. Can you guess the career leader in isolated power among active players? Of course you can. It's Dave Kingman. But this is just a sidelight...

THE EVALUATION

Carter wins hands down in advancing runners (his .462 SA against Pena's .421 amounts to one more total base every twenty-five at bats, i.e. an extra base per week or so) and edges Pena in putting himself in a position where his teammates have an opportunity to score him (.339 OB against .324).

ADJUSTMENTS???

Paul Benjamin has repeatedly expressed the opinion that batting in Pittsburgh's supposedly weak lineup, vis a vis Montreal's, has damaged Pena's statistics.
My response is twofold:

(1) As we are limiting ourselves to statistics upon which the performance of one's teammates has no direct bearing (OB and SA), and for which there is no empirical evidence of a substantial indirect bearing, it is irrelevant.

(2) Pittsburgh was not a substantially less capable offensive team than Montreal. Pittsburgh's worst year was 1984 (remember, 1985 has not been considered by me in any way), when the team BA (which I reproduce merely to humor those who refuse the logic of OB and SA) was .255, their OB .312, and their SA .363. Contrast this with Montreal's BA of .251, OB of .314, and SA of .362, and one could rightly ask, WHAT contrast? Pittsburgh also outscored their opponents 615-587, while Montreal edged their opponents 593-585. It is only myth that Pittsburgh had a terrible offense; the fact is that Pittsburgh (and, to a lesser extent, Montreal) was unfortunate with regard to WHEN their runs were scored and their opponents' runs yielded.

There is overwhelming evidence, however, that the home park heavily influences statistics, and that this effect has worked to Pena's advantage vis a vis Carter during their careers. The discussion and application of such effects is postponed to the end of this article, as I'd rather dispose of one other offensive issue before I take up that exposition.

SPEED

We can consider two subtopics offensively: base stealing and base running. With regard to the first, do Pena's stolen bases substantially add to his offensive value? The answer is an emphatic no; Pena's steals have HURT the Pirates the past two years. In 1983, he stole 6 but was caught 7 times; in 1984, he stole 12 but was caught 8 times. As CS isn't listed in the Baseball Encyclopedia, I had to pull these figures out of James's Abstracts, which I only have for those two years; however, Pena's stolen base totals before 1983 are unremarkable, even for a catcher.

The stolen base is a dangerous play.
If it succeeds, the runner has advanced one base, but if it fails, the runner is entirely removed AND one of the team's precious 27 outs is consumed. For this reason, it must have a high probability of success for it to be properly employed. Both conventional wisdom and empirical studies (see Thorn and Palmer for the latter) place the break-even point at somewhere between 2/3 and 3/4, i.e. if you are succeeding less often than that, the offensive contribution is NEGATIVE. Thus, if you steal twice as often as you are caught, you are at best running in place; at 6 steals in 13 tries and then 12 in 20, it is clear that Pena has been running backwards in this regard.

And baserunning? The most important use of speed in that regard is the time it takes the batter to get from home to first; that speed is already appropriately rewarded by both OB and SA (even over-rewarded by the latter, as the infield single does not advance runners as well as the outfield single). The only other baserunning situation that places a premium on speed and occurs with any frequency is the one where there is a single to the outfield with fewer than two outs, a man on first, and the runner does not start before the hit: can he make it to third?

The issue is treated with care in only one source that I am aware of: the Bill James 1984 Baseball Abstract, which considers the matter for one team (1983 Texas Rangers) in detail, but includes all singles and all out counts; the question studied there was how often each runner advances from first to third on a single. It was found:

(1) An everyday player may expect between 20 (if he bats near the bottom of the order) and 60 (if he is near the top) such opportunities in a season.

(2) The best runners advanced to third about half the time, while the worst did so about a tenth of the time.

(3) Speed was the most important factor, but smarts substantially helped some of the slower players.
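
Returning for a moment to the stolen-base break-even argument earlier in this section, the comparison can be sketched as follows. This is a hedged illustration, not anything taken from Thorn and Palmer or from James: the function names and the three-way labels are hypothetical, and the only inputs are Pena's SB and CS figures for 1983 and 1984 and the 2/3 to 3/4 break-even range quoted above.

```python
from fractions import Fraction

# Break-even range for stolen-base success, as quoted in the text.
BREAK_EVEN_LOW = Fraction(2, 3)
BREAK_EVEN_HIGH = Fraction(3, 4)

def success_rate(sb, cs):
    """Fraction of steal attempts that succeeded."""
    return Fraction(sb, sb + cs)

def verdict(sb, cs):
    """Classify a season's base stealing against the break-even range."""
    rate = success_rate(sb, cs)
    if rate < BREAK_EVEN_LOW:
        return "negative"      # below even the lower break-even estimate
    if rate < BREAK_EVEN_HIGH:
        return "borderline"    # inside the disputed 2/3 to 3/4 range
    return "positive"

# Pena's (SB, CS) by season.
pena = {1983: (6, 7), 1984: (12, 8)}

for year, (sb, cs) in pena.items():
    rate = float(success_rate(sb, cs))
    print(f"{year}: {sb} SB, {cs} CS, rate {rate:.3f} -> {verdict(sb, cs)}")
```

Both of Pena's seasons fall below even the lower end of the break-even range, which is the sense in which he has been "running backwards".
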
Both Carter and Pena batted in the middle of their respective orders (generally fifth for Carter, sixth for Pena), and probably have about 45 such opportunities in a season. Assuming that the opportunities are uniformly distributed among out counts, 30 of these occurred with none or one out (actually, this probably overestimates the number of such opportunities, as outs accumulate as batters bat, thus implying that more runners are on base, on average, with two out than with none out). Pena has good speed for a catcher, average for all runners, and would probably advance to third about 33% of the time (choosing the median value from Texas regulars); for Carter, my best guess is 20% (he's not as hopeless on the bases as Sundberg). The difference, then, is probably about 30/3 - 30/5 = 4 extra bases a season. It does not make up for Pena's negative contribution in his stolen base attempts.

PARK EFFECTS

They exist. Hitters who play for particular teams (such as the Astros) almost always perform better on the road; hitters on other teams (such as the Braves or Cubs) are just as certain to perform better at home than on the road. These are not the vagaries of chance (well, they could be, but the chance of that is about the same as that of all the gas molecules in the room in which you are now reading this spontaneously leaving that room).

How, then, to go about adjusting all hitters' performances within the league to reflect what their performance would have been had their team had a home field that favored or disfavored offense as much as an "average" park? We would also like to take some account of the hitters' exemption from batting against their own staff...

Thorn and Palmer provide one such solution that does account for both, but the mathematics leads to an iterative solution, and it is quite messy.
The result is a "batting park factor" by which one should divide the various achievements of the hitters playing in that park that season to get some idea of the likely performance of that hitter in a "typical" park. Values greater than 1 are associated with hitters' parks. Below I've reproduced the "BPF" for all NL ball parks in 1984, the values for Pittsburgh 1981-1984, and for Montreal 1975 & 1977-1984.

1984 BPF

 Chi 1.10   NY  0.97   StL 0.97   Phi 1.06   Mon 0.91   Pit 0.95
 SD  0.96   Atl 1.15   Hou 0.94   LA  1.07   Cin 1.00   SF  0.95

PAST BPF's

 Year     Mon.   Pit.
 1975     1.11
 1977     1.01
 1978     1.00
 1979     0.97
 1980     0.98
 1981     0.89   1.01
 1982     1.15   1.12
 1983     1.03   1.06
 1984     0.91   0.95
 Median   1.00   1.04

Year-to-year fluctuations are due to dimensional changes, the year's weather, the balance between day and night games, and randomness. Montreal's Olympic Stadium (opened for baseball in 1977) has generally been a moderate pitcher's park. The uptick in 1982 reflects the great year the Expo pitching staff had, and the consequent advantage the Expo lineup received in not having to face their own pitchers. Pittsburgh's Three Rivers has generally been a moderate hitter's park, but last season was a pitcher's park for the first time since 1977 (more night games? colder weather?).

The appropriate thing to do would be to now adjust the records at the beginning of this article by the BPF's and derive new "typical" years, but I can sense that you are getting sleepy. A quicker thing to do would be to just apply the "typical" BPF to the "typical" year (i.e. reduce Pena's output by 3 or 4%), but I don't want to upset Paul TOO much.

Congratulations! You have reached the end of Part II! Part III looks to be much shorter, as defensive statistics are not kept with the same zeal by organized baseball as offensive ones which, given the relative importance of hitting (48%), pitching (44%), defense (6%), and base running (2%), may be very sensible.

David Rubin
{allegra|astrovax|princeton}!fisher!david

P.S.
To justify the last throwaway remark about the relative importance of the game's aspects, consider first that every run scored is a run yielded, and that offense and defense must therefore be assigned 50% each. For defense, 88% of all runs have been earned for some decades now (in the old days, far fewer runs were earned, and defense was consequently more important), thus suggesting 44% be assigned to the pitchers and 6% to the fielders. Note, though, that the catcher is the only one other than the pitcher involved in the 44%, and thus may affect the defense more than any other player besides his battery mate; unfortunately, we cannot measure "the call of the game", which is too bad, as expert opinion uniformly favors Carter on this point.

Well, I seem to have already written my introduction to Part III...
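
For the record, the defensive half of that split is just the earned-run share applied to defense's 50%. A minimal sketch, with hypothetical variable names and only the 88% and 50% figures from the P.S. as inputs:

```python
from fractions import Fraction

DEFENSE_SHARE = Fraction(50)          # percent of the game assigned to run prevention
EARNED_RUN_SHARE = Fraction(88, 100)  # portion of all runs that are earned

# Earned runs are charged to the pitchers, unearned runs to the fielders.
pitching = DEFENSE_SHARE * EARNED_RUN_SHARE
fielding = DEFENSE_SHARE - pitching

print(f"pitching {float(pitching):.0f}%, fielding {float(fielding):.0f}%")
```

The 48% and 2% split of the offensive half between hitting and base running is not derived here, so the sketch leaves it alone.
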