[net.sport.baseball] Carter vs. Pena, Part II -- Offense

david@fisher.UUCP (David Rubin) (08/07/85)

			THE RECORD

We begin with an unedited reproduction of Pena's and Carter's career
offensive statistics, taken from the Baseball Encyclopedia.  To make 
the strike-shortened 1981 season comparable to others, I added 50% to 
all of Carter's totals and 60% to all of Pena's (the Expos played 108
games, or 2/3 of 162; the Pirates played 102 games, about 5/8 of 162).
These strike-adjusted figures are listed parenthetically under the
unadjusted figures and were used in the latter's place in finding a
"typical" year.  I also added an important stat that is not included 
in the official listings, on-base percentage (or on-base average), OB.
Finally, bank rounding was used: exact halves go to the nearest even
number.
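
The strike adjustment can be sketched in a few lines of Python.  This
is only an illustration: the function name `strike_adjust` and the use
of exact fractions (3/2 for the Expos, 8/5 for the Pirates) are
assumptions of the sketch, chosen so that Python's built-in `round`,
which sends exact halves to the nearest even integer, behaves like the
bank rounding described above.

```python
from fractions import Fraction

def strike_adjust(totals, factor):
    """Scale 1981 counting stats by `factor`, rounding halves to even."""
    # round() on a Fraction uses round-half-to-even (bank rounding).
    return [round(t * factor) for t in totals]

# Pena's 1981 line: G, AB, H, 2B, 3B, HR, R, RBI, BB, SO, SB
pena_1981 = [66, 210, 63, 9, 1, 2, 16, 17, 8, 23, 1]
pena_adjusted = strike_adjust(pena_1981, Fraction(8, 5))   # "add 60%"

# Carter's 1981 BB, SO and SB land exactly on a half after adding 50%.
carter_ties = strike_adjust([35, 35, 1], Fraction(3, 2))

print(pena_adjusted)
print(carter_ties)
```

The three Carter values are the only places in either 1981 line where
the choice of rounding rule changes the result.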

			Carter, Gary Edmund

Year   G     AB   H   2B  3B  HR  HR%   R RBI   BB   SO   SB   BA   SA   OB
1974    9    27   11   0   1   1  3.7   5   6    1    2    2  .407 .593 .429
1975  144   503  136  20   1  17  3.4  58  68   72   83    5  .270 .416 .362
1976   91   311   68   8   1   6  1.9  31  38   30   43    0  .219 .309 .287
1977  154   522  148  29   2  31  5.9  86  84   58  103    5  .284 .525 .355
1978  157   533  136  27   1  20  3.8  76  72   62   70   10  .255 .422 .333
1979  141   505  143  26   5  22  4.4  74  75   40   62    3  .283 .485 .336
1980  154   549  145  25   5  29  5.3  76 101   58   78    3  .264 .486 .334
1981  100   374   94  20   2  16  4.3  48  68   35   35    1  .251 .444 .315
(1981 150   561  141  30   3  24   -   72 102   52   52    2    -    -    - )
1982  154   557  163  32   1  29  5.2  91  97   78   64    2  .293 .510 .380
1983  145   541  146  37   3  17  3.1  63  79   51   57    1  .270 .444 .333
1984  159   596  175  32   1  27  4.5  75 106   64   57    2  .294 .487 .362

			
			Pena, Antonio Francisco

Year   G     AB   H   2B  3B  HR  HR%   R RBI   BB   SO   SB   BA   SA   OB
1980    8    21    9   1   1   0  0.0   1   1    0    4    0  .429 .571 .429
1981   66   210   63   9   1   2  1.0  16  17    8   23    1  .300 .381 .326
(1981 106   336  101  14   2   3   -   26  27   13   37    2    -    -    - )
1982  138   497  147  28   4  11  2.2  53  63   17   57    2  .296 .435 .319
1983  151   542  163  22   3  15  2.8  51  70   31   73    6  .301 .435 .339
1984  147   546  156  27   2  15  2.7  77  78   36   79   12  .286 .425 .330


OK, then, we now have a whole slew of numbers.  What to make of them?
Since most fans can most easily consider the record of a single
season, I next summarized the two catchers' careers.  But how?  Simply
averaging across all the years for each category is too subject to
heavy influence by a particularly bad or particularly good year.  It's
safer to take medians.  Also, we really don't want to consider years
before the principals established themselves in their respective teams'
starting lineups, so we consider Carter in 1975 and 1977-1984 and Pena
in 1981-1984.  (In 1975, Carter spent most of his playing time in the
outfield, generally as the starting left fielder, while serving as the
#2 catcher; in 1976, he was just the back-up catcher.)  So, taking
medians of all the totals (G,AB,H,2B,3B,HR,R,RBI,BB,SO,SB) and
calculating from the result the appropriate rates (HR%,BA,SA,OB), we
get:

Plyr   G     AB   H   2B  3B  HR  HR%   R RBI   BB   SO   SB   BA   SA   OB
Crtr  154   541  145  29   2  24  4.4  75  84   58   64    3  .268 .462 .339
Pena  142   520  152  24   2  13  2.5  52  66   24   65    4  .292 .421 .324
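
For anyone who wants to check the arithmetic, here is a minimal sketch
of the "typical" year calculation for Carter.  The function name
`typical_year` is hypothetical, only the columns that feed the rates
are carried along, and the OB formula, (H + BB) / (AB + BB), is an
assumption inferred from the tables above (it ignores hit batsmen and
sacrifice flies).

```python
from statistics import median

# Carter's seasons used for the "typical" year: 1975 and 1977-1984,
# with the strike-adjusted line standing in for 1981.
# Columns: AB, H, 2B, 3B, HR, BB (only those needed for the rates).
carter = [
    (503, 136, 20, 1, 17, 72),
    (522, 148, 29, 2, 31, 58),
    (533, 136, 27, 1, 20, 62),
    (505, 143, 26, 5, 22, 40),
    (549, 145, 25, 5, 29, 58),
    (561, 141, 30, 3, 24, 52),
    (557, 163, 32, 1, 29, 78),
    (541, 146, 37, 3, 17, 51),
    (596, 175, 32, 1, 27, 64),
]

def typical_year(seasons):
    """Column-wise medians, then rates computed from those medians."""
    ab, h, d, t, hr, bb = (median(col) for col in zip(*seasons))
    # Every hit is worth one base; extra-base hits add the remainder.
    total_bases = h + d + 2 * t + 3 * hr
    return {
        "AB": ab, "H": h, "2B": d, "3B": t, "HR": hr, "BB": bb,
        "HR%": round(100 * hr / ab, 1),
        "BA": round(h / ab, 3),
        "SA": round(total_bases / ab, 3),
        # Assumed definition: no hit batsmen or sacrifice flies.
        "OB": round((h + bb) / (ab + bb), 3),
    }

print(typical_year(carter))
```

Taking the rates from the medians, rather than taking medians of the
yearly rates, keeps the summary line internally consistent: its BA
really is its H divided by its AB.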


			WHAT TO LOOK AT

It is widely recognized that the purpose of the offense is run
production, and there are two distinct ways in which a hitter may
contribute to it.  The first is to score runs, the second to drive
them in.  Thus, traditionally, fans have placed great store in the
most obvious measures of that production, runs scored and runs batted
in.  Unfortunately, those traditional measures are heavily dependent
on circumstances beyond the hitter's control: how well his teammates
fare in doing THEIR job.  You can't score if no one drives you in, and
you can't drive someone in if no one is on base.  If we are to
evaluate individual performance, we must look at statistics that are
NOT dependent on the action of anyone save the individual in question.

Other than knocking it out of the park every time he comes up, what is
the most helpful thing that a batter can do to aid in (a) the scoring
of runs and (b) the driving in of runs?  The answer to the first is to
place himself in a position where a teammate can drive him in, that
is, to get on base.  All in all, it is better for a hitter to place
himself on second base than first base, but the critical first step is
to get to ANY base safely.  Thus, we look at On-Base Percentage (a.k.a.
Average) to evaluate how well a player performs this function.

The answer to the second is to perform some function that will advance
any teammates who might be on base.  One might do this by drawing a
walk, but walks are much more likely to be drawn in situations where
no runner would be advanced (first base open) than otherwise.  We
therefore consider only hits, and as runners are far more effectively
advanced by extra base hits than singles, we account for this by
considering total bases, that is, Slugging Average (a.k.a. Percentage).

What of Batting Average?  It is a statistic of overestimated 
importance, as it does a poor job of measuring either facet of run
production.  In fact, what it does measure (the percentage of at bats
that were converted to hits) has little direct bearing on the
crucial offensive issue: run production.

An interesting statistic that has been proposed to measure power is
"isolated power", defined to be the difference between the Slugging
Average and the Batting Average; equivalently, it can be calculated as

		(Total Bases - Hits)/ At Bats.

In other words, it considers ONLY extra bases.  Can you guess the 
career leader in isolated power among active players?  Of course you 
can.  It's Dave Kingman.  But this is just a sidelight...
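
As a quick illustration, isolated power can be computed for the two
"typical" years from the median table.  This is a sketch only; the
function name `isolated_power` is hypothetical, and extra bases are
counted as one per double, two per triple and three per home run.

```python
def isolated_power(ab, doubles, triples, hr):
    """(Total Bases - Hits) / At Bats, i.e. extra bases per at bat."""
    extra_bases = doubles + 2 * triples + 3 * hr
    return extra_bases / ab

# "Typical" years from the median table: AB, 2B, 3B, HR
carter_iso = isolated_power(541, 29, 2, 24)
pena_iso = isolated_power(520, 24, 2, 13)

print(round(carter_iso, 3), round(pena_iso, 3))
```

Both figures agree with SA minus BA taken straight from the median
table.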


			THE EVALUATION

Carter wins hands down in advancing runners (SA of .462 to Pena's .421,
an extra base produced every twenty-five at bats, i.e. an extra base per
week or so) and edges Pena in putting himself in a position where his
teammates have an opportunity to score him (OB of .339 to .324).
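
The "every twenty-five at bats" and "per week or so" figures follow
from the typical-year rates.  A rough sketch, assuming the rounded
rates from the median table and Carter's typical 541 at bats in 154
games as the conversion from at bats to games:

```python
# Rates from the "typical" years in the median table.
carter = {"SA": 0.462, "OB": 0.339}
pena = {"SA": 0.421, "OB": 0.324}

sa_gap = carter["SA"] - pena["SA"]    # extra total bases per at bat
ob_gap = carter["OB"] - pena["OB"]    # extra times on base per AB + BB

at_bats_per_extra_base = 1 / sa_gap

# Assumed pace: Carter's typical 541 at bats over 154 games.
games_per_extra_base = at_bats_per_extra_base / (541 / 154)

print(round(sa_gap, 3), round(ob_gap, 3))
print(round(at_bats_per_extra_base, 1), round(games_per_extra_base, 1))
```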


			ADJUSTMENTS???

Paul Benjamin has repeatedly expressed the opinion that Pena's batting
in Pittsburgh's supposedly weak lineup vis a vis Montreal's has
damaged Pena's statistics.  My response is twofold:

	(1) As we are limiting ourselves to statistics upon which the
	    performance of one's teammates has no direct bearing (OB
	    and SA), and for which there is no empirical evidence of a
	    substantial indirect bearing, it is irrelevant.

	(2) Pittsburgh was not a substantially less capable offensive
	    team than Montreal.  Pittsburgh's worst year was 1984
	    (remember, 1985 has not been considered by me in any way),
	    when the team BA (which I reproduce merely to humor those
	    who refuse the logic of OB and SA) was .255, their OB .312,
	    and their SA .363.  Contrast this with Montreal's BA of
	    .251, OB of .314, and SA of .362, and one could rightly
	    ask, WHAT contrast?  Pittsburgh also outscored their
	    opponents 615-587, while Montreal edged their opponents
	    593-585.  It is only myth that Pittsburgh had a terrible
	    offense; the fact is that Pittsburgh (and, to a lesser
	    extent, Montreal) was unfortunate with regard to WHEN
	    their runs were scored and their opponents' runs yielded.

There is overwhelming evidence, however, that the home park heavily
influences statistics, and that this effect has worked to Pena's
advantage vis a vis Carter during their careers.  The discussion and
application of such effects is postponed to the end of this article,
as I'd rather dispose of one other offensive issue before I take up
that exposition.

				SPEED

We can consider two subtopics offensively: base stealing and base
running.

With regard to the first, do Pena's stolen bases substantially add to
his offensive value?  The answer is an emphatic no; Pena's steals have
HURT the Pirates the past two years.  In 1983, he stole 6 but was
caught 7 times; in 1984, he stole 12 but was caught 8 times.  As CS
isn't listed in the Baseball Encyclopedia, I had to pull these figures
out of James's Abstracts, which I only have for those two years;
however, Pena's stolen base totals before 1983 are unremarkable, even
for a catcher.

The stolen base is a dangerous play.  If it succeeds, the runner has
advanced one base, but if it fails, the runner is entirely removed
AND one of the team's precious 27 outs is consumed.  For this reason,
it must have a high probability of success for it to be properly
employed.  Both conventional wisdom and empirical studies (see Thorn
and Palmer for the latter) place the break-even point somewhere
between 2/3 and 3/4, i.e. if you are succeeding less often than that,
the offensive contribution is NEGATIVE.  Thus, if you steal twice as
often as you are caught, you are at best running in place; it is clear
that Pena has been running backwards in this regard.
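
To put numbers on it, here is a small sketch comparing Pena's success
rates with the quoted break-even range.  The names are hypothetical,
and the 2/3 and 3/4 bounds are simply the two ends of the range given
above, not a derivation of the break-even point.

```python
from fractions import Fraction

BREAK_EVEN_LOW = Fraction(2, 3)    # lower end of the quoted range
BREAK_EVEN_HIGH = Fraction(3, 4)   # upper end of the quoted range

def success_rate(sb, cs):
    """Successful steals as a share of all attempts."""
    return Fraction(sb, sb + cs)

pena = {1983: (6, 7), 1984: (12, 8)}   # (stolen bases, caught stealing)

for year, (sb, cs) in pena.items():
    rate = success_rate(sb, cs)
    if rate < BREAK_EVEN_LOW:
        verdict = "below even the low estimate"
    elif rate < BREAK_EVEN_HIGH:
        verdict = "inside the break-even range"
    else:
        verdict = "above the break-even range"
    print(year, f"{float(rate):.3f}", verdict)
```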

And baserunning?  The most important use of speed in that regard is
the time it takes the batter to get from home to first; that speed is
already appropriately rewarded by both OB and SA (even over-rewarded by
the latter, as the infield single does not advance runners as well as
the outfield single).

The only other baserunning situation that places a premium on speed
and occurs with any frequency is a single to the outfield with fewer
than two outs and a man on first, where the runner does not start
before the hit: can he make it to third?  The issue is treated with
care in only one source that I am aware of: the Bill James 1984
Baseball Abstract, which considers the matter for one team (the 1983
Texas Rangers) in detail, but includes all singles and all out counts;
the question studied there was how often each runner advanced from
first to third on a single.  It was found:

	(1) An everyday player may expect between 20 (if he bats near
	    the bottom of the order) and 60 (if he is near the top) 
	    such opportunities in a season.

	(2) The best runners advanced to third about half the time,
	    while the worst about a tenth.

	(3) Speed was the most important factor, but smarts
	    substantially helped some of the slower players.

Both Carter and Pena batted in the middle of their respective orders
(generally fifth for Carter, sixth for Pena), and probably have about
45 such opportunities in a season.  Assuming that the opportunities
are uniformly distributed among out counts, 30 of these occur with
none or one out (actually, this probably overestimates the number of
such opportunities, as outs accumulate as batters bat, thus implying
that more runners are on base, on average, with two out than with none
out).  Pena has good speed for a catcher, average for all runners, and
would probably advance to third about 33% of the time (choosing the
median value from Texas regulars); for Carter, my best guess is 20%
(he's not as hopeless on the bases as Sundberg).  The difference,
then, is probably about 30/3 - 30/5 = 4 extra bases a season.  It does
not make up for Pena's negative contribution in his stolen base
attempts.
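
The estimate above can be written out step by step.  This is a sketch
of the same back-of-the-envelope arithmetic, using exact fractions so
the 30 and the 4 come out without rounding; all the inputs are the
guesses stated in the text, not measured values.

```python
from fractions import Fraction

opportunities = 45                         # on first when a single is hit
share_under_two_out = Fraction(2, 3)       # uniform over 0, 1 and 2 outs
chances = opportunities * share_under_two_out

pena_rate = Fraction(1, 3)     # about the median of the Texas regulars
carter_rate = Fraction(1, 5)   # best guess from the text

# Extra first-to-third advances per season credited to Pena's speed.
extra_advances = chances * (pena_rate - carter_rate)

print(chances, extra_advances)
```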


			PARK EFFECTS

They exist.  Hitters who play for particular teams (such as the
Astros) almost always perform better on the road; hitters on other
teams (such as the Braves or Cubs) are as certain to perform better
at home than on the road.  These are not the vagaries of chance (well,
they could be, but the chance of that is about the same as that of all
the gas molecules in the room in which you are now reading this
spontaneously leaving that room).  How, then, to go about adjusting
all hitters' performances within the league to reflect what their
performance would have been had their team had a home field that favored
or disfavored offense as much as an "average" park?  We would also like
to take some account of the hitters' exemption from batting against
their own staff...

Thorn and Palmer provide one such solution that does account for both,
but the mathematics leads to an iterative solution, and it is quite
messy.  The result is a "batting park factor" by which one
should divide the various achievements of the hitters playing in that
park that season to get some idea of the likely performance of those
hitters in a "typical" park.  Values greater than 1 are associated
with hitters' parks.  Below I've reproduced the "BPF" for all NL ball
parks in 1984, the values for Pittsburgh 1981-1984, and for Montreal
1975 & 1977-1984.

			1984 BPF

Chi	1.10		NY	0.97		StL	0.97
Phi	1.06		Mon	0.91		Pit	0.95
SD	0.96		Atl	1.15		Hou	0.94
LA	1.07		Cin	1.00		SF	0.95

			PAST BPF's

	Year		Mon.		Pit.
	1975		1.11
	1977		1.01
	1978		1.00
	1979		0.97
	1980		0.98
	1981		0.89		1.01
	1982		1.15		1.12
	1983		1.03		1.06
	1984		0.91		0.95
	Median		1.00		1.04

Year-to-year fluctuations are due to dimensional changes, the year's
weather, the balance between day and night games, and randomness.
Montreal's Olympic Stadium (opened for baseball in 1977) has generally
been a moderate pitcher's park. The uptick in 1982 reflects the great
year the Expo pitching staff had, and the consequent advantage the
Expo lineup received in not having to face their own pitchers.
Pittsburgh's Three Rivers has generally been a moderate hitter's
park, but last season it was a pitcher's park for the first time since
1977 (more night games? colder weather?).  The appropriate thing to do
would be to now adjust the records at the beginning of this article by
the BPF's and derive new "typical" years, but I can sense that you are
getting sleepy.  A quicker thing to do would be to just apply the 
"typical" BPF to the "typical" year (i.e. reduce Pena's output by 3 or
4%), but I don't want to upset Paul TOO much.  
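
For the curious, the quick version looks like this.  It is a sketch
only: the BPF values are copied from the table above, the helper name
`park_adjust` is hypothetical, and applying a single "typical" factor
to a whole career is the shortcut described in the text, not Thorn and
Palmer's full method.

```python
from statistics import median

bpf_montreal = [1.11, 1.01, 1.00, 0.97, 0.98, 0.89, 1.15, 1.03, 0.91]
bpf_pittsburgh = [1.01, 1.12, 1.06, 0.95]

typical_mon = median(bpf_montreal)      # middle value of nine seasons
typical_pit = median(bpf_pittsburgh)    # mean of the two middle values

def park_adjust(value, bpf):
    """Divide a raw achievement by the batting park factor."""
    return value / bpf

# Share of output removed when dividing by Pittsburgh's typical factor.
reduction = 1 - park_adjust(1.0, typical_pit)

print(typical_mon, round(typical_pit, 3), f"{reduction:.1%}")
```

The unrounded Pittsburgh median is 1.035, which the table shows as
1.04; dividing by either value gives the "3 or 4%" reduction.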

Congratulations!  You have reached the end of Part II!  Part III looks
to be much shorter, as defensive statistics are not kept with the same
zeal by organized baseball as offensive ones, which, given the relative
importance of hitting (48%), pitching (44%), defense (6%), and base
running (2%), may be very sensible.

					David Rubin
			{allegra|astrovax|princeton}!fisher!david

P.S.  To justify the last throwaway remark about the relative
importance of the game's aspects, consider first that every run scored
is a run yielded, so that offense and defense must be assigned 50%
each.  For defense, 88% of all runs have been earned for some decades
now (in the old days, far fewer runs were earned, and defense was
consequently more important), thus suggesting 44% be assigned to the
pitchers and 6% to the fielders.  Note, though, that the catcher is
the only one other than the pitcher involved in the 44%, and thus may
affect the defense more than any other player besides his battery
mate; unfortunately, we cannot measure "the call of the game", which
is too bad, as expert opinion uniformly favors Carter on this point.
Well, I seem to have already written my introduction to Part III...
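
The split of the defensive half can be checked in a couple of lines.
A minimal sketch, assuming (as the P.S. does) that earned runs are
charged to the pitchers and unearned runs to the fielders:

```python
from fractions import Fraction

run_prevention = Fraction(1, 2)     # defense's half of the game
earned_share = Fraction(88, 100)    # share of all runs that are earned

pitching = run_prevention * earned_share         # charged to pitchers
fielding = run_prevention * (1 - earned_share)   # charged to fielders

print(float(pitching), float(fielding))
```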