Bill James observed that, to predict how many games a team will win, it is more useful to compare the runs it scores to the runs it allows than to look at its wins and losses themselves. That is, wins and losses are a product of runs scored and runs allowed. Winning percentage is approximately given by:
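$$\text{Win\%} \approx \frac{RF^2}{RF^2 + RA^2}$$
where RF is runs scored ("runs for") and RA is runs allowed ("runs against").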
This is called Pythagorean expectation because of its resemblance to the Pythagorean theorem (though in practice the best-fitting exponent is rarely, if ever, exactly 2).
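A minimal Python version of the formula, with the exponent left as a parameter because the best-fitting value is rarely exactly 2 (the default here is just the textbook exponent, and the function name is hypothetical):

```python
def pythagorean_win_pct(runs_for: float, runs_against: float, exponent: float = 2.0) -> float:
    """Pythagorean expectation: RF^P / (RF^P + RA^P)."""
    if runs_for < 0 or runs_against < 0 or runs_for + runs_against == 0:
        raise ValueError("run totals must be non-negative and not both zero")
    rf = runs_for ** exponent
    ra = runs_against ** exponent
    return rf / (rf + ra)


# A hypothetical team that scores 800 runs and allows 700
print(round(pythagorean_win_pct(800, 700), 3))  # prints: 0.566
```

Equal run totals always give .500 regardless of the exponent; the exponent only controls how strongly a run differential translates into wins.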
So, if wins are a product of runs, what are runs a product of?
I propose that runs = outcome of plate appearances + base running + defense + sequencing. For this model, we'll ignore sequencing and focus on the outcome of plate appearances (singles, doubles, home runs, etc.), base running (going from first to third on a single, stealing second base, etc.), and defense (pitching and fielding). While there is surely some interplay between a team's base running and its opponent's fielding, and vice versa, for this model:
Offense = Batting + Base Running
Defense = Pitching + Fielding
Batting: I use team-level wRC+, which takes the events of each plate appearance, adjusts them for park, and converts them into a single index of run creation where 100 is league average (a wRC+ of 123 means 23% more runs created than an average team). I then convert that index into a total number of runs, where rpg is the league-average runs scored per game and gp is games played: Batting Runs = wRC+ ⁄ 100 × rpg × gp
Pitching: I use team-level xFIP-, which is built from strikeouts, walks, and fly balls, normalizes how often a fly ball becomes a home run, and adjusts for ballpark. Like wRC+, it is indexed so that 100 is league average, but for xFIP- lower is better. Pitching Runs = xFIP- ⁄ 100 × rpg × gp
Base Running: I take the FanGraphs base running metric (BsR), which is expressed in runs above or below average (0).
Fielding: I use the FanGraphs team defense metric (DEF), which is also expressed in runs above or below average (0).
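The four components can be combined into expected runs scored and allowed with a few small functions. This is a sketch under the definitions above: `rpg` is league-average runs per game, `gp` is games played, BsR and DEF are taken as runs above average, and fielding runs are subtracted from runs allowed (the sign convention the worked example uses). The function names are hypothetical.

```python
def batting_runs(wrc_plus: float, rpg: float, gp: int) -> float:
    """Convert team wRC+ (100 = league average) into total runs: wRC+/100 * rpg * gp."""
    return wrc_plus / 100 * rpg * gp


def pitching_runs(xfip_minus: float, rpg: float, gp: int) -> float:
    """Convert team xFIP- (100 = league average, lower is better) into runs allowed."""
    return xfip_minus / 100 * rpg * gp


def expected_runs_for(wrc_plus: float, bsr: float, rpg: float, gp: int) -> float:
    # Offense = Batting + Base Running; BsR is already in runs above average
    return batting_runs(wrc_plus, rpg, gp) + bsr


def expected_runs_against(xfip_minus: float, defense: float, rpg: float, gp: int) -> float:
    # Runs saved in the field (DEF) reduce runs allowed, so they are subtracted
    return pitching_runs(xfip_minus, rpg, gp) - defense


# Hypothetical league-average team: 10 games at 4.5 runs per game
print(expected_runs_for(100, 0.0, 4.5, 10), expected_runs_against(100, 0.0, 4.5, 10))
```

For a league-average team (wRC+ and xFIP- of 100, BsR and DEF of 0), both functions return rpg × gp, so the team projects to a .500 record.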
I then add 69 games of league-average production to both runs scored and runs allowed to regress the estimate toward the mean. Why 69? See here. The number is recalculated every season; it happens to be 69 for 2024, but it was as high as 148 for the 1991 season!
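A minimal sketch of the regression step, assuming it is applied exactly as described: the same k games of league-average runs are added to both totals. The helper names and the hot-start team are hypothetical, and the exponent of 2 is only for illustration.

```python
def regress_to_mean(runs: float, rpg: float, k_games: float = 69) -> float:
    """Add k_games of league-average production (rpg runs per game) to a run total."""
    return runs + rpg * k_games


def pythagorean_win_pct(rf: float, ra: float, exponent: float = 2.0) -> float:
    return rf ** exponent / (rf ** exponent + ra ** exponent)


# Hypothetical hot start: 60 runs scored, 30 allowed in 10 games, league average 4.5 runs/game
raw = pythagorean_win_pct(60, 30)
regressed = pythagorean_win_pct(regress_to_mean(60, 4.5), regress_to_mean(30, 4.5))
print(f"raw {raw:.3f}, regressed {regressed:.3f}")  # prints: raw 0.800, regressed 0.542
```

Because the same number of runs is added to both sides, the run ratio, and therefore the projected win%, moves toward .500; the larger k is, the stronger the pull.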
Example:
After the games of April 28th, 2024, the Orioles had played 27 games. They had a wRC+ of 123, an xFIP- of 89, a BsR of 5.2, and a DEF of 4.5. The average MLB team up to that date scored 4.41 runs/game.
Offense = Batting + Running
BattingRuns = 1.23 × 4.41 × 27 ≈ 146.5
Running = 5.2
Expected Runs Scored = xRF ≈ 146.5 + 5.2 ≈ 152 runs
Defense = Pitching - Fielding (runs saved in the field reduce runs allowed)
PitchingRuns = 0.89 × 4.41 × 27 ≈ 106
Fielding = 4.5
Expected Runs Allowed = xRA ≈ 101 runs
To estimate the team's true talent, we add 69 games of league-average production to each total:
Adjusted Runs For = aRF = xRF + rpg × 69 ≈ 151.7 + 304.3 ≈ 456 runs
Adjusted Runs Against = aRA = xRA + rpg × 69 ≈ 101.5 + 304.3 ≈ 406 runs
Plugging these totals into the Pythagorean formula with the Orioles' team exponent, P, gives:
$$\frac{aRF^P}{aRF^P + aRA^P} \approx 0.552$$
This is the current estimate of the Orioles' true talent level.
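The whole pipeline can be sketched end to end. The text does not say how the Orioles' exponent P is derived, so it is an explicit argument here; the 1.83 in the usage line is a commonly cited league-wide value used only as a placeholder, so the printed result is not expected to match 0.552. `TeamStats` and `true_talent_win_pct` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TeamStats:
    gp: int            # games played
    wrc_plus: float    # team wRC+ (100 = league average)
    xfip_minus: float  # team xFIP- (100 = league average, lower is better)
    bsr: float         # base running runs above average (BsR)
    defense: float     # fielding runs above average (DEF)


def true_talent_win_pct(t: TeamStats, rpg: float, exponent: float, k_games: float = 69) -> float:
    # Expected runs from the component metrics
    xrf = t.wrc_plus / 100 * rpg * t.gp + t.bsr
    xra = t.xfip_minus / 100 * rpg * t.gp - t.defense
    # Regress both totals toward league average by adding k_games of average production
    arf = xrf + rpg * k_games
    ara = xra + rpg * k_games
    # Pythagorean expectation on the regressed totals
    return arf ** exponent / (arf ** exponent + ara ** exponent)


orioles = TeamStats(gp=27, wrc_plus=123, xfip_minus=89, bsr=5.2, defense=4.5)
# exponent=1.83 is a placeholder assumption, not the author's team-specific P
print(round(true_talent_win_pct(orioles, rpg=4.41, exponent=1.83), 3))
```

Keeping `exponent` and `k_games` as parameters makes it easy to swap in a team-specific P or a different season's regression constant.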
So what could we do better?