Pythagorean Expectation in Baseball

Bill James devised once upon a time that in order to predict the amount of games a team would win, it was more beneficial to compare the ratio of runs scored to runs allowed than to look at the wins and losses themselves. That is, wins and losses are a product of runs scored and runs allowed. The formula for Win% is approximately given by:

Win% ≈ RF² ⁄ (RF² + RA²)

This is called pythagorean expectation due to its resemblance to the Pythagorean theorem (though in reality the exponent is rarely, if ever, exactly 2).

So, if wins are a product of runs, what are runs a product of?

I propose that runs scored = outcome of plate appearances + base running + defense + sequencing. For this model, we'll ignore sequencing and focus just on the outcome of plate appearances (singles, doubles, home runs), base running (going from first to third on a single, stealing second base, etc.), and fielding. While there is surely some interplay between a team's baserunning and their opponent's fielding and vice versa, for this model:

Offense = Batting + Base Running

Defense = Pitching + Fielding

Batting: I used team-level wRC+, which takes the events that happen during a plate appearance (park adjusted) and converts them into a single indexed number representing how many more runs a player/team would produce than an average team. Then, I convert that number to a number of runs above or below zero. Batting Runs = wRC+ ⁄ 100 × rpg × gp

Pitching: I use team-level xFIP-, which factors in strikeouts, walks, and flyballs while normalizing how often a flyball becomes a home run, and then adjusts for ballpark. Pitching Runs = xFIP- ⁄ 100 × rpg × gp

Base Running: I just take the FanGraphs base running metric, which gives a certain number of runs above or below average (0).

Fielding: I use the team defense metric from FanGraphs, giving a certain number of runs above or below average (0).

I then add 69 games of league average production to regress to the mean. Why 69? See here. This is checked every season and does happen to be 69 for 2024 but was as high as 148 for the 1991 season!

Example:

After the games of April 28th, 2024, the Orioles had played 27 games. They had a wRC+ of 123, an xFIP- of 89, a BsR of 5.2, and a DEF of 4.5. The average MLB team up to that date scored 4.41 runs/game.

Offense = Batting + Running

BattingRuns = 1.23 × 4.41 × 27

Running = 5.2

Expected Runs Scored = xRF ≈ 151 runs

Defense = Pitching - Defense

PitchingRuns = 0.89 × 4.41 × 27 ≈ 106

Fielding = 4.5

Expected Runs Allowed = xRA ≈ 101 runs

To estimate the team's talent we add 69 games of average production:

Adjusted Runs For = aRF = xRF + rpg * 69 ≈ 478 runs

Adjusted Runs Against = aRA = xRA + rpg * 69 ≈ 428 runs

Putting in the Orioles team pythagorean exponent, P, gives an equation of:

adjRF^P ⁄ adjRF^P + adjRA^P) ≈ 0.552

This is the current estimate of the Orioles' true talent level.

So what could we do better?

We could try and regress a team's run environment. After all, we know that Coors Field is not a normal run environment
Early in the season I could control for opponent. Starting 4-5 against the 2023 Braves, Dodgers, and Orioles wouldn't be as bad as starting 4-5 against the 2024 White Sox, Marlins, and Rockies.
We don't care about the roster as currently constructed. If the Mariners were to acquire peak Babe Ruth, it wouldn't adjust their wRC+ one iota (and could he even hit in T-Mobile?). We also don't care about projections that say Julio was great last year and is now a year older at a time when hitters tend to improve, that Matt Brash is injured, or that the Mariners are a bullpen arm creating machine, etc.
While the hitters a team puts on the field are reasonably static, pitchers vary quite widely game to game. Consider the 1995 Seattle Mariners. Twelve pitchers started a game that season and seven of them were below replacement level. One who wasn't? Randy Johnson, who contributed 8.6 of the team's 15.7 pitching wins above replacement and won the Cy Young award. Lumping all pitchers as one stat is a little bit of a cheat, especially in this era of pitcher injuries.
Am I looking at the right data? Baseball Reference looks at the last 100 games for their talent estimate. I look at only this season. Is last year relevant? For how long is opening day relevant? While Baseball Reference has forgotten that April existed by September (except that those wins and losses occurred), I am still very much aware of those games even though injuries/call-ups have taken their toll. Maybe I'll revisit that.
Wins are a product of runs. Runs are a product of wRC+! What is wRC+ a product of? Barrel rates and bat speed and swing decisions and infield alignments and... I could dig into that. I feel like this is the last level and it might be worth doing.
Pitching is a product of...you get it.
To that point. It's always struck me a little weird that we don't use the same stats for hitters as pitchers. If a batter hit a ball 102 mph, a pitcher gave up a 102 mph hit, and I bet we'd find that George Kirby allows fewer of those than a AAAA spot starter. If I were to break down wRC+ into other stats (whomps/whiff anyone?), then maybe I should break down xFIP- similarly?
But...we also know that we've seen enough of Clayton Kershaw to think he does have some ability to turn a batted ball in about that goes beyond luck. And this model ignores it! For now...