mccapobianco/No-Hitter-Probability

No-Hitter-Probability

See this project on GitHub pages

The data used to make these calculations is obtained from two APIs: MLB Gameday and MLB Stats API. The data is used to estimate the batting average and on-base percentage that each batter is expected to have against the current pitcher. These estimates are converted into a list that will be referred to as the xlineup, represented by nine ordered pairs, where the values in each pair are Out% (the percentage of plate appearances resulting in outs) and BB% (the percentage of plate appearances resulting in walks). For simplicity, a HBP is considered a type of BB. The notation $m_n$ represents the m statistic of the nth batter (for example, $\text{Out\%}_n$ is the Out% of the nth batter). The number of outs remaining and the current or due-up batter are also important pieces of data.

Calculating a Perfect Game

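The probability of a perfect game is the probability that every remaining batter makes an out. Therefore, with outs representing the number of outs already recorded, the probability of a perfect game is the product of the Out% of each remaining batter:

$$P(\text{perfect game}) = \prod_{i=\text{outs}+1}^{27} \text{Out\%}_{b(i)}$$

where $b(i)$ is the lineup position of the batter who would make the $i$th out, starting from the current or due-up batter and cycling through the nine-batter lineup.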
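As a rough illustration of the perfect-game product, here is a minimal Python sketch. The function name `perfect_game_probability` and the 0-based `batter_index` argument are assumptions made for this example, not names taken from the project's code.

```python
from math import prod

def perfect_game_probability(xlineup, outs, batter_index, total_outs=27):
    """Probability that every remaining batter makes an out.

    xlineup: list of nine (out_pct, bb_pct) pairs, one per lineup slot.
    outs: number of outs already recorded.
    batter_index: 0-based lineup slot of the current or due-up batter
                  (an assumption of this sketch).
    """
    remaining = total_outs - outs
    # Each remaining out is made by the next batter in the cycling lineup.
    return prod(xlineup[(batter_index + i) % 9][0] for i in range(remaining))
```

With every batter at Out% = 0.7 and two outs remaining, this evaluates to 0.7 * 0.7.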

Calculating a No-Hitter

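The probability of a no-hitter is a bit more complex. There is a chance of a no-hitter with 0 baserunners, with 1 baserunner, with 2 baserunners, etc. For a no-hitter with n baserunners, there are many possible orderings of the outs and baserunners ($\binom{26+n}{n}$ over a full game, since the final event must be an out). To account for every possibility, the program uses a 2D array. The row number represents the number of outs and the column number represents the number of baserunners. The table is limited to 32 columns, and the first column and row have an index of 0. Although more baserunners are theoretically possible, the probability of this is so small that it is negligible.

The notation array[X][Y] represents the cell in row X and column Y. The value in array[X][Y] is the probability that the game reaches X outs with no hits and Y baserunners. The array is initialized by setting the cell representing the current situation equal to 1. Then, values are passed from cell to cell: a cell receives probability from the cell above it when the batter makes an out, and from the cell to its left when the batter reaches base without a hit. Iteratively set cells using the following formula:

array[X][Y] = array[X-1][Y] * Out% + array[X][Y-1] * BB%

where Out% and BB% belong to the batter who is due up in the two source cells, and a term is omitted when its source cell lies outside the array. The BB% term does not apply along the final row, because the game ends once the last out is recorded.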

Below is an example of an array with 2 rows and 2 columns, where Out%=0.7 and BB%=0.1:

[ P = 1 ] [ P = 1*0.1 ]
[ P = 1*0.7 ] [ P = 0.7*0.1+0.1*0.7 ]

After this process fills the entire array, array[27] will contain the probabilities of reaching the end of the game without allowing a hit for each number of baserunners. Simply sum this row to get the probability of a no-hitter.
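The table-filling process can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the project's implementation: the function name, the `runners` and 0-based `batter_index` arguments, and the choice to skip the walk term in the final row (the game is over after the last out) are all assumptions of this example.

```python
def no_hitter_probability(xlineup, outs, runners, batter_index,
                          total_outs=27, max_runners=32):
    """Sum of the final row of the outs-by-baserunners table.

    xlineup: list of nine (out_pct, bb_pct) pairs.
    outs, runners: the current situation (row and column set to 1).
    batter_index: 0-based lineup slot of the current or due-up batter.
    """
    table = [[0.0] * max_runners for _ in range(total_outs + 1)]
    table[outs][runners] = 1.0
    for x in range(outs, total_outs + 1):
        for y in range(runners, max_runners):
            if x == outs and y == runners:
                continue  # starting cell is already set
            # Both source cells have seen the same number of plate
            # appearances, so the same batter feeds this cell from each.
            pa_so_far = (x - outs) + (y - runners) - 1
            out_pct, bb_pct = xlineup[(batter_index + pa_so_far) % 9]
            value = 0.0
            if x > outs:
                value += table[x - 1][y] * out_pct  # batter made an out
            if y > runners and x < total_outs:
                value += table[x][y - 1] * bb_pct   # batter walked
            table[x][y] = value
    return sum(table[total_outs])
```

When BB% is 0 for every batter, only the first column is populated and the result reduces to the perfect-game product.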

xLineup

Expected future performance

The first step in creating the xlineup is to calculate the batters' and pitcher's expected batting average and on-base percentage for future plate appearances. With small sample sizes, a player's statistics can be wildly different from their true performance level. For example, if a pitcher does not allow a hit in his first appearance of the season, his .000 batting average against would imply that a no-hitter in his following appearance is certain. For a more realistic value, a player's season statistics are regressed to their projected season statistics.

Regression to the Mean

Regression to the mean reflects the fact that statistics from larger samples tend to be closer to a player's true performance level than statistics from small samples. The regression is done by adding a fixed number of plate appearances at the expected rate to the observed sample. The larger the real sample, the more reliable it is, and the more it outweighs the added values. Since some players have a higher average performance than others, the expected value for a player is their projected season statistics.
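One common way to write this kind of regression is as a weighted blend of the observed sample and the projection. The sketch below assumes that form. The function name and the `prior_weight` parameter (the fixed number of added plate appearances) are hypothetical, and the text does not state the value the program uses.

```python
def regress_to_projection(successes, trials, projected_rate, prior_weight):
    """Blend an observed rate with a projected rate.

    successes / trials: observed season totals (e.g. hits / at-bats).
    projected_rate: the player's projected rate for the same statistic.
    prior_weight: number of trials at the projected rate added to the
                  sample (hypothetical parameter; value not given in the text).
    """
    return (successes + projected_rate * prior_weight) / (trials + prior_weight)
```

For example, a pitcher who has allowed 0 hits in 20 at-bats with a projected .250 average against and a weight of 200 would be regressed to 50/220, or about .227, rather than .000.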

Projections

FanGraphs has multiple projection systems available. This program uses the Steamer projections. One issue is that the pitching projections include neither a batting average against nor an on-base percentage against. Using the provided stats WHIP and BB/9, I created estimates of AVG and OBP that correlate strongly with the actual values. My initial estimates were estOBP = WHIP/(WHIP+3) and estAVG = (WHIP-BB9/9)/((WHIP-BB9/9)+3), but I found that these did not quite have a slope of 1 when plotted against the actual statistics, so I multiply each by a constant: estOBP = 0.9512*WHIP/(WHIP+3) and estAVG = 0.9652*(WHIP-BB9/9)/((WHIP-BB9/9)+3).
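The intuition behind these estimates is that WHIP counts walks plus hits per inning and an inning contains three outs, so WHIP/(WHIP+3) approximates baserunners per plate appearance. Subtracting BB9/9 (walks per inning) leaves hits per inning. A small Python sketch of the two final formulas follows; the function names are chosen for this example only.

```python
def estimate_obp_against(whip):
    """Estimated on-base percentage against, from projected WHIP."""
    return 0.9512 * whip / (whip + 3)

def estimate_avg_against(whip, bb9):
    """Estimated batting average against, from projected WHIP and BB/9."""
    hits_per_inning = whip - bb9 / 9  # remove walks per inning from WHIP
    return 0.9652 * hits_per_inning / (hits_per_inning + 3)
```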

Head-to-Head Matchup

The next step in creating the xlineup is to find the batter's predicted statistics against a certain pitcher. A simple way to do this is to use Bill James's Log5 formula: P = (xy/z) / (xy/z+(1-x)(1-y)/(1-z)), where x is the batter's stat, y is the pitcher's stat, and z is the league average for that stat. This formula is used to get the estimated AVG and estimated OBP in a given matchup.
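The Log5 formula translates directly into code. The following Python sketch uses a function name and argument names chosen for illustration.

```python
def log5(batter, pitcher, league):
    """Bill James's Log5 estimate for a batter-pitcher matchup.

    batter:  the batter's rate (x), e.g. AVG or OBP.
    pitcher: the pitcher's rate allowed (y) for the same statistic.
    league:  the league-average rate (z) for the same statistic.
    """
    favorable = batter * pitcher / league
    unfavorable = (1 - batter) * (1 - pitcher) / (1 - league)
    return favorable / (favorable + unfavorable)
```

A useful sanity check is that a league-average pitcher leaves the batter's rate unchanged: with `pitcher == league`, the formula returns `batter`.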

Statistics to Outcomes

The probability of an out and the probability of a walk (or hit-by-pitch) must be calculated from the AVG and OBP. P('out'), or Out%, is simple: 1-OBP. P('walk'), or BB%, is a little more complex: it is OBP-P('hit'). Notice that AVG does not equal P('hit'); AVG=P('hit' | 'at-bat'), while we want P('hit' | 'plate appearance'). I found that P('hit')=AVG*(1-OBP)/(1-AVG), so BB% = OBP-AVG*(1-OBP)/(1-AVG). The chance of an error, 1-FLD%, is added to the BB%, where FLD% is the league-wide fielding percentage.
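The P('hit') expression can be checked with a short derivation. Let h and w be hits and walks per plate appearance, and assume every plate appearance is either an at-bat or a walk (sacrifices are ignored, which is an assumption of this sketch). Then AVG = h/(1-w) and OBP = h+w, and solving the two equations for h gives h = AVG*(1-OBP)/(1-AVG). The Python sketch below applies the formulas from the text; the function name is hypothetical, and Out% is left at 1-OBP because the text does not say whether it is adjusted for errors.

```python
def outcome_rates(avg, obp, fld_pct):
    """Convert matchup AVG and OBP into (out_pct, bb_pct).

    fld_pct: league-wide fielding percentage; 1 - fld_pct is treated as
             the chance of reaching on an error and folded into bb_pct.
    """
    out_pct = 1 - obp
    hit_pct = avg * (1 - obp) / (1 - avg)  # hits per plate appearance
    bb_pct = obp - hit_pct + (1 - fld_pct)
    return out_pct, bb_pct
```

For example, AVG = .250 and OBP = .325 give 0.225 hits per plate appearance, so BB% is 0.100 before the error term is added.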

Future work

The following aspects are not currently considered in the calculations, but could be considered in the future:

  • Double plays
  • Caught stealing/stolen bases
  • Extra innings/scoring runs
  • Pitcher fatigue
  • Team specific fielding percentage
  • Count-adjusted statistics
  • Other head-to-head formulas
  • "Hot/Cold" (e.g. if a pitcher hasn't allowed a hit through 7 innings, they are likely performing better than their average, but the calculation uses their season averages)

I also plan to add more detailed information to the user interface, including:

  • Pitcher name
  • Opponent
  • Score
  • Inning
  • Baserunners
  • Outs
  • Current/due-up batter
