Stat468 Final Project
1 Introduction
1.1 Abstract
In the days leading up to and during the 2025 NHL Entry Draft, there were 18 trades that included only draft picks. This report aims to use historical data to determine the relative value of selections in the NHL Entry Draft. To do this, data will be imported from Hockey Reference, and several candidate models will be fit to the data with the goal of predicting the value of each pick based on historical outcomes. The Shiny app component of this project allows users to interactively check the fairness of potential trades. Knowing the relative value of picks allows NHL teams both to make favourable trade offers and to evaluate incoming trade offers. Specifically, this project contributes to asset valuation and trade analysis in the context of selections in the NHL Entry Draft.
1.2 Data
The data used by this report is imported from Hockey Reference, which has data on NHL drafts dating back to 1963. Each row on Hockey Reference corresponds to one selected player. The columns included on the site are:
Overall: the selection number at which the player was selected.
Team: the team that selected the player.
Player, Nat, Pos, Age: the player’s name, nationality, position, and age at the time of the draft.
To: the last year the player played in the NHL. For players who never played in the NHL this will be the empty string; for active players it will be 2025.
Amateur Team: the team the player was drafted from (confusingly, this is not necessarily an amateur team, since a large number of players have been drafted from professional European teams).
GP, G, A, PTS, +/-, PIM: the player’s career games played, goals, assists, points (goals plus assists), plus-minus, and penalty minutes. For players who never played in the NHL these will all be empty strings. For goalies, this GP column will match the goalie-only GP column described next, and the other values can be, but are not necessarily, the empty string (e.g. a goalie could have recorded an assist). Note that I will abbreviate games played to GP throughout this report.
GP, W, L, T/O, SV%, GAA: for goalies only, the goalie’s career games played, wins, losses, ties plus overtime losses, save percentage, and goals against average. For goalies who never played in the NHL and for all skaters, these columns will all be empty strings.
PS: the player’s estimated point shares, or points added to their team over the course of their career (here we mean points in the standings, NOT goals and assists). We will discuss the benefits and drawbacks of using point share in the Question chapter. Note that I abbreviate point share to PS throughout this report.
All stats listed pertain only to regular season games. This is preferable anyway, since we do not want to disadvantage players on bad teams, who rarely reach the playoffs, any more than they already are (it’s harder to get a high PS if your team rarely wins). Note that we will use only a subset of the years between 1963 and 2025 and of the attributes listed above, as will be explained in the Import and Tidy chapters.
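As a minimal sketch of how the empty-string convention described above might be handled after import, the Python snippet below turns blank cells into missing values. Python is used purely for illustration. The function names and the two example rows are hypothetical and are not part of this report’s pipeline; only the column names Overall, To, and GP come from the list above.

```python
from typing import Dict, Optional


def parse_stat(raw: str) -> Optional[float]:
    """Convert a scraped cell to a number; blank cells become None."""
    text = raw.strip()
    return float(text) if text else None


def played_in_nhl(row: Dict[str, str]) -> bool:
    """A blank GP cell marks a player who never played in the NHL."""
    return parse_stat(row["GP"]) is not None


# Hypothetical rows for illustration only (not real players).
never_played = {"Overall": "200", "To": "", "GP": ""}
skater = {"Overall": "1", "To": "2025", "GP": "82"}

print(played_in_nhl(never_played), played_in_nhl(skater))  # False True
```

Keeping missing values as None rather than 0 leaves the choice of how to score players who never reached the NHL to a later step.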
1.3 Constraints
There are a number of technical and practical constraints at play. Here are some of them:
There have been 63 drafts in NHL history, but drafts which occurred too long ago are likely not relevant and drafts which occurred very recently are difficult to evaluate. Despite this, we would like to include as many drafts as possible to maintain a reasonable sample size. This will be discussed further in the Import chapter.
Measuring the value of a player’s career is not a trivial task. One of the metrics we will use is called point share, which is calculated by Hockey Reference and incorporates several stats. It is still not a perfect metric, as it can depend on external factors such as the quality of the player’s team and the opportunities the player was given. We will discuss point share more in the Visualize, Transform, and Model chapters.
Furthermore, measuring the value of an active (non retired) player’s career is even more difficult because they could still contribute to their team and thus their career could still increase in value. This leaves us with the following three options, which will be discussed further in the Transform chapter:
Ignore the issue and act as if all active players retired today.
Avoid the problem by only using data from draft classes in which all players have retired.
Try to estimate what the value of a player’s career will be when they retire and use that instead.
We are interested in using historical data to predict the value of a draft pick from the team’s perspective. One slight problem with this is that the value of the pick from the team’s perspective depends on how long the player stayed on their team and what (if anything) the team got when the player left the team (via trade, free agency, or retirement). We will ignore this because it is nearly impossible to take these factors into account.
Players drafted earlier in a draft (i.e. with a better pick) typically get more opportunities than players selected in the later rounds. In particular, teams often fall victim to the sunk cost fallacy, because scouts and management look bad when a player in whom they invested a high pick never makes it to the NHL. Accounting for this is either very difficult or impossible, so we will not attempt to remedy it.
Every draft has strong portions and weak portions. For example, draft A might have a very strong second round (by this we mean the prospects drafted in the second round of draft A are of higher quality than those typically drafted in the second round). Though this seems like an obvious point, it is crucial to mention because it is a significant asterisk on this report, which will assume all drafts have equal value structures (i.e. the quality of the prospect drafted at #27 overall in draft A is the same as the quality of the prospect drafted at #27 overall in draft B).
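The three options for handling active players listed above can be sketched as follows. This is only an illustration under stated assumptions: the Player record, the function names, the example values, and in particular the placeholder projection are all hypothetical, and the approach actually used is left to the Transform chapter. The only fact taken from the data description is that active players have 2025 in the To column.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

ACTIVE_YEAR = 2025  # "To" value that marks an active player in the data


@dataclass(frozen=True)
class Player:
    draft_year: int
    last_year: Optional[int]  # None if the player never played in the NHL
    ps: float                 # point share accumulated so far


def is_active(player: Player) -> bool:
    return player.last_year == ACTIVE_YEAR


def option_ignore(players: List[Player]) -> List[float]:
    """Option 1: act as if all active players retired today."""
    return [p.ps for p in players]


def option_retired_classes(players: List[Player]) -> List[float]:
    """Option 2: keep only draft classes with no active players."""
    active_classes = {p.draft_year for p in players if is_active(p)}
    return [p.ps for p in players if p.draft_year not in active_classes]


def option_project(players: List[Player],
                   project: Callable[[Player], float]) -> List[float]:
    """Option 3: replace active players' values with a projection."""
    return [project(p) if is_active(p) else p.ps for p in players]


# Hypothetical players for illustration only.
players = [
    Player(1996, 2010, 40.0),
    Player(1996, None, 0.0),
    Player(2015, 2025, 30.0),
    Player(2015, 2020, 5.0),
]


def placeholder_projection(p: Player) -> float:
    """Arbitrary stand-in for a real end-of-career estimate."""
    return p.ps * 1.5


print(option_ignore(players))
print(option_retired_classes(players))
print(option_project(players, placeholder_projection))
```

The sketch makes the trade-off visible: the second option discards the entire 2015 class because one of its players is still active, while the third keeps every player but depends on the quality of the projection.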
1.4 Frequently Used Definitions
We define the following terms which will be used frequently throughout the report:
\(v_i\): the true value of pick \(i\). This is unknown and is what we are trying to estimate.
\(\hat v_i\): our estimate of the value of pick \(i\).
\(h_{i, j, m}\): the value of the \(i^{th}\) pick of draft \(j\) when we measure value using metric \(m\). We will consider this value known even though it needs to be estimated for certain metrics. Most of our metrics cannot be negative, so we have a lower bound on \(h_{i, j, m}\) for active players. Note that there are four metrics (possible values for \(m\)) that we will consider:
PS: Point share. We sometimes use \(ps_{i,j}\) for the point share of the player drafted at pick \(i\) of draft \(j\).
Adjusted PS: The player’s point share at the end of their career. We sometimes use \(ps^{adj}_{i,j}\) for the adjusted point share of the player drafted at pick \(i\) of draft \(j\). Note that for retired players we have \(ps_{i,j}^{adj} = ps_{i,j}\), since if a player is retired we know what their point share was at the end of their career. For active players we will consider this value known, even though it needs to be estimated.
GP: Games played. We sometimes use \(gp_{i,j}\) for the number of games played by the player drafted at pick \(i\) of draft \(j\).
NHL Regular: A player who played in (or is on track to play in) at least 200 NHL games. This definition is the same as the one used by Luo (2024). We will discuss it more in the Transform chapter.
To avoid confusion, here are the different definitions of “points”:
Points as in goals + assists. This report will NEVER use this definition of points.
Points as in point share. Recall this is calculated to measure a player’s effect on his team’s points in the standings, not goals and assists.
Points as in the unitless value of a draft pick. To maintain consistency with past work, we will define \(v_i\) to be in terms of “points” which do not have units but which make comparing the values of picks significantly easier and more intuitive.
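To illustrate why a unitless pick value makes trades easy to compare, the sketch below sums the estimated values \(\hat v_i\) on each side of a picks-only trade. The pick values used here are invented placeholders, not estimates produced by this report, and the function names are hypothetical; this is not the implementation of the Shiny app.

```python
from typing import Dict, Iterable


def package_value(picks: Iterable[int], pick_values: Dict[int, float]) -> float:
    """Total value of a set of picks, in unitless "points"."""
    return sum(pick_values[pick] for pick in picks)


def trade_surplus(gives: Iterable[int], receives: Iterable[int],
                  pick_values: Dict[int, float]) -> float:
    """Value received minus value given up; positive favours the team."""
    return package_value(receives, pick_values) - package_value(gives, pick_values)


# Placeholder values for picks 10, 20, and 40 (NOT results from this report).
hypothetical_values = {10: 100.0, 20: 60.0, 40: 35.0}

# A team gives up pick 10 and receives picks 20 and 40.
surplus = trade_surplus([10], [20, 40], hypothetical_values)
print(surplus)  # -5.0
```

Under these placeholder values the team trading down gives up slightly more than it receives, so the trade would be judged mildly unfavourable for that team.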