Master's thesis: abstract

The popularity of data-driven decision-making in sports has risen significantly since the success of the 2002 Oakland Athletics baseball team. Using an objective, analytical approach, the franchise improved upon its 2001 record and made the playoffs despite a limited payroll and a lack of star players.

In the National Basketball Association (NBA), analytics have caused offenses to prioritize 3-point shooting over 2-pointers. All teams employ analytics experts in the hope of creating a competitive advantage. Recently, "player tracking" (measuring the movements of the basketball and of every player on the court multiple times per second) has ushered in the era of big data in basketball. Evaluating teams and players, however, is not straightforward, even with the large amount of available data and the multiple methods to quantify team and player performance. With only five players per team on the court, the interaction between players is vital, as are coaching and player matchups. The main aim of this thesis is to gain insight into the dynamics of the NBA from a statistical point of view. To this end, principal component analysis (PCA) is used as a descriptive tool.

An interesting result is found when comparing DeAndre Jordan and Andrew Bogut. Bogut was supposed to be the next star center of the Dallas Mavericks after the team failed to sign All-NBA center DeAndre Jordan in 2015. Despite high hopes, Bogut turned out to be a poor fit for the team, and they missed the playoffs in 2016. A principal component analysis shows that Bogut and Jordan have similar playing styles and qualities. This suggests that Jordan would also have been a poor fit for the Dallas Mavericks.

The principal component method could help scouting departments identify players of interest and help coaches create their game plans.
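As an illustration of how a player comparison of this kind might be set up, the following minimal sketch standardizes a small table of per-game statistics, projects the players onto the first two principal components, and measures how close two players lie in that space. The statistics, player labels, and helper names (`pca_scores`, `pc_distance`) are hypothetical and chosen for illustration only; they are not the data or the code used in the thesis.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project the rows of X onto the leading principal components."""
    X = np.asarray(X, dtype=float)
    # Standardize each statistic so that large-scale columns do not dominate.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardized matrix; the rows of Vt are the principal axes.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:n_components].T
    # Share of the total variance captured by each retained component.
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]

# Synthetic per-game statistics (NOT real NBA data).
# Columns: points, rebounds, assists, blocks, 3-point attempts.
players = ["center_a", "center_b", "guard_a", "guard_b", "wing_a"]
stats = np.array([
    [9.0, 12.5, 1.2, 1.8, 0.0],
    [8.0, 11.8, 1.9, 1.6, 0.1],
    [24.0, 4.0, 8.5, 0.3, 7.5],
    [21.5, 3.5, 7.0, 0.2, 8.2],
    [17.0, 6.0, 3.5, 0.6, 5.0],
])

scores, explained = pca_scores(stats)

def pc_distance(a, b):
    """Euclidean distance between two players in principal component space."""
    i, j = players.index(a), players.index(b)
    return float(np.linalg.norm(scores[i] - scores[j]))

print("explained variance ratio:", explained)
print("center_a vs center_b:", pc_distance("center_a", "center_b"))
print("center_a vs guard_a: ", pc_distance("center_a", "guard_a"))
```

Players with similar statistical profiles end up close together in the projected space, which is the sense in which two players can be said to have similar playing styles.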

Secondary goals of this thesis are to predict which player will win the Most Valuable Player (MVP) award and to forecast the outcome of games. Penalized regression methods (LASSO, elastic net and ridge regression) are used to predict MVP points and individual game scores.
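To make the idea of penalized regression concrete, the sketch below implements ridge regression in closed form with NumPy. Only ridge is shown because it has a closed-form solution; LASSO and elastic net require an iterative solver and are not covered here. The function name `ridge_fit`, the penalty values, and the synthetic data are assumptions made for illustration and do not reproduce the models or the data of the thesis.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge regression with an unpenalized intercept.

    Predictors are standardized before the penalty is applied, and the
    coefficients are mapped back to the original scale afterwards.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, x_std = X.mean(axis=0), X.std(axis=0)
    y_mean = y.mean()
    Z = (X - x_mean) / x_std
    p = Z.shape[1]
    # Closed-form solution: (Z'Z + lam * I)^(-1) Z'(y - mean(y)).
    beta_std = np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ (y - y_mean))
    coef = beta_std / x_std
    intercept = y_mean - x_mean @ coef
    return coef, intercept

# Synthetic data (NOT real MVP voting data): 60 observations, 4 predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
true_beta = np.array([3.0, 0.0, -2.0, 0.5])
y = 1.0 + X @ true_beta + rng.normal(scale=0.5, size=60)

for lam in (0.0, 10.0, 1000.0):
    coef, intercept = ridge_fit(X, y, lam)
    print(f"lambda={lam:7.1f}  coef={np.round(coef, 3)}  intercept={intercept:.3f}")
```

With a penalty of zero the fit coincides with ordinary least squares; as the penalty grows, the coefficients shrink toward zero and the prediction approaches the mean of the response. In practice the penalty would be chosen by cross-validation.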

Russell Westbrook is predicted to win the 2017 MVP award, with James Harden finishing second. LeBron James, Kawhi Leonard and Kevin Durant are predicted to compete for third place.

The 2016 NBA season serves as the validation dataset for the model forecasting game scores. The winner is correctly forecast in 68.7% of the games. This accuracy is similar to that of existing models discussed in the literature. Future research could expand this model by including player-specific information to further improve accuracy.
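The accuracy figure above is the share of validation games in which the team with the higher predicted score is also the team that actually won. A minimal sketch of that calculation follows; the function name `winner_accuracy` and the four games are hypothetical and serve only to show the arithmetic.

```python
def winner_accuracy(predicted, actual):
    """Share of games whose predicted winner matches the actual winner.

    Each game is a (home_points, away_points) pair. A predicted tie is
    treated as a predicted away win, which is an arbitrary convention.
    """
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty score lists of equal length")
    hits = 0
    for (pred_home, pred_away), (act_home, act_away) in zip(predicted, actual):
        if (pred_home > pred_away) == (act_home > act_away):
            hits += 1
    return hits / len(actual)

# Hypothetical predicted and final scores for four games (NOT real results).
predicted_scores = [(105.2, 99.8), (98.1, 101.4), (110.0, 104.5), (95.3, 97.0)]
final_scores = [(110, 102), (100, 97), (108, 111), (90, 99)]

print(winner_accuracy(predicted_scores, final_scores))  # 0.5
```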