A Shot in the Dark

Introduction

Steph Curry, Klay Thompson, Ray Allen, Ben Simmons. Wait, scratch that last one. These are some of the greatest shooters in NBA history, but where did they come from, and how did their teams figure out they were going to be so great? In recent years, accurate 3-point shooting has changed the way basketball is played. To properly optimize an offense, shooters are a necessity for any team with playoff ambitions. If NBA teams can find legitimate trends, they can make informed predictions about which college stars will become NBA sharpshooters. Being able to predict draft prospects' NBA shooting more accurately would allow for better, safer drafting. In this report, we aim to test how well college 3-point shooting predicts NBA 3-point shooting, in order to better evaluate draft prospects. We aim to find the best predictors (and just how accurate those predictors are) in order to find the next Steph Curry and avoid the next Ben Simmons.

Data

The data used in this report come from the hoopR package in R, which provides a wide range of play-by-play and box score basketball data sourced from ESPN. We specifically looked at NBA player stats from 2014 to 2023 and college player stats from 2007 to 2020 (the season ranges loaded in the Code Appendix). To use these data, we merged the two datasets, matching on the players common to both. This gave us 469 players in total. The dataset includes the following variables:

Variables

mbb_attempts: Total college 3-point attempts per player

mbb_made: Total college 3-pointers made per player

mbb_percent: College 3-point percentage per player

mbb_total_points: Total points for college players

mbb_total_minutes: Total minutes for college players

player_position: What position the player had in college

nba_attempts: Total NBA 3-point attempts per player

nba_made: Total NBA 3-pointers made per player

nba_percent: NBA 3-point percentage per player

above_average: Whether a player shot above the league average (37%) for 3-point shooting

We proceed to conduct some exploratory analysis to better understand the variables. Specifically, we look at the distributions of shots attempted and shots made by both NBA players and college players, as well as the proportion of “above average” players in the NBA, which we define as players whose 3-point percentage is greater than 37%. As seen in the scatterplots of college player attempts, there is a reasonable spread, with the majority of players falling between 0 and 800. For NBA attempts, players are more heavily concentrated at the lower end, mostly between 0 and 2000, with some outliers between 2000 and 3000. This makes sense, as players usually play in college for a maximum of 4 years, while in the NBA there’s a much larger range of playing years. The same pattern applies to made shots: the majority of college players are fairly spread between 0 and 300, while a large portion of NBA players are concentrated between 0 and 400, with outliers reaching 1200 to 1400. Players defined as “above-average” NBA shooters make up around 20% of the sample, with the remaining 80% or so at or below the league average.

Before we go into the methods, it’s a good idea to see how a basic linear model fits the data. The model shown below simply regresses NBA percentage on college percentage, and while the data points seem to show a linear relationship, the model itself doesn’t seem to fit the data very well. We’ll examine why that is later, but it’s something to keep in mind for now.

Methods

When attempting to create a model predicting NBA shooting based on college performance, our first thought was to fit a generalized linear model (GLM), using the bestglm function in R to select predictors. This required a binary response variable, so we used the above_average variable as the response and all of the relevant college variables in our nba_mbb dataset as candidate predictors. As the Code Appendix shows, the bestglm selection was run with leave-one-out cross-validation (LOOCV) as the selection criterion, and the selected predictors were then refit with glm using family set to “binomial”, as we are working with a binary response.

Before going any further, we check that the necessary assumptions for this model are satisfied. Because residual and QQ plots are difficult to interpret for binomial regressions, we instead assess these assumptions using a very similar model, with percent as the response instead of above_average. We do this with a residual plot and a QQ plot, checking for heteroskedasticity and Normality respectively. As shown below, the residual plot looks acceptable, with the residuals scattered appropriately around 0. However, there are some issues with the QQ plot, specifically around the bottom tail. Despite this deviation in the bottom tail, for the purposes of this project we forge ahead with our GLM, keeping this issue with the Normality assumption in mind.

When running the model, we found that the bestglm function chose mbb_made and mbb_attempts as our two predictors. As shown below, our GLM with these two predictors shows a positive relationship between 3-point shots made in college and the likelihood of becoming an above-average shooter in the NBA (0.036) and a negative relationship between 3-pointers attempted in college and that likelihood (-0.012). While percent is not included in this GLM, we can use the coefficients to draw some inferences about it. Since the coefficient for shots made is roughly three times the magnitude of the coefficient for shots attempted, the two college terms together raise the log odds only when makes exceed about one-third of attempts. Based on this model, college players who shoot above roughly 33.3% have a better chance of becoming an above-average 3-point shooter than the model’s baseline, and college players who shoot below roughly 33.3% have a worse chance.
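The break-even percentage implied by these two coefficients can be worked out directly. The following is a minimal sketch in R using the unrounded estimates from the Code Appendix output; the variable names and the example player are ours, for illustration only.

```r
# Coefficients copied from the GLM summary in the Code Appendix.
b_made     <- 0.036485
b_attempts <- -0.012327

# The college terms add b_made * made + b_attempts * attempts to the log odds.
# That sum is positive exactly when made / attempts exceeds this ratio.
break_even <- -b_attempts / b_made
round(break_even, 3)

# Hypothetical player with 100 college attempts, shooting 40% versus 30%.
shift_40 <- b_made * 40 + b_attempts * 100
shift_30 <- b_made * 30 + b_attempts * 100
c(shift_40 = shift_40, shift_30 = shift_30)
```

With the unrounded coefficients the threshold lands slightly above one-third, which is why the 33.3% figure should be read as approximate.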

We quantify uncertainty for this model using 5-fold cross-validation. Doing so gives a cross-validated prediction error of 0.14 (the default average squared error from cv.glm), which seems like a fairly low value. For context, always predicting the roughly 20% base rate of above-average shooters would give a squared error of about 0.2 × 0.8 = 0.16, so the improvement over that baseline is modest. This GLM has given interesting results and shows a statistically significant relationship between college makes/attempts and NBA shooting performance, which is unsurprising but still interesting to see.
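To make the cross-validation step concrete, here is a minimal sketch of 5-fold cross-validation with cv.glm from the boot package, alongside a constant base-rate reference. The data are simulated stand-ins, not our dataset, and the simulation parameters are arbitrary assumptions chosen only so the example runs.

```r
library(boot)  # provides cv.glm()

set.seed(1)  # make the random fold assignment reproducible

# Simulated stand-in data (NOT the report's dataset).
n <- 400
sim <- data.frame(attempts = rpois(n, 200))
sim$made <- rbinom(n, sim$attempts, 0.35)
sim$above_average <- rbinom(
  n, 1, plogis(-1.8 + 0.03 * sim$made - 0.01 * sim$attempts)
)

fit <- glm(above_average ~ made + attempts, family = binomial, data = sim)

# 5-fold CV; cv.glm's default cost is the average squared prediction error.
cv_error <- cv.glm(sim, fit, K = 5)$delta[1]

# Reference point: always predicting the overall base rate.
base_rate <- mean(sim$above_average)
baseline_error <- mean((sim$above_average - base_rate)^2)

c(cv_error = cv_error, baseline_error = baseline_error)
```

Comparing the two numbers side by side is one simple way to judge whether a cross-validated error is low in absolute terms or only low because the outcome is rare.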

The next model we create is a multilevel model. We group specifically on the position these athletes played in college, using college position as the grouping factor for the random effects. We made this choice after realizing that the importance of strong shooting differs a lot by position. For example, it is much more important for guards to be above-average shooters than it is for centers. For the most part, guards who can’t shoot have been finding their way to the bench (Ben Simmons), but centers who can’t shoot (Steven Adams) still definitely have a role in the modern NBA. As for fixed effects, we use percent as a stand-in for made and attempts, as we believe that college 3-point percentage operates similarly as a predictor to using both college 3-point makes and college 3-point attempts.
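Written out, the glmer call in the Code Appendix (a fixed effect for mbb_percent, plus a random intercept and a random slope on mbb_attempts by position) corresponds roughly to the form below. The symbols are notation introduced here for illustration, with j[i] denoting the college position of player i.

```latex
\[
\operatorname{logit}\,\Pr(\text{above\_average}_i = 1)
  = \beta_0 + \beta_1\,\text{mbb\_percent}_i
  + u_{0,j[i]} + u_{1,j[i]}\,\text{mbb\_attempts}_i,
\qquad
\begin{pmatrix} u_{0,j} \\ u_{1,j} \end{pmatrix}
  \sim \mathcal{N}\!\left(\mathbf{0},\, \Sigma\right)
\]
```

The Normality assumption checked for this model concerns the position-level terms u, which are assumed to be centered at zero.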

We also quantify uncertainty for this model using the conditional standard deviations of the random-effect estimates for each position. These per-position estimates and their standard deviations are shown below.

As we can see, the vast majority of the estimated position effects are centered below 0 rather than around it, which is a major problem for our assumption that these effects are Normally distributed with mean 0. Because we can’t verify the assumption here, there are limits on what predictions we can make with this model. We’ll touch on this later in the Discussion.

When it comes to comparing these two models, we’re going to use a couple different metrics. First, we’ll compare the uncertainties of each model, and then we’ll compare whether the predictors in each model are statistically significant, and finally we’ll examine the coefficients in each model to see if the two models are telling the same story or whether they differ.

Results

The first elements of these models that we compare are the coefficients for the predictors in each model. The GLM has an intercept of -1.79 with a standard error of 0.22, a coefficient for mbb_made of 0.036 with a standard error of 0.012, and a coefficient for mbb_attempts of -0.012 with a standard error of 0.0046.

As this is a logistic GLM, the coefficients describe changes in the log odds. For each additional shot made, the log odds of being an above-average 3-point shooter in the NBA increase by 0.036 (the odds rise by about 3.7%), while for every additional shot attempted, the log odds decrease by 0.012 (the odds fall by about 1.2%).

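Meanwhile, the multilevel model with random effects for player position has an intercept of -2.55 with a standard error of 0.59, and a coefficient for mbb_percent of 2.93 with a standard error of 1.68. Because mbb_percent is stored as a proportion, this means that for every one-percentage-point increase in college 3-point percentage, the log odds of being an above-average 3-point shooter in the NBA increase by about 0.029, which corresponds to roughly a 3% increase in the odds. (Moving across the full 0-to-1 range would multiply the odds by exp(2.93), or roughly 18.7.)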
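The coefficients from both models are on the log-odds scale, and exponentiating them converts them into odds ratios. A minimal sketch, with the estimates copied by hand from the Code Appendix output and variable names of our own choosing:

```r
# Estimates copied from the two model summaries in the Code Appendix.
glm_coefs    <- c(mbb_made = 0.036485, mbb_attempts = -0.012327)
glmm_percent <- 2.9318  # mbb_percent is a proportion between 0 and 1

# Exponentiating a log-odds coefficient gives the multiplicative change in odds.
odds_ratio_glm <- exp(glm_coefs)

# One percentage point is a change of 0.01 on the proportion scale.
odds_ratio_per_point <- exp(glmm_percent * 0.01)
odds_ratio_full_unit <- exp(glmm_percent)

round(odds_ratio_glm, 3)
round(c(per_point = odds_ratio_per_point, full_unit = odds_ratio_full_unit), 3)
```

Reporting the per-point odds ratio keeps the multilevel coefficient on a scale comparable to realistic differences between prospects, since nobody moves from 0% to 100%.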

So far, both models seem to be telling us similar things: the more accurate a shooter you are at the college level, the more likely you are to be an above-average shooter at the NBA level.

The next way we compare these two models is by looking at the significance of the coefficients. First off, the p-values for both coefficients in the GLM (0.002 for mbb_made and 0.008 for mbb_attempts) are well below 0.05, showing that both coefficients are significant. In the multilevel model, where we use position as the grouping factor for the random effects, the p-value for college shooting percentage is 0.08, which is above the conventional significance threshold of 0.05, a legitimate problem. However, as it is still under 0.10, it would be considered significant at the 90% level.
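The p-value for college shooting percentage can be recovered from its z value in the model summary. A minimal sketch, assuming the usual two-sided test against a standard Normal distribution (the variable names are ours):

```r
# z value for mbb_percent, copied from the multilevel model summary.
z_percent <- 1.743

# Two-sided p-value under a standard Normal reference distribution.
p_value <- 2 * pnorm(-abs(z_percent))
round(p_value, 4)

# Compare against the two thresholds discussed above.
c(below_05 = p_value < 0.05, below_10 = p_value < 0.10)
```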

Finally, we look back at the uncertainty estimates from the “Methods” section. We used 5-fold cross-validation to calculate a prediction error estimate of 0.14 for our GLM, which all things considered is a fairly low value. However, there were issues with the Normality assumption due to deviations in the QQ plot for this GLM, so this low value could well be misleading, which we have to account for when deciding whether to make predictions based on this model.

Moreover, the uncertainty estimates for our multilevel model were also problematic. We used the conditional standard deviations for the random-effect groups to measure uncertainty, and we found that the per-position estimates were not centered around 0. This tells us that there may be uncertainty that the model currently does not account for, again casting doubt on the reliability of our multilevel model.

Even though there are plenty of problems with our multilevel model, there are still insights we can take away from using positions as random effects. First off, we can look at how the position a player had in college relates to their NBA shooting performance using the graphic below.

This gives us the order (compared to centers) of positions that have the highest shooting percentage in the NBA. Unsurprisingly, college guards have the highest shooting percentage in the NBA, then forwards, then centers. This is what a reasonable person would have assumed before conducting this analysis, but it’s interesting to see the statistical evidence here. We were able to find this simply by filtering our data by position, running a simple linear model for each position, and comparing the coefficients (all of which were statistically significant). Even though this is a much simpler way of modeling than our multilevel model, it gives us some clear insights. For example, NBA teams that are simply looking for strong 3-point shooters in the draft should definitely be focusing on drafting guards.

Additionally, when filtering to college guards, the simple linear model fits the scatter plot a whole lot better than it did back in the exploratory analysis. Even though it isn’t perfect, we can see below that the model restricted to guards shows a closer relationship between college shooting percentage and NBA shooting percentage.

Discussion

What we set out to do here was find predictors from college statistics that accurately and reliably predict strong shooting performance in the NBA. To do this, we relied on two different models: the first a generalized linear model (GLM) with predictors selected by the bestglm function, the second a multilevel model with college shooting percentage as a fixed effect and player position as a random effect. We learned from both models that there is a relationship between shooting percentage in college basketball and shooting percentage in the NBA.

Looking at position, we found that players who play point guard and shooting guard in college tend to have a better shooting percentage in the NBA than forwards and especially centers. Moreover, the simple linear model fit better and showed stronger predictive power when filtered solely to guards. This means that guards who shoot well in college are more likely to become better shooters in the NBA than forwards who shoot well in college. We’d speculate that this is because guards tend to take more 3-point shots (and we also found that our model’s predictive power is correlated with attempts), so the larger sample size should theoretically make the model more powerful.

However, while we were able to glean a lot of insights from these models, we must remember that there were limitations. First, we were unable to “check off” the Normality assumption for the GLM, which means that our model could well have biased parameters and inaccurate results. This makes it dangerous to accept any conclusions from this model as fact. Next, the position-level estimates in the multilevel model were not centered around zero, which also makes it dangerous to draw firm conclusions from the multilevel model.

Another limitation of our model is that it doesn’t require a minimum number of shot attempts. A lucky center could take and make one 3-point shot, and that could completely skew the data, and especially our cross-validation estimate. A clear next step to account for this would be to filter the dataset by a minimum number of shots, both in college and in the NBA.
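A minimum-attempts filter of this kind could look like the sketch below. The toy players and both cutoff values are hypothetical, chosen only to illustrate the idea; the column names follow those used in the Code Appendix (attempts is the NBA total).

```r
# Toy data (hypothetical players, not from the report's dataset).
toy <- data.frame(
  player       = c("A", "B", "C", "D"),
  mbb_attempts = c(1, 250, 40, 600),
  mbb_made     = c(1, 90, 10, 240),
  attempts     = c(3, 900, 15, 2000),
  stringsAsFactors = FALSE
)
toy$mbb_percent <- toy$mbb_made / toy$mbb_attempts

# Hypothetical cutoffs; sensible values would need to be chosen from the data.
min_college_attempts <- 50
min_nba_attempts     <- 100

# Keep only players who clear both thresholds.
qualified <- subset(
  toy,
  mbb_attempts >= min_college_attempts & attempts >= min_nba_attempts
)
qualified$player
```

Player A, who made a single college attempt and would otherwise enter the data as a 100% shooter, is dropped along with the other low-volume player.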

One more next step that would be interesting would be to split the dataset by the conference the college player was in, as it may be easier to be a good shooter against MAC defenses than against the Big East. This way we could see whether 3-point percentage in harder games is more predictive than 3-point percentage against easier competition.

Code Appendix

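library(hoopR)  # load_nba_player_box(), load_mbb_player_box()
library(dplyr)  # group_by(), summarize(), mutate()

nba_player_stats <- load_nba_player_box(seasons = 2014:2023)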

# Simplify to a ranking of player stats by points per 36 min --------------

nba_player_score_summary <- nba_player_stats |>
  group_by(athlete_id, athlete_display_name, athlete_position_name) |>
  summarize(minutes_played = sum(minutes, na.rm = TRUE),
            total_points = sum(points, na.rm = TRUE),
            made = sum(three_point_field_goals_made, na.rm = TRUE),
            attempts = sum(three_point_field_goals_attempted, na.rm = TRUE),
            percent = made/attempts,
            .groups = "drop") |>
  mutate(points_per_36min = total_points / minutes_played * 36)

mbb_player_stats <- load_mbb_player_box(seasons = 2007:2020)

mbb_player_score_summary <- mbb_player_stats |>
  group_by(athlete_id, athlete_display_name, athlete_position_name) |>
  summarize(mbb_minutes_played = sum(minutes, na.rm = TRUE),
            mbb_total_points = sum(points, na.rm = TRUE),
            mbb_made = sum(three_point_field_goals_made, na.rm = TRUE),
            mbb_attempts = sum(three_point_field_goals_attempted, na.rm = TRUE),
            mbb_percent = mbb_made/mbb_attempts,
            .groups = "drop") |>
  mutate(points_per_36min = mbb_total_points / mbb_minutes_played * 36)

nba_mbb <- merge(nba_player_score_summary, mbb_player_score_summary, by = "athlete_id")

nba_mbb$above_average <- ifelse(nba_mbb$percent > 0.37, 1, 0)

nba_mbb$highShooter <- ifelse(nba_mbb$mbb_attempts > 227, 1, 0)

library(bestglm)

onlyNumeric <- nba_mbb[, c("above_average", "mbb_minutes_played", "mbb_total_points", "mbb_made", "mbb_attempts", "mbb_percent")]

onlyNumeric <- na.omit(onlyNumeric)

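# Note: bestglm() takes a data frame (Xy) and treats its last column as the
# response; the formula passed here is not part of its documented interface.
bestglm(onlyNumeric, above_average ~ mbb_minutes_played + mbb_total_points + mbb_made + mbb_attempts + mbb_percent, IC = "LOOCV", method = 'exhaustive', family = gaussian)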

## LOOCV
## BICq equivalent for q in (1.48588030768337e-10, 0.939799619840228)
## Best Model:
##                   Estimate   Std. Error   t value      Pr(>|t|)
## (Intercept)   0.3173594418 0.0061045021 51.987768 2.540006e-182
## mbb_made      0.0029723440 0.0003490469  8.515600  3.197123e-16
## mbb_attempts -0.0009900557 0.0001344669 -7.362819  9.991703e-13

bestglm(onlyNumeric, percent ~ mbb_minutes_played + mbb_total_points + mbb_made + mbb_attempts + mbb_percent, IC = "LOOCV", method = 'exhaustive', family = gaussian)

## LOOCV
## BICq equivalent for q in (1.48588030768337e-10, 0.939799619840228)
## Best Model:
##                   Estimate   Std. Error   t value      Pr(>|t|)
## (Intercept)   0.3173594418 0.0061045021 51.987768 2.540006e-182
## mbb_made      0.0029723440 0.0003490469  8.515600  3.197123e-16
## mbb_attempts -0.0009900557 0.0001344669 -7.362819  9.991703e-13

bestModel <- glm(data = onlyNumeric, above_average ~ mbb_made + mbb_attempts, family = binomial)
summary(bestModel)

## 
## Call:
## glm(formula = above_average ~ mbb_made + mbb_attempts, family = binomial, 
##     data = onlyNumeric)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8478  -0.6288  -0.5511  -0.4899   2.1750  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -1.791373   0.220388  -8.128 4.35e-16 ***
## mbb_made      0.036485   0.011847   3.080  0.00207 ** 
## mbb_attempts -0.012327   0.004622  -2.667  0.00765 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 390.96  on 411  degrees of freedom
## Residual deviance: 372.45  on 409  degrees of freedom
## AIC: 378.45
## 
## Number of Fisher Scoring iterations: 4

bestModel1 <- glm(data = nba_mbb, percent ~ mbb_made + mbb_attempts, family = binomial)

## Warning in eval(family$initialize): non-integer #successes in a binomial glm!

summary(bestModel1)

## 
## Call:
## glm(formula = percent ~ mbb_made + mbb_attempts, family = binomial, 
##     data = nba_mbb)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -0.89689  -0.05933   0.05872   0.12405   1.59246  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.887486   0.168782  -5.258 1.45e-07 ***
## mbb_made      0.008662   0.009948   0.871    0.384    
## mbb_attempts -0.002977   0.003837  -0.776    0.438    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 39.167  on 427  degrees of freedom
## Residual deviance: 38.006  on 425  degrees of freedom
##   (41 observations deleted due to missingness)
## AIC: 316.45
## 
## Number of Fisher Scoring iterations: 4

library(boot)  # provides cv.glm()

cv.glm(data = onlyNumeric, bestModel, K = 5)

## $call
## cv.glm(data = onlyNumeric, glmfit = bestModel, K = 5)
## 
## $K
## [1] 5
## 
## $delta
## [1] 0.1441580 0.1438464

library(dplyr)    # filter() and the %>% pipe
library(stringr)  # str_detect()

guards <- nba_mbb %>%
  filter(str_detect(athlete_position_name.x, "Guard"))

guardsModel <- lm(percent ~ mbb_percent, data = guards)

library(lme4)

glmm_shooting <- glmer(above_average ~ mbb_percent + (mbb_attempts | athlete_position_name.x),
                         family = binomial, data = nba_mbb)

## boundary (singular) fit: see help('isSingular')

summary(glmm_shooting)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: above_average ~ mbb_percent + (mbb_attempts | athlete_position_name.x)
##    Data: nba_mbb
## 
##      AIC      BIC   logLik deviance df.resid 
##    396.6    416.7   -193.3    386.6      407 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.2201 -0.4933 -0.4542 -0.3911  3.5506 
## 
## Random effects:
##  Groups                  Name         Variance  Std.Dev.  Corr 
##  athlete_position_name.x (Intercept)  2.536e-02 0.1592474      
##                          mbb_attempts 6.461e-07 0.0008038 -1.00
## Number of obs: 412, groups:  athlete_position_name.x, 7
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -2.5498     0.5918  -4.309 1.64e-05 ***
## mbb_percent   2.9318     1.6817   1.743   0.0813 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## mbb_percent -0.963
## optimizer (Nelder_Mead) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

library(broom.mixed)

group_raneff <- tidy(glmm_shooting, effects = "ran_vals")