Steph Curry, Klay Thompson, Ray Allen, Ben Simmons. Wait, scratch that last one. These are some of the greatest shooters in NBA history, but where did they come from, and how did their teams figure out they were going to be so great? In recent years, accurate 3-point shooting has changed the way basketball is played. To properly optimize an offense, shooters are a necessity for any team with playoff ambitions. If NBA teams can find legitimate trends, they can make better-informed inferences about which college stars will become NBA sharpshooters. Being able to more accurately predict draft prospects' NBA shooting would allow for better, safer drafting. In this report, we aim to test the predictability of NBA 3-point shooting using college 3-point shooting, in order to better evaluate draft prospects. This is an incredibly interesting question, and we aim to find the best predictors (and just how accurate those predictors are) in order to find the next Steph Curry and avoid the next Ben Simmons.
The data used in this report comes from the hoopR package in R. hoopR provides a multitude of play-by-play and box score basketball data sourced from ESPN. We specifically looked at NBA player stats from the 2014 to 2023 seasons and college player stats from 2007 to 2020. In order to use this data, we merged the two datasets, matching on the players common to both. This left us with 469 players in total. The dataset includes the following variables:
Variables
mbb_attempts: Total college 3-point attempts per player
mbb_made: Total college 3-pointers made per player
mbb_percent: College 3-point percentage per player
mbb_total_points: Total points for college players
mbb_total_minutes: Total minutes for college players
player_position: What position the player had in college
nba_attempts: Total NBA 3-point attempts per player
nba_made: Total NBA 3-pointers made per player
nba_percent: NBA 3-point percentage per player
above_average: Whether a player shot above the league average (37%) for 3-point shooting
We proceed to conduct some exploratory analysis to better understand the variables. Specifically, we look at the distributions of attempts and makes for both NBA players and college players, as well as the proportion of “above average” players in the NBA, which we define as players whose 3-point percentage is greater than 37%. As seen in the plots of college attempts, there is a reasonable spread, with the majority of players falling between 0 and 800 attempts. For NBA attempts, there is a stronger concentration of players at the lower end, roughly 0 to 2,000, with some outliers between 2,000 and 3,000. This makes sense, as players usually play in college for at most four years, while NBA careers span a much wider range of years. The same pattern applies to made shots: most college players fall between 0 and 300 makes, while a large portion of NBA players sit between 0 and 400, with outliers reaching 1,200 to 1,400. Players defined as “above average” NBA shooters make up around 20% of the sample, with the remaining roughly 80% below that cutoff.
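For reference, the distributions and the above-average share described here can be reproduced with a short ggplot2 sketch (this assumes the merged nba_mbb data frame constructed in the appendix code; the bin counts are arbitrary):

library(ggplot2)

# Distribution of college 3-point attempts
ggplot(nba_mbb, aes(x = mbb_attempts)) +
  geom_histogram(bins = 30) +
  labs(x = "College 3-point attempts", y = "Number of players")

# Distribution of NBA 3-point attempts
ggplot(nba_mbb, aes(x = attempts)) +
  geom_histogram(bins = 30) +
  labs(x = "NBA 3-point attempts", y = "Number of players")

# Share of players shooting above the 37% cutoff in the NBA
mean(nba_mbb$above_average, na.rm = TRUE)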
Before we go into the methods, it’s always a good idea to see how a basic linear model looks against the data. The model shown below simply regresses NBA 3-point percentage on college 3-point percentage, and while the data points seem to show a reasonable linear relationship, the model itself doesn’t seem to fit the data very well. We’ll examine why that is later, but it’s something to keep track of for now.
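A minimal sketch of that baseline fit (variable names follow the appendix code; the plot simply overlays the least-squares line on the scatter):

library(ggplot2)

# Simple linear model: NBA 3P% as a function of college 3P%
baseline_lm <- lm(percent ~ mbb_percent, data = nba_mbb)
summary(baseline_lm)

ggplot(nba_mbb, aes(x = mbb_percent, y = percent)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", se = TRUE) +
  labs(x = "College 3-point percentage", y = "NBA 3-point percentage")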
When attempting to create a model predicting NBA shooting from college performance, our first thought was to build a generalized linear model using the bestglm function in R. In order to do this, we needed a binary variable to act as the response, so we used above_average as our response variable and all of the relevant variables in our nba_mbb dataset as candidate predictors. When running bestglm, we used leave-one-out cross-validation as the selection criterion, and we then refit the selected model with family set to "binomial", since we are working with a binary response.
Before we go any further, we are going to check that the necessary assumptions for this model are satisfied. Understanding the difficulty of interpreting residual and QQ plots for binomial regressions, we instead analyze these assumptions using a very similar model, with percent as the response instead of above_average. We can do this by looking at a residual plot and a QQ plot, checking for heteroskedasticity and Normality. As shown below, the residual plot seems to check out, with the residuals scattered fairly evenly around 0. However, there are some issues with the QQ plot, specifically around the bottom tail. Even with this deviation in the bottom tail, for the purposes of this project we will forge ahead with our GLM, keeping this issue with the Normality assumption in mind.
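One way those checks could be produced (a sketch only; the appendix fits this percent-response analogue with glm(), but plot() on either fit returns the residual-vs-fitted and QQ plots):

# Percent-response analogue used only for diagnostics
diag_fit <- lm(percent ~ mbb_made + mbb_attempts, data = nba_mbb)

plot(diag_fit, which = 1)  # residuals vs fitted values (heteroskedasticity check)
plot(diag_fit, which = 2)  # Normal QQ plot of the residuals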
When running the model, we found that bestglm chose mbb_made and mbb_attempts as our two predictors. As shown below, the GLM with these two predictors shows a positive relationship between 3-pointers made in college and the likelihood of becoming an above-average shooter in the NBA (0.036), and a negative relationship between 3-pointers attempted in college and that likelihood (-0.012). While percentage is not included in this GLM, we can use the coefficients to draw a rough inference. Since the magnitude of the coefficient for makes is roughly three times that of the coefficient for attempts, the model implies a break-even point around a 33-34% college 3-point percentage: players who shoot above that mark are predicted to have a better-than-baseline chance of becoming an above-average NBA 3-point shooter, while players who shoot below it are predicted to have a worse one.
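The break-even point implied by those two coefficients can be recovered directly from their ratio (a back-of-the-envelope calculation using the fitted estimates):

# Each college attempt adds (0.036 * make_rate - 0.012) to the log-odds,
# so the contribution turns positive once the make rate exceeds 0.012 / 0.036.
b_made     <- 0.036485
b_attempts <- -0.012327
-b_attempts / b_made   # roughly 0.34, i.e. about a 33-34% college 3P%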
We are going to quantify uncertainty for this model using 5-fold cross-validation (via cv.glm). In doing so, we found that the cross-validated prediction error estimate for this model is about 0.14, which seems like a pretty low value. This GLM has given interesting results and shows a statistically significant relationship between college makes/attempts and NBA shooting performance, which is unsurprising but still interesting to see.
The next model we are going to attempt is a multilevel model. We are specifically going to account for the position these athletes played in college, using position as a random effect. We made this choice after realizing that the importance of strong shooting differs quite a bit by position. For example, it is much more important for guards to be above-average shooters than for centers. For the most part, guards who can’t shoot have been finding their way to the bench (Ben Simmons), but centers who can’t shoot (Steven Adams) still have a role in the modern NBA. As far as fixed effects, we are going to use college 3-point percentage in place of makes and attempts, as we believe percentage captures much of the same information as using both makes and attempts.
We are also going to quantify uncertainty for this model using the conditional random-effect estimates (and their standard deviations) for each position. These per-position estimates are shown below.
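These per-position estimates can be pulled out of the fitted model and plotted; a sketch of one way to do that, assuming the glmm_shooting fit and the broom.mixed extraction shown in the appendix (column names such as level, estimate, and std.error follow broom.mixed's ran_vals output):

library(broom.mixed)
library(ggplot2)
library(dplyr)

# Conditional random-effect estimates (with standard errors) by position
group_raneff <- tidy(glmm_shooting, effects = "ran_vals")

group_raneff %>%
  filter(term == "(Intercept)") %>%
  ggplot(aes(x = level, y = estimate)) +
  geom_pointrange(aes(ymin = estimate - std.error, ymax = estimate + std.error)) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  coord_flip() +
  labs(x = "Position", y = "Conditional random-effect estimate")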
As we can see, the vast majority of these estimates fall below 0 rather than being centered around it, which is a problem for our assumption that the random effects are Normally distributed around zero. Because we can’t confirm that assumption here, it limits the predictions we can make with this model. We’ll touch on this later in our conclusion.
When it comes to comparing these two models, we’re going to use a few different metrics. First, we’ll compare the uncertainty estimates of each model; then we’ll compare whether the predictors in each model are statistically significant; and finally we’ll examine the coefficients in each model to see whether the two models are telling the same story or whether they differ.
The first element of these models that we are going to compare is the coefficients for the predictors in each model. The GLM has an intercept of -1.79 with a standard error of 0.22, a coefficient for mbb_made of 0.036 with a standard error of 0.012, and a coefficient for mbb_attempts of -0.012 with a standard error of 0.0046.
Because this is a logistic regression, we interpret the coefficients on the log-odds scale. Each additional college 3-pointer made increases the log-odds of being an above-average NBA 3-point shooter by 0.036 (roughly a 3.7% increase in the odds), while each additional attempt decreases the log-odds by 0.012 (roughly a 1.2% decrease in the odds).
Meanwhile, the multilevel model with random effects for position has an intercept of -2.55 with a standard error of 0.59 and a coefficient for mbb_percent of 2.93 with a standard error of 1.68. Since mbb_percent is measured as a proportion, a one-percentage-point increase in college 3-point percentage corresponds to an increase of roughly 0.03 in the log-odds of being an above-average NBA 3-point shooter (about a 3% increase in the odds).
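As a quick check on those interpretations, the log-odds coefficients can be converted to approximate odds ratios (a small worked example using the fitted estimates reported above):

# GLM: per additional college 3-pointer made / attempted
exp(0.036485)        # ~1.037, i.e. about a 3.7% increase in the odds
exp(-0.012327)       # ~0.988, i.e. about a 1.2% decrease in the odds

# Multilevel model: per one-percentage-point (0.01) gain in college 3P%
exp(2.9318 * 0.01)   # ~1.030, i.e. about a 3% increase in the odds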
So far, both models seem to be telling us similar things: the more accurate a shooter you are at the college level, the more likely you are to be an accurate shooter at the NBA level.
The next way that we are going to compare these two models is by looking at the significance of the coefficients. First, the p-values for the coefficients in the GLM are both far below 0.05, showing that both coefficients are significant. Next, in the multilevel model, where we use position as a random effect, the p-value for college shooting percentage is 0.08, which is above the conventional 0.05 threshold, and that is a legitimate problem. However, since it is still under 0.10, it would be considered significant at the 90% level.
Finally, we are going to look back at the uncertainty estimates that we measured in the “Methods” section. We used 5-fold cross validation to calculate an uncertainty estimate of 0.14 for our GLM, which all things considered is a pretty low value. However, there were issues with meeting the Normality assumption due to variations in the QQ plot for this GLM, so this low uncertainty value could very well be misleading and inaccurate, which we have to account for when deciding whether to make predictions based on this model.
Moreover, the uncertainty estimates for our multilevel model were also problematic. We used the conditional random-effect estimates for the position groups to gauge uncertainty, and we found that their distribution was not centered around 0. This tells us there may be uncertainty that the model currently does not account for, again casting doubt on the reliability of our multilevel model.
Even though there are plenty of problems with our multilevel model, there are still insights we can take away from using position as a random effect. First, we can look at how the position a player played in college relates to their NBA shooting performance using the graphic below.
This gives us the ordering of positions by NBA shooting percentage, with centers as the reference group. Unsurprisingly, college guards have the highest shooting percentage in the NBA, then forwards, then centers. This is what a reasonable person would have assumed before conducting the analysis, but it’s interesting to see the statistical support for it. We found this simply by running a simpler linear model by position and comparing the coefficients (all of which were statistically significant). Even though this is a much simpler approach than our multilevel model, it gives us some clear takeaways. For example, NBA teams looking purely for strong 3-point shooters in the draft should be focusing on drafting guards.
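One way to set up that simpler comparison is a single linear model with position dummies relative to centers (a sketch only; the position column follows the appendix code, and the "Center" label is an assumption about how ESPN names that position):

# NBA 3P% by position, with centers as the reference level (assumes "Center" is a level)
nba_mbb$position_factor <- relevel(factor(nba_mbb$athlete_position_name.x), ref = "Center")
position_lm <- lm(percent ~ position_factor, data = nba_mbb)
summary(position_lm)   # each coefficient is that position's gap relative to centers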
Additionally, when filtering to college guards, the simple linear model fits the scatterplot much better than it did back in the EDA section. Even though it isn’t perfect, we can see below that the model restricted to guards shows a closer relationship between college shooting percentage and NBA shooting percentage.
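The guards-only fit referenced here comes from the appendix (guardsModel); a sketch of the corresponding plot, assuming the guards subset built there:

library(ggplot2)

# College vs NBA 3P% for players whose position label contains "Guard"
ggplot(guards, aes(x = mbb_percent, y = percent)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", se = TRUE) +
  labs(x = "College 3-point percentage", y = "NBA 3-point percentage",
       title = "Guards only")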
What we set out to do here was to find college statistics that accurately and reliably predict strong shooting performance in the NBA. To do this, we relied on two different models: first a generalized linear model (GLM) selected with the bestglm function, and second a multilevel model with college shooting percentage as a fixed effect and player position as a random effect. We learned from both models that there is a relationship between shooting percentage in college basketball and shooting percentage in the NBA.
In the second model, we found that players who play point guard and shooting guard in college tend to have a better shooting percentage in the NBA than forwards and especially centers. Moreover, the model was stronger and showed more significant predictive power when filtered solely to guards. This means that guards who shoot well in college are more likely to become good NBA shooters than forwards who shoot well in college. We’d speculate that this is because guards tend to take more three-point shots (and we also found that our model’s predictive power is related to attempts), so the larger sample size should, in theory, make the estimates more reliable.
However, while we were able to glean a lot of insights from these models, we must remember their limitations. First, we were unable to verify the Normality assumption for the first GLM, which means our model could well have biased parameters and inaccurate results. This makes it dangerous to accept any conclusions from this model as fact. Next, the conditional random-effect estimates in the multilevel model were not centered around zero, which also makes it dangerous to draw firm conclusions from the multilevel model.
Another limitation of our approach is that it imposes no minimum number of shot attempts: a lucky center could take and make a single three-pointer, which could badly skew the data, and especially our cross-validation estimate. A clear next step to account for this would be to filter the dataset by a minimum number of attempts, both in college and in the NBA.
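A sketch of what that filter might look like (the 100-attempt cutoffs below are arbitrary placeholders, not values we have validated):

# Keep only players with a minimum attempt volume in both leagues,
# then refit the GLM and multilevel model on the filtered data
nba_mbb_filtered <- subset(nba_mbb, mbb_attempts >= 100 & attempts >= 100)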
One more next step that would be interesting would be to split the dataset by the conference the college player played in, as it may be easier to be a good shooter against MAC defenses than against the Big East. That way we could see whether 3-point percentage against tougher competition is more predictive than 3-point percentage against easier competition.
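A rough sketch of how that split could be set up, assuming a conference label can be attached to each player; the lookup table and the team column below are hypothetical placeholders rather than fields the current summaries carry:

# Hypothetical conference lookup (placeholder values)
conference_lookup <- data.frame(
  team       = c("UConn", "Villanova", "Akron", "Kent State"),
  conference = c("Big East", "Big East", "MAC", "MAC")
)

# If a `team` column were carried through from the college box scores:
# nba_mbb_conf <- merge(nba_mbb, conference_lookup, by = "team")
# big_east     <- subset(nba_mbb_conf, conference == "Big East")
# mac          <- subset(nba_mbb_conf, conference == "MAC")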
library(hoopR)   # ESPN play-by-play and box score data
library(dplyr)

# NBA player box scores for the 2014-2023 seasons
nba_player_stats <- load_nba_player_box(seasons = 2014:2023)
# Summarize player stats (totals, 3-point shooting, points per 36 min) ----------
nba_player_score_summary <- nba_player_stats |>
group_by(athlete_id, athlete_display_name, athlete_position_name) |>
summarize(minutes_played = sum(minutes, na.rm = TRUE),
total_points = sum(points, na.rm = TRUE),
made = sum(three_point_field_goals_made, na.rm = TRUE),
attempts = sum(three_point_field_goals_attempted, na.rm = TRUE),
percent = made/attempts,
.groups = "drop") |>
mutate(points_per_36min = total_points / minutes_played * 36)
# College (men's basketball) player box scores for the 2007-2020 seasons
mbb_player_stats <- load_mbb_player_box(seasons = 2007:2020)
mbb_player_score_summary <- mbb_player_stats |>
group_by(athlete_id, athlete_display_name, athlete_position_name) |>
summarize(mbb_minutes_played = sum(minutes, na.rm = TRUE),
mbb_total_points = sum(points, na.rm = TRUE),
mbb_made = sum(three_point_field_goals_made, na.rm = TRUE),
mbb_attempts = sum(three_point_field_goals_attempted, na.rm = TRUE),
mbb_percent = mbb_made/mbb_attempts,
.groups = "drop") |>
mutate(points_per_36min = mbb_total_points / mbb_minutes_played * 36)
# Merge the NBA and college summaries on ESPN athlete id
nba_mbb <- merge(nba_player_score_summary, mbb_player_score_summary, by = "athlete_id")
# Flag above-average NBA 3-point shooters (37% cutoff) and high-volume college shooters
nba_mbb$above_average <- ifelse(nba_mbb$percent > 0.37, 1, 0)
nba_mbb$highShooter <- ifelse(nba_mbb$mbb_attempts > 227, 1, 0)
library(bestglm)

# Keep only the numeric candidate predictors plus the binary response, dropping NAs
onlyNumeric <- nba_mbb[, c("above_average", "mbb_minutes_played", "mbb_total_points", "mbb_made", "mbb_attempts", "mbb_percent")]
onlyNumeric <- na.omit(onlyNumeric)
bestglm(onlyNumeric, above_average ~ mbb_minutes_played + mbb_total_points + mbb_made + mbb_attempts + mbb_percent, IC = "LOOCV", method = 'exhaustive', family = gaussian)
## LOOCV
## BICq equivalent for q in (1.48588030768337e-10, 0.939799619840228)
## Best Model:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.3173594418 0.0061045021 51.987768 2.540006e-182
## mbb_made 0.0029723440 0.0003490469 8.515600 3.197123e-16
## mbb_attempts -0.0009900557 0.0001344669 -7.362819 9.991703e-13
bestglm(onlyNumeric, percent ~ mbb_minutes_played + mbb_total_points + mbb_made + mbb_attempts + mbb_percent, IC = "LOOCV", method = 'exhaustive', family = gaussian)
## LOOCV
## BICq equivalent for q in (1.48588030768337e-10, 0.939799619840228)
## Best Model:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.3173594418 0.0061045021 51.987768 2.540006e-182
## mbb_made 0.0029723440 0.0003490469 8.515600 3.197123e-16
## mbb_attempts -0.0009900557 0.0001344669 -7.362819 9.991703e-13
# Refit the selected predictors as a logistic regression on the binary response
bestModel <- glm(data = onlyNumeric, above_average ~ mbb_made + mbb_attempts, family = binomial)
summary(bestModel)
##
## Call:
## glm(formula = above_average ~ mbb_made + mbb_attempts, family = binomial,
## data = onlyNumeric)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8478 -0.6288 -0.5511 -0.4899 2.1750
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.791373 0.220388 -8.128 4.35e-16 ***
## mbb_made 0.036485 0.011847 3.080 0.00207 **
## mbb_attempts -0.012327 0.004622 -2.667 0.00765 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 390.96 on 411 degrees of freedom
## Residual deviance: 372.45 on 409 degrees of freedom
## AIC: 378.45
##
## Number of Fisher Scoring iterations: 4
# Percent-response analogue used for the diagnostic checks discussed in the text
bestModel1 <- glm(data = nba_mbb, percent ~ mbb_made + mbb_attempts, family = binomial)
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
summary(bestModel1)
##
## Call:
## glm(formula = percent ~ mbb_made + mbb_attempts, family = binomial,
## data = nba_mbb)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.89689 -0.05933 0.05872 0.12405 1.59246
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.887486 0.168782 -5.258 1.45e-07 ***
## mbb_made 0.008662 0.009948 0.871 0.384
## mbb_attempts -0.002977 0.003837 -0.776 0.438
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 39.167 on 427 degrees of freedom
## Residual deviance: 38.006 on 425 degrees of freedom
## (41 observations deleted due to missingness)
## AIC: 316.45
##
## Number of Fisher Scoring iterations: 4
library(boot)

# 5-fold cross-validation of the selected GLM
cv.glm(data = onlyNumeric, bestModel, K = 5)
## $call
## cv.glm(data = onlyNumeric, glmfit = bestModel, K = 5)
##
## $K
## [1] 5
##
## $delta
## [1] 0.1441580 0.1438464
library(stringr)

# Guards-only subset (position label contains "Guard") and a simple linear model
guards <- nba_mbb %>%
  filter(str_detect(athlete_position_name.x, "Guard"))
guardsModel <- lm(percent ~ mbb_percent, data = guards)
library(lme4)

# Multilevel logistic model: college 3P% as a fixed effect, with the intercept
# and the mbb_attempts slope allowed to vary by position
glmm_shooting <- glmer(above_average ~ mbb_percent + (mbb_attempts | athlete_position_name.x),
                       family = binomial, data = nba_mbb)
## boundary (singular) fit: see help('isSingular')
summary(glmm_shooting)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula: above_average ~ mbb_percent + (mbb_attempts | athlete_position_name.x)
## Data: nba_mbb
##
## AIC BIC logLik deviance df.resid
## 396.6 416.7 -193.3 386.6 407
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.2201 -0.4933 -0.4542 -0.3911 3.5506
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## athlete_position_name.x (Intercept) 2.536e-02 0.1592474
## mbb_attempts 6.461e-07 0.0008038 -1.00
## Number of obs: 412, groups: athlete_position_name.x, 7
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.5498 0.5918 -4.309 1.64e-05 ***
## mbb_percent 2.9318 1.6817 1.743 0.0813 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## mbb_percent -0.963
## optimizer (Nelder_Mead) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
library(broom.mixed)

# Conditional random-effect estimates (with standard errors) for each position group
group_raneff <- tidy(glmm_shooting, effects = "ran_vals")