Intro

Shark Tank is a popular business reality show. The show features a panel of wealthy investors (the "sharks") who listen to pitches from aspiring entrepreneurs.

Since launching in 2009, Shark Tank has become popular for its high-stakes negotiations and dramatic moments, as well as its inspirational stories of entrepreneurs who have turned their dreams into successful businesses. This report describes the relationship between the companies that appear on the show and the investment they receive. Our main goal is to provide information that entrepreneurs can use to improve their business strategies.

Our dataset includes the following quantitative variables:

askedfor - amount of money asked for by the entrepreneur

exchangeForStake - percentage stake offered to the sharks by the entrepreneur

valuation - how much the entrepreneur values their company

We also consider a range of categorical and descriptive variables, including:

description - company description

location - city, state of the company

episode - episode number

season - season of the show

category - category of the company

deal - deal or no deal

shark1 - name of 1st shark

shark2 - name of 2nd shark

shark3 - name of 3rd shark

shark4 - name of 4th shark

shark5 - name of 5th shark

Shark Preferences

We want to know whether each shark prefers particular categories of company. As a first step, we test whether the deal outcome is associated with the broader category of the company.

## 
##  Pearson's Chi-squared test
## 
## data:  table(sharkCat$broaderCategory, sharkCat$deal)
## X-squared = 73.888, df = 12, p-value = 5.957e-11

Since the p-value (5.957e-11) is less than the significance level of 0.05, we conclude that there is a statistically significant association between broader category and deal outcome. We then plot the categories against the sharks to visualize this association.

This plot shows that companies in the broader categories of Beverages, Electronics and Automotive, and Gardening and Home Improvement receive deals from every Shark. However, in other broader categories, such as Family or Food, each Shark tends to accept deals only selectively. So, companies in Beverages, Electronics and Automotive, and Gardening and Home Improvement have a higher likelihood of getting a deal.
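To make the test statistic concrete, here is a minimal Python sketch of how a Pearson chi-squared statistic and its degrees of freedom are computed from a contingency table. The counts in `toy_table` are hypothetical and are not taken from our data, and the function name `chi_square_statistic` is ours for illustration. The analysis itself was run in R.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees_of_freedom) for a table of observed counts.

    `table` is a list of rows; every row must have the same length.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows are three made-up categories,
# columns are (no deal, deal). These are NOT the report's data.
toy_table = [
    [30, 10],
    [20, 20],
    [10, 30],
]

stat, dof = chi_square_statistic(toy_table)
print(f"X-squared = {stat:.3f}, df = {dof}")
```

By the same degrees-of-freedom formula, the reported df = 12 with a two-level deal variable corresponds to 13 broader categories, since (13 - 1) x (2 - 1) = 12.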

Show Progression and Valuation

We also want to explore the relationship between episode number (show progression) and the valuation of companies.

This barchart shows that as the seasons progress, company valuations increase substantially. In addition to external factors such as changes in the market or overall economic conditions, there are several possible reasons for this increase.

The first possibility is exposure. As the show gains popularity and viewership, the exposure for companies on the show increases. More people become aware of the companies and their products, leading to increased interest and demand.

The second possibility is investor confidence. Investors may become more confident in their ability to identify successful companies and products. This could lead to higher valuations, as investors are more willing to take risks and invest larger amounts of money.

The last possibility is the show format. As the show gains popularity, it may attract more successful entrepreneurs and companies, leading to a higher quality of pitches and products. This could in turn lead to higher valuations, as the companies on the show are more likely to be successful.

Location and Valuation

In order to learn about the relationship between the location of the entrepreneurs and the size of their valuations, we examined their states’ populations as well as their valuations. A relationship between state population and valuation could indicate that larger states have better resources for entrepreneurship.

This scatterplot shows the relationship between state population and valuation. It appears that there might be a slightly positive correlation between state population and valuation, but we cannot be sure from the graph. A positive relationship would mean that entrepreneurs from larger states have larger companies, which could indicate that states with higher populations have more resources/value even relative to their size.

We can run a linear regression to see if this relationship is significant.

## 
## Call:
## lm(formula = valuation/1000 ~ statePop, data = sharkLoc)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2454.4 -1681.3 -1224.8   -99.6 27835.2 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1771.94     295.14   6.004 3.74e-09 ***
## statePop       19.45      11.96   1.626    0.105    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3756 on 493 degrees of freedom
## Multiple R-squared:  0.005335,   Adjusted R-squared:  0.003317 
## F-statistic: 2.644 on 1 and 493 DF,  p-value: 0.1046

The p-value of the slope of valuation against state population (0.105) tells us whether the relationship is significant. As this p-value is not less than 0.05, we fail to reject the null hypothesis that there is no linear relationship. The R-squared of 0.005 also shows that state population explains only about 0.5% of the variation in valuation. In essence, the evidence from this dataset is not strong enough to support the claim that there is a relationship between state population and valuation.
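The quantities in the regression output are tied together by simple arithmetic, which can be checked directly. The Python sketch below uses only the rounded numbers printed in the output, so its results should agree with the reported t value, F-statistic, and R-squared only up to rounding. The variable names are ours.

```python
# Values copied from the regression output (already rounded by R).
estimate = 19.45      # slope for statePop
std_error = 11.96     # standard error of the slope
residual_df = 493     # residual degrees of freedom

# The t statistic is the estimate divided by its standard error.
t_value = estimate / std_error

# With a single predictor, the overall F statistic equals t squared.
f_statistic = t_value ** 2

# For simple regression, R-squared = F / (F + residual df).
r_squared = f_statistic / (f_statistic + residual_df)

print(f"t = {t_value:.3f}")
print(f"F = {f_statistic:.3f}")
print(f"R-squared = {r_squared:.6f}")
```

Because the F test and the slope t test are equivalent when there is one predictor, both give the same p-value of roughly 0.105.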

We can also explore if there is a relationship if we categorize states as either large, medium, or small.

With this categorization of states, it again appears that valuations differ only marginally across state sizes. Interestingly, medium states have the lowest valuations, albeit not by much, while large and small states have very similar valuations.

Predicting Shark Tank Success

We are also interested in predicting the success of Shark Tank applicants. A better understanding of what factors influence investor decisions can help founders create better pitches.

Our response of interest is pitch ‘success’. A pitch is considered successful if the founders earned a deal (denoted as 1). The table below indicates that more pitches are successful than unsuccessful. This may reflect selection bias, with show creators more likely to broadcast high quality pitches. Regardless, the difference is small and does not cause concern about class imbalance.

## 
## Failure Success 
##     244     251

We use logistic regression to model the relationship between deal success and predictors. Our first model will include ‘product category’ and whether or not a company had multiple founders as categorical predictors. In addition we include the following quantitative predictors: description length, title length, website length, valuation, exchange for stake, and amount asked for.

The full model achieves an AIC (an estimator of prediction error) of 728. In addition, we notice that only ‘askedFor’ and ‘descriptionLen’ are ‘significant’, with coefficient p-values below the standard .05 significance threshold.

Next, we run backwards elimination to find a reduced model that minimizes AIC.

## Analysis of Deviance Table
## 
## Model 1: success ~ category + multiple_entreprenuers + descriptionLen + 
##     websiteLen + titleLen + valuation + askedfor + exchangeforstake
## Model 2: success ~ descriptionLen + valuation + exchangeforstake
##   Resid. Df Resid. Dev  Df Deviance Pr(>Chi)
## 1       434     606.00                      
## 2       491     667.26 -57  -61.258   0.3259

Our reduced model includes only description length, valuation, and exchange for stake as predictors. The model achieves an AIC of 675, which is lower than that of our full model. To determine whether the additional predictors in the full model are justified, we run an analysis of deviance test. Our null hypothesis is that the full model does not fit significantly better than the reduced model. After running our test, we obtain a p-value of .3259, which is above the .05 significance threshold, so we fail to reject the null hypothesis.
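The AIC values and the deviance test can be cross-checked from the numbers in the deviance table. The Python sketch below assumes that all 495 pitches were used in both fits and that the response is binary, in which case AIC equals the residual deviance plus twice the number of estimated coefficients. The helpers `chi2_sf` and `aic_from_deviance` are hypothetical names written for this illustration, not part of the original R analysis.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with df degrees of freedom."""
    a = df / 2.0
    half_x = x / 2.0
    if half_x <= 0:
        return 1.0
    log_prefactor = -half_x + a * math.log(half_x) - math.lgamma(a)
    if half_x < a + 1.0:
        # Series for the lower regularized incomplete gamma function.
        term = 1.0 / a
        total = term
        n = a
        for _ in range(1000):
            n += 1.0
            term *= half_x / n
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction (modified Lentz) for the upper incomplete gamma function.
    tiny = 1e-300
    b = half_x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(log_prefactor)


def aic_from_deviance(resid_dev, resid_df, n_obs):
    """AIC for a binary logistic model: residual deviance + 2 * (number of coefficients)."""
    n_coefficients = n_obs - resid_df
    return resid_dev + 2 * n_coefficients


n_obs = 244 + 251  # failures + successes (assumes no rows were dropped)

aic_full = aic_from_deviance(606.00, 434, n_obs)
aic_reduced = aic_from_deviance(667.26, 491, n_obs)

# Likelihood ratio test: deviance difference and df difference from the table.
lr_statistic = 61.258
lr_df = 491 - 434
p_value = chi2_sf(lr_statistic, lr_df)

print(f"AIC full = {aic_full:.2f}, AIC reduced = {aic_reduced:.2f}")
print(f"LR test: chi-squared = {lr_statistic}, df = {lr_df}, p = {p_value:.4f}")
```

Under these assumptions the full model has 61 coefficients and the reduced model has 4, which is why the 2k penalty outweighs the full model's lower residual deviance.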

The calibration curve of our logistic regression is shown below, with the ideal response shown as a diagonal line from 0 to 1. The diagonal corresponds to the case where the observed fraction of successes equals our estimated probability of success. Looking at the chart, we can see that the fraction of successes is noticeably higher than our predicted rate for estimated probabilities of .25 and .9. This indicates we are underestimating the number of successes at these probability levels.
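For readers unfamiliar with calibration curves, the Python sketch below shows one common way to compute the points on such a curve: group predictions into equal-width probability bins and compare the mean predicted probability with the observed fraction of successes in each bin. The data in `toy_probs` and `toy_outcomes` are hypothetical, and the binning choice (five equal-width bins) is an assumption that may differ from what produced our chart.

```python
def calibration_table(probs, outcomes, n_bins=5):
    """Return (mean predicted probability, observed success fraction, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # Equal-width bins on [0, 1]; a probability of exactly 1.0 goes in the last bin.
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, y))

    rows = []
    for members in bins:
        if not members:
            continue
        mean_predicted = sum(p for p, _ in members) / len(members)
        fraction_success = sum(y for _, y in members) / len(members)
        rows.append((mean_predicted, fraction_success, len(members)))
    return rows


# Hypothetical predicted probabilities and outcomes (1 = deal, 0 = no deal).
toy_probs = [0.1, 0.15, 0.3, 0.35, 0.5, 0.55, 0.7, 0.75, 0.9, 0.95]
toy_outcomes = [0, 1, 0, 1, 1, 0, 1, 1, 1, 1]

rows = calibration_table(toy_probs, toy_outcomes)
for mean_predicted, fraction_success, count in rows:
    print(f"predicted {mean_predicted:.3f}  observed {fraction_success:.3f}  n = {count}")
```

A bin whose observed fraction lies above its mean predicted probability sits above the diagonal, which is the underestimation pattern described above.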

Since the reduced model is simpler and achieves a lower AIC than the full model, we will proceed with the reduced model. Analyzing the model coefficients, we can see that valuation is associated with almost no difference in the odds of getting a deal. We also notice that every one percentage point increase in exchange for stake multiplies the odds of deal success by ~.977. This means that requests for a higher percentage of the company are associated with a decrease in the odds of getting a deal. The confusion matrix of predictions is shown below.

The confusion matrix indicates our model produces more false positives than false negatives at a .5 classification threshold. Overall, the in-sample classification accuracy of our reduced model is ~.6.
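The odds multiplier of ~.977 reported for exchange for stake can be translated into larger changes in stake. The Python sketch below takes .977 as given (it is a rounded value), recovers the implied logit coefficient, and shows how the odds and an example probability shift. The baseline probability of 0.5 and the 10-point change are hypothetical values chosen purely for illustration.

```python
import math

# Approximate per-point odds multiplier reported for exchange for stake.
odds_ratio_per_point = 0.977

# The implied logistic regression coefficient is the log of the odds ratio.
coefficient = math.log(odds_ratio_per_point)


def odds_multiplier(points):
    """Factor by which the odds of a deal change for a given change in stake (percentage points)."""
    return math.exp(coefficient * points)


def shifted_probability(base_prob, points):
    """Probability of a deal after changing stake by `points`, starting from `base_prob`."""
    odds = base_prob / (1.0 - base_prob) * odds_multiplier(points)
    return odds / (1.0 + odds)


# Hypothetical scenario: a pitch with a 50% chance of a deal and a stake 10 points higher.
print(f"odds multiplier for +10 points: {odds_multiplier(10):.3f}")
print(f"probability after +10 points:   {shifted_probability(0.5, 10):.3f}")
```

Because the effect is multiplicative on the odds scale, a 10-point change multiplies the odds by .977 raised to the 10th power (about 0.79), not by 10 times the per-point change.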

Conclusion

In summary, our analysis revealed that Sharks on the show tend to show interest in deals only within specific categories, such as Family or Food, and that companies related to Beverages, Electronics and Automotive, and Gardening and Home Improvement may have a greater chance of success. Additionally, we observed a positive trend in valuations as the seasons progress, likely due to economic shifts and other factors. Conversely, a higher percentage stake in the proposed exchange is associated with lower odds of getting a deal. These findings can aid entrepreneurs in crafting more effective pitches and identifying patterns in their own business behavior.

Future research could investigate whether the associations found here are causal. Another interesting focus would be comparing rural and urban locations to see whether valuations or deal outcomes differ. Ultimately, this report is an introductory analysis that can be explored further in the future.