Our dataset is obtained from Kaggle and focuses on companies that have appeared on the popular TV show Shark Tank. It provides comprehensive information about these companies, their pitches, and the outcomes of their deals. Each row of the data corresponds to one company and the relevant information. The dataset covers a range of variables that offer insights into the dynamics of the Shark Tank ecosystem.
In our report, we aim to analyze the factors that contribute to the success or failure of companies in securing investments, as well as the characteristics of companies that attract the interest of the “sharks” (the investors on the show). Specifically, we will be examining the following variables in our analysis:
By exploring these variables and their relationships, we aim to gain insights into the factors that influence the success or failure of companies on Shark Tank. Our analysis will provide a deeper understanding of the dynamics of the show and offer valuable lessons for aspiring entrepreneurs seeking investment opportunities.
Overall, we had three main research questions to address:

+ What are the trends between the different companies and valuations?
+ What categories (if any) do the sharks prefer throughout the seasons?
+ What are the trends between the companies that get a deal?

First, we wanted to explore how company valuations vary across categories. We wanted to see if certain categories tended to have larger or smaller valuations, which would ultimately help us better understand the makeup of the Shark Tank companies.
We wanted to get a better picture of the valuations of the companies in general to gauge the sample. The histogram below shows the overall distribution of the valuations of all the companies.
Based on this histogram, the distribution of company valuations is right-skewed, ranging from less than $5,000,000 to over $240,000,000. The majority of the companies are valued at $5,000,000 or less.
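As a rough illustration of what a right-skewed valuation distribution looks like numerically, the sketch below bins a handful of valuations and compares the mean with the median. It is written in Python for convenience; the values are hypothetical and not taken from the dataset, and the $5,000,000 bin width is an assumption chosen to match the threshold discussed above.

```python
from collections import Counter
from statistics import mean, median

BIN_WIDTH = 5_000_000  # assumed bin width in USD

# Hypothetical valuations (USD); NOT the real Shark Tank data.
valuations = [500_000, 1_000_000, 2_000_000, 3_000_000,
              4_000_000, 12_000_000, 60_000_000]

def bin_counts(values, width):
    """Count values per bin, keyed by each bin's lower edge (inclusive)."""
    return Counter((v // width) * width for v in values)

counts = bin_counts(valuations, BIN_WIDTH)
for lower_edge in sorted(counts):
    upper_edge = lower_edge + BIN_WIDTH
    print(f"{lower_edge:>12,} to {upper_edge:>12,}: {counts[lower_edge]}")

# In a right-skewed sample, a few large values pull the mean above the median.
print("mean:", round(mean(valuations)), "median:", median(valuations))
```

Most of the toy values land in the lowest bin while a few large ones stretch the tail, which is the same shape the histogram describes.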
Similarly, we wanted to gauge the sample of categories, so we visualized this through a word cloud of the categories.
The word cloud showed that the top two product categories were “food” and “care”. “Care” can refer to both the “Baby and Child Care” and “Personal Care” categories; however, since “baby” and “child” were more frequent than “personal”, it can mostly be attributed to “Baby and Child Care”. Other prominent categories were “Health and Fitness”, “Men and Women’s Apparel”, and “Professional and Consumer Services”.
Because these patterns might change over time, we used a heatmap to visualize how the valuations in each category fluctuated across episodes.
Based on the heatmap of category valuations across episodes, we found no conclusive trend. There are a few darker spots on the map, specifically in Beverage, Electronics, Entertainment, and Novelties, which indicate higher valuations. Valuation matters because sharks may negotiate the terms of an investment around it: if they find the valuation too high or unreasonable, they may either negotiate it down or decline to invest, whereas if the valuation is attractive and aligns with their investment criteria, they may make an offer. However, most of the heatmap is still light to sky blue, so many of the companies in the other categories come in with lower valuations and may be more in need of the sharks' help. Across all of the episodes, the distribution appears random, so Shark Tank does not seem to be biased toward particular companies when choosing who comes onto the show.
Additionally, we wanted to identify if there were any trends in the sharks’ choices. One way to see the sharks’ preferences was through the categories of products/companies. We decided to see if the sharks favored some categories over others when deciding whether or not to give the company a deal.
In order to do so, we first made a bar graph representing the categories and comparing the deal outcomes.
While this bar graph was helpful in visualizing the variations and distribution of the category variable as a whole, we decided that creating a proportional bar chart would help us compare the deal outcomes between the categories even more.
From these graphs, we can see that some categories tend to receive more deals (such as Storage and Cleaning, Automotive, and Beverages) while others tend to receive fewer deals (such as Consumer and Professional Services, Apparel, and Personal Care and Cosmetics). This suggests that the sharks may find certain categories to be more profitable than others, perhaps categories that seem less "trendy" and have more staying power in the long term. From the first graph, we can see that the number of companies in a category does not have a large effect on how likely those companies are to get a deal: the categories Apparel, Consumer and Professional Services, Food, and Health and Fitness all have 50+ companies each, but Apparel and Consumer and Professional Services companies get proportionally fewer deals, while Food and Health and Fitness companies get proportionally more.
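The quantities behind a proportional bar chart of this kind are simply the share of pitches in each category that ended in a deal. A minimal sketch, using hypothetical rows rather than the real data (the category labels and outcomes below are made up for illustration):

```python
from collections import Counter

# Hypothetical (category, got_deal) rows; NOT taken from the real dataset.
pitches = [
    ("Food", True), ("Food", True), ("Food", False),
    ("Apparel", False), ("Apparel", False), ("Apparel", True), ("Apparel", False),
    ("Automotive", True),
]

def deal_rates(rows):
    """Return {category: (n_pitches, n_deals, deal_share)}."""
    totals = Counter(category for category, _ in rows)
    deals = Counter(category for category, got_deal in rows if got_deal)
    return {
        category: (n, deals[category], deals[category] / n)
        for category, n in totals.items()
    }

rates = deal_rates(pitches)
for category, (n, k, share) in sorted(rates.items()):
    print(f"{category:<12} pitches={n}  deals={k}  share={share:.2f}")
```

Keeping the raw count next to each share makes it easier to see when a high or low share rests on only a few companies.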
To see if these trends changed across seasons, we further visualized the categories by the season. We plotted a mosaic plot to examine whether certain categories tended to be more or less present depending on the season of the show (i.e. was there a season where one category of companies may have been more prominent than another?).
Based on the shading of the Pearson residuals, we can see that a couple of specific category-season combinations were significantly more or less represented: for instance, the food category was more represented during season 4, and the baby and child care category was more represented during season 5. Conversely, the apparel category was less represented during season 5. However, the faceted bar plot does not show any evidence of these category-season combinations being more or less favored for deals by the sharks.
However, when we run a chi-square test on the table, we get a p-value of 0.6801, which is greater than 0.05.
We determined the hypotheses as the following:
H0: season and category are independent (the distribution of companies across categories is the same in every season)
HA: season and category are not independent (the distribution of companies across categories differs in at least one season)
As a result, we fail to reject the null hypothesis; this means that we do not have sufficient evidence to suggest that the mix of categories differs between seasons.
##
## Pearson's Chi-squared test
##
## data: table(shark_tank$season, shark_tank$general_category)
## X-squared = 63.972, df = 70, p-value = 0.6801
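To make the test above more concrete, here is a minimal Python sketch of how the Pearson chi-square statistic, its degrees of freedom, and the Pearson residuals (the quantities shaded in a mosaic plot) are computed from a contingency table. The table is a small hypothetical example, not the season-by-category table from the report, and the sketch does not compute a p-value.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic, degrees of freedom, and Pearson
    residuals for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    residuals = []
    for i, row in enumerate(table):
        residual_row = []
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            residual = (observed - expected) / expected ** 0.5
            residual_row.append(residual)
            statistic += residual ** 2  # sum of squared Pearson residuals
        residuals.append(residual_row)

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, residuals

# Hypothetical table: 2 seasons (rows) x 3 categories (columns).
toy = [
    [10, 20, 30],
    [20, 20, 20],
]
stat, df, resid = chi_square_independence(toy)
print(f"X-squared = {stat:.3f}, df = {df}")
```

Since the degrees of freedom equal (rows - 1) × (columns - 1), the reported df = 70 would be consistent with 6 seasons and 15 general categories, assuming the six faceted plots correspond to six seasons.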
We then recreated the proportional bar plot from before, but faceted by season, to see if any category was more heavily favored for deals during certain seasons compared to others.
Based on the 6 plots, we can see that there is some variation between seasons; for instance, every single automotive company from season 4 got a deal, while no automotive company from season 1 did. However, analyzing this graph requires more context on how many companies from each category go onto the show each season; for instance, only 1 automotive company went on the show during season 1, and only 2 automotive companies went on the show during season 4. Thus, while there may be significant variation between some seasons' deal outcomes for specific categories, the actual number of companies may not be greatly different.
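One way to quantify how little a 0% or 100% deal rate says when only one or two companies are involved is a confidence interval for the proportion. The sketch below uses the Wilson score interval, which is a technique swapped in here for illustration and not something the report itself computed; `z = 1.96` is the usual normal approximation for a 95% interval.

```python
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width

# Counts mentioned in the text: 2 of 2 automotive deals in season 4,
# 0 of 1 in season 1.
season4 = wilson_interval(2, 2)
season1 = wilson_interval(0, 1)
print("season 4 automotive:", season4)
print("season 1 automotive:", season1)
```

With so few pitches the two intervals are very wide and overlap substantially, which supports reading the apparent season-to-season difference with caution.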
To explore the deal outcomes more, we were interested in seeing if there were certain trends or patterns in the companies specifically that ended up getting a deal. Ultimately, identifying any patterns could help future companies that are looking to secure a deal on Shark Tank. For the variables specific to the companies, we chose the company’s valuation/amount of money asked for and the company’s location to determine outcomes.
First, we explored the distributions of the amount of money asked for based on outcome so that we would better understand any differences between the two groups.
We look at the distribution of the total money asked for by the company through these two histograms. Since the majority of the amounts asked for fall within $1,000,000, we plotted the histograms with a bin width of $50,000. The two histograms are very similar in shape. The green line marking the median, at around $15,000 in both histograms, shows that the median was also very similar between the two groups. From these distributions, there does not seem to be a pattern associated with getting a deal.
Previously, we saw no pattern between getting a deal and the money asked for by the company. However, the total money asked for is not enough on its own to explain the deal outcome, because companies can offer different shares of the company in exchange for the same amount of money. Therefore, we took into account another variable, "Money Asked for 1% Share", to give us more insight into a possible trend. In the scatter plot below, we plotted the square root of the money asked for per 1% share against the square root of the total money asked for, grouped by success in getting a deal. The square root transformation is applied to both axes to reduce the influence of outliers and to focus on the bulk of the data.
From the scatter plot, we can see no particular difference between the two groups in the relationship between the square root of the money asked for per 1% share and the square root of the total money asked for. Even the regression lines of the two groups are very similar in slope and y-intercept.
To formally test whether the money asked for is related to getting a deal, we ran a two-sample t-test at a significance level of α = 0.05.
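The per-share variable can be sketched as follows. This assumes that "Money Asked for 1% Share" is the amount asked divided by the percentage of equity offered; the function name, the tuple layout, and the example pitches are hypothetical and not taken from the dataset.

```python
from math import sqrt

def ask_per_percent(asked_for, stake_percent):
    """Dollars requested per 1% of equity offered (assumed definition)."""
    if stake_percent <= 0:
        raise ValueError("stake_percent must be positive")
    return asked_for / stake_percent

# Hypothetical pitches: (amount asked in USD, equity offered in percent).
pitches = [(100_000, 10), (100_000, 25), (1_000_000, 5)]

for asked_for, stake in pitches:
    per_percent = ask_per_percent(asked_for, stake)
    implied_valuation = per_percent * 100  # value of 100% of the company
    # Square roots compress large values, so outliers dominate a plot less.
    print(f"ask={asked_for:>9,}  stake={stake:>2}%  "
          f"per 1%={per_percent:>9,.0f}  valuation={implied_valuation:>11,.0f}  "
          f"sqrt(ask)={sqrt(asked_for):7.1f}  sqrt(per 1%)={sqrt(per_percent):6.1f}")
```

The first two toy pitches ask for the same total but price the company very differently, which is the distinction the per-share variable is meant to capture.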
For the following t-test, we used the null and alternative hypotheses stated below:
H0: the mean amount of money asked for is the same for companies that got a deal and companies that did not
HA: the mean amount of money asked for differs between companies that got a deal and companies that did not
##
## Welch Two Sample t-test
##
## data: askedfor by deal
## t = 1.4638, df = 479.31, p-value = 0.1439
## alternative hypothesis: true difference in means between group false and group true is not equal to 0
## 95 percent confidence interval:
## -20812.46 142407.98
## sample estimates:
## mean in group false mean in group true
## 289319.7 228521.9
With a p-value of 0.1439, which is greater than 0.05, we failed to reject the null hypothesis. The t-test once again showed no significant difference in the amount asked for between companies that got a deal and those that did not.
Since the amount of money asked for was inconclusive, we decided to explore the companies' locations to see if the state they are from had any impact on the deal outcomes. In order to do so, we visualized the percentage of successful pitches on a map of the states.
Based on the map, certain states have a higher percentage of deals. However, some of these states have only a couple of companies that have appeared on Shark Tank, so for those states, a 100% deal rate is not representative of possible future outcomes.
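For readers who want to see what the Welch test computes, the sketch below implements the t statistic and the Welch-Satterthwaite degrees of freedom in plain Python. The two samples are hypothetical amounts, not the `askedfor` column from the report, and the sketch stops short of computing a p-value.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group's mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df

# Hypothetical amounts asked (USD); NOT the real data.
no_deal = [100_000, 200_000, 300_000, 400_000]
deal = [50_000, 150_000, 250_000]

t_stat, dof = welch_t(no_deal, deal)
print(f"t = {t_stat:.4f}, df = {dof:.2f}")
```

As a sanity check on the reported output, the 95% confidence interval is centered on the difference between the two group means (289319.7 - 228521.9 ≈ 60798) and it contains 0, which agrees with a p-value above 0.05.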
Therefore, based on these graphs and their conclusions, it can be determined that there are no clear determinants of getting a deal, whether it be monetarily or by location.
In our analysis, we explored the different companies that appeared on Shark Tank and the relationships between different factors in determining whether or not a company gets a deal.
We first explored the categories and valuations, and found that although certain categories appear more often, there are no correlations between the categories and company valuations. We then evaluated the trends between the categories, and found that certain categories may be more represented in certain seasons, but the category itself does not seem to affect the deal outcome significantly. Finally, we explored the possible factors of a company that influence the deal outcome, and found that the amount of money asked for and the location of the company also have no significant influence on the deal outcome.
From our analyses, we found that although there are some notable exceptions and outliers, there were no clear trends or relationships between most of the Shark Tank pitches and their respective outcomes. Based on these variables alone, we found no clear determinant of whether or not a company gets a deal; factors we did not measure, such as the quality of the pitch and/or product, may matter more.
We would have been able to further these findings with some additional variables, such as company popularity, age of company, which shark invested where, etc. These additions would have allowed us to determine if there really were other factors that influence the outcome of a Shark Tank episode. To add to our future work, we could also explore preferences more specifically with one or all of the Sharks. It would be interesting to see if certain sharks favor certain companies/categories with specific traits.