Our project uses data from the American Kennel Club (AKC), which we
obtained from TidyTuesday. The datasets contain information on various dog
breeds, including physical characteristics,
temperament, and popularity based on AKC registration statistics
from 2013-2020. The data comprises three datasets:
dog_ranks, dog_traits, and akc_data.
dog_ranks: contains 195 rows. Each row represents the corresponding breed’s popularity ranking out of 190 breeds based on registration statistics with AKC from 2013-2020. There are 10 variables.
Breed: the AKC-registered dog breed
2013 Rank: the breed’s ranking in 2013
2014 Rank: the breed’s ranking in 2014
2015 Rank: the breed’s ranking in 2015
2016 Rank: the breed’s ranking in 2016
2017 Rank: the breed’s ranking in 2017
2018 Rank: the breed’s ranking in 2018
2019 Rank: the breed’s ranking in 2019
2020 Rank: the breed’s ranking in 2020
group: the breed’s group classification.
dog_traits: contains 195 rows. Each row represents the corresponding breed’s trait score on a 1 through 5 scale, with 1 being the lowest and 5 being the highest. There are 17 variables.
Breed: the AKC-registered dog breed
Affectionate With Family: the breed’s score corresponding to family affection.
Good With Young Children: the breed’s score corresponding to being good with young children.
Good With Other Dogs: the breed’s score corresponding to how well they get along with other dogs.
Shedding Level: the breed’s score corresponding to how much the breed sheds.
Coat Grooming Frequency: the breed’s score corresponding to how often they need to be groomed.
Drooling Level: the breed’s score corresponding to how much the breed drools.
Coat Type: the breed’s type of coat.
Coat Length: the breed’s coat length, designated by either short, medium, or long.
Openness To Strangers: the breed’s score corresponding to how open they are with strangers.
Playfulness Level: the breed’s score corresponding to how playful the breed is.
Watchdog/Protective Nature: the breed’s score corresponding to how protective they are.
Adaptability Level: the breed’s score corresponding to how adaptive they are.
Trainability Level: the breed’s score corresponding to how trainable they are.
Energy Level: the breed’s score corresponding to how energetic they are.
Bark Level: the breed’s score corresponding to how much they bark.
Mental Stimulation Needs: the breed’s score corresponding to how much mental stimulation they need.
akc_data: contains 278 rows. Each row represents a breed, with trait scores on a 0 through 1 scale alongside character variables describing qualitative traits. There are 20 variables, 4 of which are quantitative.
description: 1 to 3 paragraphs describing the breed
temperament: breed temperament described in keywords
popularity: popularity ranking (1-195)
min_height: minimum height in cm
max_height: maximum height in cm
min_weight: minimum weight in kg
max_weight: maximum weight in kg
min_expectancy: minimum life expectancy in years
max_expectancy: maximum life expectancy in years
group: one of 9 breed groups assigned by the AKC (7 main groups and 2 extra)
grooming_frequency_value: a number representing the level of required grooming
grooming_frequency_category: a categorization of grooming requirements
shedding_value: a number representing the level of shedding
shedding_category: a categorization of shedding frequency
energy_level_value: a number representing the breed’s energy level
energy_level_category: a categorization of energy level
trainability_value: a number representing the breed’s trainability
trainability_category: a categorization of trainability
demeanor_value: a number representing the breed’s reaction to strangers and other pets
demeanor_category: a categorization of reaction to strangers and other pets
The main research questions of the project concern dog breed popularity trends in America from 2013 to 2020. We examined several breed factors, including size, temperament, suitability for different living environments, and ease of maintenance. We wanted to explore relationships among breed traits and compatibility. We also wanted to see which breed characteristics have remained popular over time and how they relate to the suitability and preferences of different types of prospective dog owners.
We wanted to learn more about what kinds of breeds are popular, so we looked at the distribution of breed groups.
The above graph suggests that the distribution of group popularity remained relatively unchanged from 2013 to 2020. Working, Sporting, and Toy dog breeds remained the most popular, while the Hound, Terrier, and Herding groups remained the least popular. As a result, there do not appear to be significant shifts in breed type preferences among dog owners during the time period. Stability in the popularity of certain dog breed groups may be attributed to breed characteristics, which we further examine below.
We also wanted to look at the stability of the top-ranked dog breeds, so we examined the 2013-2020 rankings of the most consistently top-ranked breeds.
The above time-series plot shows that the three most popular dog breeds, Labrador Retrievers, German Shepherds, and Golden Retrievers, maintained consistent popularity from 2013 to 2019. French Bulldogs surged in popularity, rising from 11th in 2013 to 2nd in 2020. Overall, the top three breeds remained relatively consistent until 2019, while the fourth through eighth most popular breeds fluctuated across the period.
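One way to put a number on the kind of movement described above is to compare a breed's first and last recorded rank. The following is only a minimal Python sketch, not the code behind the plot: the function names are hypothetical, and the only values used are the two French Bulldog endpoints quoted in the text.

```python
def rank_change(ranks_by_year):
    """Return first-year rank minus last-year rank.

    A positive result means the breed moved up the ranking
    (a smaller rank number is more popular).
    """
    years = sorted(ranks_by_year)
    return ranks_by_year[years[0]] - ranks_by_year[years[-1]]


def rank_spread(ranks_by_year):
    """Return the gap between a breed's worst and best rank over the period."""
    values = list(ranks_by_year.values())
    return max(values) - min(values)


# Only the two endpoints stated in the text; intermediate years are omitted.
french_bulldog = {2013: 11, 2020: 2}
print(rank_change(french_bulldog))  # 9 places gained
```

A small spread indicates a stable breed, while a large positive change flags a breed that climbed over the period.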
Among the eight most consistently popular breeds, we wanted to find out which personality characteristics were most common, so we examined the trait descriptions relating to dog temperament. These eight breeds lacked characteristics regarding independence and strength.
The above word cloud suggests friendliness and intelligence are the two most common characteristics across the eight most popular breeds from 2013 to 2020. Other important traits include activeness, confidence, and courageousness, suggesting that high energy levels and protective instincts are also characteristic of the popular breeds.
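A word cloud is driven by keyword frequencies, so the same information can be read off a simple count. The sketch below is illustrative only: it assumes the temperament field holds comma-separated keywords, the function name is hypothetical, and the sample strings are made-up placeholders rather than values from akc_data.

```python
from collections import Counter


def keyword_counts(temperaments):
    """Count how often each keyword appears across temperament strings.

    Assumes each string is a comma-separated list of keywords.
    Keywords are lower-cased and stripped so that "Friendly" and
    " friendly" are counted together.
    """
    counts = Counter()
    for text in temperaments:
        for word in text.split(","):
            word = word.strip().lower()
            if word:
                counts[word] += 1
    return counts


# Placeholder input, not real rows from the dataset.
sample = ["Friendly, Smart", "friendly, Courageous", "Smart, Confident"]
print(keyword_counts(sample).most_common(2))
```

The most frequent keywords in the counter correspond to the largest words in the cloud.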
Dog breeds vary in their trainability, their affection towards their owner family, and their behavior towards children. We wanted to examine these three variables together for the 50 most popular breeds.
In this graph, we observe the counts of the Affectionate With Family trait by the Trainability level of each dog breed. It is clear that most dog breeds have a Trainability level of 4 or higher. Each bar is colored to reflect the Good With Young Children variable. While few breeds have a rating of 4 for Good With Young Children, the remaining breeds are split between ratings of 3 and 5 for this trait. An ideal dog for a family would score a 5 in each of these categories.
We were interested in learning about common physical characteristics among highly ranked breeds. To answer this question, we focused on height, weight, and coat. We started by splitting the overall data set into the top 50% of breeds by rank (popular) and the remainder (non-popular), and compared the density distributions of minimum heights and weights between the two groups.
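The split described above can be sketched as follows. This is a hedged illustration rather than the report's actual code: it assumes the cutoff is the median popularity rank, that breeds with a missing rank are dropped, and that each breed is represented as a dictionary; the function name and the sample records are hypothetical.

```python
from statistics import median


def split_by_popularity(breeds):
    """Split breed records into (popular, non_popular) at the median rank.

    Each record is a dict with a numeric "popularity" rank (1 = most popular).
    Records whose rank is missing (None) are skipped.
    """
    ranked = [b for b in breeds if b.get("popularity") is not None]
    cutoff = median(b["popularity"] for b in ranked)
    popular = [b for b in ranked if b["popularity"] <= cutoff]
    non_popular = [b for b in ranked if b["popularity"] > cutoff]
    return popular, non_popular


# Placeholder records, not real rows from akc_data.
breeds = [
    {"name": "A", "popularity": 1},
    {"name": "B", "popularity": 2},
    {"name": "C", "popularity": 3},
    {"name": "D", "popularity": 4},
    {"name": "E", "popularity": None},
]
popular, non_popular = split_by_popularity(breeds)
print([b["name"] for b in popular], [b["name"] for b in non_popular])
```

With an even number of ranked breeds the two groups have equal size; with an odd number the breed at the median falls in the popular group.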
Upon initial inspection, the density curves of minimum weight for both popularity groups show two peaks; however, the curve for non-popular breeds shows smoother peaks. We also notice that the curve for popular breeds has a higher density at its two peaks than the non-popular curve; this suggests a potential difference between the distributions of minimum weights. To compare the two distributions, we ran a two-sample Kolmogorov-Smirnov test to determine whether there was a statistically significant difference between them.
##
## Exact two-sample Kolmogorov-Smirnov test
##
## data: all.pop.dogs$min_weight and all.nonpop.dogs$min_weight
## D = 0.090573, p-value = 0.6831
## alternative hypothesis: two-sided
Based on the output above and our p-value of 0.683, we fail to reject the null hypothesis that the distributions of minimum weights for popular and non-popular breeds are equal. We therefore find no evidence of a significant difference between the minimum weights of the two groups, which suggests that dog owners do not have a strong preference for lighter or heavier breeds. We note that this differs from our initial observation based on the differences between the curves. This contrasting result may be due to the limited power of the KS test; we continue our analysis with this in mind.
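The D value in the output is the largest vertical gap between the two empirical cumulative distribution functions. The reported result comes from R; the following is only a minimal Python sketch of how that statistic is defined, using a hypothetical function name and placeholder numbers rather than the breed weights. It does not compute the p-value.

```python
from bisect import bisect_right


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic D.

    D is the maximum absolute difference between the empirical CDFs
    of the two samples. Because both CDFs are step functions that only
    change at observed values, it is enough to check those values.
    """
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    d = 0.0
    for v in xs + ys:
        fx = bisect_right(xs, v) / n  # share of x values <= v
        fy = bisect_right(ys, v) / m  # share of y values <= v
        d = max(d, abs(fx - fy))
    return d


# Placeholder samples, not the breed weights from the report.
print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```

A D close to 0 means the two samples have nearly identical empirical distributions, which is consistent with the small D and large p-value reported above.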
Similarly, we repeated this process to investigate differences in minimum heights.
##
## Exact two-sample Kolmogorov-Smirnov test
##
## data: all.pop.dogs$min_height and all.nonpop.dogs$min_height
## D = 0.10642, p-value = 0.4975
## alternative hypothesis: two-sided
The plot above shows a clear difference in the peaks between popular and non-popular dogs. However, after running another Kolmogorov-Smirnov test, we found a p-value of 0.4975. We fail to reject the null hypothesis and conclude that the distributions of minimum heights for popular and non-popular dog breeds are not significantly different.
Based on these analyses, we found no evidence that dog owners prefer smaller or larger dog breeds.
Continuing our analysis, we were interested to see whether dog owners prefer dogs with fewer grooming needs and potential allergens. To investigate this, we created a bar graph of shedding frequencies for the top 8 ranked dog breeds, conditioned on their grooming frequencies.
Based on this graph, we see that the shedding frequencies generally range from 0.6-0.8 and the most common shedding frequency is 0.4. There is a notable outlier in Beagles, which have the lowest shedding frequency of 0.2; however, this breed also has the highest grooming frequency, which may explain the lower shedding frequency. From this graph, shedding and grooming maintenance do not appear to be significant factors in dog popularity. Dog owners do not seem to prefer or reject dog breeds with higher shedding and grooming needs; therefore, potential allergies do not seem to be an influential factor when choosing a dog breed.
Through our analysis of dog breed popularity, temperament, and physical characteristics, we present the following findings:
Overall, from 2013 to 2020, the distribution of popular dog breeds remained relatively stable. Labrador Retrievers, German Shepherds, and Golden Retrievers were the top three breeds for seven consecutive years. These breeds share friendliness, intelligence, and a protective nature, while grooming needs and size showed minimal influence on popularity. One limitation of our research is that the AKC does not recognize mixed breeds, such as goldendoodles and labradoodles, which rose in popularity during the 2010s.
For future work, with a more recent dataset that includes popularity information for the years after 2020, we could analyze dog popularity trends and characteristics before and after COVID. According to the American Veterinary Medical Association, dog-owning households grew from 38% in 2016 to roughly 45% in 2020-2022. Because individuals and families were spending more time at home and may have had more time to dedicate to their pets, the personalities and physical traits of the breeds that were adopted, fostered, or owned during COVID may have changed, and with them the trends in popular dog breeds.