The “Marketing Analytics Dataset” is a Kaggle dataset containing the profiles of 2,240 customers of a company. It includes information such as customer birth year, education level, marital status, income, location, buying patterns, and purchasing preferences. We manually added two columns, age category and total number of purchases, to make our visualizations clearer.
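As a rough illustration of how the two derived columns could be computed, here is a minimal Python sketch. The field names (`birth_year`, `web_purchases`, and so on), the reference year used to compute age, and the choice to count deal purchases in the total are all assumptions made for this example and may not match how the columns were actually built.

```python
# Hypothetical field names; the dataset's actual headers may differ.
PURCHASE_COLUMNS = ["web_purchases", "catalog_purchases",
                    "store_purchases", "deal_purchases"]

# Lower bound of each age group used in this report, checked from oldest down.
AGE_GROUPS = [(65, "65+"), (55, "55-64"), (45, "45-54"),
              (35, "35-44"), (25, "25-34")]


def age_group(birth_year, reference_year):
    """Return the age-group label, or None if the age is below 25."""
    age = reference_year - birth_year
    for lower_bound, label in AGE_GROUPS:
        if age >= lower_bound:
            return label
    return None


def add_derived_columns(customer, reference_year):
    """Return a copy of a customer record with the two derived fields added."""
    enriched = dict(customer)
    enriched["age_group"] = age_group(customer["birth_year"], reference_year)
    enriched["total_purchases"] = sum(customer[col] for col in PURCHASE_COLUMNS)
    return enriched


# Made-up record, for illustration only.
example = {"birth_year": 1970, "web_purchases": 4, "catalog_purchases": 2,
           "store_purchases": 6, "deal_purchases": 1}
# The reference year is an assumption, not a value taken from the dataset.
result = add_derived_columns(example, reference_year=2020)
print(result["age_group"], result["total_purchases"])
```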
Using our dataset, we would like to answer three main questions:
Is there a relationship between the number of purchases and customer age? Are there certain types of purchases that are more prevalent depending on customer age?
Does the amount of money spent on different types of products depend on a household’s income?
How are purchases represented by country?
The first research question concerns itself with customer age and purchases, so we explored income, total number of purchases, and types of purchases made by five age groups: “25-34”, “35-44”, “45-54”, “55-64”, “65+”.
We first use a density plot to compare the distribution of income of each age group.
In the density plot, we can see that income ranges from 0 to 120K for all age groups. The income distributions of the oldest age groups (55-64 and 65+) are unimodal and more symmetric than those of the other age groups, which are bimodal.
Since the older age groups tend to be unimodal while the younger age groups tend to be bimodal, we can infer that older age groups have more cases of individuals with medium income while younger age groups have more extreme cases of individuals with low income and individuals with high income.
Now we use a contour plot to see if there is a difference in total number of purchases of each age group as a result of distributional differences in income.
From the contour plot, we can see that there are two modes: one mode representing individuals with low income and low number of total purchases, and the other mode representing individuals with high income and high number of total purchases. However, the plot does not show much difference between the age groups since all age groups are in both modes.
Finally, we use a stacked bar plot to see if certain types of purchases are more prevalent among certain age groups.
From the stacked bar plot, we can see that all age groups had a higher proportion of store purchases than any other type of purchase. All age groups also had a considerable proportion of web purchases, and the combined proportion of deal and catalog purchases made up roughly a third of each group’s total purchases. The main takeaway from this plot is that the age groups have similar proportions of each type of purchase.
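The proportions shown in a stacked bar plot like this one can be computed by dividing each purchase type's count by the group's total. The sketch below uses made-up counts for a single hypothetical age group; none of the numbers come from the dataset.

```python
def purchase_proportions(counts):
    """Convert purchase counts per type into proportions of the total."""
    grand_total = sum(counts.values())
    if grand_total == 0:
        # A group with no purchases has no meaningful proportions.
        return {purchase_type: 0.0 for purchase_type in counts}
    return {purchase_type: count / grand_total
            for purchase_type, count in counts.items()}


# Made-up counts for one age group, for illustration only.
example_counts = {"store": 50, "web": 30, "catalog": 12, "deal": 8}
proportions = purchase_proportions(example_counts)
for purchase_type, share in proportions.items():
    print(f"{purchase_type}: {share:.2f}")
```

Repeating this for each age group and comparing the resulting shares is one way to check numerically whether the groups really do have similar purchase mixes.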
Now, we will explore the relationship between a household’s income and the amount spent on a given type of product in the past two years. We will look at three types of products - daily products (fruits, meat, fish and sweets), wine, and gold. In particular, we want to see if this relationship differs across different types of products.
Let’s first plot the amount spent on each type of product versus a household’s income, with a best-fit linear regression line (in blue) and a local polynomial regression (LOESS) curve (in red) overlaid on the scatterplot.
In each of the three scatterplots above, there appears to be a moderate, positive, non-linear relationship between the amount spent on a given type of product and income. Judging from how much the points vary around the best-fit curve, the relationship appears stronger (less variation) for wine and daily products and weaker for gold.
We can also see that the slopes of the curves for wine and daily products are steeper than the slope for gold. This tells us that higher income is associated with much higher spending on wine and daily products than lower income is, but the difference is small for gold (higher-income households on average do not spend much more on gold than lower-income ones).
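To make the slope comparison concrete, the sketch below computes the ordinary least-squares slope of spending on income for two product types. It covers only the linear fit, not the LOESS curve, and the data points are invented for illustration; they are not values from the dataset.

```python
def least_squares_slope(x, y):
    """Slope of the ordinary least-squares line of y on x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-deviations divided by sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    return sxy / sxx


# Made-up data: income in thousands of dollars, spending in dollars.
income_k = [20, 40, 60, 80]
wine_spent = [50, 250, 450, 650]
gold_spent = [20, 30, 40, 50]

wine_slope = least_squares_slope(income_k, wine_spent)
gold_slope = least_squares_slope(income_k, gold_spent)
print(f"wine slope: {wine_slope:.2f}, gold slope: {gold_slope:.2f}")
```

A larger slope means that each additional thousand dollars of income is associated with a larger increase in spending on that product type.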
To make it easier to compare across income ranges, let’s separate income into three bins: low if household income < $25,000, medium if $25,000 <= household income < $75,000, and high if household income >= $75,000.
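The binning rule can be sketched as a small function. Assigning an income of exactly $75,000 to the high bin is an assumption made here so that every income falls into exactly one bin.

```python
LOW_UPPER_BOUND = 25_000     # incomes below this are "low"
MEDIUM_UPPER_BOUND = 75_000  # incomes below this (and not low) are "medium"


def income_bin(income):
    """Map a household income in dollars to 'low', 'medium', or 'high'."""
    if income < LOW_UPPER_BOUND:
        return "low"
    if income < MEDIUM_UPPER_BOUND:
        return "medium"
    # Assumption: exactly $75,000 is treated as high.
    return "high"


# Made-up incomes, including the two boundary values.
for income in (10_000, 25_000, 74_999, 75_000, 120_000):
    print(income, income_bin(income))
```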
Then, let’s visualize the average amount of money spent on each type of product for different income ranges.
From the bar plot above, we reach a conclusion similar to the earlier one. As income level increases, the average amount spent on all three types of products increases. The increase in the amount spent on daily products and wine, though, is much larger than the increase for gold. To conclude, the amount spent on a given type of product does seem to be associated with a household’s income, and the relationship differs among product types.
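The averages behind a bar plot like this one can be obtained by grouping households by income bin and taking the mean of each spending column. The sketch below is one possible way to do that; the field names and all the numbers are invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def mean_spending_by_group(records, group_key, value_keys):
    """Average each value column within each group.

    Returns {group: {value_key: mean}}.
    """
    grouped = defaultdict(list)
    for record in records:
        grouped[record[group_key]].append(record)
    return {
        group: {key: mean(row[key] for row in rows) for key in value_keys}
        for group, rows in grouped.items()
    }


# Made-up households with hypothetical field names.
households = [
    {"income_bin": "low", "wine": 10, "gold": 5},
    {"income_bin": "low", "wine": 30, "gold": 15},
    {"income_bin": "high", "wine": 600, "gold": 40},
    {"income_bin": "high", "wine": 800, "gold": 60},
]

averages = mean_spending_by_group(households, "income_bin", ["wine", "gold"])
for group, values in averages.items():
    print(group, values)
```

In this invented example, the gap between the high and low bins is far larger for wine than for gold, which is the kind of pattern described above.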
The map above shows the relationship between average amount of money spent on various types of products (meat, fish, gold, sweets, wine, and fruit) and country of consumer. The points are colored by country as well as positioned on the map in the center of the country they represent and sized by the average total household amount spent for that given country.
We can see from the map above that all seven countries in the dataset appear to have similar average household spending. After examining the averages themselves, we see that the difference between the highest and lowest country averages is only about $100. The main conclusion we can draw from the map is that there does not appear to be a relationship between the consumer's country and the average amount spent on the products listed above.
We can further explore this relationship with another visualization: side-by-side boxplots.
Above we see the side-by-side boxplots of total amount spent on various products. Simply observing the boxes, without performing a statistical test, we can say that the distributions look similar across countries, since most of them have similar medians and spreads. No distribution falls outside the range of any other distribution. Therefore, it is reasonable to assume that there is no association between average total amount spent per household and country.
Above, we examined the relationship between several demographic variables (age, household income, and country) and market-related variables (number of purchases of different types, amount spent on different product types and total amount spent). After visualizing these relationships, we discovered that household income seems to be associated with amount spent on a given type of product, which makes sense as these two variables are both directly related to money.
On the other hand, although one might expect the number of purchases of different types to be associated with age, such a relationship does not seem to exist, since the proportions of catalog, deal, web, and store purchases do not seem to differ across age groups. Also, households in different countries seem to spend similar amounts of money on products.