36-315 Final Project, Fall 2023

library(tidyverse)


library(googlesheets4)


personality <- read_sheet("https://docs.google.com/spreadsheets/d/1lQySD0DcAxUV78sUHdypIGnD13pOLs9Dw-Hu2OW9b94/edit?usp=sharing")

## ✔ Reading from "marketing_campaign".
## ✔ Range 'marketing_campaign'.

Introduction

This analysis uses the Customer Personality Analysis dataset, which provides a comprehensive overview of customer characteristics and behavior for a company's marketing analysis. The dataset includes demographic information (age, education, and marital status), financial data (income), family situation (number of children and teenagers at home), purchasing behavior (amount spent on various product categories and number of purchases through each channel), online engagement (website visits), and responses to marketing campaigns.

Our study focuses on the relationship between three customer characteristics and purchasing behavior:

Relationship between Family Structure and Product Purchases.
Relationship between Education Level and Purchase Channel (web, catalog, or store).
Relationship between Income Level and All Purchases.

Problem 1: Impact of Family Structure on Product Purchasing Behavior

(i) For this problem, we want to learn how different family structures (Single, Together, Married, Divorced) affect spending in three product categories (wine, gold, and meat). To do this, we restrict the Marital_Status variable to these four levels and compare MntWines, MntGoldProds, and MntMeatProducts across them using boxplots.

library(gridExtra)


family_data <- personality %>% dplyr::select(Marital_Status, MntWines, MntGoldProds, MntMeatProducts)

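# Keep only the four marital statuses of interest; %in% tests set membership,
# whereas == would recycle the vector and silently drop most matching rows
family_data <- family_data %>% filter(Marital_Status %in% c("Single", "Together", "Married", "Divorced"))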

wines <- ggplot(data = family_data, aes(x = MntWines, y = Marital_Status)) + geom_boxplot(alpha = 0.2, aes(fill = Marital_Status)) + labs(title = "Marital Status vs Amount of Wines Boxplot", x = "Amount of Wine", y = "Marital Status")

gold <- ggplot(data = family_data, aes(x = MntGoldProds, y = Marital_Status)) + geom_boxplot(alpha = 0.2, aes(fill = Marital_Status)) + labs(title = "Marital Status vs Amount of Gold Boxplot", x = "Amount of Gold",y = "Marital Status")

meat <- ggplot(data = family_data, aes(x = MntMeatProducts, y = Marital_Status)) + geom_boxplot(alpha = 0.2, aes(fill = Marital_Status)) + labs(title = "Marital Status vs Amount of Meat Boxplot", x = "Amount of Meat", y = "Marital Status")

grid.arrange(wines, gold, meat, ncol = 1)
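When subsetting rows to a set of labels, `==` and `%in%` behave very differently. A minimal base-R sketch on a hypothetical toy vector (not the real Marital_Status column): with `==`, R recycles the four labels position by position, so a row survives only if it happens to match the label in its recycled slot; `%in%` instead checks membership in the whole set.

```r
# Hypothetical toy vector standing in for Marital_Status (not the real data).
status <- c("Single", "Married", "Together", "Divorced",
            "Married", "Single", "Widow", "Together")
keep   <- c("Single", "Together", "Married", "Divorced")

# `==` recycles `keep`: element 1 is compared with "Single", 2 with "Together", ...
by_recycling  <- status[status == keep]
# `%in%` asks whether each element belongs to the set `keep`.
by_membership <- status[status %in% keep]

print(by_recycling)   # only the rows that line up with their recycled label
print(by_membership)  # every row except "Widow"
```

In this toy case the recycled comparison keeps only 2 of the 7 matching rows, which is why set membership is the safer choice for this kind of filter.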

Analysis:

The box plots above compare how spending in three product categories varies across family structures. We chose Wine, Gold, and Meat because spending on these categories seemed the most likely to differ by marital status.

Looking at the first set of box plots (Wine), the “Divorced” category has the highest median spending, followed by “Together”, “Married”, and “Single”. “Divorced” also shows the widest range, signifying greater variability in spending on wine.

The second set of box plots (Gold) suggests that the “Divorced” and “Together” categories spend the most on gold, given their higher medians. As with wine, the “Divorced” category has the widest range, indicating more variability in spending. “Together” and “Married” have noticeably more outliers than the other two categories, which may suggest that some individuals in these groups spend far more on gold than is typical. The IQR is relatively similar across all marital statuses, which may indicate similar spending habits for the middle half of each group.

In the third set of box plots (Meat), the “Together” category shows the highest median, followed by “Divorced”, “Single”, and “Married”. The “Single” category has the widest range, meaning meat spending varies most among single customers; this could reflect differences in income, although we have not examined income here. In contrast, the “Divorced” category has a tighter distribution, indicating more consistent spending on meat.

Overall, spending by “Together” and “Divorced” customers tends to be higher than spending by “Single” and “Married” customers.
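The comparisons above rest on two boxplot summaries: the median (the line inside each box) and the interquartile range (the box itself), with points more than 1.5 × IQR beyond the box drawn as outliers. A small base-R sketch computing these per group with `tapply()`, on hypothetical toy amounts; ggplot2 draws its boxes from sample quantiles, so these values should correspond to the boxes up to the quantile definition used.

```r
# Hypothetical amounts spent, grouped by a toy marital-status label.
spend  <- c(10, 20, 30, 40, 100,  5, 15, 25, 35, 45)
status <- rep(c("Divorced", "Single"), each = 5)

# Centre line of each box = group median; box height = IQR.
group_median <- tapply(spend, status, median)
group_iqr    <- tapply(spend, status, IQR)
print(group_median)
print(group_iqr)

# Points beyond 1.5 * IQR from the box edges are drawn individually as outliers.
div <- spend[status == "Divorced"]
q   <- quantile(div, c(0.25, 0.75))
outliers <- div[div > q[2] + 1.5 * IQR(div) | div < q[1] - 1.5 * IQR(div)]
print(outliers)
```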

(ii) We also want to explore whether having children (kids or teenagers) at home affects spending on sweets. To do this, we first create a new variable, total_child, that counts the total number of children by adding Kidhome and Teenhome, and then compare it with MntSweetProducts using a bar plot of group means with Bonferroni-corrected confidence intervals.

personality <- personality %>% mutate(total_child = Kidhome + Teenhome)

personality_summary <- personality %>%
  group_by(total_child) %>%
  summarize(
    mean = mean(MntSweetProducts, na.rm = TRUE),
    se = sd(MntSweetProducts, na.rm = TRUE) / sqrt(n())
  )

# Two-sided Bonferroni multiplier: split alpha = 0.05 across the k intervals shown (one per group)
k <- nrow(personality_summary)
z <- qnorm(1 - 0.05 / (2 * k))

ggplot(data = personality_summary, aes(x = as.factor(total_child), y = mean)) +
  geom_col(fill = "pink", color = "skyblue") +
  geom_errorbar(
    aes(ymin = mean - z * se, ymax = mean + z * se),
    width = 0.5  # width of the error bars
  ) +
  labs(title = "Mean Amount Spent on Sweet Products vs. Total Children",
       x = "Total Number of Children", y = "Mean Amount Spent on Sweet Products")
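A minimal base-R sketch of Bonferroni-corrected normal intervals for group means, on hypothetical toy data. With k intervals and overall level alpha, each interval uses the two-sided multiplier `qnorm(1 - alpha / (2 * k))`. Treating k as the number of groups is an assumption here; using the number of pairwise comparisons instead gives wider intervals.

```r
# Simultaneous Bonferroni-corrected normal intervals for group means (base R only).
# Assumes no missing values in y or group.
bonferroni_ci <- function(y, group, alpha = 0.05) {
  groups <- sort(unique(group))
  k <- length(groups)                    # number of intervals drawn at once
  z <- qnorm(1 - alpha / (2 * k))        # two-sided critical value with alpha split k ways
  means <- sapply(groups, function(g) mean(y[group == g]))
  ses   <- sapply(groups, function(g) sd(y[group == g]) / sqrt(sum(group == g)))
  data.frame(group = groups, mean = means, se = ses,
             lower = means - z * ses, upper = means + z * ses)
}

# Hypothetical toy data: amount spent on sweets for families with 0, 1, or 2 children.
sweets   <- c(10, 12, 14, 5, 6, 7, 1, 2, 3)
children <- rep(0:2, each = 3)
ci <- bonferroni_ci(sweets, children)
print(ci)

# Non-overlapping intervals suggest a difference; overlapping ones are inconclusive.
ci$lower[1] > ci$upper[2]
```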

Analysis:

Based on the bar graph above, families without children spend the most on sweet products on average, followed by families with 1 child, 2 children, and 3 children. The intervals suggest statistically significant differences in mean sweets spending among families with 0, 1, and 2 children. However, the intervals for families with 2 and 3 children overlap, so we cannot conclude that these two groups differ.

This result seems counter-intuitive at first, since we might expect more children to mean more sweets purchases. One possible explanation is that parents limit sweets to protect their children's health, which would reduce sweets purchases.

Q1 Conclusion:

To conclude the first question, the box plots offer insight into how family structure relates to spending across product categories. Marital status appears to play a role in spending patterns: the plots show clear differences in spending on Wine, Gold, and Meat across the ‘Divorced,’ ‘Together,’ ‘Married,’ and ‘Single’ categories.

From the visualizations and analysis above, we can summarize the patterns for each marital status (‘Divorced,’ ‘Together,’ ‘Married,’ and ‘Single’) as follows:

Divorced: Individuals tend to spend more on wine and gold, with a wide range, indicating diverse purchasing behavior within this group.
Married: Within this group, wine accounts for the most spending, and there are a substantial number of outliers in gold spending, which suggests that part of this group may be more engaged with luxury goods.
Single: Compared to all other groups, this group shows low spending on gold and wine, with wine spending being the highest of its categories. There are fewer outliers in general, suggesting more consistent spending behavior.
Together: Individuals in this category spend more on meat than other groups do, suggesting more interest in this category. This group also has a wide range for all products, suggesting great variability.

Across all marital statuses, wine has the highest median spending, suggesting it is the most valued of the three categories. Spending on the other two categories, meat and gold, differs among the groups, indicating diverse spending habits.

When considering the number of children, the data show that average spending on sweet products decreases as the number of children increases. This could reflect different priorities across family sizes. Because the differences between families with 0, 1, and 2 children are statistically significant, they are unlikely to be due to chance alone, which points to a consistent pattern.

As a result, it is important to consider marital status and family size when analyzing consumer behavior. By leveraging this information, businesses can better understand consumer trends and tailor their marketing strategies to customers' needs.

Problem 2: Impact of Education Level on Digital vs Physical Store Purchases

(i) For this problem, we want to learn whether education level affects how people purchase (online or offline). To do so, we create an MDS plot, since MDS plots show similarities and differences between observations' behavior in a low-dimensional space. We select the purchase-channel variables, scale each by its standard deviation to create place_quant_scaled, and compute the Euclidean distance matrix place_dist from place_quant_scaled. Finally, we add the two MDS coordinates (mds1 and mds2) to the personality dataset and plot them in a scatterplot colored by Education.

place_quant <- personality %>% dplyr::select(NumWebPurchases, NumWebVisitsMonth, NumCatalogPurchases, NumStorePurchases)
place_quant_scaled <- place_quant %>% scale(center = FALSE, scale = apply(.,2, sd, na.rm = TRUE))
place_dist <- dist(place_quant_scaled)
place_MDS <- cmdscale(d = place_dist, k = 2)
personality <- personality %>% mutate(mds1 = place_MDS[,1], mds2 = place_MDS[,2])

ggplot(data = personality, aes(x = mds1, y = mds2)) + geom_point(alpha =0.4, aes(color = Education)) + geom_density2d() + labs(title ="MDS plot colored by Education")
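The MDS steps described above can be sketched on a hypothetical two-column toy matrix standing in for the four purchase-channel columns. Because this toy input is genuinely two-dimensional, classical MDS with k = 2 should reproduce every pairwise distance, which makes a convenient sanity check; with four real columns the 2-D layout only approximates the distances.

```r
# Hypothetical 2-column stand-in for the purchase-count variables.
toy <- cbind(web = c(1, 3, 4, 6, 8, 2), store = c(5, 2, 7, 3, 6, 9))

# Same preprocessing as above: divide each column by its SD without centering.
toy_scaled <- scale(toy, center = FALSE, scale = apply(toy, 2, sd))
toy_dist   <- dist(toy_scaled)            # Euclidean distances between rows
toy_mds    <- cmdscale(toy_dist, k = 2)   # classical (Torgerson) MDS in 2 dimensions

# With 2-D input, the 2-D MDS layout reproduces the original distances.
all.equal(as.vector(dist(toy_mds)), as.vector(toy_dist))
```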

Analysis:

At first glance, the data points are spread out across the plot, but closer examination shows a dense cluster at the center. There is no clear separation among education levels: points from all education levels are mixed and overlap at the center. The plot therefore suggests that purchasing patterns do not differ noticeably across education levels.

(ii) We can also check whether there are relationships between pairs of the four quantitative variables (NumWebPurchases, NumWebVisitsMonth, NumCatalogPurchases, NumStorePurchases), broken down by Education, using a pairs plot.

library(GGally)


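# Take the plotted columns and the grouping variable from the same data frame
ggpairs(data = personality,
        columns = c("NumWebPurchases", "NumWebVisitsMonth", "NumCatalogPurchases", "NumStorePurchases"),
        mapping = aes(color = Education, alpha = 0.2),
        title = "Four-way Pair Plot")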

Analysis:

The four-way pairs plot suggests that there are relationships between some pairs of the four quantitative variables. Starting with NumWebPurchases, the density plots for all five education levels are right-skewed, with Basic having the highest peak, followed by Graduation, Master, PhD, and 2n Cycle. For NumWebVisitsMonth, the density plots are more centered but still right-skewed; Basic again has the highest peak, followed by Master, Graduation, 2n Cycle, and PhD. For NumCatalogPurchases, Basic has the highest peak, while the other categories have similar distributions. For NumStorePurchases, Basic again has the highest peak, and the other categories have almost identical distributions. Taken together, these observations suggest that people at all education levels make relatively few web and catalog purchases, with the Basic group purchasing the least. Web visits per month vary widely at every education level. Store purchases are more variable than web purchases, again with the Basic group purchasing the least.

Turning to the scatterplots and correlation coefficients: most scatterplots show no clear relationship. The exceptions are the scatterplot of NumWebPurchases and NumStorePurchases, which shows a rough positive linear relationship, and the scatterplot of NumStorePurchases and NumWebVisitsMonth, which shows a rough linear trend (the correlation coefficients indicate that this trend is negative). The remaining scatterplots are not clear enough to support conclusions. The correlation coefficients show that:

NumWebVisitsMonth and NumWebPurchases are positively correlated for people with Basic education.
NumCatalogPurchases and NumWebPurchases are positively correlated at all education levels.
NumStorePurchases and NumWebPurchases are positively correlated at all education levels.
NumCatalogPurchases and NumWebVisitsMonth are negatively correlated at all education levels except Basic.
NumStorePurchases and NumWebVisitsMonth are negatively correlated at all education levels.
NumStorePurchases and NumCatalogPurchases are positively correlated at all education levels.

Comparing the coefficients across education levels, the Basic group has a stronger positive correlation between NumStorePurchases and NumWebPurchases, and the 2n Cycle group has a stronger positive correlation between NumStorePurchases and NumCatalogPurchases.
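The per-education correlation coefficients discussed above amount to computing one correlation matrix per group. A minimal base-R sketch on hypothetical toy data (the short column names are illustrative, not the dataset's); ggpairs prints similar per-group Pearson coefficients in its upper panels when a colour is mapped.

```r
# Hypothetical toy data: two purchase counts and an education label.
df <- data.frame(
  web   = c(1, 2, 3, 4, 2, 4, 6, 8),
  store = c(2, 4, 6, 8, 9, 7, 5, 3),
  edu   = rep(c("Basic", "PhD"), each = 4)
)

# One Pearson correlation matrix per education level.
cor_by_edu <- lapply(split(df[, c("web", "store")], df$edu), cor)
print(cor_by_edu)
```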

(iii) NumWebPurchases and NumWebVisitsMonth appear to have the weakest correlation of all pairs, so the pairs plot tells us little about how these two variables relate. To extract more information, we examine their joint distribution directly with a hexagonal-bin heatmap.

ggplot(data = personality, aes(x = NumWebVisitsMonth, y = NumWebPurchases)) + geom_hex() + 
  scale_fill_gradient2(low = "white", mid = "yellow", high = "red", midpoint = 50) + geom_point(size = 1, alpha = 0.2)+ labs(title = "Heatmap of Web Purchase vs. Web Visits")
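A hexbin heatmap colours each cell by how many customers fall into it. The sketch below, on hypothetical toy counts, uses square bins built with `cut()` and `table()` instead of hexagons, which is a simplification of what `geom_hex()` does; the densest cell plays the role of the red region in the plot.

```r
# Hypothetical toy counts (not the real data).
visits    <- c(6, 7, 7, 8, 2, 9, 7, 6, 3, 15)
purchases <- c(1, 2, 1, 0, 5, 2, 1, 1, 9, 0)

# Square bins 5 units wide on each axis; [a,b) intervals because right = FALSE.
vbin  <- cut(visits,    breaks = seq(0, 20, by = 5), right = FALSE)
pbin  <- cut(purchases, breaks = seq(0, 10, by = 5), right = FALSE)
cells <- table(visits = vbin, purchases = pbin)   # customer count per cell
print(cells)

# The densest cell corresponds to the "hottest" colour in the heatmap.
cells["[5,10)", "[0,5)"]
```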

Analysis:

The heatmap above suggests that there is no strong correlation between NumWebVisitsMonth and NumWebPurchases, but there is an identifiable pattern. The yellow regions mainly cover web purchases below 10 and web visits between 0 and 10, while the red (densest) region is centered at close to 0 web purchases and 5 to 10 web visits. This suggests that most customers make few web purchases while visiting the website a moderate number of times per month.

Q2 Conclusion:

According to our analysis and visualizations above, we can make the following conclusions:

From the four-way pairs plot, people with Basic education purchase the least.
There is a positive relationship between store purchases and web purchases. Based on the pairs plot, this relationship is strongest for people with Basic education, followed by 2n Cycle, Master, Graduation, and PhD.
Across all spending behavior variables, the education levels show similar trends.

As a result, education level appears to be associated with digital versus physical store purchases only to a limited extent. We do not have sufficient evidence of a strong relationship between the two, suggesting that education level is not the most important factor affecting spending habits. The analysis above shows relationships among different spending behaviors, but these relationships do not differ substantially by education level. To develop more effective marketing strategies and better understand customer behavior, businesses should look into other factors that may matter more.

Problem 3: Impact of Income Level on Purchases

(i) First, we want to investigate the relationship between Income and number of purchases made with a discount (NumDealsPurchases) using a scatterplot.

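personality %>%
  ggplot(aes(x = Income, y = NumDealsPurchases)) +
  geom_point(alpha = 0.5) +
  # xlim() drops incomes outside [0, 200000] before plotting, so the fitted line also excludes them
  xlim(0, 200000) +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(title = "Scatterplot of Deals Purchase vs. Income")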


We then use a linear regression model to analyze whether there is a linear relationship between the two variables.

summary(lm(personality$NumDealsPurchases ~ personality$Income))

## 
## Call:
## lm(formula = personality$NumDealsPurchases ~ personality$Income)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.6331 -1.2499 -0.4585  0.6624 13.3658 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         2.655e+00  9.386e-02  28.290  < 2e-16 ***
## personality$Income -6.351e-06  1.618e-06  -3.924 8.98e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.917 on 2214 degrees of freedom
##   (24 observations deleted due to missingness)
## Multiple R-squared:  0.006906,   Adjusted R-squared:  0.006457 
## F-statistic:  15.4 on 1 and 2214 DF,  p-value: 8.985e-05

Analysis:

From the graph and the linear regression model, we can see that there is a statistically significant but weak negative relationship between income and the number of deals purchases. As income increases, there may be a slight tendency for individuals to purchase fewer items on deals. The low R-squared value indicates that other factors, not included in this model, likely have a more substantial impact on the number of deals purchases.

The variability in the data suggests that individual or behavioral factors could influence the number of deals purchases more than income does. Therefore, even though the relationship between income and deals purchases is statistically significant, its practical size is small: income explains less than 1% of the variance in deals purchases (R-squared = 0.0069).
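To put the regression output in more concrete terms, the arithmetic below uses only the estimates printed above. It converts the per-dollar slope into a change per $100,000 of income, recomputes the t statistic from the rounded estimate and standard error, and expresses R-squared as a percentage.

```r
# Values copied from the regression output above.
slope <- -6.351e-06   # estimated change in NumDealsPurchases per extra dollar of Income
se    <-  1.618e-06   # its standard error
r2    <-  0.006906    # Multiple R-squared

slope * 100000   # predicted change in deal purchases per $100,000 of income: about -0.64
slope / se       # t statistic: about -3.93 (output shows -3.924, computed from unrounded values)
100 * r2         # percent of variance in NumDealsPurchases explained by Income: about 0.69
```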

(ii) Next, we examine how different income levels relate to purchasing behavior across product categories. After removing records with missing income, we applied PCA to the correlated variables of interest: spending on wine, fruits, meat, fish, sweets, and gold; the number of purchases made via web, store, and catalog; and the number of web visits per month.

personality <- personality %>% filter(!is.na(Income))
personality_quant <- personality %>% dplyr::select(MntWines, MntFruits, MntMeatProducts, MntFishProducts, MntSweetProducts, MntGoldProds, NumWebPurchases,NumWebVisitsMonth, NumStorePurchases, NumCatalogPurchases)

# Center and standardize each variable before PCA (full argument name is scale.)
personality_pca <- prcomp(personality_quant, center = TRUE, scale. = TRUE)
summary(personality_pca)

## Importance of components:
##                           PC1    PC2     PC3     PC4     PC5     PC6     PC7
## Standard deviation     2.2637 1.0871 0.90341 0.79953 0.73764 0.65714 0.62935
## Proportion of Variance 0.5124 0.1182 0.08162 0.06392 0.05441 0.04318 0.03961
## Cumulative Proportion  0.5124 0.6306 0.71221 0.77613 0.83054 0.87373 0.91333
##                            PC8     PC9    PC10
## Standard deviation     0.59339 0.52038 0.49372
## Proportion of Variance 0.03521 0.02708 0.02438
## Cumulative Proportion  0.94854 0.97562 1.00000
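Because the PCA is run on centered and scaled variables, the component variances are the eigenvalues of the correlation matrix, and the "Proportion of Variance" row is each variance divided by their total. A sketch of that check using R's built-in `USArrests` data as a stand-in for personality_quant (which requires access to the Google Sheet).

```r
# Built-in USArrests data stands in for personality_quant here.
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Component variances equal the eigenvalues of the correlation matrix.
eig <- eigen(cor(USArrests))$values
all.equal(pca$sdev^2, eig)

# Total variance = number of standardized columns; proportion = variance / total.
sum(pca$sdev^2)
prop <- pca$sdev^2 / sum(pca$sdev^2)
# summary() reports these proportions rounded to 5 decimals.
all.equal(round(prop, 5), unname(summary(pca)$importance["Proportion of Variance", ]))
```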

We then use an elbow (scree) plot, which displays the percentage of variance explained by each principal component, to decide which components to use for visualization.

library(factoextra)


fviz_eig(personality_pca) +
  geom_hline(yintercept = 100*(1/ncol(personality_quant)))

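The elbow plot shows that the explained variance levels off after the second component. Only the first two components explain more than the 10% that each component would explain if the variance were spread evenly across the ten variables (the horizontal reference line), and together they capture about 63% of the variance. The first two components should therefore capture most of the variability in the data without a significant loss of information.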
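The reference-line rule can be made explicit with the proportions printed by `summary(personality_pca)` above. With ten standardized variables, the line sits at 1/10 of the total variance (10% in the plot), and the components above it are the ones kept.

```r
# "Proportion of Variance" row copied from summary(personality_pca) above.
prop <- c(0.5124, 0.1182, 0.08162, 0.06392, 0.05441,
          0.04318, 0.03961, 0.03521, 0.02708, 0.02438)
ref_line <- 1 / length(prop)      # the hline: average share if variance were spread evenly
keep <- which(prop > ref_line)    # components above the reference line
keep                              # PC1 and PC2
sum(prop[keep])                   # 0.6306, matching the Cumulative Proportion for PC2
```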

To focus on substantial purchasing differences, we converted Income from a quantitative variable to an ordinal categorical one, with income brackets that represent the economic segments of interest: Low (0 to under 20,000), Lower-Middle (20,000 to under 40,000), Middle (40,000 to under 60,000), Upper-Middle (60,000 to under 80,000), High (80,000 to under 100,000), and Very High (100,000 and above). This makes it easier to compare purchasing behavior across income categories, discern broader trends, and inform targeted marketing strategies.

personality$IncomeCategory <- cut(personality$Income, 
                           breaks = c(0, 20000, 40000, 60000, 80000, 100000, Inf),
                           labels = c("Low", "Lower-Middle", "Middle", "Upper-Middle", "High", "Very High"),
                           right = FALSE)
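Because the breaks use `right = FALSE`, each bracket includes its lower bound and excludes its upper bound. A quick check on hypothetical incomes placed on and around the boundaries: exactly 20,000 falls in Lower-Middle, exactly 100,000 in Very High, and a negative value falls outside every bracket and becomes NA.

```r
# Hypothetical incomes chosen to sit on and around the bracket boundaries.
income <- c(0, 19999, 20000, 59999.5, 100000, 150000, -5)
bracket <- cut(income,
               breaks = c(0, 20000, 40000, 60000, 80000, 100000, Inf),
               labels = c("Low", "Lower-Middle", "Middle", "Upper-Middle", "High", "Very High"),
               right = FALSE)   # intervals of the form [lower, upper)
as.character(bracket)
```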

We can then plot a PCA biplot, along with the rotation table, to see how each principal component is formed as a linear combination of the standardized original variables (fviz_pca_biplot shows PC1 and PC2 by default).

fviz_pca_biplot(personality_pca,
                habillage = personality$IncomeCategory, label = "var", pointshape = 19, alpha = 0.4)

personality_pca$rotation

##                            PC1         PC2        PC3         PC4          PC5
## MntWines            -0.3289765  0.34507437 -0.3755900 -0.03504590 -0.123614284
## MntFruits           -0.3215011 -0.20118936  0.3706709 -0.22860036  0.002164393
## MntMeatProducts     -0.3567497 -0.19038060 -0.2395223  0.05790744 -0.443845841
## MntFishProducts     -0.3317498 -0.21820961  0.3318311 -0.07060327 -0.083902179
## MntSweetProducts    -0.3202360 -0.18230060  0.3230480 -0.38912767 -0.035905484
## MntGoldProds        -0.2665429  0.22359356  0.4409724  0.77503019  0.164638712
## NumWebPurchases     -0.2494291  0.63165485  0.1036393 -0.21662935 -0.019188130
## NumWebVisitsMonth    0.2733199  0.46891199  0.3417699 -0.17887299 -0.515061075
## NumStorePurchases   -0.3314352  0.22960111 -0.2038668 -0.22563213  0.586427610
## NumCatalogPurchases -0.3615509 -0.04213969 -0.2934729  0.23717030 -0.377793027
##                             PC6         PC7         PC8          PC9
## MntWines             0.11777090 -0.01015585 -0.40078539 -0.623576135
## MntFruits            0.67455083  0.42682229  0.08574252 -0.119269790
## MntMeatProducts      0.05119993  0.04814346  0.04932802  0.496738378
## MntFishProducts      0.06526304 -0.83336294  0.04788711 -0.136983342
## MntSweetProducts    -0.66771685  0.28028301 -0.28313624 -0.049792663
## MntGoldProds        -0.07195849  0.11630913 -0.15255639  0.046075768
## NumWebPurchases     -0.15245907  0.02141365  0.67312474 -0.004194152
## NumWebVisitsMonth    0.15987530 -0.08171971 -0.41174479  0.273386523
## NumStorePurchases    0.13793231 -0.13460834 -0.31074139  0.496652562
## NumCatalogPurchases -0.07333126  0.05971234  0.06828168  0.073418214
##                            PC10
## MntWines             0.22731547
## MntFruits           -0.08790356
## MntMeatProducts      0.56977193
## MntFishProducts     -0.01948589
## MntSweetProducts    -0.00979827
## MntGoldProds         0.11235746
## NumWebPurchases      0.06279238
## NumWebVisitsMonth   -0.12171878
## NumStorePurchases   -0.14386273
## NumCatalogPurchases -0.75060366
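Each PC score is a weighted sum of the standardized variables, with the weights given by the corresponding column of the rotation matrix. A sketch verifying this on the built-in `USArrests` data, used here only as a stand-in for personality_quant: standardizing with `scale()` and multiplying by the rotation reproduces the scores that `prcomp()` returns in `$x`.

```r
# Built-in USArrests data stands in for personality_quant here.
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

z <- scale(USArrests)                   # center each column and divide by its SD, as prcomp() does
scores_by_hand <- z %*% pca$rotation    # column j = weighted sum of standardized variables, weights = PC j loadings

# The hand-computed scores match prcomp()'s $x.
all.equal(unname(scores_by_hand), unname(pca$x))

# Each column of the rotation matrix has unit length.
colSums(pca$rotation^2)
```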

Analysis:

According to the biplot, lower-income data points tend to have higher PC1 scores. In the rotation table, the PC1 loadings are negative for MntWines, MntFruits, MntMeatProducts, MntFishProducts, MntSweetProducts, MntGoldProds, NumWebPurchases, NumStorePurchases, and NumCatalogPurchases, so higher PC1 scores correspond to lower spending and fewer purchases. Together, these suggest that customers with higher incomes tend to spend more and make more purchases.

On the other hand, NumWebVisitsMonth has a positive PC1 loading (0.273). This suggests that lower-income customers tend to visit the website more frequently than higher-income customers, which is consistent with lower-income data points having higher PC1 scores in the biplot.
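The sign reading can be checked directly from the PC1 column of the rotation table printed above. The overall sign of a principal component is arbitrary (flipping every loading and score gives an equally valid solution), so what matters is that NumWebVisitsMonth has the opposite sign to all nine spending and purchase variables.

```r
# PC1 loadings copied from personality_pca$rotation above.
pc1 <- c(MntWines = -0.3289765, MntFruits = -0.3215011, MntMeatProducts = -0.3567497,
         MntFishProducts = -0.3317498, MntSweetProducts = -0.3202360,
         MntGoldProds = -0.2665429, NumWebPurchases = -0.2494291,
         NumWebVisitsMonth = 0.2733199, NumStorePurchases = -0.3314352,
         NumCatalogPurchases = -0.3615509)

names(pc1)[pc1 > 0]   # the only variable that increases with PC1
names(pc1)[pc1 < 0]   # the nine spending and purchase variables
```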

We can also focus on the upper half of the income distribution (Middle, Upper-Middle, High, and Very High) to draw further inferences. Middle and Upper-Middle income customers appear more likely to purchase from the website, as their points lie closer to the NumWebPurchases arrow on the biplot. In contrast, Upper-Middle and High income customers do not appear to purchase online as much, as their points are more spread out and less concentrated around NumWebPurchases. One possible explanation is that higher-income customers are more concerned about the quality of products bought online and therefore prefer other purchase channels.

Q3 Conclusion:

In this section, we explored the relationship between income and purchasing behavior, focusing on discount purchases and purchases across product categories. The initial investigation revealed a statistically significant but weak negative relationship between income and the number of deals purchases, with a low R-squared value in the linear regression model. This suggests that income has little practical impact on deals purchases and that other factors are likely more influential.

Applying PCA to a range of product purchases and purchase channels provided further insight. After categorizing income into brackets, the biplot and rotation table indicated that higher income is associated with more purchases across most categories, including wine, meat, and gold. Conversely, lower-income groups visited the website more often, possibly reflecting a tendency to seek deals or compare prices online. This trend is less pronounced in higher-income groups, who might prioritize quality over price.

The study also uncovered distinctions specifically within higher income brackets. Middle and Upper-Middle income groups are more inclined towards online purchases, while the Upper-Middle and High income categories show a dispersed pattern around web purchases, suggesting a preference for alternative shopping methods. This could reflect a perception of quality or a different set of priorities among higher income individuals.

In conclusion, the relationship between income and purchasing behavior is multifaceted, with income influencing the choice of products and shopping platforms. However, the impact of income is complex, and individual preferences or behaviors seem to play a more significant role than income alone in determining purchasing patterns.

Conclusion

This research mainly focused on three aspects of customers and whether they have an effect on purchasing patterns: Family Structure, Education Level, and Income Level.

Regarding family structure, the analysis suggests that marital status is associated with spending habits, particularly in the wine, gold, and meat categories. Distinct spending patterns emerge across marital statuses, with ‘Divorced’ and ‘Together’ customers tending to spend more in certain categories. Additionally, the number of children in a household is inversely related to spending on sweet products, a finding that contradicts our initial expectations but is consistent with parents limiting sweets for health reasons.

In terms of education level, the study indicates a subtle influence on digital versus physical store purchases. While the data points to some variations in purchasing behaviors among different education levels, these differences are not pronounced, suggesting that education may not be the primary determinant of purchasing choices. This insight is critical for businesses focusing on personalized marketing strategies, indicating that other factors might be more influential in shaping consumer behavior.

The relationship between income level and purchasing behavior is particularly complex. We find a weak negative relationship between income and the number of purchases made with a discount, suggesting that higher-income customers may be slightly less inclined to buy on deals. However, this relationship is weak, indicating that other factors likely play a larger role. Furthermore, higher-income customers tend to spend more across product categories and appear to favor purchase channels other than the web, possibly because of concerns about quality or convenience.

In conclusion, this research shows the importance of considering a range of demographic and socio-economic factors when analyzing consumer behavior. While family structure, education level, and income each play a role in shaping purchasing patterns, their impacts are complex and often interrelated. For businesses, these insights can help in developing targeted marketing strategies that resonate with diverse consumers. Understanding these dynamics can lead to more effective engagement with customers, ultimately driving better business outcomes.


Jackie Wang, Jianing Shi, Benjamin Lu

2023-11-23
