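library(tidyverse)   # ggplot2, dplyr, tidyr (pivot_longer) and related packages
library(ggplot2)
library(dplyr)
library(factoextra)
library(ggseas)      # provides stat_rollapplyr() for the moving-average plot
housing_data <- read.csv("ames-housing.csv")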

Motivation

The housing landscape has changed dramatically over the last few years. As incomes, the cost of living, and homeowners' demand for upgrades have risen, the architecture and characteristics of houses have been updated repeatedly to keep pace. Even in rural areas, house prices have grown disproportionately, especially between March 2020 and March 2023 (JCHS Harvard, 2024).

A house's price usually reflects its neighborhood, its type and characteristics, the type of sale, and when it was sold. In this paper, we try to understand these factors and how they influence the housing market in Ames, a rural city in Iowa. We also examine how they may have affected house prices over time, to see whether the drastic change described above appears in Ames.

Ames Housing Data

In this Ames housing dataset, we analyze 2930 house sales described by 82 variables. Each row is one house, identified by its City Parcel Identification Number (PID), and each column is a feature of the house or its sale; more information on the specific variables we chose is given below. Since we are interested in houses in Ames, Iowa, we examine the types of houses in Ames, how the houses have been and are being purchased, and how the features of houses have changed prices over time. We summarize the variables we will use below:

The first few rows of the dataset look like the following:

head(housing_data)
##   Order       PID MS.SubClass MS.Zoning Lot.Frontage Lot.Area Street Alley
## 1     1 526301100          20        RL          141    31770   Pave  <NA>
## 2     2 526350040          20        RH           80    11622   Pave  <NA>
## 3     3 526351010          20        RL           81    14267   Pave  <NA>
## 4     4 526353030          20        RL           93    11160   Pave  <NA>
## 5     5 527105010          60        RL           74    13830   Pave  <NA>
## 6     6 527105030          60        RL           78     9978   Pave  <NA>
##   Lot.Shape Land.Contour Utilities Lot.Config Land.Slope Neighborhood
## 1       IR1          Lvl    AllPub     Corner        Gtl        NAmes
## 2       Reg          Lvl    AllPub     Inside        Gtl        NAmes
## 3       IR1          Lvl    AllPub     Corner        Gtl        NAmes
## 4       Reg          Lvl    AllPub     Corner        Gtl        NAmes
## 5       IR1          Lvl    AllPub     Inside        Gtl      Gilbert
## 6       IR1          Lvl    AllPub     Inside        Gtl      Gilbert
##   Condition.1 Condition.2 Bldg.Type House.Style Overall.Qual Overall.Cond
## 1        Norm        Norm      1Fam      1Story            6            5
## 2       Feedr        Norm      1Fam      1Story            5            6
## 3        Norm        Norm      1Fam      1Story            6            6
## 4        Norm        Norm      1Fam      1Story            7            5
## 5        Norm        Norm      1Fam      2Story            5            5
## 6        Norm        Norm      1Fam      2Story            6            6
##   Year.Built Year.Remod.Add Roof.Style Roof.Matl Exterior.1st Exterior.2nd
## 1       1960           1960        Hip   CompShg      BrkFace      Plywood
## 2       1961           1961      Gable   CompShg      VinylSd      VinylSd
## 3       1958           1958        Hip   CompShg      Wd Sdng      Wd Sdng
## 4       1968           1968        Hip   CompShg      BrkFace      BrkFace
## 5       1997           1998      Gable   CompShg      VinylSd      VinylSd
## 6       1998           1998      Gable   CompShg      VinylSd      VinylSd
##   Mas.Vnr.Type Mas.Vnr.Area Exter.Qual Exter.Cond Foundation Bsmt.Qual
## 1        Stone          112         TA         TA     CBlock        TA
## 2         None            0         TA         TA     CBlock        TA
## 3      BrkFace          108         TA         TA     CBlock        TA
## 4         None            0         Gd         TA     CBlock        TA
## 5         None            0         TA         TA      PConc        Gd
## 6      BrkFace           20         TA         TA      PConc        TA
##   Bsmt.Cond Bsmt.Exposure BsmtFin.Type.1 BsmtFin.SF.1 BsmtFin.Type.2
## 1        Gd            Gd            BLQ          639            Unf
## 2        TA            No            Rec          468            LwQ
## 3        TA            No            ALQ          923            Unf
## 4        TA            No            ALQ         1065            Unf
## 5        TA            No            GLQ          791            Unf
## 6        TA            No            GLQ          602            Unf
##   BsmtFin.SF.2 Bsmt.Unf.SF Total.Bsmt.SF Heating Heating.QC Central.Air
## 1            0         441          1080    GasA         Fa           Y
## 2          144         270           882    GasA         TA           Y
## 3            0         406          1329    GasA         TA           Y
## 4            0        1045          2110    GasA         Ex           Y
## 5            0         137           928    GasA         Gd           Y
## 6            0         324           926    GasA         Ex           Y
##   Electrical X1st.Flr.SF X2nd.Flr.SF Low.Qual.Fin.SF Gr.Liv.Area Bsmt.Full.Bath
## 1      SBrkr        1656           0               0        1656              1
## 2      SBrkr         896           0               0         896              0
## 3      SBrkr        1329           0               0        1329              0
## 4      SBrkr        2110           0               0        2110              1
## 5      SBrkr         928         701               0        1629              0
## 6      SBrkr         926         678               0        1604              0
##   Bsmt.Half.Bath Full.Bath Half.Bath Bedroom.AbvGr Kitchen.AbvGr Kitchen.Qual
## 1              0         1         0             3             1           TA
## 2              0         1         0             2             1           TA
## 3              0         1         1             3             1           Gd
## 4              0         2         1             3             1           Ex
## 5              0         2         1             3             1           TA
## 6              0         2         1             3             1           Gd
##   TotRms.AbvGrd Functional Fireplaces Fireplace.Qu Garage.Type Garage.Yr.Blt
## 1             7        Typ          2           Gd      Attchd          1960
## 2             5        Typ          0         <NA>      Attchd          1961
## 3             6        Typ          0         <NA>      Attchd          1958
## 4             8        Typ          2           TA      Attchd          1968
## 5             6        Typ          1           TA      Attchd          1997
## 6             7        Typ          1           Gd      Attchd          1998
##   Garage.Finish Garage.Cars Garage.Area Garage.Qual Garage.Cond Paved.Drive
## 1           Fin           2         528          TA          TA           P
## 2           Unf           1         730          TA          TA           Y
## 3           Unf           1         312          TA          TA           Y
## 4           Fin           2         522          TA          TA           Y
## 5           Fin           2         482          TA          TA           Y
## 6           Fin           2         470          TA          TA           Y
##   Wood.Deck.SF Open.Porch.SF Enclosed.Porch X3Ssn.Porch Screen.Porch Pool.Area
## 1          210            62              0           0            0         0
## 2          140             0              0           0          120         0
## 3          393            36              0           0            0         0
## 4            0             0              0           0            0         0
## 5          212            34              0           0            0         0
## 6          360            36              0           0            0         0
##   Pool.QC Fence Misc.Feature Misc.Val Mo.Sold Yr.Sold Sale.Type Sale.Condition
## 1    <NA>  <NA>         <NA>        0       5    2010       WD          Normal
## 2    <NA> MnPrv         <NA>        0       6    2010       WD          Normal
## 3    <NA>  <NA>         Gar2    12500       6    2010       WD          Normal
## 4    <NA>  <NA>         <NA>        0       4    2010       WD          Normal
## 5    <NA> MnPrv         <NA>        0       3    2010       WD          Normal
## 6    <NA>  <NA>         <NA>        0       6    2010       WD          Normal
##   SalePrice
## 1    215000
## 2    105000
## 3    172000
## 4    244000
## 5    189900
## 6    195500

Research Questions

  1. What is the housing market like in Ames, Iowa? Specifically, what kinds of houses are in Ames, how do they vary by neighborhood, and how have the amenities of houses changed over time?

  2. How does the nature of a housing sale affect its sale price? Specifically, how do sale type and sale condition relate to sale price, and what patterns do they reveal about how homes are bought?

  3. How do the quality and condition of a house affect its price? Also, how has the average sale price changed over time alongside economic shifts?

Data Analysis with Graphs

Research Question 1

First, we want to explore what kinds of houses are being sold in Ames, Iowa, to better understand what the local housing stock looks like. This requires looking at variables such as house style and neighborhood, and at how amenities such as garages and porches have changed. Analyzing the types of houses in Ames lets us contextualize the housing market and understand why specific pricing trends may occur.

To start this analysis, let’s apply multidimensional scaling (MDS) to the houses' area measurements, each scaled by its standard deviation, to see whether distinct clusters emerge by house style:

table(housing_data$House.Style)
## 
## 1.5Fin 1.5Unf 1Story 2.5Fin 2.5Unf 2Story SFoyer   SLvl 
##    314     19   1481      8     24    873     83    128
# Square-footage variables used to compare houses
haa <- housing_data %>% 
  select(Lot.Area, Mas.Vnr.Area, Total.Bsmt.SF, X1st.Flr.SF, X2nd.Flr.SF,
         Gr.Liv.Area, Garage.Area, Wood.Deck.SF, Open.Porch.SF, Enclosed.Porch,
         X3Ssn.Porch, Screen.Porch, Pool.Area)
# Treat a missing area as zero square feet
haa[is.na(haa)] <- 0
# Divide each column by its standard deviation so no single variable dominates the distances
haa <- haa %>%
  apply(MARGIN = 2, FUN = function(x){x / sd(x)})
# Euclidean distances between houses, then classical MDS down to 2 dimensions
house_dist <- dist(haa)
mds_coords <- cmdscale(house_dist, k = 2)
housing_mds <- housing_data %>%
  mutate(mds1 = mds_coords[,1], mds2 = mds_coords[,2])
housing_mds %>% 
  ggplot(aes(x = mds1, y = mds2)) +
  geom_point(aes(color = House.Style), alpha = 0.3) +
  theme_minimal() +
  labs(title="MDS1 vs. MDS2 Colored by Style of the House",
       x="MDS 1",
       y="MDS 2")

In the plot above, three large clusters form: a blue cluster of houses whose features resemble a typical 2-story home, a green cluster resembling a typical 1-story home, and a slightly smaller red cluster resembling a 1.5-story home with a finished second level. Besides showing similarities between houses, the plot also reflects how common each style is: consistent with the table above (1,481 1-story, 873 2-story, and 314 finished 1.5-story houses), these three styles far outnumber all others.

The fact that three major clusters emerge shows that the quantitative area variables differ systematically between these housing styles. Intuitively, this makes sense: a 2-story home has second-floor area that a 1-story home lacks, and 2-story homes also tend to be larger and, on average, more expensive.
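`cmdscale()` performs classical (Torgerson) MDS. As a sketch of what it computes, the hypothetical helper `classical_mds()` below double-centres the squared distance matrix and keeps the top eigenvectors scaled by the square roots of their eigenvalues. It runs on a small simulated matrix standing in for `haa`, not on the Ames data, and should agree with `cmdscale()` up to the sign of each axis.

```r
# Hypothetical helper: classical (Torgerson) MDS written out step by step
classical_mds <- function(d, k = 2) {
  d2 <- as.matrix(d)^2                  # squared Euclidean distances
  n <- nrow(d2)
  J <- diag(n) - matrix(1 / n, n, n)    # centring matrix I - (1/n) 11'
  B <- -0.5 * J %*% d2 %*% J            # double-centred inner-product matrix
  eig <- eigen(B, symmetric = TRUE)     # eigenvalues in decreasing order
  # Coordinates = top-k eigenvectors scaled by sqrt(eigenvalue)
  eig$vectors[, 1:k, drop = FALSE] %*%
    diag(sqrt(pmax(eig$values[1:k], 0)), nrow = k)
}

# Simulated stand-in for the scaled area matrix (10 houses, 4 variables)
set.seed(1)
toy <- matrix(rnorm(40), ncol = 4)
toy <- apply(toy, 2, function(x) x / sd(x))   # same sd-scaling as the report
d <- dist(toy)

by_hand <- classical_mds(d, k = 2)
builtin <- unname(cmdscale(d, k = 2))
# Eigenvectors are only defined up to sign, so compare absolute values
all.equal(abs(by_hand), abs(builtin))
```

Because each axis can flip sign between implementations, the orientation of the MDS plot (which cluster sits on the left or right) carries no meaning; only the relative positions of houses do.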

Another question that comes to mind is how these housing styles are spread across Ames. In our dataset, house prices were recorded in over 25 neighborhoods. That leads us to ask how housing styles differ by neighborhood. Are some neighborhoods more affluent than others? How does this relate to pricing? To answer these questions, we create the following stacked bar chart, keeping only 1-story, 2-story, and finished 1.5-story houses for readability:

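# Keep only the three most common styles; %in% tests membership, whereas == would recycle the vector
housing_data %>% filter(House.Style %in% c("1.5Fin", "1Story", "2Story")) %>% 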
  ggplot(aes(x = Neighborhood, fill = House.Style)) +
  geom_bar() +
  coord_flip() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(title = "Distribution of House Style per Neighborhood", y = "Frequency")

Several things become apparent from this chart. First, house sales were not distributed uniformly across neighborhoods: a few neighborhoods, namely North Ames, College Creek, and Old Town, had far more sales than the average neighborhood.

Beyond that, the mix of house styles within each neighborhood shows that many areas have the space for 2-story houses, which crowd out 1- and 1.5-story houses. Based on the ratio of 2-story homes to other homes, the more affluent neighborhoods in Ames appear to be Gilbert, Somerset, and Northridge. As a check, we searched online and found that these three neighborhoods are regarded as some of the safest and most welcoming in Ames. According to niche.com, Gilbert is ranked the number one best place to raise a family within the county.

Fewer than half of these neighborhoods contain 1.5-story houses. Further research suggests that 1.5-story houses are not primarily found in low-income neighborhoods, so their presence or absence says little about affluence. We therefore limit our claim to this: neighborhoods with a high share of 2-story homes appear more affluent, even though they typically have fewer houses sold.
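The affluence ranking above rests on each neighborhood's share of 2-story homes. A minimal base-R sketch of that calculation follows, on a small made-up data frame (neighborhood codes borrowed from Ames, rows invented). It also shows why the style filter needs `%in%`: `==` compares row *i* with element *i* of a recycled vector instead of testing membership.

```r
# Made-up sample: Ames neighborhood codes, invented rows
houses <- data.frame(
  Neighborhood = c("Gilbert", "Gilbert", "Gilbert", "NAmes", "NAmes", "NAmes"),
  House.Style  = c("2Story", "2Story", "1Story", "1Story", "1.5Fin", "SLvl")
)
keep <- c("1.5Fin", "1Story", "2Story")

# `==` pairs row i with keep[i] (recycled), so it is not a membership test
n_eq <- sum(houses$House.Style == keep)
# `%in%` asks whether each style appears anywhere in keep
kept <- houses[houses$House.Style %in% keep, ]

# Share of 2-story homes per neighborhood; ranks neighborhoods the same way
# as the 2-story : other ratio, but never divides by zero
share_2story <- tapply(kept$House.Style == "2Story", kept$Neighborhood, mean)
sort(share_2story, decreasing = TRUE)
```

Here `n_eq` is 0 even though five of the six rows have one of the three styles, which is the silent failure mode of filtering with `==`.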

Lastly, to understand the housing market, it is important to track the amenities a house has. Intuitively, we expect amenities such as garages and basements to grow as homes modernize. With this in mind, how has the size of these amenities changed over time? Are some amenities more important than others? What is the general trend of each? We explore this with the following time-series plot.

# Average area (sq ft) of each amenity by year built; na.rm drops missing areas
amenities <- housing_data %>%
  group_by(Year.Built) %>%
  summarize(avg_garage = mean(Garage.Area, na.rm = TRUE),
            avg_basement = mean(Total.Bsmt.SF, na.rm = TRUE),
            avg_porch = mean(Open.Porch.SF, na.rm = TRUE))
# Long format: one row per (year, amenity)
amenities_plot <- amenities %>%
  pivot_longer(cols=c(avg_garage,avg_basement, avg_porch),
               names_to="Amenities", values_to="Average_Value")
# Yearly averages plus a straight-line (OLS) trend for each amenity
ggplot(amenities_plot, aes(x=Year.Built,y=Average_Value, color=Amenities)) +
  geom_line() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, aes(group = Amenities)) +
  labs(title="Average of Amenities Over Time",
       x="Year Built",
       y="Average Area (sq ft)")

This time-series plot shows the year built on the x-axis and the average area (in square feet) on the y-axis, with one colored line per amenity. For basement and garage space, the fitted regression line has a positive slope, so newer homes in Ames tend to have larger basements and garages. In contrast, the porch line trends downward, suggesting that porches are less of a priority for Ames homeowners. The plot also shows that basements are much larger than garages or porches. Finally, the basement line is steeper than the garage and porch lines, which suggests that basement size grew the most over the last century. This trend may depend on local context: the Iowan landscape leaves room for large basements, whereas a garage only needs to hold one or two vehicles.
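The slope comparison above can be made numeric by fitting one straight line per amenity and reading off its slope. The sketch below uses invented yearly averages laid out in the same long format as `amenities_plot` (columns `Year.Built`, `Amenities`, `Average_Value`); the numbers are not Ames values.

```r
# Invented yearly averages (sq ft) in the same long format as amenities_plot
years <- 1950:1954
toy <- data.frame(
  Year.Built    = rep(years, times = 3),
  Amenities     = rep(c("avg_basement", "avg_garage", "avg_porch"),
                      each = length(years)),
  Average_Value = c(800 + 10 * (years - 1950),   # basement: +10 sq ft per year
                    300 +  4 * (years - 1950),   # garage:   +4 sq ft per year
                     60 -  1 * (years - 1950))   # porch:    -1 sq ft per year
)

# One OLS fit per amenity; keep the Year.Built coefficient (sq ft per year)
slopes <- sapply(split(toy, toy$Amenities), function(df) {
  coef(lm(Average_Value ~ Year.Built, data = df))[["Year.Built"]]
})
slopes
```

Because all three slopes share a unit (square feet per year built), they can be compared directly, which is what the visual "steeper line" comparison is approximating.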

For further quantitative analysis, we fit a linear regression of sale price on Year Built, Garage Area, Total Basement Area, and Open Porch Area:

model <- lm(SalePrice~Year.Built + Garage.Area + Total.Bsmt.SF + Open.Porch.SF, data = housing_data)
summary(model)
## 
## Call:
## lm(formula = SalePrice ~ Year.Built + Garage.Area + Total.Bsmt.SF + 
##     Open.Porch.SF, data = housing_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -510568  -28928   -6537   21372  427162 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -1.177e+06  7.022e+04 -16.760   <2e-16 ***
## Year.Built     6.223e+02  3.635e+01  17.120   <2e-16 ***
## Garage.Area    1.240e+02  5.355e+00  23.164   <2e-16 ***
## Total.Bsmt.SF  6.326e+01  2.521e+00  25.094   <2e-16 ***
## Open.Porch.SF  1.215e+02  1.448e+01   8.392   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 50690 on 2923 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.598,  Adjusted R-squared:  0.5975 
## F-statistic:  1087 on 4 and 2923 DF,  p-value: < 2.2e-16

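All four predictors are statistically significant, each with a p-value below 2e-16. The R² of 0.598 means the model explains about 60% of the variation in sale price, which is reasonable for a simple linear model. Total Basement Area has the largest t-statistic (25.1), i.e., the strongest statistical evidence of an association with sale price. Its coefficient, however, is about $63 per square foot, smaller than those for Garage Area (about $124) and Open Porch Area (about $122). Because the predictors are measured on different scales, the raw coefficients cannot be compared directly to rank their effects. Overall, the model gives a useful first estimate of sale price, but it has limitations, including unchecked diagnostics and possible omitted-variable bias, that should be examined in future work.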
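One common way to compare predictors measured in different units is to standardize them. The sketch below uses simulated data (hypothetical, not the Ames values) with two area predictors on different scales: the raw coefficients rank one way, the standardized coefficients the other, and the standardized values equal `b_raw * sd(x) / sd(y)`.

```r
# Hypothetical simulated data: two area predictors on different scales
set.seed(42)
n <- 500
sim <- data.frame(garage   = rnorm(n, mean = 470,  sd = 200),
                  basement = rnorm(n, mean = 1050, sd = 600))
sim$price <- 20000 + 120 * sim$garage + 60 * sim$basement + rnorm(n, sd = 40000)

# Raw coefficients: dollars per square foot
raw_fit <- lm(price ~ garage + basement, data = sim)
# Standardized coefficients: SDs of price per SD of each predictor
std_fit <- lm(price ~ garage + basement, data = as.data.frame(scale(sim)))

raw_b <- coef(raw_fit)[c("garage", "basement")]
std_b <- coef(std_fit)[c("garage", "basement")]
# Same numbers via the identity b_std = b_raw * sd(x) / sd(y)
check <- raw_b * sapply(sim[c("garage", "basement")], sd) / sd(sim$price)
round(rbind(raw = raw_b, standardized = std_b), 3)
```

Applying the same refit to the Ames model would show how the four predictors compare per standard deviation; we have not run that here, so this sketch makes no claim about the resulting ranking.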

Research Question 2

Building on our overview of the housing stock in Ames, we now look at how houses are purchased and how the method of sale relates to the sale price. This requires looking at the sale type, the sale condition, and the sale price. Understanding how sales are conducted in Ames, Iowa can help us better understand market dynamics, property valuation, and other areas of interest in real estate research.

ggplot(data = housing_data, aes(y = Sale.Type, x = SalePrice)) +
  geom_boxplot(outlier.color = "red") +
  theme_minimal() +
  labs( y = "Type of sale",
        x = "Price of sale",
        title = "Boxplot of sale price vs. sale type")

housing_data %>%
  group_by(Sale.Type) %>%
  summarise(Median_SalePrice = median(SalePrice))
## # A tibble: 10 × 2
##    Sale.Type Median_SalePrice
##    <chr>                <dbl>
##  1 "COD"               127500
##  2 "CWD"               160750
##  3 "Con"               215200
##  4 "ConLD"             127500
##  5 "ConLI"             119000
##  6 "ConLw"              92500
##  7 "New"               250580
##  8 "Oth"               116050
##  9 "VWD"               137000
## 10 "WD "               157000

New home sales have the highest median sale price of all sale types, at around $250,000. This logically makes sense, as customers are generally willing to pay a premium for newly built homes.

Contracts with a 15% down payment and regular terms (Con) have the second-highest median sale price ($215,200), and while this group has a wider interquartile range than other sale types, it has no outliers. This is likely because these are more traditional sales with standard down payments (15% is the American median). These buyers are likely middle-income Americans, which is consistent with a median sale price that sits between new sales and distressed-property sales.

Conventional warranty deeds (WD) have a mid-range median sale price ($157,000) but a large number of outliers. This could indicate that a number of high-value properties, such as estates, manors, or other expensive homes, are sold through this method.

Along with Other (Oth), court officer deeds (COD) and contracts with a low down payment (ConLw), low interest (ConLI), or both (ConLD) have the lowest median sale prices of the sale types. This makes sense, as court officer deeds are judicially mandated sales, carried out quickly because of a foreclosure or other judicial proceeding. Properties bought with low down payments and low interest rates would plausibly sell for less, since such buyers tend to purchase at the lower end of the market.
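Medians computed from very small groups are fragile, so it helps to report how many sales sit behind each median. A minimal base-R sketch with a few invented sales (not the Ames data) is below; on `housing_data` the same calls would use `Sale.Type` and `SalePrice`, keeping in mind that warranty deeds are stored as `"WD "` with a trailing space, as the table above shows.

```r
# Invented example sales; sale-type codes follow the Ames codes
sales <- data.frame(
  Sale.Type = c("WD", "WD", "WD", "New", "New", "ConLw"),
  SalePrice = c(150000, 160000, 400000, 240000, 260000, 92500)
)

# Median sale price and number of sales per sale type
med <- tapply(sales$SalePrice, sales$Sale.Type, median)
cnt <- table(sales$Sale.Type)
summary_tbl <- data.frame(Sale.Type = names(med),
                          Median_SalePrice = as.vector(med),
                          n = as.vector(cnt[names(med)]))
summary_tbl
```

A median backed by one or two sales (like `ConLw` here) can move a lot if a single sale changes, so rankings among small sale types should be read cautiously.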

ggplot(data = housing_data, aes(y = Sale.Condition, x = SalePrice)) +
  geom_violin() +
  geom_boxplot(width = 0.1, outlier.color = "red") +
  theme_minimal() +
  labs(x = "Price of sale",
       y = "Condition of sale",
       title = "Violin + box plot of sale condition vs sale price")

housing_data %>%
  group_by(Sale.Condition) %>%
  summarise(Median_SalePrice = median(SalePrice))
## # A tibble: 6 × 2
##   Sale.Condition Median_SalePrice
##   <chr>                     <dbl>
## 1 Abnorml                  129450
## 2 AdjLand                  110000
## 3 Alloca                   149617
## 4 Family                   144400
## 5 Normal                   159000
## 6 Partial                  250000

Partial construction sale conditions have the highest median sale price of any sale condition ($250,000). This could be due to sales of partially completed high-value properties; the dataset's documentation also associates Partial sales with new homes, which fits their median being almost identical to that of New sales ($250,580). These properties could also sit on particularly valuable land, meaning that the value of the land, rather than the construction itself, drove the price. The timeframe of the dataset (2006-2010) spans the periods before, during, and after the global financial crisis, so some sales of high-value, partially completed properties may have been speculative and part of the general housing-price bubble of the time.

Normal sale conditions had the second-highest median sale price ($159,000), along with a high number of outliers. These outliers could also be speculative sales made before the housing crisis, with buyers assuming the high prices were justified because housing prices (at the time) kept rising.

Family (sales between family members) and Alloca (allocation, i.e., two linked properties) sale conditions have similar median sale prices ($144,400 and $149,617), slightly lower than normal sale conditions. Family sales may be discounted because of the generally non-competitive dynamics within a family, and linked-property sales may tend to involve smaller units.

Adjoining land purchase (AdjLand) and abnormal (Abnorml) sale conditions had the lowest median sale prices ($110,000 and $129,450). Low prices for adjoining land purchases make sense, as these are purchases of land immediately next to a property, i.e., land-only transactions; with no house to purchase, the sale price would logically be lower. Abnormal sales occur under foreclosures or short sales, which are priced below market value to attract buyers quickly, so their low median sale price (likely exacerbated by the financial crisis) makes sense. Abnormal sale conditions also had several high outliers, which could be due to speculative purchasing and price spikes, much like normal sale conditions.

In conclusion, taking into account the housing climate of the period, new home sales, contracts with a 15% down payment and regular terms, conventional warranty deed sales, partial construction, and normal sale conditions had the highest median sale prices. Court officer deeds, contracts with low interest rates, low down payments, or both, adjoining land purchases, intra-family sales, and abnormal sale conditions had the lowest median sale prices.

Research Question 3

Finally, we want to understand how features such as overall condition and quality differ among houses in Ames, and whether they affect a house's sale price. We will also look at how average sale prices have changed over time, to look for possible spikes and dips during critical periods like the Great Depression and the financial crisis. We will use Year.Built, SalePrice, Overall.Qual, and Overall.Cond to build our visualizations.

To start, we examine how many houses in Ames have each combination of quality and condition ratings. This gives us a better idea of the overall housing market and of what kinds of houses exist in Ames. Below is a heat map of Overall Quality by Overall Condition.

ggplot(housing_data, aes(x=Overall.Qual, y=Overall.Cond)) +
  stat_density_2d(aes(fill=after_stat(density)),
                      geom = "tile",
                      contour=FALSE) +
  geom_point(alpha=0.2) +
  coord_fixed() +
  scale_fill_gradient(low="white",
                      high="red") +
  theme_bw() +
  labs(title="Heat Map of Overall Quality by Overall Condition of Houses",
       x="Overall Quality",
       y="Overall Condition")

table(housing_data$Overall.Qual)
## 
##   1   2   3   4   5   6   7   8   9  10 
##   4  13  40 226 825 732 602 350 107  31
table(housing_data$Overall.Cond)
## 
##    1    2    3    4    5    6    7    8    9 
##    7   10   50  101 1654  533  390  144   41
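The two tables above count each rating separately. Because both ratings are integers, the joint counts that the heat map smooths over can also be tabulated exactly with a two-way `table()`. Below is a sketch on a handful of invented ratings (not the Ames data) that also locates the most common (quality, condition) pair.

```r
# Invented ratings for eight houses
qual <- c(5, 5, 6, 7, 7, 7, 8, 5)
cond <- c(5, 6, 5, 5, 5, 5, 5, 5)

# Cross-tabulate: rows = Overall Quality, columns = Overall Condition
joint <- table(Overall.Qual = qual, Overall.Cond = cond)
joint

# Row/column position of the most frequent pair (first one if tied)
top <- which(joint == max(joint), arr.ind = TRUE)
most_common <- c(quality   = rownames(joint)[top[1, 1]],
                 condition = colnames(joint)[top[1, 2]])
most_common
```

Running `table(housing_data$Overall.Qual, housing_data$Overall.Cond)` the same way would confirm or refine which combinations the heat map shows as densest.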

More common combinations of quality and condition appear in brighter red, fading as the density of houses decreases. In the heat map above, houses with an overall quality of around 7 and an overall condition of 5 are the most common in Ames, followed by houses with an overall condition of 5 and overall qualities of 5, 6, and 8. So even though overall condition is concentrated at 5 (Average), quality varies across Average, Above Average, Good, and Very Good. This makes the distribution of Overall Quality worth a closer look. Specifically, we look at how the year a house was built relates to its sale price, with points colored by Overall Quality. This shows how the overall quality of houses in Ames has changed over time, whether there was a prolonged period in which only houses of Average to Very Good quality were built, and how different quality levels correspond to different prices, which helps us understand buying patterns. Below is a scatter plot of sale price against year built, colored by overall quality.

ggplot(housing_data, aes(x=Year.Built, y=SalePrice, color=as.factor(Overall.Qual))) +
  geom_point(alpha=0.5) +
  labs(title="Sale Price by Year Built, Colored by Overall Quality",
       x="Year Built",
       y="Sale Price")

The scatter plot suggests a slight exponential growth in sale price with year built. Sale prices stay fairly flat for houses built before about 1980, then rise, peaking for houses built around 2000. The colors show that overall quality has also changed substantially. Most houses built from the early 1900s onward have an overall quality of around 5 (Average), 6 (Above Average), or 7 (Good); only from around 1980 do we see houses rated 8 (Very Good), 9 (Excellent), and 10 (Very Excellent). Houses of quality 5, 6, and 7 were therefore built over a much longer period than other quality levels, which aligns with the heat map. Perhaps more houses of roughly average quality were built to suit the rural setting of Ames, given residents' cost of living and housing expectations. This may also explain why sale prices stay flat for houses built before 1980: their quality did not change much until then. Sale prices roughly triple from houses built around 1980 to those built in the early 2000s, which suggests that something changed during that period. To better understand this trend, we make another plot showing the moving average of prices over time.

average_sale_price <- housing_data %>%
  group_by(Year.Built) %>%
  summarize(Avg_Sale_Price = mean(SalePrice))

ggplot(average_sale_price, aes(x=Year.Built,y=Avg_Sale_Price)) +
  geom_line(color="purple") +
  stat_rollapplyr(width=2, align="left") +
  labs(x="Year", y="Average Sale Price", title="Moving Average of House Sale Prices")

The graph above shows that the moving average of sale prices rises with year built, with sharp swings around specific years, notably around 1890 and 1940, that may be related to recessions and financial crises. The Panic of 1893, for example, led to unemployment and bank failures, which could have lowered demand for housing and, with it, prices. Around 1930-1940, the Great Depression could similarly have depressed demand and prices. After that, prices rise fairly steadily, possibly reflecting sustained economic activity and a rising cost of living. One caveat: every sale in this dataset took place between 2006 and 2010, so the curve shows what houses built in each year sold for during 2006-2010, not prices at the time of construction, and years with only a handful of houses can swing the average sharply.
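The moving average in the plot uses a window of two rows (`width = 2`), so each smoothed point averages only two neighboring values of `Avg_Sale_Price`. A hand-rolled trailing version makes the window explicit. The sketch below uses `stats::filter()`, written with the `stats::` prefix because dplyr's `filter()` masks it, on invented yearly averages. The plot's `align = "left"` setting anchors each window at its first value rather than its last, which shifts the curve by one position but produces the same averages. In either case the window counts rows, so gaps in `Year.Built` mean a window can span more than two calendar years.

```r
# Trailing moving average of width k: out[i] = mean(x[(i - k + 1):i]);
# the first k - 1 positions have no full window and are NA
trailing_mean <- function(x, k) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}

# Invented yearly average prices (dollars)
avg_price <- c(100000, 120000, 90000, 150000, 170000)
res <- trailing_mean(avg_price, k = 2)
res   # -> NA 110000 105000 120000 160000
```

A wider window (for example `k = 5`) would smooth out single-year spikes caused by a few unusual houses, at the cost of blurring genuinely short-lived dips.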

In conclusion, these graphs suggest that the disproportionate growth in housing prices is not limited to urban areas but has also reached rural areas. For a long time, houses in Ames were average in both quality and condition, but more recently quality has risen, likely due to the rising cost of living and economic booms, resulting in higher sale prices.

Conclusion

In this analysis, we learned that the housing landscape in Ames, Iowa can be explored in many different ways to see how houses have changed.

First, we looked at the overall housing market in Ames to understand which kinds of houses and features are most prevalent. We did this by examining house styles, how the most common styles vary by neighborhood, and how some amenities have changed over time. We applied MDS to house area measurements (square feet) and colored the result by house style to see which styles were most prevalent: 1-story, 1.5-story, and 2-story houses were the most common. We then made a stacked bar chart of those house styles by neighborhood to explore which houses were most common in different parts of Ames. Neighborhoods such as North Ames and College Creek had more houses sold than other areas, and affluent neighborhoods like Gilbert had a larger share of 2-story houses. Lastly, we used a time series and a regression analysis to see how amenities such as garage area and open porch area changed over time and how they relate to sale price. Basement and garage space grew with year built, and basement size had the strongest statistical association with sale price (the largest t-statistic) in the regression. All the listed amenities were significant, but, as mentioned, the model has limitations that should be addressed in future analysis.

Second, we looked at how houses are being purchased and how the buying method relates to sale price. We did this with a boxplot of sale price by sale type and a violin plot of sale price by sale condition. New home sales, contracts with a 15% down payment and regular terms, conventional warranty deed sales, partial construction, and normal sale conditions had the highest median sale prices, while court officer deeds and contracts with low interest rates, low down payments, or both had the lowest. This helps us understand which sale methods and conditions buyers in a rural area use and how they relate to sale price.

Last, we looked at how the overall quality and condition of houses relate to sale price and how sale prices have changed over time. We did this with a heat map of quality against condition and two plots of sale price by year built, using overall quality and condition. Houses with higher overall quality had higher sale prices, but the majority of houses in Ames were around average in both quality and condition. Only in recent decades have more houses of higher quality been built, possibly related to economic booms and demand driven by the cost of living. This ties in with the research in the motivation section showing a disproportionate increase in sale prices, even in rural areas, due to demand and cost of living.

Overall, the data analysis of the housing environment in Ames, Iowa portrays an ever-growing housing industry, even in rural areas. It provides insight into possible factors behind sale price growth and into how the houses themselves have changed in characteristics and features to meet modern expectations. Some questions were left unanswered because of time or data constraints, such as how consumers' incomes have affected home-buying habits in Ames. This could show how income has shifted attitudes toward sales and demand during economic fluctuations; however, the dataset does not include household income. Also, even though the data is observational, econometric techniques such as regression discontinuity or difference-in-differences might help identify causal relationships between variables. Regression discontinuity would require a policy or event with a sharp cutoff, so that we could compare homeowners just before and just after it. Difference-in-differences would also require a policy change or program that defines a treatment group and a control group. These are more advanced techniques that would require more data, but they would be interesting to investigate. We look forward to continuing this analysis as we gather more data and learn more advanced statistical techniques.
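For reference, the basic two-group, two-period difference-in-differences estimator compares the before-and-after change in mean sale price for a treated group T with the same change for a control group C. The notation below is ours, not drawn from the dataset, and the estimate is only interpretable as causal under the usual parallel-trends assumption (without the policy, both groups' prices would have moved in parallel):

```latex
\hat{\delta}_{\mathrm{DiD}}
  = \left(\bar{Y}_{T,\mathrm{post}} - \bar{Y}_{T,\mathrm{pre}}\right)
  - \left(\bar{Y}_{C,\mathrm{post}} - \bar{Y}_{C,\mathrm{pre}}\right)

% Equivalent regression form: delta is the coefficient on the interaction
Y_{it} = \beta_0 + \beta_1 \,\mathrm{Treat}_i + \beta_2 \,\mathrm{Post}_t
       + \delta \,(\mathrm{Treat}_i \times \mathrm{Post}_t) + \varepsilon_{it}
```

In the two-group, two-period case, the OLS estimate of $\delta$ equals $\hat{\delta}_{\mathrm{DiD}}$ computed from the four group-period means.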