Introduction

Assessing the market value of a house is a challenging task, and one that every prospective homeowner must undertake. They must assess the value of the house’s location, exterior, interior, and other details, and combine all of that into a price they are willing to pay. We are interested in using data to understand the value that homeowners generally place on different aspects of a property. Our goal is to understand how different factors such as location, quality, condition, year built, year remodeled, and date sold impact housing prices in Ames, Iowa.

Dataset

We used the ‘House Prices in Ames, Iowa’ dataset (https://cmustatistics.github.io/data-repository/money/ames-housing.html), which consists of information regarding the value of houses sold in Ames from 2006 to 2010. The dataset includes 82 variables. We use the subset listed below, together with two derived variables (square_feet and price_square_foot), to determine how different factors such as quality, condition, zone, year built, and year remodeled impact housing prices.

Neighborhood - Location within Ames neighborhoods. Consult documentation file for full names. (Categorical)

Overall.Qual - Overall material and finish of the house. 1 to 10 scale, where 1 = very poor and 10 = very excellent. (Ordinal)

Overall.Cond - Overall condition of the house, on the same 1 to 10 scale. (Ordinal)

SalePrice - Price house sold for (dollars) (Quantitative)

Lot.Area - Lot size (square feet) (Quantitative)

MS.Zoning - Zoning of the lot. A = agriculture, C = commercial, FV = floating village residential, I = industrial, RH = residential high density, RL = residential low density, RP = residential low density park, RM = residential medium density (Categorical)

Yr.Sold - Year Sold (Discrete)

Mo.Sold - Month Sold (Discrete)

Year.Remod.Add - Year the home was remodeled or added to. Same as the year built if no major remodeling or additions have been done. (Discrete)

square_feet - House size (Quantitative), defined as the sum of above-ground living area (Gr.Liv.Area) and total basement area (Total.Bsmt.SF)

price_square_foot - Price per square foot in dollars (Quantitative), defined as SalePrice divided by square_feet

Misc.Val - Value of the miscellaneous features, in dollars (Quantitative)
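
As a minimal sketch of how the two derived variables follow from the definitions above, the Python snippet below applies them to two made-up records. The output in this report comes from R, so Python is used here only for illustration; the record values and the helper name add_derived are hypothetical.

```python
# Hypothetical records: field names follow the dataset, values are made up.
houses = [
    {"SalePrice": 200000, "Gr.Liv.Area": 1500, "Total.Bsmt.SF": 1000},
    {"SalePrice": 150000, "Gr.Liv.Area": 1200, "Total.Bsmt.SF": 800},
]


def add_derived(house):
    """Return a copy of the record with square_feet and price_square_foot added."""
    out = dict(house)
    # House size: above-ground living area plus basement area.
    out["square_feet"] = house["Gr.Liv.Area"] + house["Total.Bsmt.SF"]
    # Price per square foot: sale price divided by house size.
    out["price_square_foot"] = house["SalePrice"] / out["square_feet"]
    return out


derived = [add_derived(h) for h in houses]
for h in derived:
    print(h["square_feet"], h["price_square_foot"])
```

A record with zero total area would raise a division error here, so real data may need a check for that case before dividing.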

Research Questions

Question 1: What is the effect of location on housing prices in Ames, Iowa?

To answer this question, we first consider the relationship between housing prices and house size. This is necessary because some neighborhoods are likely wealthier and have bigger houses, which makes house size a potential confounding variable. We want to see whether buyers pay a premium for certain locations beyond what house size alone explains. To do this, we first show the relationship between house size, measured by square_feet, and sale price. We create a scatterplot of sale price against square footage, grouped by neighborhood.

This scatterplot shows a positive correlation between sale price and house size: bigger houses do indeed sell for higher prices. The relationship also looks roughly linear, which suggests there is no additional premium per square foot for larger houses (as there would be if extremely large, scarce houses put upward pressure on price per square foot). We can therefore control for size simply by dividing price by square footage. We conclude that, to compare housing prices accurately across neighborhoods, we must control for house size in order to capture the premium paid for a location beyond size alone.

Knowing this, we can now plot sale price per square foot against neighborhood using a faceted box plot. If location, represented by Neighborhood, has no relationship with sale price per square foot, we would expect the box plots to have similar medians and interquartile ranges. If location does have a relationship with sale price per square foot, we would expect the box plots to differ in their medians and interquartile ranges.

These box plots show that houses in certain neighborhoods sell for higher prices, and that this relationship remains even after controlling for house size. Location does appear to be an important factor in predicting sale price per square foot. This is further reflected in the global F-test below, which tests whether any neighborhood differs in mean price per square foot from a base neighborhood to a statistically significant degree.

## Analysis of Variance Table
## 
## Response: price_square_foot
##                Df Sum Sq Mean Sq F value    Pr(>F)    
## Neighborhood   27 274182 10154.9   63.76 < 2.2e-16 ***
## Residuals    2901 462039   159.3                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The global F-test shows that neighborhood is a statistically significant predictor (F = 63.76, p < 2.2e-16): at least one neighborhood differs from the base neighborhood in mean sale price per square foot.
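
The F value in the table can be recovered from the reported sums of squares and degrees of freedom. The sketch below does that arithmetic in Python, used for illustration only since the table itself is R output. The quantity eta_squared, the share of total variation attributed to neighborhood, is an addition that the table does not report.

```python
# Values copied from the ANOVA table above.
ss_neighborhood, df_neighborhood = 274182, 27
ss_residual, df_residual = 462039, 2901

# Mean squares: sum of squares divided by degrees of freedom.
ms_neighborhood = ss_neighborhood / df_neighborhood
ms_residual = ss_residual / df_residual

# F statistic: between-neighborhood variation relative to within-neighborhood variation.
f_value = ms_neighborhood / ms_residual

# Not in the table: proportion of total variation explained by neighborhood.
eta_squared = ss_neighborhood / (ss_neighborhood + ss_residual)

print(round(ms_neighborhood, 1), round(ms_residual, 1), round(f_value, 2))
print(round(eta_squared, 3))
```

The recomputed F value agrees with the 63.76 in the table, and by this calculation neighborhood accounts for roughly 37% of the variation in price per square foot.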

Lastly, we are interested in whether some neighborhoods had an unusually high number of house sales in the 2006 to 2010 period. Since this period coincides with the 2007-2009 financial crisis, it is worth noting whether neighborhoods within Ames, Iowa were affected differently by the crisis.

This word cloud shows which neighborhoods had more houses sold relative to others in Ames, Iowa. Our data shows that North Ames (shortened to Names) had the most houses sold, followed by College Creek (collgcr), Old Town (oldtown), and Edwards (edwards). Neighborhoods with the fewest houses sold include Greens (greens), Blue Stem (blueste), and Pine Knoll Village (pnkvill). We conjecture that neighborhoods with more houses sold in this period were more adversely affected than neighborhoods with fewer houses sold, although raw sales counts also depend on how many homes each neighborhood contains.

Question 2: How do housing prices in Ames differ across time, including market shocks and house age?

We next explore how housing prices change over time, focusing specifically on market shocks and house age.

First, we are interested in how housing market volatility during the 2006 to 2010 period affected house sales in Ames, Iowa. Was Ames relatively insulated from these shocks, or was there a significant drop in sale prices after 2007? We control for the size of the house by using sale price per square foot.

The first relationship we examine is whether the average sale price changed over time, depending on when the house was sold. We control for house size by looking at the average price per square foot. Using a rolling average with a window of 6 months, we see some fluctuations and a slight downward trend in the average price per square foot for houses in Ames sold from 2006 to 2010. This suggests that price per square foot in Ames, Iowa was somewhat affected by the 2008 recession, though apparently less than the broader market. This is helpful, as it lets us draw broader conclusions from our data with less concern about the housing inflation and deflation occurring in the overall market at the time.
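
A rolling average of width 6 can be sketched as follows in Python. The report does not say whether the window is trailing or centered, so the trailing window here is an assumption, and the monthly values are made up for illustration.

```python
def rolling_mean(values, width=6):
    """Trailing rolling mean; positions without a full window are None."""
    out = []
    for i in range(len(values)):
        if i + 1 < width:
            # Fewer than `width` observations so far: no average yet.
            out.append(None)
        else:
            window = values[i + 1 - width : i + 1]
            out.append(sum(window) / width)
    return out


# Hypothetical monthly average prices per square foot (made-up numbers).
monthly_avg = [60, 62, 61, 63, 59, 61, 58, 60, 57, 59]
smoothed = rolling_mean(monthly_avg, width=6)
print(smoothed)
```

Averaging over six months damps month-to-month noise, which makes a slow drift in the series easier to see than in the raw monthly values.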

Changing sale prices could affect buyers’ behavior, as hotter and cooler markets could draw more or fewer potential buyers. Hence, we examine the distribution of house sales across the 2006-2010 period.

We want to look at when houses were bought most often across the window from 2006 to 2010. The first plot shows that the number of houses sold per year was fairly level across the first 4 years, before dipping in 2010. To understand why, we look at the number of houses sold per month across these 5 years. The monthly breakdown shows two notable features. First, home buying appears to be seasonal, with the summer months of June and July having the highest sales every year. Second, it explains the dip in 2010: our data does not run to the end of that year, but ends in July 2010.
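
The yearly and monthly tallies described above can be sketched in Python with a few counters. The (Yr.Sold, Mo.Sold) pairs below are made up, and the last_month dictionary is a hypothetical helper for spotting a year whose records stop early.

```python
from collections import Counter

# Hypothetical (Yr.Sold, Mo.Sold) pairs: made-up data, not the Ames records.
sales = [
    (2009, 6), (2009, 6), (2009, 7), (2009, 12),
    (2010, 1), (2010, 6), (2010, 6), (2010, 7),
]

# Sales per year and per calendar month (pooled over years).
per_year = Counter(year for year, _ in sales)
per_month = Counter(month for _, month in sales)

# Latest month with any sale in each year: reveals a year that ends early.
last_month = {}
for year, month in sales:
    last_month[year] = max(month, last_month.get(year, 0))

print(per_year)
print(per_month.most_common(2))
print(last_month)
```

Comparing yearly totals only makes sense between years that cover the same months, which is why the last recorded month per year is worth checking first.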

Since we do not see any natural change in house sales across the years, and there is little evidence that prices change based on when the house was sold, we examine whether the age and modernity of a house affect how much it sells for.

Finally, we examine the relationship between sale price per square foot and the year the house was built, to understand whether newer houses sell for higher prices. Simply using the year a house was built is not the best approach, however, because houses are often modernized and remodeled. Instead, we examine the relationship between sale price and the year a house was most recently remodeled (or the year it was built, for houses that were never remodeled). The trend panel of the seasonal decomposition plot shows that prices increase when a house was more recently built or remodeled. The third plot, representing seasonality, looks at the detrended data to reveal any fluctuations; there appear to be peaks and valleys, likely representing natural fluctuations in the housing market.

Question 3: What is the impact of housing quality and condition on sale price, and how is this affected by other variables?

We have seen that houses that have been more recently built, remodeled, or added to typically command higher sale prices per square foot. One possible explanation is that upkeep is not maintained and the condition of a house declines over time. Alternatively, newer houses may be built with more modern materials that are of higher quality and more desirable to homebuyers. We now seek to understand the influence of the overall condition and overall quality of homes on sale price per square foot.

We created density plots of sale price per square foot, grouped by overall condition and overall quality. In each plot, we consolidated overall condition and overall quality into levels from 1 to 5 (where ratings 1-2 became level 1, 3-4 became level 2, etc.) to make the plots more readable. These plots show that price per square foot does seem to be associated with both condition and quality. However, the effect of improvements in condition seems to fall off after the third level (overall condition of 5-6), while improvements in quality consistently increase sale price per square foot.
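
The consolidation of the 1 to 10 ratings into five levels can be written as a single integer division. This is a minimal Python sketch of the mapping described above; the function name consolidate is hypothetical.

```python
def consolidate(rating):
    """Map a 1-10 rating to levels 1-5: 1-2 -> 1, 3-4 -> 2, ..., 9-10 -> 5."""
    if not 1 <= rating <= 10:
        raise ValueError("rating must be between 1 and 10")
    # Adding 1 before halving pairs each odd rating with the even one above it.
    return (rating + 1) // 2


levels = {r: consolidate(r) for r in range(1, 11)}
print(levels)
```

Under this mapping, the third level corresponds to original ratings of 5 and 6, matching the threshold mentioned for overall condition.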

The findings from the density plots are supported by the output of the linear model and associated t-tests shown below. The t-test for each variable tests whether there is sufficient evidence that its coefficient differs from zero. That is, the tests determine whether overall quality and overall condition have a statistically significant association with sale price per square foot.

## 
## Call:
## lm(formula = price_square_foot ~ Overall.Qual + Overall.Cond, 
##     data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -80.362  -7.617  -0.400   7.353 102.609 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   22.1450     1.6979  13.043  < 2e-16 ***
## Overall.Qual   6.3265     0.1723  36.716  < 2e-16 ***
## Overall.Cond   1.7134     0.2187   7.834 6.57e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.1 on 2926 degrees of freedom
## Multiple R-squared:  0.3185, Adjusted R-squared:  0.318 
## F-statistic: 683.7 on 2 and 2926 DF,  p-value: < 2.2e-16

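The linear model output shows that both overall quality and overall condition have a statistically significant positive association with sale price per square foot, and that the association is stronger for quality. Holding the other variable fixed, a one-point increase in overall quality is associated with an increase of about $6.33 per square foot, compared with about $1.71 for a one-point increase in overall condition.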
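
To make the coefficients concrete, the Python sketch below plugs the estimates from the model output into the fitted equation. The function name and the example ratings are hypothetical, and the results are fitted averages under this two-variable model, not predictions for any particular house.

```python
# Coefficient estimates copied from the linear model output above.
INTERCEPT = 22.1450
B_QUAL = 6.3265
B_COND = 1.7134


def predicted_price_per_sqft(overall_qual, overall_cond):
    """Fitted price per square foot (dollars) for given quality and condition."""
    return INTERCEPT + B_QUAL * overall_qual + B_COND * overall_cond


# Compare a house rated 5 on both scales with one-point improvements in each.
base = predicted_price_per_sqft(5, 5)
better_quality = predicted_price_per_sqft(6, 5)
better_condition = predicted_price_per_sqft(5, 6)

print(round(base, 1))
print(round(better_quality - base, 2), round(better_condition - base, 2))
```

Because the model is additive, the gain from a one-point improvement is the same at every starting rating, which is a simplification relative to the leveling-off seen for condition in the density plots.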

Lastly, we are interested in understanding the interplay between the variables we have previously looked at. As hypothesized before making the density plots, is year built/remodeled indeed associated with overall condition and quality? Are bigger houses associated with better quality, and do they have additional features people may be willing to pay a premium for on a per square foot basis? To answer these questions and potential others that may arise for a viewer, we create a correlation plot that includes quantitative variables we have previously looked at and adds “Misc.Val”.

This shows that overall condition is not correlated with Year.Remod.Add, which indicates that homes are not necessarily neglected and that condition does not necessarily decline over time. The plot also shows that quality and condition are not correlated, which indicates that better homes don’t necessarily get better upkeep. Additionally, square footage is not clearly related to overall condition, but it is positively correlated with overall quality. This suggests that bigger houses were more likely built with better materials, but that these houses aren’t necessarily better taken care of. The size of the house, measured with square_feet, does not have a strong correlation with sale price per square foot, indicating that there isn’t a significant premium per square foot paid for larger homes (sale price scales roughly linearly with size). Lastly, the value of miscellaneous features is not correlated with square_feet, indicating that bigger houses don’t necessarily have more of the additional features that homebuyers may want.
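
Each cell of a correlation plot is a pairwise correlation coefficient. As a sketch, assuming the plot uses the Pearson coefficient (the report does not state which one), the Python function below computes it from scratch on made-up values chosen to show one strong and one absent correlation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical values for five houses (made-up, not the Ames records).
square_feet = [1000, 1500, 2000, 2500, 3000]
overall_qual = [4, 5, 5, 7, 8]
overall_cond = [5, 3, 4, 3, 5]

r_size_quality = pearson(square_feet, overall_qual)
r_size_condition = pearson(square_feet, overall_cond)
print(round(r_size_quality, 2), round(r_size_condition, 2))
```

A coefficient near zero only rules out a linear relationship, so a pattern such as the leveling-off of condition could still exist without appearing in the correlation plot.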

Conclusion

In summary, our study highlights several key determinants of housing prices in Ames, Iowa. Location, as demonstrated through neighborhood variation, emerges as a significant predictor even after controlling for house size. Certain neighborhoods have higher prices per square foot, which indicates a premium associated with specific areas. Exploring the impact of market shocks revealed a modest effect of the 2008 recession on housing prices in Ames. The investigation into housing age and remodeling showed a positive relationship between more recent builds or remodels and higher sale prices per square foot, which suggests a preference for newer or renovated properties.

Quality and condition also play important roles, with higher-quality homes consistently commanding higher prices. However, while quality significantly influences prices, the effect of condition diminishes after a certain threshold. Interdependencies between variables revealed that better quality doesn’t necessarily equate to better upkeep over time. Larger houses tend to be of higher quality but are not necessarily better maintained. Moreover, the value of miscellaneous features doesn’t correlate with square footage, suggesting buyers may prioritize specific amenities irrespective of house size.

Looking ahead, future work might involve a more detailed exploration of specific neighborhood characteristics, such as amenities, schools, or infrastructure developments. Additionally, understanding the fluctuations in housing prices across economic cycles could provide deeper insights into the Ames housing market.