Analysing Portugal’s Real Estate Market

The real estate market is a dynamic and complex sector that directly impacts individuals, communities, and economies. In Portugal, recent years have witnessed significant shifts in property prices, driven by factors such as urban development, tourism, and changing housing demands. This report focuses on a comprehensive dataset of over 100,000 real estate listings from across Portugal and explores the interactions between property features, geographical locations, and asking prices.

By analysing these trends, we aim to provide insights that can inform investment strategies, urban planning, and market predictions. Beyond individual property assessments, the dataset allows for broader market analysis. Understanding price trends over time and identifying correlations between features can reveal larger economic narratives, such as the influence of governmental housing policies or shifts in buyer preferences. The findings from this study can help a wide audience, from homeowners and investors to urban planners and policymakers, better understand the evolving Portuguese real estate market.

As such, the aim of our analysis is threefold: we seek to discover the regional variations in house asking prices across Portugal, identify how specific house features such as location and amenities influence pricing, and examine the role of energy efficiency ratings in shaping market trends. Additionally, we aim to conduct a time-series analysis to understand how prices have evolved over time, providing a comprehensive perspective on the dynamics of the Portuguese housing market.

In this report, we identify features of interest by conducting a one-way analysis of variance (ANOVA) test to uncover significant variables, and utilise those results in further analysis.

Consequently, the following questions arise:

  1. How do amenities, like the number of bathrooms in a house, affect the asking price of real estate properties in Portugal?

  2. How does energy certification affect the asking price of properties across different regions in Portugal?

  3. Are there significant regional disparities in real estate pricing trends across Portugal, and how do these disparities align with economic or demographic factors?


Analysing the Importance of House Features

We wanted to learn about how various features of real estate listings varied by geographical location across Portugal. To do this, we examined three features—property area, asking price, and construction year—across five major districts in Portugal.


The above graph provides a comparative analysis of the three features across the five districts (Braga, Faro, Lisboa, Porto, and Setúbal) using side-by-side boxplots. For construction year, the districts exhibit a relatively narrow range, with the median construction year remaining consistent across districts, suggesting that most properties in these districts were built in the late twentieth century, though Lisboa shows a slightly older building stock on average. Gross area displays greater variability, with Braga having the widest range and the highest median area, while Faro shows smaller properties on average, indicated by its lower median and interquartile range. Property asking price differs substantially across the five districts, with Lisboa and Setúbal showing higher property prices than Braga. All districts appear to have a wide range of property prices. These variations highlight how property characteristics vary geographically, with the disparities in gross area and price across districts being particularly pronounced.

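Now that we’ve performed a visual comparison of each of the three features, we’ll run a one-way ANOVA on each variable separately to test whether its mean differs across districts. As the output below indicates, we use the variant that does not assume equal variances across groups (Welch's test).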

GrossArea:

## 
##  One-way analysis of means (not assuming equal variances)
## 
## data:  Value and District
## F = 54.791, num df = 4.0, denom df = 5555.4, p-value < 2.2e-16

ConstructionYear:

## 
##  One-way analysis of means (not assuming equal variances)
## 
## data:  Value and District
## F = 148.92, num df = 4, denom df = 17470, p-value < 2.2e-16

Price:

## 
##  One-way analysis of means (not assuming equal variances)
## 
## data:  Value and District
## F = 929.85, num df = 4, denom df = 26003, p-value < 2.2e-16

All three tests return p-values below 2.2e-16, so we reject the null hypothesis of equal means in each case and conclude that all three features differ significantly across the five districts. One caveat is that one-way ANOVA compares means, which are sensitive to the extreme outliers present in these distributions, so a significant result may partly reflect a handful of extreme listings rather than a difference between typical properties.
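The test reported above can be reproduced in outline with base R's `oneway.test()`. The sketch below runs on synthetic data, so the `toy` data frame and its values are assumptions made for illustration rather than the report's dataset. It also adds a Kruskal-Wallis test, which is not part of the original analysis, as one possible rank-based check on the outlier caveat.

```r
# Synthetic stand-in for the listings data; the values are illustrative only.
set.seed(42)
toy <- data.frame(
  District = rep(c("Braga", "Lisboa", "Faro"), each = 200),
  Value = c(
    rnorm(200, mean = 250, sd = 60),   # group with a small spread
    rnorm(200, mean = 500, sd = 200),  # group with a much larger spread
    rnorm(200, mean = 600, sd = 150)
  )
)

# var.equal = FALSE requests Welch's test, which does not assume that the
# groups share a common variance.
welch <- oneway.test(Value ~ District, data = toy, var.equal = FALSE)
print(welch)

# Rank-based alternative (an addition, not the report's method): it compares
# ranks rather than means, so a few extreme values have less influence.
kw <- kruskal.test(Value ~ District, data = toy)
print(kw)
```

If both tests agree, the conclusion is less likely to be driven by a few extreme listings.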

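Price has by far the largest F-statistic of the three features (929.85, compared with 148.92 for ConstructionYear and 54.791 for GrossArea), which suggests that it differs most strongly between districts, so we will continue our analysis with property asking prices across districts.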

The choropleth map of Portugal depicts the average property asking prices across its districts. The colors range from blue (indicating lower prices around €100k) to red (indicating higher prices, €600k and above). The map highlights a clear geographical disparity in property prices, with southern and coastal areas generally being more expensive, whereas districts in the northern and inland regions are shaded in blue and light purple, indicating relatively lower property prices. The difference in average asking prices between districts is also quite drastic, with some districts averaging around €100k and others averaging over €600k.

The map also includes labels for our districts of interest. All five districts (Braga, Porto, Lisboa, Setúbal, and Faro) are coastal, which is consistent with their higher average property prices. The southern districts of Lisboa, Setúbal, and Faro are colored in red and fuchsia, indicating an average property price of around €500k to €600k. In contrast, the northern districts of Braga and Porto are colored in purple, representing more moderate average asking prices of around €300k. The price disparity between northern and southern districts therefore holds even among the most populated districts in Portugal.

Amenities discussion

One key area of focus is the relationship between house features—specifically, the number of bathrooms—and property prices. Bathrooms often serve as a proxy for a property’s level of comfort and utility, factors that significantly influence buyer preferences and valuation. They are not only a functional necessity but also a reflection of a home’s modernity and convenience, making them a critical element in assessing property worth. By emphasising bathrooms as a predictor, this report aims to highlight their impact on pricing trends, offering nuanced perspectives for stakeholders, including investors, developers, and policymakers. Ultimately, understanding these correlations provides a window into broader economic patterns and buyer behaviours shaping the Portuguese real estate landscape.

library(ggplot2)
library(dplyr)


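# Keep only the columns needed and coerce them to numeric before dropping NAs,
# so that values which fail the conversion are removed as well.
filtered_data <- data[, c("Price", "NumberOfBathrooms", "ConstructionYear")]
filtered_data$Price <- as.numeric(filtered_data$Price)
filtered_data$NumberOfBathrooms <- as.numeric(filtered_data$NumberOfBathrooms)
filtered_data$ConstructionYear <- as.numeric(filtered_data$ConstructionYear)
filtered_data <- na.omit(filtered_data)

# Restrict the sample to properties constructed after 2000.
filtered_data <- filtered_data %>%
  filter(ConstructionYear > 2000)

# Remove price outliers with the 1.5 * IQR rule.
# The spread is stored as price_iqr so that stats::IQR() is not masked.
Q1 <- quantile(filtered_data$Price, 0.25)
Q3 <- quantile(filtered_data$Price, 0.75)
price_iqr <- Q3 - Q1
lower_bound <- Q1 - 1.5 * price_iqr
upper_bound <- Q3 + 1.5 * price_iqr

filtered_data <- filtered_data %>%
  filter(Price >= lower_bound & Price <= upper_bound)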


filtered_data <- filtered_data %>%
  mutate(BathroomGroup = case_when(
    NumberOfBathrooms == 0 ~ "0 Bathrooms",
    NumberOfBathrooms == 1 ~ "1 Bathroom",
    NumberOfBathrooms == 2 ~ "2 Bathrooms",
    NumberOfBathrooms == 3 ~ "3 Bathrooms",
    NumberOfBathrooms == 4 ~ "4 Bathrooms",
    NumberOfBathrooms >= 5 ~ "5+ Bathrooms"
  ))


ggplot(filtered_data, aes(x = BathroomGroup, y = Price)) +
  geom_violin(fill = "lightpink", color = "brown", alpha = 0.7) +
  labs(
    title = "Violin Plot of House Prices by Bathroom Groups (Constructed After 2000)",
    x = "Bathroom Groups",
    y = "House Price (€)"
  ) +
  theme_minimal()
## Warning: Groups with fewer than two data points have been dropped.

The above violin plot illustrates the distribution of house prices across bathroom groups for houses constructed after 2000, after price outliers have been removed with the 1.5 × IQR rule. Properties with no bathrooms are concentrated at lower price ranges, primarily below €100,000, which may reflect their limited appeal and functionality. As the number of bathrooms increases, the price distributions shift upwards, indicating higher property values. Houses with one or two bathrooms show broader price ranges, with significant density around €100,000–€500,000, appealing to mid-market buyers. Properties with three or four bathrooms exhibit higher concentrations in the €300,000–€700,000 range, while houses with five or more bathrooms are predominantly in the upper price brackets (€600,000–€900,000), consistent with their status as luxury homes. Because the outlier filter caps the prices shown, the most expensive listings do not appear in the plot. Overall, the plot shows a clear positive association between the number of bathrooms and house prices in modern housing.
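The outlier filter applied before plotting can be checked by hand on a small vector. The following is a minimal worked sketch of the 1.5 × IQR rule. The seven prices are made-up values rather than listings from the dataset, and the quartiles use R's default `quantile()` method.

```r
# Toy asking prices in euros; illustrative values, not taken from the dataset.
prices <- c(100000, 120000, 130000, 150000, 160000, 180000, 2000000)

q1 <- quantile(prices, 0.25, names = FALSE)
q3 <- quantile(prices, 0.75, names = FALSE)
price_iqr <- q3 - q1  # named price_iqr to avoid masking stats::IQR()

# Anything outside [q1 - 1.5 * IQR, q3 + 1.5 * IQR] is treated as an outlier.
lower_bound <- q1 - 1.5 * price_iqr
upper_bound <- q3 + 1.5 * price_iqr

kept <- prices[prices >= lower_bound & prices <= upper_bound]
print(c(lower = lower_bound, upper = upper_bound))
print(kept)
```

Here the quartiles are €125,000 and €170,000, so the bounds are €57,500 and €237,500 and only the €2,000,000 listing is dropped. Note that the bounds depend on the sample being filtered, so applying the rule after the construction-year filter gives different cut-offs than applying it to the full dataset.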

Another goal is to explore how specific features of real estate listings, particularly the number of bathrooms, vary across different geographical locations in Portugal. Bathrooms often serve as a key indicator of a property’s functionality and appeal, reflecting its suitability for modern living standards. By examining this feature alongside other attributes, such as asking price and property area, across five major districts, we aim to uncover how regional differences influence the role and valuation of bathrooms. This focus provides valuable insights into how this essential feature contributes to property dynamics in diverse locations, offering a nuanced perspective on the interplay between geography and housing preferences.

Scatterplot of price and number of bathrooms

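# Coerce the numeric columns first, then drop rows with missing values,
# so that values which fail the conversion are removed as well.
filtered_data <- data[, c("Price", "NumberOfBathrooms", "District")]
filtered_data$Price <- as.numeric(filtered_data$Price)
filtered_data$NumberOfBathrooms <- as.numeric(filtered_data$NumberOfBathrooms)
filtered_data <- na.omit(filtered_data)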


filtered_data <- filtered_data %>%
  mutate(PriceRange = cut(
    Price,
    breaks = c(0, 100000, 250000, 500000, 1000000, Inf),
    labels = c("0-100K", "100K-250K", "250K-500K", "500K-1M", "1M+"),
    include.lowest = TRUE
  ))


ggplot(filtered_data, aes(x = District, y = NumberOfBathrooms, color = PriceRange)) +
  geom_jitter(alpha = 0.6, size = 2) +
  scale_color_brewer(palette = "Spectral", name = "House Price Range") +
  labs(
    title = "Number of Bathrooms Across Regions by Price Range",
    x = "District",
    y = "Number of Bathrooms"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

This jittered scatter plot shows the number of bathrooms in properties across the districts of Portugal, with house prices categorised into five ranges. The majority of properties have between 0 and 5 bathrooms, and lower-priced homes (€0–100K, red) dominate this range. Higher-priced properties (€500K–1M and €1M+, green and blue) are more prevalent in metropolitan and high-demand districts like Lisboa, Porto, and Faro, and they are often associated with higher bathroom counts, which points to luxury features. Smaller districts, such as Bragança and Évora, mostly feature lower-priced properties with fewer bathrooms, reflecting more modest housing markets. Notably, a few properties report extremely high bathroom counts (e.g., over 50); these likely represent large estates or commercial buildings, although data-entry errors cannot be ruled out. Overall, the plot suggests a positive association between bathroom count and property price, with luxury and high-end properties concentrated in wealthier regions, while affordable homes with fewer bathrooms dominate rural and less populated districts.
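The association described above is read off the plot, and one way to quantify it would be a rank correlation. The sketch below is not part of the original analysis. It computes Spearman's rho on synthetic data, so the `bathrooms` and `price` vectors are assumptions for illustration. Spearman's rho is chosen here because it depends only on ranks, which keeps extreme bathroom counts from dominating the estimate.

```r
# Synthetic data: price rises with bathroom count plus noise (illustrative only).
set.seed(1)
bathrooms <- sample(1:5, size = 500, replace = TRUE)
price <- 100000 + 80000 * bathrooms + rnorm(500, sd = 50000)

# exact = FALSE avoids the warning that cor.test() raises when ranks are tied,
# which is unavoidable with a small set of integer bathroom counts.
rho_test <- cor.test(bathrooms, price, method = "spearman", exact = FALSE)
print(rho_test$estimate)
print(rho_test$p.value)
```

On the real data, the same call could be run per district to check whether the association holds within regions and not only across them.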

Energy Efficiency and House Price

We next examine how energy certification ratings vary across property types and districts in Portugal, using the EnergyCertificate, District, and Type variables.

library(ggplot2)
library(dplyr)

# Map certificate grades to a numeric scale (5 = most efficient, 1 = least).
# "NC" and any grade not listed here are treated as missing via .default.
data$EnergyCertificate <- recode(data$EnergyCertificate,
                                  "A+" = 5, "A" = 4, "B" = 3, "C" = 2, "D" = 1,
                                  "NC" = NA_real_, .default = NA_real_)
bar_chart_data <- data %>%
  group_by(Type) %>%
  summarize(AverageEnergyRating = mean(EnergyCertificate, na.rm = TRUE)) %>%
  filter(!is.na(AverageEnergyRating)) %>%
  arrange(desc(AverageEnergyRating))

ggplot(bar_chart_data, aes(x = reorder(Type, AverageEnergyRating), y = AverageEnergyRating, fill = AverageEnergyRating)) +
  geom_bar(stat = "identity") +
  scale_fill_gradient(low = "blue", high = "red") +
  labs(
    title = "Average Energy Certification Ratings by Property Type",
    x = "Property Type",
    y = "Average Energy Certification Rating"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

The above graph compares the average energy certification ratings across various property types in Portugal, with ratings ranging from 1 (least efficient) to 5 (most efficient). Only grades A+ through D are mapped to a score; any other grade is treated as missing and excluded from the averages. Land and Garage properties exhibit the highest ratings, which may reflect greater energy efficiency or adherence to modern building standards, while Studio, Industrial, and Apartment properties also show relatively high ratings, possibly due to recent construction practices. In contrast, Mansions, Buildings, and Other Residential properties have lower average ratings, suggesting older building stock or limited energy efficiency measures. Other categories, such as Farm, Office, and Store, display moderate ratings, indicating a mix of older and newer constructions. These trends highlight how energy efficiency varies with property type, providing valuable insights for targeting improvements in specific categories.
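The scoring behind the bar chart can be sketched in base R with a named lookup vector. The `toy` data frame below is made up for illustration, and the extra `missing_share` summary is an addition that the report does not compute. It shows how much of each property type is dropped from the average because its grade has no score.

```r
# Grade-to-score lookup matching the scale used in the report.
rating_scale <- c("A+" = 5, "A" = 4, "B" = 3, "C" = 2, "D" = 1)

# Toy listings; illustrative values, not taken from the dataset.
toy <- data.frame(
  Type = c("Apartment", "Apartment", "Apartment", "House", "House", "House"),
  EnergyCertificate = c("A+", "A", "F", "B", "NC", "D")
)

# Grades without an entry in rating_scale (here "F" and "NC") become NA.
toy$Score <- unname(rating_scale[toy$EnergyCertificate])

# Average score per property type, ignoring unscored listings.
avg_rating <- tapply(toy$Score, toy$Type, mean, na.rm = TRUE)

# Share of listings per type that had no score and were left out.
missing_share <- tapply(is.na(toy$Score), toy$Type, mean)

print(avg_rating)
print(missing_share)
```

A high missing share for a property type would be a reason to read its average rating with caution.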


The heatmap above provides a comparative analysis of average energy certification ratings across districts (on the y-axis) and property types (on the x-axis) in Portugal. Each cell represents the average energy rating for a specific combination of district and property type, with a color gradient ranging from blue (low ratings) to red (high ratings) and grey indicating missing or insufficient data. The graph reveals clear variability in energy certification across property types and districts. Certain districts, such as Lisboa and Porto, show more data coverage with a diverse range of ratings, while others, like Ilha da Madeira and Guarda, have more grey cells, indicating gaps in available data for specific property types. Property types such as Land and House tend to have higher energy ratings across several districts, while others, such as Building, Office, and Warehouse, often show lower ratings. This heatmap highlights significant disparities in energy efficiency, suggesting that property type and geographical location play crucial roles in influencing energy certification ratings.
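The grid of averages behind a heatmap like this one can be sketched with a two-way `tapply()`. The data below are synthetic and the `Rating` column name is an assumption. The point of the example is that a district and property type combination with no listings yields `NA`, which corresponds to a grey cell.

```r
# Toy listings with numeric ratings; illustrative values only.
toy <- data.frame(
  District = c("Lisboa", "Lisboa", "Lisboa", "Porto", "Porto", "Guarda"),
  Type = c("Apartment", "Apartment", "Office", "Apartment", "Office", "Apartment"),
  Rating = c(4, 2, 1, 3, 2, 5)
)

# Rows are districts, columns are property types, cells are mean ratings.
# Combinations that never occur (Guarda x Office here) are returned as NA.
rating_grid <- tapply(toy$Rating, list(toy$District, toy$Type), mean)
print(rating_grid)

# Number of empty cells, i.e. combinations with no data.
empty_cells <- sum(is.na(rating_grid))
print(empty_cells)
```

Counting listings per cell in the same way (with `length` in place of `mean`) would show which averages rest on only a few properties.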

Time Series Analysis

Graph 1:

## Reading layer `pt' from data source `/Users/julialiu/Downloads/pt_shp/pt.shp' using driver `ESRI Shapefile'
## Simple feature collection with 20 features and 3 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -31.2849 ymin: 30.02924 xmax: -6.205947 ymax: 42.15363
## Geodetic CRS:  WGS 84
## Simple feature collection with 20 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -31.2849 ymin: 30.02924 xmax: -6.205947 ymax: 42.15363
## Geodetic CRS:  WGS 84
## First 10 features:
##      id             name                 source avg_price
## 1  PT07            Évora https://simplemaps.com 323348.79
## 2  PT12       Portalegre https://simplemaps.com 169275.07
## 3  PT16 Viana do Castelo https://simplemaps.com 215425.39
## 4  PT05   Castelo Branco https://simplemaps.com  97930.23
## 5  PT09           Guarda https://simplemaps.com 123499.78
## 6  PT04         Bragança https://simplemaps.com 182441.15
## 7  PT17        Vila Real https://simplemaps.com 218037.00
## 8  PT03            Braga https://simplemaps.com 249120.28
## 9  PT08             Faro https://simplemaps.com 683788.60
## 10 PT02             Beja https://simplemaps.com 326602.47
##                          geometry
## 1  MULTIPOLYGON (((-7.225878 3...
## 2  MULTIPOLYGON (((-7.225878 3...
## 3  MULTIPOLYGON (((-8.179265 4...
## 4  MULTIPOLYGON (((-7.555872 3...
## 5  MULTIPOLYGON (((-6.942101 4...
## 6  MULTIPOLYGON (((-6.927448 4...
## 7  MULTIPOLYGON (((-7.209316 4...
## 8  MULTIPOLYGON (((-8.8198 41....
## 9  MULTIPOLYGON (((-7.514613 3...
## 10 MULTIPOLYGON (((-8.162132 3...
## Warning in cartogram_cont.sf(portugal_with_prices_transformed, weight =
## "avg_price"): NA not allowed in weight vector. Features will be removed from
## Shape.

This graph displays a cartogram of Portugal in which the country is split into its districts and each district is sized by its average asking price. Many of the districts in the south of Portugal have higher average prices. The disparity becomes clear when the cartogram is compared with a standard map of Portugal: some districts that are geographically smaller than others appear almost twice their original size. The cartogram also shows the regional disparity within Portugal, as the higher-priced districts are almost all concentrated in the south while the lower-priced districts are concentrated in the north.
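The warning printed above ("NA not allowed in weight vector") is what one would expect when some districts in the shapefile have no matching average price. The sketch below illustrates that situation with plain data frames standing in for the spatial object. The `districts` and `listings` tables and their prices are made up, and the column names only mirror the printed output.

```r
# Stand-in for the district table of the shapefile (geometry omitted).
districts <- data.frame(
  id = c("PT03", "PT08", "PT09"),
  name = c("Braga", "Faro", "Guarda")
)

# Toy listings; illustrative prices, with no listings at all for Guarda.
listings <- data.frame(
  District = c("Faro", "Faro", "Braga"),
  Price = c(700000, 600000, 250000)
)

# Average asking price per district.
avg_price <- aggregate(Price ~ District, data = listings, FUN = mean)
names(avg_price)[2] <- "avg_price"

# Left join: every district is kept, unmatched ones get avg_price = NA.
joined <- merge(districts, avg_price, by.x = "name", by.y = "District", all.x = TRUE)
print(joined)

# Districts with a usable weight; the rest cannot be sized by price.
usable <- joined[!is.na(joined$avg_price), ]
print(usable$name)
```

Listing the unmatched districts before building the cartogram makes it explicit which areas are absent from the resulting shape.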

Graph 2:

## Warning: Removed 4 rows containing missing values (`geom_line()`).
## Warning: Removed 1 row containing missing values (`geom_line()`).

The two side-by-side graphs display time series of average price with certain outliers removed. Outliers were removed because occasional listings priced above €2 million, spread across the years, were not representative of the trend and made the graphs harder to interpret. The left graph shows the two districts with the highest average prices and the two with the lowest, in order to see whether the regional disparity decreased or increased over time. Although prices in the highest-priced districts remained about the same, prices in the lowest-priced districts appear to increase starting in the 2000s, narrowing the disparity. The right graph shows the time series of the overall average price in Portugal, so that the trends for high-priced and low-priced districts can be compared with the national average. Prices in the higher-priced districts resemble the overall average more closely than those in the lower-priced districts. This is because the higher-priced districts contain far more listings than the lower-priced districts, so they carry more weight in the overall average.
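The preparation steps described above can be sketched in base R as follows. The `listings` data frame is synthetic, and the `Year` column is a hypothetical name, since the report does not state which variable supplies the time axis. The €2 million cut-off follows the description in the text.

```r
# Toy listings for five hypothetical districts; illustrative values only.
listings <- data.frame(
  District = rep(c("A", "B", "C", "D", "E"), each = 4),
  Year = rep(c(2000, 2000, 2010, 2010), times = 5),
  Price = c(600000, 620000, 640000, 3000000,   # A (one listing above 2 million)
            500000, 520000, 510000, 530000,    # B
            300000, 310000, 320000, 330000,    # C
            100000, 110000, 150000, 160000,    # D
            90000, 95000, 120000, 125000)      # E
)

# Drop listings above 2 million euros before averaging.
trimmed <- listings[listings$Price <= 2e6, ]

# Rank districts by overall average price, then keep the top two and bottom two.
district_avg <- tapply(trimmed$Price, trimmed$District, mean)
ranked <- names(district_avg)[order(district_avg, decreasing = TRUE)]
selected <- c(head(ranked, 2), tail(ranked, 2))

# Left panel: yearly average per selected district.
yearly <- aggregate(Price ~ District + Year,
                    data = trimmed[trimmed$District %in% selected, ],
                    FUN = mean)

# Right panel: yearly average over all listings.
overall <- aggregate(Price ~ Year, data = trimmed, FUN = mean)

print(selected)
print(yearly)
print(overall)
```

Because `overall` averages over listings rather than over districts, districts with many listings pull it towards their own level.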

Discussion

Conclusion

In conclusion, the analysis of Portugal’s real estate dataset, comprising over 100,000 property listings, reveals valuable insights into the interplay between property characteristics, geographical location, and market trends. The dataset’s diverse features, such as property types, prices, energy certifications, and construction years, provide a comprehensive foundation for exploring the Portuguese housing market. The initial boxplot analysis highlighted significant differences in property characteristics across key districts, such as Braga, Faro, Lisboa, Porto, and Setúbal. These boxplots demonstrated how construction years remain relatively consistent across districts, while gross area and asking prices vary considerably, reflecting regional disparities in property size and market value. The choropleth map further illustrated these differences, showing a stark contrast between higher property prices in southern and coastal districts and more affordable prices in northern inland regions. Together, these visualizations underscored the geographic and structural dynamics shaping Portugal’s real estate market.

Expanding this analysis, the bar chart and heatmap provided a focused examination of energy certification ratings, offering critical insights into property efficiency standards across districts and property types. The bar chart revealed clear trends, with property types like land and garage achieving the highest average energy certification ratings, while mansions and older buildings lagged behind, suggesting a relationship between property type, construction year, and energy efficiency. The heatmap complemented this by mapping average energy ratings across districts and property types, highlighting variability and gaps in data coverage. For instance, densely populated districts like Lisboa and Porto exhibited greater coverage and higher ratings in certain property types, while other districts, like Ilha da Madeira, had limited data or lower ratings. These visualizations collectively emphasize the critical role of geographic and structural factors in influencing market and energy trends, providing actionable insights for policymakers, urban planners, and investors aiming to improve energy efficiency or capitalize on market opportunities. This holistic exploration of the Portuguese real estate market sheds light on its complexity and dynamism, offering a rich foundation for further research and informed decision-making.

Conclusion and future areas

This analysis provides valuable insights into the association between the number of bathrooms, house prices, and regional variations. However, there is considerable scope for further research to enhance understanding and uncover more intricate patterns in the Portuguese real estate market.

One promising direction is to investigate the relationship between other key property features, such as total floor area, energy efficiency ratings, or proximity to amenities, and their impact on house prices. For example, combining the number of bathrooms with total square footage could provide a more holistic view of how property size and utility influence pricing. Similarly, studying the interplay between the construction year, energy ratings, and bathroom counts could reveal how modern construction trends are driving market dynamics.

Future plotting efforts could explore multidimensional relationships using more sophisticated visualizations, such as 3D scatterplots or heatmaps, to identify how multiple variables interact simultaneously. For instance, a heatmap comparing regions, price ranges, and average bathroom counts could help visualize regional disparities more effectively. Alternatively, faceted plots that break down bathroom distributions by additional categorical variables, such as urban versus rural settings or proximity to tourist attractions, could reveal insights tailored to specific buyer profiles.

Incorporating time as a variable in plots—such as analyzing trends in bathroom counts and prices over the years—could also provide valuable insights into how the market is evolving. Temporal analyses could highlight whether there is an increasing demand for luxury properties (with more bathrooms) or a growing market for compact, affordable housing.

Beyond visualizations, future research could dive deeper into regional socioeconomic contexts to explain why certain districts exhibit specific price and bathroom distributions. For example, integrating demographic data, such as population density, average income, and employment rates, could enrich the analysis and help identify patterns that are tied to local economies. Additionally, incorporating external factors, such as government housing policies or real estate tax incentives, could further clarify why certain districts, like Lisboa and Porto, attract higher concentrations of luxury homes.

In essence, while the current analysis offers a solid foundation, the integration of additional variables, more advanced plotting techniques, and regional or temporal contexts would provide a richer, more nuanced understanding of the real estate market. These approaches would not only benefit academic research but also provide actionable insights for investors, developers, and policymakers.