Introduction:

As technology advances, the car industry is one of the sectors that most visibly reflects social, technological, and even political change. In particular, supply, demand, and prices in the used car market fluctuate frequently with economic activity. This project therefore investigates the factors that can affect prices in the used car market and how those factors relate to one another.

Data Source Description:

The Used-cars-catalog dataset contains features of used cars dating from 1942 to December 2019, based on the used car market in Belarus (Eastern Europe). There are 30 variables: the categorical variables include manufacturer_name, model_name, transmission, etc., while the quantitative variables include odometer_value, engine_capacity, price_usd, etc. Because the dataset is broad, each research question examines a different combination of variables. A detailed description of the variables chosen is therefore given in each section.

Research Question 1:

Does the Appearance of the Used Cars affect their Prices in the Market?

In this question, we want to learn how the appearance of used cars influences their prices. We pick two main aspects of appearance: number of photos and body type. We choose these two aspects because we want to investigate whether greater exposure and the shape of a used car influence buyers' decisions. The variables that we use are price_usd, number_of_photos, and body_type.

Relationship between Number of Photos Provided and Car Price

The number_of_photos variable represents the level of exposure of a used car: the more photos, the more exposure. We restrict the analysis to silver used cars because plotting all the data produces too many points, which makes the relationship difficult to see. We choose a heat map because it shows directly where the data points are most densely concentrated.

From this graph, we can see that there is only one mode, in the lower left corner, at low number_of_photos values (around 6.5) and low price_usd values (around 3000). Denser areas have warmer colors and the dots are closer together. This result means that when selling silver used cars, people tend to upload a small number of pictures and the prices of the cars are low. We can also see that there are many dots where the number of photos is 15. There are fewer dots as the number of photos and the price increase. Notably, the number of dots decreases dramatically when the number of photos exceeds 30. When the price is higher than 30000, no dot has a number_of_photos value larger than 30. In conclusion, the number of photos that people typically upload of their silver used cars is relatively small, no larger than 30.
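As a rough illustration of how the mode of such a plot can be located, the sketch below counts points in rectangular bins and reports the fullest bin. This is a simple binned count, not the density estimate behind the heat map, and the toy points, bin widths, and the function name densest_bin are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical (number_of_photos, price_usd) pairs; not taken from the dataset.
points = [(6, 2800), (7, 3100), (5, 2500), (15, 9000), (8, 3900), (32, 12000)]

def densest_bin(points, x_width, y_width):
    """Return the lower-left corner of the fullest bin and its point count."""
    counts = Counter(
        (int(x // x_width), int(y // y_width)) for x, y in points
    )
    (bx, by), n = counts.most_common(1)[0]
    return (bx * x_width, by * y_width), n

# Bin widths of 5 photos and 2000 USD are arbitrary choices for this sketch.
corner, n = densest_bin(points, x_width=5, y_width=2000)
print(corner, n)
```

Narrower bins locate the mode more precisely but need more data per bin to be stable.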

Relationship between Body type and Price

The body type of a used car describes its shape and depends on intended use, market position, location, and many other elements. Some of the body types include sedan, SUV, hatchback, and universal. We decide to use a word cloud because it directly visualizes the most frequent body types of used cars in our dataset. We separate the dataset at the mean price because we want to investigate whether the mix of body types changes with the price of used cars. Therefore, we create a comparison word cloud for expensive used cars and cheap used cars.

The word cloud on the left is based on the used cars that are more expensive than the mean. The word cloud on the right is based on the used cars that are cheaper than the mean. It is obvious that the largest proportion of used cars in both circumstances is sedan. However, when the used cars are more expensive, the second largest proportion of used cars is SUV, followed by universal, hatchback, minivan, minibus and van. When the used cars are cheaper, the second largest proportion of used cars is hatchback, followed by universal, minivan, SUV, minibus and van. The main difference here is the body type of SUV and hatchback. The result is not surprising because in general, SUV cars are more expensive than the hatchback cars. We also find out that sedan is the most popular body type of used cars in this data set.
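The split-and-count step behind such a comparison can be sketched as follows. This is a minimal illustration on made-up rows, assuming each record is a (body_type, price_usd) pair. The function name body_type_counts_by_price is hypothetical, and placing cars priced exactly at the mean in the cheaper group is an arbitrary choice.

```python
from collections import Counter
from statistics import mean

# Hypothetical (body_type, price_usd) rows; not taken from the dataset.
cars = [
    ("sedan", 12000), ("suv", 18000), ("sedan", 3000),
    ("hatchback", 2500), ("universal", 4000), ("suv", 15000),
    ("sedan", 9000), ("hatchback", 3500),
]

def body_type_counts_by_price(rows):
    """Split rows at the mean price and count body types on each side."""
    cutoff = mean(price for _, price in rows)
    expensive = Counter(body for body, price in rows if price > cutoff)
    cheap = Counter(body for body, price in rows if price <= cutoff)
    return cutoff, expensive, cheap

cutoff, expensive, cheap = body_type_counts_by_price(cars)
print(cutoff)
print(expensive.most_common())  # body types among above-mean cars
print(cheap.most_common())      # body types among the remaining cars
```

The two counters hold the frequencies that a word cloud would render as word sizes.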

Research Question 2:

What is the Relationship between Specification and Price of Used Cars?

In this section, we are going to investigate the relationship between the specifications and the prices of used cars. Specifically, we will focus on three specifications: the drive train type of the car, the transmission type of the car and the engine type of the car. It is important to understand which specification might affect the car price because it can help consumers choose an ideal car within their price budget.

Relationship between Drivetrain and Price

The drive train of a car works with the engine to deliver power to the wheels. There are three common types of drive trains: front-wheel drive (FWD), rear-wheel drive (RWD), and all-wheel drive (AWD). We choose to use a violin and box plot to visualize the distribution of price (USD) for each type of drive train because it allows us to see the density and the summary statistics at the same time.

From the graph, we can see that the distributions of the price of FWD cars and RWD cars are quite similar: the median price is about 4000 dollars and the density curves are both heavily right-skewed, which means that the mean price is greater than the median price. This indicates that there are probably many cars with much higher prices, and these cars correspond to the outliers to the right. For AWD cars, the median is about 12000 dollars and the interquartile range is from 7500 to 17500 dollars, which is much higher than the prices of FWD and RWD cars. The density curve is also slightly right-skewed and there are many outliers to the right. This makes sense because AWD cars have better traction than FWD and RWD cars, as every wheel receives power. If one wheel starts to slip, the other three can work to retain traction, which makes AWD cars good for driving in snow.
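The summary statistics read off the box plots (median, quartiles, and the mean-versus-median comparison used as a skewness hint) can be computed per group as in the sketch below. The rows and the drivetrain labels are made up for illustration, the function name summarize_by_group is hypothetical, and the "inclusive" quartile method is one of several conventions, so the values may differ slightly from those of a plotting library.

```python
from collections import defaultdict
from statistics import mean, median, quantiles

# Hypothetical (drivetrain, price_usd) rows; not taken from the dataset.
rows = [
    ("front", 2000), ("front", 3000), ("front", 4000),
    ("front", 5000), ("front", 16000),
    ("all", 8000), ("all", 10000), ("all", 12000),
    ("all", 14000), ("all", 26000),
]

def summarize_by_group(rows):
    """Compute median, quartiles, mean, and a right-skew hint per group."""
    groups = defaultdict(list)
    for name, price in rows:
        groups[name].append(price)
    summary = {}
    for name, prices in groups.items():
        q1, _, q3 = quantiles(prices, n=4, method="inclusive")
        summary[name] = {
            "median": median(prices),
            "q1": q1,
            "q3": q3,
            "mean": mean(prices),
            # A mean above the median suggests a long right tail.
            "right_skew_hint": mean(prices) > median(prices),
        }
    return summary

summary = summarize_by_group(rows)
for name, stats in summary.items():
    print(name, stats)
```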

Relationship between Transmission & Price and Engine Type & Price

Next, we will investigate the conditional distribution of price given transmission type and engine type. There are two types of transmission: automatic and mechanical. There are three types of engines: diesel, electric, and gasoline. We choose to use a side-by-side histogram because we can compare the prices of automatic cars and mechanical cars by comparing the trend of the left plot and the right plot. We can compare the prices of cars with different engine types by comparing the proportion of colors of each bar.

By comparing the left and right plots, we can see that there are many more mechanical cars in this dataset. The distribution of the price of automatic cars is right-skewed, with many outliers to the right. Most automatic cars cost 3000 to 10000 dollars, and the most expensive ones cost as much as 50000 dollars. The distribution of the price of mechanical cars is heavily right-skewed, but there are not many outliers compared with the automatic cars. Note that approximately 8000 mechanical cars cost about 2000 dollars, and most mechanical cars cost 1000 to 5000 dollars. This shows that, in general, automatic cars are more expensive than mechanical cars. This also makes sense because automatic gearboxes are more complex and more expensive to produce.

Then we can compare the proportion of colors to determine the relationship between price and engine types. First note that there are almost no used electric cars in this dataset as we can barely see any green part in the plot above. There are more gasoline engines than diesel engines for both automatic cars and mechanical cars. This is probably because diesel engines are more popular for trucks than passenger cars and this dataset mainly consists of used passenger cars.

In both plots, we can see that most cars with diesel engines cost about 2000 to 5000 dollars, which is similar to the price of most cars with gasoline engines, and the distributions of the two colors are similar. This probably means that engine type does not affect price much, and we can check this by calculating the correlation between price and engine type. The output below shows that the correlation between price and engine type is 0.089, which is very weak and aligns with our observation from the plot.

## [1] 0.08949016
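Because engine type is categorical, a correlation with price requires some numeric coding, and the report does not state which coding produced the value above. The sketch below shows one possible approach, assuming a 0/1 indicator (gasoline = 0, diesel = 1) and a hand-written Pearson correlation. The toy data, the coding, and the function name pearson are all assumptions, so the result is not meant to reproduce 0.089.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical rows; not taken from the dataset.
engine = ["gasoline", "diesel", "gasoline", "diesel", "gasoline", "diesel"]
price = [2000, 2500, 4000, 3500, 6000, 6500]

# Assumed coding: 1 for diesel, 0 for gasoline.
is_diesel = [1 if e == "diesel" else 0 for e in engine]

r = pearson(is_diesel, price)
print(round(r, 3))
```

With a binary indicator this is the point-biserial correlation. A value near zero means the two groups have similar average prices relative to the overall spread.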

Research Question 3:

What is the Relationship between Usage/Depreciation and Price of Used Cars?

In this section, we examine the relationship between the usage/depreciation of second-hand cars and the price. Specifically, we use the following three variables to measure the usage state and depreciation of used cars:

year_produced: The year the car was produced.

odometer_value: Odometer state in kilometers.

state: New/owned/emergency. Emergency means the car has been damaged, sometimes severely.

We first explore how the average price of used cars changes based on the year produced (1942~2019) by a time series plot. Further, we use a scatter plot to explore how different states of the used car and its odometer value affect its price.

Relationship between Car’s Production Year and Price

Here we calculate the average price of used cars for each production year to see whether cars produced in recent years tend to have a higher price in the second-hand car market. Overall, we observe a clear upward trend from 1980 to 2019, suggesting that newer trade-ins typically bring higher prices. This makes sense, as cars tend to lose their value rapidly after three to five years. We found it interesting that the price of cars produced around the 1940s reaches a peak, and we also observe a slight uptick for cars produced from the 1960s to the 1970s. This finding suggests that certain types of used cars become collectibles or classics, so the usual depreciation and pricing trend does not apply to them.
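The per-year averaging behind a time series plot of this kind can be sketched as below, on made-up rows and assuming each record is a (year_produced, price_usd) pair. The function name mean_price_by_year is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (year_produced, price_usd) rows; not taken from the dataset.
listings = [
    (2010, 8000), (1995, 1500), (2018, 20000),
    (1995, 2500), (2010, 10000),
]

def mean_price_by_year(rows):
    """Group prices by production year and average them, in year order."""
    by_year = defaultdict(list)
    for year, price in rows:
        by_year[year].append(price)
    return {year: mean(prices) for year, prices in sorted(by_year.items())}

average = mean_price_by_year(listings)
for year, value in average.items():
    print(year, value)
```

Years with very few listings, such as the oldest ones, produce averages driven by one or two cars, which is worth keeping in mind when reading peaks for early production years.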

Since we had unexpected findings about collectible used cars and classic types with high prices, we would like to explore more detailed factors that may affect the depreciation of used cars. In the next plot, we compare the relationship between odometer value and prices by three types of states of used cars.

Relationship between odometer, state, and price

In the above scatterplot, we observe that most shapes are green squares, followed by blue triangles and red dots. This suggests that most used cars are pre-owned while very few used cars are in an emergency state, as shown by the red dots in the plot.

For new cars, all the odometer values are zero and the prices are relatively high, as depreciation and usage are minor. The price of new cars in the second-hand market still varies significantly, from 15,000 to 50,000, which may be affected by the manufacturer and will be explored in the next section.

For cars in the emergency state, which have been severely damaged, most prices are relatively low, below 5000 USD, and there’s no clear trend in odometer values. This makes sense because, regardless of age or odometer, condition plays a significant part in used car trade-in value. Dealers consider what it will cost them to repair or replace the things necessary for the car to sell at a high price.

For pre-owned cars, there’s a clear downward trend: greater odometer values are associated with lower prices. This makes sense because, even if a used car has a great appearance and a clean accident history, higher mileage means more worn equipment and a shorter remaining service life, which negatively impacts its value.

In conclusion, odometer values play a significant role in assessing used car prices, as higher mileage is associated with lower prices. While newer trade-ins typically have a higher price considering the year produced, collectible and classic cars may be exceptions and do not follow the depreciation trend. In general, in the used car market, most cars are pre-owned, while very few are in an emergency state with low prices.

Research Question 4:

How do Features relating to the Supply and Demand of the Cars affect their Prices?

In this section, we investigate how factors relating to the supply and demand of cars affect trading prices in the used car market. Specifically, we focus on the cars’ manufacturer and the length of time the cars were listed on the market as indicators of supply and demand, and examine how those factors are associated with car prices.

Relationship between Cars’ Listed Duration and Price

From the word cloud above, we are able to see that the most common car manufacturers in our dataset are Volkswagen, Opel, BMW, Ford, Renault, Audi, and Mercedes-Benz. This means that cars by those manufacturers are the most commonly traded ones in the used cars market from our dataset. Hence, we can reasonably infer and conclude that cars by those manufacturers have a higher supply and demand in the used cars market than cars by other manufacturers.

Now, we would like to filter our data by the commonality of the car’s manufacturer and analyze the car prices given their manufacturer.

The listing duration measures the number of days the car is listed in the catalog. From the scatter plot shown above, we can observe that most cars have a listed duration of 0 to 500 days, with a high concentration between 0 and 250 days. Almost all cars with a trading price above $30,000 have a listed duration of less than 250 days, and most cars with much longer listed durations have low trading prices. Specifically, the majority of cars with long listed durations have trading prices below $10,000. This observation is consistent with our hypothesis that a shorter listing duration indicates higher demand for the car, which allows it to trade at a higher price. It is also consistent with the basic economic theory that higher demand leads to higher prices.
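The visual claim about expensive cars and short listings can also be checked numerically. The sketch below computes the share of listings above a price floor that were listed for fewer than a given number of days. The rows are made up, each record is assumed to be a (duration_listed, price_usd) pair, and the function name share_short_listed is hypothetical.

```python
# Hypothetical (duration_listed, price_usd) rows; not taken from the dataset.
rows = [
    (10, 35000), (120, 32000), (300, 31000),
    (600, 4000), (40, 9000), (800, 7000),
]

def share_short_listed(rows, price_floor, max_days):
    """Share of listings priced above price_floor listed under max_days.

    Returns None when no listing exceeds the price floor.
    """
    durations = [days for days, price in rows if price > price_floor]
    if not durations:
        return None
    return sum(days < max_days for days in durations) / len(durations)

# Thresholds taken from the discussion above: 30,000 USD and 250 days.
share = share_short_listed(rows, price_floor=30000, max_days=250)
print(share)
```

A share close to 1 on the real data would back up the reading of the scatter plot, although it would not by itself show that short listings cause higher prices.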

An interesting fact is that BMW stands out from the other car manufacturers in this dataset, with some high trading prices and the fewest long listed durations. This may indicate that BMW is a very popular car manufacturer in the sampling region.

Relationship between Car Manufacturer and Car Price

The two side-by-side word clouds shown above display the car manufacturers by commonality separated by the prices of cars. The left graph depicts the more expensive cars that have trading prices above the mean trading price from the whole dataset, and the right graph depicts the cheaper cars with prices lower than the mean trading price.

We are able to observe that Volkswagen is the most common car manufacturer in both sets of data. This is reasonable since Volkswagen is a large car manufacturer selling cars across a wide range of prices. However, the other common car manufacturers are very different for the two data sets. For the expensive cars, BMW, Mercedes-Benz, and Audi are the most common manufacturers after Volkswagen; they are commonly known as “BBA” for their higher prices and quality. For the cheaper cars, Opel, Ford, Renault, and Peugeot are the most common manufacturers after Volkswagen, and these are usually more economical choices than BBA. This observation suggests that manufacturers typically associated with higher new car prices are in higher demand among used car buyers with larger budgets and are more likely to trade at higher prices in the used car market.

Conclusion

In this study, we investigated different factors that can affect the prices of used cars and how these factors are correlated. First, we found that most sellers provide fewer than 30 photos for silver used cars and that their prices are typically below 20000 dollars, but we don’t know whether this holds for used cars of other colors. For body type, the results show that, in general, SUVs are more expensive than hatchbacks, and sedan is the most popular body type for both expensive and cheap cars in our dataset. In the second research question, we investigated the relationship between specification and price. We found that the prices of AWD cars are likely to be higher than the prices of FWD cars and RWD cars. Moreover, in general, the prices of automatic cars are higher than the prices of mechanical cars. We found only a very weak relationship between engine type and car price. Then, we examined how depreciation affects the prices of used cars. As expected, newer trade-ins usually have higher prices than others, but some collectible cars produced around the 1940s and from the 1960s to the 1970s are more expensive than the trend would suggest. In addition, cars with higher mileage and cars in an emergency state tend to have lower prices. In the last section, we concluded that a shorter listing duration corresponds to higher demand for the car, which is associated with a higher price. In addition, Volkswagen is common among both cheap and expensive cars. BMW, Mercedes-Benz, and Audi are more common among expensive cars, while Opel, Ford, Renault, and Peugeot are more common among cheaper cars.

We hope that this information can help customers gain a rough idea of the price of their ideal car based on its appearance, specification, usage, and supply, so that they can choose a better car within their budget.

Future Research

Several questions remain open, including how long-term costs influence the price of used cars. Long-term costs include loan interest, depreciation, fuel, insurance, maintenance, and fees. In particular, we would like to know more about maintenance costs and fuel costs, since these two indicators are important references for buyers. This would require data on fuel consumption, fuel efficiency, general maintenance costs, and maintenance intervals for different brands and models. One limitation of our project is that the data represents only the situation in Belarus, so factors specific to Belarus, such as weather, may affect our results. Future research could therefore expand the sampling area. With more information, our project could provide more reliable references for buyers of used cars.