Instagram is becoming an increasingly popular marketing tool, with influencers leveraging their platform on the app to advertise various products and services. Instagram also has an algorithm that introduces certain posts to people’s ‘Explore’ feed. The marketing power of a post increases significantly as the post gets noticed by the algorithm and gets pushed to more people’s recommendations. Here, two of the criteria for the algorithm to consider a post desirable are a high engagement rate and a high number of likes. Therefore, a good influencer is a user who can create posts that garner not only a high number of likes but also a high engagement rate. Engagement rate is a measure of the proportion of their followers that interact with the influencer’s post ((# of likes + # of comments) / # of followers). In this research, we are interested in exploring which characteristics of an Instagram post might be associated with its engagement rate.
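As a minimal sketch, the engagement rate defined above can be computed per post as follows. The function name and the example numbers are illustrative assumptions, not values from the dataset.

```python
def engagement_rate(likes: int, comments: int, followers: int) -> float:
    """Return (likes + comments) / followers for a single post."""
    if followers <= 0:
        # Assumption: a post from an account with no recorded followers is
        # treated as invalid rather than given an undefined rate.
        raise ValueError("followers must be positive")
    return (likes + comments) / followers


# Hypothetical post: 950 likes and 50 comments, 10,000 followers.
rate = engagement_rate(likes=950, comments=50, followers=10_000)
print(rate)
```

Note that the rate is not bounded above by 1, since people who do not follow the influencer can also like or comment on a post.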
The Instagram Like Predictor Dataset contains information on posts from approximately 1800 Instagram influencers featured on Iconosquare Index Influencers, a social media analytics platform from which the data was also collected. The dataset consists of 30000 rows and 14 columns. Each row corresponds to one post shared by an influencer on Instagram. For each post, the columns record the influencer’s username, the number of followers they have, the number of accounts they are following, the number of likes the post has garnered, the number of comments on the post, the post’s text or caption, the number of tags on the post, the list of tags on the post, the date the post was shared, the day of the week the post was shared (0 being Monday to 6 being Sunday), the type of post (1 being photo and 2 being video), the number of users in the photo, a link to the post, and the location from which the post was shared. The quantitative variables of the dataset are ‘followers’, ‘following’, ‘likes’, ‘comments’, ‘number of tags’, and ‘users in the photo’. The categorical variables are ‘username’, ‘text’, ‘list of tags’, ‘date’, ‘day(0 Monday,6 Sunday)’, ‘type(1 photo, 2 video)’, ‘link’, and ‘location’. These span a variety of types, including text, dates, ordinal categorical variables represented by numbers, and coordinates. The diversity and number of variables open up many directions to explore when asking questions about the data. The dataset was created by a team of independent researchers: Corentin Dugué, Giovanni Alcantara, Joseph Shalabi, and Sahil Shah.
Our research questions about the engagement rate of Instagram posts are:
Are the hashtags used by high engagement rate posts and low engagement rate posts different?
Is there an association between the number of accounts tagged in the post and the engagement rate of the post?
Are the high engagement rate posts and the low engagement rate posts from different places in the world? Does this factor vary based on the type of the post?
Do posts made on different days of the week tend to have different engagement rates? Does this relationship vary based on the type of post?
We are also interested in whether posts with different numbers of users tagged in them tend to have different engagement rates. To investigate this, we again split the posts into those with high engagement rates and those with low engagement rates and examined the distribution of the number of users tagged across the two categories.
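The split into high and low engagement posts can be sketched as below. The cutoff is not stated in this report, so the median engagement rate is used here purely as an assumption; the field names and sample posts are also hypothetical.

```python
from collections import Counter
from statistics import median


def split_by_engagement(posts):
    """Return (high, low, cutoff), splitting posts at the median engagement rate."""
    # Assumption: the median is the cutoff; the report does not state one.
    cutoff = median(p["engagement_rate"] for p in posts)
    high = [p for p in posts if p["engagement_rate"] > cutoff]
    low = [p for p in posts if p["engagement_rate"] <= cutoff]
    return high, low, cutoff


# Hypothetical posts; the values are illustrative, not taken from the dataset.
posts = [
    {"engagement_rate": 0.02, "users_in_photo": 0},
    {"engagement_rate": 0.05, "users_in_photo": 1},
    {"engagement_rate": 0.11, "users_in_photo": 0},
    {"engagement_rate": 0.30, "users_in_photo": 2},
]

high, low, cutoff = split_by_engagement(posts)

# Count how many posts in each category have each number of tagged users.
high_counts = Counter(p["users_in_photo"] for p in high)
low_counts = Counter(p["users_in_photo"] for p in low)
```

Comparing `high_counts` and `low_counts` side by side gives the two distributions examined in this part of the analysis.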
We noticed that the distributions of the number of users tagged in posts seemed very similar between posts with high and low engagement rates. The majority of both types of posts had fewer than three users tagged in them, and the maximum number of users tagged in a single post was 26.
Since posts with more users tagged show up on the feeds of everyone who follows at least one of the users, we hypothesized that there may be a difference between the engagement rates on posts with a low, medium, and high number of users. We split the posts into three groups by evenly dividing the range of the number of users tagged, and plotted the distribution of engagement for each.
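Dividing the range evenly into three groups corresponds to equal-width binning, which can be sketched as follows. The function name and group labels are illustrative, and the range of 0 to 26 assumes a minimum of zero tagged users together with the maximum of 26 reported above.

```python
def equal_width_group(value, lo, hi, labels=("low", "medium", "high")):
    """Assign value to one of len(labels) equal-width bins spanning [lo, hi]."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    if not lo <= value <= hi:
        raise ValueError("value must lie within [lo, hi]")
    width = (hi - lo) / len(labels)
    # The maximum value is placed in the last bin instead of overflowing.
    index = min(int((value - lo) / width), len(labels) - 1)
    return labels[index]


# Assumed range: 0 to 26 tagged users, so each bin is about 8.67 wide.
for users in (0, 3, 9, 17, 18, 26):
    print(users, equal_width_group(users, lo=0, hi=26))
```

Because most posts have fewer than three tagged users, equal-width bins place the large majority of posts in the low group. This imbalance in group sizes may partly account for any difference in the number of outliers between groups.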
The median engagement rate appears to be similar across all categories for the number of users tagged in the post, but posts with a low number of users have more outlier engagement values than the other two groups. Thus, from this plot, we tentatively conclude that engagement rate does not depend strongly on the number of users tagged in the post.
Looking at the dataset, we wondered whether Instagram posts amassed higher or lower engagement rates based on where in the world they were posted and which type of post they were (photo or video). To investigate further, we decided to create a map that displayed all the available locations and types of Instagram posts from the data. After cleaning the data and finding the boundaries for latitude and longitude, we created our first map.
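One way to find the latitude and longitude boundaries for such a map is sketched below. It assumes each post's location has already been parsed into a (latitude, longitude) pair, with None for posts that have no location; the actual format of the location column is not described here, and the sample coordinates are hypothetical.

```python
def bounding_box(coords):
    """Return (min_lat, max_lat, min_lon, max_lon) over valid coordinate pairs."""
    # Drop missing locations and pairs outside the valid geographic range.
    valid = [
        pair for pair in coords
        if pair is not None
        and -90 <= pair[0] <= 90
        and -180 <= pair[1] <= 180
    ]
    if not valid:
        raise ValueError("no valid coordinates")
    lats = [lat for lat, _ in valid]
    lons = [lon for _, lon in valid]
    return min(lats), max(lats), min(lons), max(lons)


# Hypothetical sample: three plausible points, one missing, one out of range.
sample = [(37.98, 23.73), None, (-12.05, -77.04), (55.68, 12.57), (123.0, 45.0)]
print(bounding_box(sample))
```

The same function can be applied to a subset of posts to obtain the limits for a zoomed-in regional map.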
The map above pinpoints the available locations for all Instagram posts in the insta_data dataset. The color of each point indicates how much engagement the post received. The shape of each point indicates the type of post (photo or video). Once we displayed this world map, we noticed that there were certain places where more points were clustered, so we created more maps to zoom in on those locations and view the points more clearly.
In North America, posts seem to have fairly uniform engagement rates. Approximately half the posts in North America hover around the 0.1 mark, while the rest seem to have lower engagement rates. In addition, Cuba and the Dominican Republic boast posts with much higher engagement rates, with posts in the Dominican Republic reaching nearly 0.4. Most of the posts are photos, but we can observe some video posts on the southwest coast of the United States. This may simply reflect that the dataset has few data points for the central and eastern states. One possible reason for the higher engagement rates in Cuba and the Dominican Republic is that both are popular tourist destinations.
In South America, posts with moderately higher engagement rates are found in Peru and Argentina. However, the engagement rates across South America seem, for the most part, uniform. Most posts are photos, and barely any videos can be seen (about two videos are visible on the eastern border of Brazil). The post that appears to have the highest engagement rate in South America is located in or near Lima, Peru.
There are many more points in Europe than in North America and South America. Posts with higher engagement rates can be seen in several countries, though they are generally in central Europe or Russia. Posts with notably high engagement rates also appear in Denmark and Croatia.
Finally, the cluster of points in Europe is largely centered on Greece, with higher engagement rate posts scattered fairly evenly (particularly near the islands and borders). There is also a visibly higher number of video posts in this cluster than in the previous maps. One hypothesis for the greater volume and engagement of posts is that many more influencers are simply attracted to Greece and the Mediterranean area. Greece relies heavily on its tourism industry, and many people visit every year for vacation and sightseeing.
Ultimately, we were able to observe post types and locations around the world. Using maps was the clearest way to show post location. Visually, it was easiest to see where there were the most posts, as well as posts with higher or lower engagement rates. We noted key locations with high engagement rate posts such as Greece, Denmark, Russia, Croatia, and the Dominican Republic.
Judging by the blue cells, we observe significantly more low engagement rate videos than we would expect under independence on every day except Monday. We also observe more high engagement rate photos posted on Monday, Thursday, and Sunday than we would expect under independence.
Surprisingly, judging by the opaque red cells, we observe significantly fewer high engagement rate videos than we would expect under independence on every day of the week. We also observe significantly fewer low engagement rate photos posted on Thursday and Saturday than we would expect under independence.
Thus, we tentatively conclude that photos generally do better than videos in terms of engagement rate, and that there is an association between the day of the week and the engagement rate.
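Comparisons against independence of this kind are commonly based on Pearson residuals, where positive values mean more observations than expected and negative values mean fewer. The sketch below assumes that this is what the shading in our plot encodes; the function name and the counts in the example table are made up for illustration.

```python
from math import sqrt


def pearson_residuals(table):
    """Return Pearson residuals (observed - expected) / sqrt(expected).

    table is a list of rows of counts. Expected counts come from the
    independence model: row total * column total / grand total.
    Assumes every row and column has a nonzero total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            res_row.append((observed - expected) / sqrt(expected))
        residuals.append(res_row)
    return residuals


# Made-up counts: rows are photo and video, columns are high and low engagement.
counts = [[30, 10],
          [10, 30]]
for row in pearson_residuals(counts):
    print([round(r, 2) for r in row])
```

A common convention is to treat residuals beyond about 2 in absolute value as notable and beyond about 4 as strong departures from independence, although the exact thresholds depend on the plotting tool.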
In conclusion, there are several noteworthy findings from our graphical analyses.
In terms of hashtags, there was a clear divide between the types of tags used in high engagement and low engagement posts. Our word analysis showed that personal and original hashtags were more frequently used in high engagement posts, while generic hashtags that existed merely to attract followers (e.g. #follow and #repost) were more frequently used in low engagement posts.
Moreover, our investigation into the number of users in the post revealed that there seemed to be no significant association between this attribute and the engagement on a post, especially compared to the other post attributes we explored. However, we did note that a heavy majority of posts had fewer than three users tagged in them, so future research could focus on only these posts to see if there are any trends involving the number of users when we ignore outliers.
Furthermore, after viewing the map of Instagram post locations, post types, and engagement rates, we concluded that while Instagram posts with higher engagement rates were found more in certain areas of the world, post type did not seem to be an important contributing factor. Rather, a larger variety of post types was observed where there were simply more posts to be found. Post location yielded more interesting results, as posts with greater engagement appeared to be found in very particular locations such as Greece, Denmark, Croatia, Peru, and the Dominican Republic. One hypothesis for this observation is that influencers are more likely to post engaging content at tourist locations. Additionally, we were able to connect certain places with high engagement rates, such as Lima, Peru, with results from the word cloud.
Lastly, we found that both post type and the day of the week on which the post was made appeared to be associated with its engagement rate. Photo posts generally received higher engagement than video posts regardless of the day of the week. In particular, photo posts received higher engagement on Monday, Thursday, and Sunday.
In future research, we could better align our methods with the limitations of the dataset; for example, we need to take into account that the dataset only contains a year’s worth of posts. Moreover, Instagram has only been cracking down on ‘bot accounts’ (e.g. automated accounts created for spamming or for inflating the number of likes or followers) since late 2017, so the like and follower counts in this dataset may be inflated. We are also uncertain why an unusually high number of posts with hashtags relating to Greece were featured in the dataset. Therefore, further research could include a study of events that took place in Greece between 2016 and 2017 and how they relate to these posts, or a replication of this study using more recent datasets.