Instagram is a platform for sharing and connecting with the people and things that matter to you. When you open Instagram or refresh your feed, the photos and videos Instagram thinks you'll be most interested in appear at the top.
The news feed is a collection of items (text, images, or videos) created by other entities in the system and targeted at you. It changes constantly as other users and entities publish new posts.
In this post, we take a close look at the system design of Instagram's feed. Let's begin.
1. Requirements
- A user's news feed is built from posts by the other entities in the system that the user follows or is interested in.
- Posts can contain text, pictures, and videos.
- New posts created by others should be added to the user's news feed.
- News feeds should be generated in real time: the end user should experience no more than 1 to 2 seconds of delay.
- Appending a new post: a new post should appear in a news feed request no more than 5 seconds after it is submitted to the system.
2. Capacity Estimation
- As of March 2021, the world's population is about 7.8 billion people. Roughly 21% of them are Facebook DAUs (Daily Active Users) and 32% are Facebook MAUs (Monthly Active Users). That is amazing.
- To make the math easier, assume the system we're building has 1 billion DAUs.
- Assume each user follows 500 people or entities on average; a group or a page counts as an entity.
Assume each user fetches their news feed 10 times a day on average. That gives 1e10 requests per day, or roughly 116K QPS (1e10 / 86,400 seconds).
Storage Estimates
Assume we keep, on average, 500 posts of each user's news feed in memory for quick retrieval, and each post is 1 KB. That is 500 KB per user and 500 TB for all DAUs, which requires 5,000 machines with 100 GB of RAM each.
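The traffic and storage arithmetic can be double-checked with a short script. It is only a back-of-envelope sketch: it restates the assumptions above (1 billion DAUs, 10 feed fetches per user per day, 500 cached posts of 1 KB each, 100 GB of RAM per machine), uses decimal units, and the variable names are illustrative.

```python
# Back-of-envelope capacity estimate using the assumptions stated above.
SECONDS_PER_DAY = 24 * 60 * 60           # 86,400

DAU = 1_000_000_000                      # daily active users (assumed)
FETCHES_PER_USER_PER_DAY = 10            # feed loads per user per day (assumed)
POSTS_IN_MEMORY_PER_USER = 500           # cached feed items per user (assumed)
POST_SIZE_BYTES = 1_000                  # 1 KB per post, decimal units
RAM_PER_MACHINE_BYTES = 100 * 1_000**3   # 100 GB per machine


def estimate() -> dict:
    requests_per_day = DAU * FETCHES_PER_USER_PER_DAY
    qps = requests_per_day / SECONDS_PER_DAY
    bytes_per_user = POSTS_IN_MEMORY_PER_USER * POST_SIZE_BYTES
    total_bytes = bytes_per_user * DAU
    # Ceiling division: a partially filled machine still counts as a machine.
    machines = -(-total_bytes // RAM_PER_MACHINE_BYTES)
    return {
        "requests_per_day": requests_per_day,
        "qps": qps,
        "bytes_per_user": bytes_per_user,
        "total_bytes": total_bytes,
        "machines": machines,
    }


if __name__ == "__main__":
    e = estimate()
    print(f"Requests/day: {e['requests_per_day']:.0e}")
    print(f"QPS: {e['qps']:,.0f}")
    print(f"Memory per user: {e['bytes_per_user'] / 1_000:.0f} KB")
    print(f"Total memory: {e['total_bytes'] / 1_000**4:.0f} TB")
    print(f"Machines (100 GB RAM each): {e['machines']:,}")
```

Changing any single assumption (for example, caching 200 posts instead of 500) shows immediately how the machine count scales.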
3. System APIs
The news feed API takes a required userId and an optional options parameter.
userId (GUID): the user whose news feed is being fetched.
The following fields are available in the optional options parameter:
- afterPostId (GUID): return feed items that come after this post. If omitted, the most recent posts are returned.
- count (number): the maximum number of posts to return in a single request. If it is not supplied, the backend applies a default maximum.
- excludeReplies (boolean): excludes replies from the news feed.
Returns: a JSON response containing a list of news feed items.
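To make the parameter semantics concrete, here is a minimal in-process sketch of such an endpoint. The function name `get_news_feed`, the `FeedOptions`/`FeedItem` classes, the in-memory `FEEDS` store, and the default of 20 items are all hypothetical; only the parameters (userId, afterPostId, count, excludeReplies) come from the API description above.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

DEFAULT_MAX_COUNT = 20  # hypothetical backend default when `count` is omitted


@dataclass
class FeedItem:
    post_id: str
    author_id: str
    text: str
    is_reply: bool = False


@dataclass
class FeedOptions:
    after_post_id: Optional[str] = None  # afterPostId
    count: Optional[int] = None          # count
    exclude_replies: bool = False        # excludeReplies


# Hypothetical store of pre-generated feeds: user_id -> items, newest first.
FEEDS: dict[str, list[FeedItem]] = {}


def get_news_feed(user_id: str, options: Optional[FeedOptions] = None) -> str:
    """Return the user's news feed as a JSON string."""
    opts = options or FeedOptions()
    items = FEEDS.get(user_id, [])

    # afterPostId acts as a cursor: keep only the items that come after it.
    if opts.after_post_id is not None:
        ids = [item.post_id for item in items]
        if opts.after_post_id in ids:
            items = items[ids.index(opts.after_post_id) + 1:]
        else:
            items = []  # unknown cursor: return nothing rather than guess

    if opts.exclude_replies:
        items = [item for item in items if not item.is_reply]

    limit = opts.count if opts.count is not None else DEFAULT_MAX_COUNT
    page = items[:limit]
    return json.dumps({"userId": user_id, "items": [asdict(item) for item in page]})
```

Returning an empty page for an unknown cursor is just one design choice; a real service might instead fall back to the most recent posts.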
4. Database Design
The main tables and their fields are:
- User/Entity: entityId, name, description, and timestamp (all required).
- Post: postId, title, text, authorId, and timestamp (all required).
- Media: mediaId, url, and timestamp.
The relationships between them are:
- Follow: a user can follow other users or entities (m:n).
- Author-Post: both users and entities can create posts; for simplicity, assume that only users create posts (1:n; authorId can be embedded in the Post table).
- Post-Media: each post can be accompanied by media (1:n).
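The tables and relationships above map naturally onto a relational schema. The sketch below uses Python's built-in `sqlite3` purely for illustration. Two choices are assumptions not stated in the design: `created_at` stands in for the `timestamp` field, and `media.post_id` is an assumed foreign key implementing the 1:n Post-Media link.

```python
import sqlite3

# created_at stands in for the `timestamp` field; media.post_id is an
# assumed foreign key implementing the 1:n Post-Media relationship.
SCHEMA = """
CREATE TABLE entity (
    entity_id   TEXT PRIMARY KEY,
    name        TEXT NOT NULL,
    description TEXT NOT NULL,
    created_at  INTEGER NOT NULL
);

CREATE TABLE post (
    post_id    TEXT PRIMARY KEY,
    title      TEXT NOT NULL,
    text       TEXT NOT NULL,
    author_id  TEXT NOT NULL REFERENCES entity(entity_id),  -- Author-Post (1:n), embedded
    created_at INTEGER NOT NULL
);

CREATE TABLE media (
    media_id   TEXT PRIMARY KEY,
    post_id    TEXT NOT NULL REFERENCES post(post_id),      -- Post-Media (1:n)
    url        TEXT NOT NULL,
    created_at INTEGER NOT NULL
);

CREATE TABLE follow (
    follower_id TEXT NOT NULL REFERENCES entity(entity_id),
    followee_id TEXT NOT NULL REFERENCES entity(entity_id),
    PRIMARY KEY (follower_id, followee_id)                  -- Follow (m:n)
);
"""


def create_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn
```

The `follow` table is a plain join table, which is the usual way to model an m:n relationship in a relational database.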
5. High-Level Design
When Jay requests her news feed, the system does the following:
- Get followees: retrieve the IDs of all the users and entities that Jay follows.
- Aggregate posts: given those IDs, fetch the most recent, popular, and relevant posts.
- Rank: order the posts by relevance and recency.
- Cache: store the generated feed and return the top 20 posts to Jay.
- Paginate: when Jay has scrolled through the first 20 posts, the client sends another request for the next 20.
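The read path above can be sketched in a few lines. The in-memory dictionaries stand in for the follow graph, post store, and feed cache, and all names are hypothetical. The ranking step is only a placeholder (newest first, with likes as a tie-breaker); a relevance-based ranking would replace the sort key.

```python
from dataclasses import dataclass

PAGE_SIZE = 20  # "top 20 posts" per request


@dataclass(frozen=True)
class Post:
    post_id: str
    author_id: str
    created_at: int  # seconds since epoch
    likes: int = 0


# Hypothetical in-memory stand-ins for the follow graph, post store and feed cache.
FOLLOWS: dict[str, set[str]] = {}
POSTS_BY_AUTHOR: dict[str, list[Post]] = {}
FEED_CACHE: dict[str, list[Post]] = {}


def build_feed(user_id: str) -> list[Post]:
    followees = FOLLOWS.get(user_id, set())                              # 1. get followee IDs
    posts = [p for a in followees for p in POSTS_BY_AUTHOR.get(a, [])]   # 2. aggregate
    # 3. rank: placeholder ordering (newest first, likes as tie-breaker)
    ranked = sorted(posts, key=lambda p: (p.created_at, p.likes), reverse=True)
    FEED_CACHE[user_id] = ranked                                         # 4. cache
    return ranked


def get_feed_page(user_id: str, page: int = 0) -> list[Post]:
    feed = FEED_CACHE.get(user_id)
    if feed is None:                        # cache miss: build the feed now
        feed = build_feed(user_id)
    start = page * PAGE_SIZE                # 5. paginate in pages of 20
    return feed[start:start + PAGE_SIZE]
```

Building the whole ranked feed once and slicing it per page keeps follow-up page requests cheap, at the cost of holding the full feed in the cache.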
Now assume Jay follows Aayush, and Aayush publishes a new post. The system needs to update Jay's news feed:
- Get followers: retrieve the IDs of Aayush's followers.
- Add the new post: append Aayush's post to the feed pool of each of those followers.
- Rank: re-rank each affected feed by relevance and recency.
- Update cache: write the re-ranked feeds back to the cache.
- Notify: let followers know that a new post has been published.
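The update steps can be sketched as a single `publish` function. Everything here is hypothetical: the dictionaries stand in for the follower graph, the feed cache, and a notification queue, ranking is again a newest-first placeholder, and the 500-item cap simply reuses the figure from the storage estimate.

```python
from collections import defaultdict
from dataclasses import dataclass

MAX_FEED_ITEMS = 500  # per-user cap, reusing the storage-estimate figure


@dataclass(frozen=True)
class Post:
    post_id: str
    author_id: str
    created_at: int


# Hypothetical in-memory stand-ins for the follower graph, feed cache and notifications.
FOLLOWERS: dict[str, set[str]] = defaultdict(set)
FEED_CACHE: dict[str, list[Post]] = defaultdict(list)
NOTIFICATIONS: dict[str, list[str]] = defaultdict(list)


def publish(post: Post) -> None:
    for follower in FOLLOWERS[post.author_id]:               # 1. get follower IDs
        feed = FEED_CACHE[follower]
        feed.append(post)                                     # 2. add the new post
        feed.sort(key=lambda p: p.created_at, reverse=True)   # 3. rank (placeholder: newest first)
        del feed[MAX_FEED_ITEMS:]                             # 4. update the cached feed, trimmed
        NOTIFICATIONS[follower].append(post.post_id)          # 5. notify the follower
```

In a real deployment this loop would typically run asynchronously in worker processes rather than inside the request that creates the post.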
Web servers maintain the connections with users.
The application server executes the procedures described above.
Cache and database:
- User/entity: relational database
- Post: relational database
- Image/video: blob storage
- Metadata: relational database
- Feed generation
- Feed notification
6. Detailed Design
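Feed Generation
Naive implementation (fan-out on read): each time a user requests their feed, fetch the posts of every user and entity they follow, then sort, merge, and rank them on the fly.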
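At the database level, the naive approach boils down to one join that runs on every feed load. The sketch below uses `sqlite3` with a deliberately minimal, hypothetical two-table schema and sorts only by recency; the extra user `mira` exists only to populate the demo.

```python
import sqlite3


def naive_feed(conn: sqlite3.Connection, user_id: str, limit: int = 20) -> list[str]:
    # Runs on every feed load: join the follow graph with all posts, newest first.
    rows = conn.execute(
        """
        SELECT p.post_id
        FROM post AS p
        JOIN follow AS f ON f.followee_id = p.author_id
        WHERE f.follower_id = ?
        ORDER BY p.created_at DESC
        LIMIT ?
        """,
        (user_id, limit),
    ).fetchall()
    return [row[0] for row in rows]


def demo_db() -> sqlite3.Connection:
    # Minimal hypothetical schema: just enough columns for the query above.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE post (post_id TEXT PRIMARY KEY, author_id TEXT, created_at INTEGER);
        CREATE TABLE follow (follower_id TEXT, followee_id TEXT);
        """
    )
    conn.executemany("INSERT INTO follow VALUES (?, ?)", [("jay", "aayush"), ("jay", "mira")])
    conn.executemany(
        "INSERT INTO post VALUES (?, ?, ?)",
        [("p1", "aayush", 10), ("p2", "mira", 20), ("p3", "other", 30)],
    )
    conn.commit()
    return conn
```

The cost of this query grows with the number of accounts a user follows and the number of posts they have made, which is exactly the slowdown described next.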
Problems with this naive implementation include:
- Users who follow a large number of accounts will notice a significant slowdown, since we must sort, merge, and rank a large number of posts.
- The timeline is generated only when a user loads their page, which can be slow and add significant latency.
- For live updates, each status update triggers feed updates for all followers, which can create large backlogs in our Newsfeed Generation Service.
To improve efficiency, we can pre-generate the timeline and store it in memory.
Offline Generation (Fan-out on Write)
We can have dedicated servers that continuously generate users' news feeds and store them in memory. When a user requests their feed, we simply serve it from that pre-generated location.
How many feed items should we store in memory for each user?
Start with a fixed number (for example, the 500 posts assumed in the storage estimate) and adjust it to each user's usage pattern.
Should we generate a news feed for every user and keep it in memory?
- Not for users who rarely log in: keeping their feeds in memory wastes resources.
- A simple approach is an LRU-based cache that evicts the feeds of users who haven't accessed them for a long time.
- A smarter solution is to learn each user's login pattern (what time of day they are active and on which weekdays they open their feed) and pre-generate feeds accordingly.
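The LRU option can be sketched with `collections.OrderedDict`, whose `move_to_end` and `popitem(last=False)` methods give O(1) recency updates and eviction. The class name and the `generate` callback are hypothetical; the callback stands in for whatever service builds a feed on a cache miss.

```python
from collections import OrderedDict
from typing import Callable


class FeedLRUCache:
    """Keeps pre-generated feeds for at most `capacity` users and evicts the
    user whose feed was accessed least recently."""

    def __init__(self, capacity: int, generate: Callable[[str], list[str]]):
        self.capacity = capacity
        self.generate = generate    # hypothetical feed-generation callback
        self._feeds = OrderedDict()  # user_id -> feed, least recently used first

    def get(self, user_id: str) -> list[str]:
        if user_id in self._feeds:
            self._feeds.move_to_end(user_id)  # mark as most recently used
            return self._feeds[user_id]
        feed = self.generate(user_id)         # cache miss: build the feed on demand
        self.put(user_id, feed)
        return feed

    def put(self, user_id: str, feed: list[str]) -> None:
        self._feeds[user_id] = feed
        self._feeds.move_to_end(user_id)
        if len(self._feeds) > self.capacity:
            self._feeds.popitem(last=False)   # evict the least recently used user
```

Infrequent users naturally fall out of the cache and pay the generation cost only when they return; the login-pattern approach would add a scheduler that calls `put` ahead of each user's expected visit.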
Feed Publishing
Fan-out is the process of delivering a post to all of your followers.
Fan-out on read (pull model): when you request your news feed, the system receives a read request and fans it out to everyone you follow, fetching their recent posts and merging them into your feed.
Pros:
- Writes are cheap: a new post is stored only once.
- It's easier to apply different aggregation algorithms at read time.
Cons:
- Reads are expensive for users who follow a large number of accounts.
- Users don't see new posts until they pull them.
- If clients poll for new posts on a regular schedule, it's hard to choose the right polling interval, and most requests return an empty response, wasting resources.
Fan-out on write (push model): when you publish a new post, the system receives a write request and fans it out to all of your followers, updating each of their news feeds.
Pros:
- Reads are cheap, because each feed is pre-computed.
Cons:
- Writes are too expensive for users with millions of followers, since one post must be pushed into millions of feeds.
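The trade-off between the two models can be made visible by counting operations. This toy sketch (class names and operation counters are hypothetical) implements both models over the same follow graph: the pull model's read cost grows with the number of accounts a user follows, while the push model's write cost grows with the author's follower count.

```python
from collections import defaultdict


class PullFeed:
    """Fan-out on read: store each post once, assemble feeds at read time."""

    def __init__(self, follows: dict[str, set[str]]):
        self.follows = follows              # user -> accounts they follow
        self.posts = defaultdict(list)      # author -> [(timestamp, post_id)]
        self.write_ops = 0
        self.read_ops = 0

    def publish(self, author: str, ts: int, post_id: str) -> None:
        self.posts[author].append((ts, post_id))
        self.write_ops += 1                 # one write, regardless of followers

    def read(self, user: str) -> list[str]:
        items = []
        for followee in self.follows.get(user, set()):
            self.read_ops += 1              # one lookup per followed account
            items.extend(self.posts[followee])
        return [pid for _, pid in sorted(items, reverse=True)]


class PushFeed:
    """Fan-out on write: copy each post into every follower's feed."""

    def __init__(self, follows: dict[str, set[str]]):
        self.followers = defaultdict(set)   # author -> users following them
        for user, followees in follows.items():
            for followee in followees:
                self.followers[followee].add(user)
        self.feeds = defaultdict(list)      # user -> [(timestamp, post_id)]
        self.write_ops = 0
        self.read_ops = 0

    def publish(self, author: str, ts: int, post_id: str) -> None:
        for follower in self.followers[author]:
            self.feeds[follower].append((ts, post_id))
            self.write_ops += 1             # one write per follower

    def read(self, user: str) -> list[str]:
        self.read_ops += 1                  # one lookup: the feed is precomputed
        return [pid for _, pid in sorted(self.feeds[user], reverse=True)]
```

With an account that has 1,000 followers, one post costs 1 write in `PullFeed` but 1,000 in `PushFeed`; for a user following 300 accounts, one feed load costs 300 lookups in `PullFeed` but only 1 in `PushFeed`.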
Feed Ranking
Instead of ordering feed items purely chronologically, modern ranking algorithms also try to ensure that more relevant items are shown first. A typical approach:
- Select features that signal a feed item's relevance, such as the number of likes, comments, and shares, the time of the last update, whether the post contains photos or videos, and so on.
- Compute a score from these features.
- Use the score to rank the posts.
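A minimal scoring function following those three steps might look like the sketch below. The weights, the media boost, the log damping, and the 6-hour half-life are all hypothetical values chosen for illustration; a production ranker would tune or learn them from data rather than hand-pick them.

```python
import math
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FeedItem:
    post_id: str
    likes: int
    comments: int
    shares: int
    updated_at: float  # seconds since epoch
    has_media: bool


# Hypothetical hand-picked weights; a real system would tune or learn them.
W_LIKES, W_COMMENTS, W_SHARES, W_MEDIA = 1.0, 2.0, 3.0, 1.5
HALF_LIFE_HOURS = 6.0  # relevance halves every 6 hours (assumed)


def score(item: FeedItem, now: float) -> float:
    # log1p dampens very large counts so one viral metric doesn't dominate.
    engagement = (W_LIKES * math.log1p(item.likes)
                  + W_COMMENTS * math.log1p(item.comments)
                  + W_SHARES * math.log1p(item.shares))
    if item.has_media:
        engagement *= W_MEDIA                  # boost posts with photos or videos
    age_hours = max(now - item.updated_at, 0.0) / 3600
    return engagement * 0.5 ** (age_hours / HALF_LIFE_HOURS)  # exponential time decay


def rank(items: list[FeedItem], now: Optional[float] = None) -> list[FeedItem]:
    now = time.time() if now is None else now
    return sorted(items, key=lambda item: score(item, now), reverse=True)
```

The exponential decay lets a highly engaging older post still outrank a weak fresh one, which is the main difference from a purely chronological feed.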
To measure how effective the ranking system is, track KPIs such as user retention and ad revenue.
Instagram and its parent company Facebook are huge corporations, and their production feed systems are far more sophisticated than the design sketched here.
I have tried my best to give you a high-level summary of the Instagram feed.
I hope it was helpful and that you will put it to good use.