Large-scale online applications have come a long way over the past two decades, and they have changed how we think about software development. Facebook, Instagram, and Twitter, for example, are all massively scalable platforms.
These systems must be built to handle enormous volumes of traffic and data, since billions of people around the world use them at the same time. This is where system design enters the picture.
The process of establishing the architecture, interfaces, and data for a system that meets certain criteria is known as system design. Through cohesive and efficient systems, system design satisfies the demands of your business or organization.
Once your company or organization has determined its criteria, you can start incorporating them into a physical system design that meets your consumers’ demands.
Whether you choose to go with bespoke development, commercial solutions, or a combination of the two, how you design your system will determine how you build it.
We’ll take a detailed look at the system design of the Twitter timeline in this post, complete with a tutorial. Let’s get started.
Step 1: Outline use case & constraints
Use case
- A user posts a tweet.
- The service pushes tweets to followers, sending push notifications and emails.
- The user views their user timeline (activity from the user).
- The user views their home timeline (activity from the people the user follows).
- The user searches keywords.
- The service is highly available.
Out of scope
- The service pushes tweets to the Twitter Firehose and other streams.
- The service removes tweets based on the visibility settings of the user.
- If the user isn’t also following the person being replied to, hide the reply.
- Observe the ‘hide retweets’ option.
- Analytics
Constraints & assumptions
General assumptions
- Traffic is not evenly distributed.
- Posting a tweet should be fast.
- Fanning a tweet out to all of your followers should be fast, unless you have millions of followers.
- There are 100 million active users.
- 15 billion tweets each month or 500 million tweets every day
- Each tweet has a fanout of 10 deliveries on average.
- Every day, fanout delivers 5 billion tweets.
- Fanout delivers 150 billion tweets every month.
- 250 billion monthly read requests
- 10 billion monthly searches
Timeline
- Viewing the timeline should be fast.
- Twitter is more read-heavy than write-heavy.
- Optimize for fast reads of tweets.
- Ingesting tweets is write-heavy.
Search
- Searching should be fast.
- Search is read-heavy.
Calculate usage
Size of each tweet:
- 8 bytes tweet id
- 32 bytes user-id
- 140 bytes of text
- media – average of 10 KB
- Total: ~10 KB
150 TB of new tweet content is generated every month:
- 10 KB per tweet * 500 million tweets per day * 30 days per month
- 5.4 PB of new tweet content in 3 years
100,000 read requests per second:
- 250 billion read requests per month * (400 requests per second / 1 billion requests per month)
6,000 tweets per second:
- 15 billion tweets per month * (400 requests per second / 1 billion requests per month)
60,000 tweets delivered on fanout per second:
- 150 billion tweets delivered on fanout per month * (400 requests per second / 1 billion requests per month)
4,000 search requests per second:
- 10 billion searches per month * (400 requests per second / 1 billion requests per month)
Some useful conversions (sanity-checked in the sketch after this list):
- 2.5 million seconds per month
- 1 request per second = 2.5 million requests per month
- 40 requests per second = 100 million requests per month
- 400 requests per second = 1 billion requests per month
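To double-check these figures, here is a minimal back-of-envelope sketch in Python; the constants are simply the assumptions stated above.

```python
# Back-of-envelope check of the traffic and storage estimates above.
TWEETS_PER_MONTH = 15e9        # 15 billion tweets per month
FANOUT_PER_TWEET = 10          # average deliveries per tweet
READS_PER_MONTH = 250e9        # 250 billion read requests per month
SEARCHES_PER_MONTH = 10e9      # 10 billion searches per month
SECONDS_PER_MONTH = 2.5e6      # ~2.5 million seconds per month
BYTES_PER_TWEET = 10_000       # ~10 KB per tweet, dominated by media

print(f"tweets/s:        {TWEETS_PER_MONTH / SECONDS_PER_MONTH:,.0f}")                      # ~6,000
print(f"fanout tweets/s: {TWEETS_PER_MONTH * FANOUT_PER_TWEET / SECONDS_PER_MONTH:,.0f}")   # ~60,000
print(f"reads/s:         {READS_PER_MONTH / SECONDS_PER_MONTH:,.0f}")                       # ~100,000
print(f"searches/s:      {SEARCHES_PER_MONTH / SECONDS_PER_MONTH:,.0f}")                    # ~4,000

monthly_tb = TWEETS_PER_MONTH * BYTES_PER_TWEET / 1e12
print(f"new content/month: {monthly_tb:,.0f} TB")              # ~150 TB
print(f"new content/3 yrs: {monthly_tb * 36 / 1000:.1f} PB")   # ~5.4 PB
```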
Step 2: High-level diagram
Step 3: Explaining core components
When a user posts a tweet, we could store the user's own tweets in a relational database to populate the user timeline (activity from the user). Delivering tweets and building the home timeline (activity from the people the user follows) is harder.
A typical relational database would be overwhelmed by fanning out tweets to all followers (60 thousand tweets delivered each second). We’ll probably want to go with a fast-write data storage like a NoSQL database or Memory Cache.
Reading 1 MB sequentially from memory takes roughly 250 microseconds, but reading from SSD takes 4 times as long, and reading from disk takes 80 times as long.
An Object Store can be used to store data such as images and videos.
- The Client posts a tweet to the Web Server, which acts as a reverse proxy.
- The Web Server forwards the request to the Write API server.
- The Write API stores the tweet in the user's timeline in a SQL database.
The Write API contacts the Fan-Out Service, which does the following:
- Queries the User Graph Service to find the user's followers stored in the Memory Cache.
- Stores the tweet in the home timeline of each of the user's followers in a Memory Cache.
- 1,000 followers = 1,000 lookups and inserts, an O(n) operation.
- Stores the tweet in the Search Index Service to enable fast searching.
- Stores media in the Object Store.
- Sends push notifications to followers via the Notification Service.
- Uses a Queue to send the notifications asynchronously.
If our Memory Cache is Redis, we can use a native Redis list, sketched below.
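Here is a minimal sketch of that idea using redis-py; the key name and entry layout are assumptions for illustration, not the exact format Twitter uses.

```python
import redis

r = redis.Redis(host="localhost", port=6379)

def fan_out_tweet(tweet_id: int, author_id: int, follower_ids: list[int]) -> None:
    # Hypothetical entry layout: pack the tweet id and author id into one list item.
    entry = f"{tweet_id}:{author_id}"
    for follower_id in follower_ids:              # O(n) in the number of followers
        key = f"home_timeline:{follower_id}"      # one Redis list per home timeline
        r.lpush(key, entry)                       # newest tweet goes to the head
        r.ltrim(key, 0, 799)                      # cap the list at a few hundred entries

# Example: deliver tweet 42 by user 7 to three followers.
fan_out_tweet(42, 7, [101, 102, 103])
```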
The new tweet would be placed in the Memory Cache, which populates the user's home timeline. We would expose a public REST API for posting tweets, sketched below.
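For example, posting a tweet could be a single POST request; the host, endpoint, and field names below are placeholders, not Twitter's real API.

```python
import requests

# Hypothetical write endpoint: create a new tweet.
response = requests.post(
    "https://twitter.example.com/api/v1/tweet",
    json={
        "user_id": "123",
        "auth_token": "ABC179",
        "status": "hello world!",
        "media_ids": ["ABC987"],
    },
)
print(response.status_code)  # e.g. 201 Created on success
```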
A user views their user timeline:
- The Client requests the user timeline from the Web Server.
- The Web Server forwards the request to the Read API server.
- The Read API queries the SQL Database for the user timeline.
The REST API would work similarly to the home timeline, with the exception that all tweets would originate from the user rather than the people they follow.
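A sketch of what that read endpoint might look like (again, the endpoint and parameters are assumptions for illustration):

```python
import requests

# Hypothetical read endpoint: fetch a user's own timeline.
response = requests.get(
    "https://twitter.example.com/api/v1/user_timeline",
    params={"user_id": "123", "count": 50},
)
tweets = response.json()  # e.g. a list of {"tweet_id": ..., "user_id": ..., "status": ...}
```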
A user searches for keywords:
- The Client sends a search request to the Web Server.
- The Web Server forwards the request to the Search API server (the request might look like the sketch below).
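A minimal sketch of the search call, with a hypothetical endpoint and parameters; in practice the Search API would forward the query to a search index rather than scanning the SQL database.

```python
import requests

# Hypothetical search endpoint exposed by the Search API server.
response = requests.get(
    "https://twitter.example.com/api/v1/search",
    params={"query": "hello world", "count": 20},
)
results = response.json()  # ranked list of matching tweets
```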
Step 4: Twitter timeline
Generating the timeline is a hard problem. It requires a timeline-generation service that connects to the web or application servers.
Every time a user signs in, the timeline service keeps track of the newest tweets from the users in the follow table and updates or refreshes the user's timeline.
We don't implement any ranking system here; instead, we assume the top 5 tweets from the people the user follows are shown in the timeline in order of creation time. We can maintain a 50-tweet refresh cutoff: once that threshold is reached, we stop refreshing or building the timeline until the user refreshes the page.
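A minimal sketch of that refresh logic, assuming tweet ids are already sorted newest-first and using the 50-tweet cutoff mentioned above:

```python
REFRESH_CUTOFF = 50  # stop building the timeline once this many tweets are collected

def refresh_timeline(existing: list[int], latest_from_followees: list[int]) -> list[int]:
    # Merge the newest tweet ids ahead of what the user already has, drop
    # duplicates, and stop once the refresh cutoff is reached.
    merged: list[int] = []
    for tweet_id in latest_from_followees + existing:
        if tweet_id not in merged:
            merged.append(tweet_id)
        if len(merged) == REFRESH_CUTOFF:
            break
    return merged

print(refresh_timeline([10, 9, 8], [12, 11, 10]))  # -> [12, 11, 10, 9, 8]
```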
Generating the feed live for every request would cause high latency and performance problems. Instead, the best way to improve performance is to build an offline feed that can be served instantly: dedicated timeline servers ping the application server at regular intervals and refresh the feed based on creation time.
The ranking algorithm should take important signals into account and weight them so that a user's timeline is not dominated by content from one or a few of the accounts they follow.
More precisely, we can pick features related to the relevance of each feed item, such as the number of likes, comments, and shares, and the update time. Each tweet is scored on these criteria, and that score determines the order of tweets on the timeline.
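One hypothetical way to turn those signals into a score, with illustrative weights and a simple recency decay (not Twitter's actual ranking model):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class Tweet:
    tweet_id: int
    likes: int
    comments: int
    shares: int
    created_at: datetime

def rank_score(t: Tweet) -> float:
    # Weighted engagement divided by age, so older tweets decay toward zero.
    age_hours = (datetime.now(timezone.utc) - t.created_at).total_seconds() / 3600
    engagement = 1.0 * t.likes + 2.0 * t.comments + 3.0 * t.shares
    return engagement / (1.0 + age_hours)

now = datetime.now(timezone.utc)
candidates = [
    Tweet(1, likes=10, comments=2, shares=1, created_at=now - timedelta(hours=1)),
    Tweet(2, likes=300, comments=40, shares=25, created_at=now - timedelta(hours=20)),
]
timeline = sorted(candidates, key=rank_score, reverse=True)  # highest score first
```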
Should we constantly alert users when new content for their newsfeed becomes available? Users can find it beneficial to be alerted when new data becomes available. On mobile devices, however, when data use is quite costly, it can waste bandwidth.
As a result, we can opt not to push data to mobile devices and instead allow users to “Pull to Refresh” for new postings.
Step 5: Scaling design
A potential bottleneck is the Fanout Service. Twitter users with millions of followers would have to wait several minutes for their tweets to fan out. This could cause a race condition with replies to the tweet, which we could mitigate by re-ordering the tweets at serve time.
We could also avoid fanning out tweets from people with a large number of followers. Instead, we could search for tweets from highly-followed accounts, merge the search results with the user's home timeline results, and re-order the tweets at serve time.
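A sketch of that serve-time merge, assuming tweet ids increase over time so they can double as a sort key:

```python
import heapq

# Merge the precomputed home timeline with tweets pulled at read time from
# highly-followed accounts, newest (largest tweet id) first.
def merged_timeline(precomputed: list[dict], celebrity: list[dict], limit: int = 50) -> list[dict]:
    merged = heapq.merge(
        sorted(precomputed, key=lambda t: t["tweet_id"], reverse=True),
        sorted(celebrity, key=lambda t: t["tweet_id"], reverse=True),
        key=lambda t: t["tweet_id"],
        reverse=True,
    )
    return list(merged)[:limit]

home = [{"tweet_id": 105, "user_id": 7}, {"tweet_id": 99, "user_id": 8}]
celeb = [{"tweet_id": 103, "user_id": 1}]
print(merged_timeline(home, celeb))  # tweets 105, 103, 99 in order
```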
Additional enhancements include:
- Keep only a few hundred tweets in the Memory Cache for each home timeline.
- Keep home timeline data in the Memory Cache only for active users.
- If a user has not been active in the preceding 30 days, we can rebuild the timeline from the SQL Database (see the sketch after this list).
- Query the User Graph Service to determine whom the user is following.
- Retrieve the tweets from the SQL Database and add them to the Memory Cache.
- Store only a month's worth of tweets in the Tweet Info Service.
- Store only active users in the User Info Service.
- To keep latency low, the Search Cluster would most likely need to maintain the tweets in memory.
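A toy sketch of the cache-miss path for inactive users, with plain dictionaries standing in for the Memory Cache and the SQL Database:

```python
# In-memory stand-ins for the Memory Cache and the SQL Database (illustrative only).
memory_cache: dict[int, list[int]] = {1: [120, 118, 117]}       # active user's cached timeline
sql_home_timeline: dict[int, list[int]] = {2: [301, 298, 255]}  # what a SQL query would return

def get_home_timeline(user_id: int) -> list[int]:
    # Active users are served straight from the Memory Cache.
    if user_id in memory_cache:
        return memory_cache[user_id]
    # Inactive users have fallen out of the cache; rebuild their timeline from SQL.
    timeline = sql_home_timeline.get(user_id, [])
    memory_cache[user_id] = timeline          # warm the cache for subsequent requests
    return timeline

print(get_home_timeline(1))  # cache hit:  [120, 118, 117]
print(get_home_timeline(2))  # cache miss: rebuilt from SQL -> [301, 298, 255]
```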
Conclusion
Twitter is a huge system, and building something like it demands a solid understanding of system design. I did my best to give you a high-level overview of the Twitter timeline.
I hope you gained useful information from it and can put it to good use.