Twitter System Design
Designing Twitter's timeline generation and tweet posting functionality
Before discussing the overall system architecture, it is important to consider the different categories of users and how we can best serve each group while maintaining good performance.
Famous Users are typically high-profile individuals such as celebrities, sportspeople, politicians, or business leaders who have a substantial number of followers.
Active Users are those who have accessed the system within the last few hours or days. For our purposes, we will consider individuals who have accessed Twitter within the last three days as active users.
Live Users are a subset of active users who are currently using the system, similar to those online on Facebook or WhatsApp.
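To make the three categories concrete, here is a minimal sketch of how a user might be classified. The three-day window comes from the definition above; the follower threshold, the `User` fields, and the `categories` function are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

ACTIVE_WINDOW = timedelta(days=3)       # from the text: accessed within the last three days
FAMOUS_FOLLOWER_THRESHOLD = 1_000_000   # assumption: the text does not give a number


@dataclass
class User:
    user_id: str
    follower_count: int
    last_seen: datetime
    has_open_connection: bool = False   # hypothetical flag: the user is online right now


def categories(user: User, now: datetime) -> set:
    """Return the set of categories a user falls into at time `now`."""
    result = set()
    if user.follower_count >= FAMOUS_FOLLOWER_THRESHOLD:
        result.add("famous")
    if now - user.last_seen <= ACTIVE_WINDOW:
        result.add("active")
        # Live users are a subset of active users, so only check inside this branch.
        if user.has_open_connection:
            result.add("live")
    return result
```

Note that the categories are not mutually exclusive: a celebrity who is online right now would be famous, active, and live at the same time.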
When a user submits a tweet, the tweet processor service saves it in a permanent data store. We use Cassandra here because it handles a very large volume of tweets efficiently and suits the query patterns involved.
The tweet processor service is responsible solely for posting tweets and does not expose any GET APIs for retrieving them. Upon posting a tweet, the tweet processor service sends an event to Kafka indicating the tweet ID and user ID. A tweet service is deployed on top of Cassandra that exposes APIs for fetching tweets by tweet or user ID.
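The split between the write path and the read path can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual implementation: `TweetStore` and `EventLog` are in-memory stand-ins for Cassandra and a Kafka topic, and all class and method names are hypothetical.

```python
import uuid
from collections import defaultdict


class TweetStore:
    """In-memory stand-in for the Cassandra-backed tweet store."""

    def __init__(self):
        self.by_id = {}
        self.by_user = defaultdict(list)

    def save(self, tweet_id, user_id, text):
        self.by_id[tweet_id] = {"tweet_id": tweet_id, "user_id": user_id, "text": text}
        self.by_user[user_id].append(tweet_id)


class EventLog:
    """In-memory stand-in for a Kafka topic."""

    def __init__(self):
        self.events = []

    def publish(self, event):
        self.events.append(event)


class TweetProcessor:
    """Write path only: persist the tweet, then announce it."""

    def __init__(self, store, log):
        self.store = store
        self.log = log

    def post_tweet(self, user_id, text):
        tweet_id = str(uuid.uuid4())
        self.store.save(tweet_id, user_id, text)
        # The event carries only the IDs, as described in the text.
        self.log.publish({"tweet_id": tweet_id, "user_id": user_id})
        return tweet_id


class TweetService:
    """Read path only: fetch tweets by tweet ID or by user ID."""

    def __init__(self, store):
        self.store = store

    def get_by_id(self, tweet_id):
        return self.store.by_id.get(tweet_id)

    def get_by_user(self, user_id):
        return [self.store.by_id[t] for t in self.store.by_user.get(user_id, [])]
```

Keeping the event small (just the two IDs) means downstream consumers fetch the tweet body from the tweet service only when they actually need it.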
Now let's consider the user's perspective. On the read path, users can view their user timeline (their own tweets) or their home timeline (tweets from the people they follow). Because a user may follow many accounts, gathering all the relevant tweets at request time could delay rendering. To address this, a timeline processor precomputes the timelines of active users and caches them in Redis. As a result, active users can view their timeline almost instantly.
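The precomputation can be pictured as a fan-out on write: for each new tweet event, the tweet ID is pushed onto the cached timeline of every active follower. The sketch below uses an in-memory dictionary in place of Redis; with a real Redis list this would most likely map to an `LPUSH` followed by an `LTRIM`. The names `TimelineCache` and `fan_out` and the timeline cap are assumptions for illustration.

```python
from collections import defaultdict, deque

TIMELINE_LIMIT = 800  # assumption: cap on cached entries per user


class TimelineCache:
    """In-memory stand-in for per-user timeline lists in Redis."""

    def __init__(self, limit=TIMELINE_LIMIT):
        # A bounded deque drops the oldest entry once the cap is reached.
        self.timelines = defaultdict(lambda: deque(maxlen=limit))

    def push(self, user_id, tweet_id):
        self.timelines[user_id].appendleft(tweet_id)  # newest first

    def get(self, user_id):
        if user_id not in self.timelines:
            return []
        return list(self.timelines[user_id])


def fan_out(event, followers_of, active_users, cache):
    """Add the new tweet to the cached timeline of each active follower."""
    for follower in followers_of.get(event["user_id"], ()):
        if follower in active_users:
            cache.push(follower, event["tweet_id"])
```

Inactive followers are skipped on purpose: their timelines are not cached, which keeps the cache sized to the active user base rather than to all users.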
We have examined how the system handles active users, but how can we optimize the flow for live users? As previously mentioned, when a tweet is posted, an event is sent to Kafka. The timeline processor consumes that event, generates timelines for active users, and stores them in Redis. If the processor detects that one of the users who needs an update is live, it publishes a second event to Kafka, which is consumed by a live websocket service. This websocket service sends a notification to the app and updates the timeline in real time.
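The extra hop for live users might look something like the sketch below. It is a simplified illustration under stated assumptions: plain Python lists and dictionaries stand in for the Kafka topic and the Redis cache, a callback stands in for an open websocket connection, and the names `process_tweet_event` and `LiveSocketService` are hypothetical.

```python
def process_tweet_event(event, followers_of, active_users, live_users,
                        timelines, live_topic):
    """Update cached timelines and emit a second event for each live follower."""
    for follower in followers_of.get(event["user_id"], ()):
        if follower not in active_users:
            continue  # timelines are only precomputed for active users
        timelines.setdefault(follower, []).insert(0, event["tweet_id"])
        if follower in live_users:
            # Second event, destined for the websocket service.
            live_topic.append({"recipient": follower, "tweet_id": event["tweet_id"]})


class LiveSocketService:
    """Stand-in for the websocket service that pushes updates to open clients."""

    def __init__(self):
        self.connections = {}  # user_id -> callable that delivers a tweet ID

    def connect(self, user_id, send):
        self.connections[user_id] = send

    def drain(self, live_topic):
        """Consume pending live events and push each one to its recipient."""
        while live_topic:
            event = live_topic.pop(0)
            send = self.connections.get(event["recipient"])
            if send is not None:
                send(event["tweet_id"])
```

A user who is active but not live still gets the updated cache entry and simply sees the new tweet the next time they open the app.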
Here's the video of me explaining the design 👇
Thank you for reading this edition of the newsletter!