Instagram System Design
Functional Requirements
- user can create post with images/videos
- user can like, comment on posts
- user can follow others
- view timelines (home timeline, user timeline)
Non-Functional Requirements
- post creation can have lag (eventual consistency)
- timeline generation should be very fast (low latency)
- Highly available
- Data should be persistent
- hot and cold data (cold data can be archived)
- different types of users: Famous, Active, Live, Passive, Inactive
- Global Scale
Scale
- 2 billion MAU
- 1 billion DAU
- 500 million posts/day
- Each active user likes 10 posts and comments on 3 posts per day, on average
Approximating one day (86,400 s) as 10^5 seconds:
- posts: 500 million / 10^5 = 5k posts/sec
- likes: 1 billion * 10 / 10^5 = 100k likes/sec
- comments: 1 billion * 3 / 10^5 = 30k comments/sec
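The back-of-the-envelope rates above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the constant and function names are made up for illustration, and the 10^5 seconds/day figure is the same rounding the notes use.

```python
# Rounded seconds per day used in the estimate (the exact value is 86,400).
SECONDS_PER_DAY_APPROX = 10**5

DAU = 1_000_000_000
POSTS_PER_DAY = 500_000_000
LIKES_PER_USER_PER_DAY = 10
COMMENTS_PER_USER_PER_DAY = 3


def per_second(events_per_day: int, seconds_per_day: int = SECONDS_PER_DAY_APPROX) -> float:
    """Convert a daily event count into an average per-second rate."""
    return events_per_day / seconds_per_day


posts_per_sec = per_second(POSTS_PER_DAY)
likes_per_sec = per_second(DAU * LIKES_PER_USER_PER_DAY)
comments_per_sec = per_second(DAU * COMMENTS_PER_USER_PER_DAY)

print(f"posts/sec:    {posts_per_sec:,.0f}")
print(f"likes/sec:    {likes_per_sec:,.0f}")
print(f"comments/sec: {comments_per_sec:,.0f}")
```

With the exact 86,400 seconds per day the averages come out about 16% higher (roughly 5.8k posts/sec), and peak traffic will be higher still, so these numbers are best read as order-of-magnitude targets.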
Entities
- User
- Post
- Like
- Comment
- Media/Asset
API
POST /posts
{caption, tagline, mediaUrl, ..}
Media upload should use a presigned URL, so that the client uploads directly to the blob storage.
POST /likes/{:postId}
POST /comments/{:postId}
{text}
POST /follow/{:userId}
POST /unfollow/{:userId}
home timeline
GET /timelines
user timeline
GET /timelines/{:userId}
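To make the presigned URL upload noted under POST /posts more concrete, here is a minimal sketch of the idea: the server signs an object key plus an expiry time, and the storage side accepts the upload only if the signature matches and the URL has not expired. This is an illustration built on Python's standard `hmac` module, not the signing scheme S3 actually uses; `SECRET_KEY`, `BLOB_HOST`, and both function names are hypothetical. With a real blob store one would rely on the provider's SDK to generate the URL.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"demo-secret"             # hypothetical key shared by API server and blob store
BLOB_HOST = "https://blob.example.com"  # hypothetical blob storage endpoint


def _signature(method: str, object_key: str, expires_at: int) -> str:
    """HMAC over the fields the URL is allowed to vary in."""
    message = f"{method}\n{object_key}\n{expires_at}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def create_presigned_upload_url(object_key: str, ttl_seconds: int = 300, now=None) -> str:
    """Called by the API server; the client then PUTs the file to this URL."""
    now = int(time.time()) if now is None else now
    expires_at = now + ttl_seconds
    query = urlencode({"expires": expires_at, "sig": _signature("PUT", object_key, expires_at)})
    return f"{BLOB_HOST}/{object_key}?{query}"


def verify_upload_url(url: str, now=None) -> bool:
    """Called on the storage side before accepting the upload."""
    now = int(time.time()) if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    if now > expires_at:
        return False  # URL has expired
    object_key = parsed.path.lstrip("/")
    expected = _signature("PUT", object_key, expires_at)
    return hmac.compare_digest(sig, expected)
```

The point of the design is that large media files never pass through the API servers: the client calls the API only to obtain the URL and, after uploading, to create the post with the resulting mediaUrl.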
High Level Design

### Asset Service
Stores each media file in multiple resolutions and sizes for different types of devices. Files are stored in S3 and served through a CDN.
User A's timeline:
- Fetch the people user A follows from the Follow DB
- For each person in that list, fetch their recent posts
- Sort and limit the latest posts to generate the timeline
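The read-time approach just described can be sketched as follows. The dictionaries are hypothetical in-memory stand-ins for the Follow DB and the Post DB, and the sketch assumes each author's posts are already stored newest first as `(created_at, post_id)` pairs.

```python
import heapq
from itertools import islice

# Hypothetical stand-in for the Follow DB: user -> people they follow.
follows = {"A": ["B", "C", "D"]}

# Hypothetical stand-in for the Post DB: author -> posts, newest first.
posts_by_user = {
    "B": [(105, "b2"), (100, "b1")],
    "C": [(103, "c1")],
    "D": [(110, "d2"), (101, "d1")],
}


def home_timeline(user_id: str, limit: int = 3, per_author: int = 20) -> list:
    """Pull model: gather recent posts of every followee, merge, and cut."""
    followees = follows.get(user_id, [])
    # Only the most recent posts of each followee are candidates.
    recent = (posts_by_user.get(f, [])[:per_author] for f in followees)
    # Each input is sorted newest first, so a k-way merge keeps that order.
    merged = heapq.merge(*recent, reverse=True)
    return [post_id for _, post_id in islice(merged, limit)]


print(home_timeline("A"))  # ['d2', 'b2', 'c1']
```

Every read touches one post list per followee, which is why this model becomes expensive as the follow list grows.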
Problem 1: What if the user follows a lot of users?
The follows table returns a lot of rows, and we need to find the posts of each of those users.
How should we shard the post table?
- shard posts by post ID or by user ID?
- we don't know the post IDs in advance; we need the posts of all these users, so let's shard by user ID
- but then we have to aggregate the data across shards
For example, user A follows users B, C, and D, so we need posts from B, C, and D.
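A minimal sketch of sharding posts by user ID and aggregating at read time is shown below. The shard count, the hash choice, and all function names are assumptions made for illustration; the shards are plain in-memory dictionaries rather than real databases.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count

# One dict per shard: user_id -> [(created_at, post_id), ...]
shards = [defaultdict(list) for _ in range(NUM_SHARDS)]


def shard_for(user_id: str) -> int:
    """Stable hash of the user ID. Python's built-in hash() is randomized
    per process for strings, so a digest is used instead."""
    digest = hashlib.md5(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def write_post(user_id: str, created_at: int, post_id: str) -> None:
    # All posts of one user land on the same shard.
    shards[shard_for(user_id)][user_id].append((created_at, post_id))


def group_by_shard(user_ids):
    """Group followees so that each shard is queried only once."""
    groups = defaultdict(list)
    for uid in user_ids:
        groups[shard_for(uid)].append(uid)
    return dict(groups)


def recent_posts(followee_ids, limit: int = 10) -> list:
    """Scatter-gather: query every involved shard, then merge the results."""
    results = []
    for shard_id, uids in group_by_shard(followee_ids).items():
        shard = shards[shard_id]
        for uid in uids:
            results.extend(shard.get(uid, []))
    results.sort(reverse=True)  # newest first
    return [post_id for _, post_id in results[:limit]]
```

Fetching one user's posts is a single-shard query, but a home timeline may still fan out to as many shards as there are followees, which is the aggregation cost noted above.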
Solution: populate the feed cache on write
When user A posts, find the people who follow A and add the post ID to each of their feeds.
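Fan-out on write can be sketched like this. The follower map and feed cache are hypothetical in-memory structures (a real system would likely use something like a per-user list in a cache), and `FEED_MAX` is an assumed cap on how many post IDs each feed keeps.

```python
from collections import defaultdict, deque

FEED_MAX = 500  # hypothetical cap on cached post IDs per user

# Hypothetical stand-in for the Follow DB: author -> followers.
followers = {"A": ["X", "Y"], "B": ["X"]}

# Feed cache: user -> post IDs, newest first. Old entries fall off the end.
feed_cache = defaultdict(lambda: deque(maxlen=FEED_MAX))


def on_post_created(author_id: str, post_id: str) -> None:
    """Push model: add the new post ID to every follower's feed."""
    for follower_id in followers.get(author_id, []):
        feed_cache[follower_id].appendleft(post_id)


def home_timeline(user_id: str, limit: int = 20) -> list:
    """Reading is now a single cache lookup, with no per-followee queries."""
    return list(feed_cache.get(user_id, ()))[:limit]
```

The read path becomes cheap, and the cost moves to the write path: one cache update per follower for every post.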
Problem 2: What if user A is followed by a lot of people?
On post write, we have to find the followers of the creator and update the feed table/cache of each follower. That's a lot of work if the user has lots of followers.
Solution: We can use an async flow, so the post write returns quickly and the feed updates happen in the background.
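One way to picture the async flow is a queue between the post write and the feed updates. The sketch below uses Python's `queue.Queue` and a background thread as stand-ins for a message broker and a fan-out worker service; those, along with the function names, are assumptions for illustration only.

```python
import queue
import threading
from collections import defaultdict

# Hypothetical stand-ins for the Follow DB and the feed cache.
followers = {"A": ["X", "Y", "Z"]}
feed_cache = defaultdict(list)

# Stand-in for a message queue between the post service and fan-out workers.
fanout_jobs = queue.Queue()


def create_post(author_id: str, post_id: str) -> str:
    """Request path: persist the post (omitted here), enqueue, return."""
    fanout_jobs.put((author_id, post_id))
    return post_id


def fanout_worker() -> None:
    """Background path: update each follower's feed outside the request."""
    while True:
        author_id, post_id = fanout_jobs.get()
        try:
            for follower_id in followers.get(author_id, []):
                feed_cache[follower_id].insert(0, post_id)
        finally:
            fanout_jobs.task_done()


threading.Thread(target=fanout_worker, daemon=True).start()

create_post("A", "a1")
fanout_jobs.join()  # demo only: wait until the worker has drained the queue
print(dict(feed_cache))
```

Followers see the post slightly later than the author does, which fits the requirement that post creation may lag (eventual consistency).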

- Famous users: followers' feeds are generated on read
- Active users (active this month): populate the feed cache on post write
- Live users: populate the feed cache and send live updates via WebSocket
- Passive users (not active for over a month): can skip generating the feed cache
- Inactive users: no need to generate a feed
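The per-user-type strategy can be sketched as a hybrid of push and pull. The sketch makes two assumptions that the notes do not spell out: "famous" is treated as a property of the author, while Active/Live/Passive/Inactive describe the reader; and a reader without a feed cache falls back to a full pull at read time. All names and the `live_pushes` list (a stand-in for WebSocket delivery) are hypothetical.

```python
from collections import defaultdict
from enum import Enum


class Activity(Enum):
    ACTIVE = "active"
    LIVE = "live"
    PASSIVE = "passive"
    INACTIVE = "inactive"


CACHED_STATES = {Activity.ACTIVE, Activity.LIVE}  # readers that get a feed cache

# Hypothetical sample data.
famous_authors = {"celeb"}
activity = {"celeb": Activity.ACTIVE, "bob": Activity.ACTIVE, "liv": Activity.LIVE,
            "pat": Activity.PASSIVE, "ian": Activity.INACTIVE}
following = {"bob": ["celeb"], "liv": ["celeb", "bob"], "pat": ["bob"], "ian": ["bob"]}
followers = {"celeb": ["bob", "liv"], "bob": ["liv", "pat", "ian"]}

posts_by_user = defaultdict(list)  # author -> [(created_at, post_id)]
feed_cache = defaultdict(list)     # reader -> [(created_at, post_id)]
live_pushes = []                   # stand-in for WebSocket messages


def on_post_created(author: str, created_at: int, post_id: str) -> None:
    posts_by_user[author].append((created_at, post_id))
    if author in famous_authors:
        return  # no fan-out; followers pull these posts at read time
    for follower in followers.get(author, []):
        state = activity[follower]
        if state in CACHED_STATES:
            feed_cache[follower].append((created_at, post_id))
        if state is Activity.LIVE:
            live_pushes.append((follower, post_id))
        # Passive and inactive followers are skipped entirely.


def home_timeline(user: str, limit: int = 20) -> list:
    cached = activity[user] in CACHED_STATES
    entries = list(feed_cache.get(user, [])) if cached else []
    for followee in following.get(user, []):
        # Famous followees are always pulled; uncached readers pull everything.
        if followee in famous_authors or not cached:
            entries.extend(posts_by_user.get(followee, []))
    entries.sort(reverse=True)  # newest first
    return [post_id for _, post_id in entries[:limit]]
```

For example, after `on_post_created("bob", 1, "b1")` and `on_post_created("celeb", 2, "c1")`, only "liv" has a cached entry for "b1", and "c1" is merged into readers' timelines when they call `home_timeline`.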
How do you inform Live users:

Hot and Cold Posts:

Feed Personalization
User -> interests[]
