
Instagram System Design

Functional Requirements

  • Users can create posts with images/videos
  • Users can like and comment on posts
  • Users can follow other users
  • Users can view timelines (home timeline, user timeline)

Non-Functional Requirements

  • Post creation can lag (eventual consistency is acceptable)
  • Timeline generation should be very fast (low latency)
  • Highly available
  • Data should be persistent
  • Hot and cold data (cold data can be archived)
  • Different types of users: Famous, Active, Live, Passive, Inactive
  • Global Scale

Scale

  • 2 billion MAU
  • 1 billion DAU
  • 500 million posts/day

On average, each daily active user likes 10 posts and comments on 3 posts per day.

Approximating one day (86,400 seconds) as 10^5 seconds:

  • Posts: 500 million / 10^5 = 5k posts/sec
  • Likes: 1 billion * 10 / 10^5 = 100k likes/sec
  • Comments: 1 billion * 3 / 10^5 = 30k comments/sec
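The same back-of-the-envelope arithmetic can be checked with a few lines of Python. This is only a sketch of the estimate: the constant and function names are illustrative, and the 10^5 seconds-per-day figure is the rounding used in these notes, not an exact value.

```python
# Rounded seconds per day used in the estimate (the exact value is 86,400).
SECONDS_PER_DAY_APPROX = 10**5

def per_second(events_per_day: int) -> int:
    """Average events per second under the rounded 10^5 s/day approximation."""
    return events_per_day // SECONDS_PER_DAY_APPROX

DAU = 1_000_000_000            # daily active users
POSTS_PER_DAY = 500_000_000

posts_per_sec = per_second(POSTS_PER_DAY)   # 5_000
likes_per_sec = per_second(DAU * 10)        # 100_000
comments_per_sec = per_second(DAU * 3)      # 30_000

print(posts_per_sec, likes_per_sec, comments_per_sec)
```

These are averages; peak traffic would be higher, so they are a lower bound for capacity planning.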

Entities

  • User
  • Post
  • Like
  • Comment
  • Media/Asset

API

POST /posts

{

caption, tagline, mediaUrl, ..

}

Media should be uploaded through a presigned URL, so the client uploads directly to blob storage instead of sending the file through the API servers.
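To make the presigned-URL idea concrete, here is a minimal sketch of the mechanism: the API server signs an object key plus an expiry time, and the blob store verifies that signature before accepting the upload. It deliberately uses a simplified HMAC scheme and is not the signing algorithm S3 actually uses. `SECRET_KEY`, `presign_upload_url`, `verify_upload_url`, and the host `blob.example.com` are all hypothetical names for illustration.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"demo-secret"  # hypothetical key shared by API server and blob store

def _sign(object_key: str, expires_at: int) -> str:
    """HMAC over the method, object key, and expiry (simplified stand-in scheme)."""
    message = f"PUT\n{object_key}\n{expires_at}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()

def presign_upload_url(object_key: str, ttl_seconds: int = 300,
                       now: Optional[float] = None) -> str:
    """Issued by the API server; the client PUTs the file to this URL directly."""
    now = time.time() if now is None else now
    expires_at = int(now) + ttl_seconds
    query = urlencode({"expires": expires_at,
                       "signature": _sign(object_key, expires_at)})
    return f"https://blob.example.com/{object_key}?{query}"

def verify_upload_url(url: str, now: Optional[float] = None) -> bool:
    """Checked by the blob store before it accepts the upload."""
    now = time.time() if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False  # missing or malformed parameters
    if now > expires_at:
        return False  # URL has expired
    expected = _sign(parsed.path.lstrip("/"), expires_at)
    return hmac.compare_digest(signature.encode(), expected.encode())
```

In a real deployment the URL would come from the storage provider's SDK rather than hand-rolled signing; the sketch only shows why the client can upload without the file passing through the API servers.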

POST /likes/{:postId}

POST /comments/{:postId}

{

text

}

POST /follow/{:userId}

POST /unfollow/{:userId}

home timeline

GET /timelines

user timeline

GET /timelines/{:userId}

High Level Design

image

Asset Service

Stores media in multiple resolutions and sizes for different types of devices. Media is stored in S3 and served through a CDN.

User A's timeline:

  • Fetch the people User A follows from the Follow DB.
  • For each person in that list, fetch their recent posts.
  • Sort and limit the latest posts to generate the timeline.
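The read-time approach above can be sketched as follows. This is a toy illustration, not the lesson's implementation: `FOLLOWS` and `POSTS_BY_USER` are hypothetical in-memory stand-ins for the Follow DB and the post table, and each user's posts are assumed to be stored newest-first.

```python
import heapq
from itertools import islice

# Hypothetical stand-in for the Follow DB: user -> people they follow.
FOLLOWS = {"A": ["B", "C", "D"]}

# Hypothetical stand-in for the post table.
# Each list holds (created_at, post_id) tuples, newest first.
POSTS_BY_USER = {
    "B": [(105, "b2"), (100, "b1")],
    "C": [(103, "c1")],
    "D": [(110, "d2"), (101, "d1")],
}

def home_timeline(user_id, limit=3, per_user=20):
    """Build the home timeline at read time (fan-out on read)."""
    # Step 1: fetch the people this user follows.
    followees = FOLLOWS.get(user_id, [])
    # Step 2: fetch each followee's most recent posts.
    recent = [POSTS_BY_USER.get(f, [])[:per_user] for f in followees]
    # Step 3: merge the newest-first lists and keep only the latest posts.
    merged = heapq.merge(*recent, reverse=True)
    return [post_id for _, post_id in islice(merged, limit)]

print(home_timeline("A"))  # ['d2', 'b2', 'c1']
```

The cost of each read grows with the number of followees, which is the issue the next problem describes.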

Problem 1: What if a user follows a lot of users?

The follows table returns a lot of rows, and for each of those users we need to find their posts.

How should we shard the post table?

  • Shard posts by post ID or by user ID?
  • We don't know the post IDs up front; we need the posts for all of these users. So let's shard by user ID.
  • But then we have to aggregate the data across shards.

User A follows user B, C, D. So we need posts from user B, C, D
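A small sketch of what sharding by user ID implies for this query: the followees are grouped by shard, each shard is queried once, and the results are aggregated afterwards. The shard count, the hash-mod placement, and the function names are assumptions for illustration; the notes do not specify a placement scheme.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count

def shard_for(user_id: str) -> int:
    """Map a user ID to a shard with a stable hash (assumed hash-mod placement)."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

def group_by_shard(user_ids):
    """Group followees so each shard is queried once (scatter-gather)."""
    groups = defaultdict(list)
    for user_id in user_ids:
        groups[shard_for(user_id)].append(user_id)
    return dict(groups)

# User A follows B, C, D: find which shards hold their posts.
print(group_by_shard(["B", "C", "D"]))
```

All of one user's posts live on a single shard, so a user timeline is one query, while a home timeline may touch several shards and must merge their results.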

Solution: populate feed cache on write

When User A posts, find the people who follow A; for each of those followers, add the post ID to their feed.
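A minimal sketch of this fan-out on write, assuming an in-memory feed cache. `FOLLOWERS`, `FEED_CACHE`, and `FEED_MAX` are hypothetical names, and the per-user cap on feed length is an added assumption rather than something stated in the notes.

```python
from collections import defaultdict, deque

# Hypothetical stand-in for the Follow DB: author -> followers.
FOLLOWERS = {"A": ["X", "Y"]}

FEED_MAX = 500  # assumed cap on cached post IDs per user

# Feed cache: user -> newest-first post IDs; oldest entries fall off the end.
FEED_CACHE = defaultdict(lambda: deque(maxlen=FEED_MAX))

def on_post_created(author_id, post_id):
    """Fan-out on write: push the new post ID into every follower's feed."""
    for follower in FOLLOWERS.get(author_id, []):
        FEED_CACHE[follower].appendleft(post_id)

def read_feed(user_id, limit=10):
    """Reading the timeline is now a single cache lookup."""
    feed = FEED_CACHE.get(user_id, ())
    return list(feed)[:limit]

on_post_created("A", "p1")
on_post_created("A", "p2")
print(read_feed("X"))  # ['p2', 'p1']
```

Reads become cheap, but each write now costs one cache update per follower.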

Problem 2: What if User A is followed by a lot of people?

On post write, we have to find the followers of the creator and update the feed table/cache for each follower. That's a lot of work if the user has lots of followers.

Solution: We can use an async flow

image
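One way to picture the async flow is a queue between the write path and a fan-out worker. The sketch below uses Python's in-process `queue.Queue` and a thread purely as a stand-in for a message broker and a worker fleet; the names `post_events`, `create_post`, and `fanout_worker` are hypothetical, and the original diagram may use different components.

```python
import queue
import threading
from collections import defaultdict

FOLLOWERS = {"A": ["X", "Y", "Z"]}   # hypothetical Follow DB stand-in
feed_cache = defaultdict(list)       # user -> newest-first post IDs
post_events = queue.Queue()          # stand-in for a message broker

def create_post(author_id, post_id):
    """Write path: persist the post (omitted), enqueue an event, return quickly."""
    post_events.put((author_id, post_id))
    return post_id

def fanout_worker():
    """Background worker: updates follower feeds outside the request path."""
    while True:
        event = post_events.get()
        if event is None:            # sentinel: stop the worker
            post_events.task_done()
            break
        author_id, post_id = event
        for follower in FOLLOWERS.get(author_id, []):
            feed_cache[follower].insert(0, post_id)
        post_events.task_done()

worker = threading.Thread(target=fanout_worker, daemon=True)
worker.start()

create_post("A", "p1")
post_events.join()                   # wait for the fan-out (demo only)
print(dict(feed_cache))
```

The post request returns as soon as the event is queued, which is why feeds are only eventually consistent with new posts.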

  • Famous users: generate followers' feeds on read
  • Active users (active this month): populate the feed cache on post write
  • Live users: populate the feed cache and send live updates via WebSocket
  • Passive users (not active for over a month): can skip generating the feed cache
  • Inactive users: no need to generate a feed
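The user types above amount to a per-follower decision at write time. The sketch below is one possible reading of those rules, not a definitive policy: the enum, the action names, and the choice to do nothing on write for famous authors (including for live followers) are assumptions.

```python
from enum import Enum

class FollowerActivity(Enum):
    LIVE = "live"          # currently online
    ACTIVE = "active"      # active this month
    PASSIVE = "passive"    # not active for over a month
    INACTIVE = "inactive"

def delivery_actions(author_is_famous: bool, follower: FollowerActivity) -> set:
    """Decide what to do for one follower when an author creates a post."""
    if author_is_famous:
        # Assumed: posts by famous authors are pulled at read time,
        # so nothing is written per follower.
        return set()
    if follower is FollowerActivity.LIVE:
        return {"update_feed_cache", "push_websocket"}
    if follower is FollowerActivity.ACTIVE:
        return {"update_feed_cache"}
    # Passive and inactive followers: skip the feed cache.
    return set()

print(delivery_actions(False, FollowerActivity.LIVE))
```

With this split, a reader's home timeline would be assembled from the precomputed feed cache plus recent posts from any famous accounts they follow.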

How do you inform Live users:

image

Hot and Cold Posts:

image

Feed Personalization

User -> interests[]

image
