Instagram System Design
Instagram is a globally distributed, media-heavy social platform optimized for content creation, engagement, and extremely fast feed consumption. The system must balance write-heavy workloads with ultra-low-latency reads at massive scale.
Functional Requirements
The system must support the following Functional Requirements:
- Users can create posts (images / videos)
- Users can like posts
- Users can comment on posts
- Users can follow / unfollow others
- Users can view:
  - Home timeline (personalized feed)
  - User timeline (profile posts)
Non-Functional Requirements
Key system constraints:
- Eventual consistency acceptable for writes
- Feed generation must be extremely fast
- Highly available system
- Durable & persistent storage
- Support hot vs cold data
- Handle multiple user classes
- Operate at global scale
User Behavior Diversity
Different user classes influence architecture:
- Famous Users → millions of followers
- Active Users → frequent consumers
- Live Users → real-time updates required
- Passive Users → rarely open the app
- Inactive Users → no optimization needed
System behavior must adapt per user type.
Scale Estimation
Assumptions:
- 2 Billion monthly active users
- 1 Billion daily active users
- 500 Million Posts / Day
For simplicity, assume:
1 day ≈ 100,000 seconds (rounded for mental math)
Post TPS (transactions per second):
posts per sec = 500M / 100,000 = 5,000 posts per sec
Engagement TPS:
Assume each user likes 10 posts and comments on 3 posts per day (on average).
likes per sec = (1B × 10) / 100,000 = 100k likes per sec
comments per sec = (1B × 3) / 100,000 = 30k comments per sec
Feed Read QPS
Reads vastly outnumber writes. Consider 20 feed requests per user per day.
1B users × 20 = 20B feed requests/day
feed reads per sec = 20B / 100,000 = 200,000 feed reads/sec
If we consider a peak QPS of 3× to 5× the average, that is 600k to 1M feed reads/sec, which is huge.
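The estimates above can be checked with a quick back-of-envelope script (all numbers are the rounded assumptions from this section):

```python
# Back-of-envelope scale estimation using the rounded assumptions above.
SECONDS_PER_DAY = 100_000            # 86,400 rounded up for mental math

dau = 1_000_000_000                  # daily active users
posts_per_day = 500_000_000
likes_per_user = 10                  # per day, assumed average
comments_per_user = 3
feed_reads_per_user = 20

post_tps = posts_per_day / SECONDS_PER_DAY
like_tps = dau * likes_per_user / SECONDS_PER_DAY
comment_tps = dau * comments_per_user / SECONDS_PER_DAY
feed_qps = dau * feed_reads_per_user / SECONDS_PER_DAY

print(f"posts/sec:      {post_tps:,.0f}")        # 5,000
print(f"likes/sec:      {like_tps:,.0f}")        # 100,000
print(f"comments/sec:   {comment_tps:,.0f}")     # 30,000
print(f"feed reads/sec: {feed_qps:,.0f}, peak "
      f"{3 * feed_qps:,.0f}-{5 * feed_qps:,.0f}")
```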
Core Entities
Primary data models:
- User
- Post
- Like
- Comment
- Media / Asset
Each entity has distinct scaling and storage needs.
APIs
Post Creation
POST /posts
{
caption,
mediaUrl,
...
}
Media upload should happen via a presigned URL, so the client uploads directly to blob storage:
Client → Blob Storage (direct upload)
Avoids backend bottlenecks.
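A presigned URL is essentially a URL carrying an expiry and an HMAC signature that the blob store can verify without calling back into the backend. A minimal sketch of the idea (the host, secret, and query-parameter names are made up; real stores such as S3 have their own signing schemes):

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SECRET = b"shared-secret-known-to-blob-store"  # hypothetical signing key

def presign_upload_url(object_key: str, ttl_seconds: int = 300) -> str:
    """Return a time-limited upload URL that the client can PUT to directly."""
    expires = int(time.time()) + ttl_seconds
    payload = f"PUT:{object_key}:{expires}".encode()
    signature = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    query = urlencode({"expires": expires, "signature": signature})
    return f"https://blob.example.com/{object_key}?{query}"

def verify(object_key: str, expires: int, signature: str) -> bool:
    """What the blob store checks when the PUT arrives."""
    if time.time() > expires:
        return False                      # link has expired
    payload = f"PUT:{object_key}:{expires}".encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

The backend only mints the URL; the upload bytes never flow through it.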
Engagement APIs
POST /likes/{postId}
POST /comments/{postId}
POST /follow/{userId}
POST /unfollow/{userId}
Timeline APIs
GET /timelines → Home feed
GET /timelines/{userId} → User profile feed
High Level Design
Instagramβs architecture separates responsibilities into independent services to support massive scale, high availability, and low-latency reads.
Think about different responsibilities:
- User Creation → identity-heavy
- Follow/Unfollow Users → graph network
- Post Creation → write-heavy
- Timeline / Feed → read-heavy & latency-critical
So let's create separate Microservices for different Responsibilities. Databases and other storage integrations are handled within each service boundary, allowing services to optimize their own storage and delivery strategies.
API Gateway
Role: Entry point for all client interactions.
Responsibilities
- Routes requests to internal services
- Handles authentication / authorization
- Applies rate limiting & throttling
- Centralizes logging & monitoring
The gateway prevents clients from directly accessing backend services and simplifies request management.
User Service
Owns: User identity and profile domain.
Responsibilities
- User creation & updates
- Profile retrieval
- Account metadata management
- User-related validations
A relational DB or a strongly consistent KV store is the right choice for the User Database.
This service is frequently accessed by nearly all system components, making it a foundational dependency.
Follow Service
Owns: Social graph relationships.
Responsibilities
- Follow / unfollow operations
- Fetch followers & followees
- Maintain user relationship edges
A wide-column DB (e.g., Cassandra) or a graph-optimized store is the best choice for the Follow Database, since connections form a graph-like network.
The follow graph is a core driver of feed generation and requires high scalability and efficient querying.
Post Creation Service
Owns: Post write workflow.
Responsibilities
- Validate post payloads
- Handle post creation requests
- Store post images/videos in blob storage, from where they can be pulled into a CDN (content delivery network)
- Persist post metadata (caption, description, creator_id, etc.) in a metadata database
Post creation is a write-heavy operation and must scale independently from feed reads.
A distributed NoSQL DB with high write throughput is the best choice for post metadata because of the continuous high-volume writes.
Timeline / Feed Service
Owns: Feed generation and retrieval.
Responsibilities
- Generate user home timeline
- Fetch relevant posts
- Rank & order feed items
- Serve low-latency responses
Feed retrieval is the most latency-sensitive and read-heavy path in the system.
Cache-first design (Redis / Memcached) + backing store is the right approach for Timeline / Feed Service.
Deep Dive - Post Creation:
Media Processing & Multi-Resolution Support
In media-heavy systems like Instagram, storing a single uploaded image/video is insufficient. Different devices, screen sizes, and network conditions require multiple optimized variants of the same media. This introduces additional components into the post creation workflow. The system never serves the raw original directly to most users.
Optimized pipeline:
Client Upload → Post Ingestion Service → Blob Storage (Original) → Media Processing Pipeline → Blob Storage (Variants) → CDN
Media Processing Service (Critical Component)
Responsibilities
- Generate multiple resolutions (thumbnail, medium, high)
- Resize & compress images
- Transcode videos into adaptive formats
- Optimize for bandwidth & device constraints
Why needed:
- Different devices require different sizes
- Reduces payload & latency
- Improves user experience
- Saves CDN bandwidth cost
Example variants:
- Thumbnail (low resolution)
- Feed version (compressed)
- High-resolution viewer version
Async Processing Trigger
When media is uploaded:
Post Ingestion Service → Emit MediaUploaded Event
Media Processing Service consumes events:
- Fetch original media
- Generate variants
- Store derived assets
Avoids blocking post creation latency.
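A toy version of this trigger, with an in-process queue standing in for a real message broker and a dict standing in for blob storage (the MediaUploaded event shape and variant names are illustrative):

```python
import queue
import threading

events = queue.Queue()    # stand-in for a message broker topic
variant_store = {}        # stand-in for blob storage holding derived assets

def emit_media_uploaded(post_id, blob_key):
    """Post Ingestion Service: enqueue the event and return immediately."""
    events.put({"type": "MediaUploaded", "post_id": post_id, "blob_key": blob_key})

def media_worker():
    """Media Processing Service: consume events and generate variants."""
    while True:
        event = events.get()
        if event is None:            # sentinel used here to stop the demo worker
            events.task_done()
            break
        post_id = event["post_id"]
        # In reality: fetch the original, resize/compress/transcode,
        # then write each derived asset back to blob storage.
        variant_store[post_id] = [
            f"/media/{post_id}/{v}" for v in ("thumbnail", "medium", "high")
        ]
        events.task_done()
```

Post creation returns as soon as the event is enqueued; variant generation happens off the request path.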
Blob Storage
Blob storage now holds:
- Original media
- Processed variants
- Device-optimized formats
Consider these Variants:
/media/{postId}/original
/media/{postId}/thumbnail
/media/{postId}/medium
/media/{postId}/high
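Because the key layout above is predictable, variant keys can be derived from the post id alone (a tiny helper, assuming exactly the path scheme shown):

```python
VARIANTS = ("original", "thumbnail", "medium", "high")

def media_keys(post_id: str) -> dict:
    """Map each variant name to its blob-storage key, per the layout above."""
    return {v: f"/media/{post_id}/{v}" for v in VARIANTS}
```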
CDN Integration
The CDN sits in front of blob storage; on a cache miss, media is pulled from blob storage into the CDN.
Client → CDN → Blob Storage
Benefits:
- Edge caching
- Low-latency global delivery
- Offloads origin traffic
CDN primarily serves processed variants.
Metadata Implications
Post Metadata Store now keeps:
- Media identifiers / keys
- URLs for variants
- Media type & attributes
Example:
{
postId,
media: {
thumbnail_url,
medium_url,
high_url
}
}
Timeline Generation
Timeline generation is one of the most critical read paths in Instagram.
A simple design often works at small scale but quickly collapses under real-world traffic.
Below are the intuitive naïve strategies for both home and user timelines.
User Timeline
A simple naïve strategy:
- Query the Post Store by creator_id
- Sort by timestamp
- Paginate results
Why This Works Well
- Single-partition access pattern
- Predictable query cost
- No cross-user aggregation
- Naturally cacheable
User timeline is fundamentally easier than home timeline.
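Assuming posts carry a creator_id and a created_at timestamp, the user timeline is a single filtered, sorted, paginated query. A sketch against an in-memory list standing in for the Post Store:

```python
from dataclasses import dataclass

@dataclass
class Post:
    post_id: str
    creator_id: str
    created_at: int  # epoch seconds

def user_timeline(posts, creator_id, page=0, page_size=10):
    """Profile feed: filter by creator, newest first, then paginate."""
    own = sorted(
        (p for p in posts if p.creator_id == creator_id),
        key=lambda p: p.created_at,
        reverse=True,
    )
    start = page * page_size
    return own[start:start + page_size]
```

In a real store this maps to one indexed query per page; no cross-user work is involved.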
Home Timeline - Naïve Approaches
A straightforward approach:
- Fetch users followed by User A from Follow DB
- For each followed user:
- Query Post Store for recent posts
- Aggregate posts
- Sort by creation time
- Return top N results
Though this is a simple approach, it fails at scale because of explosive fan-out reads.
If User A follows 1,000 users:
- 1 Follow query
- 1,000 Post queries
Latency & DB pressure explode. Slowest dependency determines response time. This Direct aggregation approach becomes extremely expensive.
Even a few slow queries → feed delays.
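The naive read path can be made concrete with in-memory stand-ins for the Follow DB and Post Store; note the one-query-per-followee loop that blows up for heavy followers:

```python
import heapq

def naive_home_timeline(follow_db, post_store, user_id, limit=10):
    """Fan-out on read: one lookup per followee, then merge by recency.

    follow_db maps user_id -> list of followee ids.
    post_store maps user_id -> list of (created_at, post_id), newest first.
    """
    followees = follow_db.get(user_id, [])                  # 1 follow query
    per_user = [post_store.get(f, []) for f in followees]   # N post queries!
    # Merge the per-followee lists (each already sorted newest-first).
    merged = heapq.merge(*per_user, key=lambda t: t[0], reverse=True)
    return [post_id for _, post_id in list(merged)[:limit]]
```

For a user following 1,000 accounts, the `per_user` line alone is 1,000 round trips, and the tail-latency of the slowest one gates the whole response.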
How should we shard the Post table?
If User A follows users B, C, and D, we need posts from B, C, and D.
- Shard posts by post_id or by user_id?
- We don't know the post IDs in advance; we need the posts of all these users, so shard by user_id.
- But then we still have to aggregate data across shards.
Will caching posts solve the problem?
No, Even if posts are cached, the system still performs fan-out reads.
Alternative Naïve Strategy: Full Scan + Filter
Another bad but intuitive idea:
- Scan recent posts globally
- Filter posts from followed users
- Sort & limit
- Cache
Why This Is Worse
This approach is inefficient because it requires scanning a massive global dataset for every feed request, leading to extreme read amplification and wasted computation. Caching does not fix the problem because the system must still perform the expensive global scan before knowing what to cache, and personalized feeds have low cache reuse.
Timeline Generation Deep Dive: Solution
Populate Feed Cache on Write (Fan-Out on Write)
Instead of generating timelines at read time, the system shifts work to the write path.
When a user creates a post:
- Identify followers from the Follow Service
- For each follower:
  - Insert the post_id into their feed cache
Result:
- Feed reads become simple lookups
- No expensive aggregation during read requests
- Predictable low-latency timeline retrieval
This converts read amplification into controlled write amplification, which is far more scalable for feed-heavy systems.
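A minimal in-memory sketch of fan-out on write, where feed_cache stands in for a per-user Redis list (the cap of 500 entries is an illustrative assumption):

```python
from collections import defaultdict

FEED_LIMIT = 500  # assumed cap on cached entries per user

follow_db = {}                      # creator_id -> list of follower ids
feed_cache = defaultdict(list)     # user_id -> post ids, newest first

def on_post_created(creator_id, post_id):
    """Write path: push the new post id into every follower's feed cache."""
    for follower in follow_db.get(creator_id, []):
        cache = feed_cache[follower]
        cache.insert(0, post_id)    # newest first (LPUSH in Redis terms)
        del cache[FEED_LIMIT:]      # trim to cap (LTRIM in Redis terms)

def read_home_timeline(user_id, limit=10):
    """Read path: a simple O(limit) lookup, no aggregation."""
    return feed_cache[user_id][:limit]
```

The read side is now a single cache lookup; all the cross-user work happened at write time.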
Problem: High-Follower Users & Write Amplification
With fan-out on write, post creation requires updating the feed cache of every follower.
For highly popular users:
- Millions of followers
- Millions of feed updates per post
- Increased latency on write path
- Risk of system overload
Synchronous updates become impractical.
Solution: Asynchronous Fan-Out
Shift feed updates to an async pipeline.
Post Ingestion Service:
- Persist post metadata
- Emit a PostCreated event to the Post Topic
Background workers / consumers:
- Process follower fan-out
- Update feed caches independently
- Retry on failures
Benefits:
- Write latency remains low
- Workload spreads across workers
- Prevents traffic spikes from blocking requests
- Improves system resilience
Async processing decouples user-facing latency from heavy fan-out computation.
Hybrid Approach
- Famous users: generate their followers' feeds on read
- Active users (active this month): populate the feed cache on post write
- Live users: populate the feed cache and send a live update via WebSocket (discussed in the next section)
- Passive users (not active in over a month): skip feed cache generation
- Inactive users: no feed generation needed
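The hybrid policy splits into two decisions: how a creator's posts are distributed, and what to do for each follower during a fan-out pass. A sketch (the 1M-follower cutoff and the user-record fields are illustrative assumptions):

```python
FAMOUS_THRESHOLD = 1_000_000  # assumed follower-count cutoff for "famous"

def creator_fanout_mode(follower_count: int) -> str:
    """Famous creators' posts are merged into feeds at read time,
    everyone else is fanned out on write."""
    if follower_count >= FAMOUS_THRESHOLD:
        return "fanout_on_read"
    return "fanout_on_write"

def follower_delivery(follower: dict) -> str:
    """Per-follower handling during a fan-out-on-write pass."""
    if follower.get("is_live"):
        return "cache_and_websocket_push"   # live: cache + real-time push
    if follower.get("active_this_month"):
        return "cache"                      # active: populate feed cache
    return "skip"                           # passive/inactive: don't precompute
```

A reader's final feed is then the cached fan-out entries merged at read time with recent posts from any famous accounts they follow.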
How do you inform Live users?
Live users require real-time awareness of new posts without waiting for feed refresh.
Flow:
Post created → Post Ingestion Service emits event
Post Workers process fan-out / feed updates
Timeline Cache updated
Notification event pushed to WebSocket Manager
The WebSocket Manager keeps a mapping of users to WebSocket Handlers / connections (stored in a cache)
WebSocket Handler delivers update to active connections
Result:
- Connected users instantly see new posts
- No polling / refresh required
- Low-latency user experience
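The connection mapping can be sketched as a registry from user id to live connection; here a plain callable stands in for a real socket (no WebSocket library is assumed):

```python
# user_id -> "send" function of that user's live connection (socket stand-in)
connections = {}

def register(user_id, send):
    """Called by a WebSocket Handler when a user connects."""
    connections[user_id] = send

def unregister(user_id):
    """Called on disconnect."""
    connections.pop(user_id, None)

def notify_followers(follower_ids, post_id):
    """Push a new-post notification to every follower with a live connection.

    Returns how many live connections were reached; offline followers
    simply rely on their feed cache the next time they open the app.
    """
    delivered = 0
    for uid in follower_ids:
        send = connections.get(uid)
        if send is not None:
            send(f'{{"event": "new_post", "post_id": "{post_id}"}}')
            delivered += 1
    return delivered
```

In production the registry itself lives in a shared cache so any handler instance can route a notification to the right connection.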

Hot and Cold Posts:
Hot and cold data separation is essential in Instagram due to extreme read skew.
Most content access concentrates on:
- Recent posts
- Popular posts
- Frequently viewed media
Older content becomes rarely accessed.
New Post → Hot Tier (cache / fast DB) → Gradual Demotion → Cold Tier (cheap storage)
Promotion/demotion driven by:
- Access frequency
- Recency
- Engagement signals
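One way to drive promotion/demotion is a single score over those three signals; the weights, the weekly decay, and the threshold below are purely illustrative and would be tuned empirically:

```python
import math
import time

HOT_SCORE_THRESHOLD = 1.0  # assumed cutoff between hot and cold tiers

def hotness(access_count_24h, created_at, engagement, now=None):
    """Score a post from access frequency, recency, and engagement.
    Logs dampen heavy-tailed counts; recency decays exponentially."""
    now = time.time() if now is None else now
    age_days = max((now - created_at) / 86_400, 0.0)
    recency = math.exp(-age_days / 7)  # decays over roughly a week
    return (0.5 * math.log1p(access_count_24h)
            + 0.3 * recency * 10
            + 0.2 * math.log1p(engagement))

def tier(score):
    """Decide which storage tier a post belongs in."""
    return "hot" if score >= HOT_SCORE_THRESHOLD else "cold"
```

A periodic job would re-score posts and move those that cross the threshold between tiers.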
An Archival Service periodically fetches old/cold posts from the Post database and moves them into an Archival database.
