Instagram System Design
Instagram is a globally distributed, media-heavy social platform optimized for content creation, engagement, and extremely fast feed consumption. The system must balance write-heavy workloads with ultra-low-latency reads at massive scale.
Functional Requirements
The system must support the following Functional Requirements:
- Users can create posts (images / videos)
- Users can like posts
- Users can comment on posts
- Users can follow / unfollow others
- Users can view:
  - Home timeline (personalized feed)
  - User timeline (profile posts)
Non-Functional Requirements
Key system constraints:
- Eventual consistency acceptable for writes
- Feed generation must be extremely fast
- Highly available system
- Durable & persistent storage
- Support hot vs cold data
- Handle multiple user classes
- Operate at global scale
User Behavior Diversity
Different user classes influence architecture:
- Famous Users → millions of followers
- Active Users → frequent consumers
- Live Users → real-time updates required
- Passive Users → rarely open the app
- Inactive Users → no optimization needed
System behavior must adapt per user type.
Scale Estimation
Assumptions:
- 2 Billion monthly active users
- 1 Billion daily active users
- 500 Million Posts / Day
For simplicity, assume:
1 day ≈ 100,000 seconds (rounded for mental math)
Post TPS (transactions per second):
posts per sec = 500M / 100,000 = 5,000 posts per sec
Engagement TPS:
Assume each user likes 10 posts and comments on 3 posts per day (on average).
likes per sec = (1B × 10) / 100,000 = 100k likes per sec
comments per sec = (1B × 3) / 100,000 = 30k comments per sec
Feed Read QPS
Reads vastly outnumber writes. Consider 20 feed requests per user per day.
1B users × 20 = 20B feed requests/day
feed reads per sec = 20B / 100,000 = 200,000 feed reads/sec
If we consider a peak QPS of 3× to 5× the average, that is 600k to 1M feed reads/sec, which is huge.
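The estimates above can be checked with a quick back-of-envelope script (all numbers are the rounded assumptions from this section):

```python
# Back-of-envelope scale estimation using the rounded assumptions above.
SECONDS_PER_DAY = 100_000            # 86,400 rounded up for mental math

dau = 1_000_000_000                  # daily active users
posts_per_day = 500_000_000
likes_per_user = 10                  # per day, assumed average
comments_per_user = 3
feed_reads_per_user = 20

post_tps = posts_per_day / SECONDS_PER_DAY
like_tps = dau * likes_per_user / SECONDS_PER_DAY
comment_tps = dau * comments_per_user / SECONDS_PER_DAY
feed_qps = dau * feed_reads_per_user / SECONDS_PER_DAY

print(f"posts/sec:      {post_tps:,.0f}")        # 5,000
print(f"likes/sec:      {like_tps:,.0f}")        # 100,000
print(f"comments/sec:   {comment_tps:,.0f}")     # 30,000
print(f"feed reads/sec: {feed_qps:,.0f}, peak "
      f"{3 * feed_qps:,.0f}-{5 * feed_qps:,.0f}")
```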
Core Entities
Primary data models:
- User
- Post
- Like
- Comment
- Media / Asset
Each entity has distinct scaling and storage needs.
APIs
Post Creation
POST /posts
{
caption,
mediaUrl,
...
}
Media upload should happen via a presigned URL, so the client uploads directly to blob storage:
Client → Blob Storage (direct upload)
Avoids backend bottlenecks.
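A presigned URL is essentially a URL carrying an expiry and an HMAC signature that the blob store can verify without calling back into the backend. A minimal sketch of the idea (the host, secret, and query-parameter names are made up; real stores such as S3 have their own signing schemes):

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SECRET = b"shared-secret-known-to-blob-store"  # hypothetical signing key

def presign_upload_url(object_key: str, ttl_seconds: int = 300) -> str:
    """Return a time-limited upload URL that the client can PUT to directly."""
    expires = int(time.time()) + ttl_seconds
    payload = f"PUT:{object_key}:{expires}".encode()
    signature = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    query = urlencode({"expires": expires, "signature": signature})
    return f"https://blob.example.com/{object_key}?{query}"

def verify(object_key: str, expires: int, signature: str) -> bool:
    """What the blob store checks when the PUT arrives."""
    if time.time() > expires:
        return False                      # link has expired
    payload = f"PUT:{object_key}:{expires}".encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

The backend only mints the URL; the upload bytes never flow through it.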
Engagement APIs
POST /likes/{postId}
POST /comments/{postId}
POST /follow/{userId}
POST /unfollow/{userId}
Timeline APIs
GET /timelines → Home feed
GET /timelines/{userId} → User profile feed
High Level Design
Instagramβs architecture separates responsibilities into independent services to support massive scale, high availability, and low-latency reads.
Think about different responsibilities:
- User Creation → identity-heavy
- Follow/Unfollow Users → graph network
- Post Creation → write-heavy
- Timeline / Feed → read-heavy & latency-critical
So let's create separate Microservices for different Responsibilities. Databases and other storage integrations are handled within each service boundary, allowing services to optimize their own storage and delivery strategies.
API Gateway
Role: Entry point for all client interactions.
Responsibilities
- Routes requests to internal services
- Handles authentication / authorization
- Applies rate limiting & throttling
- Centralizes logging & monitoring
The gateway prevents clients from directly accessing backend services and simplifies request management.
User Service
Owns: User identity and profile domain.
Responsibilities
- User creation & updates
- Profile retrieval
- Account metadata management
- User-related validations
A relational DB or a strongly consistent KV store is the right choice for the User Database.
This service is frequently accessed by nearly all system components, making it a foundational dependency.
Follow Service
Owns: Social graph relationships.
Responsibilities
- Follow / unfollow operations
- Fetch followers & followees
- Maintain user relationship edges
A wide-column DB (e.g., Cassandra) or a graph-optimized store is the best choice for the Follow Database, since connections form a graph-like network.
The follow graph is a core driver of feed generation and requires high scalability and efficient querying.
Post Creation Service
Owns: Post write workflow.
Responsibilities
- Validate post payloads
- Handle post creation requests
- Store post images/videos in blob storage, from where they can be pulled into a CDN (content delivery network)
- Persist post metadata (caption, description, creator_id, etc.) in a metadata database
Post creation is a write-heavy operation and must scale independently from feed reads.
A distributed NoSQL DB with high write throughput is the best choice for post metadata because of the continuous high-volume writes.
Timeline / Feed Service
Owns: Feed generation and retrieval.
Responsibilities
- Generate user home timeline
- Fetch relevant posts
- Rank & order feed items
- Serve low-latency responses
Feed retrieval is the most latency-sensitive and read-heavy path in the system.
Cache-first design (Redis / Memcached) + backing store is the right approach for Timeline / Feed Service.
Deep Dive - Post Creation:
Media Processing & Multi-Resolution Support
In media-heavy systems like Instagram, storing a single uploaded image/video is insufficient. Different devices, screen sizes, and network conditions require multiple optimized variants of the same media. This introduces additional components into the post creation workflow. The system never serves the raw original directly to most users.
Optimized pipeline:
Client Upload → Post Ingestion Service → Blob Storage (Original) → Media Processing Pipeline → Blob Storage (Variants) → CDN
Media Processing Service (Critical Component)
Responsibilities
- Generate multiple resolutions (thumbnail, medium, high)
- Resize & compress images
- Transcode videos into adaptive formats
- Optimize for bandwidth & device constraints
Why needed:
- Different devices require different sizes
- Reduces payload & latency
- Improves user experience
- Saves CDN bandwidth cost
Example variants:
- Thumbnail (low resolution)
- Feed version (compressed)
- High-resolution viewer version
Async Processing Trigger
When media is uploaded:
Post Ingestion Service → Emit MediaUploaded Event
Media Processing Service consumes events:
- Fetch original media
- Generate variants
- Store derived assets
Avoids blocking post creation latency.
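A toy version of this trigger, with an in-process queue standing in for a real message broker and a dict standing in for blob storage (the MediaUploaded event shape and variant names are illustrative):

```python
import queue
import threading

events = queue.Queue()    # stand-in for a message broker topic
variant_store = {}        # stand-in for blob storage holding derived assets

def emit_media_uploaded(post_id, blob_key):
    """Post Ingestion Service: enqueue the event and return immediately."""
    events.put({"type": "MediaUploaded", "post_id": post_id, "blob_key": blob_key})

def media_worker():
    """Media Processing Service: consume events and generate variants."""
    while True:
        event = events.get()
        if event is None:            # sentinel used here to stop the demo worker
            events.task_done()
            break
        post_id = event["post_id"]
        # In reality: fetch the original, resize/compress/transcode,
        # then write each derived asset back to blob storage.
        variant_store[post_id] = [
            f"/media/{post_id}/{v}" for v in ("thumbnail", "medium", "high")
        ]
        events.task_done()
```

Post creation returns as soon as the event is enqueued; variant generation happens off the request path.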
Blob Storage
Blob storage now holds:
- Original media
- Processed variants
- Device-optimized formats
Consider these Variants:
/media/{postId}/original
/media/{postId}/thumbnail
/media/{postId}/medium
/media/{postId}/high
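Because the key layout above is predictable, variant keys can be derived from the post id alone (a tiny helper, assuming exactly the path scheme shown):

```python
VARIANTS = ("original", "thumbnail", "medium", "high")

def media_keys(post_id: str) -> dict:
    """Map each variant name to its blob-storage key, per the layout above."""
    return {v: f"/media/{post_id}/{v}" for v in VARIANTS}
```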
CDN Integration
The CDN sits in front of blob storage; on a cache miss, media is pulled from blob storage into the CDN.
Client → CDN → Blob Storage
Benefits:
- Edge caching
- Low-latency global delivery
- Offloads origin traffic
CDN primarily serves processed variants.
Metadata Implications
Post Metadata Store now keeps:
- Media identifiers / keys
- URLs for variants
- Media type & attributes
Example:
{
postId,
media: {
thumbnail_url,
medium_url,
high_url
}
}
Timeline Generation
Timeline generation is one of the most critical read paths in Instagram.
A simple design often works at small scale but quickly collapses under real-world traffic.
Below are the intuitive naïve strategies for both home and user timelines.
User Timeline
A simple naïve strategy:
- Query the Post Store by creator_id
- Sort by timestamp
- Paginate results
Why This Works Well
- Single-partition access pattern
- Predictable query cost
- No cross-user aggregation
- Naturally cacheable
User timeline is fundamentally easier than home timeline.
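Assuming posts carry a creator_id and a created_at timestamp, the user timeline is a single filtered, sorted, paginated query. A sketch against an in-memory list standing in for the Post Store:

```python
from dataclasses import dataclass

@dataclass
class Post:
    post_id: str
    creator_id: str
    created_at: int  # epoch seconds

def user_timeline(posts, creator_id, page=0, page_size=10):
    """Profile feed: filter by creator, newest first, then paginate."""
    own = sorted(
        (p for p in posts if p.creator_id == creator_id),
        key=lambda p: p.created_at,
        reverse=True,
    )
    start = page * page_size
    return own[start:start + page_size]
```

In a real store this maps to one indexed query per page; no cross-user work is involved.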
Home Timeline - Naïve Approaches
A straightforward approach:
- Fetch users followed by User A from Follow DB
- For each followed user:
- Query Post Store for recent posts
- Aggregate posts
- Sort by creation time
- Return top N results
Though this is a simple approach, it fails at scale because of explosive fan-out reads.
If User A follows 1,000 users:
- 1 Follow query
- 1,000 Post queries
Latency & DB pressure explode. Slowest dependency determines response time. This Direct aggregation approach becomes extremely expensive.
Even a few slow queries → feed delays.
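The naive read path can be made concrete with in-memory stand-ins for the Follow DB and Post Store; note the one-query-per-followee loop that blows up for heavy followers:

```python
import heapq

def naive_home_timeline(follow_db, post_store, user_id, limit=10):
    """Fan-out on read: one lookup per followee, then merge by recency.

    follow_db maps user_id -> list of followee ids.
    post_store maps user_id -> list of (created_at, post_id), newest first.
    """
    followees = follow_db.get(user_id, [])                  # 1 follow query
    per_user = [post_store.get(f, []) for f in followees]   # N post queries!
    # Merge the per-followee lists (each already sorted newest-first).
    merged = heapq.merge(*per_user, key=lambda t: t[0], reverse=True)
    return [post_id for _, post_id in list(merged)[:limit]]
```

For a user following 1,000 accounts, the `per_user` line alone is 1,000 round trips, and the tail-latency of the slowest one gates the whole response.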
How should we shard the Post table?
If User A follows users B, C, and D, we need posts from B, C, and D.
- Shard posts by post_id or by user_id?
- We don't know the post IDs in advance; we need the posts of all these users, so shard by user_id.
- But then we still have to aggregate data across shards.
Will caching posts solve the problem?
No, Even if posts are cached, the system still performs fan-out reads.
Alternative Naïve Strategy: Full Scan + Filter
Another bad but intuitive idea:
- Scan recent posts globally
- Filter posts from followed users
- Sort & limit
- Cache
Why This Is Worse
This approach is inefficient because it requires scanning a massive global dataset for every feed request, leading to extreme read amplification and wasted computation. Caching does not fix the problem because the system must still perform the expensive global scan before knowing what to cache, and personalized feeds have low cache reuse.
Timeline Generation Deep Dive: Solution
Populate Feed Cache on Write (Fan-Out on Write)
Instead of generating timelines at read time, the system shifts work to the write path.
When a user creates a post:
- Identify followers from the Follow Service
- For each follower:
  - Insert the post_id into their feed cache
Result:
- Feed reads become simple lookups
- No expensive aggregation during read requests
- Predictable low-latency timeline retrieval
This converts read amplification into controlled write amplification, which is far more scalable for feed-heavy systems.
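A minimal in-memory sketch of fan-out on write, where feed_cache stands in for a per-user Redis list (the cap of 500 entries is an illustrative assumption):

```python
from collections import defaultdict

FEED_LIMIT = 500  # assumed cap on cached entries per user

follow_db = {}                      # creator_id -> list of follower ids
feed_cache = defaultdict(list)     # user_id -> post ids, newest first

def on_post_created(creator_id, post_id):
    """Write path: push the new post id into every follower's feed cache."""
    for follower in follow_db.get(creator_id, []):
        cache = feed_cache[follower]
        cache.insert(0, post_id)    # newest first (LPUSH in Redis terms)
        del cache[FEED_LIMIT:]      # trim to cap (LTRIM in Redis terms)

def read_home_timeline(user_id, limit=10):
    """Read path: a simple O(limit) lookup, no aggregation."""
    return feed_cache[user_id][:limit]
```

The read side is now a single cache lookup; all the cross-user work happened at write time.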
Problem: High-Follower Users & Write Amplification
With fan-out on write, post creation requires updating the feed cache of every follower.
For highly popular users:
- Millions of followers
- Millions of feed updates per post
- Increased latency on write path
- Risk of system overload
Synchronous updates become impractical.
Solution: Asynchronous Fan-Out
Shift feed updates to an async pipeline.
Post Ingestion Service:
- Persist post metadata
- Emit a PostCreated event to the Post Topic
Background workers / consumers:
- Process follower fan-out
- Update feed caches independently
- Retry on failures
Benefits:
- Write latency remains low
- Workload spreads across workers
- Prevents traffic spikes from blocking requests
- Improves system resilience
Async processing decouples user-facing latency from heavy fan-out computation.
Hybrid Approach
- Famous users: generate their followers' feeds on read
- Active users (active this month): populate the feed cache on post write
- Live users: populate the feed cache and send a live update via WebSocket (discussed in the next section)
- Passive users (not active in over a month): skip feed cache generation
- Inactive users: no feed generation needed
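The hybrid policy splits into two decisions: how a creator's posts are distributed, and what to do for each follower during a fan-out pass. A sketch (the 1M-follower cutoff and the user-record fields are illustrative assumptions):

```python
FAMOUS_THRESHOLD = 1_000_000  # assumed follower-count cutoff for "famous"

def creator_fanout_mode(follower_count: int) -> str:
    """Famous creators' posts are merged into feeds at read time,
    everyone else is fanned out on write."""
    if follower_count >= FAMOUS_THRESHOLD:
        return "fanout_on_read"
    return "fanout_on_write"

def follower_delivery(follower: dict) -> str:
    """Per-follower handling during a fan-out-on-write pass."""
    if follower.get("is_live"):
        return "cache_and_websocket_push"   # live: cache + real-time push
    if follower.get("active_this_month"):
        return "cache"                      # active: populate feed cache
    return "skip"                           # passive/inactive: don't precompute
```

A reader's final feed is then the cached fan-out entries merged at read time with recent posts from any famous accounts they follow.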
How do you inform Live users?
Live users require real-time awareness of new posts without waiting for feed refresh.
Flow:
Post created → Post Ingestion Service emits event
Post Workers process fan-out / feed updates
Timeline Cache updated
Notification event pushed to WebSocket Manager
The WebSocket Manager keeps a mapping of users to WebSocket Handlers / connections (stored in a cache)
WebSocket Handler delivers update to active connections
Result:
- Connected users instantly see new posts
- No polling / refresh required
- Low-latency user experience
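The connection mapping can be sketched as a registry from user id to live connection; here a plain callable stands in for a real socket (no WebSocket library is assumed):

```python
# user_id -> "send" function of that user's live connection (socket stand-in)
connections = {}

def register(user_id, send):
    """Called by a WebSocket Handler when a user connects."""
    connections[user_id] = send

def unregister(user_id):
    """Called on disconnect."""
    connections.pop(user_id, None)

def notify_followers(follower_ids, post_id):
    """Push a new-post notification to every follower with a live connection.

    Returns how many live connections were reached; offline followers
    simply rely on their feed cache the next time they open the app.
    """
    delivered = 0
    for uid in follower_ids:
        send = connections.get(uid)
        if send is not None:
            send(f'{{"event": "new_post", "post_id": "{post_id}"}}')
            delivered += 1
    return delivered
```

In production the registry itself lives in a shared cache so any handler instance can route a notification to the right connection.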

Hot and Cold Posts:
Hot and cold data separation is essential in Instagram due to extreme read skew.
Most content access concentrates on:
- Recent posts
- Popular posts
- Frequently viewed media
Older content becomes rarely accessed.
New Post → Hot Tier (cache / fast DB) → Gradual Demotion → Cold Tier (cheap storage)
Promotion/demotion driven by:
- Access frequency
- Recency
- Engagement signals
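One way to drive promotion/demotion is a single score over those three signals; the weights, the weekly decay, and the threshold below are purely illustrative and would be tuned empirically:

```python
import math
import time

HOT_SCORE_THRESHOLD = 1.0  # assumed cutoff between hot and cold tiers

def hotness(access_count_24h, created_at, engagement, now=None):
    """Score a post from access frequency, recency, and engagement.
    Logs dampen heavy-tailed counts; recency decays exponentially."""
    now = time.time() if now is None else now
    age_days = max((now - created_at) / 86_400, 0.0)
    recency = math.exp(-age_days / 7)  # decays over roughly a week
    return (0.5 * math.log1p(access_count_24h)
            + 0.3 * recency * 10
            + 0.2 * math.log1p(engagement))

def tier(score):
    """Decide which storage tier a post belongs in."""
    return "hot" if score >= HOT_SCORE_THRESHOLD else "cold"
```

A periodic job would re-score posts and move those that cross the threshold between tiers.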
An Archival Service periodically fetches old/cold posts from the Post database and moves them into an Archival database.
