WhatsApp System Design
WhatsApp is a globally distributed, real-time messaging platform used by billions of users across hundreds of countries. It supports instant messaging, group chats, multimedia sharing, encrypted storage, and presence awareness, all while operating under strict performance and reliability constraints. This design document walks through the architecture and design considerations for building a WhatsApp-like messaging system without diving into encryption internals (by choice), focusing instead on messaging workflows, data flows, scalability primitives, offline synchronization, fault tolerance, client-server interactions, and non-functional trade-offs.
Functional Requirements
A WhatsApp-like system must support the following features:
1. One-to-One Messaging
- Users should be able to send text messages to one another in real time.
- Messages must appear instantly when both users are online.
- If the recipient device is offline, the message must be queued and delivered later.
2. Group Messaging
- Users can create group chats with multiple participants.
- Sending a message in a group must fan-out to multiple clients.
- Group membership must be consistent and synchronized across participants.
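The fan-out requirement can be illustrated with a minimal in-memory sketch. The `GroupDirectory` class, its method names, and the per-user inbox lists are hypothetical and exist only for illustration; a real system would persist membership and deliver over the network.

```python
from collections import defaultdict


class GroupDirectory:
    """Hypothetical in-memory group store (illustration only)."""

    def __init__(self):
        self.members = defaultdict(set)    # group_id -> set of user ids
        self.inboxes = defaultdict(list)   # user_id -> messages awaiting delivery

    def add_member(self, group_id, user_id):
        self.members[group_id].add(user_id)

    def fan_out(self, group_id, sender_id, text):
        """Copy one group message into every member's inbox except the sender's."""
        if sender_id not in self.members[group_id]:
            raise PermissionError("sender is not a member of the group")
        recipients = sorted(self.members[group_id] - {sender_id})
        for user_id in recipients:
            self.inboxes[user_id].append((group_id, sender_id, text))
        return recipients
```

One message from the sender becomes one inbox entry per other member, which is why large groups multiply the write load.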
3. Real-Time Delivery
- Message delivery should feel instantaneous.
4. Offline Message Retrieval
- Messages sent while the recipient is offline must be queued on the server.
- Once the recipient's client reconnects, queued messages are delivered in a batch.
5. Ordered Delivery
- Delivery order must be preserved per conversation.
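Offline queuing and per-conversation ordering can be sketched together. This is a minimal sketch, assuming the server assigns a per-chat sequence number; the `OfflineQueue` class and its methods are hypothetical names, not a description of WhatsApp's actual implementation.

```python
from collections import defaultdict, deque


class OfflineQueue:
    """Hypothetical per-recipient store for messages awaiting delivery."""

    def __init__(self):
        # recipient_id -> FIFO queue of (chat_id, seq, text)
        self._pending = defaultdict(deque)
        # chat_id -> last sequence number assigned by the server
        self._last_seq = defaultdict(int)

    def enqueue(self, recipient_id, chat_id, text):
        """Assign the next per-chat sequence number and queue the message."""
        self._last_seq[chat_id] += 1
        seq = self._last_seq[chat_id]
        self._pending[recipient_id].append((chat_id, seq, text))
        return seq

    def drain(self, recipient_id):
        """Return all queued messages in one batch, ordered by chat and sequence."""
        batch = list(self._pending.pop(recipient_id, ()))
        batch.sort(key=lambda m: (m[0], m[1]))
        return batch
```

Ordering is only enforced within a chat here; messages from different chats carry independent sequence numbers.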
6. Media Messaging
- Support sharing images, videos, documents, and voice notes.
7. Multi-Device Synchronization
- WhatsApp historically tied accounts to a single mobile device, but modern WhatsApp supports multiple linked clients (e.g., Web + Mobile).
- Messages must sync across devices without duplication or reordering.
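One common way to avoid duplication and reordering on a linked device is to make message application idempotent and keyed by message id. The sketch below assumes each message carries a unique id and a sequence number; `DeviceInbox` is a hypothetical client-side structure.

```python
class DeviceInbox:
    """Hypothetical client-side inbox that ignores messages it has already applied."""

    def __init__(self):
        self._seen_ids = set()
        self.messages = []  # (seq, message_id, text), kept sorted by seq

    def apply(self, message_id, seq, text):
        """Insert the message once; return False for a duplicate delivery."""
        if message_id in self._seen_ids:
            return False
        self._seen_ids.add(message_id)
        self.messages.append((seq, message_id, text))
        self.messages.sort()
        return True
```

Because a repeated `apply` is a no-op, the server can safely retry delivery to any device after a dropped connection.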
Lower Priority (Optional Enhancements)
Although they are not critical use cases, the system can also offer lower-priority features:
- Online / last seen presence
- Sent/Delivered/Seen ticks
  - Sent = message reached the WhatsApp server
  - Delivered = message reached the recipient device
  - Seen = message read by the user on the device
- Contact availability/status
These features enhance usability but are not required for a minimal viable messaging system.
Out of Scope for This Discussion
We will explicitly not cover:
- End-to-end encryption handshake protocols
- Voice and video calling
- Message deletion (delete for me / delete for everyone)
- Message retention policies in detail
- Multi-device cryptographic session sync
- Typing indicators
Non-Functional Requirements (Verbose Explanation)
Building a messaging system at WhatsApp scale involves complex non-functional constraints:
1. Low Latency
- Users expect messages to appear nearly instantly.
- The delivery target is typically < 500 ms end-to-end under normal network conditions.
- Latency includes client uplink, server routing, and client downlink.
2. Guaranteed Delivery
- Once the sender receives the 'sent to server' acknowledgment, the system should guarantee delivery eventually, unless:
  - The user is deleted
  - The message expires (e.g., undelivered > 30 days)
WhatsApp’s server temporarily stores undelivered messages for a limited window.
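The expiry rule can be sketched as a simple retention check. The 30-day window comes from the example above; the function names and the `(message_id, queued_at)` pair layout are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)  # window taken from the example in the text


def is_expired(queued_at, now=None):
    """Return True when an undelivered message has outlived the retention window."""
    now = now or datetime.now(timezone.utc)
    return now - queued_at > RETENTION


def purge_expired(pending, now=None):
    """Keep only (message_id, queued_at) pairs that are still deliverable."""
    return [(mid, ts) for mid, ts in pending if not is_expired(ts, now)]
```

A background job could run `purge_expired` periodically so that undelivered messages do not accumulate indefinitely.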
3. Enormous Scale
At global scale, capacity planning looks like:
- If WhatsApp has 2B users
- And each sends 100 messages/day
Then total traffic ≈ 200B messages/day
Peak throughput may exceed millions of messages per second during busy hours.
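The capacity figures can be checked with a few lines of arithmetic. The user count and per-user message rate come from the text; `PEAK_FACTOR` is a hypothetical peak-to-average ratio chosen only to show how a peak estimate would be derived.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

users = 2_000_000_000            # figure from the text
messages_per_user_per_day = 100  # figure from the text
PEAK_FACTOR = 3                  # hypothetical peak-to-average ratio

messages_per_day = users * messages_per_user_per_day
average_per_second = messages_per_day / SECONDS_PER_DAY
peak_per_second = average_per_second * PEAK_FACTOR

print(f"{messages_per_day:,} messages/day")
print(f"{average_per_second:,.0f} messages/s on average")
print(f"{peak_per_second:,.0f} messages/s at the assumed peak")
```

The average alone works out to roughly 2.3 million messages per second, so any realistic peak factor puts the system in the millions-per-second range.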
4. Fault Tolerance
Failures are expected:
- Device failures
- Network partitions
- Regional outages
- Datacenter failures
The system must continue operating without a global outage.
5. Minimal Message Storage
WhatsApp intentionally does not store delivered messages on its servers.
Implications:
- Server-side state is minimized
- Storage cost is reduced dramatically
- User privacy expectations are reinforced
6. User-Centric Storage Model
Messages are stored:
- On the device indefinitely (until deleted)
- In cloud backups (optional)
- On server only until delivered or expired
7. Efficient Network Usage
Many WhatsApp users are on:
- Limited data plans
- 2G/3G networks
- Unreliable connections
8. Highly Available Global Infrastructure
Data centers must be:
- Distributed globally
- Geographically redundant
- Load balanced intelligently
Message routing should minimize cross-continent RTTs.
Entities
User
Represents a registered WhatsApp user.
- id: unique user identifier
- mobileNo: phone number used for identification/login
- created_at: account creation timestamp
Client
Represents a device/session linked to a user (e.g., phone, web, desktop).
- userId: reference to User
- clientId: unique identifier for the client device
- added_at: when the device was linked
Chat
Represents a conversation (1:1 or group).
- id: chat identifier
- metadata: optional settings (e.g., group name, icon)
- users[]: participants in the chat
- created_at: chat creation timestamp
Message
Represents a single chat message.
- id: message identifier
- senderUserId: reference to User
- chatId: reference to Chat
- content: text body
- asset_url(s): optional media attachments
- timestamp: when the message was sent
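The four entities can be written down as plain data classes. This is a sketch only: the field names follow the lists above, while the Python types and the `asset_urls` spelling of the optional media field are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class User:
    id: str
    mobileNo: str
    created_at: datetime


@dataclass
class Client:
    userId: str      # reference to User.id
    clientId: str
    added_at: datetime


@dataclass
class Chat:
    id: str
    users: list[str]                 # participant user ids
    created_at: datetime
    metadata: dict = field(default_factory=dict)  # e.g. group name, icon


@dataclass
class Message:
    id: str
    senderUserId: str  # reference to User.id
    chatId: str        # reference to Chat.id
    content: str
    timestamp: datetime
    asset_urls: list[str] = field(default_factory=list)  # optional attachments
```

A 1:1 chat is simply a `Chat` with two entries in `users`, so the same model covers both conversation types.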
High Level Design

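Candidate transports for client-server communication are polling, long polling, SSE, and WebSockets.
WebSockets provide stateful, persistent connections, so clients can send messages and the server can push messages instantly.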
Scale
1 billion users
300 million daily active users
100 messages per user per day on average, at 100 bytes per message => 300 million * 100 * 100 bytes = 300 * 10^6 * 10^4 bytes = 3 * 10^12 bytes = 3 TB per day
3 TB per day * 30 days = 90 TB per month
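The storage estimate can be reproduced directly. All three inputs come from the text; decimal terabytes (10^12 bytes) are assumed.

```python
daily_active_users = 300_000_000
messages_per_user_per_day = 100
bytes_per_message = 100

bytes_per_day = daily_active_users * messages_per_user_per_day * bytes_per_message
terabytes_per_day = bytes_per_day / 10**12      # decimal terabytes
terabytes_per_month = terabytes_per_day * 30    # 30-day month

print(terabytes_per_day, terabytes_per_month)
```

Since the server keeps messages only until they are delivered or expire, this figure is better read as daily ingest volume than as retained storage.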
Deep dive

How do WebSocket handlers talk to each other?

User A is connected to WS handler 1, and User B is connected to WS handler 2.
When User A sends a message to User B, the message is added to topic A-B.
When User B sends a message to User A, the message is added to topic B-A.
WS handler 1 is subscribed to topic B-A (or all topics X-A) to receive messages for A.
WS handler 2 is subscribed to topic A-B (or all topics X-B) to receive messages for B.
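The handler-to-handler flow can be sketched with an in-memory publish/subscribe broker. `Broker` and `WsHandler` are hypothetical classes; subscribing by recipient id stands in for subscribing to all topics X-recipient, and a production system would use a networked message broker rather than direct callbacks.

```python
from collections import defaultdict


class Broker:
    """Hypothetical in-memory pub/sub (illustration only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # recipient_id -> handler callbacks

    def subscribe(self, recipient_id, callback):
        self._subscribers[recipient_id].append(callback)

    def publish(self, sender_id, recipient_id, text):
        """Deliver to every handler subscribed to the recipient's topics (X-recipient)."""
        for callback in self._subscribers[recipient_id]:
            callback(sender_id, recipient_id, text)


class WsHandler:
    """Stands in for a WebSocket server holding connections for some users."""

    def __init__(self, name, broker):
        self.name = name
        self.broker = broker
        self.delivered = defaultdict(list)  # user_id -> messages pushed to that user

    def connect(self, user_id):
        # Subscribing on connect means this handler receives all X-user_id traffic.
        self.broker.subscribe(user_id, self._on_message)

    def send(self, sender_id, recipient_id, text):
        self.broker.publish(sender_id, recipient_id, text)

    def _on_message(self, sender_id, recipient_id, text):
        self.delivered[recipient_id].append((sender_id, text))
```

With this shape, neither handler needs to know which handler holds the other user's connection; the broker does the routing.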


Message sent/Delivered/Read status - ticks
User A sends a message to User B. From the server's point of view, a write is a request arriving from a client and a read is a message pushed to a client:
- A sends the message to the server (write)
- The server acknowledges to A that it received the message (read)
- The server forwards the message to User B (read)
- User B acknowledges to the server that the message was received (write)
- The server notifies A that the message was delivered (read)
- User B acknowledges to the server that the message was read (write)
- The server notifies A that the message was read (read)
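The tick progression can be modelled as a status that only moves forward. This is a minimal sketch: `Status` and `MessageStatusTracker` are hypothetical names, and ignoring stale acknowledgments is an assumed policy for handling out-of-order acks.

```python
from enum import IntEnum


class Status(IntEnum):
    SENT = 1       # server acknowledged the message
    DELIVERED = 2  # recipient device acknowledged receipt
    SEEN = 3       # recipient read the message


class MessageStatusTracker:
    """Hypothetical server-side tracker; status only ever moves forward."""

    def __init__(self):
        self._status = {}  # message_id -> Status

    def update(self, message_id, new_status):
        """Apply an ack, ignoring stale or duplicate acks; return the current status."""
        current = self._status.get(message_id)
        if current is None or new_status > current:
            self._status[message_id] = new_status
        return self._status[message_id]
```

If a "delivered" ack arrives after a "seen" ack, the tracker keeps `SEEN`, so the sender's ticks never move backwards.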
300 million DAU, 100 messages per user per day (a day is roughly 10^5 seconds)
300 million * 100 * 3 read requests/day = 90 billion read requests/day = 90 * 10^9 / 10^5 read requests/sec = 9 * 10^5 = 900k read requests/sec
300 million * 100 * 3 write requests/day = 90 billion write requests/day = 90 * 10^9 / 10^5 write requests/sec = 9 * 10^5 = 900k write requests/sec
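A quick check of this arithmetic, using the inputs from the text. The rounded day of 10^5 seconds is the approximation used above; the exact 86,400-second figure is included for comparison.

```python
daily_active_users = 300_000_000
messages_per_user_per_day = 100
requests_per_message = 3          # per direction, as counted in the text

requests_per_day = daily_active_users * messages_per_user_per_day * requests_per_message

approx_rps = requests_per_day / 10**5     # rounding a day to 100,000 seconds
exact_rps = requests_per_day / 86_400     # using the exact length of a day

print(f"{requests_per_day:,} requests/day")
print(f"~{approx_rps:,.0f} requests/s (rounded day)")
print(f"~{exact_rps:,.0f} requests/s (86,400 s day)")
```

Either way the average lands near one million requests per second in each direction, before accounting for peaks.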