
WhatsApp System Design

WhatsApp is a globally distributed, real-time messaging platform used by billions of users across hundreds of countries. It supports instant messaging, group chats, multimedia sharing, encrypted storage, and presence awareness, all while operating under strict performance and reliability constraints. This design document walks through the architecture and design considerations for building a WhatsApp-like messaging system without diving into encryption internals (by choice), focusing instead on messaging workflows, data flows, scalability primitives, offline synchronization, fault tolerance, client-server interactions, and non-functional trade-offs.


Functional Requirements

A WhatsApp-like system must support the following features:

1. One-to-One Messaging

  • Users should be able to send text messages to one another in real time.
  • Messages must appear instantly when both users are online.
  • If the recipient device is offline, the message must be queued and delivered later.

2. Group Messaging

  • Users can create group chats with multiple participants.
  • Sending a message in a group must fan-out to multiple clients.
  • Group membership must be consistent and synchronized across participants.

3. Real-Time Delivery

  • Message delivery should feel instantaneous.

4. Offline Message Retrieval

  • Messages sent while the recipient is offline must be queued on the server.
  • Once the client reconnects, queued messages are delivered in batch.

5. Ordered Delivery

  • Delivery order must be preserved per conversation.
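The queue-then-batch behavior and the per-conversation ordering requirement can be sketched together. This is a minimal in-memory illustration, not the production design: `OfflineQueue`, `QueuedMessage`, and the per-chat `seq` number are hypothetical names, and it assumes the server assigns each message a sequence number within its chat.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class QueuedMessage:
    chat_id: str
    seq: int   # per-chat sequence number, assumed to be assigned by the server
    body: str


class OfflineQueue:
    """In-memory stand-in for the server-side store of undelivered messages."""

    def __init__(self):
        # recipient user id -> messages waiting for that user
        self._pending = defaultdict(deque)

    def enqueue(self, recipient_id, message):
        self._pending[recipient_id].append(message)

    def drain(self, recipient_id):
        """Return every queued message as one batch, ordered within each chat."""
        batch = list(self._pending.pop(recipient_id, ()))
        # Group by chat, then order by sequence number inside each chat.
        batch.sort(key=lambda m: (m.chat_id, m.seq))
        return batch


queue = OfflineQueue()
queue.enqueue("bob", QueuedMessage("chat-1", 2, "second"))
queue.enqueue("bob", QueuedMessage("chat-1", 1, "first"))
print([m.body for m in queue.drain("bob")])  # ['first', 'second']
```

Ordering is only enforced inside a chat; messages from different chats carry no relative order in this sketch.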

6. Media Messaging

  • Support sharing images, videos, documents, and voice notes.

7. Multi-Device Synchronization

  • WhatsApp historically tied accounts to a single mobile device, but modern WhatsApp supports multiple linked clients (e.g., Web + Mobile).
  • Messages must sync across devices without duplication or reordering.
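One common way to avoid duplication across linked devices is to make delivery idempotent: each device remembers the message ids it has already stored and ignores redeliveries. The sketch below assumes globally unique message ids; `LinkedDevice` and `fan_out` are hypothetical names used only for illustration.

```python
class LinkedDevice:
    """One linked client (phone, web, desktop) with its own local message store."""

    def __init__(self, client_id):
        self.client_id = client_id
        self.messages = []       # (message_id, body) in arrival order
        self._seen_ids = set()   # ids already stored on this device

    def receive(self, message_id, body):
        """Store the message once; return False for a redelivered duplicate."""
        if message_id in self._seen_ids:
            return False
        self._seen_ids.add(message_id)
        self.messages.append((message_id, body))
        return True


def fan_out(devices, message_id, body):
    """Deliver one message to every linked device; return ids that newly stored it."""
    return [d.client_id for d in devices if d.receive(message_id, body)]


phone, web = LinkedDevice("phone"), LinkedDevice("web")
print(fan_out([phone, web], "m1", "hello"))  # ['phone', 'web']
print(fan_out([phone, web], "m1", "hello"))  # [] (retry is ignored)
```

Because retries are harmless, the server can redeliver aggressively after a reconnect without the user seeing the same message twice.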

Lower Priority (Optional Enhancements)

Although not critical, the following lower-priority features are worth supporting:

  • Online / last seen presence
  • Sent/Delivered/Seen ticks
    • Sent = message reached WhatsApp server
    • Delivered = message reached recipient device
    • Seen = read by user on device
  • Contact availability/status

These features enhance usability but are not required for a minimal viable messaging system.

Out of Scope for This Discussion

We will explicitly not cover:

  • End-to-end encryption handshake protocols
  • Voice and video calling
  • Message deletion (for me / delete for everyone)
  • Message retention policies in detail
  • Multi-device cryptographic session sync
  • Typing indicators

Non-Functional Requirements (Verbose Explanation)

Building a messaging system at WhatsApp scale involves complex non-functional constraints:

1. Low Latency

  • Users expect messages to appear nearly instantly.
  • The delivery target is typically < 500 ms end-to-end under normal network conditions.
  • Latency includes client uplink, server routing, and client downlink.

2. Guaranteed Delivery

  • Once the sender receives the 'sent to server' acknowledgment, the system should guarantee delivery eventually, unless:
    • The user is deleted
    • The message expires (e.g., undelivered > 30 days)

WhatsApp’s server temporarily stores undelivered messages for a limited window.
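The expiry rule can be expressed as a simple age check on each queued message. This is a sketch under assumptions: the 30-day window comes from the example above, while `is_expired`, `purge_expired`, and the `(message_id, queued_at)` tuple layout are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Taken from the "undelivered > 30 days" example; the real window is a policy choice.
RETENTION_WINDOW = timedelta(days=30)


def is_expired(queued_at, now, window=RETENTION_WINDOW):
    """True when a message has waited on the server longer than the window."""
    return now - queued_at > window


def purge_expired(pending, now):
    """pending: list of (message_id, queued_at). Returns the still-deliverable ones."""
    return [(mid, ts) for mid, ts in pending if not is_expired(ts, now)]


now = datetime(2024, 3, 1, tzinfo=timezone.utc)
pending = [
    ("old", now - timedelta(days=31)),
    ("recent", now - timedelta(days=2)),
]
print([mid for mid, _ in purge_expired(pending, now)])  # ['recent']
```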

3. Enormous Scale

At global scale, capacity planning looks like:

  • If WhatsApp has 2B users
  • And each sends 100 messages/day

Then total traffic ≈ 200B messages/day

Peak throughput may exceed millions of messages per second during busy hours.
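The estimate is easy to check with a few lines of arithmetic. The function name below is hypothetical; the inputs are the figures used above.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_messages_per_second(users, messages_per_user_per_day):
    """Average rate if traffic were spread evenly over the day (integer division)."""
    return users * messages_per_user_per_day // SECONDS_PER_DAY


daily_total = 2_000_000_000 * 100
print(daily_total)                                      # 200000000000
print(average_messages_per_second(2_000_000_000, 100))  # 2314814
```

The average alone is already about 2.3 million messages per second, so peak-hour traffic sits above that.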

4. Fault Tolerance

Failures are expected:

  • Device failures
  • Network partitions
  • Regional outages
  • Datacenter failures

The system must continue operating without a global outage.

5. Minimal Message Storage

WhatsApp intentionally does not store delivered messages on its servers.

Implications:

  • Server-side state is minimized
  • Storage cost is reduced dramatically
  • User privacy expectations are reinforced

6. User-Centric Storage Model

Messages are stored:

  • On the device indefinitely (until deleted)
  • In cloud backups (optional)
  • On server only until delivered or expired

7. Efficient Network Usage

Many WhatsApp users are on:

  • Limited data plans
  • 2G/3G networks
  • Unreliable connections

8. Highly Available Global Infrastructure

Data centers must be:

  • Distributed globally
  • Geographically redundant
  • Load balanced intelligently

Message routing should minimize cross-continent RTTs.


Entities

User

Represents a registered WhatsApp user.

  • id — unique user identifier
  • mobileNo — phone number used for identification/login
  • created_at — account creation timestamp

Client

Represents a device/session linked to a user (e.g., phone, web, desktop).

  • userId — reference to User
  • clientId — unique identifier for the client device
  • added_at — when the device was linked

Chat

Represents a conversation (1:1 or group).

  • id — chat identifier
  • metadata — optional settings (e.g., group name, icon)
  • users[] — participants in the chat
  • created_at — chat creation timestamp

Message

Represents a single chat message.

  • id — message identifier
  • senderUserId — reference to User
  • chatId — reference to Chat
  • content — text body
  • asset_url(s) — optional media attachments
  • timestamp — when the message was sent
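The four entities map naturally onto plain record types. The sketch below uses Python dataclasses; the string id types and the snake_case field names (for example `mobile_no` for mobileNo) are assumptions made for consistency, not part of the original schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class User:
    id: str
    mobile_no: str          # phone number used for identification/login
    created_at: datetime


@dataclass
class Client:
    user_id: str            # reference to User.id
    client_id: str          # unique per linked device
    added_at: datetime


@dataclass
class Chat:
    id: str
    created_at: datetime
    users: list = field(default_factory=list)      # participant user ids
    metadata: dict = field(default_factory=dict)   # e.g. group name, icon


@dataclass
class Message:
    id: str
    sender_user_id: str     # reference to User.id
    chat_id: str            # reference to Chat.id
    content: str
    timestamp: datetime
    asset_urls: list = field(default_factory=list)  # optional media attachments


now = datetime.now(timezone.utc)
chat = Chat(id="chat-1", created_at=now, users=["alice", "bob"])
msg = Message(id="m1", sender_user_id="alice", chat_id=chat.id,
              content="hello", timestamp=now)
print(msg.chat_id, msg.asset_urls)  # chat-1 []
```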

High Level Design


Polling, Long Polling, SSE, WebSockets

Of these options, WebSockets provide stateful, persistent connections, which lets the client send messages and the server push messages instantly.

Scale

1 billion users

300 million daily active users

100 messages per day per user (on average), each about 100 bytes:

300 million * 100 * 100 bytes = 300 * 10^6 * 10^4 bytes = 3 * 10^12 bytes = 3 TB per day

3 TB per day * 30 days = 90 TB per month

Deep dive


How do WebSocket handlers talk to each other?


User A is connected to WS handler 1, and User B is connected to WS handler 2.

User A sends a message to User B.

The message is added to topic A-B.

User B sends a message to User A.

The message is added to topic B-A.

WS handler 1 is subscribed to topic B-A (or all topics X-A) to receive messages for A. WS handler 2 is subscribed to topic A-B (or all topics X-B) to receive messages for B.


User A is connected to WS handler 1, and User B is connected to WS handler 2.

User A sends a message to User B.

The message is added to channel A-B.

User B sends a message to User A.

The message is added to channel B-A.

WS handler 1 is subscribed to channel B-A (or all channels X-A) to receive messages for A. WS handler 2 is subscribed to channel A-B (or all channels X-B) to receive messages for B.
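The routing between handlers can be illustrated with a toy in-process broker. This is a sketch, not a real pub/sub client: `InMemoryBroker` and `WsHandler` are hypothetical classes, the "sender-recipient" channel naming follows the A-B convention above, and subscribing by recipient stands in for "all channels X-A".

```python
from collections import defaultdict


class InMemoryBroker:
    """Toy stand-in for the pub/sub layer between WebSocket handlers."""

    def __init__(self):
        # recipient id -> callbacks of handlers subscribed to "all channels X-<recipient>"
        self._subscribers = defaultdict(list)

    def subscribe_to_recipient(self, recipient_id, callback):
        self._subscribers[recipient_id].append(callback)

    def publish(self, sender_id, recipient_id, body):
        channel = f"{sender_id}-{recipient_id}"  # e.g. "A-B"
        for callback in self._subscribers.get(recipient_id, []):
            callback(channel, body)


class WsHandler:
    """Holds the (simulated) WebSocket connections for some users."""

    def __init__(self, name, broker):
        self.name = name
        self.broker = broker
        self.pushed = []  # (user_id, channel, body) pushed down to connected clients

    def connect(self, user_id):
        # Subscribe on behalf of the user who just connected to this handler.
        self.broker.subscribe_to_recipient(
            user_id,
            lambda channel, body: self.pushed.append((user_id, channel, body)),
        )

    def send(self, sender_id, recipient_id, body):
        self.broker.publish(sender_id, recipient_id, body)


broker = InMemoryBroker()
handler_1, handler_2 = WsHandler("ws-1", broker), WsHandler("ws-2", broker)
handler_1.connect("A")
handler_2.connect("B")
handler_1.send("A", "B", "hi B")
print(handler_2.pushed)  # [('B', 'A-B', 'hi B')]
```

Neither handler needs to know where the other user is connected; the broker decouples the sender's handler from the recipient's.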


Message Sent/Delivered/Read Status (Ticks)

User A sends a message to User B:

  • A sends the message to the server
  • The server acks A: message received by the server (sent)
  • The server forwards the message to User B
  • B acks the server: message received
  • The server notifies A: message delivered
  • B acks the server: message read
  • The server notifies A: message read

The same flow, with each step labeled as a write or a read:

  • A sends the message to the server (write)
  • The server acks A: message received by the server (read)
  • The server forwards the message to User B (read)
  • B acks the server: message received (write)
  • The server notifies A: message delivered (read)
  • B acks the server: message read (write)
  • The server notifies A: message read (read)
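On the sender's side, the tick shown for a message only ever moves forward, even if acknowledgments arrive late or more than once. A minimal sketch of that rule follows; `TickStatus` and `apply_ack` are hypothetical names.

```python
from enum import IntEnum


class TickStatus(IntEnum):
    SENT = 1        # message reached the server
    DELIVERED = 2   # message reached the recipient device
    SEEN = 3        # message was read on the device


def apply_ack(current, ack):
    """Statuses only move forward; a late or duplicate ack never downgrades."""
    return max(current, ack)


status = TickStatus.SENT
status = apply_ack(status, TickStatus.SEEN)        # read ack arrives first
status = apply_ack(status, TickStatus.DELIVERED)   # delivery ack arrives late
print(status.name)  # SEEN
```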

300 million DAU, 100 messages per user per day, and roughly 3 reads and 3 writes per message. One day is about 10^5 seconds (86,400).

300 million * 100 * 3 read requests/day = 90 billion read requests/day = 90 * 10^9 / 10^5 read requests/sec = 9 * 10^5 = 900k read requests/sec

300 million * 100 * 3 write requests/day = 90 billion write requests/day = 90 * 10^9 / 10^5 write requests/sec = 9 * 10^5 = 900k write requests/sec
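The per-second rate is the daily total divided by the number of seconds in a day, which a short sketch makes easy to verify. The function name is hypothetical; the 10^5 figure is the rounded approximation of 86,400 seconds.

```python
APPROX_SECONDS_PER_DAY = 10**5   # rounded from 86,400 for back-of-envelope math
EXACT_SECONDS_PER_DAY = 86_400


def requests_per_second(dau, messages_per_user, requests_per_message,
                        seconds_per_day=APPROX_SECONDS_PER_DAY):
    """Average request rate, assuming traffic is spread evenly over the day."""
    return dau * messages_per_user * requests_per_message // seconds_per_day


print(requests_per_second(300_000_000, 100, 3))                         # 900000
print(requests_per_second(300_000_000, 100, 3, EXACT_SECONDS_PER_DAY))  # 1041666
```

Either way the result is on the order of a million requests per second for reads, and the same again for writes.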
