CortexCookie

Uber System Design

Designing a large-scale ride-hailing platform such as Uber or Ola requires building a highly distributed, real-time system that seamlessly connects millions of riders and drivers across diverse geographic regions. These platforms function as dynamic marketplaces, continuously processing live location updates, demand spikes, pricing decisions, and matching operations β€” all under strict latency, availability, and reliability constraints.

At its core, such a system must efficiently solve challenges in real-time location tracking, low-latency proximity search, elastic scaling, fault tolerance, and consistency management, while operating reliably under unpredictable traffic patterns and regional surges.

Functional Requirements

I have categorised the functional requirements from the users' point of view. There are two types of users in Uber/Ola-like systems: the Rider and the Driver.

Core user flows:

  • Rider searches for a cab using source & destination
  • System shows ETA & fare estimates for cab types (Go, Moto, Sedan, etc.)
  • Rider books a ride
  • System finds nearby drivers
  • Ride requests sent to drivers
  • Driver accepts / denies request
  • Driver navigates to pickup location
  • Rider verification using OTP
  • Trip execution β†’ Drop-off
  • Rider makes payment

Out of Scope

  • Profile creation
  • Ratings & reviews
  • Notification service internals
  • Monitoring / analytics

Non-Functional Requirements

Availability vs Consistency

Different subsystems require different guarantees:

  • Highly Available β†’ ETA, fare estimates
  • Highly Consistent β†’ Driver matching (avoid double booking)

Latency Sensitivity

  • Location updates must be near real-time
  • Matching decisions must be fast
  • UI interactions should feel instantaneous

Durability

  • Ride lifecycle data must persist
  • Payments & trip records cannot be lost

Scalability

  • Must handle peak traffic (rush hours)
  • Rapid geographic expansion

Core Entities

  • Rider
  • Driver
  • Ride
  • Estimate
  • Location
  • Cab

APIs

Fare Estimate

GET /estimates?src_lat={}&src_long={}&dest_lat={}&dest_long={}

   Output β†’ estimateId, fare, ETA (per cab type)

Book a Ride

POST /rides

{
  "estimateId": "...",
  "cabType": "sedan"
}

Location Update

POST /location/update

{
  "latitude": ...,
  "longitude": ...
}

Driver Accepts/Denies Ride Request


PATCH /rides/driver/accept
{
  "rideId": "...",
  "accept": true
}

Driver Updates The Ride For Pickup and Dropoff

PATCH /rides/driver/update

{
  "rideId": "...",
  "update": "pickedUp"
}

Make Payment

POST /payments
{
  "rideId": "...",
  "paymentInfo": {}
}

Scale Estimation

Let's assume the below numbers for riders, drivers and rides:

  • 100 million total riders
  • 10 million daily active riders
  • 1 million drivers

Ride requests:

20 million rides/day β†’ (20 Γ— 10^6) / 10^5 rides per second = ~200 TPS (approximating a day as 10^5 seconds)

Peak TPS (~3Γ—) β†’ ~600 TPS

Driver location updates:

Assume 1 million drivers update their locations every 5 seconds β†’ 10^6 / 5 requests per second = 200k TPS (write-heavy workload)
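These back-of-envelope figures are easy to sanity-check in a few lines of Python:

```python
# Back-of-envelope scale estimation (a day is rounded to 10^5 seconds).
rides_per_day = 20_000_000
seconds_per_day = 100_000

avg_tps = rides_per_day // seconds_per_day   # average ride-request TPS
peak_tps = avg_tps * 3                       # assume ~3x surge at rush hour

drivers = 1_000_000
update_interval_s = 5
location_tps = drivers // update_interval_s  # location-update write TPS

print(avg_tps, peak_tps, location_tps)       # 200 600 200000
```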

High Level Design

Core Services Overview

Let's set aside the huge scale and high-TPS requirements for now and come up with a very high-level design.

  • API Gateway – Entry point handling authentication, authorization, routing, rate limiting, and request orchestration.

  • Estimate Service – Accepts Source and Destination location from riders and Computes fare estimates and ETA using distance, traffic data, and pricing logic.

  • Ride Service – Accepts Ride requests from riders, manages ride lifecycle, persists ride state, and coordinates downstream workflows. Ride Service calls Driver Matching Service to find a suitable driver for the ride. Also when ride is complete it invokes Payment Service to accept payments.

  • Driver Matching Service – Finds suitable nearby drivers and controls ride assignment with consistency guarantees. Driver Matching is a complex task so introducing a separate Service makes sense. It Separates the ride creation/updation task from Driver Matching.

  • Location Service – Ingests high-frequency driver location updates and maintains latest coordinates.

  • Payment Service – Processes ride payments, charges users, and records transaction outcomes.

image

There is something missing from the above image: a WebSocket layer, which we will add in the next section. Before that, let's go through the overall flow.

Basic Uber / Ola Ride Flow

1. Rider Requests Estimate

  • Rider enters source & destination
  • Estimate Service calculates ETA & fare using distance, traffic data, and pricing logic.

2. Rider Books Ride

  • Rider selects cab type
  • Ride Service creates ride with status = REQUESTED
  • Rider Service requests Driver Matching Service to find driver.

3. Driver Discovery

  • Driver Matching Service searches nearby drivers

4. Driver Allocation

  • Driver Matching Service sends requests to nearby drivers
  • First accepting driver gets the ride
  • Driver Matching Service sends driver details to Ride Service

5. Ride Confirmation

  • Ride Service updates ride β†’ ACCEPTED with driver id
  • Rider & driver receive ride details

6. Driver Navigation

  • Driver moves to pickup location
  • Real-time location updates via WebSocket

7. Trip Start (OTP / Verification)

  • Driver verifies rider
  • Ride status β†’ STARTED

8. Trip Completion

  • Driver reaches destination and completes the trip
  • Ride status β†’ FINISHED

9. Payment Processing

  • After ride completion Ride Service invokes Payment Service to accept payment
  • Rider completes payment via Payment Service
  • Payment status recorded

10. Post-Ride Updates

  • Driver marked available
  • Trip stored for history & analytics

Driver Matching Logic

Let's talk about a naive Driver Matching Logic to start with.

Rider sends source and destination locations. Any location has a latitude and a longitude (shown here as x and y on the world map).


How do we find drivers near the rider? Let's say the service finds drivers within a 2 km range of the rider's source location. A database query scans drivers within a coordinate range.

Example query pattern:

SELECT driver_id, latitude, longitude
FROM location
WHERE latitude  BETWEEN (user_lat  - radius) AND (user_lat  + radius)
  AND longitude BETWEEN (user_long - radius) AND (user_long + radius);


At Uber scale this will definitely fail, because:

  • Traditional databases are not optimized for multi-dimensional geo queries. They typically use only one index efficiently per query, so combined latitude + longitude range filtering becomes costly. Even if you filter by latitude and then by longitude, the query returns large intermediate result sets. Result β†’ inefficient execution plans and rising latency
  • Many drivers fall inside the bounding box. The DB must scan and post-filter large portions of the table, so query latency increases non-linearly with scale
  • Range queries produce a square region, not an actual radius. Drivers near the corners may be farther than the intended radius, so extra distance calculations are required, which is significant wasted computation

We will come up with better approaches in next sections.

Why we need websocket connection?

Ride-hailing systems involve continuous real-time interactions such as:

  • Frequent driver location updates
  • Dispatching ride requests to drivers
  • Instant ride lifecycle updates (accepted, arrived, started, finished)

These workflows demand low-latency communication and immediate message delivery.
Traditional polling mechanisms introduce unnecessary overhead, while external notification systems may add delays.

Why Not Use SSE (Server-Sent Events)?

Although SSE supports server-to-client streaming, it has critical limitations:

  • Unidirectional communication (server β†’ client only)
  • Requires separate APIs for client-to-server events
  • Inconsistent behavior across mobile environments
  • Less suitable for highly interactive systems

Ride-hailing platforms require both parties to exchange events dynamically, which SSE does not handle efficiently. So we can use a bidirectional connection: WebSocket.

WebSocket Infrastructure – Enables low-latency bidirectional communication for live updates and dispatch events.

  • WebSocket Manager – Maintains active client connections registry and routes real-time events to the correct sockets.

  • WebSocket Handler – Owns individual WebSocket connections and handles incoming/outgoing messages for a connected client.

image

WebSocket Interaction Flow

Driver Matching

  1. Driver Matching Service identifies candidate drivers for a ride.

  2. It sends a dispatch event to the WebSocket Manager targeting specific driver IDs.

  3. WebSocket Manager looks up active connections (driverId β†’ socket).

  4. Manager forwards the event to the appropriate WebSocket Handlers.

  5. WebSocket Handlers push the ride request to the driver’s live WebSocket connection.

  6. Driver response (accept / deny) flows back through Handlers β†’ Manager β†’ Driver Matching Service
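The manager's core job in the flow above is a driverId β†’ socket lookup. A minimal in-memory sketch; the class and method names are illustrative, not Uber's API, and FakeHandler stands in for a real socket handler:

```python
class FakeHandler:
    """Stand-in for a WebSocket Handler owning one live connection."""
    def __init__(self):
        self.sent = []
    def send(self, event):
        self.sent.append(event)

class WebSocketManager:
    """Registry of active connections: driver_id -> handler."""
    def __init__(self):
        self._connections = {}
    def register(self, driver_id, handler):
        self._connections[driver_id] = handler
    def unregister(self, driver_id):
        self._connections.pop(driver_id, None)
    def dispatch(self, driver_ids, event):
        """Forward the event to every targeted driver with a live socket."""
        delivered = []
        for driver_id in driver_ids:
            handler = self._connections.get(driver_id)
            if handler is not None:
                handler.send(event)
                delivered.append(driver_id)
        return delivered
```

Drivers without an active connection are simply skipped; the Driver Matching Service can fall back to push notifications for them.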

Deep Dive: Handling frequent driver location updates

Assuming there are 1 million daily active drivers, each sending a location update every 5 seconds:

10^6 location updates every 5 seconds

= 10^6 / 5 location updates every second = 200k TPS (which is huge)

Accepting Updates β€” HTTP vs WebSocket

HTTP (REST / Polling)

  • Each update requires full request–response overhead
  • Repeated headers & protocol costs
  • Higher latency under heavy load
  • Inefficient for small, frequent messages

Suitable for infrequent, stateless communication.

WebSocket

  • Persistent connection (no repeated handshake)
  • Minimal framing overhead per message
  • Very low latency delivery
  • Efficient for continuous updates

Ideal for high-frequency, real-time streaming workloads like driver locations.

High TPS Storage Strategy

Driver updates generate extremely high write throughput - 200k TPS.

  • SQL database supports max 10k TPS (single node)

  • dynamoDB supports upto 100k +

  • redis supports upto 1 million TPS

  • can we do batch updates to database to reduce write overhead?

  • it introduces delay in driver location updates -> driver matching might not be appropriate

NaΓ―ve approach:

Our naΓ―ve approach was writing every update to the primary database, which leads to bottlenecks and latency spikes. How do we make things better?

Consider the key observation below.

Key observation:

  • Matching logic needs latest location only for drivers, not full history

So why don't we store latest location in cache and historical location details in DB?

But can we store 1 million drivers' latest location in Cache?

How much storage does it need?

What Data Are We Storing Per Driver?

For latest location only, you typically store:

  • driver_id β†’ 8 bytes

  • latitude β†’ 8 bytes

  • longitude β†’ 8 bytes

No history, no metadata.

Approx: β‰ˆ 24 bytes per driver

Even if we consider 50-100 bytes per driver:

For 1 million drivers' latest locations we need β‰ˆ 10^6 drivers Γ— 100 bytes = 10^8 bytes = 100 MB (which easily fits into a cache)

Recommended Pattern

Fast Path (Hot Data)

  • Store latest driver coordinates in Redis / in-memory cache
  • Supports massive TPS
  • Optimized for geo queries

Durable Path (Cold Data)

  • Persist location events asynchronously (Cassandra / DynamoDB)
  • Used for analytics & recovery
  • Does not impact real-time operations

image

Resulting Benefits

βœ” Handles high TPS efficiently
βœ” Keeps matching latency low
βœ” Avoids database overload
βœ” Improves system resilience

Deep Dive: Driver Matching Logic

Earlier we saw why naΓ―ve database range queries fail at scale.
Once we move latest driver locations to Redis (geo-indexed cache), we can use spatial partitioning techniques to make proximity search efficient.

Two common ideas:

  • Geohashing
  • Quad Trees

Geohashing

Geohashing is a method for encoding geographic coordinates (latitude and longitude) into a compact, hierarchical string. It enables efficient spatial indexing, proximity searches, and location-based queries.

Instead of treating the world as continuous coordinates, geohashing:

  1. Divides the world into grid cells
  2. Encodes each cell in binary
  3. Converts binary into Base32 characters
  4. Produces a short hash string

Grid Subdivision

Geohashing works by recursively splitting space:

  • Level 1: The world is divided into four large cells
    Binary identifiers: 00, 01, 10, 11

  • Level 2+: Each cell is subdivided again into four smaller cells

Every subdivision adds more precision.

Binary Encoding

Each split alternates between:

  • Longitude division
  • Latitude division

This generates a sequence of bits (like Binary sequence β†’ 011010011...) describing the position of a point within the grid.

Base32 Conversion

Binary bits are grouped into chunks of 5 and then converted into base32 characters:

01001 10110 01001 10001 10000 10111 β†’ 9q9jhr (Base32), a human-friendly geohash

Properties:

  • Prefix similarity means spatial proximity
  • Shorter hash means larger region
  • Longer hash means smaller region

How it helps driver matching:

  1. Driver location stored with geohash
  2. Rider request arrives with source location
  3. Compute rider geohash
  4. Fetch drivers from:
    • Same hash cell
    • Adjacent hash cells
  5. Sort by exact distance
  6. Pick closest drivers

Why this works:

βœ” Limits search space
βœ” Avoids full table scans
βœ” Constant-time lookup in cache

Redis directly supports this via:

  • GEOADD
  • GEOSEARCH / GEORADIUS

Quad Tree Indexing

Quad Tree recursively partitions space based on data density.

Key intuition:

  • Entire map = one big cell
  • If too many points inside a cell β†’ split into 4 subcells
  • Repeat until manageable density

Start with the root region (entire space)

  1. Points present: A, B, C, D, E, F

  2. Too many points β†’ split into 4 quadrants

For simplicity, in the diagram above consider "more than one point" to mean too many points. After the split:

Top-left quadrant β†’ contains A βœ” Only one point β†’ no split

Bottom-left quadrant β†’ contains B βœ” Only one point β†’ no split

Top-right quadrant β†’ contains C and D ❌ More than one point β†’ must split

Bottom-right quadrant β†’ contains E and F ❌ More than one point β†’ must split

  3. Splitting the C/D region

That quadrant is subdivided again. Now C and D fall into different child cells

βœ” Each child now has ≀ 1 point β†’ stop splitting

Why split happened: local density > capacity.

  4. Splitting the E/F region

Same logic: too many points (more than one here, for simplicity) in the E/F region β†’ subdivision required.

After the split, both E and F fall into the same child cell, so that cell still contains more than one point.

Hence another subdivision is required. Once the points are separated into smaller cells β†’ stop

Quadtree avoids unnecessary subdivision.

This is the core efficiency advantage.
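The splitting rule from the walkthrough (here with a capacity of 1 point per cell, as in the diagrams) can be sketched as:

```python
class QuadTree:
    """Point quadtree: a cell splits into four children once it holds
    more than `capacity` points. Sparse regions stay as large cells."""
    def __init__(self, x, y, w, h, capacity=1):
        self.x, self.y, self.w, self.h = x, y, w, h
        self.capacity = capacity
        self.points = []
        self.children = None
    def contains(self, px, py):
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h
    def insert(self, px, py):
        if not self.contains(px, py):
            return False
        if self.children is None:
            if len(self.points) < self.capacity:
                self.points.append((px, py))
                return True
            self._split()  # over capacity: subdivide, then delegate
        return any(c.insert(px, py) for c in self.children)
    def _split(self):
        hw, hh = self.w / 2, self.h / 2
        self.children = [
            QuadTree(self.x,      self.y,      hw, hh, self.capacity),
            QuadTree(self.x + hw, self.y,      hw, hh, self.capacity),
            QuadTree(self.x,      self.y + hh, hw, hh, self.capacity),
            QuadTree(self.x + hw, self.y + hh, hw, hh, self.capacity),
        ]
        for px, py in self.points:      # push existing points down
            any(c.insert(px, py) for c in self.children)
        self.points = []
    def count(self):
        if self.children is None:
            return len(self.points)
        return sum(c.count() for c in self.children)
```

Inserting two nearby points forces recursive splits only along their shared quadrants, exactly mirroring the C/D and E/F walkthrough above.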

Big Insight: Adaptive Resolution

Notice:

  • Sparse areas β†’ large cells
  • Dense areas β†’ small cells

This mirrors real systems:

Downtown β†’ deeply subdivided

Suburbs β†’ shallow tree

Why This Matters in Systems (Uber / Maps / Games)

Splits occur only where needed, improving:

βœ” Query speed βœ” Memory usage βœ” Spatial locality βœ” Scalability

Instead of wasting computation on empty space.

Deep Dive: Updated Driver Matching Flow

Drivers continuously stream their latest GPS coordinates to the backend.

Driver App β†’ Location Service β†’ Cache (Redis)

GEOADD active_drivers <lon> <lat> <driver_id>

List of drivers fetched from chunks which are adjacent to source location chunk.

Key Properties

  • Always stores latest position
  • Overwrites previous location
  • Optimized for high-frequency updates
  • Redis acts as real-time location cache

When a rider requests a trip, the system performs a proximity search within Redis.

Step 1: Ride Request

Rider sends source and destination locations; Ride Service β†’ Driver Matching Service:

src_lat, src_lon, dst_lat, dst_lon

Step 2: Proximity Search in Redis

GEOSEARCH active_drivers
  FROMLONLAT <src_lon> <src_lat>
  BYRADIUS <r> km
  WITHDIST
  COUNT N
  ASC

Redis returns:

  • Nearby drivers
  • Sorted by distance
  • Low-latency lookup

Step 3: Candidate Filtering

Matching service may apply:

βœ” Driver availability checks βœ” ETA constraints βœ” Business rules (rating, surge, etc.)

Step 4: Driver Dispatch

Driver Matching Service asks candidate drivers to accept the ride (over their WebSocket connections).

Step 5: Acceptance Handling

When a driver accepts the ride, the Driver Matching Service updates the Ride Service with the matched driver id.

Ride Service assigns driver_id to ride entry and Ride status changed to ACCEPTED.

Ride Service notifies rider that we have found a driver.

image

What happens in case of a Redis failure?

Redis is typically deployed in a distributed, replicated setup, so total failure is rare. Even so, we should have robust monitoring and failover strategies.

The system also recovers quickly after a failure, since every driver sends a location update every 5 seconds, repopulating the cache.

Driver Matching Consistency (Preventing Double Booking)

A ride-sharing system must guarantee matching correctness under extreme concurrency.

Two critical failure scenarios exist:

  1. One ride β†’ multiple drivers
  2. One driver β†’ multiple rides

Both must be prevented.

Problem 1: One Ride, Multiple Drivers

We must ensure:

βœ” Only one driver can accept a ride
βœ” No duplicate assignments
βœ” No race conditions across servers

Solution: Partition by Ride ID

The Driver Matching Service is sharded using consistent hashing on ride_id.

The Driver Matching Service sends requests to drivers sequentially:

  • If one denies, it sends the request to the next driver
  • So no two drivers are booked at the same time for one ride

Implication:

βœ” A single service instance owns a ride
βœ” No cross-instance coordination needed
βœ” No duplicate dispatch decisions

Flow:

  1. Ride request arrives
  2. Hash(ride_id) β†’ matching instance
  3. Instance sends requests to drivers
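Step 2 above can be sketched as a small consistent-hash ring; the virtual-node count and hash function here are illustrative choices, not a prescribed implementation:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps keys (e.g. ride_id) to nodes so each ride has one owner
    and adding/removing a node moves only a fraction of keys."""
    def __init__(self, nodes, vnodes=100):
        self._ring = []
        for node in nodes:
            for i in range(vnodes):           # virtual nodes smooth the load
                h = self._hash(f"{node}#{i}")
                self._ring.append((h, node))
        self._ring.sort()
        self._keys = [h for h, _ in self._ring]
    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)
    def owner(self, key):
        """First ring position clockwise of hash(key) owns the key."""
        h = self._hash(key)
        idx = bisect.bisect(self._keys, h) % len(self._ring)
        return self._ring[idx][1]
```

Because the mapping is deterministic, every request for the same ride_id lands on the same matching instance, which is what removes cross-instance coordination for a single ride.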

Problem 2: One Driver, Multiple Rides (Harder Problem)

Even with ride-based sharding:

βœ” Two different rides may be routed to two different instances
βœ” Both instances query Redis
βœ” The same driver may appear in both result sets

Without protection:

❌ Driver could receive multiple requests
❌ Driver accepts one β†’ others become invalid
❌ Risk of double booking

Why Instance-Level Locks Fail

Locking inside a service instance:

❌ Does not work across instances
❌ Cannot prevent distributed races

Multiple instances may still target the same driver.

Why DB-Based Locks Are Dangerous

Updating driver state in database:

❌ Slow (high write latency)
❌ Requires cleanup logic
❌ Risk of stale locks on crashes

Example failure:

The driver state is OUTSTANDING_REQUEST and the service crashes β†’ the driver is stuck in that state.

This requires cron / repair jobs to clean up, which is undesirable overhead.

Correct Solution: Distributed Lock in Redis (With TTL)

Redis provides a fast, atomic locking primitive.

When dispatching a request:

SET driver_lock:<driver_id> ride_id NX EX 10

βœ” NX β†’ Only acquire if not locked βœ” EX 10 β†’ Auto-expires after 10 seconds

For each candidate driver, the Driver Matching Service attempts the Redis lock.

  • If lock fails β†’ skip driver

  • If lock succeeds β†’ send request

  • Only one ride globally can target that driver.

Why TTL Is Essential?

Drivers may ignore requests or services may crash.

TTL guarantees:

  • No permanent lock
  • Automatic recovery
  • No cleanup jobs required

If driver accepts:

  • Update driver state to IN_RIDE
  • Remove lock / let expire (as driver state now makes sure that we dont send request to that driver)

If driver rejects / timeout:

  • Delete lock immediately (optional optimization)

How frequently do drivers need to send location updates?

  • On the client we can use an adaptive strategy (on-device sensors plus a client-side algorithm)
  • If the driver is not moving, don't send updates
  • If the driver is moving slowly, send updates less frequently

Scaling the Ride Service

The Ride Service operates on the critical path of every trip request and must support extreme concurrency, burst traffic, and geographically skewed demand.

The Ride Service should be designed as stateless.

Implications:

  • Any instance can process any request
  • Horizontal scaling becomes straightforward
  • No session affinity or sticky routing needed
  • Failures are easier to recover from

All persistent state is stored externally:

  • Ride metadata β†’ Database
  • Driver availability β†’ Cache / Redis
  • Event history β†’ Durable log (Kafka / DynamoDB / Cassandra)

Queue-Based Load Regulation

Instead of synchronously invoking downstream services, we can introduce a queue in between.

Ride Service β†’ Queue β†’ Driver Matching Service

Introducing a queue provides:

  • Burst absorption during traffic spikes
  • Backpressure protection
  • Controlled processing rate
  • Failure isolation

Without queuing:

  • ❌ Sudden demand spikes can overload matching systems
  • ❌ Cascading failures become likely

With queuing, the system remains stable under peak load.
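A minimal sketch of this decoupling with a bounded in-process queue; a production system would use a durable broker such as Kafka, but the backpressure idea is the same:

```python
import queue

ride_queue = queue.Queue(maxsize=3)   # small bound for illustration

def enqueue_ride(ride_id):
    """Producer side (Ride Service): never blocks; sheds load when full."""
    try:
        ride_queue.put_nowait(ride_id)
        return True
    except queue.Full:
        return False   # backpressure signal: retry later or reject

def drain_one():
    """Consumer side (Driver Matching Service) pulls at its own rate."""
    try:
        return ride_queue.get_nowait()
    except queue.Empty:
        return None
```

When a burst fills the queue, producers get an immediate backpressure signal instead of overloading the matching service downstream.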

image

Large-scale ride-sharing systems must handle geographically skewed traffic while maintaining low latency and high availability. Geo sharding and multi–data center deployment are core strategies to achieve this.

Geo Sharding

Geo sharding partitions the system based on geographic regions such as:

βœ” Country
βœ” City
βœ” Region / grid cell

Each shard is responsible for rides, drivers, and users within its region.

Why Geo Sharding?

βœ” Requests are processed close to users
βœ” Lower network latency
βœ” Independent scaling per region
βœ” Failure isolation (smaller blast radius)

Most ride operations are local, making geographic partitioning a natural fit. GeoSharding handles scale and locality.

Multi–Data Center Deployment

Each geographic shard is deployed across multiple data centers within the region.

Typical setup:

βœ” Active–active data centers
βœ” Load balancer routes traffic to nearest DC
βœ” Shared regional data stores

Benefits

βœ” High availability
βœ” Disaster recovery
βœ” Zero or minimal downtime
βœ” Resilience to DC-level failures

Multi–data center support handles availability and resilience.

Data Consistency Model

βœ” Strong consistency within a region
βœ” Eventual consistency across regions

Cross-region replication is used for:

βœ” Analytics
βœ” Reporting
βœ” Machine learning models
