Uber System Design
Designing a large-scale ride-hailing platform such as Uber or Ola requires building a highly distributed, real-time system that seamlessly connects millions of riders and drivers across diverse geographic regions. These platforms function as dynamic marketplaces, continuously processing live location updates, demand spikes, pricing decisions, and matching operations, all under strict latency, availability, and reliability constraints.
At its core, such a system must efficiently solve challenges in real-time location tracking, low-latency proximity search, elastic scaling, fault tolerance, and consistency management, while operating reliably under unpredictable traffic patterns and regional surges.
Functional Requirements
I have categorised the Functional Requirements from the users' point of view. There are two types of users in Uber/Ola-like systems: the Rider and the Driver.
Core user flows:
- Rider searches for a cab using source & destination
- System shows ETA & fare estimates for cab types (Go, Moto, Sedan, etc.)
- Rider books a ride
- System finds nearby drivers
- Ride requests sent to drivers
- Driver accepts / denies request
- Driver navigates to pickup location
- Rider verification using OTP
- Trip execution → drop-off
- Rider makes payment
Out of Scope
- Profile creation
- Ratings & reviews
- Notification service internals
- Monitoring / analytics
Non-Functional Requirements
Availability vs Consistency
Different subsystems require different guarantees:
- Highly Available → ETA, fare estimates
- Highly Consistent → Driver matching (avoid double booking)
Latency Sensitivity
- Location updates must be near real-time
- Matching decisions must be fast
- UI interactions should feel instantaneous
Durability
- Ride lifecycle data must persist
- Payments & trip records cannot be lost
Scalability
- Must handle peak traffic (rush hours)
- Rapid geographic expansion
Core Entities
- Rider
- Driver
- Ride
- Estimate
- Location
- Cab
APIs
Fare Estimate
GET /estimates?src_lat={}&src_long={}&dest_lat={}&dest_long={}
Output → estimateId, fare, ETA (per cab type)
Book a Ride
POST /rides
{
"estimateId": "...",
"cabType": "sedan"
}
Location Update
POST /location/update
{
"latitude": ...,
"longitude": ...
}
Driver Accepts/Denies Ride Request
PATCH /rides/driver/accept
{
"rideId": "...",
"accept": true
}
Driver Updates The Ride For Pickup and Dropoff
PATCH /rides/driver/update
{
"rideId": "...",
"update": "pickedUp"
}
Make Payment
POST /payments
{
"rideId": "...",
"paymentInfo": {}
}
Scale Estimation
Let's assume the following numbers for riders, drivers, and rides:
- ~100 million total riders (assumed)
- ~10 million daily active riders (assumed)
- 1 million drivers
Ride requests:
~17 million rides/day → 17,000,000 / 86,400 ≈ ~200 TPS (there are 86,400 seconds in a day)
Peak TPS (~3×) → ~600 TPS
Driver location updates:
Assume drivers update their locations every 5 seconds → 1,000,000 / 5 = 200,000 requests per second = 200k TPS (a write-heavy workload)
High Level Design
Core Services Overview
Let's not focus on the huge scale and high TPS requirements yet, and first come up with a very high-level design.
- API Gateway: Entry point handling authentication, authorization, routing, rate limiting, and request orchestration.
- Estimate Service: Accepts source and destination locations from riders and computes fare estimates and ETAs using distance, traffic data, and pricing logic.
- Ride Service: Accepts ride requests from riders, manages the ride lifecycle, persists ride state, and coordinates downstream workflows. The Ride Service calls the Driver Matching Service to find a suitable driver for the ride; when the ride completes, it invokes the Payment Service to collect payment.
- Driver Matching Service: Finds suitable nearby drivers and controls ride assignment with consistency guarantees. Driver matching is a complex task, so a separate service makes sense: it separates ride creation/updates from driver matching.
- Location Service: Ingests high-frequency driver location updates and maintains the latest coordinates.
- Payment Service: Processes ride payments, charges users, and records transaction outcomes.

There is one piece missing from the design above: a WebSocket layer, which we will add in a later section. Before that, let's go through the overall flow.
Basic Uber / Ola Ride Flow
1. Rider Requests Estimate
- Rider enters source & destination
- Estimate Service calculates ETA & fare using distance, traffic data, and pricing logic.
2. Rider Books Ride
- Rider selects cab type
- Ride Service creates the ride with status = REQUESTED
- Ride Service asks the Driver Matching Service to find a driver
3. Driver Discovery
- Driver Matching Service searches nearby drivers
4. Driver Allocation
- Driver Matching Service sends requests to nearby drivers
- First accepting driver gets the ride
- Driver Matching Service sends driver details to Ride Service
5. Ride Confirmation
- Ride Service updates the ride → ACCEPTED with the driver id
- Rider & driver receive ride details
6. Driver Navigation
- Driver moves to pickup location
- Real-time location updates via WebSocket
7. Trip Start (OTP / Verification)
- Driver verifies rider
- Ride status → STARTED
8. Trip Completion
- Driver reaches destination and completes the trip
- Ride status → FINISHED
9. Payment Processing
- After ride completion Ride Service invokes Payment Service to accept payment
- Rider completes payment via Payment Service
- Payment status recorded
10. Post-Ride Updates
- Driver marked available
- Trip stored for history & analytics
Driver Matching Logic
Let's talk about a naive Driver Matching Logic to start with.
The rider sends source and destination locations. Any location has a latitude and longitude (shown as x and y on the world map).
How do we find drivers near the rider? Say the service looks for drivers within a 2 km range of the rider's source location. A database query scans for drivers within a coordinate range.
Example query pattern:
SELECT driver_id, latitude, longitude
FROM location
WHERE latitude BETWEEN (user_lat - radius) AND (user_lat + radius)
  AND longitude BETWEEN (user_long - radius) AND (user_long + radius);
At Uber scale this will definitely fail, because:
- Traditional databases are not optimized for multi-dimensional geo queries. They typically use only one index efficiently per query, so combined latitude + longitude range filtering becomes costly. Even if you filter by latitude and then longitude, queries return large intermediate result sets. Result → inefficient execution plans and rising latency.
- Many drivers fall inside the bounding box. The DB must scan and post-filter large portions of the table, and query latency increases non-linearly with scale.
- Range queries produce a square region, not an actual radius. Drivers near the corners may actually be too far away, so extra distance calculations are required to fix this, which is significant wasted computation (see the sketch below).
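To see the wasted work concretely, here is a minimal Python sketch (the row shape and the 2 km radius are illustrative) of the exact-distance post-filter that must still run over every row the bounding-box query returns:

import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two (lat, lon) points, in kilometers.
    earth_radius_km = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def filter_within_radius(rows, user_lat, user_long, radius_km=2.0):
    # rows: (driver_id, latitude, longitude) tuples from the bounding-box query.
    # Every corner-of-the-square driver is computed over, then thrown away.
    return [(d, lat, lon) for (d, lat, lon) in rows
            if haversine_km(user_lat, user_long, lat, lon) <= radius_km]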
We will come up with better approaches in the next sections.
Why Do We Need a WebSocket Connection?
Ride-hailing systems involve continuous real-time interactions such as:
- Frequent driver location updates
- Dispatching ride requests to drivers
- Instant ride lifecycle updates (accepted, arrived, started, finished)
These workflows demand low-latency communication and immediate message delivery.
Traditional polling mechanisms introduce unnecessary overhead, while external notification systems may add delays.
Why Not Use SSE (Server-Sent Events)?
Although SSE supports server-to-client streaming, it has critical limitations:
- Unidirectional communication (server → client only)
- Requires separate APIs for client-to-server events
- Inconsistent behavior across mobile environments
- Less suitable for highly interactive systems
Ride-hailing platforms require both parties to exchange events dynamically, which SSE does not handle efficiently. So we use a bidirectional protocol like WebSocket.
- WebSocket Infrastructure: Enables low-latency bidirectional communication for live updates and dispatch events.
- WebSocket Manager: Maintains a registry of active client connections and routes real-time events to the correct sockets.
- WebSocket Handler: Owns an individual WebSocket connection and handles incoming/outgoing messages for that connected client.

WebSocket Interaction Flow
Driver Matching
- Driver Matching Service identifies candidate drivers for a ride.
- It sends a dispatch event to the WebSocket Manager targeting specific driver IDs.
- The WebSocket Manager looks up active connections (driverId → socket).
- The Manager forwards the event to the appropriate WebSocket Handlers.
- The Handlers push the ride request over each driver's live WebSocket connection.
- The driver's response (accept / deny) flows back through Handler → Manager → Driver Matching Service.
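A minimal Python sketch of the Manager's registry-and-routing idea, assuming the websockets library (v11+, single-argument handlers); the handshake message and function names are illustrative, not part of the original design:

import asyncio
import json
import websockets

active_connections = {}  # driverId -> live WebSocket (the Manager's registry)

async def handler(ws):
    # Hypothetical handshake: the first message identifies the driver.
    hello = json.loads(await ws.recv())
    driver_id = hello["driverId"]
    active_connections[driver_id] = ws
    try:
        async for message in ws:
            # Accept/deny responses flow back to the Driver Matching Service.
            print("driver response:", message)
    finally:
        active_connections.pop(driver_id, None)  # clean up on disconnect

async def dispatch_to_driver(driver_id, ride_request):
    # Called on behalf of the Driver Matching Service to push a ride request.
    ws = active_connections.get(driver_id)
    if ws is None:
        return False  # driver offline -> try the next candidate
    await ws.send(json.dumps(ride_request))
    return True

async def main():
    async with websockets.serve(handler, "0.0.0.0", 8765):
        await asyncio.Future()  # serve forever

asyncio.run(main())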
Deep Dive: Handling frequent driver location updates
Assuming there are 1 million daily active drivers, each sending a location update every 5 seconds:
1,000,000 location updates every 5 seconds
= 200,000 location updates every second = 200k TPS (which is huge)
Accepting Updates β HTTP vs WebSocket
HTTP (REST / Polling)
- Each update requires full request/response overhead
- Repeated headers & protocol costs
- Higher latency under heavy load
- Inefficient for small, frequent messages
Suitable for infrequent, stateless communication.
WebSocket
- Persistent connection (no repeated handshake)
- Minimal framing overhead per message
- Very low latency delivery
- Efficient for continuous updates
Ideal for high-frequency, real-time streaming workloads like driver locations.
High TPS Storage Strategy
Driver updates generate extremely high write throughput - 200k TPS.
- A single-node SQL database supports at most ~10k TPS
- DynamoDB supports up to ~100k+ TPS
- Redis supports up to ~1 million TPS
- Can we batch updates to the database to reduce write overhead?
- Batching introduces a delay in driver location updates → driver matching might act on stale positions
Naïve approach:
Our naïve approach was to write every update to the primary database, which leads to bottlenecks and latency spikes. How do we make things better here?
Consider the key observation below.
Key observation:
- Matching logic needs only each driver's latest location, not the full history
So why don't we store the latest location in a cache and the historical location details in the DB?
But can we store the latest locations of 1 million drivers in a cache?
How much storage does that need?
What Data Are We Storing Per Driver?
For the latest location only, you typically store:
- driver_id → 8 bytes
- latitude → 8 bytes
- longitude → 8 bytes
No history, no metadata.
Approx: ≈ 24 bytes per driver
Even if we budget 50-100 bytes per driver:
For 1 million drivers' latest locations we need ≈ 1,000,000 drivers × 100 bytes = 100,000,000 bytes ≈ 100 MB (which easily fits into a cache)
Recommended Pattern
Fast Path (Hot Data)
- Store latest driver coordinates in Redis / in-memory cache
- Supports massive TPS
- Optimized for geo queries
Durable Path (Cold Data)
- Persist location events asynchronously (Cassandra / DynamoDB)
- Used for analytics & recovery
- Does not impact real-time operations
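A minimal Python sketch of this fast-path/durable-path split, assuming the redis-py and kafka-python clients; the key, topic, and broker address are illustrative:

import json
import time
import redis
from kafka import KafkaProducer

r = redis.Redis()
producer = KafkaProducer(bootstrap_servers="localhost:9092",
                         value_serializer=lambda v: json.dumps(v).encode())

def on_location_update(driver_id, lon, lat):
    # Fast path: overwrite the driver's latest position in the Redis geo set.
    r.geoadd("active_drivers", (lon, lat, driver_id))
    # Durable path: append the raw event for analytics/recovery; the
    # real-time matching path never waits on this write.
    producer.send("driver-location-events",
                  {"driverId": driver_id, "lon": lon, "lat": lat,
                   "ts": time.time()})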

Resulting Benefits
- Handles high TPS efficiently
- Keeps matching latency low
- Avoids database overload
- Improves system resilience
Deep Dive: Driver Matching Logic
Earlier we saw why naïve database range queries fail at scale.
Once we move latest driver locations to Redis (geo-indexed cache), we can use spatial partitioning techniques to make proximity search efficient.
Two common ideas:
- Geohashing
- Quad Trees
Geohashing
Geohashing is a method for encoding geographic coordinates (latitude and longitude) into a compact, hierarchical string. It enables efficient spatial indexing, proximity searches, and location-based queries.
Instead of treating the world as continuous coordinates, geohashing:
- Divides the world into grid cells
- Encodes each cell in binary
- Converts binary into Base32 characters
- Produces a short hash string
Grid Subdivision
Geohashing works by recursively splitting space:
- Level 1: The world is divided into four large cells, with binary identifiers 00, 01, 10, 11
- Level 2+: Each cell is subdivided again into four smaller cells
Every subdivision adds more precision.
Binary Encoding
Each split alternates between:
- Longitude division
- Latitude division
This generates a sequence of bits (e.g. 011010011...) describing the position of a point within the grid.
Binary bits are grouped into chunks of 5 and then converted into base32 characters:
01001 10110 01001 10001 10000 10111 → 9q9jhr (Base32, human-friendly geohash)
Properties:
- Prefix similarity means spatial proximity
- Shorter hash means larger region
- Longer hash means smaller region
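To make the interleave-then-Base32 idea concrete, here is a minimal encoder sketch in Python using the standard geohash Base32 alphabet (which skips a, i, l, and o):

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

def geohash_encode(lat, lon, precision=6):
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    bits, chars = [], []
    even = True  # the bit interleaving starts with a longitude split
    while len(chars) < precision:
        rng, value = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits.append("1")
            rng[0] = mid  # keep the upper half
        else:
            bits.append("0")
            rng[1] = mid  # keep the lower half
        even = not even
        if len(bits) == 5:  # 5 bits -> one Base32 character
            chars.append(BASE32[int("".join(bits), 2)])
            bits = []
    return "".join(chars)

# Nearby points share a prefix: both of these start with "9q"
print(geohash_encode(37.7749, -122.4194))  # San Francisco
print(geohash_encode(37.3382, -121.8863))  # San Jose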
How it helps driver matching:
- Driver location stored with geohash
- Rider request arrives with source location
- Compute rider geohash
- Fetch drivers from:
- Same hash cell
- Adjacent hash cells
- Sort by exact distance
- Pick closest drivers
Why this works:
- Limits the search space
- Avoids full table scans
- Constant-time lookup in the cache
Redis directly supports this via GEOADD and GEOSEARCH / GEORADIUS.
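For example, with redis-py (GEOSEARCH needs Redis 6.2+; the key name active_drivers is the one used later in this article, and the coordinates are illustrative):

import redis

r = redis.Redis()

# Upsert a driver's latest position (stored in a sorted set keyed by a
# 52-bit geohash score).
r.geoadd("active_drivers", (-122.4194, 37.7749, "driver:42"))

# Nearest drivers within 2 km of the rider, closest first.
nearby = r.geosearch("active_drivers",
                     longitude=-122.41, latitude=37.78,
                     radius=2, unit="km",
                     sort="ASC", count=10, withdist=True)
# -> e.g. [[b"driver:42", 0.83], ...] (member, distance in km)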
Quad Tree Indexing
Quad Tree recursively partitions space based on data density.
Key intuition:
- The entire map = one big cell
- If too many points fall inside a cell → split it into 4 subcells
- Repeat until the density is manageable
Start with the root region (the entire space):
- Points present: A, B, C, D, E, F
- Too many points → split into 4 quadrants
For simplicity, in the diagram above, treat more than one point in a cell as too many points. After the split:
- Top-left quadrant → contains A → only one point → no split
- Bottom-left quadrant → contains B → only one point → no split
- Top-right quadrant → contains C and D → more than one point → must split
- Bottom-right quadrant → contains E and F → more than one point → must split
- Splitting the C/D region
That quadrant is subdivided again, and C and D now fall into different child cells.
Each child now has ≤ 1 point → stop splitting.
Why the split happened: local density > capacity.
- Splitting the E/F region
Same logic: too many points (more than one, for simplicity) in the E/F region → subdivision required.
After the split, both E and F fall in the same top-left child cell, so that cell still holds more than one point.
Hence another subdivision is required. Once the points separate into different cells → stop.
Quadtree avoids unnecessary subdivision.
This is the core efficiency advantage.
Big Insight: Adaptive Resolution
Notice:
- Sparse areas → large cells
- Dense areas → small cells
This mirrors real systems:
Downtown → deeply subdivided
Suburbs → shallow tree
Why This Matters in Systems (Uber / Maps / Games)
Splits occur only where needed, improving:
- Query speed
- Memory usage
- Spatial locality
- Scalability
No computation is wasted on empty space.
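A minimal point-quadtree sketch in Python, matching the capacity-1 walkthrough above; class and method names are illustrative:

class QuadTree:
    CAPACITY = 1  # "more than one point" forces a split, as in the example

    def __init__(self, x, y, w, h):
        self.x, self.y, self.w, self.h = x, y, w, h  # cell bounds
        self.points = []
        self.children = None  # four child cells once split

    def contains(self, px, py):
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h

    def insert(self, px, py):
        if not self.contains(px, py):
            return False
        if self.children is None:
            self.points.append((px, py))
            if len(self.points) > self.CAPACITY:
                self._split()  # only dense cells pay the subdivision cost
            return True
        return any(child.insert(px, py) for child in self.children)

    def _split(self):
        hw, hh = self.w / 2, self.h / 2
        self.children = [QuadTree(self.x, self.y, hw, hh),
                         QuadTree(self.x + hw, self.y, hw, hh),
                         QuadTree(self.x, self.y + hh, hw, hh),
                         QuadTree(self.x + hw, self.y + hh, hw, hh)]
        for px, py in self.points:  # push existing points into the children
            any(child.insert(px, py) for child in self.children)
        self.points = []
        # Note: identical points would split forever; real implementations
        # also cap the maximum depth.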
Deep Dive: Updated Driver Matching Flow
Drivers continuously stream their latest GPS coordinates to the backend.
Driver App → Location Service → Cache (Redis)
GEOADD active_drivers <lon> <lat> <driver_id>
Drivers are later fetched from the cell containing the source location and its adjacent cells.
Key Properties
- Always stores latest position
- Overwrites previous location
- Optimized for high-frequency updates
- Redis acts as real-time location cache
When a rider requests a trip, the system performs a proximity search within Redis.
Step 1: Ride Request
Rider sends source and destination locations. Ride Service → Driver Matching Service:
src_lat, src_lon, dst_lat, dst_lon
Step 2: Proximity Search in Redis
GEOSEARCH active_drivers
FROMLONLAT <src_lon> <src_lat>
BYRADIUS <r> km
WITHDIST
COUNT N
ASC
Redis returns:
- Nearby drivers
- Sorted by distance
- Low-latency lookup
Step 3: Candidate Filtering
Matching service may apply:
- Driver availability checks
- ETA constraints
- Business rules (rating, surge, etc.)
Step 4: Driver Dispatch
Driver Matching Service asks candidate drivers to accept the ride (over the WebSocket connection).
Step 5: Acceptance Handling
If a driver accepts the ride, then upon receiving the response the Driver Matching Service updates the Ride Service with the matched driver id.
The Ride Service assigns driver_id to the ride record and changes the ride status to ACCEPTED.
The Ride Service then notifies the rider that a driver has been found.

What Happens in Case of Redis Failure?
Redis can run in a distributed, replicated setup, so failure is rare; even so, we should have robust monitoring and failover strategies.
The system also recovers quickly after a failure, because each driver sends a location update every 5 seconds, repopulating the cache.
Driver Matching Consistency (Preventing Double Booking)
A ride-sharing system must guarantee matching correctness under extreme concurrency.
Two critical failure scenarios exist:
- One ride → multiple drivers
- One driver → multiple rides
Both must be prevented.
Problem 1: One Ride, Multiple Drivers
We must ensure:
β Only one driver can accept a ride
β No duplicate assignments
β No race conditions across servers
Solution: Partition by Ride ID
The Driver Matching Service is sharded using consistent hashing on ride_id.
The Driver Matching Service sends requests to drivers sequentially:
- If one denies, it sends the request to the next driver
- So no two drivers are ever booked at the same time for one ride
Implication:
- A single service instance owns a given ride
- No cross-instance coordination needed
- No duplicate dispatch decisions
Flow:
- Ride request arrives
- hash(ride_id) → matching instance
- The instance sends requests to drivers one at a time (see the sketch below)
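A minimal Python sketch of routing rides to matching instances with a consistent-hash ring; the instance names and the MD5 hash choice are illustrative:

import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=100):
        self.ring = []  # sorted (hash, node) pairs
        for node in nodes:
            for i in range(vnodes):  # virtual nodes smooth the distribution
                self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def owner(self, ride_id):
        # The first ring position clockwise from the key's hash owns the ride.
        idx = bisect.bisect(self.keys, self._hash(ride_id)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["matcher-1", "matcher-2", "matcher-3"])
print(ring.owner("ride-8f2c"))  # every request for this ride lands on one instance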
Problem 2: One Driver, Multiple Rides (Harder Problem)
Even with ride-based sharding:
- Two different rides may land on two different instances
- Both instances query Redis
- The same driver may appear in both result sets
Without protection:
- The driver could receive multiple requests
- The driver accepts one → the others become invalid
- Risk of double booking
Why Instance-Level Locks Fail
Locking inside a service instance:
- Does not work across instances
- Cannot prevent distributed races
Multiple instances may still target the same driver.
Why DB-Based Locks Are Dangerous
Updating driver state in the database:
- Slow (high write latency)
- Requires cleanup logic
- Risk of stale locks on crashes
Example failure:
The driver's state is OUTSTANDING_REQUEST and the service crashes, leaving the driver stuck in that state.
This requires cron/repair jobs to clean up, which is undesirable overhead.
Correct Solution: Distributed Lock in Redis (With TTL)
Redis provides a fast, atomic locking primitive.
When dispatching a request:
SET driver_lock:<driver_id> <ride_id> NX EX 10
- NX → only acquire the lock if it is not already held
- EX 10 → auto-expires after 10 seconds
For each candidate driver, the Driver Matching Service attempts the Redis lock:
- If the lock fails → skip the driver
- If the lock succeeds → send the request
- Only one ride globally can target that driver
Why Is TTL Essential?
Drivers may ignore requests or services may crash.
TTL guarantees:
- No permanent lock
- Automatic recovery
- No cleanup jobs required
If the driver accepts:
- Update the driver state to IN_RIDE
- Remove the lock / let it expire (the driver state now ensures we don't send requests to that driver)
If the driver rejects or times out:
- Delete the lock immediately (an optional optimization), as sketched below
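Putting the lock and dispatch steps together, a minimal Python sketch with redis-py; send_ride_request is a hypothetical stand-in for the WebSocket dispatch described earlier:

import redis

r = redis.Redis()
LOCK_TTL_SECONDS = 10

def send_ride_request(driver_id, ride_id):
    # Hypothetical: push the request over the driver's WebSocket and await
    # an accept/deny response (with a timeout shorter than the lock TTL).
    ...

def try_dispatch(ride_id, candidate_driver_ids):
    for driver_id in candidate_driver_ids:
        # SET ... NX EX: atomically acquire the lock only if it is free.
        if not r.set(f"driver_lock:{driver_id}", ride_id,
                     nx=True, ex=LOCK_TTL_SECONDS):
            continue  # another ride holds this driver -> skip
        if send_ride_request(driver_id, ride_id):
            r.set(f"driver_state:{driver_id}", "IN_RIDE")
            return driver_id  # the lock can simply expire now
        # Reject / timeout: free the driver right away. (A production version
        # would delete only if the lock still holds this ride_id, via Lua.)
        r.delete(f"driver_lock:{driver_id}")
    return None  # nobody accepted; the caller can widen the search radius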
How Frequently Do Drivers Need to Send Location Updates?
- On the client we can use an adaptive strategy (on-device sensors plus a client-side algorithm), as sketched below
- If the driver is not moving, don't send updates
- If the driver is moving slowly, send less frequent updates
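A minimal client-side sketch; the speed thresholds are illustrative assumptions:

def next_update_interval_seconds(speed_kmph):
    # Decide how long to wait before the next location update.
    if speed_kmph < 1:
        return None   # effectively stationary -> skip the update
    if speed_kmph < 20:
        return 10.0   # moving slowly -> less frequent updates
    return 5.0        # normal driving -> the default 5-second cadence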
Scaling the Ride Service
The Ride Service operates on the critical path of every trip request and must support extreme concurrency, burst traffic, and geographically skewed demand.
The Ride Service should be designed as stateless.
Implications:
- Any instance can process any request
- Horizontal scaling becomes straightforward
- No session affinity or sticky routing needed
- Failures are easier to recover from
All persistent state is stored externally:
- Ride metadata → Database
- Driver availability → Cache / Redis
- Event history → Durable log (Kafka / DynamoDB / Cassandra)
Queue-Based Load Regulation
Instead of synchronously invoking downstream services, we can introduce a queue in between:
Ride Service → Queue → Driver Matching Service
Introducing a queue provides:
- Burst absorption during traffic spikes
- Backpressure protection
- Controlled processing rate
- Failure isolation
Without queuing:
- Sudden demand spikes can overload the matching system
- Cascading failures become likely
With queuing, the system remains stable under peak load, as sketched below.
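A minimal Python sketch of this decoupling with kafka-python; the topic, broker, and group names are illustrative:

import json
from kafka import KafkaProducer, KafkaConsumer

# Ride Service side: enqueue the match request instead of calling synchronously.
producer = KafkaProducer(bootstrap_servers="localhost:9092",
                         value_serializer=lambda v: json.dumps(v).encode())
producer.send("match-requests",
              {"rideId": "ride-8f2c", "srcLat": 37.78, "srcLon": -122.40})

# Driver Matching Service side: consume at its own controlled rate.
def handle_match_request(req):
    ...  # run the driver-matching flow described earlier

consumer = KafkaConsumer("match-requests",
                         bootstrap_servers="localhost:9092",
                         group_id="driver-matching",
                         value_deserializer=lambda b: json.loads(b))
for record in consumer:
    handle_match_request(record.value)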

Large-scale ride-sharing systems must handle geographically skewed traffic while maintaining low latency and high availability. Geo sharding and multi-data center deployment are core strategies to achieve this.
Geo Sharding
Geo sharding partitions the system based on geographic regions such as:
- Country
- City
- Region / grid cell
Each shard is responsible for rides, drivers, and users within its region.
Why Geo Sharding?
- Requests are processed close to users
- Lower network latency
- Independent scaling per region
- Failure isolation (smaller blast radius)
Most ride operations are local, making geographic partitioning a natural fit. Geo sharding handles scale and locality.
Multi-Data Center Deployment
Each geographic shard is deployed across multiple data centers within the region.
Typical setup:
- Active-active data centers
- Load balancer routes traffic to the nearest DC
- Shared regional data stores
Benefits
- High availability
- Disaster recovery
- Zero or minimal downtime
- Resilience to DC-level failures
Multi-data center support handles availability and resilience.
Data Consistency Model
- Strong consistency within a region
- Eventual consistency across regions
Cross-region replication is used for:
- Analytics
- Reporting
- Machine learning models