Uber System Design
Designing a large-scale ride-hailing platform such as Uber or Ola requires building a highly distributed, real-time system that seamlessly connects millions of riders and drivers across diverse geographic regions. These platforms function as dynamic marketplaces, continuously processing live location updates, demand spikes, pricing decisions, and matching operations, all under strict latency, availability, and reliability constraints.
At its core, such a system must efficiently solve challenges in real-time location tracking, low-latency proximity search, elastic scaling, fault tolerance, and consistency management, while operating reliably under unpredictable traffic patterns and regional surges.
Functional Requirements
I have categorised the Functional Requirements from the users' point of view. There are two types of users in Uber/Ola-like systems: the Rider and the Driver.
Core user flows:
- Rider searches for a cab using source & destination
- System shows ETA & fare estimates for cab types (Go, Moto, Sedan, etc.)
- Rider books a ride
- System finds nearby drivers
- Ride requests sent to drivers
- Driver accepts / denies request
- Driver navigates to pickup location
- Rider verification using OTP
- Trip execution → drop-off
- Rider makes payment
Out of Scope
- Profile creation
- Ratings & reviews
- Notification service internals
- Monitoring / analytics
Non-Functional Requirements
Availability vs Consistency
Different subsystems require different guarantees:
- Highly Available → ETA, fare estimates
- Highly Consistent → Driver matching (avoid double booking)
Latency Sensitivity
- Location updates must be near real-time
- Matching decisions must be fast
- UI interactions should feel instantaneous
Durability
- Ride lifecycle data must persist
- Payments & trip records cannot be lost
Scalability
- Must handle peak traffic (rush hours)
- Rapid geographic expansion
Core Entities
- Rider
- Driver
- Ride
- Estimate
- Location
- Cab
APIs
Fare Estimate
GET /estimates?src_lat={}&src_long={}&dest_lat={}&dest_long={}
Output → estimateId, fare, ETA (per cab type)
Book a Ride
POST /rides
{
"estimateId": "...",
"cabType": "sedan"
}
Location Update
POST /location/update
{
"latitude": ...,
"longitude": ...
}
Driver Accepts/Denies Ride Request
PATCH /rides/driver/accept
{
"rideId": "...",
"accept": true
}
Driver Updates The Ride For Pickup and Dropoff
PATCH /rides/driver/update
{
"rideId": "...",
"update": "pickedUp"
}
Make Payment
POST /payments
{
"rideId": "...",
"paymentInfo": {}
}
Scale Estimation
Let's assume the following numbers for riders, drivers, and rides:
- ~100 million total riders (assumed)
- ~10 million daily active riders (assumed)
- 1 million drivers
Ride requests:
~17 million rides/day → 17,000,000 / 86,400 ≈ ~200 TPS (there are 86,400 seconds in a day)
Peak TPS (~3×) → ~600 TPS
Driver location updates:
Assume drivers update their locations every 5 seconds → 1,000,000 / 5 = 200,000 requests per second = 200k TPS (a write-heavy workload)
High Level Design
Core Services Overview
Let's not focus on the huge scale and high TPS requirements yet, and first come up with a very high-level design.
- API Gateway: Entry point handling authentication, authorization, routing, rate limiting, and request orchestration.
- Estimate Service: Accepts source and destination locations from riders and computes fare estimates and ETAs using distance, traffic data, and pricing logic.
- Ride Service: Accepts ride requests from riders, manages the ride lifecycle, persists ride state, and coordinates downstream workflows. The Ride Service calls the Driver Matching Service to find a suitable driver for the ride; when the ride completes, it invokes the Payment Service to collect payment.
- Driver Matching Service: Finds suitable nearby drivers and controls ride assignment with consistency guarantees. Driver matching is a complex task, so a separate service makes sense: it separates ride creation/updates from driver matching.
- Location Service: Ingests high-frequency driver location updates and maintains the latest coordinates.
- Payment Service: Processes ride payments, charges users, and records transaction outcomes.

There is one piece missing from the design above: a WebSocket layer, which we will add in a later section. Before that, let's go through the overall flow.
Basic Uber / Ola Ride Flow
1. Rider Requests Estimate
- Rider enters source & destination
- Estimate Service calculates ETA & fare using distance, traffic data, and pricing logic.
2. Rider Books Ride
- Rider selects cab type
- Ride Service creates the ride with status = REQUESTED
- Ride Service asks the Driver Matching Service to find a driver
3. Driver Discovery
- Driver Matching Service searches nearby drivers
4. Driver Allocation
- Driver Matching Service sends requests to nearby drivers
- First accepting driver gets the ride
- Driver Matching Service sends driver details to Ride Service
5. Ride Confirmation
- Ride Service updates the ride → ACCEPTED with the driver id
- Rider & driver receive ride details
6. Driver Navigation
- Driver moves to pickup location
- Real-time location updates via WebSocket
7. Trip Start (OTP / Verification)
- Driver verifies rider
- Ride status → STARTED
8. Trip Completion
- Driver reaches destination and completes the trip
- Ride status → FINISHED
9. Payment Processing
- After ride completion Ride Service invokes Payment Service to accept payment
- Rider completes payment via Payment Service
- Payment status recorded
10. Post-Ride Updates
- Driver marked available
- Trip stored for history & analytics
Driver Matching Logic
Let's talk about a naive Driver Matching Logic to start with.
The rider sends source and destination locations. Any location has a latitude and longitude (shown as x and y on the world map).
How do we find drivers near the rider? Say the service looks for drivers within a 2 km range of the rider's source location. A database query scans for drivers within a coordinate range.
Example query pattern:
SELECT driver_id, latitude, longitude
FROM location
WHERE latitude BETWEEN (user_lat - radius) AND (user_lat + radius)
  AND longitude BETWEEN (user_long - radius) AND (user_long + radius);
At Uber scale this will definitely fail, because:
- Traditional databases are not optimized for multi-dimensional geo queries. They typically use only one index efficiently per query, so combined latitude + longitude range filtering becomes costly. Even if you filter by latitude and then longitude, queries return large intermediate result sets. Result → inefficient execution plans and rising latency.
- Many drivers fall inside the bounding box. The DB must scan and post-filter large portions of the table, and query latency increases non-linearly with scale.
- Range queries produce a square region, not an actual radius. Drivers near the corners may actually be too far away, so extra distance calculations are required to fix this, which is significant wasted computation (see the sketch below).
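To see the wasted work concretely, here is a minimal Python sketch (the row shape and the 2 km radius are illustrative) of the exact-distance post-filter that must still run over every row the bounding-box query returns:

import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two (lat, lon) points, in kilometers.
    earth_radius_km = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def filter_within_radius(rows, user_lat, user_long, radius_km=2.0):
    # rows: (driver_id, latitude, longitude) tuples from the bounding-box query.
    # Every corner-of-the-square driver is computed over, then thrown away.
    return [(d, lat, lon) for (d, lat, lon) in rows
            if haversine_km(user_lat, user_long, lat, lon) <= radius_km]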
We will come up with better approaches in the next sections.
Why Do We Need a WebSocket Connection?
Ride-hailing systems involve continuous real-time interactions such as:
- Frequent driver location updates
- Dispatching ride requests to drivers
- Instant ride lifecycle updates (accepted, arrived, started, finished)
These workflows demand low-latency communication and immediate message delivery.
Traditional polling mechanisms introduce unnecessary overhead, while external notification systems may add delays.
Why Not Use SSE (Server-Sent Events)?
Although SSE supports server-to-client streaming, it has critical limitations:
- Unidirectional communication (server → client only)
- Requires separate APIs for client-to-server events
- Inconsistent behavior across mobile environments
- Less suitable for highly interactive systems
Ride-hailing platforms require both parties to exchange events dynamically, which SSE does not handle efficiently. So we use a bidirectional protocol like WebSocket.
- WebSocket Infrastructure: Enables low-latency bidirectional communication for live updates and dispatch events.
- WebSocket Manager: Maintains a registry of active client connections and routes real-time events to the correct sockets.
- WebSocket Handler: Owns an individual WebSocket connection and handles incoming/outgoing messages for that connected client.

WebSocket Interaction Flow
Driver Matching
- Driver Matching Service identifies candidate drivers for a ride.
- It sends a dispatch event to the WebSocket Manager targeting specific driver IDs.
- The WebSocket Manager looks up active connections (driverId → socket).
- The Manager forwards the event to the appropriate WebSocket Handlers.
- The Handlers push the ride request over each driver's live WebSocket connection.
- The driver's response (accept / deny) flows back through Handler → Manager → Driver Matching Service.
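A minimal Python sketch of the Manager's registry-and-routing idea, assuming the websockets library (v11+, single-argument handlers); the handshake message and function names are illustrative, not part of the original design:

import asyncio
import json
import websockets

active_connections = {}  # driverId -> live WebSocket (the Manager's registry)

async def handler(ws):
    # Hypothetical handshake: the first message identifies the driver.
    hello = json.loads(await ws.recv())
    driver_id = hello["driverId"]
    active_connections[driver_id] = ws
    try:
        async for message in ws:
            # Accept/deny responses flow back to the Driver Matching Service.
            print("driver response:", message)
    finally:
        active_connections.pop(driver_id, None)  # clean up on disconnect

async def dispatch_to_driver(driver_id, ride_request):
    # Called on behalf of the Driver Matching Service to push a ride request.
    ws = active_connections.get(driver_id)
    if ws is None:
        return False  # driver offline -> try the next candidate
    await ws.send(json.dumps(ride_request))
    return True

async def main():
    async with websockets.serve(handler, "0.0.0.0", 8765):
        await asyncio.Future()  # serve forever

asyncio.run(main())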
Deep Dive: Handling frequent driver location updates
Assuming there are 1 million daily active drivers, each sending a location update every 5 seconds:
1,000,000 location updates every 5 seconds
= 200,000 location updates every second = 200k TPS (which is huge)
Accepting Updates β HTTP vs WebSocket
HTTP (REST / Polling)
- Each update requires full request/response overhead
- Repeated headers & protocol costs
- Higher latency under heavy load
- Inefficient for small, frequent messages
Suitable for infrequent, stateless communication.
WebSocket
- Persistent connection (no repeated handshake)
- Minimal framing overhead per message
- Very low latency delivery
- Efficient for continuous updates
Ideal for high-frequency, real-time streaming workloads like driver locations.
High TPS Storage Strategy
Driver updates generate extremely high write throughput - 200k TPS.
- A single-node SQL database supports at most ~10k TPS
- DynamoDB supports up to ~100k+ TPS
- Redis supports up to ~1 million TPS
- Can we batch updates to the database to reduce write overhead?
- Batching introduces a delay in driver location updates → driver matching might act on stale positions
Naïve approach:
Our naïve approach was to write every update to the primary database, which leads to bottlenecks and latency spikes. How do we make things better here?
Consider the key observation below.
Key observation:
- Matching logic needs only each driver's latest location, not the full history
So why don't we store the latest location in a cache and the historical location details in the DB?
But can we store the latest locations of 1 million drivers in a cache?
How much storage does that need?
What Data Are We Storing Per Driver?
For the latest location only, you typically store:
- driver_id → 8 bytes
- latitude → 8 bytes
- longitude → 8 bytes
No history, no metadata.
Approx: ≈ 24 bytes per driver
Even if we budget 50-100 bytes per driver:
For 1 million drivers' latest locations we need ≈ 1,000,000 drivers × 100 bytes = 100,000,000 bytes ≈ 100 MB (which easily fits into a cache)
Recommended Pattern
Fast Path (Hot Data)
- Store latest driver coordinates in Redis / in-memory cache
- Supports massive TPS
- Optimized for geo queries
Durable Path (Cold Data)
- Persist location events asynchronously (Cassandra / DynamoDB)
- Used for analytics & recovery
- Does not impact real-time operations
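A minimal Python sketch of this fast-path/durable-path split, assuming the redis-py and kafka-python clients; the key, topic, and broker address are illustrative:

import json
import time
import redis
from kafka import KafkaProducer

r = redis.Redis()
producer = KafkaProducer(bootstrap_servers="localhost:9092",
                         value_serializer=lambda v: json.dumps(v).encode())

def on_location_update(driver_id, lon, lat):
    # Fast path: overwrite the driver's latest position in the Redis geo set.
    r.geoadd("active_drivers", (lon, lat, driver_id))
    # Durable path: append the raw event for analytics/recovery; the
    # real-time matching path never waits on this write.
    producer.send("driver-location-events",
                  {"driverId": driver_id, "lon": lon, "lat": lat,
                   "ts": time.time()})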

Resulting Benefits
- Handles high TPS efficiently
- Keeps matching latency low
- Avoids database overload
- Improves system resilience
Deep Dive: Driver Matching Logic
Earlier we saw why naïve database range queries fail at scale.
Once we move latest driver locations to Redis (geo-indexed cache), we can use spatial partitioning techniques to make proximity search efficient.
Two common ideas:
- Geohashing
- Quad Trees
Geohashing
Geohashing is a method for encoding geographic coordinates (latitude and longitude) into a compact, hierarchical string. It enables efficient spatial indexing, proximity searches, and location-based queries.
Instead of treating the world as continuous coordinates, geohashing:
- Divides the world into grid cells
- Encodes each cell in binary
- Converts binary into Base32 characters
- Produces a short hash string
Grid Subdivision
Geohashing works by recursively splitting space:
- Level 1: The world is divided into four large cells, with binary identifiers 00, 01, 10, 11
- Level 2+: Each cell is subdivided again into four smaller cells
Every subdivision adds more precision.
Binary Encoding
Each split alternates between:
- Longitude division
- Latitude division
This generates a sequence of bits (e.g. 011010011...) describing the position of a point within the grid.
Binary bits are grouped into chunks of 5 and then converted into base32 characters:
01001 10110 01001 10001 10000 10111 → 9q9jhr (Base32, human-friendly geohash)
Properties:
- Prefix similarity means spatial proximity
- Shorter hash means larger region
- Longer hash means smaller region
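To make the interleave-then-Base32 idea concrete, here is a minimal encoder sketch in Python using the standard geohash Base32 alphabet (which skips a, i, l, and o):

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

def geohash_encode(lat, lon, precision=6):
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    bits, chars = [], []
    even = True  # the bit interleaving starts with a longitude split
    while len(chars) < precision:
        rng, value = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits.append("1")
            rng[0] = mid  # keep the upper half
        else:
            bits.append("0")
            rng[1] = mid  # keep the lower half
        even = not even
        if len(bits) == 5:  # 5 bits -> one Base32 character
            chars.append(BASE32[int("".join(bits), 2)])
            bits = []
    return "".join(chars)

# Nearby points share a prefix: both of these start with "9q"
print(geohash_encode(37.7749, -122.4194))  # San Francisco
print(geohash_encode(37.3382, -121.8863))  # San Jose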
How it helps driver matching:
- Driver location stored with geohash
- Rider request arrives with source location
- Compute rider geohash
- Fetch drivers from:
- Same hash cell
- Adjacent hash cells
- Sort by exact distance
- Pick closest drivers
Why this works:
- Limits the search space
- Avoids full table scans
- Constant-time lookup in the cache
Redis directly supports this via GEOADD and GEOSEARCH / GEORADIUS.
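For example, with redis-py (GEOSEARCH needs Redis 6.2+; the key name active_drivers is the one used later in this article, and the coordinates are illustrative):

import redis

r = redis.Redis()

# Upsert a driver's latest position (stored in a sorted set keyed by a
# 52-bit geohash score).
r.geoadd("active_drivers", (-122.4194, 37.7749, "driver:42"))

# Nearest drivers within 2 km of the rider, closest first.
nearby = r.geosearch("active_drivers",
                     longitude=-122.41, latitude=37.78,
                     radius=2, unit="km",
                     sort="ASC", count=10, withdist=True)
# -> e.g. [[b"driver:42", 0.83], ...] (member, distance in km)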
Quad Tree Indexing
Quad Tree recursively partitions space based on data density.
Key intuition:
- The entire map = one big cell
- If too many points fall inside a cell → split it into 4 subcells
- Repeat until the density is manageable
Start with the root region (the entire space):
- Points present: A, B, C, D, E, F
- Too many points → split into 4 quadrants
For simplicity, in the diagram above, treat more than one point in a cell as too many points. After the split:
- Top-left quadrant → contains A → only one point → no split
- Bottom-left quadrant → contains B → only one point → no split
- Top-right quadrant → contains C and D → more than one point → must split
- Bottom-right quadrant → contains E and F → more than one point → must split
- Splitting the C/D region
That quadrant is subdivided again, and C and D now fall into different child cells.
Each child now has ≤ 1 point → stop splitting.
Why the split happened: local density > capacity.
- Splitting the E/F region
Same logic: too many points (more than one, for simplicity) in the E/F region → subdivision required.
After the split, both E and F fall in the same top-left child cell, so that cell still holds more than one point.
Hence another subdivision is required. Once the points separate into different cells → stop.
Quadtree avoids unnecessary subdivision.
This is the core efficiency advantage.
Big Insight: Adaptive Resolution
Notice:
- Sparse areas → large cells
- Dense areas → small cells
This mirrors real systems:
Downtown → deeply subdivided
Suburbs → shallow tree
Why This Matters in Systems (Uber / Maps / Games)
Splits occur only where needed, improving:
- Query speed
- Memory usage
- Spatial locality
- Scalability
No computation is wasted on empty space.
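A minimal point-quadtree sketch in Python, matching the capacity-1 walkthrough above; class and method names are illustrative:

class QuadTree:
    CAPACITY = 1  # "more than one point" forces a split, as in the example

    def __init__(self, x, y, w, h):
        self.x, self.y, self.w, self.h = x, y, w, h  # cell bounds
        self.points = []
        self.children = None  # four child cells once split

    def contains(self, px, py):
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h

    def insert(self, px, py):
        if not self.contains(px, py):
            return False
        if self.children is None:
            self.points.append((px, py))
            if len(self.points) > self.CAPACITY:
                self._split()  # only dense cells pay the subdivision cost
            return True
        return any(child.insert(px, py) for child in self.children)

    def _split(self):
        hw, hh = self.w / 2, self.h / 2
        self.children = [QuadTree(self.x, self.y, hw, hh),
                         QuadTree(self.x + hw, self.y, hw, hh),
                         QuadTree(self.x, self.y + hh, hw, hh),
                         QuadTree(self.x + hw, self.y + hh, hw, hh)]
        for px, py in self.points:  # push existing points into the children
            any(child.insert(px, py) for child in self.children)
        self.points = []
        # Note: identical points would split forever; real implementations
        # also cap the maximum depth.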
Deep Dive: Updated Driver Matching Flow
Drivers continuously stream their latest GPS coordinates to the backend.
Driver App → Location Service → Cache (Redis)
GEOADD active_drivers <lon> <lat> <driver_id>
Drivers are later fetched from the cell containing the source location and its adjacent cells.
Key Properties
- Always stores latest position
- Overwrites previous location
- Optimized for high-frequency updates
- Redis acts as real-time location cache
When a rider requests a trip, the system performs a proximity search within Redis.
Step 1: Ride Request
Rider sends source and destination locations. Ride Service → Driver Matching Service:
src_lat, src_lon, dst_lat, dst_lon
Step 2: Proximity Search in Redis
GEOSEARCH active_drivers
FROMLONLAT <src_lon> <src_lat>
BYRADIUS <r> km
WITHDIST
COUNT N
ASC
Redis returns:
- Nearby drivers
- Sorted by distance
- Low-latency lookup
Step 3: Candidate Filtering
Matching service may apply:
- Driver availability checks
- ETA constraints
- Business rules (rating, surge, etc.)
Step 4: Driver Dispatch
Driver Matching Service asks candidate drivers to accept the ride (over the WebSocket connection).
Step 5: Acceptance Handling
If a driver accepts the ride, then upon receiving the response the Driver Matching Service updates the Ride Service with the matched driver id.
The Ride Service assigns driver_id to the ride record and changes the ride status to ACCEPTED.
The Ride Service then notifies the rider that a driver has been found.

What Happens in Case of Redis Failure?
Redis can run in a distributed, replicated setup, so failure is rare; even so, we should have robust monitoring and failover strategies.
The system also recovers quickly after a failure, because each driver sends a location update every 5 seconds, repopulating the cache.
Driver Matching Consistency (Preventing Double Booking)
A ride-sharing system must guarantee matching correctness under extreme concurrency.
Two critical failure scenarios exist:
- One ride → multiple drivers
- One driver → multiple rides
Both must be prevented.
Problem 1: One Ride, Multiple Drivers
We must ensure:
β Only one driver can accept a ride
β No duplicate assignments
β No race conditions across servers
Solution: Partition by Ride ID
The Driver Matching Service is sharded using consistent hashing on ride_id.
The Driver Matching Service sends requests to drivers sequentially:
- If one denies, it sends the request to the next driver
- So no two drivers are ever booked at the same time for one ride
Implication:
- A single service instance owns a given ride
- No cross-instance coordination needed
- No duplicate dispatch decisions
Flow:
- Ride request arrives
- hash(ride_id) → matching instance
- The instance sends requests to drivers one at a time (see the sketch below)
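A minimal Python sketch of routing rides to matching instances with a consistent-hash ring; the instance names and the MD5 hash choice are illustrative:

import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=100):
        self.ring = []  # sorted (hash, node) pairs
        for node in nodes:
            for i in range(vnodes):  # virtual nodes smooth the distribution
                self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def owner(self, ride_id):
        # The first ring position clockwise from the key's hash owns the ride.
        idx = bisect.bisect(self.keys, self._hash(ride_id)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["matcher-1", "matcher-2", "matcher-3"])
print(ring.owner("ride-8f2c"))  # every request for this ride lands on one instance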
Problem 2: One Driver, Multiple Rides (Harder Problem)
Even with ride-based sharding:
- Two different rides may land on two different instances
- Both instances query Redis
- The same driver may appear in both result sets
Without protection:
- The driver could receive multiple requests
- The driver accepts one → the others become invalid
- Risk of double booking
Why Instance-Level Locks Fail
Locking inside a service instance:
- Does not work across instances
- Cannot prevent distributed races
Multiple instances may still target the same driver.
Why DB-Based Locks Are Dangerous
Updating driver state in the database:
- Slow (high write latency)
- Requires cleanup logic
- Risk of stale locks on crashes
Example failure:
The driver's state is OUTSTANDING_REQUEST and the service crashes, leaving the driver stuck in that state.
This requires cron/repair jobs to clean up, which is undesirable overhead.
Correct Solution: Distributed Lock in Redis (With TTL)
Redis provides a fast, atomic locking primitive.
When dispatching a request:
SET driver_lock:<driver_id> <ride_id> NX EX 10
- NX → only acquire the lock if it is not already held
- EX 10 → auto-expires after 10 seconds
For each candidate driver, the Driver Matching Service attempts the Redis lock:
- If the lock fails → skip the driver
- If the lock succeeds → send the request
- Only one ride globally can target that driver
Why Is TTL Essential?
Drivers may ignore requests or services may crash.
TTL guarantees:
- No permanent lock
- Automatic recovery
- No cleanup jobs required
If the driver accepts:
- Update the driver state to IN_RIDE
- Remove the lock / let it expire (the driver state now ensures we don't send requests to that driver)
If the driver rejects or times out:
- Delete the lock immediately (an optional optimization), as sketched below
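Putting the lock and dispatch steps together, a minimal Python sketch with redis-py; send_ride_request is a hypothetical stand-in for the WebSocket dispatch described earlier:

import redis

r = redis.Redis()
LOCK_TTL_SECONDS = 10

def send_ride_request(driver_id, ride_id):
    # Hypothetical: push the request over the driver's WebSocket and await
    # an accept/deny response (with a timeout shorter than the lock TTL).
    ...

def try_dispatch(ride_id, candidate_driver_ids):
    for driver_id in candidate_driver_ids:
        # SET ... NX EX: atomically acquire the lock only if it is free.
        if not r.set(f"driver_lock:{driver_id}", ride_id,
                     nx=True, ex=LOCK_TTL_SECONDS):
            continue  # another ride holds this driver -> skip
        if send_ride_request(driver_id, ride_id):
            r.set(f"driver_state:{driver_id}", "IN_RIDE")
            return driver_id  # the lock can simply expire now
        # Reject / timeout: free the driver right away. (A production version
        # would delete only if the lock still holds this ride_id, via Lua.)
        r.delete(f"driver_lock:{driver_id}")
    return None  # nobody accepted; the caller can widen the search radius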
How Frequently Do Drivers Need to Send Location Updates?
- On the client we can use an adaptive strategy (on-device sensors plus a client-side algorithm), as sketched below
- If the driver is not moving, don't send updates
- If the driver is moving slowly, send less frequent updates
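A minimal client-side sketch; the speed thresholds are illustrative assumptions:

def next_update_interval_seconds(speed_kmph):
    # Decide how long to wait before the next location update.
    if speed_kmph < 1:
        return None   # effectively stationary -> skip the update
    if speed_kmph < 20:
        return 10.0   # moving slowly -> less frequent updates
    return 5.0        # normal driving -> the default 5-second cadence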
Scaling the Ride Service
The Ride Service operates on the critical path of every trip request and must support extreme concurrency, burst traffic, and geographically skewed demand.
The Ride Service should be designed as stateless.
Implications:
- Any instance can process any request
- Horizontal scaling becomes straightforward
- No session affinity or sticky routing needed
- Failures are easier to recover from
All persistent state is stored externally:
- Ride metadata → Database
- Driver availability → Cache / Redis
- Event history → Durable log (Kafka / DynamoDB / Cassandra)
Queue-Based Load Regulation
Instead of synchronously invoking downstream services, we can introduce a queue in between:
Ride Service → Queue → Driver Matching Service
Introducing a queue provides:
- Burst absorption during traffic spikes
- Backpressure protection
- Controlled processing rate
- Failure isolation
Without queuing:
- Sudden demand spikes can overload the matching system
- Cascading failures become likely
With queuing, the system remains stable under peak load, as sketched below.
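A minimal Python sketch of this decoupling with kafka-python; the topic, broker, and group names are illustrative:

import json
from kafka import KafkaProducer, KafkaConsumer

# Ride Service side: enqueue the match request instead of calling synchronously.
producer = KafkaProducer(bootstrap_servers="localhost:9092",
                         value_serializer=lambda v: json.dumps(v).encode())
producer.send("match-requests",
              {"rideId": "ride-8f2c", "srcLat": 37.78, "srcLon": -122.40})

# Driver Matching Service side: consume at its own controlled rate.
def handle_match_request(req):
    ...  # run the driver-matching flow described earlier

consumer = KafkaConsumer("match-requests",
                         bootstrap_servers="localhost:9092",
                         group_id="driver-matching",
                         value_deserializer=lambda b: json.loads(b))
for record in consumer:
    handle_match_request(record.value)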

Large-scale ride-sharing systems must handle geographically skewed traffic while maintaining low latency and high availability. Geo sharding and multi-data center deployment are core strategies to achieve this.
Geo Sharding
Geo sharding partitions the system based on geographic regions such as:
- Country
- City
- Region / grid cell
Each shard is responsible for rides, drivers, and users within its region.
Why Geo Sharding?
- Requests are processed close to users
- Lower network latency
- Independent scaling per region
- Failure isolation (smaller blast radius)
Most ride operations are local, making geographic partitioning a natural fit. Geo sharding handles scale and locality.
Multi-Data Center Deployment
Each geographic shard is deployed across multiple data centers within the region.
Typical setup:
- Active-active data centers
- Load balancer routes traffic to the nearest DC
- Shared regional data stores
Benefits
- High availability
- Disaster recovery
- Zero or minimal downtime
- Resilience to DC-level failures
Multi-data center support handles availability and resilience.
Data Consistency Model
- Strong consistency within a region
- Eventual consistency across regions
Cross-region replication is used for:
- Analytics
- Reporting
- Machine learning models