Uber System Design

Functional Requirements

  • rider searches for a cab (with source & destination)
  • Uber shows ETA and fare estimate for different cab types (Go, Moto, Sedan, etc.)
  • rider books a cab
  • Uber matches nearby drivers
  • Uber sends the request to nearby drivers -> sends notifications
  • driver accepts/denies the ride request
  • if the driver accepts, they navigate to the pickup location
  • after pickup (verified with an OTP), the driver navigates to the drop location
  • driver ends the trip
  • rider makes the payment

Out of Scope

  • profile creation
  • ratings (driver and rider)
  • notification service
  • monitoring
  • analytics

Non-Functional Requirements

  • CAP theorem - highly available for ETA, fare estimates, etc.
  • highly consistent for driver matching - no double booking (one ride assigned to two drivers, or one driver assigned to two rides)
  • latency - as low as possible
  • durable - ride details must be persisted
  • scalable - for peak hours and for new users

Entities

  • rider
  • driver
  • estimate
  • ride
  • location
  • cab

APIs

get fare estimate

GET /estimates?src_lat={}&src_long={}&dest_lat={}&dest_long={}

    -> estimateId, estimate, ETA for different cab types

book a ride

POST /rides

{
    estimateId,
    cabType
}

location updates

POST /location/update

{ 
  latitude,
  longitude
}

driver accepts/denies ride request

PATCH /rides/driver/accept

{ 
    rideId,
    accept (true/false)
}

driver updates the ride for pickup and dropoff

PATCH /rides/driver/update

{
    rideId,
    update (pickedUp/droppedOff)
}

make payment

POST /payments

{
    rideId,
    paymentInfo {}
}

Scale

  • 100 million total riders
  • 10 million daily active riders - 2 rides per rider (on average)
  • 1 million drivers

ride requests = 20 million per day ≈ 20 × 10^6 / 10^5 s = 200 TPS (approximating a day as ~10^5 seconds)

peak TPS = 3x = 600 TPS

location updates every 5 s from 1 million drivers -> 10^6 / 5 = 200k TPS
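A minimal back-of-envelope sketch of the arithmetic above (the 10^5-seconds-per-day approximation and 3x peak factor are the same assumptions used in the text):

    # Back-of-envelope estimates (a day is approximated as ~10^5 seconds, as above).
    daily_active_riders = 10_000_000
    rides_per_rider = 2
    seconds_per_day = 100_000
    peak_factor = 3

    ride_requests_per_day = daily_active_riders * rides_per_rider   # 20 million
    avg_ride_tps = ride_requests_per_day / seconds_per_day          # ~200 TPS
    peak_ride_tps = avg_ride_tps * peak_factor                      # ~600 TPS

    drivers = 1_000_000
    location_update_interval_s = 5
    location_update_tps = drivers / location_update_interval_s      # 200k TPS

    print(avg_ride_tps, peak_ride_tps, location_update_tps)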

High Level Design

[High-level design diagram]

Why do we need a WebSocket connection?

  • frequent location updates
  • the server needs to push ride requests to the drivers
  • the server needs to push real-time ride status updates (going through a notification service might add delay)

Why not use SSE?

  • SSE is not well supported on all mobile devices
  • WebSocket has lower latency
  • a bidirectional connection is required (SSE is server-to-client only)

So we use a bidirectional connection like WebSocket.
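A minimal sketch of a driver client pushing location updates over such a connection; it assumes the third-party Python `websockets` package, and the endpoint URL, message schema, and `get_current_location` callback are hypothetical:

    import asyncio
    import json
    import websockets  # third-party package: pip install websockets

    async def send_location_updates(driver_id, get_current_location):
        # Hypothetical endpoint; the real URL and message format are assumptions.
        async with websockets.connect("wss://api.example.com/driver/ws") as ws:
            while True:
                lat, lon = get_current_location()
                await ws.send(json.dumps({
                    "type": "location_update",
                    "driverId": driver_id,
                    "latitude": lat,
                    "longitude": lon,
                }))
                # The same socket lets the server push ride requests back to the driver:
                #   msg = await ws.recv()
                await asyncio.sleep(5)  # every 5 seconds, as estimated above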

Handle frequent driver location updates

  • 1 million daily active drivers

  • each sends an update every 5 s -> 10^6 / 5 = 200k TPS

  • a single-node SQL database typically supports at most ~10k TPS

  • DynamoDB supports 100k+ TPS

  • Redis supports up to ~1 million TPS

  • can we batch updates to the database to reduce write overhead?

  • batching introduces delay in driver location updates -> driver matching might use stale positions

Driver Matching Logic

The rider sends the source location; let's say the service will find drivers within a 2 km radius.

[diagram: drivers within a 2 km radius of the pickup location]

GeoHashing

[diagram: geohashing]

Quad Tree

[diagrams: quad tree spatial indexing]

  • the map service uses geohashing / quad tree indexing
  • using the ride's source location, we get the source chunk from the map service
  • we get the adjacent chunks from the map service
  • a list of drivers belonging to those chunks is fetched from the location DB (ordered by distance, limited to x)
  • the driver matching service sends requests to the drivers one by one
  • if a driver rejects, move on to the next one
  • if a driver accepts, ask the ride service to update the ride details with the driver and change the ride status to accepted
  • send a notification to the rider

We have driver locations, but how do we know which chunks they belong to? Again, take help from the map service!
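A rough sketch of this sequential matching loop; every helper here (`get_source_chunk`, `get_adjacent_chunks`, `get_drivers_in_chunks`, `offer_ride`, `assign_driver`) is a hypothetical placeholder for the map service, location DB, and ride service calls described above:

    def match_driver(ride_id, src_lat, src_lon):
        # 1. Ask the map service which chunk the pickup point falls in,
        #    plus its adjacent chunks.
        chunk = get_source_chunk(src_lat, src_lon)          # map service (hypothetical)
        chunks = [chunk] + get_adjacent_chunks(chunk)       # map service (hypothetical)

        # 2. Fetch candidate drivers in those chunks from the location DB,
        #    ordered by distance and capped at the closest few.
        candidates = get_drivers_in_chunks(chunks, order_by="distance", limit=10)

        # 3. Offer the ride to drivers one by one; stop at the first accept.
        for driver in candidates:
            if offer_ride(driver.id, ride_id):              # push request over WebSocket
                assign_driver(ride_id, driver.id)           # ride service: status -> accepted
                return driver.id
        return None                                          # no driver accepted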

How to do proximity search for drivers without using the map service?

  • there is a lot of overhead in the previous driver matching approach

  • let's use a database with geospatial indexing capabilities - options: Postgres with PostGIS, Redis

Can we use Redis?

  • we just need to store the latest location of each driver in a cache

driver_id, lat, long -> 3 × 8 bytes ≈ 25 bytes per driver; for 1 million drivers -> ~25 × 10^6 bytes = 25 MB

We can store the latest driver locations in Redis -> a distributed cache

If we also want to persist the raw location update events, we can use Cassandra/DynamoDB.

Redis supports up to ~1 million TPS

  • GEOADD, GEOSEARCH -> search within a radius

[diagram: Redis GEOADD / GEOSEARCH]
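A minimal sketch of these two Redis commands, using redis-py's generic `execute_command` so the raw commands stay visible; the key name, member format, and coordinates are illustrative assumptions:

    import redis

    r = redis.Redis(host="localhost", port=6379)

    # On every location update: upsert the driver's latest position.
    # GEOADD driver_locations <longitude> <latitude> <member>
    r.execute_command("GEOADD", "driver_locations", 77.5946, 12.9716, "driver:42")

    # On a ride request: find drivers within 2 km of the pickup point,
    # closest first, capped at 10 candidates.
    # GEOSEARCH driver_locations FROMLONLAT <lon> <lat> BYRADIUS 2 km ASC COUNT 10
    nearby = r.execute_command(
        "GEOSEARCH", "driver_locations",
        "FROMLONLAT", 77.60, 12.97,
        "BYRADIUS", 2, "km",
        "ASC", "COUNT", 10,
    )
    print(nearby)  # e.g. [b"driver:42", ...]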

What about a Redis failure? It's a distributed cache, so a full outage is rare -> have robust monitoring and failover strategies.

  • the system recovers quickly after a failure, since every driver sends a location update every 5 seconds and the cache repopulates itself

Driver Matching Consistency (No double booking)

One ride, multiple drivers?

  • we scale the driver matching service based on rideId (consistent hashing; see the sketch below)
  • only one server handles a given ride
  • that server sends requests to drivers sequentially
  • if one denies, it sends the request to the next driver
  • so no two drivers can be booked at the same time for one ride
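A minimal sketch of routing rides to matching-service instances with a consistent hash ring; the instance names and virtual-node count are arbitrary, and it only illustrates that a given rideId always lands on the same instance:

    import bisect
    import hashlib

    class ConsistentHashRing:
        def __init__(self, instances, vnodes=100):
            # Place each instance at many points ("virtual nodes") on the hash ring.
            self.ring = sorted(
                (self._hash(f"{inst}#{i}"), inst)
                for inst in instances
                for i in range(vnodes)
            )
            self.keys = [h for h, _ in self.ring]

        @staticmethod
        def _hash(key):
            return int(hashlib.md5(key.encode()).hexdigest(), 16)

        def get_instance(self, ride_id):
            # Walk clockwise to the first virtual node at or after the key's hash.
            idx = bisect.bisect(self.keys, self._hash(ride_id)) % len(self.keys)
            return self.ring[idx][1]

    ring = ConsistentHashRing(["matcher-1", "matcher-2", "matcher-3"])
    print(ring.get_instance("ride-123"))  # the same rideId always maps to the same instance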

One driver, multiple rides?

  • we scale the driver matching service based on rideId

  • two ride requests go to the same service instance

  • both get nearby driver lists

  • it's possible that one driver is present in both lists

  • we don't want to send ride requests for two different rides to one driver simultaneously; if the driver accepts both => double booking :(

Can we have a service-instance-level lock? One service instance does not send requests to one driver for multiple rides simultaneously.

  • two ride requests go to different service instances
  • both get nearby driver lists
  • it's possible that one driver is present in both lists
  • we don't want to send ride requests for two different rides to one driver simultaneously; if the driver accepts both => double booking :(

Service-instance-level lock on a driver with TTL?

  • multiple service instances can still lock the same driver simultaneously - so what's the use of this lock?
  • the lock only works within a single service instance

Database update for driver status plus an application-level lock with TTL?

  • update the driver status to outstanding_request
  • manage the lock in the application; if the driver accepts, update the driver status to accepted or in_ride
  • if the driver denies, release the lock and update the status to available

What if the service instance crashes? The driver status remains stuck in outstanding_request.

We would need to run a cron job to find such stuck driver statuses and clear them out. Extra overhead!

Distributed lock in Redis with TTL

  • the driver id is locked for, say, 10 s; if the service instance crashes, the TTL releases the lock automatically

[diagram: distributed driver lock in Redis with TTL]
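A minimal sketch of this driver lock using Redis `SET NX EX` via redis-py; the 10 s TTL matches the value assumed above, and the key names are illustrative:

    import redis

    r = redis.Redis(host="localhost", port=6379)

    def try_lock_driver(driver_id, ride_id, ttl_s=10):
        # SET key value NX EX ttl -> succeeds only if no other instance holds the lock.
        # If the matching service crashes, the TTL releases the lock automatically.
        return bool(r.set(f"driver_lock:{driver_id}", ride_id, nx=True, ex=ttl_s))

    def release_driver(driver_id, ride_id):
        # Release only if we still own the lock (e.g. the driver denied the request).
        # Note: this check-then-delete is not atomic; a real system would use a Lua script.
        if r.get(f"driver_lock:{driver_id}") == ride_id.encode():
            r.delete(f"driver_lock:{driver_id}")

The lock plus TTL keeps two matcher instances from offering different rides to the same driver at the same time, and the TTL handles instance crashes without a cleanup cron job.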

How do we scale the Ride Service?

  • make it stateless and put a queue in front of the matching service

How frequently do drivers need to send location updates?

  • the client can use an adaptive strategy (based on on-device sensors and a client-side algorithm), as sketched below
  • if the driver is not moving, don't send updates
  • if moving slowly, send less frequent updates
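A minimal client-side sketch of that adaptive strategy; the speed thresholds and intervals are arbitrary assumptions:

    def next_update_interval_s(speed_kmph):
        # Decide how long the driver app waits before the next location update.
        if speed_kmph < 1:       # effectively stationary -> skip updates for now
            return None
        if speed_kmph < 15:      # moving slowly -> less frequent updates
            return 15
        return 5                 # normal driving -> update every 5 seconds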

How to scale geographically and for peak hours?

  • shard the system geographically

  • geo-sharding, consistent hashing

  • copies of the system deployed in multiple data centres, with data replicated worldwide
