YouTube System Design
YouTube is a globally distributed video-sharing and streaming platform that allows users to upload large video files and publish them, while providing smooth, low-latency streaming to millions of concurrent viewers. The system must support:
Video upload and storage
Video processing and transcoding
Efficient and scalable streaming
Search and discovery
Personalization and recommendations
High availability and fault tolerance
YouTube must handle massive scale: millions of uploads per day and billions of views with low latency and high reliability.
Functional Requirements
A YouTube-like system must support the following core functions:
Video Uploading – Users can upload videos (potentially hundreds of MBs to GBs).
Video Storage & Encoding – Videos are stored durably and transcoded into multiple formats.
Video Streaming – Users can stream videos with adaptive bitrate support (e.g., HLS or MPEG-DASH).
User Interaction – Users can watch, like, comment, subscribe, and interact with video content.
Search – Users can search for videos based on metadata (title, tags, description).
Recommendations – Personalized video suggestions delivered based on viewing history and behavior.
In this discussion we are currently covering these main topics:
- Video Uploading
- Video Storage & Encoding
- Video Streaming
Non-Functional Requirements
The system must satisfy:
High availability & reliability – No single point of failure, resilient to outages.
Low latency – Fast response times for streaming and search (latency < 500 ms).
Scalability – Support millions of uploads and hundreds of millions of views daily.
Durability – Store petabytes of data reliably.
Cost efficiency – Use caching and CDNs to reduce backend load.
Scale
- 100 million daily active users
- 1 million daily uploads
- 400 million daily video views
- Maximum video size on YouTube is 256 GB
Calculation:
1 million daily uploads ≈ 10^6 / 10^5 seconds per day ≈ 10 uploads/sec (10 TPS)
400 million daily views ≈ 400 * 10^6 / 10^5 ≈ 4,000 views/sec (4k QPS)
Storage: 1 million uploads/day at an assumed average video size of 500 MB = 10^6 * 500 MB = 500 TB/day; 500 TB * 365 days ≈ 180 PB/year (roughly 200 PB)
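The arithmetic above can be double-checked with a quick script, using the exact 86,400 seconds per day rather than the 10^5 approximation:

```python
# Back-of-envelope numbers from the estimates above; the 500 MB average
# video size is the assumption stated in the text.
SECONDS_PER_DAY = 86_400

daily_uploads = 1_000_000
daily_views = 400_000_000
avg_video_size_mb = 500

upload_tps = daily_uploads / SECONDS_PER_DAY  # ~12/sec (the text rounds to ~10 TPS)
watch_qps = daily_views / SECONDS_PER_DAY     # ~4,600/sec (~4k QPS)

daily_storage_tb = daily_uploads * avg_video_size_mb / 1_000_000  # 500 TB/day
yearly_storage_pb = daily_storage_tb * 365 / 1_000                # ~182 PB (~200 PB)
```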
Entities
Major entities in the system would be:
User
Video
VideoMetadata
API
Upload a Video
POST /videos/upload
Request body: video, videoMetadata
Watch a Video
GET /videos/{videoId}/watch
High Level Design
At a conceptual level, a video platform such as YouTube can be understood through two primary user journeys:
- Upload Flow – Getting videos into the system
- Watch Flow – Delivering videos efficiently to viewers
These flows are supported by gateway layers, storage systems, metadata management, and global delivery infrastructure.
Core Components
Client Applications
Users interact with the system through:
- Web browsers
- Mobile applications
- Smart TVs and other devices
Two key interfaces drive the platform:
- Upload Page – Handles large video uploads
- Watch Page – Streams video content to viewers
API Gateway
All client requests pass through the API Gateway, which acts as the entry point to the system.
Responsibilities include:
- Request routing (Upload, Watch, Metadata, etc.)
- Authentication and rate limiting
- TLS termination
- Protection of internal services
This layer prevents clients from directly accessing backend services.
Upload Service
When a user uploads a video:
- The client sends a request via the API Gateway
- The Upload Service stores the video in object storage such as Google Cloud Storage or Amazon S3.
- Video metadata (title, description, author, etc.) is stored in a metadata database.

The video metadata table stores fields such as id, title, creator_id, description, channel, tags, and video_url.
A Content Delivery Network (CDN) is essential for YouTube to deliver billions of videos globally without lag, buffering, or server crashes. By caching content on edge servers close to users, it minimizes latency, reduces the load on central servers, and enables high-quality, seamless streaming across different devices and internet speeds. So we also store the videos in CDN or edge caches for faster access.

Deep Dive: Uploading Large Videos
- Video files are large (100 MB to several GB).
- Uploading through your app server:
  - Increases memory & CPU usage
  - Consumes bandwidth
  - If 100 users upload at once, your server must handle concurrent large streams
  - Adds unnecessary network hops and delay. Files must go:
    - From Client to App Server first,
    - Then from App Server to Blob Storage (like S3)
- The API Gateway has a hard payload size limit of 10 MB.
So we cannot upload large files via the Upload Server.
Pre-Signed URL
A better approach is for the client to upload directly to blob storage. For that, the client needs permission to write to the blob store, which is granted via a pre-signed URL.
- The client sends a request to the upload API.
- The upload service obtains a pre-signed URL from the object storage and passes it to the client. The client then uploads the video directly to object storage (blob store), avoiding heavy application-server load.

Multi-part upload
Object storage also imposes limits on single uploads. For example, a single S3 PUT is capped at 5 GB, and AWS recommends multipart upload for objects larger than 100 MB.
=> We should break the video into chunks/parts and upload it in parts, which is called Multipart Upload.
Multipart upload splits a large file into smaller parts for efficient and reliable transfer.
Multipart Upload – Overview
Multipart upload is a strategy for handling large file transfers by breaking a file into smaller, independent chunks. Instead of sending a single massive request, the client uploads multiple parts, making the process faster, more reliable, and easier to recover from failures.
High-Level Flow
1. Upload Initiation
The client asks the backend to start an upload. The backend coordinates with object storage and receives an upload_id, which uniquely tracks the upload session.
2. Signed URL Generation
The backend decides the number of parts needed (from file-size metadata) and generates pre-signed URLs for each file part. These URLs allow the client to upload directly to object storage without routing large payloads through the application servers.
3. Parallel Part Uploads
The client splits the file into chunks and uploads them independently. Parts can be uploaded in parallel and retried individually if any network failure occurs.
4. Upload Completion
After all parts are successfully uploaded, the client signals completion. Object storage validates the parts and assembles them into the final file.
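The flow above can be sketched from the client's side. The transport is mocked out (`upload_part` just hashes the bytes where a real client would PUT them to that part's pre-signed URL), so treat this as an illustration of the split / parallel-upload / complete shape, not a working S3 client:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

PART_SIZE = 8 * 1024 * 1024  # 8 MB parts, as an example size

def split_parts(data: bytes, part_size: int) -> list[bytes]:
    """Break the file into fixed-size parts; the last part may be shorter."""
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]

def upload_part(part_number: int, part: bytes) -> dict:
    # Mocked transport: a real client would PUT `part` to the pre-signed URL
    # for this part number and keep the ETag returned by object storage.
    return {"PartNumber": part_number, "ETag": hashlib.md5(part).hexdigest()}

def multipart_upload(data: bytes, part_size: int = PART_SIZE) -> list[dict]:
    parts = split_parts(data, part_size)
    with ThreadPoolExecutor(max_workers=4) as pool:  # parts upload in parallel
        completed = list(pool.map(upload_part, range(1, len(parts) + 1), parts))
    # Completion step: the ordered (PartNumber, ETag) list lets object storage
    # validate the parts and assemble the final object.
    return completed
```

A failed part only requires re-running `upload_part` for that one part number, which is the whole point of the scheme.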
Why Multipart Upload Is Important
- Failure Resilience – Only failed parts are retried, not the entire file
- Performance – Parallel uploads significantly reduce total transfer time
- Scalability – Backend services avoid becoming bandwidth bottlenecks
- Efficiency – Ideal for large media, video, and data-heavy systems


Deep Dive: Streaming Videos
Why Direct Video Downloads Are Not Ideal?

Serving large videos as direct downloads creates serious performance, reliability, and user-experience problems:
- Slow Start Time – Users must wait for a large portion of the file to download before playback begins, which is unacceptable for multi-GB videos.
- Storage & Memory Constraints – The client effectively needs sufficient buffer/storage capacity, making playback fragile on constrained devices.
- Wasted Bandwidth – If the user stops watching midway, significant data may have already been transferred unnecessarily.
- No Adaptive Bitrate (ABR) – Direct downloads deliver a single resolution/bitrate. Users on slower networks suffer buffering, while faster networks cannot benefit from higher quality.
- Poor Playback Resilience – Network fluctuations directly interrupt viewing instead of gracefully adjusting quality.
- Weak Content Protection – Entire files are easier to copy, redistribute, or pirate.
- Limited Observability – Engagement metrics, QoE signals, and ad analytics become harder to capture accurately.
A better approach is to stream videos by downloading them in small sequential chunks rather than fetching the entire file at once. Videos are therefore segmented into small chunks (typically 2–10 seconds) for streaming.
Media Processing / Transcoding
Modern video platforms do not store or deliver media in a single format. Instead, uploaded video and audio streams undergo a media processing (transcoding) pipeline to ensure compatibility, performance, and efficient delivery across devices and network conditions.
Once a video is uploaded, the system converts the raw media into multiple variants:
- Formats – Different container and codec combinations (e.g., MP4, WebM)
- Resolutions – 240p, 480p, 720p, 1080p, etc.
- Bitrates – Multiple quality levels for each resolution
- Audio Streams – Encoded separately at different bitrates
This step is necessary because:
- Devices support different codecs and formats
- Network conditions vary widely
- Efficient streaming requires multiple quality options

The output is a set of media segments and a manifest file describing available streams.
After a video is uploaded to object storage (e.g., S3), the system initiates an asynchronous media processing pipeline that chunks and transcodes it.
Transcoding & Chunking Flow
1. Raw Video Storage
- The client uploads the original video using multipart upload.
- The full-resolution raw file is stored in object storage (S3).
- Upload metadata (title, creator, status, etc.) is written to the Metadata DB.
- A processing event is published to a message broker (Kafka topic).
This decouples user uploads from heavy media processing workloads.
2. Chunker Stage
A Chunker Service consumes events from the queue and performs:
- Retrieval of the raw video from storage
- Splitting the video into small time-based segments (typically 2–10 seconds)
- Extraction of audio/video streams if required
Why chunking is necessary:
- Enables adaptive bitrate streaming
- Prevents large file transfers
- Supports parallel processing
- Improves CDN cache efficiency
Generated chunks are written back to object storage.
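Computing where each time-based segment starts and ends is simple arithmetic; a sketch, assuming 4-second segments (within the 2–10 s range above):

```python
import math

def segment_boundaries(duration_s: float, segment_len_s: float = 4.0) -> list[tuple[float, float]]:
    """(start, end) times for each fixed-length segment; the last segment
    absorbs whatever remainder is left over."""
    count = math.ceil(duration_s / segment_len_s)
    return [(i * segment_len_s, min((i + 1) * segment_len_s, duration_s))
            for i in range(count)]
```

The chunker would hand each (start, end) pair to the media tooling that actually cuts the stream.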

3. Transcoding Stage
A Transcoder Service processes chunking outputs:
- Converts segments into multiple resolutions (240p → 1080p+)
- Encodes using streaming-friendly codecs
- Produces multiple bitrate variants
- Generates manifest/playlist files (HLS / MPEG-DASH)
Key goals:
- Device compatibility
- Bandwidth adaptation
- Efficient streaming
Each resolution/bitrate combination becomes an independent stream variant.
4. Storage & Distribution
Processed assets are stored back in object storage:
- Video segments per bitrate/resolution
- Audio variants
- Manifest files
These assets are then served via CDN for low-latency global delivery.
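The fan-out from one upload to many stream variants can be sketched as a table of (resolution, bitrate) pairs, each mapped to its own storage prefix; the bitrate ladder and key layout here are illustrative, not YouTube's actual ones:

```python
# Illustrative bitrate ladder: each resolution gets one or more bitrate variants.
BITRATE_LADDER_KBPS = {
    "240p": [300],
    "480p": [800],
    "720p": [1500, 2500],
    "1080p": [3000, 4500],
}

def stream_variants(video_id: str) -> list[dict]:
    """One entry per independent stream variant, with a hypothetical
    object-storage prefix under which its segments are written."""
    return [
        {"resolution": res, "bitrate_kbps": kbps, "prefix": f"{video_id}/{res}_{kbps}k/"}
        for res, rates in BITRATE_LADDER_KBPS.items()
        for kbps in rates
    ]
```

Each entry is exactly one "independent stream variant" from the list above; the transcoder writes that variant's segments under its prefix.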

Why This Pipeline Works Well?
- Asynchronous → Upload latency unaffected
- Horizontally scalable → Chunker/transcoder workers scale independently
- Fault tolerant → Failed jobs can be retried
- Streaming optimized → Supports ABR protocols
- Storage efficient → Raw + processed assets managed separately
This design is foundational for large-scale video platforms.
Adaptive Bitrate (ABR) Streaming
Adaptive bitrate streaming allows the client to dynamically adjust video quality during playback based on:
- Available network bandwidth
- Device performance and screen resolution
- Real-time playback conditions
Instead of downloading an entire video at one quality level, the player requests small segments at the most suitable bitrate.
Key Characteristics
- Quality can increase or decrease seamlessly
- Playback continues without restarting
- Minimizes buffering under fluctuating networks
Common ABR Protocols
Two dominant protocols power adaptive streaming:
- HLS (HTTP Live Streaming)
- MPEG-DASH (Dynamic Adaptive Streaming over HTTP)
Both operate by:
- Splitting media into small time-based chunks
- Providing multiple bitrate/resolution variants
- Allowing the client to switch streams dynamically
Why ABR Is Critical
- Ensures smooth playback across network conditions
- Optimizes bandwidth consumption
- Improves startup latency and user experience
- Supports a wide range of devices and capabilities
ABR streaming is a foundational requirement for any large-scale video delivery system.
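The client-side switching decision reduces to picking the highest rendition that fits the measured throughput. A minimal sketch (the 0.8 safety factor is an assumed margin, not a standard value):

```python
def select_bitrate(available_kbps: list[int], measured_kbps: float, safety: float = 0.8) -> int:
    """Highest variant bitrate within a safety margin of measured throughput;
    fall back to the lowest variant when even that does not fit."""
    budget = measured_kbps * safety
    fitting = [b for b in available_kbps if b <= budget]
    return max(fitting) if fitting else min(available_kbps)
```

The player re-runs this decision before each segment request, which is what makes quality switches seamless: the next chunk is simply fetched from a different variant.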
Upload Chunks vs Download Chunks
While both uploads and downloads may split data into smaller pieces, they serve very different purposes and follow different design constraints.
Upload Chunks (Multipart Upload)
Goal: Ensure reliable and efficient transfer of large files from client → storage.
- The client splits the file into chunks
- Chunks are uploaded independently
- Failed chunks can be retried individually
- Uploads may run in parallel
- Storage assembles parts after completion
Typical Chunk Size
- Usually larger chunks (e.g., 5–100 MB)
- Chosen to balance:
- Network overhead (too small → inefficient)
- Retry cost (too large → expensive failures)
Optimizes for
- Failure recovery
- Transfer efficiency
- Backend load reduction
Common use cases → Video uploads, large file ingestion.
Download Chunks (Adaptive Streaming)
Goal: Deliver media efficiently for smooth playback.
- Videos are pre-segmented by the system
- Chunks represent short time-based segments
- Client requests chunks on demand
- Supports adaptive bitrate switching (ABR)
- Playback continues without restarting
Typical Chunk Size
- Usually small segments (2–10 seconds of video)
- Size varies based on bitrate:
- Lower bitrate → smaller files
- Higher bitrate → larger files
Why small segments?
- Faster startup
- Quick quality adaptation
- Reduced buffering impact
Optimizes for
- Low latency playback
- Bandwidth efficiency
- Network variability
Common use cases → HLS / MPEG-DASH streaming.
Key Difference
- Upload chunking → Reliability & transfer optimization
- Download chunking → Playback & user experience optimization
Even though both use "chunks," their objectives, behavior, and sizing strategies are fundamentally different.
Manifest File
The manifest file (also called a playlist or index file) is simply a small static file containing streaming metadata. It is the control document that instructs the video player how to stream a video.
Instead of downloading a single large file, the player first retrieves this lightweight metadata file, which describes:
- Available resolutions
- Bitrates
- Formats / codecs
- Ordered list of chunk URLs
Example structure:
{
  "video_id": "yt12345",
  "title": "System Design Basics",
  "streams": [
    {
      "resolution": "1080p",
      "bitrate": "3000kbps",
      "format": "mp4",
      "codec": "h264",
      "chunks": [
        "https://cdn.youtube.com/yt12345/1080p/1.mp4",
        "https://cdn.youtube.com/yt12345/1080p/2.mp4"
      ]
    },
    {
      "resolution": "720p",
      "bitrate": "2000kbps",
      "format": "mp4",
      "codec": "h264",
      "chunks": [
        "https://cdn.youtube.com/yt12345/720p/1.mp4",
        "https://cdn.youtube.com/yt12345/720p/2.mp4"
      ]
    },
    {
      "resolution": "480p",
      "bitrate": "1000kbps",
      "format": "webm",
      "codec": "vp9",
      "chunks": [
        "https://cdn.youtube.com/yt12345/480p/1.webm",
        "https://cdn.youtube.com/yt12345/480p/2.webm"
      ]
    }
  ]
}
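Given a manifest with the structure above, the player's rendition choice is a small function over it. This is a sketch against the example JSON; real HLS/DASH manifests are text playlists or XML, not this JSON shape:

```python
def pick_stream(manifest: dict, budget_kbps: int) -> tuple[str, list[str]]:
    """Best stream whose bitrate fits the bandwidth budget, else the lowest one."""
    def kbps(stream: dict) -> int:
        return int(stream["bitrate"].removesuffix("kbps"))
    fitting = [s for s in manifest["streams"] if kbps(s) <= budget_kbps]
    chosen = max(fitting, key=kbps) if fitting else min(manifest["streams"], key=kbps)
    return chosen["resolution"], chosen["chunks"]

# Trimmed-down copy of the example manifest above.
manifest = {
    "video_id": "yt12345",
    "streams": [
        {"resolution": "1080p", "bitrate": "3000kbps",
         "chunks": ["https://cdn.youtube.com/yt12345/1080p/1.mp4"]},
        {"resolution": "720p", "bitrate": "2000kbps",
         "chunks": ["https://cdn.youtube.com/yt12345/720p/1.mp4"]},
        {"resolution": "480p", "bitrate": "1000kbps",
         "chunks": ["https://cdn.youtube.com/yt12345/480p/1.mp4"]},
    ],
}
```

The returned chunk URLs are then fetched in order, and the function can be re-evaluated whenever measured bandwidth changes.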
Why the Manifest Exists
The manifest is the key enabler of adaptive bitrate (ABR) streaming:
- The player dynamically selects video quality based on available bandwidth
- Quality can switch without restarting playback
- Bandwidth usage becomes efficient and network-aware
- Playback startup latency is significantly reduced
Without a manifest, adaptive streaming is not possible.
Where Is the Manifest Stored?
The manifest is generated by the transcoding and packaging pipeline and stored alongside the video chunks in Object Storage (the origin layer); it can also be cached in the CDN or edge servers. The manifest file URL can be stored in the metadata database.
Object storage is well-suited because it offers:
- High durability
- Low cost for static assets
- Simple versioning and regeneration
- Excellent compatibility with CDN caching
Manifests rarely change once created.
How the Client Uses the Manifest File
Typical streaming sequence:
- The player/client requests the manifest file from the CDN; on a cache miss, it is fetched from Object Storage
- Manifest describes available stream variants
- Player selects the optimal rendition
- Player downloads chunks progressively
