YouTube System Design
YouTube is a globally distributed video-sharing and streaming platform that allows users to upload large video files and publish them, while providing smooth, low-latency streaming to millions of concurrent viewers. The system must support:
Video upload and storage
Video processing and transcoding
Efficient and scalable streaming
Search and discovery
Personalization and recommendations
High availability and fault tolerance
YouTube must handle massive scale: millions of uploads per day and billions of views with low latency and high reliability.
Functional Requirements
A YouTube-like system must support the following core functions:
Video Uploading – Users can upload videos (potentially hundreds of MBs to GBs).
Video Storage & Encoding – Videos are stored durably and transcoded into multiple formats.
Video Streaming – Users can stream videos with adaptive bitrate support (e.g., HLS or MPEG-DASH).
User Interaction – Users can watch, like, comment, subscribe, and interact with video content.
Search – Users can search for videos based on metadata (title, tags, description).
Recommendations – Personalized video suggestions delivered based on viewing history and behavior.
In this discussion we are currently covering these main topics:
- Video Uploading
- Video Storage & Encoding
- Video Streaming
Non-Functional Requirements
The system must satisfy:
High availability & reliability – No single point of failure, resilient to outages.
Low latency – Fast response times for streaming and search (latency < 500 ms).
Scalability – Support millions of uploads and hundreds of millions of views daily.
Durability – Store petabytes of data reliably.
Cost efficiency – Use caching and CDNs to reduce backend load.
Scale
- 100 million daily active users
- 1 million daily uploads
- 400 million daily video views
- Maximum video size on YouTube is 256 GB
Calculation:
1 million daily uploads ≈ 10^6 / 10^5 seconds per day ≈ 10 uploads/sec (10 TPS)
400 million daily views ≈ 400 * 10^6 / 10^5 ≈ 4,000 views/sec (4k QPS)
Storage: 1 million uploads/day at an assumed average video size of 500 MB = 10^6 * 500 MB = 500 TB/day; 500 TB * 365 days ≈ 180 PB/year (roughly 200 PB)
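The arithmetic above can be double-checked with a quick script, using the exact 86,400 seconds per day rather than the 10^5 approximation:

```python
# Back-of-envelope numbers from the estimates above; the 500 MB average
# video size is the assumption stated in the text.
SECONDS_PER_DAY = 86_400

daily_uploads = 1_000_000
daily_views = 400_000_000
avg_video_size_mb = 500

upload_tps = daily_uploads / SECONDS_PER_DAY  # ~12/sec (the text rounds to ~10 TPS)
watch_qps = daily_views / SECONDS_PER_DAY     # ~4,600/sec (~4k QPS)

daily_storage_tb = daily_uploads * avg_video_size_mb / 1_000_000  # 500 TB/day
yearly_storage_pb = daily_storage_tb * 365 / 1_000                # ~182 PB (~200 PB)
```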
Entities
Major entities in the system would be:
User
Video
VideoMetadata
API
Upload a Video
POST /videos/upload
Request body: video, videoMetadata
Watch a Video
GET /videos/{videoId}/watch
High Level Design
At a conceptual level, a video platform such as YouTube can be understood through two primary user journeys:
- Upload Flow – Getting videos into the system
- Watch Flow – Delivering videos efficiently to viewers
These flows are supported by gateway layers, storage systems, metadata management, and global delivery infrastructure.
Core Components
Client Applications
Users interact with the system through:
- Web browsers
- Mobile applications
- Smart TVs and other devices
Two key interfaces drive the platform:
- Upload Page – Handles large video uploads
- Watch Page – Streams video content to viewers
API Gateway
All client requests pass through the API Gateway, which acts as the entry point to the system.
Responsibilities include:
- Request routing (Upload, Watch, Metadata, etc.)
- Authentication and rate limiting
- TLS termination
- Protection of internal services
This layer prevents clients from directly accessing backend services.
Upload Service
When a user uploads a video:
- The client sends a request via the API Gateway
- The Upload Service stores the video in object storage such as Google Cloud Storage or Amazon S3.
- Video metadata (title, description, author, etc.) is stored in a metadata database.

The video metadata table stores fields such as id, title, creator_id, description, channel, tags, and video_url.
A Content Delivery Network (CDN) is essential for YouTube to deliver billions of videos globally without lag, buffering, or server crashes. By caching content on edge servers close to users, it minimizes latency, reduces the load on central servers, and enables high-quality, seamless streaming across different devices and internet speeds. So we also store the videos in CDN or edge caches for faster access.

Deep Dive: Uploading Large Videos
- Video files are large (100 MB to several GB).
- Uploading through your app server:
  - Increases memory & CPU usage
  - Consumes bandwidth
  - If 100 users upload at once, your server must handle concurrent large streams
  - Adds unnecessary network hops and delay. Files must go:
    - From Client to App Server first,
    - Then from App Server to Blob Storage (like S3)
- The API Gateway has a hard payload size limit of 10 MB.
So we cannot upload large files via the Upload Server.
Pre-Signed URL
A better approach is for the client to upload directly to blob storage. For that, the client needs permission to write to the blob store, which is granted via a pre-signed URL.
- The client sends a request to the upload API.
- The upload service obtains a pre-signed URL from the object storage and passes it to the client. The client then uploads the video directly to object storage (blob store), avoiding heavy application-server load.

Multi-part upload
Object storage also imposes limits on single uploads. For example, a single S3 PUT is capped at 5 GB, and AWS recommends multipart upload for objects larger than 100 MB.
=> We should break the video into chunks/parts and upload it in parts, which is called Multipart Upload.
Multipart upload splits a large file into smaller parts for efficient and reliable transfer.
Multipart Upload – Overview
Multipart upload is a strategy for handling large file transfers by breaking a file into smaller, independent chunks. Instead of sending a single massive request, the client uploads multiple parts, making the process faster, more reliable, and easier to recover from failures.
High-Level Flow
1. Upload Initiation
The client asks the backend to start an upload. The backend coordinates with object storage and receives an upload_id, which uniquely tracks the upload session.
2. Signed URL Generation
The backend decides the number of parts needed (from file-size metadata) and generates pre-signed URLs for each file part. These URLs allow the client to upload directly to object storage without routing large payloads through the application servers.
3. Parallel Part Uploads
The client splits the file into chunks and uploads them independently. Parts can be uploaded in parallel and retried individually if any network failure occurs.
4. Upload Completion
After all parts are successfully uploaded, the client signals completion. Object storage validates the parts and assembles them into the final file.
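The flow above can be sketched from the client's side. The transport is mocked out (`upload_part` just hashes the bytes where a real client would PUT them to that part's pre-signed URL), so treat this as an illustration of the split / parallel-upload / complete shape, not a working S3 client:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

PART_SIZE = 8 * 1024 * 1024  # 8 MB parts, as an example size

def split_parts(data: bytes, part_size: int) -> list[bytes]:
    """Break the file into fixed-size parts; the last part may be shorter."""
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]

def upload_part(part_number: int, part: bytes) -> dict:
    # Mocked transport: a real client would PUT `part` to the pre-signed URL
    # for this part number and keep the ETag returned by object storage.
    return {"PartNumber": part_number, "ETag": hashlib.md5(part).hexdigest()}

def multipart_upload(data: bytes, part_size: int = PART_SIZE) -> list[dict]:
    parts = split_parts(data, part_size)
    with ThreadPoolExecutor(max_workers=4) as pool:  # parts upload in parallel
        completed = list(pool.map(upload_part, range(1, len(parts) + 1), parts))
    # Completion step: the ordered (PartNumber, ETag) list lets object storage
    # validate the parts and assemble the final object.
    return completed
```

A failed part only requires re-running `upload_part` for that one part number, which is the whole point of the scheme.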
Why Multipart Upload Is Important
- Failure Resilience – Only failed parts are retried, not the entire file
- Performance – Parallel uploads significantly reduce total transfer time
- Scalability – Backend services avoid becoming bandwidth bottlenecks
- Efficiency – Ideal for large media, video, and data-heavy systems


Deep Dive: Streaming Videos
Why Direct Video Downloads Are Not Ideal?

Serving large videos as direct downloads creates serious performance, reliability, and user-experience problems:
- Slow Start Time – Users must wait for a large portion of the file to download before playback begins, which is unacceptable for multi-GB videos.
- Storage & Memory Constraints – The client effectively needs sufficient buffer/storage capacity, making playback fragile on constrained devices.
- Wasted Bandwidth – If the user stops watching midway, significant data may have already been transferred unnecessarily.
- No Adaptive Bitrate (ABR) – Direct downloads deliver a single resolution/bitrate. Users on slower networks suffer buffering, while faster networks cannot benefit from higher quality.
- Poor Playback Resilience – Network fluctuations directly interrupt viewing instead of gracefully adjusting quality.
- Weak Content Protection – Entire files are easier to copy, redistribute, or pirate.
- Limited Observability – Engagement metrics, QoE signals, and ad analytics become harder to capture accurately.
A better approach is to stream videos by downloading them in small sequential chunks rather than fetching the entire file at once. Videos are therefore segmented into small chunks (typically 2–10 seconds) for streaming.
Media Processing / Transcoding
Modern video platforms do not store or deliver media in a single format. Instead, uploaded video and audio streams undergo a media processing (transcoding) pipeline to ensure compatibility, performance, and efficient delivery across devices and network conditions.
Once a video is uploaded, the system converts the raw media into multiple variants:
- Formats – Different container and codec combinations (e.g., MP4, WebM)
- Resolutions – 240p, 480p, 720p, 1080p, etc.
- Bitrates – Multiple quality levels for each resolution
- Audio Streams – Encoded separately at different bitrates
This step is necessary because:
- Devices support different codecs and formats
- Network conditions vary widely
- Efficient streaming requires multiple quality options

The output is a set of media segments and a manifest file describing available streams.
After a video is uploaded to object storage (e.g., S3), the system initiates an asynchronous media processing pipeline that chunks and transcodes it.
Transcoding & Chunking Flow
1. Raw Video Storage
- The client uploads the original video using multipart upload.
- The full-resolution raw file is stored in object storage (S3).
- Upload metadata (title, creator, status, etc.) is written to the Metadata DB.
- A processing event is published to a message broker (Kafka topic).
This decouples user uploads from heavy media processing workloads.
2. Chunker Stage
A Chunker Service consumes events from the queue and performs:
- Retrieval of the raw video from storage
- Splitting the video into small time-based segments (typically 2–10 seconds)
- Extraction of audio/video streams if required
Why chunking is necessary:
- Enables adaptive bitrate streaming
- Prevents large file transfers
- Supports parallel processing
- Improves CDN cache efficiency
Generated chunks are written back to object storage.
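Computing where each time-based segment starts and ends is simple arithmetic; a sketch, assuming 4-second segments (within the 2–10 s range above):

```python
import math

def segment_boundaries(duration_s: float, segment_len_s: float = 4.0) -> list[tuple[float, float]]:
    """(start, end) times for each fixed-length segment; the last segment
    absorbs whatever remainder is left over."""
    count = math.ceil(duration_s / segment_len_s)
    return [(i * segment_len_s, min((i + 1) * segment_len_s, duration_s))
            for i in range(count)]
```

The chunker would hand each (start, end) pair to the media tooling that actually cuts the stream.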

3. Transcoding Stage
A Transcoder Service processes chunking outputs:
- Converts segments into multiple resolutions (240p → 1080p+)
- Encodes using streaming-friendly codecs
- Produces multiple bitrate variants
- Generates manifest/playlist files (HLS / MPEG-DASH)
Key goals:
- Device compatibility
- Bandwidth adaptation
- Efficient streaming
Each resolution/bitrate combination becomes an independent stream variant.
4. Storage & Distribution
Processed assets are stored back in object storage:
- Video segments per bitrate/resolution
- Audio variants
- Manifest files
These assets are then served via CDN for low-latency global delivery.
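The fan-out from one upload to many stream variants can be sketched as a table of (resolution, bitrate) pairs, each mapped to its own storage prefix; the bitrate ladder and key layout here are illustrative, not YouTube's actual ones:

```python
# Illustrative bitrate ladder: each resolution gets one or more bitrate variants.
BITRATE_LADDER_KBPS = {
    "240p": [300],
    "480p": [800],
    "720p": [1500, 2500],
    "1080p": [3000, 4500],
}

def stream_variants(video_id: str) -> list[dict]:
    """One entry per independent stream variant, with a hypothetical
    object-storage prefix under which its segments are written."""
    return [
        {"resolution": res, "bitrate_kbps": kbps, "prefix": f"{video_id}/{res}_{kbps}k/"}
        for res, rates in BITRATE_LADDER_KBPS.items()
        for kbps in rates
    ]
```

Each entry is exactly one "independent stream variant" from the list above; the transcoder writes that variant's segments under its prefix.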

Why This Pipeline Works Well?
- Asynchronous → Upload latency unaffected
- Horizontally scalable → Chunker/transcoder workers scale independently
- Fault tolerant → Failed jobs can be retried
- Streaming optimized → Supports ABR protocols
- Storage efficient → Raw + processed assets managed separately
This design is foundational for large-scale video platforms.
Adaptive Bitrate (ABR) Streaming
Adaptive bitrate streaming allows the client to dynamically adjust video quality during playback based on:
- Available network bandwidth
- Device performance and screen resolution
- Real-time playback conditions
Instead of downloading an entire video at one quality level, the player requests small segments at the most suitable bitrate.
Key Characteristics
- Quality can increase or decrease seamlessly
- Playback continues without restarting
- Minimizes buffering under fluctuating networks
Common ABR Protocols
Two dominant protocols power adaptive streaming:
- HLS (HTTP Live Streaming)
- MPEG-DASH (Dynamic Adaptive Streaming over HTTP)
Both operate by:
- Splitting media into small time-based chunks
- Providing multiple bitrate/resolution variants
- Allowing the client to switch streams dynamically
Why ABR Is Critical
- Ensures smooth playback across network conditions
- Optimizes bandwidth consumption
- Improves startup latency and user experience
- Supports a wide range of devices and capabilities
ABR streaming is a foundational requirement for any large-scale video delivery system.
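The client-side switching decision reduces to picking the highest rendition that fits the measured throughput. A minimal sketch (the 0.8 safety factor is an assumed margin, not a standard value):

```python
def select_bitrate(available_kbps: list[int], measured_kbps: float, safety: float = 0.8) -> int:
    """Highest variant bitrate within a safety margin of measured throughput;
    fall back to the lowest variant when even that does not fit."""
    budget = measured_kbps * safety
    fitting = [b for b in available_kbps if b <= budget]
    return max(fitting) if fitting else min(available_kbps)
```

The player re-runs this decision before each segment request, which is what makes quality switches seamless: the next chunk is simply fetched from a different variant.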
Upload Chunks vs Download Chunks
While both uploads and downloads may split data into smaller pieces, they serve very different purposes and follow different design constraints.
Upload Chunks (Multipart Upload)
Goal: Ensure reliable and efficient transfer of large files from client → storage.
- The client splits the file into chunks
- Chunks are uploaded independently
- Failed chunks can be retried individually
- Uploads may run in parallel
- Storage assembles parts after completion
Typical Chunk Size
- Usually larger chunks (e.g., 5–100 MB)
- Chosen to balance:
- Network overhead (too small → inefficient)
- Retry cost (too large → expensive failures)
Optimizes for
- Failure recovery
- Transfer efficiency
- Backend load reduction
Common use cases → Video uploads, large file ingestion.
Download Chunks (Adaptive Streaming)
Goal: Deliver media efficiently for smooth playback.
- Videos are pre-segmented by the system
- Chunks represent short time-based segments
- Client requests chunks on demand
- Supports adaptive bitrate switching (ABR)
- Playback continues without restarting
Typical Chunk Size
- Usually small segments (2–10 seconds of video)
- Size varies based on bitrate:
- Lower bitrate → smaller files
- Higher bitrate → larger files
Why small segments?
- Faster startup
- Quick quality adaptation
- Reduced buffering impact
Optimizes for
- Low latency playback
- Bandwidth efficiency
- Network variability
Common use cases → HLS / MPEG-DASH streaming.
Key Difference
- Upload chunking → Reliability & transfer optimization
- Download chunking → Playback & user experience optimization
Even though both use "chunks," their objectives, behavior, and sizing strategies are fundamentally different.
Manifest File
The manifest file (also called a playlist or index file) is simply a small static file containing streaming metadata. It is the control document that instructs the video player how to stream a video.
Instead of downloading a single large file, the player first retrieves this lightweight metadata file, which describes:
- Available resolutions
- Bitrates
- Formats / codecs
- Ordered list of chunk URLs
Example structure:
{
  "video_id": "yt12345",
  "title": "System Design Basics",
  "streams": [
    {
      "resolution": "1080p",
      "bitrate": "3000kbps",
      "format": "mp4",
      "codec": "h264",
      "chunks": [
        "https://cdn.youtube.com/yt12345/1080p/1.mp4",
        "https://cdn.youtube.com/yt12345/1080p/2.mp4"
      ]
    },
    {
      "resolution": "720p",
      "bitrate": "2000kbps",
      "format": "mp4",
      "codec": "h264",
      "chunks": [
        "https://cdn.youtube.com/yt12345/720p/1.mp4",
        "https://cdn.youtube.com/yt12345/720p/2.mp4"
      ]
    },
    {
      "resolution": "480p",
      "bitrate": "1000kbps",
      "format": "webm",
      "codec": "vp9",
      "chunks": [
        "https://cdn.youtube.com/yt12345/480p/1.webm",
        "https://cdn.youtube.com/yt12345/480p/2.webm"
      ]
    }
  ]
}
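Given a manifest with the structure above, the player's rendition choice is a small function over it. This is a sketch against the example JSON; real HLS/DASH manifests are text playlists or XML, not this JSON shape:

```python
def pick_stream(manifest: dict, budget_kbps: int) -> tuple[str, list[str]]:
    """Best stream whose bitrate fits the bandwidth budget, else the lowest one."""
    def kbps(stream: dict) -> int:
        return int(stream["bitrate"].removesuffix("kbps"))
    fitting = [s for s in manifest["streams"] if kbps(s) <= budget_kbps]
    chosen = max(fitting, key=kbps) if fitting else min(manifest["streams"], key=kbps)
    return chosen["resolution"], chosen["chunks"]

# Trimmed-down copy of the example manifest above.
manifest = {
    "video_id": "yt12345",
    "streams": [
        {"resolution": "1080p", "bitrate": "3000kbps",
         "chunks": ["https://cdn.youtube.com/yt12345/1080p/1.mp4"]},
        {"resolution": "720p", "bitrate": "2000kbps",
         "chunks": ["https://cdn.youtube.com/yt12345/720p/1.mp4"]},
        {"resolution": "480p", "bitrate": "1000kbps",
         "chunks": ["https://cdn.youtube.com/yt12345/480p/1.mp4"]},
    ],
}
```

The returned chunk URLs are then fetched in order, and the function can be re-evaluated whenever measured bandwidth changes.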
Why the Manifest Exists
The manifest is the key enabler of adaptive bitrate (ABR) streaming:
- The player dynamically selects video quality based on available bandwidth
- Quality can switch without restarting playback
- Bandwidth usage becomes efficient and network-aware
- Playback startup latency is significantly reduced
Without a manifest, adaptive streaming is not possible.
Where Is the Manifest Stored?
The manifest is generated by the transcoding and packaging pipeline and stored alongside the video chunks in Object Storage (the origin layer); it can also be cached in the CDN or edge servers. The manifest file URL can be stored in the metadata database.
Object storage is well-suited because it offers:
- High durability
- Low cost for static assets
- Simple versioning and regeneration
- Excellent compatibility with CDN caching
Manifests rarely change once created.
How the Client Uses the Manifest File
Typical streaming sequence:
- The player/client requests the manifest file from the CDN; on a cache miss, it is fetched from Object Storage
- Manifest describes available stream variants
- Player selects the optimal rendition
- Player downloads chunks progressively
