System Scale: Designing for Growth & High Load

Master the dimensions of scalability. Learn when to scale vertically vs. horizontally, and how to handle massive user bases and data volumes.

Concept Overview

Scalability is the ability of a system to cope with increased load by adding resources. It is not just a single number; it is a multi-dimensional challenge. When an interviewer asks to "Design YouTube," the first question must always be: "At what scale?"

A system designed for 1,000 users (MVP) looks fundamentally different from a system designed for 100 million users (Enterprise).

The Four Dimensions of Scale

  1. User Scale: Number of Daily Active Users (DAU). (e.g., 10k vs 1B users).
  2. Request Scale: Throughput in Requests Per Second (RPS). (e.g., 100 RPS vs 1M RPS).
  3. Data Scale: Volume of storage required. (e.g., 100 GB vs 10 PB).
  4. Growth Rate: How fast is the traffic increasing? (Linear growth vs. Viral spike).

Scale Multipliers

Inefficiencies multiply at scale. A wasted 10 ms database query is negligible at 1 RPS. At 100,000 RPS, that same inefficiency continuously occupies 1,000 CPU cores (100,000 × 0.01 s of wasted CPU time every second).
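
The multiplier above is simple arithmetic: wasted CPU-seconds generated per wall-clock second equals the number of cores kept permanently busy. A quick back-of-envelope helper (illustrative, not from any real capacity planner):

```python
def cores_burned(rps: int, wasted_seconds: float) -> float:
    """CPU-seconds of waste generated per wall-clock second = cores kept busy."""
    return rps * wasted_seconds

print(cores_burned(1, 0.010))        # 0.01 of a core -- negligible
print(cores_burned(100_000, 0.010))  # 1000.0 cores -- an entire fleet
```

Running numbers like this early in an interview shows you understand why micro-optimizations only matter at macro scale.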


Vertical vs. Horizontal Scaling

There are two fundamental ways to scale any system.

Vertical Scaling (Scaling Up)

Adding more power (CPU, RAM, Disk) to an existing single server.

  • Analogy: Upgrading from a Toyota Corolla to a Ferrari.
  • Pros: Simple. No code changes required.
  • Cons: Expensive (diminishing returns per dollar). Has a hard hardware ceiling: eventually you hit the largest machine money can buy. Single point of failure.

Horizontal Scaling (Scaling Out)

Adding more servers to a pool of resources.

  • Analogy: Replacing a Ferrari with 100 Toyota Corollas to transport more people.
  • Pros: Limitless theoretical scale. Uses commodity hardware (cheaper). Built-in redundancy.
  • Cons: Complex. Requires load balancing, data partitioning (sharding), and distributed coordination.
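
The first piece of that added complexity is load balancing: something must decide which of the many servers handles each request. A minimal round-robin sketch (hostnames are hypothetical):

```python
from itertools import cycle

class RoundRobinBalancer:
    """Rotate incoming requests evenly across a pool of identical servers."""
    def __init__(self, servers):
        self._pool = cycle(servers)

    def next_server(self):
        return next(self._pool)

lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
print([lb.next_server() for _ in range(5)])
# ['app-1', 'app-2', 'app-3', 'app-1', 'app-2']
```

Real load balancers (NGINX, AWS ELB) add health checks and weighting, but the core idea is this rotation.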

Comparison Matrix

| Feature | Vertical Scaling | Horizontal Scaling |
| --- | --- | --- |
| Complexity | Low (plug & play) | High (distributed systems) |
| Cost | Exponentially high | Linear / cost-effective |
| Limit | Hardware ceiling | Virtually unlimited |
| Failure impact | High (single machine down) | Low (one node of many) |

Architecting for Scale

As you move from startup scale to hyperscale, your architecture must evolve.

Evolution of a System

1. Database Strategy (The Bottleneck)

The application layer is stateless and easy to scale (just add more servers). The database (stateful) is the hardest part to scale.

  • Read Scaling: Use Read Replicas. One Primary accepts writes, multiple Replicas serve reads.
  • Write Scaling: Use Sharding (Partitioning). Split data across multiple database nodes based on a key (e.g., user_id). Or use NoSQL (DynamoDB/Cassandra) which shards automatically.
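
Sharding by key can be sketched in a few lines. This toy router maps each `user_id` deterministically onto one of N shards; it uses a stable hash (MD5) rather than Python's built-in `hash()`, which is salted per process. Shard count and key names are illustrative:

```python
import hashlib

NUM_SHARDS = 4

def shard_for(user_id: str) -> int:
    """Deterministically map a user to one of NUM_SHARDS database nodes."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# The same user always lands on the same shard:
assert shard_for("user-42") == shard_for("user-42")
print({uid: shard_for(uid) for uid in ["user-1", "user-2", "user-3"]})
```

Note the weakness of plain modulo sharding: changing `NUM_SHARDS` remaps almost every key, which is why production systems prefer consistent hashing or a lookup directory.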

2. Caching Strategy

At scale, "the fastest query is the one you don't make."

  • Cache Hit Ratio: Aim for >95%. If 100M users hit the DB directly, it will crash.
  • Layers: Browser Cache -> CDN (Edge) -> API Gateway Cache -> Application Cache (Redis).
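
The application-cache layer usually follows the cache-aside pattern: check the cache first, fall back to the database on a miss, then populate the cache. A sketch in which a dict stands in for Redis and `db_lookup` is a hypothetical slow query:

```python
cache = {}
hits = misses = 0

def db_lookup(key):
    return f"row-for-{key}"  # stand-in for a slow database query

def get(key):
    global hits, misses
    if key in cache:          # fast path: served from memory
        hits += 1
        return cache[key]
    misses += 1
    value = db_lookup(key)    # slow path: only on a cache miss
    cache[key] = value        # populate so the next read is a hit
    return value

get("user:1"); get("user:1"); get("user:1")
print(f"hit ratio: {hits / (hits + misses):.0%}")  # → hit ratio: 67%
```

A production version adds TTLs and an eviction policy (e.g., LRU), but the read path is the same.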

3. Asynchronous Processing

Synchronous operations block resources. At scale, move heavy lifting to the background.

  • Pattern: Instead of processing a video upload immediately (Client waits for 5 mins), upload to S3, push a message to a Queue (Kafka/SQS), and let a Worker pool process it later.
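
The pattern above can be sketched with Python's standard-library queue and a background worker thread standing in for Kafka/SQS plus a worker pool (the video IDs and the sleep are illustrative):

```python
import queue
import threading
import time

jobs = queue.Queue()
results = []

def worker():
    while True:
        video_id = jobs.get()
        if video_id is None:          # sentinel: shut the worker down
            break
        time.sleep(0.01)              # stand-in for minutes of transcoding
        results.append(f"processed {video_id}")
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()

# The "API handler" enqueues and returns to the client immediately:
for vid in ["vid-1", "vid-2"]:
    jobs.put(vid)

jobs.join()       # wait for the backlog to drain (demo only)
jobs.put(None)
t.join()
print(results)
```

The key property: the enqueue (`jobs.put`) is near-instant, so the client never waits on the slow processing step.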

Real-World Scaling Scenarios

Scenario A: Viral Social App (High Read/Write)

  • Scale: 10M DAU, generic text/image posts.
  • Constraint: Rapid growth, unpredictable spikes.
  • Design:
    • Datastore: NoSQL (Cassandra/DynamoDB) for infinite horizontal write scaling.
    • Compute: Serverless (AWS Lambda) or Kubernetes Autoscaling to handle viral spikes instantly.

Scenario B: Payment Processing (High Consistency)

  • Scale: 1M Transactions/sec.
  • Constraint: Zero data loss, Strong Consistency.
  • Design:
    • Datastore: Sharded SQL (MySQL/PostgreSQL) or NewSQL (Spanner). We cannot trade consistency for scale here. Route each account deterministically to a specific shard (e.g., via consistent hashing or a shard-mapping directory) so all of its transactions stay on one node.
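
Consistent hashing deserves a concrete sketch: each shard owns arcs of a hash ring (via virtual nodes), so adding or removing a shard remaps only a fraction of keys instead of nearly all of them. A minimal illustrative implementation (shard names are hypothetical):

```python
import bisect
import hashlib

def _h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, shards, vnodes=100):
        # Each shard gets `vnodes` points on the ring for even distribution.
        self._ring = sorted((_h(f"{s}#{i}"), s)
                            for s in shards for i in range(vnodes))
        self._points = [p for p, _ in self._ring]

    def shard_for(self, key: str) -> str:
        # Walk clockwise to the first point at or after the key's hash.
        i = bisect.bisect(self._points, _h(key)) % len(self._ring)
        return self._ring[i][1]

ring = HashRing(["pay-db-1", "pay-db-2", "pay-db-3"])
assert ring.shard_for("txn-1001") == ring.shard_for("txn-1001")  # deterministic
```

This is the routing scheme behind stores like Cassandra and DynamoDB; here it routes transactions to SQL shards.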

You have a system handling 50k requests per second. The CPU usage on your database primary is at 100%, causing timeouts. The application servers are at 10% CPU. What is the most effective immediate step to scale?


Common Scaling Mistakes

  1. Premature Optimization: Building complex sharding for a startup with 100 users. Start Monolithic, then refactor.
  2. Ignoring Data Archival: Keeping 10 years of logs in your "Hot" database. Move old data to "Cold" storage (S3/Glacier) to keep indexes small and fast.
  3. The "All Scale is Equal" Fallacy: 100M IoT devices sending 1-byte heartbeats can be easier to handle than 1M users uploading 4K video. Data volume matters as much as request count.
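
The volume-vs-count point falls out of quick arithmetic (the 4 GB-per-upload figure is an assumed size for a 4K video, purely for illustration):

```python
# 100M devices x 1 byte each, vs 1M users x a hypothetical 4 GiB 4K upload:
heartbeats = 100_000_000 * 1
video = 1_000_000 * 4 * 1024**3

print(f"heartbeats: {heartbeats / 1024**2:.0f} MiB")  # → heartbeats: 95 MiB
print(f"video:      {video / 1024**4:.0f} TiB")       # → video:      3906 TiB
```

A hundred times more "users" can still mean tens of thousands of times less data.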

A photo-sharing app is growing fast. Users complain that uploading photos is becoming slower and slower. You check the logs and see the 'Image Resize' function is taking 5+ seconds and blocking the main server thread. What is the best architectural fix?