
System Design Interview Cheat Sheet: The Framework Every Senior Engineer Uses (2026)

The 6-step system design framework with time allocation, capacity estimation formulas, top 10 must-know designs, and component tradeoffs that impress senior interviewers in 2026.

CareerLift Team · June 16, 2026 · 17 min read

System design interviews are where strong coders often fall apart. You can solve hard LeetCode problems cold, but put you in front of "design Twitter" with 45 minutes and no test cases, and the open-endedness is paralyzing. There's no compiler to tell you if you're right. There's no single correct answer. The interviewer is watching how you think, not waiting to grade your final diagram.

This guide gives you the exact framework that senior engineers use to structure a 45-minute system design interview — with time allocations, capacity estimation formulas you can apply immediately, the 10 designs every candidate must know, and a tradeoffs reference you can internalize before your next interview.


Why System Design Trips Up Strong Coders

Coding interviews have a clear contract: here is a problem, write a function, here are the test cases. System design interviews have almost none of that structure.

Strong coders fail system design interviews for three predictable reasons:

They dive into components before establishing scope. A candidate who starts drawing boxes labeled "load balancer → app server → database" without first asking "how many users are we designing for?" signals to the interviewer that they don't understand that scale drives architecture.

They optimize for cleverness over clarity. System design rewards structured communication more than encyclopedic knowledge. An interviewer would rather watch you reason through a cache invalidation tradeoff than hear you name-drop Kafka without explaining why you'd use it.

They run out of structure when the depth increases. Without a framework, candidates cover the easy surface-level stuff (load balancer, CDN, database) and then stall when the interviewer asks to go deeper. A framework gives you something to fall back on when you don't immediately know the answer.

The fix is a repeatable 6-step process. Once it's automatic, you stop spending cognitive load on "what should I do next" and spend it on "what's the right tradeoff here."


The 6-Step Framework (45-Minute Version)

Step 1: Clarify Requirements (5 minutes)

Never start designing until you've established what you're building and at what scale.

Functional requirements are what the system does. "Users can post tweets" is a functional requirement. "Users can follow other users" is a functional requirement.

Non-functional requirements are how the system performs. Latency targets, availability (99.9% vs. 99.99%), consistency guarantees, read/write ratio.

Scale assumptions are the numbers that will drive every design decision. Always state them explicitly:

  • How many daily active users (DAU)?
  • How many requests per second (QPS) at peak?
  • What's the read/write ratio?
  • How much data are we storing? For how long?

A strong opening sounds like: "I'll assume 100M DAU, with each user performing roughly 10 actions per day. That's 1B requests per day, or about 11,600 requests per second at average load and maybe 3x that at peak, around 35,000 QPS. Reads will dominate, so I'll assume a 10:1 read/write ratio. Does that sound right for what you have in mind?"

Getting the interviewer to confirm or correct your assumptions is part of the exercise.

Step 2: Capacity Estimation (5 minutes)

Back-of-envelope math shows the interviewer you can reason about scale. You don't need precise numbers — you need order-of-magnitude estimates that inform your design choices.

QPS estimation:

QPS = (DAU × requests_per_user_per_day) / 86,400 seconds

Examples:

  • 1M DAU, 1 request/day → ~12 QPS
  • 10M DAU, 10 requests/day → ~1,160 QPS
  • 500M DAU, 20 requests/day → ~115,740 QPS (116K QPS)
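
As a rough sketch, the QPS formula above can be turned into a small helper for checking your mental math while practicing. The function names and the default 3x peak multiplier are illustrative assumptions, not a standard.

```python
SECONDS_PER_DAY = 86_400


def average_qps(dau: int, requests_per_user_per_day: float) -> float:
    """Average queries per second: (DAU x requests per user per day) / 86,400."""
    return dau * requests_per_user_per_day / SECONDS_PER_DAY


def peak_qps(avg_qps: float, peak_factor: float = 3.0) -> float:
    """Peak load as a multiple of average load.

    The 3x default is an assumption; real traffic curves vary by product.
    """
    return avg_qps * peak_factor
```

For example, `average_qps(10_000_000, 10)` gives roughly 1,157, which rounds to the ~1,160 QPS quoted above.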

Storage estimation:

Storage_per_day = requests_per_day × avg_payload_size
Total_storage = Storage_per_day × retention_days

Example for a tweet system:

  • 10M tweets/day × 300 bytes/tweet = 3 GB/day
  • 3 GB/day × 365 days × 5 years = ~5.5 TB over 5 years

Bandwidth estimation:

Bandwidth = QPS × avg_response_size

Example:

  • 1,000 QPS × 1 KB/response = 1 MB/s (trivial)
  • 100,000 QPS × 50 KB/response (images) = 5 GB/s (requires CDN)
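
The storage and bandwidth formulas above follow the same pattern. The sketch below assumes decimal units (1 KB = 1,000 bytes); the helper names are hypothetical.

```python
def total_storage_bytes(items_per_day: int, avg_item_bytes: int, retention_days: int) -> int:
    """Total storage = items per day x average payload size x retention in days."""
    return items_per_day * avg_item_bytes * retention_days


def bandwidth_bytes_per_sec(qps: float, avg_response_bytes: int) -> float:
    """Bandwidth = QPS x average response size."""
    return qps * avg_response_bytes


def human(n_bytes: float) -> str:
    """Format a byte count using decimal units (assumption: 1 KB = 1,000 bytes)."""
    for unit in ("B", "KB", "MB", "GB", "TB", "PB"):
        if n_bytes < 1000 or unit == "PB":
            return f"{n_bytes:.1f} {unit}"
        n_bytes /= 1000
```

With the tweet example, `human(total_storage_bytes(10_000_000, 300, 365 * 5))` formats as about 5.5 TB, and `human(bandwidth_bytes_per_sec(100_000, 50_000))` as 5.0 GB per second.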

Read/write ratio implications:

A 10:1 read/write ratio tells you to optimize for reads — caching, read replicas, CDN for static content. A near 1:1 ratio (like a logging system) tells you write throughput is the bottleneck.

State these numbers explicitly and reference them when you make design choices. "Given our 100K QPS read load, I'll add a caching layer here" is more compelling than "I'll add a cache because caches are good."

Step 3: High-Level Design (10 minutes)

Draw the major components and the data flow between them. At this stage, don't go deep on any single component — cover the full system.

A typical high-level design includes:

  • Client (mobile, browser)
  • Load balancer / API gateway
  • Application servers (stateless)
  • Primary database
  • Cache layer
  • Object storage (for media)
  • CDN (for static assets and media delivery)
  • Message queue (if async processing is needed)

Define your core APIs. For a news feed system:

GET /v1/feed?user_id={id}&cursor={cursor}&limit=20
POST /v1/posts { user_id, content, media_ids[] }
POST /v1/follow { follower_id, followee_id }

The interviewer is checking that you understand the full picture before zooming in. A common mistake is spending the entire deep-dive time on the database schema without covering the read path at all.

Step 4: Deep Dive (15 minutes)

This is where you earn your level: the deep dive is what separates senior engineers from mid-level engineers. Pick the two or three most interesting technical challenges in your design and go deep on each one.

Typical deep-dive areas:

  • Database schema: Which tables? Which indexes? What are the query patterns?
  • Caching strategy: What do you cache? What's the eviction policy? How do you handle cache invalidation?
  • Feed generation: Push vs. pull vs. hybrid? How do you handle celebrities with 50M followers?
  • Search: Inverted index? Elasticsearch? How does it stay in sync with the primary DB?
  • Consistency model: Strong vs. eventual? What does a user see if they post and immediately refresh?

Don't wait for the interviewer to drag you here. After your high-level design, say: "The most interesting technical problem here is the feed fan-out for high-follower accounts — let me go deep on that." Driving the deep dive shows senior-level initiative.

Step 5: Scale It (7 minutes)

Now take your design and stress-test it. Walk through the bottlenecks and explain how you'd address each one.

Common scaling techniques:

  • Sharding: Partition the database by user_id (horizontal partitioning). Discuss the shard key choice and hotspot risks.
  • Read replicas: Add read replicas for read-heavy workloads. Discuss replication lag implications.
  • CDN: Move static and media content to a CDN. Discuss cache-control headers and invalidation.
  • Load balancing: Layer-7 load balancing (application-aware routing) vs. Layer-4 (TCP/IP level).
  • Async processing: Use a message queue (Kafka, SQS) to decouple write-heavy operations from the critical path.
  • Horizontal scaling: Stateless application servers can scale horizontally. Databases and caches require more care.

Flag the tradeoffs you're accepting. "Sharding by user_id means cross-shard queries, like finding all posts across a set of users, are expensive. I'd denormalize the feed so the read path never needs a cross-shard query."
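
To make the shard-key tradeoff concrete, here is a minimal sketch of hash-based routing by user_id. The shard count and function names are assumptions for illustration; a production system would also need a resharding strategy.

```python
import hashlib

NUM_SHARDS = 8  # hypothetical shard count


def shard_for_user(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a user_id to a shard using a stable hash.

    Python's built-in hash() is randomized per process for strings,
    so a cryptographic digest is used to keep routing deterministic.
    """
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shards_for_users(user_ids) -> set:
    """Shards that must be queried to read data for a set of users."""
    return {shard_for_user(u) for u in user_ids}
```

A single user's data lives on one shard, so single-user reads and writes stay cheap. A query over many users fans out to most or all shards, which is the cost that denormalizing the feed avoids.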

Step 6: Review Tradeoffs (3 minutes)

End by briefly acknowledging what you'd do differently with more time or resources. This signals intellectual honesty and senior-level thinking.

Good closing statements:

  • "If I had more time, I'd think harder about the consistency model for the cache — right now we can show stale data for up to 60 seconds after a post is deleted."
  • "The shard key I chose optimizes for write throughput, but it means reads for a user's full history are slower. With more time I'd explore a dual-write approach."
  • "I haven't addressed multi-region deployment. For global scale, we'd need to think about data residency requirements and the latency vs. consistency tradeoff across regions."

Capacity Estimation Reference

Memorize these numbers. They come up in nearly every system design interview.

| Quantity | Value |
|---|---|
| 1 KB | 1,000 bytes (or 1,024 for storage) |
| 1 MB | 1,000 KB |
| 1 GB | 1,000 MB |
| 1 TB | 1,000 GB |
| Seconds in a day | 86,400 |
| Seconds in a month (30 days) | 2,592,000 (~2.6M) |
| 1M DAU, 1 req/day | ~12 QPS |
| 10M DAU, 10 req/day | ~1,160 QPS |
| 100M DAU, 10 req/day | ~11,600 QPS |
| 1B DAU, 10 req/day | ~116,000 QPS |
| 1 tweet (text) | ~300 bytes |
| 1 photo (compressed) | ~300 KB |
| 1 minute of video (720p) | ~50 MB |
| SSD read latency | ~0.1 ms |
| Network roundtrip (same DC) | ~0.5 ms |
| Network roundtrip (cross-region) | ~150 ms |


Top 10 System Designs Every Candidate Must Know

1. URL Shortener (e.g., bit.ly)

Core challenge: Generating unique short codes at high write throughput and serving redirects at very high read QPS.

Key components:

  • Hash function or base-62 encoding to generate 6–8 character codes
  • Database with index on short code (primary key)
  • Cache (Redis) for hot URLs — a small percentage of URLs get most of the traffic
  • 301 (permanent) vs. 302 (temporary) redirect — 301 is cached by browsers, 302 allows tracking
  • Analytics pipeline (async, via message queue) for click tracking
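
One way to generate short codes is to base-62 encode a unique integer ID (for example, a database sequence value). The sketch below assumes that approach; the alphabet ordering is an arbitrary choice.

```python
import string

# 0-9, a-z, A-Z: 62 URL-safe characters (ordering is an arbitrary choice)
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def encode_base62(n: int) -> str:
    """Encode a non-negative integer ID as a base-62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def decode_base62(code: str) -> int:
    """Decode a base-62 string back to the integer ID."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

Six characters cover 62^6 (about 56.8 billion) IDs, which is why 6 to 8 character codes are usually enough. Sequential IDs produce guessable codes, so a real design might shuffle or salt the ID first.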

2. Twitter/X News Feed

Core challenge: Generating a personalized, ranked feed for 100M+ users where some users follow accounts with 50M+ followers.

Key components:

  • Fan-out on write (precompute feeds) for regular users
  • Fan-out on read (pull at request time) for celebrity accounts (>1M followers)
  • Hybrid approach: push to followers under a threshold, pull + merge for high-follower accounts
  • Redis sorted sets for feed storage (score = timestamp)
  • Ranking model runs on the precomputed candidate set
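
The hybrid fan-out approach can be sketched in memory as follows. This is a simplified illustration: the class and attribute names are hypothetical, the threshold is tiny for demonstration, and an author who crosses the threshold mid-stream is not handled specially.

```python
import heapq
from collections import defaultdict

CELEBRITY_THRESHOLD = 3  # demo value; a real system might use ~1M followers


class FeedService:
    def __init__(self):
        self.followers = defaultdict(set)  # author -> set of followers
        self.following = defaultdict(set)  # user -> set of authors
        self.inbox = defaultdict(list)     # user -> [(ts, post_id)] pushed at write time
        self.outbox = defaultdict(list)    # celebrity -> [(ts, post_id)] pulled at read time

    def follow(self, follower, author):
        self.followers[author].add(follower)
        self.following[follower].add(author)

    def is_celebrity(self, author) -> bool:
        return len(self.followers[author]) >= CELEBRITY_THRESHOLD

    def post(self, author, post_id, ts):
        if self.is_celebrity(author):
            # Fan-out on read: store once, followers pull later.
            self.outbox[author].append((ts, post_id))
        else:
            # Fan-out on write: push into every follower's precomputed feed.
            for follower in self.followers[author]:
                self.inbox[follower].append((ts, post_id))

    def read_feed(self, user, limit=20):
        candidates = list(self.inbox[user])
        for author in self.following[user]:
            if self.is_celebrity(author):
                candidates.extend(self.outbox[author])
        # Newest first, merged across pushed and pulled posts.
        return [post_id for ts, post_id in heapq.nlargest(limit, candidates)]
```

The write cost for a celebrity post is constant instead of proportional to follower count; the price is extra merge work on every read for users who follow celebrities.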

3. Instagram (Photo Uploads + Feed)

Core challenge: Handling large binary uploads, generating thumbnails, delivering media globally, and building a feed similar to Twitter's.

Key components:

  • Upload flow: client uploads to object storage (S3) directly via presigned URL, then notifies app server
  • Media processing pipeline: async transcoding/thumbnail generation via message queue + worker pool
  • CDN for photo delivery — photos are immutable, so caching is straightforward
  • Feed generation: same fan-out model as Twitter, with photo metadata stored in DB

4. WhatsApp / Messaging System

Core challenge: Reliable message delivery that looks exactly-once to the user (in practice, at-least-once delivery plus deduplication on the receiver), real-time presence, and support for large group chats.

Key components:

  • WebSocket connections for real-time delivery; fall back to long polling
  • Message store: Cassandra or HBase (write-heavy, time-series access pattern)
  • Message delivery states: sent → delivered → read receipts
  • Presence service: heartbeat-based, stored in Redis with TTL
  • Group messages: fan-out to all group members at send time vs. pull at read time
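
A common way to get reliable delivery without showing duplicates is to let the sender retry and have the receiver deduplicate on a message ID. The sketch below assumes client-generated message IDs and models the delivery states listed above; the class names are hypothetical.

```python
from enum import IntEnum


class Status(IntEnum):
    SENT = 1
    DELIVERED = 2
    READ = 3


class Inbox:
    """Receiver-side store that ignores redelivered messages."""

    def __init__(self):
        self.messages = {}  # message_id -> (body, Status)

    def receive(self, message_id: str, body: str) -> bool:
        """Return True for a new message, False for a duplicate delivery."""
        if message_id in self.messages:
            return False
        self.messages[message_id] = (body, Status.DELIVERED)
        return True

    def mark_read(self, message_id: str) -> None:
        body, status = self.messages[message_id]
        # States only move forward: a late "delivered" ack never undoes "read".
        self.messages[message_id] = (body, max(status, Status.READ))
```

Because `receive` is idempotent, the sender can safely retry until it gets an acknowledgment.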

5. YouTube / Video Streaming

Core challenge: Large file uploads, transcoding into multiple resolutions, adaptive bitrate streaming, and global delivery.

Key components:

  • Upload: chunked upload to object storage, with resumable uploads for large files
  • Transcoding: async worker pool converts to multiple resolutions (360p, 720p, 1080p, 4K) and formats (H.264, H.265, AV1)
  • Adaptive bitrate streaming: HLS or DASH — client requests the right chunk size based on bandwidth
  • CDN: pre-warm popular content; serve long-tail from origin
  • Metadata service: separate from the video blob, stores title/description/views/comments

6. Uber / Ride-Sharing

Core challenge: Real-time location tracking for millions of drivers, matching riders to nearby drivers with low latency, and dynamic pricing.

Key components:

  • Driver location: drivers send GPS updates every 4 seconds; stored in Redis geospatial index (GEOADD)
  • Matching: query nearby drivers (GEORADIUS, or GEOSEARCH on Redis 6.2+, where GEORADIUS is deprecated), filter by availability, dispatch best match
  • Ride state machine: request → accepted → en route → arrived → in progress → completed
  • Surge pricing: computed from supply/demand ratio per geohash cell
  • Separate read path (location queries) from write path (location updates) for scale
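
The matching step can be illustrated without Redis by a brute-force radius search over driver positions. This is only a sketch of what a geospatial index answers efficiently; the data layout and function names are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_available(drivers: dict, lat: float, lon: float, radius_km: float) -> list:
    """Return IDs of available drivers within radius_km, nearest first.

    drivers maps driver_id -> (lat, lon, available). This scan is O(n);
    a geospatial index avoids checking every driver.
    """
    hits = []
    for driver_id, (dlat, dlon, available) in drivers.items():
        if not available:
            continue
        distance = haversine_km(lat, lon, dlat, dlon)
        if distance <= radius_km:
            hits.append((distance, driver_id))
    return [driver_id for distance, driver_id in sorted(hits)]
```

In an interview, the point to make is that the index replaces the linear scan, while the availability filter and ranking stay in application code.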

7. Rate Limiter

Core challenge: Enforcing per-user or per-IP request limits across a distributed fleet of servers.

Key components:

  • Algorithms: Token bucket (allows short bursts up to the bucket size while enforcing an average rate), sliding window log (exact but memory-intensive), sliding window counter (approximate but efficient)
  • Redis for distributed state: INCR + EXPIRE for fixed window; sorted sets for sliding window log
  • Placement: API gateway is most common; can also be per-service middleware
  • Failure mode: if Redis is unavailable, fail open (allow requests) or fail closed (reject requests) — document the choice
  • Clock skew: a problem with distributed sliding windows; use Redis's server time (TIME command)
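
A single-process token bucket is short enough to sketch on a whiteboard. The version below is a minimal illustration with an injectable clock for testing; a distributed limiter would keep the same state in a shared store such as Redis.

```python
import time


class TokenBucket:
    def __init__(self, capacity: float, refill_per_sec: float, clock=time.monotonic):
        self.capacity = capacity              # maximum burst size
        self.refill_per_sec = refill_per_sec  # sustained average rate
        self.clock = clock
        self.tokens = capacity                # start full
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Refill based on elapsed time, then spend tokens if enough remain."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

`TokenBucket(capacity=10, refill_per_sec=5)` would allow a burst of 10 requests, then a steady 5 per second.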

8. Distributed Cache (Redis-like)

Core challenge: Building a cache that is fast, consistent across nodes, and handles eviction and failure gracefully.

Key components:

  • Consistent hashing for distributing keys across nodes (minimizes rehashing when nodes are added/removed)
  • Eviction policies: LRU, LFU, TTL-based
  • Replication: primary/replica for availability; replica serves reads
  • Data structures: string, hash, list, set, sorted set — each with specific use cases
  • Persistence: RDB snapshots vs. AOF (append-only file) — durability vs. performance tradeoff
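
Consistent hashing is worth being able to sketch. The ring below uses virtual nodes to even out the key distribution; the class name, hash choice, and default of 100 virtual nodes are assumptions for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable 64-bit hash of a string."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) points
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each node owns many points on the ring to smooth the distribution.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get(self, key: str) -> str:
        """Return the node owning the first ring point at or after the key's hash."""
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]
```

When a node is added, only the keys that now fall on the new node's points move; every other key keeps its owner, which is the property that makes scaling the cache cheap.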

9. Search Autocomplete

Core challenge: Returning the top-k completions for a prefix in under 100ms, updated with real-world query frequency.

Key components:

  • Trie data structure for prefix matching (in-memory for hot prefixes)
  • Offline pipeline: aggregate query logs → compute top-k per prefix → push to cache
  • Storage: Redis sorted sets (score = frequency) for top-k queries per prefix
  • Two-tier caching: in-process cache for the most common prefixes, Redis for the long tail
  • Personalization layer (optional): blend global rankings with user-specific query history
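
The trie plus offline top-k idea can be sketched as a structure that precomputes the best completions at every prefix node, so a lookup is just a walk down the trie. The class names are hypothetical, and the build step here stands in for the offline pipeline.

```python
import heapq


class TrieNode:
    __slots__ = ("children", "top")

    def __init__(self):
        self.children = {}
        self.top = []  # [(frequency, query)] for this prefix


class Autocomplete:
    def __init__(self, k: int = 3):
        self.k = k
        self.root = TrieNode()

    def build(self, query_counts: dict) -> None:
        """Offline step: record every query under each of its prefixes, then keep top-k."""
        for query, freq in query_counts.items():
            node = self.root
            for ch in query:
                node = node.children.setdefault(ch, TrieNode())
                node.top.append((freq, query))
        self._trim(self.root)

    def _trim(self, node: TrieNode) -> None:
        node.top = heapq.nlargest(self.k, node.top)
        for child in node.children.values():
            self._trim(child)

    def suggest(self, prefix: str) -> list:
        """Online step: O(len(prefix)) walk, then return the precomputed list."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        return [query for freq, query in node.top]
```

Precomputing at build time trades memory and freshness for read latency, which fits the sub-100ms target.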

10. Notification System

Core challenge: Delivering push, in-app, email, and SMS notifications reliably at high throughput with low latency.

Key components:

  • Notification types: push (APNs, FCM), email (SendGrid, SES), SMS (Twilio), in-app
  • Message queue: each notification type has its own consumer group to allow independent scaling
  • User preferences service: stores per-user opt-in/opt-out for each notification type
  • Deduplication: prevent duplicate notifications (idempotency key per notification)
  • Retry and dead-letter queue: failed deliveries are retried with exponential backoff; undeliverable after N attempts go to a dead-letter queue for investigation
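
The retry and dead-letter behavior can be sketched as below. The `send` callable, the delay values, and the list standing in for a dead-letter queue are all assumptions; a production version would typically add jitter and catch only retryable errors.

```python
import time


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay after failed attempt number `attempt` (1-based): base * 2^(attempt-1), capped."""
    return min(cap, base * 2 ** (attempt - 1))


def deliver(send, notification, dead_letters: list, max_attempts: int = 4, sleep=time.sleep) -> bool:
    """Try to send; retry with exponential backoff; dead-letter after max_attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            send(notification)
            return True
        except Exception:  # sketch only: real code should catch specific errors
            if attempt < max_attempts:
                sleep(backoff_delay(attempt))
    dead_letters.append(notification)
    return False
```

Passing `sleep` as a parameter keeps the function testable without real waiting.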

Common Components: Tradeoffs Reference

SQL vs. NoSQL

| Dimension | SQL (PostgreSQL, MySQL) | NoSQL (Cassandra, DynamoDB, MongoDB) |
|---|---|---|
| Schema | Fixed, enforced | Flexible, schemaless |
| Consistency | Strong (ACID) | Eventual (BASE) by default |
| Query flexibility | High (JOINs, aggregations) | Low (design around access patterns) |
| Horizontal scaling | Difficult (sharding is complex) | Built-in |
| Best for | User accounts, transactions, relationships | Time-series, high-write throughput, wide-column data |

Rule of thumb: Start with SQL. Switch to NoSQL when you have a specific access pattern that SQL can't serve at your scale.

Cache Patterns

| Pattern | How it works | Best for | Downside |
|---|---|---|---|
| Cache-aside (lazy loading) | App checks cache first; on miss, loads from DB and populates cache | Read-heavy, tolerant of stale data | Cache miss penalty; potential for thundering herd |
| Write-through | Write to cache and DB synchronously on every write | Read-heavy, cannot tolerate stale data | Write latency increases; cache may fill with rarely-read data |
| Write-back (write-behind) | Write to cache only; async flush to DB | Write-heavy, can tolerate some data loss | Risk of data loss if cache node fails before flush |
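
The difference between cache-aside and write-through is easiest to see in code. The sketch below uses plain dictionaries as stand-ins for the cache and the database; the class names and the invalidate-on-write choice for cache-aside are assumptions.

```python
class CacheAsideStore:
    """Reads populate the cache lazily; writes go to the DB and invalidate the cache."""

    def __init__(self, db: dict):
        self.db = db
        self.cache = {}
        self.db_reads = 0  # counter to make cache hits and misses visible

    def get(self, key):
        if key in self.cache:
            return self.cache[key]       # cache hit
        self.db_reads += 1               # cache miss: fall back to the DB
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key, value):
        self.db[key] = value
        self.cache.pop(key, None)        # next read reloads the fresh value


class WriteThroughStore(CacheAsideStore):
    """Writes update the DB and the cache together, so reads after writes are hits."""

    def put(self, key, value):
        self.db[key] = value
        self.cache[key] = value
```

Cache-aside pays a miss after every write; write-through pays on the write path instead and may cache values nobody reads.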

Horizontal vs. Vertical Scaling

| Approach | How | When to use | Limit |
|---|---|---|---|
| Vertical scaling | Bigger machine (more CPU, RAM) | When your bottleneck is a single process | Physical hardware limits; single point of failure |
| Horizontal scaling | More machines, stateless design | Web/app tier, stateless services | Requires stateless design; adds operational complexity |
| Database sharding | Partition data across multiple DB instances | When a single DB can't handle write throughput | Cross-shard queries become expensive |

Synchronous vs. Asynchronous Communication

| Approach | How | When to use | Tradeoff |
|---|---|---|---|
| Synchronous (HTTP/gRPC) | Caller waits for response | When the caller needs the result immediately | Tight coupling; cascading failures |
| Async (message queue) | Producer publishes; consumer processes later | Background jobs, fan-out, non-critical paths | Eventual consistency; harder to debug; added infrastructure |


What Distinguishes a Senior-Level Answer

Mid-level and senior-level candidates often know the same components. What separates them is how they reason.

Mid-level answer: "I'd use a cache here to reduce database load."

Senior-level answer: "I'd use a write-through cache here because we can't tolerate stale reads — a user who updates their profile photo expects to see it immediately. The tradeoff is that every write now takes two round trips. Given our write volume is relatively low — maybe 10,000 profile updates per day — that's an acceptable cost. If our write volume were orders of magnitude higher, I'd reconsider."

The difference is specificity about the tradeoff, justification based on the actual numbers you established, and acknowledgment of what you're giving up.

Senior engineers also proactively address failure modes. Mid-level: "I'll add a load balancer." Senior: "I'll add a load balancer, but I need to make sure our session data isn't stored on individual app servers — otherwise sticky sessions break failover. I'll move session state to Redis."


Code Block: REST API Design Example (URL Shortener)

A common ask in system design rounds is to sketch out your API. Here's what a clean API design looks like for a URL shortener:

# Create a short URL
POST /api/v1/urls
Content-Type: application/json

{
  "long_url": "https://www.example.com/very/long/path?query=value",
  "custom_alias": "my-link",   // optional
  "expires_at": "2027-01-01T00:00:00Z"  // optional
}

Response 201 Created:
{
  "short_url": "https://short.ly/abc123",
  "short_code": "abc123",
  "long_url": "https://www.example.com/very/long/path?query=value",
  "created_at": "2026-06-16T10:00:00Z",
  "expires_at": "2027-01-01T00:00:00Z"
}

# Redirect (handled at the edge / load balancer level)
GET /{short_code}
Response 302 Found
Location: https://www.example.com/very/long/path?query=value

# Get URL analytics (authenticated)
GET /api/v1/urls/{short_code}/stats
Response 200 OK:
{
  "short_code": "abc123",
  "total_clicks": 14823,
  "clicks_last_24h": 203,
  "top_referrers": ["google.com", "twitter.com"],
  "top_countries": ["US", "UK", "IN"]
}

# Delete a short URL (authenticated)
DELETE /api/v1/urls/{short_code}
Response 204 No Content

A few design decisions visible in this API:

  • The redirect is a 302 (temporary) rather than 301 (permanent) so we can track clicks rather than letting the browser cache the redirect.
  • Analytics are a separate endpoint, not bundled into the redirect response, because analytics writes are async and shouldn't add latency to the redirect.
  • Custom aliases are optional — the system generates one if not provided.
  • Versioning is in the URL (/v1/) for forward compatibility.

Putting It Together

The 6-step framework is only useful if it becomes automatic. Practice it by talking through designs out loud — not just drawing diagrams silently. The goal is to spend zero cognitive load on "what should I do next?" so all of it goes to "what's the right call here?"

Run through each of the top 10 designs at least once before your interview. You don't need to memorize the perfect architecture for each — you need to know the core challenge and the two or three key tradeoffs. That's enough to have a fluent, confident conversation when one of them comes up.

Use CareerLift to practice system design interviews with structured AI feedback on your framework usage, depth of tradeoffs, and communication clarity. The platform identifies which of the six steps you're spending too much or too little time on — so you can calibrate before the real interview.
