This is a sibling to our earlier piece on how to prepare for system-design interviews. That one is about prep philosophy — the meta-framework. This one is about the 30 minutes that actually happen in the interview room. If you've read the books, watched the videos, drilled the archetypes, and the interview still goes sideways, the gap is usually not what you know — it's the order you do things in.
System-design interviews are 45 to 60 minutes. You have somewhere between 30 and 45 minutes of actual design time once you subtract introductions and the interviewer's wrap-up questions. The candidates who do well in that window aren't the ones with the best memorised reference architectures. They're the ones with a clean mental shape for the conversation, who can pivot when the interviewer changes a constraint, and who can articulate trade-offs in their own words instead of reciting a chapter.
Here's the layered approach we coach. The five layers add up to roughly 30 to 45 minutes, which matches the design window you actually get.
Layer 1: Requirements (3-5 minutes)
The interviewer says four words: "design a URL shortener." There is now a long silence in which the interviewer waits for you to be the kind of senior engineer who doesn't immediately start drawing boxes.
The right move is to ask. Not because you don't know what a URL shortener is — you do — but because the design depends on what flavour of URL shortener they want. Public-facing like bit.ly? Internal like a corporate share-link tool? High-traffic with analytics? Low-traffic with vanity domains? Different answers, different correct designs.
Five clarifying questions, ranked by how much they change the design:
- What's the read-write ratio? (URL shorteners are dramatically read-heavy; analytics-style systems flip the ratio.)
- What's the rough scale? (10K DAU vs 100M DAU is two different systems.)
- What functional features matter beyond the obvious? (Custom URLs? Expiry? Click analytics? Single-use links?)
- What's the durability bar? (Can a few links be lost on a crash? Or is this storing audit-grade redirects?)
- Is there a latency budget I should be designing toward? (P95 under 50ms vs P95 under 500ms changes the cache strategy.)
Don't ask all five — pick the two or three that move your design the most for the question you got. The interviewer will mentally tick "asks good questions" and then answer. Their answers are the actual prompt.
Anti-pattern at this layer: asking too many questions, or asking questions whose answers don't change anything. "Should the URL be HTTP or HTTPS?" is not a design-changing question. "What's the read-write ratio?" is.
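To see why the scale question changes the design so much, a rough back-of-envelope estimate helps. The sketch below assumes 10 redirects per daily active user, a 100:1 read-write ratio, and a peak factor of 3. All three numbers are illustrative assumptions, not figures from any real system; swap in whatever the interviewer gives you.

```python
SECONDS_PER_DAY = 86_400

def estimate_qps(dau, reads_per_user_per_day=10, read_write_ratio=100, peak_factor=3):
    """Return (avg_read_qps, peak_read_qps, avg_write_qps).

    Every default here is an illustrative assumption.
    """
    reads_per_day = dau * reads_per_user_per_day
    avg_read_qps = reads_per_day / SECONDS_PER_DAY
    peak_read_qps = avg_read_qps * peak_factor
    avg_write_qps = avg_read_qps / read_write_ratio
    return avg_read_qps, peak_read_qps, avg_write_qps

if __name__ == "__main__":
    for dau in (10_000, 100_000_000):
        avg_r, peak_r, avg_w = estimate_qps(dau)
        print(f"{dau:>11,} DAU: ~{avg_r:,.0f} reads/s avg, "
              f"~{peak_r:,.0f} reads/s peak, ~{avg_w:,.2f} writes/s")
```

Under these assumptions, 10K DAU works out to about one read per second and 100M DAU to more than eleven thousand. The first fits on a single small box; the second forces a conversation about caching and replicas.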
Layer 2: High-level design (10-12 minutes)
Now you draw boxes. Four or five. No more.
The high-level design is the piece most candidates spend too long on. It should be the easy part — load balancer, app servers, database, cache, queue if needed. The interviewer is checking whether you can sketch the obvious shape of the system in plain words. They're not checking whether you can render Visio diagrams.
Two pieces of advice:
Reach for the boring tool first. Junior candidates pick Kafka. Senior candidates pick Postgres. The interview reflex should be: can a single Postgres do this? If yes, that's your starting point, and you add complexity in response to a constraint, not in advance of one. When the interviewer says "now imagine 1B writes per day," then you scale up. Not before.
Talk while you draw. The boxes are the artefact; the talking is the signal. As you draw the database, say "I'm putting Postgres here because reads are heavy and Postgres handles read replicas cleanly; I'd reconsider this if the workload were a high-volume, append-only event stream, which is what Kafka is built for." That sentence carries more signal than the box itself.
By the end of this layer, the interviewer should be able to read your sketch without you, follow the request flow from client to database, and see where you put the cache.
Layer 3: Drill-down on one or two components (10-15 minutes)
This is where the interview is graded. The interviewer will pick a component and start probing — usually the one with the most interesting trade-offs given your scale.
For a URL shortener at 100M DAU, the drill-down is usually the ID-generation strategy. Random or hashed IDs with collision checks, base-62 encoding of a counter, distributed ID generators in the style of Snowflake: each has a different latency, durability, and operational profile. The interviewer wants to hear you weigh them.
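The counter-plus-base-62 option is the easiest of these to sketch. The version below is a minimal illustration, assuming the counter value comes from somewhere else (a database sequence, for example); the alphabet ordering and function names are choices made for this example.

```python
import string

# 0-9, a-z, A-Z: 62 symbols. The ordering is an arbitrary choice for this sketch.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

def encode_base62(n: int) -> str:
    """Turn a non-negative counter value into a short code."""
    if n < 0:
        raise ValueError("counter values must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))

def decode_base62(code: str) -> int:
    """Inverse of encode_base62."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

Seven characters cover 62^7, about 3.5 trillion IDs. The trade-off worth naming out loud is that sequential counters produce guessable codes, which may or may not matter for the product the interviewer described.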
For a news feed, the drill-down is fan-out: pull-on-read versus push-on-write. Pull is simple but expensive at read time; push is fast at read time but requires storing redundant state proportional to the follower graph. Different scales, different right answers, and the interviewer wants to hear you reach for the right answer for the scale they specified.
For a rate limiter, the drill-down is the algorithm and the storage. Token bucket vs sliding-window log vs sliding-window counter. In-process vs Redis-backed. Each carries operational implications: token bucket deliberately allows bursts up to the bucket size, so short-term enforcement is loose; sliding-window log stores a timestamp per request, so memory grows with the limit and the traffic; sliding-window counter is a common production compromise with a known accuracy/cost trade.
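A token bucket is small enough to write on a whiteboard. Here is a minimal in-process sketch, assuming a single thread and an injectable clock so it can be tested; a Redis-backed version would need the refill and the decrement to happen atomically, which this example does not attempt.

```python
import time

class TokenBucket:
    """In-process token bucket. Not thread-safe; a sketch, not production code."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.clock = clock
        self.tokens = capacity    # start with a full bucket
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The two parameters are the whole trade-off: `rate` sets the sustained throughput and `capacity` sets how large a burst you are willing to let through.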
For autocomplete, the drill-down is the index structure and the freshness story. Trie in memory? Inverted index in Elasticsearch? Bloom filter for the prefix set? And how stale can the suggestions be — minutes, hours, days?
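The in-memory trie option can be sketched in a few lines. This is a minimal version that returns completions in alphabetical order; it makes no attempt at ranking or freshness, and the class and method names are just for this example.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def complete(self, prefix: str, limit: int = 5) -> list[str]:
        """Return up to `limit` stored words starting with `prefix`, alphabetically."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        results = []
        stack = [(node, prefix)]
        while stack and len(results) < limit:
            current, text = stack.pop()
            if current.is_word:
                results.append(text)
            # Push children in reverse order so they pop in alphabetical order.
            for ch in sorted(current.children, reverse=True):
                stack.append((current.children[ch], text + ch))
        return results
```

In an interview, the follow-up is usually how you would store a score at each node so the top suggestions come back without walking the whole subtree.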
The shape of the drill-down is the same regardless of the question: pick a component, name two or three valid approaches, name the trade-off each one makes, then commit to one. Committing matters. The interviewer doesn't want to see you waffle between options forever — they want to see you make a decision and own it. "I'd start with token bucket in Redis because the operational story is simple and the accuracy is enough for our P95 budget; if we needed sub-millisecond decisions I'd revisit and probably move to in-process counters" is a complete answer.
Layer 4: Tradeoffs and failure modes (5-7 minutes)
Now the interviewer pushes. They change a number, they remove a constraint, they ask "what happens when this breaks." Your job at this layer is to absorb the pushback and update your design without panicking.
The pushback is structured. There are five common kinds, and each has a clean answer shape:
- "Now imagine 10x scale." Walk through which boxes hit limits first. Usually it's the database, then the cache fanout, then the network egress. Each one gets a specific mitigation: read replicas, sharding, regional caches, CDN. Don't try to scale every box at once — name the bottleneck first, then the fix.
- "What happens when component X fails?" Name what goes wrong, name the user impact, name the recovery path. "If the cache fails, we degrade to the database: reads get slower but stay correct. A circuit breaker in front of the database sheds load so the redirected traffic doesn't take it down too." Concrete is the keyword.
- "What if we don't have N?" Where N is some piece of infrastructure (Kafka, Redis, an existing service). Don't argue. Reach for the boring fallback: "Without Kafka, I'd use Postgres-backed queueing: workers poll with SELECT ... FOR UPDATE SKIP LOCKED. Less throughput, but our write volume is bounded enough that it holds." The interviewer is checking whether your design depends on luxury infra or whether you can degrade gracefully.
- "How would you debug this in prod?" Name the metrics you'd add (latency P50/P95/P99, error rate, queue depth) and the logs (request ID propagation, structured fields). The interviewer is checking whether you've operated systems before.
- "How would you migrate?" Usually phrased as "how would you change the schema" or "how would you switch to a different storage engine." The answer shape is dual-write, backfill, cutover, rollback plan. The right answer is almost always the boring one: ship the new thing alongside the old, dual-write while you backfill, validate parity, cut over with a feature flag, keep the old one warm for a week.
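The circuit breaker mentioned in the failure-mode answer is worth being able to sketch. Below is a minimal single-threaded version, assuming a consecutive-failure threshold and a fixed cooldown; the class name, defaults, and the `CircuitOpenError` exception are hypothetical choices for this example, not any particular library's API.

```python
import time

class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected without being tried."""

class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("dependency is shedding load")
            # Cooldown elapsed: fall through and allow one trial call (half-open).
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                # Open the circuit, or re-open it after a failed trial call.
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Wrapping the database read in `breaker.call(...)` means that once the database starts failing, callers get a fast rejection instead of piling more queries onto it.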
Layer 5: Wrap-up (3-5 minutes)
The interviewer says "any questions for me?" and you have less time than you think. Use it.
The wrap-up question is partly your last chance to signal. A senior candidate at the end of a system-design interview asks one of three things: a question about the team's actual design culture ("what's the worst on-call incident you've had with this kind of system?"), a question about the constraints they didn't share ("is there context I should have asked about that I missed?"), or a question about what they'd build next ("if you got to redesign the existing system, what would you change?").
None of those are the question candidates default to ("what's the team like?"). The default question is fine, just lower-signal. The three above all signal that you're already thinking like a member of the team.
Common archetypes, in 30 seconds each
If you've prepped, you've seen these. Here's the 30-second version of each. The point is not to memorise — it's to have a starting shape so you spend the interview's time on the trade-offs, not the boilerplate.
URL shortener. Read-heavy. ID generation (counter + base62, or hashed) is the central design choice. Reads are served from a Redis cache with a long TTL; the database is Postgres. Analytics is a separate write path that can be eventually consistent. Watch out for: the interviewer adding "custom URLs", which means checking each custom alias for collisions with existing and generated codes.
News feed. Push vs pull. At small scale, pull is fine: assemble the feed at read time from followed users' recent posts. Once read traffic dominates, push each post to followers' materialised feeds at write time so a read is a single lookup. Hybrid: push for normal users, pull for celebrities, whose follower counts make write-time fan-out too expensive (the "fan-out problem"). Watch out for: the interviewer asking about ranking, which is a separate ML system that consumes the feed.
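The hybrid is easier to reason about with a toy model in front of you. The sketch below keeps everything in dictionaries and uses a deliberately tiny, hypothetical `CELEBRITY_THRESHOLD`; a real system would back these structures with storage and pick the cutoff from measurements.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 2  # hypothetical cutoff, tiny so the example is easy to follow

followers = defaultdict(set)         # author -> users following them
following = defaultdict(set)         # user -> authors they follow
posts_by_author = defaultdict(list)  # author -> [(timestamp, text)]
feeds = defaultdict(list)            # user -> materialised [(timestamp, author, text)]

def follow(user, author):
    followers[author].add(user)
    following[user].add(author)

def is_celebrity(author):
    return len(followers[author]) >= CELEBRITY_THRESHOLD

def publish(author, timestamp, text):
    posts_by_author[author].append((timestamp, text))
    if not is_celebrity(author):
        # Push path: copy the post into every follower's feed at write time.
        for user in followers[author]:
            feeds[user].append((timestamp, author, text))

def read_feed(user, limit=10):
    # A set removes duplicates if an author crossed the threshold after some pushes.
    items = set(feeds[user])
    # Pull path: merge in celebrity posts at read time.
    for author in following[user]:
        if is_celebrity(author):
            items.update((ts, author, text) for ts, text in posts_by_author[author])
    return sorted(items, reverse=True)[:limit]
```

The write cost of `publish` is proportional to follower count only for non-celebrities, and the read cost of `read_feed` grows only with the number of celebrities a user follows. That is the whole argument for the hybrid.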
Rate limiter. Token bucket in Redis is the production-default starting point. Sliding-window counter is the better-accuracy upgrade. The cluster-wide vs per-node decision matters: a single Redis is a SPOF; a cluster needs careful partitioning by key. Watch out for: the interviewer asking about rate limiting at the edge (CDN-level) vs the application level — different problems with different tools.
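For the sliding-window counter, the core idea is to keep two fixed-window counts and weight the older one by how much of it still overlaps the sliding window. A minimal in-process sketch, assuming a single key and a single thread (a shared deployment would keep the two counters in something like Redis):

```python
import time

class SlidingWindowCounter:
    """Approximate sliding window built from two fixed-window counters."""

    def __init__(self, limit: int, window: float, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.current_index = int(clock() // window)  # which fixed window we are in
        self.current_count = 0
        self.previous_count = 0

    def allow(self) -> bool:
        now = self.clock()
        index = int(now // self.window)
        if index != self.current_index:
            # Roll forward; the old count only matters if it was the adjacent window.
            adjacent = index == self.current_index + 1
            self.previous_count = self.current_count if adjacent else 0
            self.current_index = index
            self.current_count = 0
        # Fraction of the current fixed window that has already elapsed.
        elapsed_fraction = (now % self.window) / self.window
        estimated = self.previous_count * (1.0 - elapsed_fraction) + self.current_count
        if estimated < self.limit:
            self.current_count += 1
            return True
        return False
```

The estimate assumes requests in the previous window were evenly spread, which is where the known accuracy cost comes from. In exchange, the state per key is two integers rather than a log of timestamps.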
Autocomplete. Inverted index on prefixes, served from a tier of memory-resident nodes. Updates can lag (minutes is fine, days is suspect). The hard part is ranking the suggestions — usually a learned ranker with click-through-rate signal. Watch out for: typo tolerance, which adds a fuzzy-match layer that complicates the index dramatically.
Chat / messaging. WebSocket connection per active user. Stateful, so the connection-management layer matters more than the message storage. Messages are stored append-only; reads are by conversation, so partition by conversation ID. Read receipts and typing indicators are usually push-only and don't need durability.
Distributed cache. Consistent hashing for sharding. Cache invalidation is the actual hard problem — the interviewer will probe write-through vs write-behind vs explicit invalidation. There is no clean answer; you commit to a trade-off based on the consistency budget the question implies.
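Consistent hashing is one of the few pieces here that interviewers sometimes ask you to sketch. Below is a minimal ring with virtual nodes, assuming SHA-256 as the hash and 100 virtual nodes per server; both are arbitrary choices for this example, and the node names in the usage note are made up.

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) pairs
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        # Use the first 8 bytes of SHA-256 as a 64-bit position on the ring.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def lookup(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        # First virtual node clockwise from the key's position, wrapping around.
        idx = bisect.bisect_left(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]
```

With `ring = HashRing(["cache-a", "cache-b", "cache-c"])`, calling `ring.remove("cache-c")` only remaps the keys that lived on `cache-c`; everything else stays put, which is the property that makes the scheme worth its complexity.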
Anti-patterns to avoid
Premature optimisation. Don't reach for sharding when one Postgres can hold the load. Don't reach for caching before the read-write ratio justifies it. Don't reach for queueing before the workload is async. Each layer of complexity needs a constraint to justify it.
Ignoring scale. The opposite failure: drawing a single-process design when the question explicitly said 100M DAU. The interviewer is testing whether you read the requirements. Don't pretend the scale isn't there.
Guessing tech choices. "I'd use Cassandra for this" without saying why is worse than "I'd use Postgres" with a paragraph of reasoning. The interviewer is grading the reasoning, not the brand name.
Talking too much, drawing too little. The diagram is the artefact the interviewer remembers. If you spent 20 minutes talking and have one box on the screen, the interviewer has nothing to grade against.
Talking too little, drawing too much. The opposite. A clean diagram with no commentary reads as someone who memorised an architecture. The trade-off conversation is where the score is.
Refusing to commit. "It depends" is fine as a starting clause, but you have to follow it with a decision. The interviewer is hiring someone who will own a system, not someone who will list options forever.
What we do at Cruto
The hardest part of system-design prep is the calibration loop. You can drill the archetypes alone, but you can't grade your own answer, and a friend who works in industry is rarely willing to push back the way an interviewer would. We built Cruto's mock interviews around persona-modeled probing: the mock interviewer reads the job description before the session, asks system-design questions weighted toward the company's actual domain, and pushes back when your trade-off articulation goes thin. The debrief names the moment in the transcript where you lost signal, not just a number. If you're two weeks from a senior loop and want to know whether your trade-off vocabulary holds up under pressure, try the free tier. 15 minutes of live mock per month is enough to run one full system-design walkthrough and find out where you actually are.