p:: Computer Science
- GitHub - donnemartin/system-design-primer: Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
- ByteByteGo | Ace Your Next System Design Interview
Database
Scaling
Vertical
- Scale Up
- Add more power (CPU, RAM, …)
- Has hard limit
- No failover or redundancy
Horizontal
- Scale Out
- Add more servers
Load Balancer
- User → Public IP of Load Balancer → Private IPs of Servers
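A minimal sketch of the routing path above, assuming a round-robin policy (the notes do not name a balancing algorithm). The `RoundRobinBalancer` class and the private IP addresses are hypothetical, chosen only for illustration.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy load balancer: clients only see this object (the public IP);
    it forwards each request to one of the private backend addresses."""

    def __init__(self, private_ips):
        if not private_ips:
            raise ValueError("at least one backend server is required")
        # cycle() yields the backends in order, forever.
        self._backends = cycle(private_ips)

    def route(self, request_path):
        # Pick the next backend in rotation for this request.
        backend = next(self._backends)
        return backend, request_path


# Hypothetical private addresses of two web servers.
balancer = RoundRobinBalancer(["10.0.0.1", "10.0.0.2"])
targets = [balancer.route("/home")[0] for _ in range(4)]
print(targets)
```

Adding a server to the list is the "scale out" step: the client-facing address does not change.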
Cache
Redis, Memcached
- Temporary storage much faster than the database
- Stores result of expensive responses / frequently accessed data in-memory
- Strategies
- Caching Strategies and How to Choose the Right One | CodeAhoy
- Cache-Aside
- Read-Through Cache
- Write-Through Cache
- Write-Around
- Write-Back or Write-Behind
Considerations
- Decide when to use
- data is read frequently but modified infrequently
- Expiration policy
- should be neither too long (data goes stale) nor too short (frequent reloads from the database)
- Consistency
- Keep DB and cache in sync
- Mitigating failures
- Avoid Single Point of Failure (SPOF) using multiple cache servers across different data centers
- Overprovision required memory by certain percentages
- Eviction Policy
- once cache is full, existing items need to be removed to add new items
- Cache Eviction Policies
- Least Recently Used (LRU)
- most popular
- Least Frequently Used (LFU)
- First In First Out (FIFO)
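A small sketch of LRU eviction, assuming a fixed item count as the capacity limit (real cache servers usually limit by memory). The `LRUCache` class is illustrative, built on the standard library's `collections.OrderedDict`.

```python
from collections import OrderedDict


class LRUCache:
    """Evicts the least recently used item once capacity is exceeded."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._items = OrderedDict()  # oldest use first, newest use last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # reading counts as a use
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # drop the least recently used

    def keys(self):
        return list(self._items)


lru = LRUCache(capacity=2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")      # "a" is now the most recently used
lru.put("c", 3)   # cache is full, so "b" is evicted
print(lru.keys())
```

FIFO would differ only in skipping the `move_to_end` calls; LFU would need a use counter per key instead of an ordering.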
Content Delivery Network (CDN)
- network of geographically dispersed servers used to deliver static content
- images, videos, CSS, JavaScript files, …
- Dynamic content caching
- new concept
- caching of HTML pages that are based on request path, query strings, cookies, and request headers
- Dynamic content delivery | Content Delivery Network (CDN), API Acceleration, Security | Amazon CloudFront
- client requests file from CDN
- if not in CDN, CDN requests file from origin
- origin returns file with optional Time-To-Live (TTL) header
- file remains cached in CDN until TTL expires
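The request flow above can be sketched roughly as follows. `Origin` and `CDNEdge` are hypothetical classes, and the clock is injected so that TTL expiry can be shown without waiting; a real CDN reads the TTL from HTTP response headers.

```python
class Origin:
    """Stand-in for the origin server: returns a file body and a TTL."""

    def __init__(self, files, ttl_seconds):
        self._files = files
        self._ttl = ttl_seconds
        self.requests = 0  # how many times the CDN had to come back to us

    def fetch(self, path):
        self.requests += 1
        return self._files[path], self._ttl


class CDNEdge:
    """Stand-in for one CDN server with a TTL-based cache."""

    def __init__(self, origin, clock):
        self._origin = origin
        self._clock = clock
        self._cache = {}  # path -> (body, expires_at)

    def get(self, path):
        now = self._clock()
        entry = self._cache.get(path)
        if entry is not None and now < entry[1]:
            return entry[0]  # still fresh: serve from the CDN
        # Missing or expired: request the file from the origin and cache it.
        body, ttl = self._origin.fetch(path)
        self._cache[path] = (body, now + ttl)
        return body


now = [0.0]  # fake clock, advanced by hand
origin = Origin({"/logo.png": b"v1"}, ttl_seconds=30)
edge = CDNEdge(origin, clock=lambda: now[0])

edge.get("/logo.png")  # not in CDN: fetched from origin
edge.get("/logo.png")  # served from the CDN cache
now[0] = 31.0
edge.get("/logo.png")  # TTL expired: fetched from origin again
```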
Considerations
- Cost
- charged for data transfers in and out of the CDN
- Cache Expiry time
- should be neither too long (content goes stale) nor too short (repeated reloads from origin)
- CDN fallback
- clients should be able to detect CDN outage and request resources from origin
- Invalidating files
- Can remove a file before it expires
- Using provided API
- Use object versioning to serve a different version
- image.png?v=2
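Object versioning, as in `image.png?v=2`, works because the CDN treats a URL with a different query string as a different object. A minimal sketch using the standard library's `urllib.parse`; the `versioned_url` helper and the `cdn.example.com` host are hypothetical, and it assumes the CDN is configured to include the query string in its cache key.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def versioned_url(url, version):
    """Return url with its `v` query parameter set to the given version."""
    parts = urlsplit(url)
    # Keep every existing parameter except an older `v`.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "v"]
    query.append(("v", str(version)))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(versioned_url("https://cdn.example.com/image.png", 2))
```

Bumping the version in the page that references the file makes clients request a URL the CDN has not cached yet, so the new file is fetched from origin without an explicit invalidation call.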
Stateless web tier
- to scale horizontally, we need to move state out of the web tier
- for example, store user session data in database
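A rough sketch of why moving state out of the web tier helps. `SessionStore` is a hypothetical in-memory stand-in for the shared database, and `WebServer` for an interchangeable web server; any server can answer any request because none of them keeps session data locally.

```python
import uuid


class SessionStore:
    """Stand-in for a shared database holding user session data."""

    def __init__(self):
        self._rows = {}

    def save(self, session_id, data):
        self._rows[session_id] = dict(data)

    def load(self, session_id):
        return dict(self._rows.get(session_id, {}))


class WebServer:
    """Stateless server: all session data lives in the shared store."""

    def __init__(self, name, store):
        self.name = name
        self._store = store

    def login(self, username):
        session_id = uuid.uuid4().hex
        self._store.save(session_id, {"user": username})
        return session_id

    def whoami(self, session_id):
        return self._store.load(session_id).get("user")


store = SessionStore()
server_a = WebServer("a", store)
server_b = WebServer("b", store)

session_id = server_a.login("ada")    # first request lands on server a
print(server_b.whoami(session_id))    # next request lands on server b
```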
Stateful architecture
- server remembers client data (state) from one request to the next
- every request from the same client must be routed to the same server
- can be done with sticky sessions in most load balancers but adds overhead
- challenging to handle server failures
Data centers
- users are geoDNS-routed, also known as geo-routed, to the closest data center
- geoDNS is a DNS service that allows domain names to be resolved to IP addresses based on the location of a user
Challenges
- Traffic redirection
- GeoDNS can be used to direct traffic to the nearest data center depending on where a user is located
- Data synchronization
- Users from different regions could use different local databases or caches. In failover cases, traffic might be routed to a data center where data is unavailable.
- Test and deployment
- Automated deployment tools are vital to keep services consistent through all the data centers
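A simplified sketch of geo-routing with failover, under the assumption that the DNS service keeps an ordered preference list per user region and a set of healthy data centers. The region codes, data center names, and addresses (taken from a documentation-only IP range) are all hypothetical.

```python
# Hypothetical data centers and their public addresses.
DATA_CENTERS = {
    "us-east": "203.0.113.10",
    "eu-west": "203.0.113.20",
}

# Hypothetical preference order: nearest data center first.
REGION_PREFERENCE = {
    "US": ["us-east", "eu-west"],
    "DE": ["eu-west", "us-east"],
}


def resolve(user_region, healthy):
    """Return the IP of the nearest healthy data center for a user region."""
    candidates = REGION_PREFERENCE.get(user_region, list(DATA_CENTERS))
    for name in candidates:
        if name in healthy:
            return DATA_CENTERS[name]
    raise RuntimeError("no healthy data center available")


print(resolve("DE", healthy={"us-east", "eu-west"}))  # nearest: eu-west
print(resolve("DE", healthy={"us-east"}))             # failover: us-east
```

The failover case is where the data synchronization challenge shows up: the user's data must already be replicated to the data center that receives the redirected traffic.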
Message queue
- durable component, stored in memory, that supports asynchronous communication
- Basic architecture
- Input services, called producers/publishers, create messages, and publish them to a message queue
- Other services or servers, called consumers/subscribers, connect to the queue, and perform actions defined by the messages
- Decoupling
- producer can post a message to the queue when the consumer is unavailable to process it
- consumer can read messages from the queue even when the producer is unavailable
- producer and consumer can be scaled independently
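The decoupling described above can be sketched with the standard library's `queue.Queue`. This is an in-process stand-in, not a durable message broker, and the `resize_photo` task and message fields are hypothetical. The producer publishes before any consumer is running, and the messages simply wait in the queue.

```python
import queue
import threading


def producer(q, count):
    # Publish messages without knowing whether a consumer is running.
    for photo_id in range(count):
        q.put({"task": "resize_photo", "photo_id": photo_id})
    q.put(None)  # sentinel: no more messages


def consumer(q, results):
    # Read messages and perform the action each one describes.
    while True:
        message = q.get()
        if message is None:
            break
        results.append(message["photo_id"])


message_queue = queue.Queue()
processed = []

producer(message_queue, 3)  # consumer is not running yet

worker = threading.Thread(target=consumer, args=(message_queue, processed))
worker.start()              # consumer comes online later and drains the queue
worker.join()
print(processed)
```

Scaling the consumer side independently would mean starting more worker threads (or, in a real system, more worker servers) that read from the same queue.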
Logging, metrics, automation
- Logging
- monitor error logs at the per-server level, or use tools to aggregate them into a centralized service for easy search and viewing
- Metrics
- Host level metrics: CPU, Memory, disk I/O, etc.
- Aggregated level metrics: the performance of the entire database tier, cache tier, etc.
- Key business metrics: daily active users, retention, revenue, etc.
- Automation
- continuous integration
- improve dev productivity
Millions of users and beyond
- Keep web tier stateless
- Build redundancy at every tier
- Cache data as much as you can
- Support multiple data centers
- Host static assets in CDN
- Scale your data tier by sharding
- Split tiers into individual services
- Monitor your system and use automation tools
A Framework for System Design Interviews
Step 1 - Understand the problem and establish design scope
3 - 10 minutes
Ask questions to understand the exact requirements.
- What specific features are we going to build?
- How many users does the product have?
- How fast does the company anticipate scaling up? What are the anticipated scales in 3 months, 6 months, and a year?
- What is the company’s technology stack? What existing services might you leverage to simplify the design?
Step 2 - Propose high-level design and get buy-in
10 - 15 minutes
- Come up with an initial blueprint for the design
- Draw box diagrams with key components
- Do back-of-the-envelope calculations
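A back-of-the-envelope calculation can be as simple as the sketch below. Every input number is an assumption made up for illustration (1 million daily active users, 10 requests per user per day, 1,000 bytes per request, peak traffic at twice the average); in an interview these would come from the Step 1 questions.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate(daily_active_users, requests_per_user_per_day,
             bytes_per_request, peak_factor=2):
    """Rough traffic and storage estimate from a few assumed inputs."""
    daily_requests = daily_active_users * requests_per_user_per_day
    avg_qps = daily_requests / SECONDS_PER_DAY
    return {
        "avg_qps": avg_qps,
        "peak_qps": avg_qps * peak_factor,
        "daily_bytes": daily_requests * bytes_per_request,
    }


# All inputs below are assumed values, not figures from the notes.
result = estimate(
    daily_active_users=1_000_000,
    requests_per_user_per_day=10,
    bytes_per_request=1_000,
)
print(round(result["avg_qps"]), "requests/s on average")
print(result["daily_bytes"] / 1e9, "GB of new data per day")
```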
Step 3 - Design deep dive
10 - 25 minutes
Step 4 - Wrap up
3 - 5 minutes
The interviewer might ask you a few follow-up questions or give you the freedom to discuss other additional points
Design a rate limiter
- Step 1 - Understand the problem and establish design scope
- requirements
- Accurately limit excessive requests
- Low latency
- Use as little memory as possible
- Distributed rate limiting
- Exception handling
- High fault tolerance
- Step 2 - Propose high-level design and get buy-in
- Algorithms
- Token bucket
- Leaking bucket
- Fixed window counter
- Sliding window log
- Sliding window counter
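To make the first algorithm in the list concrete, here is a minimal single-process token bucket sketch. The `TokenBucket` class, its parameter names, and the injected clock are illustrative assumptions; it does not address the distributed or fault-tolerance requirements listed above.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity` and a sustained rate of
    `refill_rate` requests per second."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self._capacity = capacity
        self._refill_rate = refill_rate  # tokens added per second
        self._clock = clock
        self._tokens = float(capacity)   # start with a full bucket
        self._last_refill = clock()

    def allow(self):
        # Refill lazily, based on the time elapsed since the last call.
        now = self._clock()
        elapsed = now - self._last_refill
        self._tokens = min(self._capacity,
                           self._tokens + elapsed * self._refill_rate)
        self._last_refill = now
        if self._tokens >= 1:
            self._tokens -= 1  # each request consumes one token
            return True
        return False           # bucket empty: reject the request


now = [0.0]  # fake clock so the example is deterministic
bucket = TokenBucket(capacity=2, refill_rate=1, clock=lambda: now[0])

burst = [bucket.allow(), bucket.allow(), bucket.allow()]
now[0] = 1.0  # one second later, one token has been refilled
later = [bucket.allow(), bucket.allow()]
print(burst, later)
```

The bucket stores only two numbers per client (token count and last refill time), which fits the "use as little memory as possible" requirement better than a sliding window log that keeps every request timestamp.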