Over the past month I had the opportunity to talk to some great engineers in their fields. To get one step closer to them, a solid understanding of system design is a must.
Some background knowledge is needed before actually starting this post: know databases; know OS concepts like concurrency (threads, deadlock, starvation, …) and have a basic understanding of read and write locks; know networking basics like TCP/UDP and the role of switches and routers; know file systems; know the levels of caching in a modern OS… (Wow, this could actually start a whole new post ;P)
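As a quick refresher on the locking basics mentioned above, here is a minimal sketch (not from any particular codebase) of a counter incremented from several threads, protected by a mutex:

```python
# A counter updated from multiple threads. Without the lock, the
# read-modify-write of `counter += 1` could interleave and lose updates.
import threading

counter = 0
lock = threading.Lock()

def increment(n: int):
    global counter
    for _ in range(n):
        with lock:  # serialize access to the shared counter
            counter += 1

threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000
```

A read–write lock is the same idea, except many readers may hold it at once while writers get exclusive access.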
There’s really no standard question in system design, but there are always some tricks to make it a good one. I always ask myself: if given a task to design a system, can I do it well?
These are just the notes from my learning path. If you find any issues or have any suggestions, feel free to add them in the comment section.
| Term | Definition |
| --- | --- |
| Replication | Replication means frequently copying the data across multiple machines, so multiple copies of the data exist across machines after replication. This helps in case one or more of the machines die due to some failure. |
| Consistency | Consistency means the data is the same across the cluster when the storage system has more than one machine, so you can write to / read from any node and get the same data. |
| Eventual Consistency | In a cluster where multiple machines store the same data, an eventually consistent model implies that all machines will have the same data eventually. At a given instant those machines may hold different versions of the same data (temporarily inconsistent), but they will eventually reach a state where they have the same data. |
| Availability | In the context of a database cluster, availability refers to the ability to always respond to queries (read or write) irrespective of nodes going down. |
| Partition Tolerance | In the context of a database cluster, the cluster continues to function even if there is a “partition” (communications break) between two nodes (both nodes are up, but can’t communicate). |
| Vertical/Horizontal Scaling | In simple terms, to scale horizontally is to add more servers; to scale vertically is to increase the resources of a single server (RAM, CPU, storage, etc.). |
| Sharding | With most huge systems, data does not fit on a single machine. In such cases, sharding refers to splitting the very large database into smaller, faster, and more manageable parts called data shards. |
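To make sharding concrete, here is a toy sketch of hash-based routing. The shard names and the modulo scheme are illustrative assumptions, not a production design (real systems often use consistent hashing so that adding a shard moves fewer keys):

```python
# Minimal sketch of hash-based sharding: route each key to one of N shards.
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]

def shard_for(key: str) -> str:
    # Hash the key deterministically so the same key always
    # maps to the same shard.
    digest = hashlib.md5(key.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]
```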
CAP Theorem states that in a distributed system, it is impossible to simultaneously guarantee all three of the following:

• Consistency
• Availability
• Partition Tolerance

This is the upper limit when designing a distributed system: at most two can hold at once.
In plain English: when a network partition happens, a system has to choose between staying consistent (rejecting some requests until the nodes agree) and staying available (answering with possibly stale data).
A General Approach!
- Requirements: what features, what functions…
- Constraints: how much traffic to handle, how much data to store, does latency matter a lot, which to guarantee: consistency or availability, is sharding required (or is caching on a single machine enough)…
- How many machines are needed
- Abstract design: draw each component first! Then discuss…
- Scale up: can system have fault tolerance, can it scale if company expands…
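For the “how many machines are needed” step, a quick back-of-envelope estimate usually suffices. All the numbers below are made-up assumptions, just to show the shape of the calculation:

```python
# Back-of-envelope capacity estimate. Every constant here is an
# illustrative assumption, not a real benchmark.
requests_per_day = 100_000_000   # assumed daily traffic
peak_factor = 3                  # assumed ratio of peak QPS to average
qps_per_machine = 1_000          # assumed capacity of one server

avg_qps = requests_per_day / 86_400       # seconds per day
peak_qps = avg_qps * peak_factor
machines = -(-peak_qps // qps_per_machine)  # ceiling division
print(int(machines))  # 4
```

The point of the exercise is not the exact number but showing the interviewer you can reason from traffic to hardware.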
Designing systems = Design Code + Design Schema + Applied Scalability Principles + Product Decisions
1. Design Cache
Personal notes: I have really special feelings for this question. It’s interesting and practical to solve, and it has come up a few times in these ‘talks’ in slightly different forms because it’s just too classic.
One senior engineer once told me, “Don’t always focus on one single language but try to understand the underlying protocols between each component.”
- requirements: how much data needs to be cached? A few TBs. / What is the eviction strategy? / What is the access pattern for the given cache?
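On the eviction-strategy question, LRU (least recently used) is the classic answer. Here is a minimal sketch using Python’s `OrderedDict` (class and method names are my own illustration):

```python
# Minimal LRU cache sketch: evict the least recently used entry
# once capacity is exceeded.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # drop the least recently used
```

Other common strategies are FIFO and LFU; which one fits depends on the access pattern asked about above.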
There are 3 kinds of write policies for a caching system:
Write through cache: writes go to the cache and the backing store synchronously, so subsequent reads are always fresh, at the cost of higher write latency.
Write around cache: writes go directly to the backing store, bypassing the cache; the cache fills only on reads, which keeps rarely-read data from flooding it, but a read right after a write will miss.
Write back cache: writes go to the cache only and are flushed to the backing store later; this gives the fastest writes but risks losing data if the cache node dies before the flush.
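The three policies can be sketched side by side with plain dicts standing in for the cache and the database (all names here are illustrative):

```python
# Toy sketch of the three write policies. `cache` and `db` are plain
# dicts standing in for a cache tier and a backing store.
cache, db = {}, {}
dirty = set()  # keys written to the cache but not yet flushed

def write_through(key, value):
    # Write to cache and backing store synchronously.
    cache[key] = value
    db[key] = value

def write_around(key, value):
    # Bypass the cache; it fills in on a later read.
    db[key] = value
    cache.pop(key, None)  # drop any stale cached copy

def write_back(key, value):
    # Write to the cache only; flush to the store later.
    cache[key] = value
    dirty.add(key)

def flush():
    # Asynchronous flush step of write-back, run here on demand.
    for key in dirty:
        db[key] = cache[key]
    dirty.clear()
```

Running `write_back` and then crashing before `flush` is exactly the data-loss risk described above.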