This is a summary of chapter 1 of the book Designing Data-Intensive Applications.
For anyone who is interested but doesn't have much time.
Overview
A data-intensive application or system often needs to:
Store data so that it can be retrieved efficiently later (databases, search indexes)
Send a message to another process, to be handled asynchronously (stream processing)
Remember the result of an expensive operation, to speed up reads (caches)
Periodically crunch a large amount of accumulated data (batch processing)
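As a rough illustration of the caching item above, here is a minimal sketch using Python's standard functools.lru_cache. The function expensive_lookup and the counter call_count are hypothetical names made up for this example; a real system would more likely use a dedicated cache service.

```python
from functools import lru_cache

# Counts how many times the slow path actually runs (illustration only).
call_count = 0


@lru_cache(maxsize=128)
def expensive_lookup(key: str) -> str:
    """Hypothetical expensive operation, e.g. a slow query or a remote call."""
    global call_count
    call_count += 1
    return key.upper()


first = expensive_lookup("user:42")   # computed, then stored in the cache
second = expensive_lookup("user:42")  # served from the cache, no recomputation
```

The second call returns the remembered result, so the expensive body runs only once for the same key.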
Reliability
The system keeps working correctly and withstands faults from:
Humans:
Design to minimize opportunities for error, especially in the UI.
Decouple the places where people make the most mistakes from the places where they can cause failure.
Test thoroughly, from unit tests to integration tests.
Provide quick and easy ways to recover.
Set up monitoring, especially for performance and error rate.
Hardware
Software: cascading failures that spread across services, exceptions that crash a process
Prevent abuse and unauthorized actions.
Scalability
Scalability is a system's ability to cope with changing demand (load, traffic...) while keeping performance and cost reasonable.
First describe the load with load parameters. Ex:
Requests per second to a web server
The ratio of reads to writes in a database
Number of simultaneously active users in a chat room...
Data volume, traffic volume, throughput
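To make the load parameters above concrete, here is a small sketch that derives two of them from a request log. The Request type, the describe_load function and the sample data are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Request:
    timestamp: float  # seconds since the start of the observation window
    kind: str         # "read" or "write"


def describe_load(requests: list[Request], window_seconds: float) -> dict[str, float]:
    """Compute two example load parameters from a list of observed requests."""
    reads = sum(1 for r in requests if r.kind == "read")
    writes = sum(1 for r in requests if r.kind == "write")
    return {
        "requests_per_second": len(requests) / window_seconds,
        # With no writes at all the ratio is undefined; report infinity here.
        "read_write_ratio": reads / writes if writes else float("inf"),
    }


# Hypothetical sample: 3 reads and 1 write observed over a 2 second window.
sample = [
    Request(0.1, "read"),
    Request(0.4, "read"),
    Request(0.9, "write"),
    Request(1.5, "read"),
]
load = describe_load(sample, window_seconds=2.0)
```

Which parameters matter depends on the system; these two are only the ones from the list that are easy to compute from a plain request log.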
Then measure performance, usually:
Duration: short response time, short data transmission time. Fast data (de)serialization and (de)compression help keep latency low.
Space: low resource utilization such as CPU, RAM, bandwidth, IOPS...
Performance can be classified into 2 types:
Macro performance (design level)
Micro performance (fine-tuning level): about the complexity of the algorithm. Beware of premature optimization: it is easy to optimize something small and degrade something bigger.
A system will grow in load and complexity over time. A system that scales well should be able to handle that growth. Ask:
What happens to performance if the load parameters increase and you keep the resources unchanged?
If the load parameters increase and you want to keep performance unchanged, how much do you need to increase the resources?
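Usually treat a measurement as a distribution of values rather than a single number, and use percentiles. Ex: for the distribution of an API's response times, the 95th percentile is the time that 95% of requests stay under; the remaining 5% are the rare cases where the API is too slow.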
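A minimal sketch of the percentile idea, using the nearest-rank method (one of several common percentile definitions). The percentile helper and the response times are made up for this example.

```python
import math


def percentile(values: list[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical response times in milliseconds, with one very slow request.
response_times_ms = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13,
                     12, 14, 15, 13, 11, 16, 14, 13, 12, 900]

mean = sum(response_times_ms) / len(response_times_ms)
p50 = percentile(response_times_ms, 50)
p95 = percentile(response_times_ms, 95)
p99 = percentile(response_times_ms, 99)
```

With this sample the median is 13 ms, while the mean is pulled up to 57.75 ms by a single outlier. The high percentiles are what expose the slow tail: here p95 is still 16 ms and p99 is 900 ms.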
Notice the idea of a bottleneck: when a request makes several backend calls in parallel, the slowest call determines the overall response time.
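The bottleneck effect can be sketched with asyncio. The backend names and delays below are hypothetical, and asyncio.sleep stands in for real network latency.

```python
import asyncio
import time


async def call_backend(name: str, delay_s: float) -> str:
    """Hypothetical backend call; the sleep simulates its latency."""
    await asyncio.sleep(delay_s)
    return name


async def handle_request() -> tuple[list[str], float]:
    start = time.perf_counter()
    # All three calls run concurrently; gather waits for every one of them.
    results = await asyncio.gather(
        call_backend("profile", 0.05),
        call_backend("recommendations", 0.10),
        call_backend("search", 0.20),  # the slowest call
    )
    elapsed = time.perf_counter() - start
    return list(results), elapsed


results, elapsed = asyncio.run(handle_request())
print(f"{results} finished in {elapsed:.2f}s")
```

The total time is close to the slowest call (about 0.2 s), not the sum of all calls (0.35 s), so one slow backend is enough to make the whole request slow.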
Maintainability
- The system is easy to operate, read, extend with new features, and fix bugs in.