p:: Computer Science
Relational / SQL
MySQL, Oracle database, PostgreSQL, …
- Tables and Rows
- Perform join operations across tables
Non-Relational / NoSQL
CouchDB, Neo4j, Cassandra, HBase, Amazon DynamoDB, …
- key-value stores
- graph stores
- column stores
- document stores
Join operations are generally not supported
Might be the right choice if:
- App requires super-low latency
- Data is unstructured, or no relational data
- Only need to serialize and deserialize data (JSON, XML, YAML, …)
- Need to store a massive amount of data
Database Replication
-
Master
- Original
- Only Writes
-
Slave
- Copies
- Only Reads
-
Better Performance
- Queries processed in parallel
-
Reliability
- No worry about data loss
-
High Availability
-
If master goes offline, one of the slaves will be promoted to be the new master
-
Other complicated methods
- Multi-masters
- Circular Replication
Scaling
Vertical
- Scaling up
- Adding more power (CPU, RAM, disk, etc.)
- Drawbacks
- Hardware limits
- SPOF
- Powerful servers are more expensive
Horizontal
- AKA Sharding
- Adding more servers
Sharding
- Shards
- Sharding separates large databases into smaller parts called shards
- Each shard shares the same schema and data is unique to the shard
- Sharding key
- AKA partition key, consists of one or more columns that determine how data is distributed
- allows efficient routing of queries to the correct database
- choose a key that can evenly distributed data
- Challenges
- Resharding data
- Needed when data is unevenly distributed
- Consistent hashing is commonly used
- Celebrity problem
- AKA hotspot key problem
- we may need to allocate a shard for each celebrity
- each shard might even require further partition
- Join and de-normalization
- hard to perform join operations across shards
- common workaround is to de-normalize the database so that queries can be performed in a single table
- Resharding data