Database

Relational / SQL

MySQL, Oracle Database, PostgreSQL, …

Non-relational / NoSQL

CouchDB, Neo4j, Cassandra, HBase, Amazon DynamoDB, …

Join operations are generally not supported

Might be the right choice if:

Shards
- Sharding separates large databases into smaller parts called shards
- All shards share the same schema, but the data in each shard is unique to that shard
Sharding key
- AKA partition key, consists of one or more columns that determine how data is distributed
- allows efficient routing of queries to the correct database
- choose a key that distributes data evenly across shards
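To make the routing idea concrete, here is a minimal sketch of hash-based routing. It assumes a string sharding key such as a user ID and a fixed number of shards; `shard_for` and `distribution` are hypothetical helper names used only for illustration, and real systems often layer a lookup service or consistent hashing on top of this.

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Map a sharding key to a shard index using a stable hash."""
    # hashlib yields the same digest in every process, unlike the built-in
    # hash(), which is randomized per process for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def distribution(keys, num_shards: int) -> Counter:
    """Count how many keys land on each shard, to check for skew."""
    return Counter(shard_for(k, num_shards) for k in keys)


if __name__ == "__main__":
    # Hypothetical keys; in practice these would be values of the key column(s).
    keys = [f"user:{i}" for i in range(10_000)]
    for shard, count in sorted(distribution(keys, 4).items()):
        print(f"shard {shard}: {count} keys")
```

A key with many distinct, uniformly accessed values (for example a user ID) tends to spread evenly under this scheme; a low-cardinality key such as a country code would not.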
Challenges
- Resharding data
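  - Needed when a shard can no longer hold more data, or when data is unevenly distributed and some shards fill up faster than others
  - Consistent hashing is commonly used to limit how much data has to move when shards are added or removed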
- Celebrity problem
  - AKA hotspot key problem
  - excessive access to a single shard can overload it, so we may need to allocate a dedicated shard for each celebrity
  - each of these shards might even require further partitioning
- Join and de-normalization
  - hard to perform join operations across shards
  - a common workaround is to de-normalize the database so that each query can be served from a single table
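The consistent hashing mentioned under resharding can be sketched as follows. This is a simplified illustration, not a production implementation: `HashRing`, the shard names, and the choice of 100 virtual nodes per shard are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable 64-bit position on the ring for a string."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, shards, vnodes: int = 100):
        self._vnodes = vnodes
        self._positions = []  # sorted ring positions
        self._owner = {}      # ring position -> shard name
        for shard in shards:
            self.add(shard)

    def add(self, shard: str) -> None:
        # Each shard is placed at several points to smooth the distribution.
        for i in range(self._vnodes):
            pos = _hash(f"{shard}#{i}")
            if pos not in self._owner:
                bisect.insort(self._positions, pos)
            self._owner[pos] = shard

    def lookup(self, key: str) -> str:
        if not self._positions:
            raise ValueError("ring has no shards")
        # First ring position clockwise from the key, wrapping around at the end.
        idx = bisect.bisect_right(self._positions, _hash(key))
        return self._owner[self._positions[idx % len(self._positions)]]


if __name__ == "__main__":
    ring = HashRing([f"shard-{i}" for i in range(4)])
    keys = [f"user:{i}" for i in range(10_000)]
    before = {k: ring.lookup(k) for k in keys}
    ring.add("shard-4")  # reshard by adding one more shard
    moved = sum(1 for k in keys if ring.lookup(k) != before[k])
    print(f"{moved} of {len(keys)} keys moved to the new shard")
```

With plain `hash(key) % N` routing, going from 4 to 5 shards would remap most keys. On the ring, only keys that now fall just before one of the new shard's positions move, and they all move to the new shard.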