p:: Computer Science

Relational / SQL

MySQL, Oracle database, PostgreSQL, …

  • Tables and Rows
  • Perform join operations across tables

Non-Relational / NoSQL

CouchDB, Neo4j, Cassandra, HBase, Amazon DynamoDB, …

  1. key-value stores
  2. graph stores
  3. column stores
  4. document stores

Join operations are generally not supported

Might be the right choice if:

  • App requires super-low latency
  • Data is unstructured, or no relational data
  • Only need to serialize and deserialize data (JSON, XML, YAML, …)
  • Need to store a massive amount of data

Database Replication

  • Master

    • Original
    • Only Writes
  • Slave

    • Copies
    • Only Reads
  • Better Performance

    • Queries processed in parallel
  • Reliability

    • No worry about data loss
  • High Availability

  • If master goes offline, one of the slaves will be promoted to be the new master

  • Other complicated methods

    • Multi-masters
    • Circular Replication

Scaling

Vertical

  • Scaling up
  • Adding more power (CPU, RAM, disk, etc.)
  • Drawbacks
    • Hardware limits
    • SPOF
    • Powerful servers are more expensive

Horizontal

  • AKA Sharding
  • Adding more servers

Sharding

  • Shards
    • Sharding separates large databases into smaller parts called shards
    • Each shard shares the same schema and data is unique to the shard
  • Sharding key
    • AKA partition key, consists of one or more columns that determine how data is distributed
    • allows efficient routing of queries to the correct database
    • choose a key that can evenly distributed data
  • Challenges
    • Resharding data
      • Needed when data is unevenly distributed
      • Consistent hashing is commonly used
    • Celebrity problem
      • AKA hotspot key problem
      • we may need to allocate a shard for each celebrity
      • each shard might even require further partition
    • Join and de-normalization
      • hard to perform join operations across shards
      • common workaround is to de-normalize the database so that queries can be performed in a single table