Thursday, November 1, 2018

Introduction to Cassandra

Cassandra has a peer-to-peer (P2P) architecture: every node plays the same role, and there is no master node.

High Availability, Eventual Consistency

Data Partitioning

Every node knows the full map of nodes in the cluster and the data ranges each one owns.
It uses a shared-nothing architecture (each node is independent and self-sufficient, and there is no single point of contention across the system) with data replication.
Cassandra distributes data evenly across the nodes in its cluster.

Consistent Hashing

Distributing data efficiently means solving two problems:
1. Determining the node on which a specific piece of data should reside.
2. Minimizing data movement when nodes are added or removed.

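Keys and nodes map to the same ID space (normally a ring):
Node: Hash(IP)
Key: Hash(Key)
Each key is stored on the first node found by walking clockwise around the ring from the key's position.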

Consistent hashing balances load across nodes, and when nodes join or leave the network, only a minimal number of keys need to be remapped to maintain load balance (on average K/N keys, for K keys and N nodes).
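A minimal Python sketch of the ring described above, assuming MD5 as the hash function and one ring position per node. Cassandra itself uses the Murmur3 partitioner by default and assigns tokens (often several virtual nodes) to each node, so this only illustrates the idea. Each key goes to the first node at or after its position, wrapping around the ring. The class `HashRing`, its methods, and the IP addresses are hypothetical.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    # Map a string (node IP or key) onto the ring: a 128-bit integer ID space.
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")


class HashRing:
    def __init__(self, nodes=()):
        self._positions = []  # sorted node positions on the ring
        self._nodes = {}      # position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        pos = ring_position(node)  # Node: Hash(IP)
        bisect.insort(self._positions, pos)
        self._nodes[pos] = node

    def remove_node(self, node: str) -> None:
        pos = ring_position(node)
        self._positions.remove(pos)
        del self._nodes[pos]

    def node_for(self, key: str) -> str:
        # Key: Hash(Key). Take the first node at or after the key's position,
        # wrapping around to the start of the ring if necessary.
        pos = ring_position(key)
        index = bisect.bisect_left(self._positions, pos) % len(self._positions)
        return self._nodes[self._positions[index]]


ring = HashRing(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("10.0.0.4")
after = {k: ring.node_for(k) for k in keys}

# Only keys that fall into the new node's arc move, and they all move to it.
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

The design choice to show is the last step: adding a node only takes over the arc between it and its predecessor, so every other key keeps its owner.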

Eventual Consistency

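Cassandra lets users configure how many replicas of each piece of data are kept in the cluster (the replication factor, set per keyspace).
Replicas may briefly disagree after a write, but they converge; reaching a consistent state often takes only microseconds.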
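The number of replicas (the replication factor, RF) also determines how many replicas a read or write must reach. A small sketch of the usual overlap rule, assuming Cassandra's QUORUM level means a majority of replicas, floor(RF / 2) + 1: if the replicas acknowledging a write (W) and the replicas answering a read (R) satisfy R + W > RF, the two sets must share at least one replica, so the read sees the latest write; otherwise the read may return stale data until the replicas converge. The function names are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    # QUORUM is a strict majority of the replicas.
    return replication_factor // 2 + 1


def read_overlaps_write(replication_factor: int, write_replicas: int, read_replicas: int) -> bool:
    # If R + W > RF, any R replicas and any W replicas share at least one
    # replica (pigeonhole), so the read reaches a replica holding the latest write.
    return read_replicas + write_replicas > replication_factor


rf = 3
print(quorum(rf))  # 2
print(read_overlaps_write(rf, quorum(rf), quorum(rf)))  # True: QUORUM writes + QUORUM reads
print(read_overlaps_write(rf, 1, 1))  # False: ONE + ONE, the read may be stale
```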

Gossip Protocol

Cassandra uses a gossip protocol to discover the state of all nodes in a cluster.
Each node exchanges state information with up to three other nodes per gossip round (a round runs every second), rather than with every node, to reduce network load.
The gossip protocol also facilitates failure detection.
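A toy simulation of how gossip spreads state without contacting every node, assuming each node's state is reduced to a single heartbeat version number and every node exchanges state with up to three random peers per round. Cassandra's real gossip messages carry much more state, and its failure detection (a phi accrual detector) builds on these heartbeats; neither is modeled here. All names are hypothetical.

```python
import random


def merge(local: dict, remote: dict) -> None:
    # Keep the highest heartbeat version seen for each node.
    for node_id, version in remote.items():
        if version > local.get(node_id, -1):
            local[node_id] = version


def rounds_to_converge(num_nodes: int, fanout: int = 3, seed: int = 0, max_rounds: int = 100):
    rng = random.Random(seed)
    # Initially each node knows only its own state (heartbeat version 1).
    views = [{node: 1} for node in range(num_nodes)]
    for round_number in range(1, max_rounds + 1):
        for node in range(num_nodes):
            others = [peer for peer in range(num_nodes) if peer != node]
            # Gossip with at most `fanout` random peers, not the whole cluster.
            for peer in rng.sample(others, min(fanout, len(others))):
                # Push-pull exchange: both sides end up with the newer entries.
                merge(views[peer], views[node])
                merge(views[node], views[peer])
        if all(len(view) == num_nodes for view in views):
            return round_number
    return None  # did not converge within max_rounds


print(rounds_to_converge(100))
```

Even though each node talks to only three peers per round, knowledge of every node reaches the whole cluster within a few rounds, which is why limiting the fanout keeps network load low without slowing discovery much.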

Bloom Filter

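A Bloom filter is a space-efficient probabilistic data structure that provides an extremely fast way to test whether an element is a member of a set.
It can tell whether an item might be in the set or is definitely not in the set.
False positives are possible, but false negatives are not.
Bloom filters are a good way to avoid expensive I/O operations: Cassandra keeps a Bloom filter for each SSTable, so a read can skip any SSTable that definitely does not contain the requested partition key.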
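A minimal Bloom filter sketch, assuming SHA-256 with double hashing to derive the k bit positions (Cassandra uses Murmur3-based hashing; SHA-256 is only a stand-in from the standard library). The class name and sizes are hypothetical. Adding an item sets k bits; a lookup reports "possibly present" only if all k bits are set, which is why false negatives cannot happen.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at 0

    def _positions(self, item: str):
        # Derive k bit positions from two 64-bit hashes (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step so positions differ
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely not in the set"; True means "possibly in the set".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


bloom = BloomFilter(num_bits=1024, num_hashes=3)
for key in ["alice", "bob", "carol"]:
    bloom.add(key)

print(bloom.might_contain("alice"))    # True: added items are never reported missing
print(bloom.might_contain("mallory"))  # usually False, but a false positive is possible
```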


Cassandra Keyspace - A keyspace is similar to a schema in the RDBMS world: it groups tables and defines their replication settings.

Memtable - A memtable is an in-memory write-back cache holding data that has not yet been flushed to disk. When it fills up, it is flushed to disk as an SSTable.
Write-Back Cache - In a write-back cache, a write is directed only to the cache and completion is confirmed immediately; the data is written to the underlying storage later. This differs from a write-through cache, where the write is directed to the cache but is confirmed only once the data has been written to both the cache and the underlying storage.
SSTable - A Sorted String Table (SSTable) is an ordered, immutable key-value map. It is an efficient way of storing large sorted data segments in a file.
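A minimal sketch contrasting the write-back and write-through policies defined above, assuming plain Python dicts stand in for the cache and the underlying storage; the class names are hypothetical. In Cassandra the memtable plays the write-back role, and durability before a flush comes from the commit log.

```python
class WriteBackCache:
    def __init__(self, storage: dict):
        self.storage = storage
        self.cache = {}
        self.dirty = set()  # keys written to the cache but not yet to storage

    def write(self, key, value) -> str:
        # The write goes only to the cache; completion is confirmed immediately.
        self.cache[key] = value
        self.dirty.add(key)
        return "ok"

    def flush(self) -> None:
        # Later, dirty entries are written to the underlying storage.
        for key in self.dirty:
            self.storage[key] = self.cache[key]
        self.dirty.clear()


class WriteThroughCache:
    def __init__(self, storage: dict):
        self.storage = storage
        self.cache = {}

    def write(self, key, value) -> str:
        # Confirmed only after both the cache and the storage hold the value.
        self.cache[key] = value
        self.storage[key] = value
        return "ok"


disk = {}
wb = WriteBackCache(disk)
wb.write("k1", "v1")
print("k1" in disk)  # False: acknowledged, but not yet on disk
wb.flush()
print("k1" in disk)  # True
```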

Cassandra Write Path

Node level
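A minimal sketch of the node-level write path, assuming a single memtable and a flush threshold counted in entries (Cassandra flushes based on memory usage and other configured limits, and later merges SSTables through compaction, which is omitted here). A write is appended to the commit log for durability, applied to the memtable, and acknowledged; a full memtable is flushed to a new immutable, sorted SSTable. The class `Node` and its parameters are hypothetical.

```python
class Node:
    def __init__(self, memtable_limit: int = 3):
        self.commit_log = []  # append-only log; replayed after a crash
        self.memtable = {}    # key -> (value, timestamp), held in memory
        self.sstables = []    # immutable sorted segments, oldest first
        self.memtable_limit = memtable_limit  # hypothetical flush threshold (entries)

    def write(self, key, value, timestamp) -> str:
        # 1. Append the mutation to the commit log for durability.
        self.commit_log.append((key, value, timestamp))
        # 2. Apply it to the memtable, keeping the newest timestamp per key.
        current = self.memtable.get(key)
        if current is None or timestamp >= current[1]:
            self.memtable[key] = (value, timestamp)
        # 3. Flush to a new SSTable once the memtable is full.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()
        return "ok"  # acknowledged without waiting for an SSTable write

    def flush(self) -> None:
        # Sorted, immutable snapshot of the memtable (a tuple of key/value pairs).
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        # The flushed entries are now in an SSTable, so their log entries are no longer needed.
        self.commit_log.clear()


node = Node(memtable_limit=2)
node.write("user:1", "alice", timestamp=1)
node.write("user:2", "bob", timestamp=2)  # memtable is full, so it is flushed
print(node.sstables)  # [(('user:1', ('alice', 1)), ('user:2', ('bob', 2)))]
print(node.memtable)  # {}
```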



Cassandra Read Path

Node level
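A minimal sketch of the node-level read path, assuming each SSTable is modeled as a dict plus a Python set standing in for its Bloom filter (a real Bloom filter can also answer "maybe" for absent keys, in which case the SSTable is read and nothing is found). The memtable and every SSTable whose filter reports a possible match are consulted, and the version with the newest timestamp wins. Caches, the partition summary and index, and compaction are omitted; the function `read` is hypothetical.

```python
def read(key, memtable: dict, sstables: list):
    """memtable: key -> (value, timestamp).
    sstables: list of (bloom, data) pairs, where bloom is a set standing in
    for a Bloom filter and data maps key -> (value, timestamp)."""
    candidates = []
    if key in memtable:
        candidates.append(memtable[key])
    for bloom, data in sstables:
        if key not in bloom:
            continue  # definitely absent: skip this SSTable without any disk I/O
        if key in data:
            candidates.append(data[key])
    if not candidates:
        return None
    # Reconcile versions from memtable and SSTables: the newest timestamp wins.
    value, _ = max(candidates, key=lambda c: c[1])
    return value


memtable = {"k1": ("new", 30)}
sstables = [({"k1", "k2"}, {"k1": ("old", 10), "k2": ("v2", 20)})]
print(read("k1", memtable, sstables))  # new
print(read("k2", memtable, sstables))  # v2
print(read("k3", memtable, sstables))  # None
```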


SSTable read path
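A sketch of the lookup inside a single SSTable, assuming its on-disk components are modeled as Python sequences: sorted data rows, a partition index mapping every key to its position in the data, and a partition summary that samples every Nth index key and is small enough to keep in memory. The summary narrows the search to one short slice of the index, which then gives the row's position. The Bloom filter, key cache, and compression offsets are omitted; `SSTableFile` and `summary_interval` are hypothetical names.

```python
import bisect


class SSTableFile:
    def __init__(self, entries: dict, summary_interval: int = 4):
        self.summary_interval = summary_interval
        # Data component: rows sorted by key, never modified after creation.
        self.data = tuple(sorted(entries.items()))
        # Partition index: every key with its position in the data component.
        self.index = tuple((key, pos) for pos, (key, _) in enumerate(self.data))
        # Partition summary: every Nth index key, kept in memory.
        self.summary = [self.index[i][0] for i in range(0, len(self.index), summary_interval)]

    def get(self, key):
        # 1. Binary-search the summary for the last sampled key <= key.
        slot = bisect.bisect_right(self.summary, key) - 1
        if slot < 0:
            return None  # key sorts before the first key in this SSTable
        # 2. Scan only the slice of the index that this summary entry covers.
        start = slot * self.summary_interval
        for index_key, pos in self.index[start:start + self.summary_interval]:
            if index_key == key:
                # 3. Read the row at that position in the data component.
                return self.data[pos][1]
        return None


table = SSTableFile({f"key{i:02d}": i * 10 for i in range(10)})
print(table.summary)        # ['key00', 'key04', 'key08']
print(table.get("key05"))   # 50
print(table.get("key99"))   # None
```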

