Saturday, November 3, 2018

Big Data Infrastructure

Frameworks and Tools


Computation

Hadoop, Spark, Samza, Flink, Hive, Pig, Drill, etc.

Transportation

Kafka, Flume, Sqoop, Scribe, RabbitMQ, ZeroMQ, IronMQ, etc.

Storage

HBase, Cassandra, CouchDB, MongoDB, etc.

Coordination

ZooKeeper, Consul, etcd, Eureka, etc.

Scheduling

Mesos, YARN, Oozie, etc.

Common Data Infrastructure


Data Ingestion Layer


  • High throughput
  • Simple processing logic; merely a pass-through
  • Cannot serve as a storage layer
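
The three properties above can be illustrated with a minimal sketch. It is not tied to any of the tools listed earlier; the class name `PassThroughIngest`, the `sink` callable, and the `capacity` parameter are all hypothetical names chosen for illustration. The stage forwards records unchanged and keeps only a small bounded buffer, so nothing can be read back from it later.

```python
from collections import deque


class PassThroughIngest:
    """Hypothetical ingestion stage: buffers briefly, forwards unchanged."""

    def __init__(self, sink, capacity=1000):
        self.sink = sink          # downstream storage layer (any callable)
        self.buffer = deque()     # short-lived buffer, not a store
        self.capacity = capacity  # flush threshold (assumed value)

    def receive(self, record):
        # No transformation: the record is queued exactly as received.
        self.buffer.append(record)
        if len(self.buffer) >= self.capacity:
            self.flush()

    def flush(self):
        # Hand every buffered record downstream; nothing is retained.
        while self.buffer:
            self.sink(self.buffer.popleft())


stored = []  # stands in for the storage layer
ingest = PassThroughIngest(sink=stored.append, capacity=3)
for event in ["a", "b", "c", "d"]:
    ingest.receive(event)
ingest.flush()  # drain the remainder
```

Once `flush()` returns, the ingestion stage holds no data, which is why durability has to come from the storage layer behind it.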

Data Storage Layer

(Operational Store [Indexed] + File System [Un-Indexed])

  • High availability
  • Fault tolerance
  • Handles high data volume
  • Able to handle various types of data

OLTP vs OLAP

"Online transaction processing" vs "Online analytical processing"
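
As a rough illustration of the difference, the sketch below runs both kinds of query against one in-memory SQLite database. The `orders` table and its columns are made up for this example, and a real deployment would usually put the two workloads on separate systems. The OLTP-style operation is a short transaction that touches one row by key; the OLAP-style query is a read-only aggregate that scans the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 10.0), ("bob", 25.0), ("alice", 5.0)],
)

# OLTP-style: a short transaction that updates a single row by primary key.
with conn:
    conn.execute("UPDATE orders SET amount = amount + 1 WHERE id = ?", (2,))
oltp_row = conn.execute(
    "SELECT customer, amount FROM orders WHERE id = ?", (2,)
).fetchone()

# OLAP-style: a read-only aggregate that scans every row in the table.
olap_rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

print(oltp_row)
print(olap_rows)
```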

Data Processing Layer
