zhangsquared

Monday, July 29, 2019

Introduction to AWS Storage Gateway

Storage Gateway

Amazon S3
Amazon S3 Glacier (Deep Archive)
Amazon EBS (elastic block store) snapshots

types
- File (protocol NFS, SMB)
- Volume (protocol iSCSI)
- Tape (protocol iSCSI VTL)

Friday, February 8, 2019

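Done crying, now carry on

Lately nothing has been going right; no matter how hard I struggle, nothing turns out well.
In the end, I always wind up crying alone.

But at least I still have myself.
Even if I can't trust other people, at least I can still trust myself.
As long as I don't give up, as long as I'm still alive, there will always be a way.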

Tuesday, January 15, 2019

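Don't feel like studying lately...

Don't want to study, don't want to study, don't want to study, don't want to study, don't want to study, don't want to study. What do I do?

A while ago my head was full of project ideas, inspiration kept coming, and my curiosity was strong.
Now it's completely empty, and I'm drifting through the days like a dead pig...

When will I get my interest back?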

Saturday, December 22, 2018

Why is it faster to process a sorted array than an unsorted array?

Tuesday, November 27, 2018

Types of NoSQL

There are 4 basic types of NoSQL databases:

Key-Value Store

- It has a big hash table of keys & values
e.g. MemcacheDB, Redis, Amazon Dynamo

The schema-less format of a key-value database like Redis suits simple storage needs. The key can be synthetic or auto-generated, while the value can be a string, JSON, a BLOB (binary large object), etc.
CAP: many key-value stores (Dynamo-style systems in particular) favor Availability and Partition tolerance and give up strong Consistency in exchange, often settling for eventual consistency.
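To make the "big hash table" idea concrete, here is a minimal in-memory sketch. The `KeyValueStore` class and the key names are made up for illustration; this is not the API of Redis or any other real store.

```python
import json
import uuid


class KeyValueStore:
    """Toy key-value store: one hash table, values are opaque to the store."""

    def __init__(self):
        self._table = {}

    def put(self, value, key=None):
        # The key can be supplied by the caller (synthetic) or auto-generated.
        if key is None:
            key = uuid.uuid4().hex
        self._table[key] = value
        return key

    def get(self, key, default=None):
        # Lookup is by key only; the store never inspects the value.
        return self._table.get(key, default)


store = KeyValueStore()
store.put("Alice", key="user:1:name")                         # string value
store.put(json.dumps({"theme": "dark"}), key="user:1:prefs")  # JSON value
store.put(b"\x89PNG...", key="user:1:avatar")                 # binary blob
auto_key = store.put("anonymous session")                     # generated key
```

Note that there is no way to ask "which users prefer the dark theme" without fetching and decoding every value yourself; the store only understands keys.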

Document-based Store

- It stores documents made up of tagged elements
e.g. MongoDB, CouchDB

The data is a collection of key-value pairs stored as documents. A document store is quite similar to a key-value store; the difference is that the stored values (referred to as “documents”) provide some structure and encoding of the managed data. XML, JSON (JavaScript Object Notation), and BSON (a binary encoding of JSON objects) are some common standard encodings.

One key difference between a key-value store and a document store is that the latter embeds attribute metadata associated with stored content, which essentially provides a way to query the data based on the contents.
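A rough sketch of what "query the data based on the contents" means, using plain Python dicts as documents. The `find` and `find_by_tag` helpers and the sample documents are hypothetical; real document stores such as MongoDB have their own query languages.

```python
# Each document is self-describing: the field names travel with the data.
documents = {
    "a1": {"title": "Intro to NoSQL", "author": "zhang", "tags": ["nosql", "db"]},
    "a2": {"title": "Mesos notes", "author": "zhang", "tags": ["cluster"]},
    "a3": {"title": "Graph basics", "author": "li", "tags": ["graph", "db"]},
}


def find(docs, **criteria):
    """Return ids of documents whose fields equal every given criterion."""
    return [
        doc_id
        for doc_id, doc in docs.items()
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


def find_by_tag(docs, tag):
    """Return ids of documents whose 'tags' list contains the tag."""
    return [doc_id for doc_id, doc in docs.items() if tag in doc.get("tags", [])]


by_author = find(documents, author="zhang")
by_tag = find_by_tag(documents, "db")
```

Because the store can see inside the value, the lookup no longer has to go through the key.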

Column-based Store

- Each storage block contains data from only one column
e.g. HBase, Cassandra

In a column-oriented NoSQL database, data is stored in cells grouped into columns rather than rows. Columns are logically grouped into column families. A column family can contain a virtually unlimited number of columns, which can be created at runtime or in the schema definition. Reads and writes are done by column rather than by row.

In comparison, most relational DBMSs store data in rows. The benefit of storing data in columns is fast search/access and data aggregation. A relational database stores a single row as a contiguous disk entry, and different rows are stored in different places on disk. A columnar database instead stores all the cells of a column as a contiguous disk entry, which makes search/access over that column faster.

For example, querying the titles of a million articles is a painstaking task in a row-oriented relational database, because it has to visit each row's location to fetch the title. In a columnar database, the titles of all the items sit together and can ideally be obtained with a single contiguous read of the title column.
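The row-versus-column layout difference can be sketched with ordinary Python lists. This only models the layout in memory (the article records are made up), not real disk I/O.

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"id": 1, "title": "A", "body": "...", "views": 10},
    {"id": 2, "title": "B", "body": "...", "views": 25},
    {"id": 3, "title": "C", "body": "...", "views": 7},
]

# Column-oriented layout: one contiguous list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Fetching all titles from rows touches every record.
titles_from_rows = [row["title"] for row in rows]

# Fetching all titles from columns reads exactly one list.
titles_from_columns = columns["title"]

# Aggregation over a single column is just as direct.
total_views = sum(columns["views"])
```

The trade-off runs the other way for "give me everything about article 2": the row layout has it in one place, while the column layout has to pick one cell out of every column.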

Data Model

ColumnFamily: a single structure that groups Columns and SuperColumns.
Key: the permanent name of the record. Different keys can have different numbers of columns, so the database can scale in an irregular way.
Keyspace: the outermost level of organization, typically named after the application. Kind of like a schema in an RDBMS.
Column: an ordered list of elements (a tuple) with a name and a value defined.
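One way to picture how these pieces nest is a dict of dicts: keyspace, then column family, then row key, then columns. This is only a mental model under that assumption, with made-up names (`blog_app`, `articles`, `get_column`); it is not how HBase or Cassandra lay data out internally.

```python
store = {
    "blog_app": {                      # Keyspace: usually the application name
        "articles": {                  # ColumnFamily: groups related columns
            "article:1": {             # Key: permanent name of the record
                "title": "Types of NoSQL",   # Column: name -> value
                "author": "zhang",
            },
            "article:2": {
                "title": "Mesos",
                "author": "zhang",
                "tags": "cluster",     # extra column that only this key has
            },
        }
    }
}


def get_column(data, keyspace, family, key, column, default=None):
    """Look up one column value; missing keys or columns yield the default."""
    return data[keyspace][family].get(key, {}).get(column, default)


tags_2 = get_column(store, "blog_app", "articles", "article:2", "tags")
tags_1 = get_column(store, "blog_app", "articles", "article:1", "tags")
```

The two rows have different numbers of columns, which is the "irregular" shape mentioned under Key.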

Graph-based

- A network database that uses edges and nodes to represent and store data.
e.g. Neo4J

These databases use edges and nodes to represent and store data.
The nodes are organized by their relationships with one another, which are represented by edges between the nodes.
Both the nodes and the relationships have some defined properties.
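A small property-graph sketch in plain Python, where both nodes and edges carry properties. The node names, relation names, and helper functions are invented for illustration; a real graph database like Neo4j would use its own query language instead.

```python
# Nodes: id -> properties
nodes = {
    "alice": {"label": "Person", "age": 30},
    "bob": {"label": "Person", "age": 25},
    "mesos": {"label": "Topic"},
}

# Edges: (source, relation, target, properties)
edges = [
    ("alice", "KNOWS", "bob", {"since": 2018}),
    ("alice", "LIKES", "mesos", {}),
    ("bob", "LIKES", "mesos", {"weight": 0.5}),
]


def neighbors(source, relation):
    """Follow outgoing edges of one relation type from a node."""
    return [dst for src, rel, dst, _props in edges
            if src == source and rel == relation]


def incoming(target, relation):
    """Find nodes that point at the target through one relation type."""
    return [src for src, rel, dst, _props in edges
            if dst == target and rel == relation]


alice_knows = neighbors("alice", "KNOWS")
mesos_fans = incoming("mesos", "LIKES")
```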

Monday, November 26, 2018

“Neurons that fire together, wire together”

Wednesday, November 14, 2018

Introduction to Mesos Architecture

Mesos consists of
1) a master daemon that manages agent daemons running on each cluster node

Allocation policy module
Enables fine-grained sharing of resources (CPU, RAM, …) across frameworks by making them resource offers.
Each resource offer contains a list of <agent ID, resource1: amount1, resource2: amount2, ...>

2) Mesos frameworks that run tasks on these agents.

Mesos framework consists of:
a scheduler that registers with the master to be offered resources
an executor process that is launched on agent nodes to run the framework’s tasks

1. A Mesos agent (formerly called a slave) reports its available resources to the Mesos master.
2. Based on the allocation policy module, the Mesos master decides which framework to offer these resources to. For example, it offers them to Framework 1.
3. Framework 1 is free to accept or decline the offered resources. For example, it accepts the offer.
4. The master sends the tasks to the agent, and the Framework 1 executor takes over. The Mesos master may offer any unused resources to other frameworks.
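The four steps above can be sketched as a toy simulation. The `Master` and `Framework` classes here are hypothetical and far simpler than the real Mesos scheduler API; the allocation policy is assumed to be "offer to frameworks in list order".

```python
class Framework:
    def __init__(self, name, task_demand):
        self.name = name
        self.task_demand = task_demand  # resources one task needs

    def consider(self, offer):
        """Step 3: accept by returning tasks to launch, or decline with []."""
        _agent_id, available = offer
        fits = all(available.get(res, 0) >= amount
                   for res, amount in self.task_demand.items())
        return [dict(self.task_demand)] if fits else []


class Master:
    def __init__(self, frameworks):
        self.frameworks = frameworks
        self.free = {}  # agent id -> resources not yet handed out

    def report(self, agent_id, resources):
        """Step 1: an agent reports its available resources."""
        self.free[agent_id] = dict(resources)

    def offer_round(self, agent_id):
        launched = []
        for framework in self.frameworks:  # step 2: assumed trivial policy
            offer = (agent_id, dict(self.free[agent_id]))
            for task in framework.consider(offer):
                # Step 4: the task goes to the agent; resources are consumed.
                for res, amount in task.items():
                    self.free[agent_id][res] -= amount
                launched.append((framework.name, agent_id, task))
        return launched  # whatever is left over stays available for others


master = Master([
    Framework("framework1", {"cpu": 2, "mem": 4}),
    Framework("framework2", {"cpu": 3, "mem": 1}),
])
master.report("agent-1", {"cpu": 4, "mem": 8})
launched = master.offer_round("agent-1")
```

In this run framework1 accepts and launches one task, and framework2 declines because only 2 CPUs remain on the agent.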

Two-Level Scheduling

The allocation module decides which resources to offer to each framework
The framework scheduler decides which offered resources to use for each task

How can the constraints of a framework be satisfied without Mesos knowing about these constraints? For example, how can a framework achieve data locality without Mesos knowing which nodes store the data required by the framework? Mesos answers these questions by simply giving frameworks the ability to reject offers.
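A minimal sketch of how rejecting offers can produce data locality. The function name, the `max_declines` cutoff, and the agent ids are assumptions for illustration; the master never learns where the data lives, it just sees offers being declined.

```python
def schedule_with_locality(offers, preferred_agents, max_declines=2):
    """Pick an agent for one task, declining offers from non-local agents.

    offers: iterable of (agent_id, resources) in the order the master sends them.
    preferred_agents: agents that hold the task's input data (known only to
        the framework, never to the master).
    max_declines: after this many declines, give up on locality and take
        whatever comes next.
    Returns (chosen_agent_id or None, number_of_declined_offers).
    """
    declines = 0
    for agent_id, _resources in offers:
        if agent_id in preferred_agents or declines >= max_declines:
            return agent_id, declines
        declines += 1  # reject and wait for a better offer
    return None, declines


offers = [
    ("agent-1", {"cpu": 4}),
    ("agent-2", {"cpu": 4}),
    ("agent-3", {"cpu": 4}),
]

local_choice = schedule_with_locality(offers, preferred_agents={"agent-3"})
fallback_choice = schedule_with_locality(offers, preferred_agents={"agent-9"},
                                         max_declines=1)
```

The cutoff is a design choice: waiting longer improves the odds of a local placement, but delays the task.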

Scheduling Algorithm

Dominant Resource Fairness Algorithm (DRF)

Max-Min Fairness Algorithm: to achieve fairness, the user with the smallest allocation gets the highest priority
DRF is a generalization of Max-Min Fairness to heterogeneous resources
- CPU
- Memory
- IO
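A short sketch of the DRF idea: each user's dominant share is the largest fraction of any one resource they hold, and the next task always goes to the user with the smallest dominant share. The function name and the cluster numbers below are illustrative, and the sketch assumes every user has a fixed, non-zero per-task demand; it is not the actual Mesos allocator code.

```python
from fractions import Fraction


def drf_allocate(capacity, demands):
    """Hand out tasks one at a time to the user with the lowest dominant share.

    capacity: total amount of each resource, e.g. {"cpu": 9, "mem": 18}
    demands:  per-task demand of each user, e.g. {"A": {"cpu": 1, "mem": 4}}
    Stops as soon as the chosen user's next task no longer fits.
    """
    used = {res: 0 for res in capacity}
    allocated = {user: {res: 0 for res in capacity} for user in demands}
    tasks = {user: 0 for user in demands}

    def dominant_share(user):
        # Largest fraction of any single resource this user currently holds.
        return max(Fraction(allocated[user][res], capacity[res])
                   for res in capacity)

    while True:
        user = min(demands, key=dominant_share)
        demand = demands[user]
        if any(used[res] + demand.get(res, 0) > capacity[res]
               for res in capacity):
            break  # cluster is full for the neediest user
        for res in capacity:
            used[res] += demand.get(res, 0)
            allocated[user][res] += demand.get(res, 0)
        tasks[user] += 1

    return tasks, {user: dominant_share(user) for user in demands}


tasks, shares = drf_allocate(
    capacity={"cpu": 9, "mem": 18},
    demands={"A": {"cpu": 1, "mem": 4}, "B": {"cpu": 3, "mem": 1}},
)
```

With these numbers, A (memory-heavy) ends up with 3 tasks and B (CPU-heavy) with 2, and both hold a dominant share of 2/3: A has 12 of 18 GB of memory, B has 6 of 9 CPUs.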

Similar Systems

YARN, Kubernetes, Docker Swarm