Channel: memory – Gea-Suan Lin's BLOG
Viewing all articles
Browse latest Browse all 93

Putting HDFS metadata on NewSQL

Moving the data held by HDFS's metadata server, its performance bottleneck, onto a NewSQL database yields a large performance gain: "HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases".

In this paper, we introduce HopsFS, a next generation distribution of the Hadoop Distributed File System (HDFS) that replaces HDFS’ single node in-memory metadata service, with a distributed metadata service built on a NewSQL database.

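The tests based on Spotify's workload stand out, showing a 16x to 37x improvement (this presumably refers to the part of a job that touches HDFS, such as an analysis that pulls data from outside into HDFS, rather than an improvement in overall efficiency):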

Metadata capacity has been increased to at least 37 times HDFS’ capacity, and in experiments based on a workload trace from Spotify, we show that HopsFS supports 16 to 37 times the throughput of Apache HDFS.

The paper mentions that it uses NDB from MySQL Cluster (in-memory):

HopsFS stores all metadata normalized in a highly available, in-memory, distributed, relational database called Network Database (NDB), a NewSQL storage engine for MySQL Cluster.
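
To make "metadata normalized in a relational database" a bit more concrete, here is a minimal sketch, not HopsFS's actual schema. It assumes one row per inode keyed by (parent_id, name), and it uses Python's built-in SQLite as a stand-in for NDB. The table name `inodes`, its columns, and the helpers `build_demo_db` and `resolve` are all hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding a tiny, made-up namespace."""
    # SQLite stands in for NDB here purely for illustration.
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per file or directory.
    conn.execute(
        "CREATE TABLE inodes ("
        " id INTEGER PRIMARY KEY,"
        " parent_id INTEGER NOT NULL,"
        " name TEXT NOT NULL,"
        " is_dir INTEGER NOT NULL,"
        " UNIQUE (parent_id, name))"
    )
    rows = [
        (1, 0, "", 1),          # root directory
        (2, 1, "user", 1),      # /user
        (3, 2, "alice", 1),     # /user/alice
        (4, 3, "data.csv", 0),  # /user/alice/data.csv
    ]
    conn.executemany("INSERT INTO inodes VALUES (?, ?, ?, ?)", rows)
    return conn


def resolve(conn, path):
    """Resolve a path one component at a time; return the inode id or None."""
    inode_id = 1  # start at the root inode
    for part in [p for p in path.split("/") if p]:
        row = conn.execute(
            "SELECT id FROM inodes WHERE parent_id = ? AND name = ?",
            (inode_id, part),
        ).fetchone()
        if row is None:
            return None
        inode_id = row[0]
    return inode_id


if __name__ == "__main__":
    conn = build_demo_db()
    print(resolve(conn, "/user/alice/data.csv"))
```

Each path component becomes an indexed lookup on (parent_id, name), which is the kind of access pattern a distributed database can spread across nodes instead of keeping the whole namespace in a single server's memory.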

This should give the Hadoop folks a direction for improvement...

