Sunday, November 2, 2014

Large-scale graph data process

Apache Giraph - is an iterative graph processing system originated as open-source Google Pregel, that implements BSP (Bulk Synchronous Parallel) model of distributed computation based on Hadoop.

Giraph uses Hadoop for scheduling workers and loading data into memory, but then implements its own communication protocol using HadoopRPC. Jobs are run as Mapper only job on Hadoop.

Apache Hama - is a pure BSP computing engine that generally used for iterative processing like Matrix, Graph. Jobs are runs as BSP job on HDFS only.