On using MapReduce to scale algorithms for Big Data analytics: a case study

Abstract

Introduction: Many data analytics algorithms are originally designed for in-memory data. Parallel and distributed computing is a natural first remedy to scale these algorithms to “Big algorithms” for large-scale data. Advances in many Big Data analytics algorithms are contributed by MapReduce, a programming paradigm that enables parallel and distributed execution of massive data processing on large clusters of machines. Much research has focused on building efficient naive MapReduce-based algorithms or extending MapReduce mechanisms to enhance performance. However, we argue that these should not be the only research directions to pursue. We conjecture that when naive MapReduce-based solutions do not perform well, it could be because certain classes of algorithms are not amendable to MapReduce model and one should find a fundamentally different approach to a new MapReduce-based solution. Case description: This paper investigates a case study of a scaling problem of “Big algorithms” for a popular association rule-mining algorithm, particularly the development of Apriori algorithm in MapReduce model. Discussion and evaluation: Formal and empirical illustrations are explored to compare our proposed MapReduce-based Apriori algorithm with previous solutions. The findings support our conjecture and our study shows promising results compared to the state-of-the-art performer with 7% increase in performance on the average of transactions ranging from 10,000 to 120,000. Conclusions: The results confirm that effective MapReduce implementation should avoid dependent iterations, such as that of the original sequential Apriori algorithm. These findings could lead to many more alternative non-naive MapReduce-based “Big algorithms”.

Description

© 2019, The Author(s). cc-by

Rights

Availability

Keywords

Association rules mining, Big Data analytics algorithms, MapReduce, Parallel computing

Citation

Kijsanayothin, P., Chalumporn, G., & Hewett, R.. 2019. On using MapReduce to scale algorithms for Big Data analytics: a case study. Journal of Big Data, 6(1). https://doi.org/10.1186/s40537-019-0269-1

Collections