Applying big data analytics on integrated cybersecurity datasets

Date

2015-05

Abstract

With the growing prevalence of cyber threats, various security monitoring systems are being employed to protect networks and resources from cyber attacks. The large network datasets that these systems generate need an efficient design for integrating and processing them quickly. In this research, a storage design scheme has been developed using HBase and Hadoop that can efficiently integrate, store, and retrieve security-related datasets. The design scheme is a value-based data integration approach, where data is integrated by columns instead of by rows. Since rowkeys are the most important aspect of HBase table design and performance, a rowkey design was chosen based on the columns most frequently accessed by the use cases for retrieving dataset statistics. Tests conducted on various schema design alternatives show that datasets are stored and retrieved at a higher rate with the model designed in this research than with the standard method of storing data in HBase. Network datasets representing DDoS attacks were used for integration in this research. Use case requirements related to the characteristics of attacker IP addresses in the integrated datasets were identified in order to generate statistical data. This statistical data was used to run the Logistic Regression (LR) classification algorithm, which classifies the network traffic data into attack-related and non-attack-related traffic. The Fuzzy k-Means (FKM) algorithm was also used to create clusters of attackers and non-attackers to segregate the attack-related traffic from the network datasets. The results obtained from the two algorithms show that both LR and FKM can successfully classify the network traffic datasets into attackers and non-attackers.
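
As an illustration of the clustering step mentioned in the abstract, the following is a minimal sketch of the standard fuzzy k-means update rules in plain Python. It is not the implementation used in this research, which is not shown in this record. The feature names (packets per second, distinct destination ports), the sample values, and the function names `fuzzy_k_means` and `soft_memberships` are all hypothetical and chosen only for illustration.

```python
import math


def soft_memberships(points, centers, m):
    """Return one row per point: its degree of membership in each cluster."""
    rows = []
    for p in points:
        # Clamp distances so a point sitting exactly on a center does not divide by zero.
        dists = [max(math.dist(p, c), 1e-12) for c in centers]
        # Standard fuzzy k-means rule: u_j = 1 / sum_l (d_j / d_l)^(2 / (m - 1))
        exponent = 2.0 / (m - 1.0)
        rows.append([1.0 / sum((dj / dl) ** exponent for dl in dists) for dj in dists])
    return rows


def fuzzy_k_means(points, k=2, m=2.0, iters=50):
    """Cluster equal-length numeric tuples; m > 1 is the fuzziness parameter."""
    # Deterministic start: spread the initial centers across the sorted points.
    ordered = sorted(points)
    centers = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    dim = len(points[0])
    for _ in range(iters):
        u = soft_memberships(points, centers, m)
        new_centers = []
        for j in range(k):
            # Each center is the mean of all points, weighted by membership^m.
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            new_centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ))
        centers = new_centers
    return centers, soft_memberships(points, centers, m)


# Hypothetical per-source-IP statistics: (packets per second, distinct destination ports).
points = [(5.0, 2.0), (7.0, 3.0), (6.0, 2.0),
          (900.0, 1.0), (950.0, 1.0), (880.0, 2.0)]
centers, memberships = fuzzy_k_means(points, k=2)

for p, row in zip(points, memberships):
    print(p, [round(u, 3) for u in row])
```

Unlike hard k-means, each source IP receives a degree of membership in every cluster, so traffic near the boundary between the two groups can be flagged for closer inspection instead of being forced into one label.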

Rights Availability

Unrestricted.

Keywords

Cybersecurity, Cyber Attacks, HBase, Hadoop, Datasets, DDoS Attacks, Machine Learning, Data Mining

Citation