Date of Completion

3-26-2013

Embargo Period

3-26-2018

Keywords

Artificial Immune System

Major Advisor

Sanguthevar Rajasekaran

Co-Major Advisor

Reda A. Ammar

Associate Advisor

Chun-Hsi Huang

Associate Advisor

Swapna Gokhale

Field of Study

Computer Science and Engineering

Degree

Doctor of Philosophy

Open Access

Campus Access

Abstract

When building complex systems that resemble the intelligence or efficiency found in natural systems, computer scientists have often turned to biological systems for inspiration in solving complex computational problems. One of the most sophisticated biological systems is the Natural Immune System (NIS), a distributed, multi-layered, adaptive, dynamic, and lifelong learning system. The Artificial Immune System (AIS) is a computational system inspired by the principles and processes of the NIS. Since its emergence in the 1990s, the field of AIS has achieved some success as a branch of computational intelligence, with successful applications in computer security, optimization, anomaly detection, and data mining. Data mining is the process of discovering patterns in large data sets, and one of its branches is Associative Classification (AC). AC algorithms integrate association rule discovery and classification to build a classifier from training data that predicts the class of unseen test data. However, traditional AC algorithms typically search for all possible association rules in order to find a representative subset of them. Since the search space of such rules may grow exponentially as the support threshold decreases, the rule discovery process can be computationally expensive. One effective way to tackle this problem is to directly find a set of high-quality association rules that can build a highly accurate classifier.
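To make the AC terminology concrete, here is a minimal Python sketch of class association rules scored by support and confidence, together with a classifier that applies the highest-ranked matching rule. The toy transactions, the names `support_confidence` and `classify`, and the (confidence, support) ranking with a default class are illustrative assumptions in the style of common AC methods, not the rule representation used in this work.

```python
from typing import FrozenSet, List, Tuple

# Toy representation (assumed): a transaction is (items, class label),
# and a class association rule is (antecedent itemset, predicted class).
Transaction = Tuple[FrozenSet[str], str]
Rule = Tuple[FrozenSet[str], str]


def support_confidence(rule: Rule, data: List[Transaction]) -> Tuple[float, float]:
    """Return (support, confidence) of `rule` over the training data."""
    antecedent, label = rule
    # Class labels of transactions whose items contain the rule's antecedent.
    covered = [cls for items, cls in data if antecedent <= items]
    hits = sum(1 for cls in covered if cls == label)
    support = hits / len(data) if data else 0.0
    confidence = hits / len(covered) if covered else 0.0
    return support, confidence


def classify(items: FrozenSet[str], rules: List[Rule],
             data: List[Transaction], default: str) -> str:
    """Predict with the matching rule ranked highest by (confidence, support);
    fall back to `default` if no rule matches."""
    matching = [r for r in rules if r[0] <= items]
    if not matching:
        return default
    best = max(matching, key=lambda r: support_confidence(r, data)[::-1])
    return best[1]


train = [
    (frozenset({"a", "b"}), "yes"),
    (frozenset({"a"}), "yes"),
    (frozenset({"b"}), "no"),
    (frozenset({"a", "c"}), "no"),
]
rules = [(frozenset({"a"}), "yes"), (frozenset({"b"}), "no")]
print(support_confidence(rules[0], train))  # (0.5, 0.6666666666666666)
print(classify(frozenset({"a", "b"}), rules, train, default="no"))  # yes
```

Searching every possible antecedent up front is what makes traditional AC expensive; the sketch only scores rules that are already given.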

To achieve this efficiently, this work integrates two novel algorithms: ML-DS (Multi-Level Deterministic Sampling) and AC-CS (Associative Classification with Clonal Selection). ML-DS is a deterministic sampling algorithm that attempts to improve accuracy without sacrificing running time. It begins with a large sample deterministically selected from the dataset and then proceeds in levels. First, it divides the remaining data into disjoint groups of equal size. Each group is in turn recursively divided into smaller disjoint subgroups of equal size. A distance measure is then computed between each subgroup and its original group; subgroups with the minimum distance are retained, and the others are discarded. The process is repeated until the number of remaining transactions equals a desired sampling threshold. We employ this sampling strategy to pick a representative training sample to begin with.

AC-CS is an AC algorithm inspired by the clonal selection algorithm. It begins with a small population of frequent single-item rules. These rules then go through cloning, mutation, and pruning for several generations, and only high-quality rules are added to the memory pool. These rules are then applied to classify the test dataset. In short, after picking a representative sample of the original data, the approach proceeds in an evolutionary fashion to generate only rules that are likely to yield good classification accuracy. Empirical results on several real datasets show that the approach generates dramatically fewer rules than traditional AC algorithms. In general, the accuracy of AC-CS with sampling is very close to that without sampling, while the running time is clearly reduced on all datasets. This indicates that our sampling approach is very effective at producing a representative sample of the original full dataset.
In addition, the proposed approach is significantly more efficient than traditional AC algorithms while achieving competitive accuracy.
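The ML-DS sampling loop can be sketched in Python as follows. This is a minimal, hypothetical illustration: the abstract does not specify the distance measure, so the L1 distance between item-frequency profiles is an assumption, as are the parameters `groups` and `k` and the choice to keep exactly one subgroup per group at each level. The initial deterministic sample is omitted, and the recursive division is expressed as repeated levels over the whole pool.

```python
from collections import Counter
from typing import Dict, FrozenSet, List

Transaction = FrozenSet[str]


def item_freqs(transactions: List[Transaction]) -> Dict[str, float]:
    """Relative frequency of each item across a non-empty list of transactions."""
    counts = Counter(item for t in transactions for item in t)
    return {item: c / len(transactions) for item, c in counts.items()}


def l1_distance(p: Dict[str, float], q: Dict[str, float]) -> float:
    """L1 distance between two item-frequency profiles (assumed metric)."""
    return sum(abs(p.get(i, 0.0) - q.get(i, 0.0)) for i in set(p) | set(q))


def split(seq: List[Transaction], parts: int) -> List[List[Transaction]]:
    """Split into `parts` contiguous disjoint chunks of equal size (remainder dropped)."""
    size = len(seq) // parts
    return [seq[i * size:(i + 1) * size] for i in range(parts)]


def ml_ds(data: List[Transaction], threshold: int,
          groups: int = 4, k: int = 2) -> List[Transaction]:
    """Hypothetical multi-level sampling: at each level, split the pool into
    `groups` equal groups, split each group into `k` subgroups, and keep the
    subgroup whose item profile is closest to its parent group's profile."""
    pool = list(data)
    # Stop at the threshold, or when groups can no longer be split into k parts.
    while len(pool) > threshold and len(pool) >= groups * k:
        kept: List[Transaction] = []
        for group in split(pool, groups):
            profile = item_freqs(group)
            best = min(split(group, k),
                       key=lambda sub: l1_distance(item_freqs(sub), profile))
            kept.extend(best)
        pool = kept
    return pool[:threshold]


data = [frozenset({"a"}) if i % 2 == 0 else frozenset({"b"}) for i in range(64)]
sample = ml_ds(data, threshold=8)
print(len(sample))  # 8
```

Because each level keeps one of `k` subgroups per group, the pool shrinks by roughly a factor of `k` per level, so the number of levels grows only logarithmically with the dataset size.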

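Similarly, the clonal selection loop of AC-CS can be sketched as below. The fitness measure (confidence), the clone count proportional to confidence, the single-item mutation, and the support-based pruning are assumptions made for illustration; the function `clonal_rule_search` and its parameters are hypothetical and do not reproduce the exact AC-CS operators.

```python
import random
from typing import FrozenSet, List, Set, Tuple

Transaction = Tuple[FrozenSet[str], str]  # (items, class label)
Rule = Tuple[FrozenSet[str], str]         # (antecedent itemset, class label)


def score(rule: Rule, data: List[Transaction]) -> Tuple[float, float]:
    """(support, confidence) of a class association rule over non-empty `data`."""
    antecedent, label = rule
    covered = [cls for items, cls in data if antecedent <= items]
    hits = sum(cls == label for cls in covered)
    return hits / len(data), (hits / len(covered) if covered else 0.0)


def clonal_rule_search(data: List[Transaction], min_sup: float = 0.1,
                       min_conf: float = 0.8, generations: int = 5,
                       max_clones: int = 3, seed: int = 0) -> Set[Rule]:
    rng = random.Random(seed)  # seeded so runs are reproducible
    items = sorted({i for t, _ in data for i in t})
    labels = sorted({c for _, c in data})
    # Initial population: frequent single-item rules.
    population = [r for r in ((frozenset([i]), c) for i in items for c in labels)
                  if score(r, data)[0] >= min_sup]
    memory: Set[Rule] = set()
    for _ in range(generations):
        offspring: List[Rule] = []
        for antecedent, label in population:
            conf = score((antecedent, label), data)[1]
            if conf >= min_conf:
                memory.add((antecedent, label))  # high-quality rule -> memory pool
            # Cloning: higher-confidence rules receive more clones (assumed scheme).
            for _ in range(1 + int(conf * max_clones)):
                # Mutation: add one randomly chosen item to the antecedent.
                offspring.append((antecedent | {rng.choice(items)}, label))
        # Pruning: keep distinct, frequent offspring in a deterministic order.
        population = sorted({r for r in offspring if score(r, data)[0] >= min_sup},
                            key=lambda r: (sorted(r[0]), r[1]))
    # Rules produced in the last generation may also qualify for memory.
    memory.update(r for r in population if score(r, data)[1] >= min_conf)
    return memory


train = [
    (frozenset({"a", "b"}), "yes"),
    (frozenset({"a"}), "yes"),
    (frozenset({"b"}), "no"),
    (frozenset({"c"}), "no"),
]
memory = clonal_rule_search(train)
print((frozenset({"a"}), "yes") in memory)  # True
```

Only rules that survive pruning are ever evaluated, so the search never enumerates the full rule space, which is the efficiency argument the abstract makes for AC-CS.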