Abstract
Outlier detection is an important problem that has been studied across many research and application domains. It aims to detect objects that are considerably distinct from, exceptional to, or inconsistent with the majority of the data in an input data set. Many outlier detection techniques have been developed specifically for certain application domains. Identifying abnormal data that form non-conforming patterns is referred to as outlier detection or anomaly detection, and it supports knowledge discovery. Many outlier detection methods have been proposed based on clustering, classification, statistics, and frequent patterns. Among them, information-theoretic methods offer a different perspective, although their computation still rests on a statistical approach. Outlier detection in unsupervised data sets is more challenging, since there is no inherent measure of distance between the objects. We propose two practical 1-parameter outlier detection methods, named ITB-SS and ITB-SP, which require no user-defined parameters for deciding whether an object is an outlier. Users need only provide the number of outliers they want to detect in a given data set. Experimental results show that ITB-SS and ITB-SP are more effective and efficient than mainstream methods and can be used to deal with both large and high-dimensional data sets where existing algorithms fail to work.
Outlier detection, often known as anomaly detection, is used in a wide range of real-time applications, including medical, industrial, e-commerce, security, and engineering systems. Outliers arise from faults in systems, changes in the system, human errors, and behavioral and instrumental errors. Detecting these outliers helps identify fraud and faults early, before they seriously affect the system.
Data sets such as transaction data, financial records in commercial banks, and demographic data consist of non-numerical attributes, known as categorical data. Existing unsupervised methods are applicable to numerical data sets; however, they do not work with categorical data.