Abstract
In many text mining applications, such as those involving scientific research publication data or the Internet Movie Database, meta-information (side-information) is linked with the text document collection. Such attributes may contain a tremendous amount of information for clustering purposes. However, the relative importance of this side-information may be difficult to estimate, especially when some of the information is noisy. Additionally, it can be risky to incorporate side-information into the mining process, because it can either improve the quality of the representation or add noise to the process. Therefore, this paper explores a way to perform the mining process so as to maximize the advantages of using this side-information in text mining applications. It uses the COntent and Auxiliary attribute based TExt clustering Algorithm (COATES), which combines classical partitioning algorithms with probabilistic models to create an effective clustering approach, and considers its extension to the classification problem.