Abstract
In many text mining applications, side information is available along with the text documents. Such side information may be of different kinds, such as the links in the document, document provenance information, user-access behavior from web logs, or other non-textual attributes. These attributes may contain a large amount of information useful for clustering purposes. However, the relative importance of this side information may be difficult to estimate, especially when some of it is noisy. In such cases, it can be risky to incorporate side information into the mining process, because it can either improve the quality of the representation for the mining process or add noise to it. In this paper, we design an algorithm that combines classical partitioning algorithms with probabilistic models in order to create an effective clustering approach. We then show how to extend the approach to the classification problem.