Clustering techniques are used to automatically organize or summarize large collections of text, and many clustering approaches have been proposed. As described below, for the purposes of this work we are particularly interested in two of them: coclustering and constrained clustering. This thesis proposes a novel constrained coclustering method to achieve two goals. First, it combines information-theoretic coclustering and constrained clustering to improve clustering performance. Second, it adopts both supervised and unsupervised constraints to demonstrate the effectiveness of the algorithm.


The unsupervised constraints are automatically derived from existing knowledge sources, saving the effort and cost of obtaining manually labeled constraints. To achieve the first goal, we develop a two-sided hidden Markov random field (HMRF) model to represent both document and word constraints, and use an alternating expectation maximization (EM) algorithm to optimize the model. To achieve the second goal, the thesis proposes two novel methods to automatically construct and incorporate constraints in support of unsupervised constrained clustering: 1) automatic construction of document constraints, and 2) automatic construction of word constraints. The evaluation results demonstrate the superiority of our approaches over a number of existing approaches.

Unlike existing approaches, this thesis applies stop word removal, stemming, and synonym word replacement to capture semantic similarity between words in the documents. In addition, content can be retrieved from plain text files, HTML pages, and XML pages. Tags are eliminated from HTML files, while in XML files attribute names and values are treated as ordinary paragraph words; the same preprocessing (stop word removal, stemming, and synonym word replacement) is then applied, as sketched below.
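The following is a minimal Python sketch of the preprocessing and content-extraction steps described above. It assumes the NLTK library with its stopwords and WordNet corpora; the specific synonym-replacement strategy (mapping each word to the first lemma of its first WordNet synset) and the ordering of synonym replacement before stemming are illustrative assumptions, not the thesis's exact procedure.

```python
# Preprocessing sketch: HTML/XML content extraction, stop word removal,
# synonym replacement, and stemming. Requires NLTK with the 'stopwords'
# and 'wordnet' corpora downloaded:
#   nltk.download('stopwords'); nltk.download('wordnet')
import re
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

from nltk.corpus import stopwords, wordnet
from nltk.stem import PorterStemmer

STOPWORDS = set(stopwords.words("english"))
STEMMER = PorterStemmer()


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML page, discarding all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def extract_html_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)


def extract_xml_text(xml: str) -> str:
    # Attribute names and values are kept as ordinary words,
    # alongside the element text, as described above.
    root = ET.fromstring(xml)
    words = []
    for elem in root.iter():
        for name, value in elem.attrib.items():
            words.extend([name, value])
        if elem.text:
            words.append(elem.text)
    return " ".join(words)


def canonical_synonym(word: str) -> str:
    # Assumption: collapse synonyms onto a shared representative by taking
    # the first lemma of the word's first WordNet synset.
    synsets = wordnet.synsets(word)
    if synsets:
        return synsets[0].lemmas()[0].name().lower()
    return word


def preprocess(text: str) -> list[str]:
    tokens = re.findall(r"[a-zA-Z]+", text.lower())
    tokens = [t for t in tokens if t not in STOPWORDS]  # stop word removal
    tokens = [canonical_synonym(t) for t in tokens]     # synonym replacement
    return [STEMMER.stem(t) for t in tokens]             # stemming


if __name__ == "__main__":
    page = "<html><body><p>Clustering groups similar documents.</p></body></html>"
    print(preprocess(extract_html_text(page)))
```

Synonym replacement is performed before stemming so that WordNet lookups operate on full word forms rather than stems; the resulting tokens can then be used to build the document-word matrix for coclustering.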