Abstract
The number of multilingual documents generated on the internet is increasing day by day. Multilingual document clustering (MDC) is a technique for classifying documents written in different languages. Classifying documents in languages that lack a labeled training data set is a major challenge. The two major approaches used to date are machine translation of documents prior to classification and the use of bilingual dictionaries to translate trained classification models. This paper surveys various MDC challenges and techniques. The major focus is on the problem of translating documents and classifying them semantically.