Abstract
A stemmer plays a vital role in Natural Language Processing applications, where it improves the accuracy of tasks such as indexing in an Information Retrieval System (IRS), as measured by recall and precision, POS tagging, and search engines. Stemming is a pre-processing step that extracts the root, stem, or base form of a word. Stemmers are already available for classes of simple words, such as the plural forms of nouns and the inflected forms of verbs. This paper presents a stemmer for joined or compound words, a class of words in which the beginning character of the second word merges with some form of the ending character of the first word. The stemmer uses a dictionary of root (stem or base) words to extract the constituent roots of a compound word. Its accuracy improves as the number of root words in the dictionary grows. It works as a context-based stemmer: it selects the second root of the compound based on context. It therefore needs to collect the roots of words in the noun and verb classes.
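The dictionary lookup described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function name `split_compound`, the toy root dictionary `ROOTS`, and the single junction rule in `JUNCTION_RULES` are all assumptions introduced here for illustration. The sketch tries every split point of a word, once as a plain concatenation and once treating the character at the split as a merged junction character that expands into the ending of the first root and the beginning of the second.

```python
from typing import Dict, List, Set, Tuple

# Toy root dictionary (hypothetical; a real stemmer would load a large lexicon).
ROOTS: Set[str] = {"maha", "indra", "sun", "light"}

# Hypothetical junction rules: a surface character at the junction maps to
# possible (ending of first root, beginning of second root) pairs.
JUNCTION_RULES: Dict[str, List[Tuple[str, str]]] = {"e": [("a", "i")]}


def split_compound(
    word: str,
    roots: Set[str] = ROOTS,
    rules: Dict[str, List[Tuple[str, str]]] = JUNCTION_RULES,
) -> List[Tuple[str, str]]:
    """Return every (first_root, second_root) pair that could form `word`."""
    candidates: List[Tuple[str, str]] = []
    for i in range(1, len(word)):
        # Case 1: plain concatenation, nothing changes at the junction.
        left, right = word[:i], word[i:]
        if left in roots and right in roots:
            candidates.append((left, right))
        # Case 2: the character at position i is a merged junction character;
        # undo the merge and look both restored parts up in the dictionary.
        for end, start in rules.get(word[i], []):
            left, right = word[:i] + end, start + word[i + 1:]
            if left in roots and right in roots:
                candidates.append((left, right))
    return candidates


if __name__ == "__main__":
    print(split_compound("sunlight"))
    print(split_compound("mahendra"))
```

The function returns all candidate pairs rather than a single answer. Choosing among several candidates by context, as the abstract describes for the second root, is left out of this sketch, as is any handling of compounds with more than two parts.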