Abstract
Probabilistic classifiers produce outputs that can be interpreted as conditional probabilities, that is, as a distribution over classes given an input sample. Users rely on sequence patterns to search for chemical formulae and chemical names. Indexing sequence patterns that involve chemical formulae identifies and indexes occurrences of certain patterns for efficient search and retrieval. However, identifying chemical formulae remains a fundamental problem as formulae appear increasingly often in sequences. This work addresses the problem with a feature subset selection and indexing method called Chemical Structured Bond Tree-based Indexing (CSBT-I). The algorithms in the CSBT-I method are analyzed for different sequence patterns and improve chemical bond indexing accuracy over existing methods by using a Bond Tree-based Structure and a sequential feature selection algorithm. The bond tree-based structure is created as a temporary indexing structure for a particular indexing requirement and purpose, which reduces the tree structure computation time. After the tree-based structure has been created for several sequential patterns, indexing is performed using the Bond Indexed Sequence, in which several sequential patterns are analyzed to improve search performance for chemical information. Chemical information indexing uses multi-level index pruning aided by the sequential feature selection algorithm, which identifies and selects frequent and selective chemical molecular information as features to index, thereby reducing the chemical bond indexing time. Finally, to support user-provided search queries that require a match on chemical names used as keywords, all possible sub-formulae of the formulae that appear in any sequence are indexed. This in turn prunes the indices substantially without significantly compromising the quality of the returned results.
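To make the idea of sub-formula indexing with frequency-based pruning concrete, the following is a minimal illustrative sketch in Python. It is not the CSBT-I algorithm itself. The function names (parse_formula, sub_formulae, build_index, prune_index), the flat formula syntax without parentheses, the alphabetical canonical ordering of elements, and the document-frequency thresholds min_docs and max_docs are all assumptions introduced here for illustration.

```python
import re
from collections import defaultdict
from itertools import product


def parse_formula(formula):
    """Parse a flat formula such as 'C2H6O' into {'C': 2, 'H': 6, 'O': 1}.

    Assumption: no parentheses, charges, or hydrates are handled.
    """
    counts = defaultdict(int)
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return dict(counts)


def format_counts(counts):
    """Render element counts as a canonical string (elements sorted by symbol)."""
    return "".join(
        f"{element}{n if n > 1 else ''}"
        for element, n in sorted(counts.items())
        if n > 0
    )


def sub_formulae(formula):
    """Enumerate every non-empty sub-formula (each element count from 0 to its maximum)."""
    counts = parse_formula(formula)
    elements = sorted(counts)
    result = set()
    for combo in product(*(range(counts[e] + 1) for e in elements)):
        if any(combo):  # skip the empty sub-formula
            result.add(format_counts(dict(zip(elements, combo))))
    return result


def build_index(documents):
    """Map each sub-formula to the set of document ids in which it occurs.

    `documents` is a hypothetical mapping: doc_id -> list of formulae found in it.
    """
    index = defaultdict(set)
    for doc_id, formulae in documents.items():
        for formula in formulae:
            for sub in sub_formulae(formula):
                index[sub].add(doc_id)
    return index


def prune_index(index, min_docs, max_docs):
    """Keep only features that are frequent (>= min_docs) yet selective (<= max_docs).

    Assumption: frequency and selectivity are both measured by document frequency.
    """
    return {
        sub: docs
        for sub, docs in index.items()
        if min_docs <= len(docs) <= max_docs
    }


if __name__ == "__main__":
    docs = {1: ["H2O"], 2: ["H2O2"], 3: ["CH4"]}
    index = build_index(docs)
    pruned = prune_index(index, min_docs=2, max_docs=2)
    for sub in sorted(pruned):
        print(sub, sorted(pruned[sub]))
```

In this toy setting a query formula would be canonicalized with format_counts(parse_formula(query)) before lookup, so that the query key matches the keys stored in the index. The thresholds trade index size against recall: features that occur in too few or too many documents are dropped from the index.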