Abstract
Character recognition can be performed on a variety of image datasets. In the proposed method, binarization and edge detection are carried out separately on the three colour planes of the image. Connected components (CCs) are obtained from the binarized image and filtered by thresholds on their area and aspect ratio; only CCs that contain sufficient edge pixels are retained. The remaining candidate text components are represented as a graph whose nodes are the centroids of the individual CCs. Long edges of the minimum spanning tree of this graph are broken, and the pairwise height ratio is used to remove likely non-text components. A new minimum spanning tree is then created from the remaining nodes. Horizontal grouping is performed on the CCs to generate bounding boxes of text strings. Overlapping bounding boxes are removed using an overlap-area threshold, and the non-overlapping and minimally overlapping bounding boxes are used for text segmentation. Vertical splitting is applied to generate bounding boxes at the word level. After segmentation and localization, the characters are recognized using correlation.
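
The graph step can be illustrated with a minimal sketch, given here only as one possible reading of the description. It assumes that CC centroids are plain (x, y) tuples, that edge length is the Euclidean distance between centroids, and that an edge counts as "long" when it exceeds a fixed threshold. The names `minimum_spanning_tree`, `break_long_edges`, and `max_len` are hypothetical, and the threshold value is chosen purely for illustration.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph over `points`.

    Returns a list of (i, j, length) edges, where i and j index `points`.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection of each node to the tree
    parent = [-1] * n       # tree node that provides that connection
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        # Relax distances of the remaining nodes against the new tree node.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def break_long_edges(points, max_len):
    """Drop MST edges longer than `max_len` (an assumed, fixed threshold)
    and return the resulting groups of node indices."""
    kept = [(i, j) for i, j, d in minimum_spanning_tree(points) if d <= max_len]

    # Union-find over the kept edges to recover the connected groups.
    root = list(range(len(points)))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for i, j in kept:
        root[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Illustrative centroids: two clusters separated by a wide horizontal gap.
centroids = [(10, 10), (20, 10), (30, 11), (200, 12), (210, 12)]
groups = break_long_edges(centroids, max_len=30)  # [[0, 1, 2], [3, 4]]
```

In this example the only MST edge longer than the assumed threshold is the one spanning the gap between the two clusters, so breaking it leaves two candidate groups. The height-ratio filtering and horizontal grouping described in the abstract would then operate on such groups and are not shown here.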