Abstract
The World Wide Web is currently the largest source of information. However, most information on the web is unstructured natural-language text, from which knowledge is very difficult to extract. Some information does exist in structured form, as lists or web tables coded with specific tags such as <ul>, <li>, and <table> on HTML pages, but it is questionable how much valuable knowledge can be extracted from them: although the total number of web tables in the entire corpus is huge, only a very small percentage of them contain useful information. Many search engines are now available, and they return top-k lists as search results. These results contain a huge amount of information, yet users are rarely interested in visiting any pages beyond the top two or three, and the results may also contain unwanted data. To overcome this drawback, we develop a better method for mining top-k lists. The proposed system uses a Path Clustering Algorithm to process top-k web pages and displays only the lists relevant to the top-k title, which saves the user’s time. The extracted lists can also be used as background knowledge for question answering (Q/A) systems. We present an efficient method that extracts top-k lists from web pages with high performance. The system collects top-k lists on various topics into a knowledge base and provides a search option to mine them.