Identification of level of resemblance between web based documents

Surbhi Kakar

Article Date Published: 30 November 2013

Abstract

One of the biggest challenges on the web today is dealing with the “Big data” problem. Big data in turn raises a further challenge: finding documents that are near duplicates of each other. This paper focuses on detecting near-duplicate documents using a technique called shingling and presents the different types of shingling that can be used. It also discusses a measure called the Jaccard coefficient, which can be used to judge the degree of similarity between documents.
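As a rough illustration of the two ideas named in the abstract, the Python sketch below builds shingle sets for two short texts and compares them with the Jaccard coefficient, defined as the size of the intersection of two sets divided by the size of their union. The abstract does not say which shingling variants the paper covers, so the word-level and character-level variants here, the function names, the default shingle sizes, and the sample sentences are all assumptions made for the example, not the author's method.

```python
def word_shingles(text, k=3):
    """Return the set of k-word shingles (contiguous word runs) in text.

    Assumption: lowercasing and whitespace splitting stand in for
    whatever tokenization a real system would use.
    """
    words = text.lower().split()
    if len(words) < k:
        # Text shorter than k words: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def char_shingles(text, k=5):
    """Return the set of k-character shingles in text.

    Assumption: runs of whitespace are collapsed to a single space first.
    """
    normalized = " ".join(text.lower().split())
    if len(normalized) < k:
        return {normalized} if normalized else set()
    return {normalized[i:i + k] for i in range(len(normalized) - k + 1)}


def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| of two sets.

    Convention chosen for this sketch: two empty sets count as identical.
    """
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical sample documents that differ in a single word.
doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "the quick brown fox leaps over the lazy dog"

# 4 shared 3-word shingles out of 10 distinct ones.
print(jaccard(word_shingles(doc_a), word_shingles(doc_b)))  # 0.4
```

A coefficient of 1.0 means the shingle sets are identical and 0.0 means they share nothing. Where to set the threshold for calling two documents near duplicates depends on the application and is not given in the abstract.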

Article Details

Issue: Vol. 2 No. 11 (2013)

Section: Articles

How to Cite

Kakar, S. (2013). Identification of level of resemblance between web based documents. International Journal of Engineering and Computer Science, 2(11). Retrieved from http://ijecs.in/index.php/ijecs/article/view/2123
