I am not looking to break new ground, just to document some of the things that I have found useful in everyday work. Sometimes I spend a considerable amount of time finding a solution to a problem that seemed silly and simple. I hope that some of my posts will save you some time.
Monday, January 30, 2012
Shingles
It is easy to create a naive Data Leakage Protection (DLP) product that looks for exact data or pattern matches. It is a lot more difficult to spot similarity between documents, for example to report that a document is x% similar to a reference document. This article on near-duplicates and shingling looks interesting, and the approach seems easy to implement: http://nlp.stanford.edu/IR-book/html/htmledition/near-duplicates-and-shingling-1.html
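To get a feel for the idea, here is a minimal sketch in Python. It assumes word-level shingles (every run of k consecutive words) and uses the Jaccard coefficient of the two shingle sets as the similarity score, which is the basic measure the article describes. The function names, the default k=4, and the lowercase/whitespace tokenization are my own choices for illustration, and the sketch skips the hashing and sketching steps that the article uses to scale this to large collections.

```python
def shingles(text, k=4):
    """Return the set of k-word shingles in text (assumed tokenization:
    lowercase, split on whitespace)."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < k:
        # Text shorter than k words: treat the whole text as one shingle.
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| of two sets."""
    if not a and not b:
        return 1.0  # assumption: two empty documents count as identical
    return len(a & b) / len(a | b)


def similarity(doc, reference, k=4):
    """Fraction (0.0 to 1.0) of shingles shared by doc and reference."""
    return jaccard(shingles(doc, k), shingles(reference, k))


if __name__ == "__main__":
    a = "the quick brown fox jumps"
    b = "the quick brown fox sleeps"
    # Multiply by 100 to report "x% similar to this reference".
    print(f"{similarity(a, b, k=2) * 100:.0f}% similar")
```

A real DLP check would compare each outgoing document against every reference document and flag anything above a chosen threshold. Smaller values of k make the score more forgiving of reordering, while larger values make it stricter.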