Skip to main content
COVID-19 information and screening Learn how we’re keeping our campus community safe, healthy and engaged during our gradual return to campus.
Note: The university’s mandatory vaccine directive is now in effect. Learn more about vaccine requirements.
Ontario Tech acknowledges the lands and people of the Mississaugas of Scugog Island First Nation.

We are thankful to be welcome on these lands in friendship. The lands we are situated on are covered by the Williams Treaties and are the traditional territory of the Mississaugas, a branch of the greater Anishinaabeg Nation, including Algonquin, Ojibway, Odawa and Pottawatomi. These lands remain home to many Indigenous nations and peoples.

We acknowledge this land out of respect for the Indigenous nations who have cared for Turtle Island, also called North America, from before the arrival of settler peoples until this day. Most importantly, we acknowledge that the history of these lands has been tainted by poor treatment and a lack of friendship with the First Nations who call them home.

This history is something we are all affected by because we are all treaty people in Canada. We all have a shared history to reflect on, and each of us is affected by this history in different ways. Our past defines our present, but if we move forward as friends and allies, then it does not have to define our future.

Learn more about Indigenous Education and Cultural Services

November 7, 2012

Title: Using Rabin-Karp Fingerprints and LevelDB for Faster Searches

Speaker: Richard Alan Deighton, Computer Science MSc student, Ontario Tech University

Abstract: This thesis represents the results of a study into using fingerprints generated in a manner similar to the Rabin-Karp Algorithm (Karp & Rabin, 1987) and a database LevelDB (Google, 2011) to achieve Text Search times below GREP (Gnu, 2009), which is a standard command-line UNIX text search tool. We discuss the following topics.

Text Search is a set of algorithms that find a string of characters called a Search Pattern in a much larger string of characters in a document we call a text file.
The Rabin-Karp Algorithm iterates through a text file converting character strings into fingerprints at each location. A fingerprint numerically represents a window length string of characters to the left of its location. Then, the algorithm compares that fingerprint to the Search Pattern’s fingerprint. (Karp & Rabin, 1987) When fingerprints are not equal, they guarantee there can be no match in the corresponding strings. Whereas when fingerprints equal, the strings likely match. A verification process confirms matches by checking respective characters.

Our application makes the following major changes to Rabin-Karp that are unique to us. We employ a two-step technique rather than one. During step 1, the preprocessing step, we calculate and store these fingerprints in a LevelDB database called an Index Database. This is our first major change unique to us. Step 2, the matching step, is our second unique change. We use the Index Database to look-up the Search Pattern’s fingerprint and gather its set of locations. Finally, our key change unique to us is how we arrive at the set of locations. We allow the pattern to be any length relative to the window length, whereas Rabin-Karp requires them to be the same. We even created an equation to check if the difference in length is too long for the fingerprint’s number system base.
We facilitated our performance experiments by first building our application and testing it against GREP for a wide range of different parameters. Our conclusions and recommendations determine that although we currently only outperform GREP in about half the cases, we identify some promising opportunities to modify some parts of our application so that we can outperform GREP in all instances.