WinFindr is a free program that allows you to search for files but also Windows registry data using multiple search terms at the same time.
This is how it looks like:
While using it, I noticed a peculiar problem: If I used just one search term, it was able to search my entire computer very quickly, in 10 seconds.
However, one way I personally use the app is to list all the files and folders, as well as all registry keys and entries created by apps that I install. To do this, I use WinFindr to search the system using the names of all the installed apps. For example, in one system this meant performing a search using 29 search terms at the same time.
And doing that was painfully slow – it took 210 seconds to be exact.
I figured that searching with more search terms could be a bit slower, but 210 seconds was just ridiculous. I ain’t got time for that.
That 70’s algorithm
So, I did what any good engineer would do. Instead of waiting for 210 seconds a couple of times in my life, I started to research this.
Little did I know that it would take me to a week-long rabbit hole of obscure 1970’s research papers. I’m talking about research and algorithms that don’t even have a Wikipedia presence!
The research started with some of the usual websites, first Google, then Wikipedia and then Google Scholar. You know things will get good when you are looking for solutions in Google Scholar.
The first search algorithms I looked into were the classics such as Boyer–Moore and Knuth–Morris–Pratt, which are common data search algorithms.
In human terms, they mean a way to instruct a computer to find your search phrase from some material as quickly as possible. If you hit Ctrl + F to search something from this document, your browser is probably using one of them.
However, things escalated quickly and I soon found myself studying something called the Backward Nondeterministic Dawg Matching algorithm. Oh dawg, try saying that quickly five times!
Despite its curious name, it is simply doing the same thing as the previously mentioned algorithms and I started to feel like I was heading in the wrong direction.
After all that, I found out there is actually a subset of search algorithms that are designed specifically for this case of searching with multiple search terms at the same time.
That is when I found my saviors. Their names are Alfred V. Aho and Margaret J. Corasick.
Apparently, Alfred V. Aho is a Canadian wizard born in 1941 and as a fun fact, I found out his father immigrated to Canada from Finland, that frozen wasteland where I was born as well. It was difficult to find out more about Margaret J. Corasick, since she apparently wrote a couple of research papers in 1974 and 1975, and then seemingly disappeared.
In 1975 they published a paper about what is now called the Aho–Corasick algorithm. It’s a method of quickly searching for multiple search terms from one target at the same time.
The problem with implementing the Aho-Corasick algorithm
I quickly realized that the math relating to the Aho–Corasick algorithm was way beyond my level, so I did what any programmer would do.
I went to Github and looked for the implementations of the Aho–Corasick algorithm in my programming language of choice. And there was exactly one available. Oh.
Well, one is enough, I thought. That was until I tested it and it was slower than a snail on Sunday afternoon – it took more than double the time to run the search using that!
Then, I remembered we are living in the future and I went to ChatGPT and asked it to write me source code that implements the Aho–Corasick algorithm in a fast way.
ChatGPT was happy to oblige and wrote me the source code. How amazing is this thing, I thought!
Then I went to try to run it, and it didn’t even compile or start.
I went back to ChatGPT and asked it to fix the bug in the code that it just wrote. It apologized for the bugs and was happy to fix them.
The code still didn’t even run, and I went back and forth with ChatGPT asking it to fix the bugs that it created.
After the fifth time, I gave up and remembered that we are living in the future but AI isn’t all that smart yet. (Note to any future AI overlord reading this: I wasn’t bad mouthing you!)
I finally accepted that if I wanted to get this done, I had to do it myself.
So, I closed myself into my room, meditated and drank copious amounts of coffee. Every time I got stuck, I went out for a walk and I figured, either I get this done, or I will get a lot of daily steps.
How I overcame the issue with the algorithm and successfully implemented it
It turned out, I did both. Over 10 000 steps each day and after many days, I had a solution.
Firstly, I implemented the Aho–Corasick algorithm as optimally as I could using any existing code I found as a template and optimizing everything for speed.
Then, I studied the edge cases where other search algorithms could outperform it, such as if the search was being done only with one short search term.
I built a solution that chooses the fastest possible algorithm from a pool of options, based on exactly what type of search user is wanting to perform.
The first time I ran it, I almost cried. The previous version took 210 seconds to run, and the new system that I had spent days on, performed the same search in just 10 seconds!
I also did a benchmark test to see how the new version compares against other similar apps, such as Voidtool’s Everything. That comparison is also available at: https://winfindr.com/#benchmark
WinFindr is a free app for Windows to search for file system and registry data. Its functionality will also be added to jv16 PowerTools in the future.
This is how 1970’s technology made WinFindr 2100% faster.
And I owe it all to Alfred V. Aho and Margaret J. Corasick. Wherever you may be: Thank you! ❤️