September 25, 2005

Using Python to drive Google Desktop Search

If you haven't tried Google Desktop Search yet, I suggest you install it immediately. It extends the Google search engine to your hard drive, so you can use it to search your files, emails, and so on. I find it really useful for searching through source code. It's much faster than Visual Studio's or Explorer's search because it looks words up in an index it builds ahead of time instead of scanning every file when you search.

This weekend I decided to learn Python. I know some people who swear by it (hello!) and I thought it was about time. As a little project, I decided to get Python talking to Google Desktop Search. For example, I might use it to double-click a class name in an IDE and then start a Google Desktop search for that name with "ctrl+alt+g" or something similar.

Google documents how to control Google Desktop Search from other programs. Basically, all you have to do is send an HTTP request to localhost that looks something like "http://127.0.0.1:4664/search?q=My+query&format=xml", and Google Desktop Search responds with an XML document containing the results.
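A minimal sketch of that request in modern Python 3 (not the 2005-era Python 2 used for this project). The function names `make_query_url` and `fetch_results_xml` are hypothetical. The `start` and `num` paging parameters are the ones that appear in the test URLs further down, and their defaults here are arbitrary. The base URL is passed in rather than hard-coded, on the assumption that a real installation's search URL may differ from the example above.

```python
import urllib.parse
import urllib.request
from xml.dom import minidom


def make_query_url(base_url, query, start=0, num=10):
    """Build a Google Desktop Search query URL that asks for XML results.

    base_url is assumed to end with "q=", as in
    "http://127.0.0.1:4664/search?q=".
    """
    # quote_plus encodes spaces as "+" and escapes characters such as "&".
    return "%s%s&format=xml&start=%d&num=%d" % (
        base_url, urllib.parse.quote_plus(query), start, num)


def fetch_results_xml(url):
    """Send the HTTP request and parse the XML reply into a DOM document.

    This only works while Google Desktop Search is running locally.
    """
    with urllib.request.urlopen(url) as response:
        return minidom.parseString(response.read())


if __name__ == "__main__":
    print(make_query_url("http://127.0.0.1:4664/search?q=", "My query"))
    # http://127.0.0.1:4664/search?q=My+query&format=xml&start=0&num=10
```

Keeping URL construction separate from the network call means the first function can be tested without Google Desktop Search running.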

It went pretty well. Python was really easy to learn, although it didn't go quite as smoothly as my first experience with the Ruby language. I didn't need to read the manual before starting; I just started coding and looked things up as I needed them.

With one exception, all the libraries I needed were bundled with the ActivePython distribution: unittest, urllib, xml.dom.minidom and win32api. The exception is the Python mock library, which is the best thing since the invention of the wheel, perhaps even the written word. If you haven't tried a mock library before and you do any amount of unit testing, you should check one out. Most recent OO languages have one, unfortunately with the exception of C++.
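The mock library used in this post was an early standalone module. Since Python 3.3 the standard library has shipped `unittest.mock`, which covers the same ground with a different API. As a sketch of the pattern, with `unittest.mock` swapped in for the library used here, this stubs out a registry reader so a hypothetical `read_search_url` function can be tested without touching the Windows registry, then checks how the function called its collaborator. The key and value names are placeholders.

```python
from unittest import mock


def read_search_url(reader):
    """Hypothetical helper: fetch the search URL through a registry reader.

    "SomeKey" and "SomeValue" are placeholders, not real registry names.
    """
    value, value_type = reader.queryValue(
        "HKEY_CURRENT_USER", "SomeKey", "SomeValue")
    if not isinstance(value, str):
        raise EnvironmentError("search URL not found in registry")
    return value


# Stub out the registry: queryValue returns a canned (value, type) tuple.
reader = mock.Mock()
reader.queryValue.return_value = ("http://127.0.0.1:4664/search?q=", 1)

print(read_search_url(reader))  # http://127.0.0.1:4664/search?q=

# Check how the collaborator was called, much as mockCheckCall does.
reader.queryValue.assert_called_once_with(
    "HKEY_CURRENT_USER", "SomeKey", "SomeValue")
```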

I used Test Driven Development of course! Here are some of my tests:
import unittest

from mock import Mock  # the Python mock library mentioned above

# GoogleDesktopSearcher, APIRegistryKey and SearchUrlValue come from the
# module under test (not shown); HKEY_CURRENT_USER is the Windows registry
# constant, e.g. from win32con.


class TestMakeQueryUrl(unittest.TestCase):

    def setUp(self):
        self.searcher = GoogleDesktopSearcher()
        # Stub the registry: queryValue returns a (value, type) pair.
        self.searcher.registryReader = Mock(
            {"queryValue": ("http://127.0.0.1:4664/search?q=", 1)})

    def testMakeQueryUrlRequestsSearchUrlFromRegistry(self):
        self.searcher.makeQueryUrl("hello", 10, 50)
        self.searcher.registryReader.mockCheckCall(0, "queryValue",
            HKEY_CURRENT_USER, APIRegistryKey, SearchUrlValue)

    def testMakeQueryRaisesIfSearchUrlNotPresentInRegistry(self):
        self.searcher.registryReader = Mock({"queryValue": (None, None)})
        self.assertRaises(EnvironmentError, self.searcher.makeQueryUrl,
                          "hello", 10, 50)

    def testMakeQueryRaisesIfSearchUrlIsNotAString(self):
        self.searcher.registryReader = Mock({"queryValue": (123, 2)})
        self.assertRaises(EnvironmentError, self.searcher.makeQueryUrl,
                          "hello", 10, 50)

    def testCanMakeValidQueryUrl(self):
        url = self.searcher.makeQueryUrl("hello", 10, 50)
        self.assertEqual(
            "http://127.0.0.1:4664/search?q=hello&format=xml&start=10&num=50",
            url)


class TestPerformSearch(unittest.TestCase):

    # Minimal result document: a single root element named "foo".
    TestResultXML = "<foo/>"

    def setUp(self):
        self.searcher = GoogleDesktopSearcher()
        self.searcher.registryReader = Mock(
            {"queryValue": ("http://127.0.0.1:4664/search?q=", 1)})
        # The opener returns a fake file whose read() yields the canned XML.
        self.file = Mock({"read": self.TestResultXML})
        self.searcher.urlOpener = Mock({"open": self.file})

    def testPassesSearchUrlToUrlOpener(self):
        self.searcher.performSearch("hello", 10, 50)
        self.searcher.urlOpener.mockCheckCall(0, "open",
            "http://127.0.0.1:4664/search?q=hello&format=xml&start=10&num=50")

    def testCanRetrieveResultXml(self):
        xml = self.searcher.performSearch("hello", 10, 50)
        self.assertEqual(1, xml.childNodes.length)
        self.assertEqual("foo", xml.childNodes[0].localName)

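The post doesn't show the class under test. Here is one hedged guess at a `GoogleDesktopSearcher` that would satisfy the tests above, written for Python 3 with `unittest.mock` standing in for the original mock library. The registry key and value names aren't given, so `APIRegistryKey` and `SearchUrlValue` are hypothetical placeholders, and `HKEY_CURRENT_USER` is a plain string stand-in rather than the real Windows constant. The `registryReader` and `urlOpener` collaborators are plain attributes so the tests can replace them.

```python
from unittest import mock
from urllib.parse import quote_plus
from xml.dom import minidom

# Hypothetical placeholders: the real names are not shown in the post.
HKEY_CURRENT_USER = "HKEY_CURRENT_USER"
APIRegistryKey = r"Software\Hypothetical\DesktopSearchAPI"
SearchUrlValue = "search_url"


class GoogleDesktopSearcher:
    def __init__(self):
        # Collaborators are plain attributes so tests can swap in mocks.
        self.registryReader = None  # provides queryValue(root, key, name)
        self.urlOpener = None       # provides open(url) -> file-like object

    def makeQueryUrl(self, query, start, num):
        url, valueType = self.registryReader.queryValue(
            HKEY_CURRENT_USER, APIRegistryKey, SearchUrlValue)
        if not isinstance(url, str):
            # Covers both a missing value (None) and a non-string value.
            raise EnvironmentError("Google Desktop search URL not found")
        return "%s%s&format=xml&start=%d&num=%d" % (
            url, quote_plus(query), start, num)

    def performSearch(self, query, start, num):
        result = self.urlOpener.open(self.makeQueryUrl(query, start, num))
        return minidom.parseString(result.read())


# Usage with mocks, mirroring the tests above.
searcher = GoogleDesktopSearcher()
searcher.registryReader = mock.Mock()
searcher.registryReader.queryValue.return_value = (
    "http://127.0.0.1:4664/search?q=", 1)
searcher.urlOpener = mock.Mock()
searcher.urlOpener.open.return_value.read.return_value = "<foo/>"

doc = searcher.performSearch("hello", 10, 50)
print(doc.documentElement.tagName)  # foo
```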
There were some things I didn't like so much about Python. What's with the redundant use of "self" to identify member variables of classes? And why does "self" need to appear as a formal parameter of every member function? I have to be missing something!
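For what it's worth, the explicit `self` follows from how Python implements methods: a method is an ordinary function stored on the class, and calling it through an instance simply passes that instance as the first argument. Writing `self.` is also what separates an attribute from a local variable, since Python has no variable declarations. A small illustration (the `Counter` class is just an example):

```python
class Counter:
    def __init__(self):
        self.count = 0          # attribute: lives on the instance

    def increment(self, step=1):
        count = 99              # local variable: gone when the method returns
        self.count += step      # attribute update: persists on the instance
        return self.count


c = Counter()
c.increment()                   # Python passes c as self
Counter.increment(c, 2)         # the same call, with self passed explicitly
print(c.count)                  # 3
```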

Also, there are six keystrokes too many in "__init__", used to identify a class's constructor:
class GoogleDesktopResult:

    def __init__(self, result):
        self.category = getXmlElementData(result, "category", "")
        self.id = getXmlElementData(result, "id", 0)
        self.title = getXmlElementData(result, "title", "")
        self.url = getXmlElementData(result, "url", "")
        self.time = getXmlElementData(result, "time", 0)
        self.snippet = getXmlElementData(result, "snippet", "")
        self.icon = getXmlElementData(result, "icon", "")
        self.cacheUrl = getXmlElementData(result, "cache_url", "")

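`getXmlElementData` isn't shown in the post. One plausible sketch, assuming it returns the text of the first child element with the given tag name, converted to the type of the default (so `id` and `time` come back as integers), or the default itself when the element is missing or empty. The sample XML is hand-written for illustration, not actual Google Desktop Search output.

```python
from xml.dom import minidom


def getXmlElementData(parent, tagName, default):
    """Return the text of parent's first <tagName> element, or default.

    Assumption: the text is converted with type(default), so a default of 0
    yields an int and a default of "" yields a str.
    """
    elements = parent.getElementsByTagName(tagName)
    if not elements:
        return default
    text = "".join(node.data for node in elements[0].childNodes
                   if node.nodeType == node.TEXT_NODE)
    return type(default)(text) if text else default


class GoogleDesktopResult:
    def __init__(self, result):
        self.category = getXmlElementData(result, "category", "")
        self.id = getXmlElementData(result, "id", 0)
        self.title = getXmlElementData(result, "title", "")
        self.url = getXmlElementData(result, "url", "")
        self.time = getXmlElementData(result, "time", 0)
        self.snippet = getXmlElementData(result, "snippet", "")
        self.icon = getXmlElementData(result, "icon", "")
        self.cacheUrl = getXmlElementData(result, "cache_url", "")


# Hand-written sample, not real Google Desktop Search output.
SAMPLE = """<results>
  <result>
    <category>file</category>
    <id>42</id>
    <title>searcher.py</title>
    <url>C:/code/searcher.py</url>
  </result>
</results>"""

doc = minidom.parseString(SAMPLE)
results = [GoogleDesktopResult(r) for r in doc.getElementsByTagName("result")]
print(results[0].title, results[0].id, repr(results[0].snippet))
# searcher.py 42 ''
```

Because `getElementsByTagName` matches tag names exactly, looking up "url" does not pick up a "cache_url" element.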
But other than those two minor quibbles, I like Python.