September 25, 2005

Using Python to drive Google Desktop Search

If you haven't tried Google Desktop Search yet, I suggest you install it immediately. It extends the Google search engine to your hard drive so you can use it to search your files, emails, etc. I find it really useful for searching through source code. It's much faster than Visual Studio or Explorer's search because it builds a keyword index ahead of time rather than scanning every file on each search.

This weekend I decided to learn Python. I know some people who swear by it (hello!) and I thought it was about time. As a little project, I decided to get Python talking to Google Desktop Search. I might use it, for example, to be able to double click a class name in an IDE and then initiate a desktop Google search with "ctrl+alt+g" or something.

The instructions on how to control Google Desktop Search from other programs are here. Basically, all you have to do is send an HTTP request to localhost that looks something like "http://127.0.0.1:4664/search?q=My+query&format=xml" and Google Desktop Search responds with an XML file with all the results.
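As a rough sketch of that request/response cycle in present-day Python 3 (which this post predates), the snippet below builds the query URL and parses the reply using only the standard library. The function names, the `start`/`num` defaults and the fixed base URL are assumptions for illustration; the `start` and `num` parameters are borrowed from the tests further down.

```python
from urllib.parse import quote_plus
from urllib.request import urlopen
from xml.dom import minidom

# Assumed base URL, copied from the example in the text.
BASE_URL = "http://127.0.0.1:4664/search?q="


def make_query_url(query, start=0, num=10, base_url=BASE_URL):
    """Return the search URL for `query`, asking for XML output."""
    return "%s%s&format=xml&start=%d&num=%d" % (
        base_url, quote_plus(query), start, num)


def perform_search(query, start=0, num=10, opener=urlopen):
    """Fetch the results and return them as a parsed minidom Document.

    `opener` is injectable so the function can be exercised without
    a running Google Desktop Search instance.
    """
    with opener(make_query_url(query, start, num)) as response:
        return minidom.parseString(response.read())
```

Calling `perform_search("My query")` on a machine with Google Desktop Search listening on that port should hand back a DOM document to walk; anywhere else the request will simply fail to connect.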

It went pretty well. Python was really easy to learn, although it didn't go quite as smoothly as my first experience with the Ruby language. I didn't need to read the manual before starting. I just started coding and looked things up as I needed them.

With one exception, all the libraries I needed were bundled with the ActivePython distribution: unittest, urllib, xml.dom.minidom, win32api. The exception is the Python mock library, which is the best thing since the invention of the wheel, perhaps even the written word. If you haven't tried out a mock library before and you do any amount of unit testing, you should check one out. Most recent OO languages, unfortunately with the exception of C++, have one.
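For anyone who hasn't seen a mock object in action, here is a minimal sketch using `unittest.mock` from the current Python 3 standard library. That is a different library from the one used in this post, and the `Searcher` class and its `query_value` method are hypothetical stand-ins, but the idea is the same: swap a real dependency for a fake that returns canned values and records how it was called.

```python
from unittest.mock import Mock


class Searcher:
    """Hypothetical class that looks up its base URL via a registry reader."""

    def __init__(self, registry_reader):
        self.registry_reader = registry_reader

    def make_query_url(self, query):
        url, value_type = self.registry_reader.query_value("search_url")
        if not isinstance(url, str):
            raise EnvironmentError("search URL missing from registry")
        return url + query


# The mock stands in for the real registry, so no Windows API is touched.
reader = Mock()
reader.query_value.return_value = ("http://127.0.0.1:4664/search?q=", 1)

searcher = Searcher(reader)
url = searcher.make_query_url("hello")

# The mock recorded the call, so the interaction itself can be verified.
reader.query_value.assert_called_once_with("search_url")
```

The payoff is that the test never depends on what happens to be in the registry of the machine running it.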

I used Test Driven Development of course! Here are some of my tests:
class TestMakeQueryUrl(unittest.TestCase):

    def setUp(self):
        self.searcher = GoogleDesktopSearcher()
        # Stub the registry lookup: queryValue returns a (value, type) pair.
        self.searcher.registryReader = Mock(
            {"queryValue": ("http://127.0.0.1:4664/search?q=", 1)})

    def testMakeQueryUrlRequestsSearchUrlFromRegistry(self):
        self.searcher.makeQueryUrl("hello", 10, 50)
        self.searcher.registryReader.mockCheckCall(0, "queryValue",
            HKEY_CURRENT_USER, APIRegistryKey, SearchUrlValue)

    def testMakeQueryRaisesIfSearchUrlNotPresentInRegistry(self):
        self.searcher.registryReader = Mock({"queryValue": (None, None)})
        self.assertRaises(EnvironmentError, self.searcher.makeQueryUrl,
            "hello", 10, 50)

    def testMakeQueryRaisesIfSearchUrlIsNotAString(self):
        self.searcher.registryReader = Mock({"queryValue": (123, 2)})
        self.assertRaises(EnvironmentError, self.searcher.makeQueryUrl,
            "hello", 10, 50)

    def testCanMakeValidQueryUrl(self):
        url = self.searcher.makeQueryUrl("hello", 10, 50)
        self.assertEqual(
            "http://127.0.0.1:4664/search?q=hello&format=xml&start=10&num=50",
            url)


class TestPerformSearch(unittest.TestCase):

    # Root element name matches the assertions in testCanRetrieveResultXml.
    TestResultXML = "<foo/>"

    def setUp(self):
        self.searcher = GoogleDesktopSearcher()
        self.searcher.registryReader = Mock(
            {"queryValue": ("http://127.0.0.1:4664/search?q=", 1)})
        # The fake URL opener returns a fake file whose read() yields the XML.
        self.file = Mock({"read": self.TestResultXML})
        self.searcher.urlOpener = Mock({"open": self.file})

    def testPassesSearchUrlToUrlOpener(self):
        self.searcher.performSearch("hello", 10, 50)
        self.searcher.urlOpener.mockCheckCall(0, "open",
            "http://127.0.0.1:4664/search?q=hello&format=xml&start=10&num=50")

    def testCanRetrieveResultXml(self):
        xml = self.searcher.performSearch("hello", 10, 50)
        self.assertEqual(1, xml.childNodes.length)
        self.assertEqual("foo", xml.childNodes[0].localName)
There were some things I didn't like so much about Python. What's with the redundant use of "self" to identify member variables of classes? And why does "self" need to appear as a formal parameter of every member function? I have to be missing something!

Also, there are six keystrokes too many in "__init__", used to identify a class's constructor:
class GoogleDesktopResult:

    def __init__(self, result):
        self.category = getXmlElementData(result, "category", "")
        self.id = getXmlElementData(result, "id", 0)
        self.title = getXmlElementData(result, "title", "")
        self.url = getXmlElementData(result, "url", "")
        self.time = getXmlElementData(result, "time", 0)
        self.snippet = getXmlElementData(result, "snippet", "")
        self.icon = getXmlElementData(result, "icon", "")
        self.cacheUrl = getXmlElementData(result, "cache_url", "")
But other than those two minor quibbles, I like Python.
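The getXmlElementData helper used in that constructor isn't shown here. Purely as a guess at its shape, a helper along these lines would fit the way it is called: find the first child element with the given tag, return its text, and fall back to the default when the element is missing. The snake_case name, the type coercion and the sample XML are all assumptions, written for present-day Python 3.

```python
from xml.dom import minidom


def get_xml_element_data(node, tag_name, default):
    """Return the text of the first <tag_name> under node, or default."""
    elements = node.getElementsByTagName(tag_name)
    if len(elements) == 0 or elements[0].firstChild is None:
        return default
    text = elements[0].firstChild.data
    # Coerce to the default's type so numeric fields come back as int.
    return type(default)(text)


# Made-up sample; the real result schema may differ.
sample = minidom.parseString(
    "<result><category>file</category><id>42</id></result>")
result = sample.documentElement

category = get_xml_element_data(result, "category", "")
result_id = get_xml_element_data(result, "id", 0)
snippet = get_xml_element_data(result, "snippet", "")  # absent, so default
```

Using the default's type to decide the conversion keeps each call site down to one line, at the cost of a ValueError if a numeric field ever contains non-numeric text.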

Comments:
Hey Al!

A couple of quick points:

C++ *has* a Mock library, depending on what you want. It's not in-built at all, but if you check out MockPP, you might be happy with what you see.

With Python, I believe the 'self' parameter is simply for stating that you actually have a 'self' pointer available, versus static methods, which don't have the 'self' parameter.
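A small sketch of that distinction, using a made-up Counter class: an ordinary method receives the instance as its first argument, which is why 'self' appears in the signature, while a static method receives no instance at all.

```python
class Counter:
    def __init__(self):
        self.count = 0

    def increment(self):
        # `self` is the instance the method was called on.
        self.count += 1
        return self.count

    @staticmethod
    def describe():
        # No `self`: a static method has no instance to work with.
        return "counts things"


counter = Counter()
via_instance = counter.increment()      # Python passes `counter` as self
via_class = Counter.increment(counter)  # the same call, spelled out
description = Counter.describe()
```

The two `increment` calls are equivalent; the first form is just shorthand for the second.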

It's the same thing with the '__init__' member function (I believe you were complaining it's too wordy?); it favours Python's philosophy (excerpt from the Wikipedia article on Python):

Python's developers expressly promote a particular "culture" or ideology based on what they want the language to be, favoring language forms they see as "beautiful", "explicit" and "simple"


All in all, I agree with Python. :) I've found Guido van Rossum (GvR) to be a benevolent overlord, with a good head for making tough decisions... I'm happy you didn't mind the space-enforcing of Python! :)

Keep up the good blogging.. I enjoy reading your site. :)
 
Thanks for the info on mockpp. I took a quick look at it and I'll try and give it a test-run soon. I have a feeling it's not going to save me as much typing as the Python mock library but then that's what I would expect of a statically typed language like C++. Worth a look though.
 
The 'self' is bugging me almost every time I run a new script. It has given me parse errors so many times I'm not entirely sure it's really necessary... It's nice to read, but it's quite hard to remember that you have to write it all the time. I find it about as annoying as the {} in all other languages. There must be a better way.

As for __init__, I think it's a wonderful decision to use a single name for the constructor. It makes code reuse much smoother than the Java/C++ convention of using the class name (you don't have to change it when you copy a constructor from class to class). You may argue about the __init__/init choice, but the underscores are the convention in place of a 'private' keyword, and in comparison to that, one actually saves a few letters.
 
Another reason for the 'self' or similar syntax ('this' in PHP) in class code is to ensure that information passed to an object of the class gets assigned to the proper instance of the class. We see this with other OOP languages as well, though in some (PHP) it is not necessarily required (you won't get a parse error, I don't think).

It may be overkill in some ways, but basically, it makes sure that if you have 2 (or more) instances of objects (say Thing1 and Thing2) based on the class, once you pass information to Thing1.method(), that information gets assigned to Thing1.variableX instead of Thing2.variableX.
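Spelled out with a made-up Thing class, the Thing1/Thing2 scenario looks roughly like this:

```python
class Thing:
    def __init__(self):
        self.variable_x = None

    def method(self, value):
        # Stored on whichever instance was passed in as `self`.
        self.variable_x = value


thing1 = Thing()
thing2 = Thing()
thing1.method("for thing1")  # only thing1 is touched; thing2 keeps None
```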

Again, I don't know how often something like this is likely to happen, but in complex systems (like games) you never know what sort of strange bug will rear its head. Especially when you are dealing with potentially thousands of object instances.

And, like Eddie said, it makes the code look nice.

Hope this helps.
 