September 25, 2005

Using Python to drive Google Desktop Search

If you haven't tried Google Desktop Search yet, I suggest you install it immediately. It extends the Google search engine to your hard drive so you can use it to search your files, emails, etc. I find it really useful for searching through source code. It's much faster than Visual Studio's or Explorer's search because it uses some kind of clever keyword index.

This weekend I decided to learn Python. I know some people who swear by it (hello!) and I thought it was about time. As a little project, I decided to get Python talking to Google Desktop Search. For example, I might use it to double-click a class name in an IDE and then start a desktop Google search with "ctrl+alt+g" or something.

The instructions on how to control Google Desktop Search from other programs are here. Basically, all you have to do is send an HTTP request to localhost that looks something like "http://127.0.0.1:4664/search?q=My+query&format=xml" and Google Desktop Search responds with an XML file with all the results.
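Here is a minimal sketch of building that request URL in modern Python 3. It is not the code from this project: the function name make_query_url is made up, the base URL is assumed to end in "q=", and the start and num parameters mirror the ones used in the tests below.

```python
from urllib.parse import quote_plus


def make_query_url(base_url, query, start, num):
    """Build a Google Desktop Search query URL (hypothetical helper).

    base_url is assumed to end with "q=", e.g. "http://127.0.0.1:4664/search?q=".
    """
    # quote_plus turns spaces into "+" and escapes characters such as "&".
    return "%s%s&format=xml&start=%d&num=%d" % (
        base_url, quote_plus(query), start, num)


url = make_query_url("http://127.0.0.1:4664/search?q=", "My query", 10, 50)
print(url)
```

To actually send the request you could pass the URL to urllib.request.urlopen(url).read(), which returns the XML response as bytes.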

It went pretty well. Python was really easy to learn, although it didn't go quite as smoothly as my first experience with the Ruby language. I didn't need to read the manual before starting; I just started coding and looked things up as I needed them.

With one exception, all the libraries I needed were bundled with the ActivePython distribution: unittest, urllib, xml.dom.minidom and win32api. The exception is the Python mock library, which is the best thing since the invention of the wheel, perhaps even the written word. If you haven't tried out a mock library before and you do any amount of unit testing, you should check one out. Most recent OO languages, unfortunately with the exception of C++, have one.
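For anyone on a current Python, the standard library's unittest.mock (not the library used in this post) covers the same idea: a stand-in object that returns canned values and records how it was called. A small sketch of a fake registry reader like the one in the tests below; the key and value names passed to it are hypothetical.

```python
from unittest.mock import Mock

# A stand-in registry reader whose queryValue method returns a canned (value, type) pair.
registry_reader = Mock()
registry_reader.queryValue.return_value = ("http://127.0.0.1:4664/search?q=", 1)

# Code under test would make this call; the arguments here are made up.
value, value_type = registry_reader.queryValue(
    "HKCU", r"Software\Example", "search_url")

# The mock recorded the call, so a test can verify how it was used.
registry_reader.queryValue.assert_called_once_with(
    "HKCU", r"Software\Example", "search_url")
```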

I used test-driven development, of course! Here are some of my tests:

import unittest

from mock import Mock  # the Python mock library mentioned above
from win32con import HKEY_CURRENT_USER
# GoogleDesktopSearcher, APIRegistryKey and SearchUrlValue come from the
# module under test.


class TestMakeQueryUrl(unittest.TestCase):

    def setUp(self):
        self.searcher = GoogleDesktopSearcher()
        # Fake registry reader: queryValue returns a (value, type) pair.
        self.searcher.registryReader = Mock(
            {"queryValue": ("http://127.0.0.1:4664/search?q=", 1)})

    def testMakeQueryUrlRequestsSearchUrlFromRegistry(self):
        self.searcher.makeQueryUrl("hello", 10, 50)
        self.searcher.registryReader.mockCheckCall(
            0, "queryValue", HKEY_CURRENT_USER, APIRegistryKey, SearchUrlValue)

    def testMakeQueryRaisesIfSearchUrlNotPresentInRegistry(self):
        self.searcher.registryReader = Mock({"queryValue": (None, None)})
        self.assertRaises(EnvironmentError, self.searcher.makeQueryUrl,
                          "hello", 10, 50)

    def testMakeQueryRaisesIfSearchUrlIsNotAString(self):
        self.searcher.registryReader = Mock({"queryValue": (123, 2)})
        self.assertRaises(EnvironmentError, self.searcher.makeQueryUrl,
                          "hello", 10, 50)

    def testCanMakeValidQueryUrl(self):
        url = self.searcher.makeQueryUrl("hello", 10, 50)
        self.assertEqual(
            "http://127.0.0.1:4664/search?q=hello&format=xml&start=10&num=50",
            url)


class TestPerformSearch(unittest.TestCase):

    # Canned response: a document whose single root element is <foo>.
    TestResultXML = "<foo/>"

    def setUp(self):
        self.searcher = GoogleDesktopSearcher()
        self.searcher.registryReader = Mock(
            {"queryValue": ("http://127.0.0.1:4664/search?q=", 1)})
        # Fake file object returned by the fake URL opener.
        self.file = Mock({"read": self.TestResultXML})
        self.searcher.urlOpener = Mock({"open": self.file})

    def testPassesSearchUrlToUrlOpener(self):
        self.searcher.performSearch("hello", 10, 50)
        self.searcher.urlOpener.mockCheckCall(
            0, "open",
            "http://127.0.0.1:4664/search?q=hello&format=xml&start=10&num=50")

    def testCanRetrieveResultXml(self):
        xml = self.searcher.performSearch("hello", 10, 50)
        self.assertEqual(1, len(xml.childNodes))
        self.assertEqual("foo", xml.childNodes[0].localName)

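The post doesn't show the class these tests drive, so here is one hypothetical GoogleDesktopSearcher, written for Python 3, that would satisfy them. The registry key and value names are placeholders I made up, and hand-written fakes stand in for the mock objects so the sketch runs anywhere.

```python
from urllib.parse import quote_plus
from xml.dom import minidom

# Hypothetical stand-ins for names used by the tests; the real registry
# key and value names are not given in the post.
HKEY_CURRENT_USER = 0x80000001  # numeric value of winreg.HKEY_CURRENT_USER
APIRegistryKey = r"Software\Example\DesktopSearchAPI"  # hypothetical
SearchUrlValue = "search_url"                           # hypothetical


class GoogleDesktopSearcher:
    """Sketch only: collaborators are plain attributes so tests can swap in fakes."""

    def __init__(self, registryReader=None, urlOpener=None):
        self.registryReader = registryReader  # provides queryValue(root, key, name)
        self.urlOpener = urlOpener            # provides open(url) -> file-like object

    def makeQueryUrl(self, query, start, num):
        # queryValue is expected to return a (value, type) pair.
        value, _valueType = self.registryReader.queryValue(
            HKEY_CURRENT_USER, APIRegistryKey, SearchUrlValue)
        if not isinstance(value, str):
            # Covers both a missing value (None) and a non-string value.
            raise EnvironmentError("search URL missing or not a string")
        return "%s%s&format=xml&start=%d&num=%d" % (
            value, quote_plus(query), start, num)

    def performSearch(self, query, start, num):
        # Fetch the XML results and parse them into a DOM document.
        response = self.urlOpener.open(self.makeQueryUrl(query, start, num))
        return minidom.parseString(response.read())


# Minimal hand-written fakes standing in for the mock objects.
class FakeRegistry:
    def queryValue(self, root, key, name):
        return ("http://127.0.0.1:4664/search?q=", 1)


class FakeResponse:
    def read(self):
        return "<foo/>"


class FakeOpener:
    def __init__(self):
        self.opened = []  # records every URL passed to open()

    def open(self, url):
        self.opened.append(url)
        return FakeResponse()


opener = FakeOpener()
searcher = GoogleDesktopSearcher(FakeRegistry(), opener)
doc = searcher.performSearch("hello", 10, 50)
print(opener.opened[0])
print(doc.documentElement.localName)
```

In real use, urllib.request.build_opener() returns an object with an open(url) method that fits the urlOpener slot, and a registry reader could wrap winreg.OpenKey and winreg.QueryValueEx, which returns a (value, type) pair.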
There were some things I didn't like so much about Python. What's with the redundant use of "self" to identify member variables of classes? And why does "self" need to appear as a formal parameter of every member function? I have to be missing something!
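One way to see why self shows up in every parameter list: a method is an ordinary function stored on the class, and calling it through an instance simply passes that instance as the first argument. A tiny illustration (the Greeter class is made up for this purpose):

```python
class Greeter:
    def __init__(self, name):
        self.name = name  # stored on this particular instance

    def greet(self, greeting):
        return "%s, %s" % (greeting, self.name)


g = Greeter("world")
# These two calls are equivalent: g.greet(...) is shorthand for
# Greeter.greet(g, ...), with the instance bound to self.
print(g.greet("Hello"))
print(Greeter.greet(g, "Hello"))
```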

Also, there are six keystrokes too many in "__init__", the name used for a class's constructor:
class GoogleDesktopResult:

    def __init__(self, result):
        # getXmlElementData(node, tagName, default) is a helper (not shown)
        # that extracts the named element's data, falling back to default.
        self.category = getXmlElementData(result, "category", "")
        self.id = getXmlElementData(result, "id", 0)
        self.title = getXmlElementData(result, "title", "")
        self.url = getXmlElementData(result, "url", "")
        self.time = getXmlElementData(result, "time", 0)
        self.snippet = getXmlElementData(result, "snippet", "")
        self.icon = getXmlElementData(result, "icon", "")
        self.cacheUrl = getXmlElementData(result, "cache_url", "")

But other than those two minor quibbles, I like Python.
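The getXmlElementData helper used above isn't shown. Here is one possible version using xml.dom.minidom; it is a guess from how it is called, assuming it returns the text of the first matching element converted to the type of the default (so id and time come back as integers), or the default when the element is missing or empty.

```python
from xml.dom import minidom


def getXmlElementData(parent, tagName, default):
    """Return the text of the first <tagName> element under parent.

    Hypothetical reconstruction: the text is converted to type(default),
    and default is returned if the element is missing or empty.
    """
    elements = parent.getElementsByTagName(tagName)
    if not elements:
        return default
    # Join the text and CDATA nodes directly inside the element.
    text = "".join(node.data for node in elements[0].childNodes
                   if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE))
    if not text:
        return default
    return type(default)(text)


doc = minidom.parseString("<result><title>Hello</title><id>42</id></result>")
result = doc.documentElement
print(getXmlElementData(result, "title", ""))  # Hello
print(getXmlElementData(result, "id", 0))      # 42
```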

Comments:
Hey Al!

A couple of quick points:

C++ *has* a mock library, depending on what you want. It's not built in at all, but if you check out mockpp, you might be happy with what you see.

With Python, I believe the 'self' parameter is simply there to state that you actually have a 'self' pointer available, as opposed to static methods, which don't have the 'self' parameter.
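The contrast with static methods can be shown directly: a method marked with the @staticmethod decorator takes no self and only works on its arguments. A small made-up example:

```python
class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius

    def to_fahrenheit(self):
        # Instance method: needs self to reach this object's data.
        return self.celsius * 9 / 5 + 32

    @staticmethod
    def fahrenheit_from_celsius(celsius):
        # Static method: no self, so it can't see any instance's data.
        return celsius * 9 / 5 + 32


print(Temperature(100).to_fahrenheit())
print(Temperature.fahrenheit_from_celsius(0))
```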

It's the same thing with the '__init__' member function (I believe you were complaining it's too wordy?); it follows Python's philosophy (excerpt from the Wikipedia article on Python):

Python's developers expressly promote a particular "culture" or ideology based on what they want the language to be, favoring language forms they see as "beautiful", "explicit" and "simple"


All in all, I agree with Python. :) I've found Guido van Rossum (GvR) to be a benevolent overlord, with a good head for making tough decisions... I'm happy you didn't mind Python's enforced indentation! :)

Keep up the good blogging. I enjoy reading your site. :)
 
Thanks for the info on mockpp. I took a quick look at it and I'll try to give it a test run soon. I have a feeling it won't save me as much typing as the Python mock library, but that's what I would expect of a statically typed language like C++. Worth a look, though.
 
The 'self' bugs me almost every time I write a new script. It has given me errors so many times that I'm not entirely sure it's really necessary... It's nice to read, but it's quite hard to remember that you have to write it all the time. I find it as annoying as the {} in other languages. There must be a better way.

As for the __init__, I think using a single name for every constructor is a wonderful decision. It makes code reuse much smoother than the Java/C++ convention of using the class name (you don't have to rename the constructor when you copy it from one class to another). Maybe you'd argue for init instead of __init__, but the underscores are a naming convention used in place of a keyword like "private", and compared to writing that keyword you actually save a few letters.
 
 
Another reason for 'self' or similar syntax ('$this' in PHP) in class code is to ensure that information passed to an object of the class gets assigned to the proper instance of the class. We see this with other OOP languages as well, though in some (PHP) it is not necessarily required (you won't get a parse error, I don't think).

It may be overkill in some ways, but basically, it makes sure that if you have 2 (or more) instances of objects (say Thing1 and Thing2) based on the class, once you pass information to Thing1.method(), that information gets assigned to Thing1.variableX instead of Thing2.variableX.
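That point can be shown with two instances of a small made-up class: because the method assigns through self, calling it on thing1 only changes thing1's attribute.

```python
class Thing:
    def __init__(self):
        self.variable_x = None

    def method(self, value):
        # self is whichever instance the method was called on.
        self.variable_x = value


thing1 = Thing()
thing2 = Thing()
thing1.method("only for thing1")
print(thing1.variable_x)  # only for thing1
print(thing2.variable_x)  # None
```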

Again, I don't know how often something like this is likely to happen, but in complex systems (like games) you never know what sort of strange bug will rear its head. Especially when you are dealing with potentially thousands of object instances.

And, like Eddie said, it makes the code look nice.

Hope this helps.
 