February 05, 2006

Tim Sweeney gives a presentation at POPL 2006

Last month, Tim Sweeney of Epic Games (developers of the Unreal engine) gave a presentation titled "The Next Mainstream Programming Language: A Game Developer's Perspective". Although I did not attend the POPL 2006 conference, I found the slides interesting reading. They can be opened with OpenOffice if you don't have Microsoft PowerPoint.

On the opening slide, he argues that the programming languages we use for game development today fail us in two ways.

"Where are today's languages failing?
- Concurrency
- Reliability"

Next generation consoles such as the Xbox 360 and PlayStation 3 have multiple parallel processors, and the languages we typically use (mostly C++) offer almost no help in utilizing them beyond some libraries with low-level thread synchronization primitives like semaphores and mutexes. As for reliability, we all know how easy it is in C++ to dereference a null or dangling pointer, deallocate a reachable object or index off the end of an array.

I fully agree that these two areas need to be addressed in future programming languages for game development. Further, I think there is something even more important: productivity.

Although better reliability and language support for concurrency can certainly improve productivity, if only as a secondary effect, productivity should be addressed directly. For example, a C++ program with hundreds of thousands of lines of code can take several minutes to build, even for relatively small changes (for example to a header file). This can certainly be improved by careful management of physical dependencies between source files and use of programming practices like the pimpl idiom.
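For readers who have not met it, the pimpl idiom mentioned above can be sketched roughly as follows. The class name Renderer and its members are made up for illustration, and the sketch assumes a C++11 compiler for std::unique_ptr. The point is only that the private state lives in a struct the header merely forward-declares, so changing that state does not force clients of the header to rebuild.

```cpp
#include <memory>
#include <string>

// --- Renderer.h (hypothetical header) ---
// Clients see only a forward declaration of Impl, so edits to the
// private state do not touch this header.
class Renderer
{
public:
    Renderer();
    ~Renderer();                  // defined where Impl is a complete type
    void SetTitle(const std::string &title);
    std::string GetTitle() const;

private:
    struct Impl;                  // forward declaration only
    std::unique_ptr<Impl> impl;
};

// --- Renderer.cpp (hypothetical source file) ---
struct Renderer::Impl
{
    std::string title;
};

Renderer::Renderer(): impl(new Impl) {}
Renderer::~Renderer() = default;

void Renderer::SetTitle(const std::string &title) { impl->title = title; }
std::string Renderer::GetTitle() const { return impl->title; }
```

The cost is an extra heap allocation and an indirection on every call, which is part of why it only helps so much.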

But compare this to a similarly sized program written in Java, which can be rebuilt from scratch in seconds and where the build times for incremental changes are imperceptibly quick. The Java IDE that I have been using recently (Eclipse) builds every time I save a source file so the project is always up-to-date and ready to run immediately.

I was a little disappointed that he only briefly mentioned the subject of tools. By tools I mean, for example, the software that we use to make game content and the software that runs behind the scenes building all the game assets and gluing them together. This becomes more important every year as we squeeze more and more content into games.

Here as well, productivity is key. Not just the productivity of individual programmers but the productivity of the whole team. A new programming language can only be a small piece of that puzzle. But there are certain language features that can make a difference. Reflection is number one on my list. Reflection can be used to automate many software problems involving interoperation between tools and game code, such as automatic GUI generation, versioning, distributed builds, etc.

As with productivity, I would rate reflection (or some similar language feature) above language support for concurrency. At least for next generation consoles. Maybe not next-next-generation!

From his slides:

"Solved problems:
Random memory overwrites
Memory leaks

Solvable:
Accessing arrays out-of-bounds
Dereferencing null pointers
Integer overflow
Accessing uninitialized variables

50% of the bugs in Unreal can be traced to these problems!"

I'm not sure why "accessing uninitialized variables" is listed as only solvable. It is solved! With the exception of C++, no mainstream language I know of allows the programmer to access an uninitialized variable. That is entirely a C++ problem and can be avoided by using a language like Java or Lua for example.

In most languages, the other three "solvable" problems result in runtime exceptions. Even integer overflow can be checked at runtime by some languages such as C#. Sweeney argues that we would be better off if these problems were caught at compile time. In a sense I agree because when I introduce a bug into a program, I want to know as soon as possible. I would rather have the compiler tell me immediately (or in 10 minutes if I am using C++!) than get a report from QA several months later and spend hours tracking down the problem.
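As a rough illustration of what a runtime overflow check amounts to, here is a sketch of doing it by hand in C++. CheckedAdd is a hypothetical helper, not a standard library function, and it makes no claim about how C# implements its own checks.

```cpp
#include <limits>
#include <stdexcept>

// Adds two ints, throwing instead of silently overflowing.
// CheckedAdd is a hypothetical helper written for this example.
int CheckedAdd(int a, int b)
{
    const int maxInt = std::numeric_limits<int>::max();
    const int minInt = std::numeric_limits<int>::min();

    // Test against the limits before adding, because signed overflow
    // itself is undefined behaviour in C++.
    if (b > 0 && a > maxInt - b)
        throw std::overflow_error("integer overflow");
    if (b < 0 && a < minInt - b)
        throw std::overflow_error("integer underflow");
    return a + b;
}
```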

But I don't think compile time checks are the best solution. More than likely they will complicate the language. The examples of programs annotated with compile time checks from his slides certainly look more complicated than they would be without. Also, I am sure there will be cases where the compiler will not be smart enough and either miss a problem or be too conservative and raise an error when there is no problem.

It seems to me that unit testing is a better solution. If one uses TDD to ensure that every line of code is thoroughly covered by unit tests then the introduction of such a bug will result in a unit test failure in the majority of cases. And if unit tests are run after every compile, then the offending bug will be identified immediately at compile time, without any changes to the programming language.
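A minimal sketch of the idea, assuming bounds-checked access through std::vector::at and no particular test framework. SumFirst and TestSumFirstRejectsOutOfRange are hypothetical names. An out-of-bounds access becomes an exception, and a test that runs after every build reports it straight away.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Sums the first count elements. at() performs a bounds check and
// throws std::out_of_range instead of reading past the end.
int SumFirst(const std::vector<int> &values, std::size_t count)
{
    int sum = 0;
    for (std::size_t i = 0; i < count; ++i)
    {
        sum += values.at(i);
    }
    return sum;
}

// A unit test in miniature: returns true if asking for more elements
// than exist is reported as an exception.
bool TestSumFirstRejectsOutOfRange()
{
    const std::vector<int> values = {1, 2, 3};
    try
    {
        SumFirst(values, 4);
    }
    catch (const std::out_of_range &)
    {
        return true;
    }
    return false;
}
```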

Something I have been thinking about is how one might better design a language to support TDD. But that will have to be the subject of a future post.

November 13, 2005

How const is const?

Here is some straightforward C++ code:

int a[5];

The variable "a" is declared as being an array of 5 integers. This code is equivalent:

const int n = 5;
int a[n];

The variable "n" is a compile-time constant and can be used interchangeably with the constant 5. The address of a global variable is also a compile-time constant. Therefore, this is valid C++ as well:

int b;
const int n = reinterpret_cast<int>(&b);

And it does indeed compile, at least under Visual C++ 2005. The variable "n" no longer has the value 5. It has the value of the address of "b" reinterpreted as an integer.

Now the interesting part (to me anyway!). What if I combine the two examples so that the constant "n" is initialized with the address of "b" and then "n" is used as the size of the array "a"?

int b;
const int n = reinterpret_cast<int>(&b);
int a[n];

Although "n" has the same datatype as it did previously (constant integer), this code results in a compile time error under Visual C++. I can understand why this is the case. Consider:

// foo.cpp
extern int b = 0;

// bar.cpp
extern int b;
const int n = reinterpret_cast<int>(&b);
int a[n];

The compiler has no way of knowing the value of "n" at compile time. Only the linker has sufficient information to determine its value. But this is too late because the compiler is responsible for allocating the storage for the array "a".

My conclusion is that the type of "n" seems to go beyond its declared type of constant integer. Behind the scenes, the compiler must track whether "n" can be resolved at compile time or not. Sort of like, is "n" constant or really constant.
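Later versions of the language (C++11 onwards) let the programmer state this distinction explicitly with the constexpr keyword. The sketch below assumes such a compiler, which Visual C++ 2005 is not. ElementCount is a hypothetical helper added only so the result can be checked.

```cpp
#include <cstddef>
#include <cstdint>

int b;

// "Really constant": the compiler must be able to evaluate the
// initializer itself, so n can be used as an array bound.
constexpr int n = 5;
int a[n];

// Merely const: the value depends on where b ends up in memory, so it
// is not a constant expression.
const std::uintptr_t m = reinterpret_cast<std::uintptr_t>(&b);
// int c[m];  // rejected: m is not usable as an array bound

std::size_t ElementCount() { return sizeof(a) / sizeof(a[0]); }
```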

The moral of the story? Never fool yourself into thinking you fully understand C++!

September 25, 2005

Using Python to drive Google Desktop Search

If you haven't tried Google Desktop Search yet I suggest you install it immediately. It extends the Google search engine to your hard drive so you can use it to search your files, emails, etc. I find it really useful for searching through source code. It's much faster than Visual Studio or Explorer's search because it uses some kind of clever keyword index.

This weekend I decided to learn Python. I know some people who swear by it (hello!) and I thought it was about time. As a little project, I decided to get Python talking to Google Desktop Search. I might use it, for example, to be able to double click a class name in an IDE and then initiate a desktop Google search with "ctrl+alt+g" or something.

The instructions on how to control Google Desktop Search from other programs are here. Basically, all you have to do is send an HTTP request to localhost that looks something like "http://127.0.0.1:4664/search?q=My+query&format=xml" and Google Desktop Search responds with an XML file with all the results.

It went pretty well. Python was really easy to learn, although it didn't go quite as smoothly as my first experience with the Ruby language. I didn't need to read the manual before starting. I just started coding and looked things up as I needed them.

With one exception, all the libraries I needed were bundled with the ActivePython distribution: unittest, urllib, xml.minidom, win32api. The exception is the Python mock library, which is the best thing since the invention of the wheel, perhaps even the written word. If you haven't tried out a mock library before and you do any amount of unit testing, you should check one out. Most recent OO languages, unfortunately with the exception of C++, have one.

I used Test Driven Development of course! Here are some of my tests:

class TestMakeQueryUrl(unittest.TestCase):

    def setUp(self):
        self.searcher = GoogleDesktopSearcher()
        self.searcher.registryReader = Mock(
            {"queryValue": ("http://127.0.0.1:4664/search?q=", 1)})

    def testMakeQueryUrlRequestsSearchUrlFromRegistry(self):
        self.searcher.makeQueryUrl("hello", 10, 50)
        self.searcher.registryReader.mockCheckCall(0, "queryValue",
            HKEY_CURRENT_USER, APIRegistryKey, SearchUrlValue)

    def testMakeQueryRaisesIfSearchUrlNotPresentInRegistry(self):
        self.searcher.registryReader = Mock({"queryValue": (None, None)})
        self.assertRaises(EnvironmentError, self.searcher.makeQueryUrl,
            "hello", 10, 50)

    def testMakeQueryRaisesIfSearchUrlIsNotAString(self):
        self.searcher.registryReader = Mock({"queryValue": (123, 2)})
        self.assertRaises(EnvironmentError, self.searcher.makeQueryUrl,
            "hello", 10, 50)

    def testCanMakeValidQueryUrl(self):
        url = self.searcher.makeQueryUrl("hello", 10, 50)
        self.assertEqual(
            "http://127.0.0.1:4664/search?q=hello&format=xml&start=10&num=50",
            url)


class TestPerformSearch(unittest.TestCase):

    TestResultXML = "<foo/>"

    def setUp(self):
        self.searcher = GoogleDesktopSearcher()
        self.searcher.registryReader = Mock(
            {"queryValue": ("http://127.0.0.1:4664/search?q=", 1)})
        self.file = Mock({"read": self.TestResultXML})
        self.searcher.urlOpener = Mock({"open": self.file})

    def testPassesSearchUrlToUrlOpener(self):
        self.searcher.performSearch("hello", 10, 50)
        self.searcher.urlOpener.mockCheckCall(0, "open",
            "http://127.0.0.1:4664/search?q=hello&format=xml&start=10&num=50")

    def testCanRetrieveResultXml(self):
        xml = self.searcher.performSearch("hello", 10, 50)
        self.assertEqual(1, xml.childNodes.length)
        self.assertEqual("foo", xml.childNodes[0].localName)

There were some things I didn't like so much about Python. What's with the redundant use of "self" to identify member variables of classes? And why does "self" need to appear as a formal parameter of every member function? I have to be missing something!

Also, there are six key-strokes too many in "__init__", used to identify a class's constructor:

class GoogleDesktopResult:

    def __init__(self, result):
        self.category = getXmlElementData(result, "category", "")
        self.id = getXmlElementData(result, "id", 0)
        self.title = getXmlElementData(result, "title", "")
        self.url = getXmlElementData(result, "url", "")
        self.time = getXmlElementData(result, "time", 0)
        self.snippet = getXmlElementData(result, "snippet", "")
        self.icon = getXmlElementData(result, "icon", "")
        self.cacheUrl = getXmlElementData(result, "cache_url", "")

But other than those two minor quibbles, I like Python.

March 26, 2005

Virtual sizeof

Do you ever wish you could do this:

class Base
{
};

class Derived: public Base
{
    int x;
};

Base *p = new Derived;

size_t s = sizeof(*p); // I want s to be sizeof(Derived)

Of course it doesn't work because s becomes the size of Base rather than Derived. What you really want is a kind of "virtual sizeof". This is what I came up with:

class Base
{
public:
    virtual size_t Size() const = 0;
};

class Derived: public Base
{
    int x;
};

template < typename T >
class Wrapper: public T
{
public:
    virtual size_t Size() const
    {
        return sizeof(T);
    }
};

template < typename T >
T *New()
{
    return new Wrapper< T >;
}

Base *p = New<Derived>();

size_t s = p->Size(); // works as desired

I'm not dead

It's been 3 weeks since my last post. I'm not dead. I've been busy working on various projects. None of them are in a state that I can post anything concrete. Last week I learned two scripting languages: Ruby and Lua. The first thing I noticed was how much more productive I was using a scripting language than C# or C++. I think it was mostly the dynamic typing and certain language features like closures and coroutines. Closures are awesome. They do all the things I wish C# delegates would do.

After writing my first program in Ruby, another thing I noticed was how short it was considering the amount of stuff it did. I also noticed that the code was considerably narrower than a C# or C++ program would be, probably because of all the type conversions I would have used in a statically typed language. Overall I think the Ruby program had about one quarter the surface area :)

Apart from that I've been thinking about the next generation of consoles, parallel processing in particular. Nothing to report yet.

Today I messed around trying to write a C++ garbage collector. I discovered a diabolical way to abuse the assignment operator. Unless you tell it not to, a C++ compiler will automatically generate an assignment operator (and various constructors) for each class. This default assignment operator will assign member variables component-wise. So if you use smart pointers to represent references between objects, and override the smart pointer's assignment operator, it is possible to use the assignment operator of the containing class to find all the reference members.

template < typename T >
class SmartPointer
{
public:

    SmartPointer &operator=(const SmartPointer &p)
    {
        if (garbageCollecting)
        {
            handlePointer(*this);
        }
        object = p.object;
        return *this;
    }

    // ...
};

class MyClass
{
    SmartPointer< OtherClass > p;
};

MyClass myObject;

garbageCollecting = true;

// Assigning myObject to itself causes handlePointer to
// be called for myObject.p
myObject = myObject;

Of course, this will only work if you write classes in a certain way. In particular, all the members have to handle being assigned to themselves. I can see STL containers going a bit funny. Still, it's keeping me entertained.
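To make the trick concrete, here is a self-contained sketch that can actually be compiled. The names visited and CountPointerMembers are hypothetical, and recording the pointer's address stands in for whatever handlePointer would do in a real collector. It assumes a C++11 compiler.

```cpp
#include <cstddef>
#include <vector>

static bool garbageCollecting = false;
static std::vector<const void *> visited;  // each pointer member seen

template < typename T >
class SmartPointer
{
public:
    SmartPointer(): object(nullptr) {}

    SmartPointer &operator=(const SmartPointer &p)
    {
        if (garbageCollecting)
        {
            visited.push_back(this);  // stand-in for handlePointer(*this)
        }
        object = p.object;
        return *this;
    }

private:
    T *object;
};

struct OtherClass {};

class MyClass
{
    SmartPointer< OtherClass > first;
    SmartPointer< OtherClass > second;
    int notAPointer = 0;
};

// Assigns the object to itself through the compiler-generated
// operator= and returns how many SmartPointer members were visited.
template < typename T >
std::size_t CountPointerMembers(T &object)
{
    visited.clear();
    garbageCollecting = true;
    T &alias = object;
    object = alias;  // member-wise assignment visits every member
    garbageCollecting = false;
    return visited.size();
}
```

Calling CountPointerMembers on a MyClass instance visits the two smart pointer members and skips the int, without MyClass containing any collector-specific code.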

March 05, 2005

CDL prototype working

I just got my prototype CDL metacompiler working. It can compile the CDL class definition I posted previously and it outputs all the automatically generated schema and C++ serialization code. It's not even close to being production ready but I have taken it far enough to satisfy myself that it would be feasible for a game project. It was pretty straightforward to get working: just 18 hours.

I ended up not using the Puma C++ transformation library. It was overkill for my needs. I used the Program Database Toolkit instead. PDT is based on the EDG C++ compiler, which is used by a lot of commercial C++ compilers including Comeau C++. EDG can successfully compile boost (which gives a lot of compilers difficulty) so my metacompiler could compile class definitions that used boost or other template heavy code.

EDG is commercial, but they allow it to be used free for non-commercial purposes. I haven't been able to find out what kind of licensing arrangements are available. Alternatively, I could base CDL on Puma as originally planned. It would just be a little more work and I don't know if it would be able to parse things like boost.

I originally started considering this as an alternative to my C# to C++ translator for exploiting metadata and automatic code generation in game projects. I think they could both be made to work, although C# would offer more benefits. Of course, the CDL metacompiler is much simpler and much lower risk.

I think for my next project I will look at some scripting languages: their applications in games and how to effectively bind them to game code, possibly using CDL to automatically generate the bindings from annotated class definitions.

February 27, 2005

Other applications for CDL

My main justification for CDL to this point has been as a tool to automatically generate a schema from an annotated C++ class definition, together with additional automatically generated C++ code to support the binding of objects to binary data cooked with the schema. As I see things, the only viable alternative (using C++ at least) is to automatically generate C++ code from a hand-written schema. See my previous post for more info.

Now I want to consider some of the other applications of CDL, which I don't think could be achieved, or could only be achieved with increased redundancy, through the code from schema approach. As I said before, I think serialization, specifically serialization of level data, is the most important application for a language like CDL. But there are other forms of serialization that are useful in game code and for tool / game interoperation.

Here are two examples. The first is allowing realtime editing of game data as the game is running, for example with a level editing tool connected to the game engine over a network. The second is supporting automatic synchronization of objects in network games. These are both related. They require that it be possible to access and change individual properties of an object after it has been instantiated.

With the CDL approach, realtime level data updates could be supported without extending the CDL language. It is a case of automatically generating the code needed to access and modify individual accessor functions and member variables in response to network messages. For synchronizing objects in a network game, it would be necessary to annotate the class definition with attributes indicating which properties should be synchronized and which should not.
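As a sketch of the kind of code a metacompiler might emit for realtime updates, written by hand here: the Dynamite class, the SetProperty function and the idea that a network message carries a property name and an int value are all assumptions made for illustration.

```cpp
#include <string>

// Hypothetical hand-written game class with one property.
class Dynamite
{
public:
    Dynamite(): fuseTime(20) {}
    int GetFuseTime() const { return fuseTime; }
    void SetFuseTime(int t) { fuseTime = t; }

private:
    int fuseTime;
};

// What generated code might look like: apply one property update
// received from the editor. Returns false if the name is unknown.
bool SetProperty(Dynamite &object, const std::string &name, int value)
{
    if (name == "FuseTime")
    {
        object.SetFuseTime(value);
        return true;
    }
    return false;
}
```

Because the update goes through the accessor function, the class stays free to change how it stores the value internally.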

With the C++ from schema code generation approach, I can't see a way of supporting this easily. The problem is the game objects are not directly described by the metadata in the schema. So there is no way to automatically generate code to access or modify individual properties of game objects after they have been instantiated.

Another CDL example

This is another example of CDL. I am posting it in response to Noel Llopis' comment on another thread. This is intended to demonstrate how CDL could allow the structure of the schema to differ from that of the member variables.

// Orientation.cdl
// Hand written annotated class
CDL class Orientation
{
public:

    Orientation(): x(0), y(0), z(0), w(1) {}

    // Public API uses Euler angles
    float GetX() const;
    void SetX(float);

    float GetY() const;
    void SetY(float);

    float GetZ() const;
    void SetZ(float);

    // Other members...

private:

    // Implemented in terms of quaternions
    float x, y, z, w;
};

// Orientation.xml
// Automatically generated schema for binding with level editing tool
// and to allow a build tool to cook C++ CInfo structs from level
// data. The field names and types are inferred from the accessor
// functions in class Orientation.
[schema]
    [class name="Orientation"]
        [field name="X" type="float"/]
        [field name="Y" type="float"/]
        [field name="Z" type="float"/]
    [/class]
[/schema]

// Orientation.auto.cpp
// More automatically generated code

struct OrientationCInfo
{
    float X;
    float Y;
    float Z;
};

// These are invoked automatically by the serializer
// when loading or saving level data.
void Load(Orientation &l, const OrientationCInfo &r)
{
    l.SetX(r.X);
    l.SetY(r.Y);
    l.SetZ(r.Z);
}

void Save(const Orientation &l, OrientationCInfo &r)
{
    r.X = l.GetX();
    r.Y = l.GetY();
    r.Z = l.GetZ();
}

// Other automatically generated code for factory, factory
// registration, etc...

I have changed things so that properties are identified through a naming convention rather than through explicit annotations. Instead, annotations would be used when the wrong interpretation would be implied by the naming convention, for example if Orientation had a function called GetAxisAngle, which should not be serialized.
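One way such a naming convention could be applied is sketched below. It assumes the convention is a matching GetX/SetX pair, which the post does not spell out, and InferProperties is a hypothetical name. A real metacompiler would work on the parsed class definition rather than a list of strings.

```cpp
#include <set>
#include <string>
#include <vector>

// Given the names of a class's public member functions, return the
// properties implied by matching GetX/SetX pairs, in declaration order.
std::vector<std::string> InferProperties(const std::vector<std::string> &functions)
{
    const std::set<std::string> names(functions.begin(), functions.end());
    std::vector<std::string> properties;

    for (const std::string &name : functions)
    {
        if (name.size() > 3 && name.compare(0, 3, "Get") == 0)
        {
            const std::string property = name.substr(3);
            if (names.count("Set" + property) != 0)
            {
                properties.push_back(property);
            }
        }
    }
    return properties;
}
```

Under this assumed rule a getter without a setter is skipped automatically. An annotation would only be needed where a Get/Set pair exists but should not be serialized.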

February 20, 2005

Puma C++ transformation library

Today I looked at Puma. Puma is a library for applying transformations to C++ programs. It can do C++ lexical analysis, parsing and semantic analysis. The library provides an API for examining and modifying C++ abstract syntax trees. To support program transformation, it also allows a modified syntax tree to be "unparsed" back into C++ source code, which can then be compiled using a C++ compiler.

It seems to be quite robust. For example, looking through some of the tests, I saw that it can parse templates and even template metaprograms. I was also pleased to see that it is used as the back-end of AspectC++, which is a relatively mature aspect-oriented programming system for C++.

I considered using AspectC++ instead of CDL. Aspect oriented programming has a lot of cross-over with metaprogramming. One thing it wouldn't let me do though is generate a schema from C++ class definitions. It provides no way of outputting anything other than C++ as far as I can tell. I'll take a closer look at it soon.

I am considering using Puma to parse CDL. If it works out, it will eliminate one of the big risks and most of the development effort for my C++ metaprogramming system. As with my C# and Java to C++ translators, I think the key to getting a prototype up and running quickly is to find a library that will do most of the parsing and analysis for me.

Puma is GPL. I am not sure if that would be a problem for a commercial project. The game would not be compiled with or link to any of the GPLed code. But some of it would be automatically generated by GPLed code. Is code that is generated by a GPL tool also GPL? I'll do some research.

I haven't had a chance to actually try it out yet. It took me all day to get it to compile in Visual C++. I had to upgrade from Visual C++ 7.0 to 8.0 beta. 7.0 was not sufficiently standards compliant. Then I had to wade through GCC makefiles and figure out how to set up a VC++ project with all the necessary preprocessor definitions. Grr, C++ makes things too difficult for a lazy programmer like me!

February 15, 2005

Schema from program or program from schema?

One of the goals of my C++ metaprogramming system is that it should be possible to generate a schema for game data directly from an annotated C++ program. This schema is important because it decouples the game engine from the tools that are used to process the game's data.

There seem to be three options here. The first option is the one I am currently investigating and also the motivation for my C# to C++ translator: generate the schema from the code. The second is to generate the code from the schema. The third is to write both the schema and the code by hand.

The last option is undesirable. It violates the Don't Repeat Yourself principle. The schema and the program express the same thing and one of the two is redundant.

The second option, generate the code from the schema, has some appeal, primarily simplicity of implementation. It has one problem I can't see a solution to though. A schema is data-centric whereas an OO program hides data within classes that expose a more abstract public interface. The schema language could of course be extended to include OO constructs for information hiding but then it would become coupled to the structure of the program, which is undesirable.

So let's say I generate C++ classes from a data-centric schema. The generated C++ classes will inevitably be data-centric too. They might just be structs with public data members. Or one might generate accessor function pairs for the data members. But that is just C++ gorp to make a C style struct look like an OO class.

Then, being a good OO programmer, I would have to write more classes that exposed a better OO abstraction. These classes might be initialized directly from the automatically generated classes. Here is an example:

// Hand written schema
[schema]
    [dynamite]
        [field name="fuseTime" type="int" default="20"/]
    [/dynamite]
[/schema]

// Automatically generated from the schema
struct SDynamite
{
    int fuseTime;
};

// Hand written OO abstraction for dynamite
class CDynamite
{
public:

    CDynamite(const SDynamite &d): fuseTime(d.fuseTime) {}

    // Abstract interface to dynamite
    void StartCountDown();
    void Update(float dt);

private:
    int fuseTime;
};

Of course it would be possible to embed the SDynamite as a private data member of CDynamite. But then CDynamite is tightly coupled to the schema format. It does not allow the internals of CDynamite to change without changing the schema. Notice the redundancy in the constructor. It is essentially converting between the schema format and the internal representation of CDynamite.

So generating the code from the schema does not appear to eliminate the redundancy. It just moves it around. The fundamental problem is the schema does not contain all the information necessary to generate an OO abstraction to itself. Whereas, through metadata annotations or naming conventions, OO code has sufficient information to generate a data-centric schema without redundancy.

// Hand written annotated dynamite class
CDL class Dynamite
{
public:

    PROPERTY
    ATTR(Serialized)
    ATTR(Default=20)
    int GetFuseTime() const;
    void SetFuseTime(int);

    // Abstract interface to dynamite
    void StartCountDown();
    void Update(float dt);

    // ...
};

// Automatically generated schema
[schema]
    [dynamite]
        [field name="FuseTime" type="int" default="20"/]
    [/dynamite]
[/schema]

Both examples express how to convert between the schema format and an object. In the second example, the schema format is inferred from the conversion. In the first example, both the schema format and the conversion must be separately expressed. There must be redundancy in the first example.

In the case where the OO class has the same or similar structure to the schema (probably a high proportion of cases), this redundancy can also be eliminated.

CDL class Dynamite
{
public:

    // Abstract interface to dynamite
    void StartCountDown();
    void Update(float dt);

private:

    PROPERTY
    ATTR(Serialized)
    ATTR(Default=20)
    int fuseTime;
};

This is similar to embedding an SDynamite data member in CDynamite but with one key difference. Embedding SDynamite is all or nothing. It is not possible to say I only want the fuse time from SDynamite but these other schema fields will be represented differently.

I see a potential problem with generating a schema from code as well. Which language? If I am using multiple languages, say a scripting language and C# or C++, which do I generate the schema from? If I want to be able to implement game components in both languages, I probably want both. So I need to allow my preprocessor to accept metadata from multiple sources. For that reason, I will start calling it a metacompiler.

artefaktur C++ framework

I found this existing C++ framework that supports reflection. Like my metaprogramming idea, this uses a C++ preprocessor to parse metadata out of C++ class definitions. I believe it focuses on using reflection to exploit metadata. I want to continue down the metaprogramming route. I think it fits better with C++.

I also investigated C++0x. This is an attempt to standardize the next version of C++. Bjarne Stroustrup is involved. It looks like it will have some interesting new features, including ways to get metadata out of C++ programs for cleaner integration with SQL databases and the like. It's going to be a while before I can buy a C++0x compiler though!

I'm not sure if I'm pleased. It looks like C++0x will have advantages over C++ for games development. But I can't help thinking that it will extend the life of C++ still further, when we are better off opting for "higher level" languages.

February 13, 2005

CDL use cases

These are some use cases for my C++ metaprogramming framework. First some CDL:

// Texture.h
#ifndef GUARD_Texture_H
#define GUARD_Texture_H

#include "Object.h" // physical dependency on base class

#include "StringPtr.h" // no physical dependency on referenced classes
#include "ArrayPtr.h"
#include "MipMapPtr.h"

CDL class Texture: public Object
{
public:

    PROPERTY
    Ptr<String> GetName() const;
    void SetName(const Ptr<String> &);

    PROPERTY
    Ptr<Enumeration> GetMipMaps() const;
    void SetMipMaps(const Ptr<Enumeration> &);

    PROPERTY
    Ptr<Palette> GetPalette() const;
    void SetPalette(const Ptr<Palette> &);

private:

    Ptr<String> name;
    Array< Ptr<MipMap> > mipmaps;
    Ptr<Palette> palette;
};

#endif // ifndef GUARD_Texture_H

It's just a C++ header file, limited to a restricted sub-language enforced by the CDL preprocessor. Notice that Texture.h is not physically dependent on String, Array or MipMap but rather on smart pointers to these types. Unlike conventional smart pointers, these are template classes specialized on the referenced type, which means they can provide access to the referenced type's public API without a physical dependency. They might be implemented like this:

// TexturePtr.h
// AUTOMATICALLY GENERATED!

class Texture; // forward declaration

template <>
class Ptr<Texture>
{
public:
    // ...
    Ptr<String> GetName() const;
    void SetName(const Ptr<String> &) const;
    // ...

private:
    Texture *pointer;
};

// TexturePtr.cpp
// AUTOMATICALLY GENERATED!

#include "Texture.h" // physical dependency here
#include "TexturePtr.h"

Ptr<String> Ptr<Texture>::GetName() const
{
    // trampoline to implementation
    return pointer->GetName();
}

Texture.h is not just input to the CDL preprocessor. It is also compiled as part of the program like any other header file. So the member functions might be implemented like this:

// Texture.cpp

#include "Texture.h"

Ptr<String> Texture::GetName() const
{
    return name;
}

Texture.cpp is not CDL. CDL is only concerned with the class definition, not the implementation of the functions.

The CDL preprocessor will automatically generate some code based on the CDL class definition. For example, it could automatically generate serialization code. So without the programmer having to write any additional serialization code, they can just write this:

void TestSave()
{
    Ptr<Texture> texture = New<Texture>();
    texture.SetName("Stone");

    Ptr<MipMap> mipmap = New<MipMap>(128, 128);
    Array< Ptr<MipMap> > mipmaps = NewArray< Ptr<MipMap> >(1);
    mipmaps[0] = mipmap;

    texture.SetMipMaps(mipmaps.GetEnumeration());

    SaveXML(texture, "file.xml");

    // texture, array and mipmap automatically freed through reference counting
}

SaveXML and LoadXML invoke code that is automatically generated by the CDL preprocessor. The output might look like this:

<Texture>
    <Name>Stone</Name>
    <MipMaps>
        <MipMap>
            <Width>128</Width>
            <Height>128</Height>
            <!-- ... -->
        </MipMap>
    </MipMaps>
    <Palette>
        <null/>
    </Palette>
</Texture>

Then the programmer can write this to load the data:

void TestLoad()
{
    Ptr<Object> object = LoadXML("file.xml");
    Ptr<Texture> texture = DynamicCast<Texture>(object);
}

C++ metaprogramming framework

In my previous post, I considered some ways of exploiting reflection in a C++ program. Specifically, I proposed the use of a C++ data definition language (DDL) from which metadata could be extracted in order to support reflection. In retrospect, I think the choice of the term DDL was a mistake. C++ classes are not just data after all. From now on I will use the term class definition language (CDL).

I was eager to try out test driven development on another project, this time something a little more substantial. So I started out by thinking what my first test should be. I considered using TDD to write the DDL preprocessor. But I wasn't sure what it should do exactly. I realized that this wasn't really programming by intention.

Then I decided to start by thinking about what kinds of things the users of the CDL would want to do. I came to the conclusion that it wasn't the CDL itself that was important but its application. Indeed, if using CDL and reflection is an appropriate thing to do, then TDD would hopefully lead me in that direction.

The most useful application is probably automatic serialization. So I decided that my short-term goal should be to write a program that saved the state of some simple test objects to a stream. Eventually this would be achieved using serialization code automatically generated from the CDL.

But to get things started, I would "automatically generate" the code by hand. Once the amount of "automatically generated" code started to get unmanageable, I would have a good idea of what the CDL preprocessor should do and I would be able to use TDD to start writing it. Having got a basic CDL preprocessor and a basic runtime framework underway, I would be able to use TDD to evolve them in parallel.

Some interesting things happened. First, I ended up not using TDD. Perhaps I don't get it yet, but I found that it was taking me too long to make progress and that I was taking a lot of wrong turns. I found it was more effective to hack together a series of throwaway prototypes. The trouble I am having is that I still don't know exactly what the problem is. I understand the problem in C# but I don't yet fully understand how to translate it into C++.

I am still using some of the ideas of TDD. For example, I am still programming by intention and I am finding it to be very useful in shaping my prototypes.

Also, this approach did not lead me to a CDL + reflection solution. Rather, it led me to a metaprogramming solution. What I have at the moment is a system where I use CDL to describe my classes. The CDL is just a subset of C++ that should be easy to parse and analyze by the preprocessor. From the CDL, I "automatically generate" (still by hand) C++ code that performs serialization.
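As a rough illustration of what such generated serialization code could look like, here is a hand-written sketch for a two-field class. The MipMap fields, the SaveXml and ToXml names, and the XML layout are assumptions made for this example; they are not the actual output of any CDL preprocessor.

```cpp
#include <sstream>
#include <string>

// Hypothetical CDL input (a restricted C++ class definition):
//     class MipMap { int width; int height; };
struct MipMap {
    int width;
    int height;
};

// What a preprocessor might emit for MipMap: one element per field,
// in declaration order. Written by hand here, as in the post.
void SaveXml(const MipMap& m, std::ostream& out) {
    out << "<MipMap>"
        << "<Width>" << m.width << "</Width>"
        << "<Height>" << m.height << "</Height>"
        << "</MipMap>";
}

// Convenience wrapper so the result can be inspected as a string.
std::string ToXml(const MipMap& m) {
    std::ostringstream out;
    SaveXml(m, out);
    return out.str();
}
```

The point of the sketch is that the code is specific to MipMap: every field access is resolved at compile time and nothing is looked up while the program runs.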

This differs from reflection. Using reflection, I would automatically generate a data structure that some general purpose serialization code would query at runtime.
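For contrast, a minimal sketch of the reflection style might look like the following. The FieldInfo and ClassInfo structures, the table names and the restriction to int fields are all assumptions chosen to keep the example short.

```cpp
#include <cstddef>
#include <cstring>
#include <sstream>
#include <string>

struct MipMap {
    int width;
    int height;
};

// Hypothetical metadata: one entry per int field (name plus byte offset).
struct FieldInfo {
    const char* name;
    std::size_t offset;
};

struct ClassInfo {
    const char* name;
    const FieldInfo* fields;
    std::size_t fieldCount;
};

// In the reflection approach, this table is what the tool would generate.
const FieldInfo kMipMapFields[] = {
    {"Width", offsetof(MipMap, width)},
    {"Height", offsetof(MipMap, height)},
};
const ClassInfo kMipMapInfo = {"MipMap", kMipMapFields, 2};

// General-purpose code: it knows nothing about MipMap and only walks the table.
std::string SaveXmlReflective(const void* object, const ClassInfo& info) {
    const char* base = static_cast<const char*>(object);
    std::ostringstream out;
    out << "<" << info.name << ">";
    for (std::size_t i = 0; i < info.fieldCount; ++i) {
        const FieldInfo& field = info.fields[i];
        int value = 0;
        std::memcpy(&value, base + field.offset, sizeof(value));
        out << "<" << field.name << ">" << value << "</" << field.name << ">";
    }
    out << "</" << info.name << ">";
    return out.str();
}
```

Here the generated artifact is data rather than code, and a single serializer handles every class that has a table.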

Metaprogramming is quite natural in C++. C++ already supports metaprogramming with templates. What I am aiming at now is a more powerful and easy-to-use metaprogramming framework to automate information processing tasks such as serialization and GUI generation. While template metaprograms work by using templates as a rudimentary compile-time functional programming language, my metaprograms will just be normal C++ programs that are run as a pre-build step. They will probably take a tree representing the parsed CDL as input and generate C++ code as output.
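A metaprogram of this kind could be as small as the sketch below: an ordinary C++ function that walks a tree and returns C++ source text. The ClassDecl and FieldDecl types are a hypothetical, heavily simplified stand-in for a parsed CDL tree, and the emitted Save function is only one possible output format.

```cpp
#include <string>
#include <vector>

// Hypothetical, heavily simplified tree for one parsed CDL class.
struct FieldDecl {
    std::string type;
    std::string name;
};

struct ClassDecl {
    std::string name;
    std::vector<FieldDecl> fields;
};

// The "metaprogram": normal C++ that emits the source of a Save function,
// writing one line per field in declaration order.
std::string GenerateSaveFunction(const ClassDecl& cls) {
    std::string code;
    code += "void Save(const " + cls.name + "& obj, std::ostream& out)\n{\n";
    for (const FieldDecl& field : cls.fields) {
        code += "    out << obj." + field.name + " << '\\n';  // " + field.type + "\n";
    }
    code += "}\n";
    return code;
}
```

Run as a pre-build step, the returned text would be written to a source file and compiled along with the rest of the program.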

I will post some example use cases for CDL in my next post.

January 30, 2005

C++ data definition language

In a previous post, I said I would consider some alternative ways of taking advantage of the benefits of C# (primarily reflection) on game console platforms. Two approaches spring to mind.

The first approach is to use an existing interpreted language with reflection support, such as Python. An interpreted language is appealing because, if the interpreter, debugger and other tools can be easily used on a console platform, there is considerably less development effort and risk than a C# to C++ translator. Of course, for performance reasons, a considerable amount of code would still need to be written in a compiled language like C++.

The second approach is to develop a C++ reflection framework. Although this would still be a considerable effort, it is lower risk than a C# to C++ translator because it is still in the realm of library development rather than compiler development.

Actually, these are not necessarily alternative approaches. For many games, it would be appropriate to employ both, writing "higher-level" code in an interpreted language and "lower-level" code in C++. In fact, a C++ reflection framework would be useful for allowing more transparent interoperation between the two languages by automatically generating interpreted bindings for C++ classes where appropriate. This would allow the interpreted language to easily manipulate C++ objects.

Also, I believe reflection is as useful for this "lower-level" code as it is for code that can be appropriately implemented in an interpreted language.

I have developed a C++ reflection framework before, with a certain amount of success. The main problem I encountered was that the semantics of C++ are too "loose". A language that supports reflection really needs sufficiently "tight" semantics that a program can unambiguously understand itself by examining its class hierarchy at runtime. For example, if a C++ program examines one of its classes and sees that a function returns a pointer, does that mean the function returns a pointer to an object or a pointer to an array? If it is an array, how can it determine the size of the array? In general it can't. Take this example:


class Foo {

Foo *getParent();
Bar *getBars();
};

A program examining this class through reflection cannot know whether getBars() returns a pointer to an object or a pointer to an array. It takes a human to realize that "Bars" is a plural noun and so the function probably returns a pointer to an array.

A solution to these two problems would be to avoid the direct use of pointers in reflected classes. This could be achieved by replacing pointers with more abstract types that can be unambiguously interpreted through reflection:


class Foo {

Reference<Foo> getParent();
Array<Bar> getBars();
};

Now it is clear, even to a program using reflection, that getParent() returns a reference to an object and getBars() returns a reference to an array of objects. The Array type would provide a way of determining the size of the array.
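One possible shape for these two types is sketched below. The member names (isNull, get, size) and the non-owning design are assumptions for illustration; a real framework would also have to decide on ownership and lifetime.

```cpp
#include <cstddef>

// Refers to exactly one object (or none). Never an array.
template <typename T>
class Reference {
public:
    explicit Reference(T* target = nullptr) : target_(target) {}
    bool isNull() const { return target_ == nullptr; }
    T& get() const { return *target_; }

private:
    T* target_;
};

// Refers to a sequence of objects and always knows its length.
template <typename T>
class Array {
public:
    Array(T* first, std::size_t count) : first_(first), count_(count) {}
    std::size_t size() const { return count_; }
    T& operator[](std::size_t index) const { return first_[index]; }

private:
    T* first_;
    std::size_t count_;
};
```

Because the distinction between "one object" and "many objects" is now carried by the type, a tool inspecting a function signature no longer has to guess from the function's name.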

This seems like a reasonable approach. Take C++ as a starting point and find the aspects of the language that don't work well with reflection. Then use the powerful abstraction and meta-programming facilities of C++ to build a C++ sub-language with the necessary "tight" semantics. The sub-language would have a lot in common with languages like Java and C# but it would still be "proper" C++.

A problem with this approach is that C++ metadata is only known to the compiler. Except for the very limited metadata available through RTTI, it is lost after compile time and there is no way for the program to access it once it is running. So many approaches to C++ reflection require that the programmer express it again using some kind of data structure or meta-program that parallels the program's class hierarchy. See Reflection Support by Means of Template Metaprogramming.

I think this is kind of silly because a key advantage of reflection is to avoid exactly this kind of redundancy. Admittedly, if reflection is used extensively, the redundancy of the parallel data structure or meta-program may be less troublesome than the redundancy of the code that would be written if reflection were not supported. Also, this approach does not support automatic code generation. There is no way for a compile time tool to generate code based on the reflected metadata.

To avoid this redundancy, it is possible to use a C++ parser to examine the program before compilation and automatically generate the metadata code. This is the approach I used for my previous C++ reflection project. It is also the approach described in the paper Non-intrusive object introspection in C++. There are numerous suitable C++ parsers. The project described in that paper used a modified version of the G++ compiler. I used Visual C++ as the parser and extracted the metadata from the debug information in the generated PDB files. Open C++ and Aspect C++ are also possibilities.

This approach works quite well. An issue is finding a parser that can deal with all the different dialects of C++ that are used in a multi-platform game project. Another issue is that a standard C++ parser does not enforce the use of a C++ sub-language better suited to reflection. From my previous experiences with C++ reflection, I think this is the key to success.

Another approach, and the approach that I think I will investigate in more depth if time permits, is to use a Data Definition Language (DDL) to define classes. My instinct is to make the DDL as close to the language of C++ class definitions as possible. But I would modify and restrict it where necessary to ensure that it enforced the use of the reflected C++ sub-language. It would be rather like Managed C++ or C++/CLI. These are both C++ variants modified to support, among other things, reflection.

Then a C++ programmer should feel as at home working with the DDL for a class as they would working with a C++ class definition. A compile time tool would parse the DDL and automatically generate the corresponding C++ class definition, which can then be compiled by the C++ compiler. The code generator would also output the data structure that parallels the class hierarchy. This would be the underlying data structure that allows the program to examine itself through reflection. Furthermore, the compile time tool could use the information in the DDL to generate other kinds of code, such as serialization code.

To be clear, I would not be developing a new programming language or compiler. It would be a simple DDL that would take the place of C++ header files for reflected classes.

This is not a new approach, even in game development. For example, this is exactly the approach employed by NetZ for describing objects that are synchronized in network games. It is also the approach of UnrealScript's "native classes". In fact, the combination of an interpreted language like Python with automatic binding to C++ code described through a DDL would be remarkably similar to UnrealScript + native classes.

Of the benefits of C# that I listed, which would and which would not apply to this approach? The biggest benefit - reflection - still holds. The benefits that are built on top of reflection, such as automatic code generation and automatic memory management, also hold.

Tools are a problem. There will be no IDEs or automatic refactoring tools with knowledge of the DDL syntax, unless it is very close to C++. This is a good reason to keep the DDL syntax as close as possible to standard C++, possibly a subset of the C++ syntax. Then at least all the usual C++ IDEs and tools such as Visual Assist will work.

With regard to testing, reflection would permit the development of a unit testing framework like NUnit or JUnit. The DDL compiler could automatically generate mock objects to support a mocking framework similar to EasyMock.

There is no possibility of using the .NET class library. But there are C++ class libraries such as STL and boost. A design goal should be to ensure that it is possible to use types that are not described with DDL, such as std::string.

With regard to higher level language features such as array bounds checking, this can be made part of the semantics of the reflected C++ sub-language. For example, a template array class might perform bounds checking.
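A minimal sketch of such a template, assuming an owning array built on std::vector and an exception as the failure mode, could look like this. The CheckedArray name is hypothetical, and a game might well prefer an assertion that compiles out of release builds.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

template <typename T>
class CheckedArray {
public:
    explicit CheckedArray(std::size_t count) : items_(count) {}

    std::size_t size() const { return items_.size(); }

    // Every access is validated before the element is returned.
    T& operator[](std::size_t index) {
        if (index >= items_.size()) {
            throw std::out_of_range("CheckedArray index out of range");
        }
        return items_[index];
    }

private:
    std::vector<T> items_;
};
```

Indexing one past the end then fails loudly at the point of the mistake instead of silently corrupting memory.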

The DDL compiler can take steps to ensure fast build times, such as minimizing physical dependencies between generated header files and automatically employing the pimpl idiom in the generated C++ code for appropriate classes.

