February 27, 2005

Other applications for CDL

My main justification for CDL to this point has been as a tool to automatically generate a schema from an annotated C++ class definition, together with additional automatically generated C++ code to support the binding of objects to binary data cooked with the schema. As I see things, the only viable alternative (using C++ at least) is to automatically generate C++ code from a hand-written schema. See my previous post for more info.

Now I want to consider some of the other applications of CDL, which I don't think could be achieved, or could only be achieved with increased redundancy, through the code from schema approach. As I said before, I think serialization, specifically serialization of level data, is the most important application for a language like CDL. But there are other forms of serialization that are useful in game code and for tool / game interoperation.

Here are two examples. The first is allowing realtime editing of game data as the game is running, for example with a level editing tool connected to the game engine over a network. The second is supporting automatic synchronization of objects in network games. These are both related. They require that it be possible to access and change individual properties of an object after it has been instantiated.

With the CDL approach, realtime level data updates could be supported without extending the CDL language. It is just a matter of automatically generating the code needed to call individual accessor functions and modify individual member variables in response to network messages. For synchronizing objects in a network game, it would be necessary to annotate the class definition with attributes indicating which properties should be synchronized and which should not.
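
Here is a rough sketch of what that generated property access code might look like. The function names SetProperty and GetProperty, the inline accessor bodies and the use of a plain string and float in place of a decoded network message are all my assumptions for illustration, not a settled design.

```cpp
#include <cassert>
#include <string>

// Simplified stand-in for a hand written CDL class. The inline bodies
// are hypothetical; only X and Y are shown to keep the sketch short.
class Orientation
{
public:
    Orientation(): x(0), y(0) {}

    float GetX() const { return x; }
    void SetX(float v) { x = v; }

    float GetY() const { return y; }
    void SetY(float v) { y = v; }

private:
    float x, y;
};

// Hypothetical generated code: apply one property update, for example
// one decoded from a network message sent by the level editor.
// Returns false if the class has no property with that name.
bool SetProperty(Orientation &o, const std::string &name, float value)
{
    if (name == "X") { o.SetX(value); return true; }
    if (name == "Y") { o.SetY(value); return true; }
    return false;
}

// Hypothetical generated code: read one property back by name.
bool GetProperty(const Orientation &o, const std::string &name, float &value)
{
    if (name == "X") { value = o.GetX(); return true; }
    if (name == "Y") { value = o.GetY(); return true; }
    return false;
}

// Example use: the editor changes X while the game is running.
void ExampleLiveEdit()
{
    Orientation o;
    assert(SetProperty(o, "X", 1.5f));
    float x = 0;
    assert(GetProperty(o, "X", x) && x == 1.5f);
}
```

The point is that the object is modified through its own accessors after it has been instantiated, which is only possible because the generator knows about the class itself and not just the schema.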

With the C++ from schema code generation approach, I can't see a way of supporting this easily. The problem is the game objects are not directly described by the metadata in the schema. So there is no way to automatically generate code to access or modify individual properties of game objects after they have been instantiated.

Another CDL example

This is another example of CDL. I am posting it in response to Noel Llopis' comment on another thread. This is intended to demonstrate how CDL could allow the structure of the schema to differ from that of the member variables.
// Orientation.cdl
// Hand written annotated class
CDL class Orientation
{
public:

Orientation(): x(0), y(0), z(0), w(1) {}

// Public API uses Euler angles
float GetX() const;
void SetX(float);

float GetY() const;
void SetY(float);

float GetZ() const;
void SetZ(float);

// Other members...

private:

// Implemented in terms of quaternions
float x, y, z, w;
};

// Orientation.xml
// Automatically generated schema for binding with level editing tool
// and to allow a build tool to cook C++ CInfo structs from level
// data. The field names and types are inferred from the accessor
// functions in class Orientation.
[schema]
[class name="Orientation"]
[field name="X" type="float"/]
[field name="Y" type="float"/]
[field name="Z" type="float"/]
[/class]
[/schema]

// Orientation.auto.cpp
// More automatically generated code

struct OrientationCInfo
{
float X;
float Y;
float Z;
};

// These are invoked automatically by the serializer
// when loading or saving level data.
void Load(Orientation &l, const OrientationCInfo &r)
{
l.SetX(r.X);
l.SetY(r.Y);
l.SetZ(r.Z);
}

void Save(const Orientation &l, OrientationCInfo &r)
{
r.X = l.GetX();
r.Y = l.GetY();
r.Z = l.GetZ();
}

// Other automatically generated code for factory, factory
// registration, etc...

I have changed things so that properties are identified through a naming convention rather than through explicit annotations. Instead, annotations would be used when the wrong interpretation would be implied by the naming convention, for example if Orientation had a function called GetAxisAngle, which should not be serialized.
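
To make the naming convention concrete, here is a minimal sketch of how the preprocessor might infer properties from function names. InferProperties and the excluded set (standing in for an annotation that opts a function pair out) are hypothetical names of mine, and a real implementation would work on parsed declarations rather than bare strings.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical preprocessor helper: given the names of a class's public
// member functions, return the property names implied by matching
// GetFoo/SetFoo pairs. Names listed in 'excluded' model annotations that
// override the naming convention.
std::vector<std::string> InferProperties(
    const std::vector<std::string> &methods,
    const std::set<std::string> &excluded)
{
    std::set<std::string> getters, setters;
    for (const std::string &m : methods)
    {
        if (m.size() > 3 && m.compare(0, 3, "Get") == 0)
            getters.insert(m.substr(3));
        if (m.size() > 3 && m.compare(0, 3, "Set") == 0)
            setters.insert(m.substr(3));
    }

    // A property needs both a getter and a setter and must not be excluded.
    // Iterating a std::set yields the names in alphabetical order.
    std::vector<std::string> properties;
    for (const std::string &name : getters)
        if (setters.count(name) && !excluded.count(name))
            properties.push_back(name);
    return properties;
}

// Example use with the Orientation class plus an axis-angle accessor pair.
void ExampleInferProperties()
{
    std::vector<std::string> methods = {
        "GetX", "SetX", "GetY", "SetY", "GetZ", "SetZ",
        "GetAxisAngle", "SetAxisAngle"
    };
    std::set<std::string> excluded = { "AxisAngle" };
    std::vector<std::string> p = InferProperties(methods, excluded);
    assert(p.size() == 3 && p[0] == "X" && p[1] == "Y" && p[2] == "Z");
}
```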

February 20, 2005

Puma C++ transformation library

Today I looked at Puma. Puma is a library for applying transformations to C++ programs. It can do C++ lexical analysis, parsing and semantic analysis. The library provides an API for examining and modifying C++ abstract syntax trees. To support program transformation, it also allows a modified syntax tree to be "unparsed" back into C++ source code, which can then be compiled using a C++ compiler.

It seems to be quite robust. For example, looking through some of the tests, I saw that it can parse templates and even template metaprograms. I was also pleased to see that it is used as the back-end of AspectC++, which is a relatively mature aspect oriented programming system for C++.

I considered using AspectC++ instead of CDL. Aspect oriented programming has a lot of cross-over with metaprogramming. One thing it wouldn't let me do though is generate a schema from C++ class definitions. It provides no way of outputting anything other than C++ as far as I can tell. I'll take a closer look at it soon.

I am considering using Puma to parse CDL. If it works out, it will eliminate one of the big risks and most of the development effort for my C++ metaprogramming system. As with my C# and Java to C++ translators, I think the key to getting a prototype up and running quickly is to find a library that will do most of the parsing and analysis for me.

Puma is GPL. I am not sure if that would be a problem for a commercial project. The game would not be compiled with or link to any of the GPLed code. But some of it would be automatically generated by GPLed code. Is code that is generated by a GPL tool also GPL? I'll do some research.

I haven't had a chance to actually try it out yet. It took me all day to get it to compile in Visual C++. I had to upgrade from Visual C++ 7.0 to 8.0 beta. 7.0 was not sufficiently standards compliant. Then I had to wade through GCC makefiles and figure out how to set up a VC++ project with all the necessary preprocessor definitions. Grr, C++ makes things too difficult for a lazy programmer like me!

February 15, 2005

Schema from program or program from schema?

One of the goals of my C++ metaprogramming system is that it should be possible to generate a schema for game data directly from an annotated C++ program. This schema is important because it decouples the game engine from the tools that are used to process the game's data.

There seem to be three options here. The first option is the one I am currently investigating and also the motivation for my C# to C++ translator: generate the schema from the code. The second is to generate the code from the schema. The third is to write both the schema and the code by hand.

The last option is undesirable. It violates the Don't Repeat Yourself principle. The schema and the program express the same thing and one of the two is redundant.

The second option, generate the code from the schema, has some appeal, primarily simplicity of implementation. It has one problem I can't see a solution to though. A schema is data-centric whereas an OO program hides data within classes that expose a more abstract public interface. The schema language could of course be extended to include OO constructs for information hiding but then it would become coupled to the structure of the program, which is undesirable.

So let's say I generate C++ classes from a data-centric schema. The generated C++ classes will inevitably be data-centric too. They might just be structs with public data members. Or one might generate accessor function pairs for the data members. But that is just C++ gorp to make a C style struct look like an OO class.

Then, being a good OO programmer, I would have to write more classes that exposed a better OO abstraction. These classes might be initialized directly from the automatically generated classes. Here is an example:
// Hand written schema
[schema]
[dynamite]
[field name="fuseTime" type="int" default="20"/]
[/dynamite]
[/schema]

// Automatically generated from the schema
struct SDynamite
{
int fuseTime;
};

// Hand written OO abstraction for dynamite
class CDynamite
{
public:

CDynamite(const SDynamite &d): fuseTime(d.fuseTime) {}

// Abstract interface to dynamite
void StartCountDown();
void Update(float dt);

private:
int fuseTime;
};
Of course it would be possible to embed the SDynamite as a private data member of CDynamite. But then CDynamite is tightly coupled to the schema format. It does not allow the internals of CDynamite to change without changing the schema. Notice the redundancy in the constructor. It is essentially converting between the schema format and the internal representation of CDynamite.

So generating the code from the schema does not appear to eliminate the redundancy. It just moves it around. The fundamental problem is that the schema does not contain all the information necessary to generate an OO abstraction of itself. OO code, on the other hand, has sufficient information, through metadata annotations or naming conventions, to generate a data-centric schema without redundancy.
// Hand written annotated dynamite class
CDL class Dynamite
{
public:

PROPERTY
ATTR(Serialized)
ATTR(Default=20)
int GetFuseTime() const;
void SetFuseTime(int);

// Abstract interface to dynamite
void StartCountDown();
void Update(float dt);

// ...
};

// Automatically generated schema
[schema]
[dynamite]
[field name="FuseTime" type="int" default="20"/]
[/dynamite]
[/schema]

Both examples express how to convert between the schema format and an object. In the second example, the schema format is inferred from the conversion. In the first example, both the schema format and the conversion must be separately expressed. There must be redundancy in the first example.

In the case where the OO class has the same or similar structure to the schema (probably a high proportion of cases), this redundancy can also be eliminated.
CDL class Dynamite
{
public:

// Abstract interface to dynamite
void StartCountDown();
void Update(float dt);

private:

PROPERTY
ATTR(Serialized)
ATTR(Default=20)
int fuseTime;
};

This is similar to embedding an SDynamite data member in CDynamite but with one key difference. Embedding SDynamite is all or nothing. It is not possible to say I only want the fuse time from SDynamite but these other schema fields will be represented differently.

I see a potential problem with generating a schema from code as well. Which language? If I am using multiple languages, say a scripting language and C# or C++, which do I generate the schema from? If I want to be able to implement game components in both languages, I probably want both. So I need to allow my preprocessor to accept metadata from multiple sources. For that reason, I will start calling it a metacompiler.

artefaktur C++ framework

I found this existing C++ framework that supports reflection. Like my metaprogramming idea, this uses a C++ preprocessor to parse metadata out of C++ class definitions. I believe it focuses on using reflection to exploit metadata. I want to continue down the metaprogramming route. I think it fits better with C++.

I also investigated C++0x. This is an attempt to standardize the next version of C++. Bjarne Stroustrup is involved. It looks like it will have some interesting new features, including ways to get metadata out of C++ programs for cleaner integration with SQL databases and the like. It's going to be a while before I can buy a C++0x compiler though!

I'm not sure if I'm pleased. It looks like C++0x will have advantages over C++ for games development. But I can't help thinking that it will extend the life of C++ still further, when we are better off opting for "higher level" languages.

February 13, 2005

CDL use cases

These are some use cases for my C++ metaprogramming framework. First some CDL:
// Texture.h
#ifndef GUARD_Texture_H
#define GUARD_Texture_H

#include "Object.h" // physical dependency on base class

#include "StringPtr.h" // no physical dependency on referenced classes
#include "ArrayPtr.h"
#include "MipMapPtr.h"
#include "EnumerationPtr.h"
#include "PalettePtr.h"

CDL class Texture: public Object
{
public:

PROPERTY
Ptr<String> GetName() const;
void SetName(const Ptr<String> &);

PROPERTY
Ptr<Enumeration> GetMipMaps() const;
void SetMipMaps(const Ptr<Enumeration> &);

PROPERTY
Ptr<Palette> GetPalette() const;
void SetPalette(const Ptr<Palette> &);

private:

Ptr<String> name;
Array< Ptr<MipMap> > mipmaps;
Ptr<Palette> palette;
};

#endif // ifndef GUARD_Texture_H

It's just a C++ header file, limited to a restricted sub-language enforced by the CDL preprocessor. Notice that Texture.h is not physically dependent on String, Array or MipMap but rather on smart pointers to these types. Unlike conventional smart pointers, these are template classes specialized on the referenced type, which means they can provide access to the referenced type's public API without a physical dependency. They might be implemented like this:
// TexturePtr.h
// AUTOMATICALLY GENERATED!

class Texture; // forward declaration

template <>
class Ptr<Texture>
{
public:
// ...
Ptr<String> GetName() const;
void SetName(const Ptr<String> &) const;
// ...

private:
Texture *pointer;
};

// TexturePtr.cpp
// AUTOMATICALLY GENERATED!

#include "Texture.h" // physical dependency here
#include "TexturePtr.h"

Ptr<String> Ptr<Texture>::GetName() const
{
// trampoline to implementation
return pointer->GetName();
}

Texture.h is not just input to the CDL preprocessor. It is also compiled as part of the program like any other header file. So the member functions might be implemented like this:
// Texture.cpp

#include "Texture.h"

Ptr<String> Texture::GetName() const
{
return name;
}

Texture.cpp is not CDL. CDL is only concerned with the class definition, not the implementation of the functions.

The CDL preprocessor will automatically generate some code based on the CDL class definition. For example, it could automatically generate serialization code. So without the programmer having to write any additional serialization code, they can just write this:
void TestSave()
{
Ptr<Texture> texture = New<Texture>();
texture.SetName("Stone");

Ptr<MipMap> mipmap = New<MipMap>(128, 128);
Array< Ptr<MipMap> > mipmaps = NewArray< Ptr<MipMap> >(1);
mipmaps[0] = mipmap;

texture.SetMipMaps(mipmaps.GetEnumeration());

SaveXML(texture, "file.xml");

// texture, array and mipmap automatically freed through reference counting
}

SaveXML and LoadXML invoke code that is automatically generated by the CDL preprocessor. The output might look like this:
<Texture>
<Name>Stone</Name>
<MipMaps>
<MipMap>
<Width>128</Width>
<Height>128</Height>
<!-- ... -->
</MipMap>
</MipMaps>
<Palette>
<null/>
</Palette>
</Texture>

Then the programmer can write this to load the data:
void TestLoad()
{
Ptr<Object> object = LoadXML("file.xml");
Ptr<Texture> texture = DynamicCast<Texture>(object);
}

C++ metaprogramming framework

In my previous post, I considered some ways of exploiting reflection in a C++ program. Specifically, I proposed the use of a C++ data definition language (DDL) from which metadata could be extracted in order to support reflection. In retrospect, I think the choice of the term DDL was a mistake. C++ classes are not just data after all. From now on I will use the term class definition language (CDL).

I was eager to try out test driven development on another project, this time something a little more substantial. So I started out by thinking what my first test should be. I considered using TDD to write the CDL preprocessor. But I wasn't sure what it should do exactly. I realized that this wasn't really programming by intention.

Then I decided to start by thinking about what kinds of things the users of the CDL would want to do. I came to the conclusion that it wasn't the CDL itself that was important but its application. Indeed, if using CDL and reflection is an appropriate thing to do, then TDD would hopefully lead me in that direction.

The most useful application is probably automatic serialization. So I decided that my short-term goal should be to write a program that saved the state of some simple test objects to a stream. Eventually this would be achieved using serialization code automatically generated from the CDL.

But to get things started, I would "automatically generate" the code by hand. Once the amount of "automatically generated" code started to get unmanageable, I would have a good idea of what the CDL preprocessor should do and I would be able to use TDD to start writing it. Having got a basic CDL preprocessor and a basic runtime framework underway, I would be able to use TDD to evolve them in parallel.

Some interesting things happened. First, I ended up not using TDD. Perhaps I don't get it yet, but I found that it was taking me too long to make progress and that I was taking a lot of wrong turns. I found it was more effective to hack together a series of throwaway prototypes. The trouble I am having is that I still don't know exactly what the problem is. I understand the problem in C# but I don't yet fully understand how to translate it into C++.

I am still using some of the ideas of TDD. For example, I am still programming by intention and I am finding it to be very useful in shaping my prototypes.

Also, this approach did not lead me to a CDL + reflection solution. Rather, it led me to a metaprogramming solution. What I have at the moment is a system where I use CDL to describe my classes. The CDL is just a subset of C++ that should be easy to parse and analyze by the preprocessor. From the CDL, I "automatically generate" (still by hand) C++ code that performs serialization.

This differs from reflection. Using reflection, I would automatically generate a data structure that some general purpose serialization code would query at runtime.
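
To illustrate the difference, here is a small sketch that puts the two approaches side by side. Everything in it is hypothetical: the Dynamite struct, the FieldInfo table, the function names and the "name=value;" output format are just stand-ins, and only int fields are handled to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sstream>
#include <string>

// Plain struct used by both approaches (blastRadius is a made-up field).
struct Dynamite
{
    int fuseTime;
    int blastRadius;
};

// Reflection approach: the preprocessor would generate a data structure
// describing the class...
struct FieldInfo
{
    const char *name;
    std::size_t offset; // byte offset of an int field within the object
};

const FieldInfo dynamiteFields[] = {
    { "fuseTime", offsetof(Dynamite, fuseTime) },
    { "blastRadius", offsetof(Dynamite, blastRadius) },
};

// ...and general purpose serialization code would query it at runtime.
std::string SaveReflected(const void *object, const FieldInfo *fields,
                          std::size_t count)
{
    std::ostringstream out;
    const char *base = static_cast<const char *>(object);
    for (std::size_t i = 0; i < count; ++i)
    {
        int value;
        std::memcpy(&value, base + fields[i].offset, sizeof value);
        out << fields[i].name << "=" << value << ";";
    }
    return out.str();
}

// Metaprogramming approach: the preprocessor would generate this function
// directly, so there is no table to interpret at runtime.
std::string SaveGenerated(const Dynamite &d)
{
    std::ostringstream out;
    out << "fuseTime=" << d.fuseTime << ";";
    out << "blastRadius=" << d.blastRadius << ";";
    return out.str();
}

// Example use: both approaches produce the same text for the same object.
void ExampleCompare()
{
    Dynamite d = { 20, 5 };
    assert(SaveReflected(&d, dynamiteFields, 2) == SaveGenerated(d));
}
```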

Metaprogramming is quite natural in C++. C++ already supports metaprogramming with templates. What I am aiming at now is a more powerful and easy-to-use metaprogramming framework to automate information processing tasks such as serialization and GUI generation. While template metaprograms work by using templates as a rudimentary compile-time functional programming language, my metaprograms will just be normal C++ programs that are run as a pre-build step. They will probably take a tree representing the parsed CDL as input and generate C++ code as output.
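
As a rough idea of what such a metaprogram could look like, here is a sketch that turns a heavily simplified tree for one parsed class into C++ source text. ClassNode, PropertyNode, GenerateCInfo and GenerateLoad are names I have made up for illustration; a real tree would carry far more information than a type and a name per property.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified tree for one parsed CDL class.
struct PropertyNode
{
    std::string type;
    std::string name;
};

struct ClassNode
{
    std::string name;
    std::vector<PropertyNode> properties;
};

// A metaprogram is just a normal function from the parsed tree to C++
// source text. This one emits a struct with one member per property.
std::string GenerateCInfo(const ClassNode &c)
{
    std::string out = "struct " + c.name + "CInfo\n{\n";
    for (const PropertyNode &p : c.properties)
        out += "    " + p.type + " " + p.name + ";\n";
    out += "};\n";
    return out;
}

// This one emits a Load function that calls one setter per property.
std::string GenerateLoad(const ClassNode &c)
{
    std::string out = "void Load(" + c.name + " &l, const " + c.name +
                      "CInfo &r)\n{\n";
    for (const PropertyNode &p : c.properties)
        out += "    l.Set" + p.name + "(r." + p.name + ");\n";
    out += "}\n";
    return out;
}

// Example use: generate code for a class with a single float property.
void ExampleGenerate()
{
    ClassNode c;
    c.name = "Orientation";
    PropertyNode x;
    x.type = "float";
    x.name = "X";
    c.properties.push_back(x);
    assert(GenerateCInfo(c) == "struct OrientationCInfo\n{\n    float X;\n};\n");
}
```

Run as a pre-build step, a program like this would write its output to a generated source file that is then compiled with the rest of the game.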

I will post some example use cases for CDL in my next post.
