January 30, 2005

C++ data definition language

In a previous post, I said I would consider some alternative ways of taking advantage of the benefits of C# (primarily reflection) on game console platforms. Two approaches spring to mind.

The first approach is to use an existing interpreted language with reflection support, such as Python. An interpreted language is appealing because, if the interpreter, debugger and other tools can be easily used on a console platform, there is considerably less development effort and risk than a C# to C++ translator. Of course, for performance reasons, a considerable amount of code would still need to be written in a compiled language like C++.

The second approach is to develop a C++ reflection framework. Although this would still be a considerable effort, it is lower risk than a C# to C++ translator because it is still in the realm of library development rather than compiler development.

Actually, these are not necessarily alternative approaches. For many games, it would be appropriate to employ both, writing "higher-level" code in an interpreted language and "lower-level" code in C++. In fact, a C++ reflection framework would be useful for allowing more transparent interoperation between the two languages by automatically generating interpreted bindings for C++ classes where appropriate. This would allow the interpreted language to easily manipulate C++ objects.

Also, I believe reflection is as useful for this "lower-level" code as it is for code that can be appropriately implemented in an interpreted language.

I have developed a C++ reflection framework before, with a certain amount of success. The main problem I encountered was that the semantics of C++ are too "loose". A language that supports reflection really needs sufficiently "tight" semantics that a program can unambiguously understand itself by examining its class hierarchy at runtime. For example, if a C++ program examines one of its classes and sees that a function returns a pointer, does that mean the function returns a pointer to an object or a pointer to an array? If it is an array, how can it determine the size of the array? In general it can't. Take this example:


class Foo {

Foo *getParent();
Bar *getBars();
};

A program examining this class through reflection cannot know whether getBars() returns a pointer to an object or a pointer to an array. It takes a human to realize that "Bars" is a plural noun and so the function probably returns a pointer to an array.

A solution to these two problems would be to avoid the direct use of pointers in reflected classes. This could be achieved by replacing pointers with more abstract types that can be unambiguously interpreted through reflection:


class Foo {

Reference<Foo> getParent();
Array<Bar> getBars();
};

Now it is clear, even to a program using reflection, that getParent() returns a reference to an object and getBars() returns a reference to an array of objects. The Array type would provide a way of determining the size of the array.

This seems like a reasonable approach. Take C++ as a starting point and find the aspects of the language that don't work well with reflection. Then use the powerful abstraction and meta-programming facilities of C++ to build a C++ sub-language with the necessary "tight" semantics. The sub-language would have a lot in common with languages like Java and C# but it would still be "proper" C++.

A problem with this approach is that C++ metadata is only known to the compiler. Except for the very limited metadata available through RTTI, it is lost after compile time and there is no way for the program to access it once it is running. So many approaches to C++ reflection require that the programmer express it again using some kind of data structure or meta-program that parallels the program's class hierarchy. See Reflection Support by Means of Template Metaprogramming.

I think this is kind of silly because a key advantage of reflection is to avoid exactly this kind of redundancy. Admittedly, if reflection is used extensively, the redundancy of the parallel data structure or meta-program may be less troublesome than the redundancy of the code that would be written if reflection were not supported. Also, this approach does not support automatic code generation. There is no way for a compile time tool to generate code based on the reflected metadata.

To avoid this redundancy, it is possible to use a C++ parser to examine the program before compilation and automatically generate the metadata code. This is the approach I used for my previous C++ reflection project. It is also the approach described in the paper Non-intrusive object introspection in C++. There are numerous suitable C++ parsers. This project used a modified version of the G++ compiler. I used Visual C++ as the parser and extracted the metadata from the debug information in the generated PDB files. Open C++ and Aspect C++ are also possibilities.

This approach works quite well. An issue is finding a parser that can deal with all the different dialects of C++ that are used in a multi-platform game project. Another issue is that a standard C++ parser does not enforce the use of a C++ sub-language better suited to reflection. From my previous experiences with C++ reflection, I think this is the key to success.

Another approach, and the approach that I think I will investigate in more depth if time permits, is to use a Data Definition Language (DDL) to define classes. My instinct is to make the DDL as close to the language of C++ class definitions as possible. But I would modify and restrict it where necessary to ensure that it enforced the use of the reflected C++ sub-language. It would be rather like Managed C++ or C++/CLI. These are both C++ variants modified to support, among other things, reflection.

Then a C++ programmer should feel as at home working with the DDL for a class as they would working with a C++ class definition. A compile time tool would parse the DDL and automatically generate the corresponding C++ class definition, which can then be compiled by the C++ compiler. The code generator would also output the data structure that parallels the class hierarchy. This would be the underlying data structure that allows the program to examine itself through reflection. Furthermore, the compile time tool could use the information in the DDL to generate other kinds of code, such as serialization code.

To be clear, I would not be developing a new programming language or compiler. It would be a simple DDL that would take the place of C++ header files for reflected classes.

This is not a new approach, even in game development. For example, this is exactly the approach employed by NetZ for describing objects that are synchronized in network games. It is also the approach of UnrealScript's "native classes". In fact, the combination of an interpreted language like Python with automatic binding to C++ code described through a DDL would be remarkably similar to UnrealScript + native classes.

Of the benefits of C# that I listed, which would and which would not apply to this approach? The biggest benefit - reflection - still holds. The benefits that are built on top of reflection, such as automatic code generation and automatic memory management also hold.

Tools are a problem. There will be no IDEs or automatic refactoring tools with knowledge of the DDL syntax, unless it is very close to C++. This is a good reason to keep the DDL syntax as close as possible to standard C++, possibly a subset of the C++ syntax. Then at least all the usual C++ IDEs and tools such as Visual Assist will work.

With regard to testing, reflection would permit the development of a unit testing framework like NUnit or JUnit. The DDL compiler could automatically generate mock objects to support a mocking framework similar to EasyMock.

There is no possibility of using the .NET class library. But there are C++ class libraries such as STL and boost. A design goal should be to ensure that it is possible to use types that are not described with DDL, such as std::string.

With regard to higher level language features such as array bounds checking, this can be made part of the semantics of the reflected C++ sub-language. For example, a template array class might perform bounds checking.

The DDL compiler can take steps to ensure fast build times, such as minimizing physical dependencies between generated header files and automatically employing the pimpl idiom in the generated C++ code for appropriate classes.


Comments:
you too busy to blog?

:-) xxx
platmum
 
nice blog,
 
Yes I enjoy this blog as well
 
You have written an excellent blog.keep sharing your knowledge.
Linux Server Hosting
Linux Web Hosting Server
 
This waas great to read
 
Post a Comment

<< Home

This page is powered by Blogger. Isn't yours?