
Introduction

The officially sanctioned way of making distributed function calls between C++ programs is to use CORBA, but for many applications, this is overkill. The CORBA specifications allow distributed function calls to be made between code written in any number of languages, and to make it all work, specialized tools need to be integrated into the build process, in order to translate object definitions written in CORBA's IDL to whichever native language is being used (C++, Java, etc.).

However, if we assume that the server and the client are both written in the same language, say C++, then it is possible to do away with these complexities. In particular, instead of elaborate interface definitions and marshalling specifications, we can simply defer to C++ itself.

Instead of separate IDL files with object interfaces, we specify the interfaces directly in C++ source code, using the preprocessor, and to marshal arguments across process boundaries, we use the native C++ serialization framework provided in the latest release of the Boost library.

The Boost.Serialization library is used to serialize parameters and return values. It handles standard types and containers automatically, and is easily extended to user defined classes. It also allows us to serialize pointers, with proper handling of polymorphic pointers and multiple pointers to single objects.

Basic Usage

There are three basic steps to using this framework:

Use the RCF_xxx macros to define interfaces.

Use the RcfServer class to expose objects that implement the interface.

Use the RcfClient<> classes to invoke methods on the objects exposed by the server.

An interface definition is opened with RCF_BEGIN(type, type_id) and closed with RCF_END(type). type is the identifier for the interface; type_id is a string giving a runtime description of the interface. The RCF_METHOD_xx macros define the member functions, and are named according to the number of arguments and whether or not the return value is void. So, for a function func accepting two strings and returning an integer, we write:

RCF_METHOD_R2(int, func, std::string, std::string);

and if the function has a void return type, we would instead write:

RCF_METHOD_V2(void, func, std::string, std::string);

Dispatch IDs for each function are generated automatically; the first member function is numbered 0, the next one 1, and so on. So, the order in which the functions appear in the definition is important, unlike in CORBA, where dispatch IDs are based on the function name. The dispatch IDs are generated using templates and not any preprocessor __LINE__ trickery, so the interface does not change if blank lines are inserted. The maximum number of member functions that can appear between RCF_BEGIN() and RCF_END() is at the moment limited to 25, but this limit is arbitrary.
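Putting these pieces together, a complete interface definition might look like the following sketch (Calculator and its member functions are hypothetical names chosen for illustration, not part of RCF):

```cpp
RCF_BEGIN(Calculator, "Calculator")
    RCF_METHOD_R2(int, add, int, int);        // dispatch ID 0
    RCF_METHOD_R2(int, subtract, int, int);   // dispatch ID 1
    RCF_METHOD_V1(void, reset, int);          // dispatch ID 2
RCF_END(Calculator)
```

Reordering the RCF_METHOD_xx lines would change the dispatch IDs, and hence the wire protocol, so the order must match on the server and client sides.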

The purpose of the RCF_xxx macros is to define the class RcfClient<type>. This class serves as a client stub, from the user's point of view, but also has facilities that allow the framework to use it as a server stub. These macros can be used in any namespace, not just the global namespace.

Once we have defined an interface using the RCF_xxx macros, we can start a server and bind the interface to concrete objects:

{
    // Create the server and tell it which port to listen on.
    RCF::RcfServer server(port);

    // Interface is the identifier of the interface we're exporting,
    // Object is a type that implements that interface.

    // One object for each client:
    server.bind<Interface, Object>();

    // ... or one object shared by all clients:
    Object object;
    server.bind<Interface>(object);

    // Tell the server to start listening for connections.
    server.start();

    // ...

    // The server will shut down automatically as it goes out of scope.
}

The objects are statically bound to the corresponding interface; there is no need for the object to derive from an interface class as is the case for traditional dynamic polymorphism. Instead, the compiler resolves the interface at compile time, which is not only more efficient, but also allows more flexible semantics.

The server can handle multiple simultaneous clients, even in single threaded mode, and can be stopped at any time. The lifetime of objects exposed by the server is determined by the number of current connections to the given object; once there are no more live connections to the object, a timeout is set, and when it expires, the object is deleted.

To make a client call, we instantiate the corresponding RcfClient<> template and pass the server IP and port number to the constructor. When the first remote method is called, the client then attempts to connect to the server, queries for the given object, invokes the requested member function of the remote object, and then returns the remote return value.
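As a minimal sketch (Interface stands for an interface defined with the RCF_xxx macros, and func is the hypothetical two-string method from the earlier example):

```cpp
// Connect lazily to a server exposing 'Interface' on the given host/port.
RcfClient<Interface> client("localhost", port);

// The TCP connection is established on the first remote call.
int result = client.func("first", "second");
```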

Should any exceptions arise on the server side while invoking the requested object, an exception of type RCF::RemoteException will be propagated back to the client and thrown. Should any exceptions arise anywhere else on the server side, e.g., while serializing arguments, then the server will forcibly close the connection, and the client will throw an exception.

RCF will automatically handle a range of parameter types, including C++ primitive types (int, double, etc.), std::string, STL containers, and pointers and references to any of the previously mentioned types. Polymorphic pointers and references, and multiple pointers to single objects are correctly handled as well. Smart pointers are also supported (boost::shared_ptr, std::auto_ptr), and are the safest way of passing polymorphic parameters.

In CORBA, one can tag a parameter as in, out, or inout, depending on which direction(s) one wants the parameter to be marshaled. In RCF, the marshaling directions are deduced from the parameter type, according to the following conventions:

Value: in
Pointer: in
Const reference: in
Nonconst reference: inout
Nonconst reference to pointer: out
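These conventions can be illustrated in a single method definition. This is a sketch rather than code from the library, assuming a four-argument RCF_METHOD_V4 macro that follows the naming scheme described above:

```cpp
RCF_METHOD_V4(void, example,
    std::string,            // value:                            in
    const std::string &,    // const reference:                  in
    std::string &,          // non-const reference:              inout
    std::string *&);        // non-const reference to pointer:   out
```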

To use user-defined types as parameters or return values, some additional serialization code is needed. What that code is depends on which serialization protocols are being used; by default Boost.Serialization is used, and an example of passing a user-defined type would look like the following:

Details

The server and client classes use BSD-style sockets to implement the networking, over TCP, and the whole framework has been compiled and tested on Linux, Solaris (x86 and SPARC) and Win32, using Visual C++ 7.1, Codewarrior 9.0, Borland C++ 5.5, and GCC 3.2. Building RCF requires v. 1.32.0 or later of the Boost library, although the only parts of Boost that need to be built are Boost.Serialization, and, for multithreaded builds, Boost.Threads. Multithreaded builds are enabled by defining RCF_USE_BOOST_THREADS before including any RCF headers.

To use RCF in your own application, you'll need to include the src/RCF.cpp file among the sources of the application, and link to the necessary libraries from Boost, along with OS-specific socket libraries (on Windows that would be ws2_32.lib, on Linux libnsl, etc.).

I've included a demo project for Visual Studio .NET 2003, which includes everything needed to compile, link, and run a server/client pair, with the exception of the Boost library, which needs to be downloaded and unzipped, but no building is needed.

Performance, as measured in requests/second, is highly dependent on the serialization protocol, and also on the compiler being used. Before turning to Boost.Serialization, I used a serialization framework of my own, with which I could clock around 3000 minimal requests/sec using Visual C++ 7.1, and 3300 requests/sec with Codewarrior 9.0, on a loopback connection on a 1400 MHz, 384 MB PC running Windows XP. GCC 3.2, on the other hand, was far slower. Using Boost.Serialization, however, I've been nowhere near these numbers; on average, it's around five times slower.

Conclusion

RMI is a well-known concept in Java circles; what I've done here is something similar in C++, without all the complications of CORBA. If you like it, please tell me; if you don't, well, please tell someone else.... Jokes aside, any and all feedback is appreciated. All I ask is that if you grade the article, and do so with a low grade, please leave an explanatory comment!

History

8 Feb 2005 - First release.

10 Mar 2005

Now includes a custom serialization framework, so you no longer have to use Boost's. Both serialization frameworks are supported, though; use the project-wide RCF_NO_BOOST_SERIALIZATION and RCF_NO_SF_SERIALIZATION defines to control which ones are used. The default behaviour is to compile both.

Default client timeout changed to 10s.

Server can be configured to only accept clients from certain IP numbers.

Server can be configured to listen only on a specific network interface, such as 127.0.0.1.

Client stubs automatically reset their connections when exceptions are thrown (eg for timeouts).

Finer-grained exception classes.

11 July 2005

Stripped CVS folders from distribution.

Added user-definable callback functions to be called when RcfServer has started.

16 Aug 2005

Added facilities for server-bound objects to query the IP address of the client that is currently invoking them. To see how it works, open the file RCF/test/Test_ClientInfo.cpp in the download. Just place a call to RCF::getCurrentSessionInfo().getClientInfo().getAddress(), and you'll receive a string containing the IP address of the client that is invoking the method.

23 Sep 2005

Initialization and deinitialization of the framework can now be done explicitly, by defining the project-wide preprocessor symbol RCF_NO_AUTO_INIT_DEINIT, and then calling RCF::init() and RCF::deinit() at appropriate times. This is mainly useful for DLL builds, so that the DLL can be loaded without automatically initializing Winsock.

19 Oct 2005

Compatible with Boost 1.33.0.

Added enum serialization to the built-in serialization engine, through the SF_SERIALIZE_ENUM macro. For an example of its use, see test/Test_Serialization.cpp.

Added a license.

30 Jan 2006

Miscellaneous bugfixes.

The built-in maximum message size limit has been changed to 50 KB. Look in src/RCF/Connection.cpp, line 374, if you need to change it.

I'll only be making sporadic maintenance releases of this version of RCF from now on. You can find the next generation of RCF here.

License


About the Author

Software developer, ex-resident of Sweden and now living in Canberra, Australia, working on distributed C++ applications. Jarl enjoys programming, but prefers skiing and playing table tennis. He derives immense satisfaction from referring to himself in third person.

Comments and Discussions

1. C/C++ does not have a standard library that allows you to communicate over a network. So "SOAP is redundant between C++ programs" is a false impression, or totally out of the question.

It's redundant in the sense that there are much simpler and more efficient ways of passing data over the wire than to represent the data in XML. If both parties speak the same language, then why not use that language, instead of inventing another?

2. SOAP is not necessarily slower than RMI/DCOM, especially DCOM, which is bound to heavy security negotiation and communication.

If you're doing a single request, then you might be able to pull it off relatively quickly with SOAP, compared to DCOM. But with any kind of volume, be it calls or data, I doubt that SOAP stands any chance. But then, I don't have much experience with DCOM, so I can't say for sure.

SOAP is useful in many situations, but I hardly think one can say that it has all the benefits of DCOM or CORBA. It does a few things better, but a lot of stuff it doesn't do at all. SOAP is specialized for portable lightweight RPC's, while something like CORBA can do so much more than that.

Well, in a heterogeneous environment, not only the "language" is different; even the platform (in the sense of hardware architecture, backend servers, etc.) is totally different. Compared to a binary format, I don't have to mention how much better it is, in terms of component reuse, editing, etc., to use a text format, WITH a standard schema that can be published, parsed, and passed around on every platform, and that builds on top of existing security infrastructure.

RPC/RMI/J2EE/DCOM/COM+ all exist to create a cluster of application servers that host reusable, frequently called business logic; extra layers like transactions/MQ can be built on top of it. The common design problem with such technology, surprisingly, is not the "language" itself but SECURITY.

SOAP resolves this by, first, being based on XML (a standard way of formatting data), and by running on top of HTTP (though not necessarily bound to it; so far it is common to see SOAP running on top of a web server). This not only reuses existing infrastructure, it opens a wider door in heterogeneous environments, be it EDI or simply remote procedure calls.

CORBA, of course, is good, no doubt. Unfortunately, it doesn't address security in large environments well enough. And where it does, don't be surprised if the performance (in terms of speed) drops dramatically.

I've used a lot of remoting protocols and tools over the years from hand cranking my own, to ACE/TAO/CORBA, DCOM, XMLRPC, gSOAP,... and I like this a lot. Many times there is no need for anything especially scalable or with a huge amount of security baggage, just lightweight, quick and easy to use.

It's especially valuable to have the interface definition in code; it was that one feature which drew me to boost::spirit and subsequently the boost library in general. The only thing I don't like about it is the use of macros - e.g. RCF_METHOD_V1 - but that's personal prejudice.

I think I'd like to see more of boost::mpl used to provide some of the functionality in meta.hpp, though that may be a shortcut to obfuscation.

It also looks as though it should be possible to use boost::function and boost::bind to avoid the proliferation of RCF_METHOD_Rx macros, but I confess that I haven't looked hard at it yet.

Glad you liked it! I always thought remoting protocols were way too complicated, so I wrote this one to see how minimal it could be made.

The macros are kind of a necessary evil, I guess. They hide a substantial amount of boilerplate code that I wouldn't want to type by hand. The main problem is that C++ doesn't support macros with variable numbers of arguments; if it did, then all the RCF_METHOD_V1, V2, R2, ... macros could be replaced by just one RCF_METHOD macro, which would be a whole lot nicer.

I've tried changing some of the boilerplate code to use member function pointers, but wasn't too pleased with the result, and I'm not sure boost::function with boost::bind would help either. The macro way is ugly, but I've found it to be more practical than the alternatives; for instance, it supports invoking static member functions on server objects. It's easier on the compiler as well.

I'll have to see about Boost! There's still a few things I want to add, like a better suited serialization system and async networking.

There's a proposed library for Boost, called the Boost Interface Library (BIL). They do something similar to what I've done here, specifying static interfaces that are then bound to concrete classes, and they also use macros. I don't really see how one can avoid macros either; there's a fair amount of boilerplate code that can't be factored out. In this case there's also some metaprogramming going on to generate dispatch IDs, and having to set all that up manually would be asking for problems.

I agree though, a non-macro solution would be preferable, as long as it's easy to use.

I think it will be rather easy to integrate boost.interfaces with your library (I think I can do it). And there's an additional benefit: it may be possible to seamlessly add COM support to Boost.Interfaces later.

Also it would be nice if transport layer is separated from the other code.

I assume you mean changing the definitions of the BOOST_IDL_xxx macros, to include the stuff that is now hidden behind the RCF_xxx macros? I agree that that shouldn't present any problems, but it would bloat the BOOST_IDL_ macros in those situations where one only wants a plain static interface.

Do you have an example of what you mean by integrating the two libraries?

I assume you mean the RCF_xxx macros? Those macros expand into a whole class definition with prescribed member functions; I'm not sure how that could be done with inline functions. The macros are there to hide all the boilerplate code for generating client and server stubs (something a dedicated IDL compiler would otherwise do). The boilerplate code can't, as far as I can tell, be factored out of the class definition. You could type it all out by hand if you wanted to, but there's just too much code for that to be practical.

You should consider using typelists. I read about them in "Modern C++ Design" by Andrei Alexandrescu. They use templates to handle lists of types at compile time. I don't know how useful they'd be here, but it seems to me that your RCF_METHOD_XX macros' main purpose is to define a typelist (along with the return type and method name).

Oh, I was curious why you need a specialized macro for void return types and then still require that void be specified as the return type?

Actually the main purpose of the macros is to define a class with certain member functions, where the member functions have 1) user-specified signatures and 2) implement RPC's.

Typelists help with the second part, serializing requests and deserializing responses. The first part, though, defining a class with certain user-specified member functions, can only be done by hand or via macros. I don't think even Alexandrescu can write the compile-time magic that would generate bona fide C++ class definitions.

Using the first form wouldn't even require an IDL definition; the client is assumed to know the member function signatures anyway. If the users get the signatures wrong, they'll get an exception, or they'll accidentally call something they didn't want to call.

I want to make both of these forms possible, but I'm afraid the second one will always require some kind of code generation, either by macros or by external IDL compilers.

As for the second question, specializing the macros for void/non-void is needed, I think, because I have to know when to generate a return statement with an argument, and when not to. As for having the user type "void" as the return type anyway: I find it more consistent-looking when you're defining a class with both void and non-void functions.

Well, I think the most apparent disadvantage is that DCOM is for all practical purposes limited to Windows, on both the server and client ends, and you have to be COM-conversant to use it. If you're writing Win32-only applications, are using Visual C++, and don't mind double underscores, GUIDs, macros, message maps, explicit reference counting, mandated HRESULT return values, and the rest of the junk that infests the world of COM and MS-extended C++, then it's a pretty good option.

Admittedly, Microsoft has made it a lot easier to use in recent versions of Visual C++, using attributes, as you mention. While compiling, it then automagically generates the corresponding IDL for you, which can then be used to generate the clients. That's a backward and poorly scalable way of doing things, though; usually you want to settle on an interface first, and then work out the implementations from there.

First of all, you have to be careful with optimizations, since they tend to degrade clarity and flexibility, so I'd only invoke them in specific situations where they make a definite and measurable difference. In fact, unless there's a performance problem, I wouldn't even bother looking for optimizations. It's far more important to keep the code clean and idiomatic, making it easier to reshape and refactor.

In this case, if I'm not mistaken, that function is only called once by each server for each exposed object, so there's nothing to be had in optimizing it. The original looks clearer to me (even a non-C programmer would understand it), and I think that's more important.

Yes, but it is those types of mistakes that help to kill scalability. Depending on your return type, you might be hitting the memory manager five or six times. It doesn't sound like a lot, but personally I would like it if you didn't go out of your way to stall threads needlessly.

Doing things poorly such as this can lead to universally slow code. Before you quote Knuth on optimizations, please look up the original quote.

If you show me a situation in which that "optimization" makes a difference, then I'll be the first to suggest changing it. If it doesn't make a difference, then why mess with it?

That code is part of a macro definition, and 'Name' is a macro parameter, wholly untyped. Writing "std::string(Name)" makes it pretty clear what is expected of the 'Name' parameter, while writing "!*Name" doesn't.

It's standard practice in C++ these days to use std::string instead of char *. By your logic, though, that's unscalable and causes threads to stall needlessly, so we should all go back to the heady days of char*. And not only that: in order to avoid universally slow code, we should use char * even when there is no noticeable higher-level performance gain.

Hi Jarl,
Thanks for the very interesting article. A few questions and comments.

Have you looked at XML-RPC at all? If so how does it compare with your approach. I'm a big fan of XML and quite like the idea of XML-RPC, even though I've never used it.

You mention performance and replacing your own serialization framework with Boost.Serialization. I'm curious why you did this given the performance degradation. BTW I am also a big fan of Boost.

Finally, on performance, I would have thought the real-life bottleneck would be in the network connection, and that serialization etc. would have little overall impact. I appreciate that running the client and server on the same PC is an exceptional case.

I've glanced sideways at ACE a few times over the years and after seeing the comments here I'm glad I've never tried to develop a serious relationship with it.

I'm no expert on XML-RPC, but AFAICT the main feature there is that you use an XML format for the messages, and service documentation etc. is also specified in XML. That makes it relatively easy to write clients and servers in a host of languages; XML tools are omnipresent these days. But if you're using the same language on both the server and client ends, then I don't see any particular advantage to using XML; it just adds an extra translation step, a step that is then promptly reversed on the other side. It's simpler and more efficient to use a native format, as I've done here. So I think XML is good if you need the ability to write clients in multiple languages; otherwise it's just a needless complexity.

As for my own serialization library, it has most of the functionality of Boost's, but nowhere near the polish, so for now I haven't included it. I'm working on it though, changing the interface to match Boost's, and then it will be included as well. The sooner the better; I'd rather not have any dependencies beyond the header files of Boost.

That's a good point about performance, I haven't yet tested it enough across remote network connections to say how much the serialization actually matters. It's very noticeable when I run tests locally on my PC, but as you say that difference might well be dwarfed by network latency.

I haven't seen Spread before either, but there are other similar libraries. It seems to be more about setting up a distributed environment and then passing and broadcasting opaque sequences of bytes between the processes involved. It's a bit more low-level; as a user, I'd much prefer to wrap distributed functionality in a native RPC veneer. It's a lot easier on the eyes, if nothing else....