Introduction

This is Part II of the article series. Part I introduced the Python/C API, a C library that helps embed Python modules in C/C++ applications. Basically, the API library is a set of C routines that initialize the Python interpreter, call into your Python modules and finish up the embedding. In Part I, I demonstrated how to call functions, classes and methods defined within Python modules. We then discussed the details of multi-threaded embedding, an issue C/C++ programmers usually face during the integration stage. One question was raised during the discussion: how does our C/C++ code communicate with the embedded Python module when they are running on separate threads or processes? See the article: "Embedding Python in C/C++: Part I".

I am going to explore alternative solutions to this problem, IPC mechanisms in particular. As in Part I, the article will not teach the Python language systematically. Instead, it will describe how the Python code works as it comes up. The discussion will focus on how to integrate Python modules with your C/C++ applications. I will take the same practical approach as in Part I, but this time I will include some limited theoretical discussion, because some of the topics in this part call for it. Shared memory, for example, deserves some theory. Still, I will leave most of that discussion to Jeffrey Richter's classic book.

Again, the source code I provide is portable, meaning that it is meant to run on both Windows and Linux. To use the source code, you should install a recent Python release and Visual C++ (or the GCC compiler on Linux). The environment I used for testing is: Python 2.4 (Windows and Linux), Visual C++ 6.0 (Windows) or GCC 3.2 (RedHat 8.0 Linux). With Visual C++, select the Release configuration to build, because the Debug configuration requires the Python debug library "python24_d.lib", which is not delivered with normal distributions.

Background

Python is a powerful interpreted language, like Java, Perl and PHP. It supports a long list of great features that any programmer would expect. Two of my favorite features are that it is "simple" and "portable". Along with the available tools and libraries, Python makes a good language for modeling and simulation developers. Best of all, it's free, and the tools and libraries written for Python programmers are also free. For more details on the language, visit the official website.

TCP/IP sockets for embedding

Python supports several IPC and synchronization mechanisms, including socket, memory-mapped file (MMAP), queue, semaphore, event, lock, mutex, and so on. We are going to study two forms of IPC in the following: TCP/IP socket and MMAP. This section discusses a simple TCP/IP client/server model. The next section will describe how C/C++ and Python modules use MMAP to communicate with each other.
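To make the call sequence of such a client/server pair concrete, here is a minimal sketch written entirely in Python 3 (the article itself was tested with Python 2.4, and its client is written in C). It is not the article's "py_socket_server" module. The function name `tcp_roundtrip` and the "echo: " reply format are assumptions made for illustration only. Both ends run in one process and one thread, which works because the operating system completes the connection and buffers the data once the server is listening.

```python
import socket

HOST = "127.0.0.1"  # loopback only; an assumption for this sketch


def tcp_roundtrip(message):
    """Send `message` from a client socket to a server socket and return the reply."""
    # Server side: create, bind, listen.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((HOST, 0))            # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    # Client side: the same steps a C client performs with
    # socket(), connect() and send().
    client = socket.create_connection((HOST, port), timeout=5)
    client.sendall(message)

    # Server side again: accept the pending connection, read, reply.
    conn, _addr = server.accept()
    data = conn.recv(1024)
    conn.sendall(b"echo: " + data)

    # Client side: read the server's reply.
    reply = client.recv(1024)

    conn.close()
    client.close()
    server.close()
    return reply


if __name__ == "__main__":
    print(tcp_roundtrip(b"hello from client"))
```

In a real integration the two halves live in different programs, so the server normally blocks in accept() until the client connects.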

Let us start with a simple application, which implements a TCP client in C code that communicates with a TCP server inside a Python module. Here is the complete source "call_socket.c":

On Windows, simply compile the C source and get the executable, which we call "call_socket.exe". To test this IPC, open a command window and start Python. Then import "py_socket_server" to start the Python TCP server. Open another command window, run "call_socket.exe". You will get the following outputs on the two windows:

The C code can run on both Windows and Linux platforms. It could be simplified further by dropping the portability code. Note that checks on return values are omitted for brevity. The C source is self-explanatory, except that we have to bind a local address on the client side, which is usually unnecessary for a client. In this integration, however, without the binding, the Python server reports the following error:

It is fairly easy to use the above client/server model in multi-threaded embedding. Normally, we have two choices while running the Python module on a separate thread:

If you create the thread inside the Python module, place its server code in the Python thread function.

If you create the thread in the C/C++ code, call the Python server code from within the C thread function.

I leave this as an exercise for the reader. Refer to Part I of this article series for more details on the two approaches.
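As a possible starting point for the first approach, the sketch below keeps the server code inside a Python thread function and creates the thread in the Python module. It is written for Python 3 and is only an illustration: `start_server`, `_server_loop` and `ask` are hypothetical names, and the upper-casing reply is an arbitrary stand-in for real work.

```python
import socket
import threading

HOST = "127.0.0.1"  # loopback only; an assumption for this sketch


def _server_loop(server_sock, max_clients):
    """Thread function: the server code lives here."""
    for _ in range(max_clients):
        conn, _addr = server_sock.accept()
        with conn:
            data = conn.recv(1024)
            conn.sendall(data.upper())   # placeholder for real processing
    server_sock.close()


def start_server(max_clients=2):
    """Create the listening socket, then hand it to a new Python thread."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind((HOST, 0))          # port 0 lets the OS pick a free port
    server_sock.listen(max_clients)
    port = server_sock.getsockname()[1]
    thread = threading.Thread(target=_server_loop,
                              args=(server_sock, max_clients))
    thread.start()
    return thread, port


def ask(port, payload):
    """Client helper: one request, one reply."""
    with socket.create_connection((HOST, port), timeout=5) as client:
        client.sendall(payload)
        return client.recv(1024)


if __name__ == "__main__":
    server_thread, server_port = start_server(max_clients=1)
    print(ask(server_port, b"hello"))
    server_thread.join()
```

For the second approach, the C thread function would call a Python function like `_server_loop` directly through the Python/C API instead of letting Python create the thread.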

Shared memory (MMAP) for Python and C/C++

First, some theoretical preparation for this section. On Unix-like systems such as GNU/Linux, a shared memory segment and a memory-mapped file (MMAP) are two different things. MMAP is memory-mapped file I/O. You can use MMAP as an IPC, but it is not very efficient, because data is copied from each process's memory space to the disk file. In contrast, a shared memory segment is a much faster form of IPC, because the processes share the same memory segment, each within its own address space. There is no disk copying and no moving of memory around.

Windows implements the memory-mapped file (called "MMF") in a slightly different way. An MMF can be backed either by a user-defined disk file or by the system page file. When an MMF is used as an IPC, Windows creates a named file-mapping (kernel) object. Through the kernel object, processes can map to the same disk file. This is the same as MMAP. But when the MMF is backed by the page file, this type of IPC can be very efficient, because no paging is performed if there is enough physical memory. It effectively becomes a shared memory segment. Windows actually unifies MMAP and the shared memory segment under the single cover of MMF! For more details on the Windows implementation, refer to Jeffrey Richter's "Programming Applications for Microsoft Windows".
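Python's mmap module exposes a mapping that has no user-defined disk file behind it: passing -1 as the file descriptor creates an anonymous mapping. The following Python 3 sketch only shows a write and a read-back within one process. `SIZE` and `roundtrip` are names invented for this example. To share such a mapping between unrelated processes on Windows, the `tagname` argument of `mmap.mmap` can name the mapping. On Unix, an anonymous mapping is shared only with child processes created after it.

```python
import mmap

SIZE = 1024  # bytes; an arbitrary size chosen for this sketch


def roundtrip(payload):
    """Write `payload` into an anonymous mapping and read it back."""
    shm = mmap.mmap(-1, SIZE)   # fileno -1: no user-defined disk file
    try:
        shm.seek(0)
        shm.write(payload)      # store the bytes at the start of the mapping
        shm.seek(0)
        return shm.read(len(payload))
    finally:
        shm.close()


if __name__ == "__main__":
    print(roundtrip(b"567"))
```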

Now let's consider the following scenario. Somebody has written a Python module which is intended to run on a separate thread/process. It has defined an MMAP interface to communicate with the user of this module through MMAP. When we integrate it with our C/C++ application, we set up the MMAP interface for it and then start its execution. Our implementation on the client side is in "call_mmap.c". Here is the complete source:

The Python module "py_mmap" defines one class, "MMAPShmem", which has one method, run(). All it does is open the disk file created by the C code and map it into memory. The module can then use the mapped file just as you would use normal file I/O. In each iteration of the for loop, Python reads the MMAP and prints its contents. Then it overwrites the MMAP. Note that the ten reads/writes run in parallel with the ten writes of the main C thread.
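The read-then-overwrite step can be sketched as follows. This is not the article's "py_mmap" module, whose source is not reproduced here. It is a minimal Python 3 illustration in which `create_data_file` stands in for the C side, and the file name "shared.dat", the space padding and the helper names are all assumptions.

```python
import mmap
import os
import tempfile

SIZE = 1024  # bytes; matches the 1024-byte mapping size used in the article


def create_data_file(path, size=SIZE):
    """Stand-in for the C side: create a fixed-size file to be mapped."""
    with open(path, "wb") as f:
        f.write(b" " * size)


def read_then_overwrite(path, reply=b"567"):
    """Map the file, read its text content, then overwrite it with `reply`."""
    with open(path, "r+b") as f:
        shm = mmap.mmap(f.fileno(), 0)        # length 0 maps the whole file
        try:
            seen = shm[:].rstrip()            # text content without padding
            shm.seek(0)
            shm.write(reply.ljust(len(shm)))  # pad with spaces to full size
            shm.flush()
        finally:
            shm.close()
    return seen


def demo():
    """Create a temporary data file and run two read/overwrite rounds."""
    path = os.path.join(tempfile.mkdtemp(), "shared.dat")  # hypothetical name
    create_data_file(path)
    first = read_then_overwrite(path)          # file holds only padding so far
    second = read_then_overwrite(path, b"1")   # sees the previous reply
    return first, second, os.path.getsize(path)


if __name__ == "__main__":
    print(demo())
```

Padding every write to the full mapping size keeps the file length fixed, so stale bytes from a longer earlier message cannot leak into a later read.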

Open a command window and run "call_mmap py_mmap MMAPShmem run". You should get the output as shown below:

Wrapper has created a MMAP for file 'input.data'
The Main thread has writen 0 to MMAP.
inDataFile size: 1024 MMAP size: 1024
Python thread read from MMAP: 567
Python thread write back to MMAP: 567
The Main thread has writen 1 to MMAP.
Python thread read from MMAP: 1
Python thread write back to MMAP: 567
The Main thread has writen 2 to MMAP.
Python thread read from MMAP: 2
Python thread write back to MMAP: 567
The Main thread has writen 3 to MMAP.
Python thread read from MMAP: 3
Python thread write back to MMAP: 567
The Main thread has writen 4 to MMAP.
Python thread read from MMAP: 4
Python thread write back to MMAP: 567
The Main thread has writen 5 to MMAP.
Python thread read from MMAP: 5
Python thread write back to MMAP: 567
The Main thread has writen 6 to MMAP.
Python thread read from MMAP: 6
Python thread write back to MMAP: 567
The Main thread has writen 7 to MMAP.
Python thread read from MMAP: 7
Python thread write back to MMAP: 567
The Main thread has writen 8 to MMAP.
Python thread read from MMAP: 8
Python thread write back to MMAP: 567
The Main thread has writen 9 to MMAP.
Python thread read from MMAP: 9
Python thread write back to MMAP: 567
Main thread waiting for Python thread to complete...
My thread is finishing...
Main thread finished gracefully.

Clearly, the C and Python code running on two separate threads communicate through the MMAP file "input.dat". Because we used text I/O rather than binary I/O in this case, you can open the file and check its contents.

Points of interest

Our MMAP example does not implement synchronization, which is usually required to protect data in shared memory. In practice, you would want to coordinate access to the shared memory by multiple threads/processes. Otherwise, exclusive access cannot be guaranteed and the data you get from the shared memory may be unpredictable.
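For threads inside a single process, one simple way to coordinate access is a lock held around each read-modify-write of the mapping. The Python 3 sketch below is an illustration under that assumption, with hypothetical names (`COUNTER`, `bump`, `run`). A `threading.Lock` does not protect against other processes. For those, an operating-system primitive such as a named mutex or semaphore would be needed.

```python
import mmap
import struct
import threading

COUNTER = struct.Struct("<I")  # one 32-bit unsigned counter at offset 0


def bump(shm, lock, times):
    """Increment the shared counter `times` times, one locked step at a time."""
    for _ in range(times):
        with lock:  # the read and the write happen as one unit
            (value,) = COUNTER.unpack_from(shm, 0)
            COUNTER.pack_into(shm, 0, value + 1)


def run(workers=4, times=1000):
    """Let several threads update the same mapping and return the final count."""
    shm = mmap.mmap(-1, mmap.PAGESIZE)  # anonymous mapping, zero-filled
    lock = threading.Lock()
    threads = [threading.Thread(target=bump, args=(shm, lock, times))
               for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    (total,) = COUNTER.unpack_from(shm, 0)
    shm.close()
    return total


if __name__ == "__main__":
    print(run())
```

With the lock in place, the final count equals workers multiplied by times. Without it, increments from different threads could overwrite each other.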

Conclusion

We have demonstrated that we can utilize Python/C API to integrate Python modules with our C/C++ applications effectively. Our primary focus was how to embed Python in multi-threaded applications. The IPC as a communication mechanism between Python and C/C++ modules has been discussed in great depth. This is the concluding part of the article series.

About the Author

Jun is an experienced software architect. He wrote his first computer code on the tape machine for a "super computer". The tape machine reads holes on the black paper tape as source code. When manually fixing code, you need a punch and transparent tape. To delete code, you block holes or cut off a segment and glue the two ends together. To change code, you block old holes and punch new holes. You already know how to add new code, don't you? Anyway, that was his programming story in the early 1980s.

Currently, Jun is an architect at GuestLogix, the global leader in providing onboard retail solutions for airlines and other travel industries. He is also the founder of Intribute Dynamics, a consulting firm specializing in software development. He has a personal blog site, although he is hardly able to keep it up to date.

In his spare time, Jun loves classical music, table tennis, and NBA games. During the summer, he enjoys camping up north and fishing on wild lakes.