Accessing C from Python

Hacked by chrootstrap August 2003

Python is a Very High Level Language (VHLL) with powerful facilities for dynamic typing, reflection, interpretation, and modularity. It is also, however, very slow. While Python code performs much faster than shell scripts, it is slower than, say, the Perl equivalents. It runs very quickly, though, when it calls pre-compiled C code in built-in and extension modules.

Python is very easy to extend in C, and building C modules is a very important skill for a Python programmer to have. Once you learn how to write special sections in C, Python becomes an excellent 'glue' language in the tradition of Tcl, and it offers a powerful paradigm: prototype everything in Python, then switch in C where necessary for library interfacing or for greatly increased performance.

You can create Python objects in C, but I will focus here on how to make simple, imperative module functions. I would recommend making the most of the object-oriented facilities of Python rather than creating your objects in C. But anytime you need to do something very quickly, seriously conserve memory, perform low-level functions not available in Python, or access a C API, I would advise doing it with a C module.

Here is a simple example that creates the module mymod, which contains the function test. test prints out "Hello!" upon invocation:

#include "Python.h"

PyObject * mymod_test(PyObject *self) {
    puts("Hello!");
    Py_INCREF(Py_None);
    return Py_None;
}

static PyMethodDef mymod_methods[] = {
    {"test", (PyCFunction)mymod_test, METH_NOARGS, "Prints test string.\n"},
    {NULL, NULL, 0, NULL}
};

DL_EXPORT(void) initmymod(void)
{
    Py_InitModule3("mymod", mymod_methods, "Provides a test function.\n");
}

mymod_test is the actual function invoked when test is called from Python. All Python functions return a Python object. We don't have anything important to return, so we just return None, which is available in C as Py_None. Python employs reference counting for garbage collection. There is only one None object, and whoever receives our return value will decrement its reference count when finished with it (immediately, if no one retains it, i.e. if no one says "x = mymod.test()"). Once an object's reference count gets to zero, the object is released. So, to prevent the release of None, we need to increment its reference count to balance the decrement that will happen to it. Understanding this is important. If you are concentrating on the C code and just using Python objects as transports, though, you aren't likely to get burned by this.
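
The balancing act described above can be sketched without Python at all. The following is only a toy model in plain C, not CPython's implementation; ToyObject, toy_incref, toy_decref, toy_return_shared, and toy_call_and_discard are hypothetical names invented for this illustration. It shows a shared object surviving a discarded return value only when the function takes a reference first:

```c
#include <assert.h>

/* Toy stand-in for a reference-counted object; NOT CPython's PyObject. */
typedef struct {
    int refcnt;    /* number of live references */
    int released;  /* set to 1 once the count reaches zero */
} ToyObject;

static void toy_incref(ToyObject *o) {
    o->refcnt++;
}

static void toy_decref(ToyObject *o) {
    assert(o->refcnt > 0);       /* dropping a reference nobody holds is a bug */
    if (--o->refcnt == 0) {
        o->released = 1;         /* a real runtime would free the object here */
    }
}

/* Hands the shared object back the way mymod_test hands back None:
   the caller receives a reference it is expected to drop later. */
static ToyObject *toy_return_shared(ToyObject *shared, int take_reference) {
    if (take_reference) {
        toy_incref(shared);      /* plays the role of Py_INCREF(Py_None) */
    }
    return shared;
}

/* Simulates a caller that ignores the result and drops the reference it
   was handed. Returns 1 if the shared object survived the call. */
static int toy_call_and_discard(int take_reference) {
    ToyObject shared = {1, 0};   /* one reference held by the "interpreter" */
    ToyObject *result = toy_return_shared(&shared, take_reference);
    toy_decref(result);          /* caller discards the return value */
    return !shared.released;
}
```

Calling toy_call_and_discard(1) leaves the object alive, while toy_call_and_discard(0) releases it out from under its owner, which is the situation the Py_INCREF in the example guards against.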

Once you are in C, you are pretty much free to do anything you want. You can call C APIs, allocate and free memory, etc. In our example we just call puts to print some text. Your function gets registered in a method table as the module is initialized. The example shows the basic sequence for doing this. The string explanations are actually help strings that will be visible from Python. Note that your PyMethodDef table needs an empty entry at its end. The name initmymod is "init" plus whatever the name of your module is. This needs to be consistent, as Python invokes the function automatically when the module is imported.
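
The empty entry at the end of the method table acts as a sentinel: the table is passed around as a bare pointer with no length, so the reader has to be told where to stop. Here is a rough sketch of that idea in plain C. ToyMethodDef and the toy_ functions are hypothetical stand-ins made up for this example, not the real PyMethodDef machinery:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy method table entry; a simplified stand-in for PyMethodDef. */
typedef struct {
    const char *name;
    int (*func)(void);
} ToyMethodDef;

static int toy_test(void)  { return 42; }
static int toy_other(void) { return 7; }

static ToyMethodDef toy_methods[] = {
    {"test",  toy_test},
    {"other", toy_other},
    {NULL, NULL}                 /* sentinel: marks the end of the table */
};

/* Counts entries by walking until the sentinel; no length is passed in. */
static int toy_count_methods(const ToyMethodDef *table) {
    int n = 0;
    assert(table != NULL);
    while (table[n].name != NULL) {
        n++;
    }
    return n;
}

/* Looks a method up by name; returns NULL if it is not registered. */
static const ToyMethodDef *toy_find_method(const ToyMethodDef *table,
                                           const char *name) {
    for (; table->name != NULL; table++) {
        if (strcmp(table->name, name) == 0) {
            return table;
        }
    }
    return NULL;
}
```

Without the {NULL, NULL} row, both loops would run off the end of the array, which is why forgetting the empty entry in a real module tends to crash at import time.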

To compile the module, you can build it into Python or build it as an extension module. The details vary quite a bit, especially on Windows, but a basic command for building an extension module with GCC is (for a source file named "mymod.c"):

gcc -shared -I/usr/include/python2.3 mymod.c -o mymod.so

You can now load the module by entering "import mymod" in a Python interpreter started in the same directory as the module (you can also put it in your Python library directory or add its location to the PYTHONPATH environment variable). You can run the function by entering "mymod.test()", and you can see the help strings by entering "help(mymod)".

Okay, now handling arguments in the function is just a small modification.

PyObject * mymod_test(PyObject *self) {
    /* .... */

static PyMethodDef mymod_methods[] = {
    {"test", (PyCFunction)mymod_test, METH_NOARGS, "Prints test string.\n"},

becomes

PyObject * mymod_test(PyObject *self, PyObject *args) {
    /* .... */

static PyMethodDef mymod_methods[] = {
    {"test", mymod_test, METH_VARARGS, "test(int, object)\n"},

Now we are ready to handle arguments. Note that every Python object is represented as a PyObject struct. In the case of arguments passed to a C function, args is actually a Python tuple. You can get the tuple's size with the PyTuple_GET_SIZE macro (e.g. "int size = PyTuple_GET_SIZE(args);"). Extracting the arguments from the tuple is usually done with the PyArg_ParseTuple function. That function takes a format string and the addresses where the tuple's data should be stored:

PyObject * mymod_test(PyObject *self, PyObject *args) {
    int x;
    PyObject *obj;

    if (! PyArg_ParseTuple(args, "iO", &x, &obj)) {
        return NULL;
    }
    printf("%d\n", x);
    /* "O" yields a borrowed reference; take our own before returning it */
    Py_INCREF(obj);
    return obj;
}

This takes two arguments, an integer (the "i" in the format string) and an object (the "O"). It sets x to the integer and returns the same object that was given to it as the second argument. PyArg_ParseTuple automatically prepares a reasonable Python exception if the tuple doesn't parse correctly, and returning NULL tells the Python engine that an exception occurred. We can also explicitly raise exceptions like this:

    if (x > 1000) {
        PyErr_SetString(PyExc_Exception, "The number is too large!");
        return NULL;
    }

A convenient function, PyErr_SetFromErrno, sets up an exception from the standard C library's errno:

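    /* open() and O_RDONLY are declared in <fcntl.h> */
    if (open("d3iohwdkionc290j4fj", O_RDONLY) < 0) {
        /* PyErr_SetFromErrno always returns NULL, so return its result */
        return PyErr_SetFromErrno(PyExc_Exception);
    }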
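
PyErr_SetFromErrno builds its exception from the integer in errno and the matching strerror() message. To see what raw material it has to work with, here is a small plain C sketch, assuming a POSIX system; errno_after_failed_open and describe_errno are hypothetical helper names used only for this illustration, and no Python API is involved:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Tries to open a path and returns the errno value left behind by a
   failed call, or 0 if the open unexpectedly succeeds. */
static int errno_after_failed_open(const char *path) {
    int fd;

    assert(path != NULL);
    errno = 0;
    fd = open(path, O_RDONLY);
    if (fd < 0) {
        return errno;            /* e.g. ENOENT when the file is missing */
    }
    close(fd);
    return 0;
}

/* Returns the C library's text for an errno value, the same kind of
   message that ends up attached to the Python exception. */
static const char *describe_errno(int err) {
    return strerror(err);
}
```

The key point is timing: errno has to be read right after the failing call, before any other library call gets a chance to overwrite it.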

Additional exception types can be found in pyerrors.h where your include files are located (e.g. /usr/include/python2.2).

Lots of format codes are available. For example, "s" is a char * string, "d" is a double, and "s#" is a useful format that yields two values, the char * string and an int length (this way you can pass strings with embedded NUL characters).
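
To see why the explicit length from "s#" matters, consider what happens to a buffer with a NUL byte in the middle when only the char * is available. This is a plain C sketch with no Python API in it; ToyBuffer, toy_payload, toy_get_buffer, and toy_strlen_view are hypothetical names standing in for the pointer-and-length pair that "s#" fills in:

```c
#include <assert.h>
#include <string.h>

/* A pointer plus an explicit length, similar in spirit to what "s#" yields. */
typedef struct {
    const char *data;
    int length;
} ToyBuffer;

/* Six bytes of payload with a NUL in the middle: a b \0 c d e */
static const char toy_payload[] = {'a', 'b', '\0', 'c', 'd', 'e'};

static ToyBuffer toy_get_buffer(void) {
    ToyBuffer b;
    b.data = toy_payload;
    b.length = (int)sizeof(toy_payload);   /* counts every byte, NUL included */
    return b;
}

/* Length as seen by code that only has the char *: it stops at the first NUL. */
static int toy_strlen_view(const ToyBuffer *b) {
    assert(b != NULL && b->data != NULL);
    return (int)strlen(b->data);
}
```

Here the buffer really holds six bytes, but strlen() reports two, so any code relying on the char * alone would silently lose the tail of the data.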

Now, there is a function that is the converse of extracting arguments; it prepares Python objects for returning. This function is Py_BuildValue; it takes a format string followed by variables and returns a PyObject. The format codes are the same as before. If two or more format codes are used (i.e. if there are two or more variables being stored), it returns a tuple containing objects for those variables. Note that Py_BuildValue takes the actual values of ints, doubles, etc. rather than their addresses. The following returns two int objects in a tuple. Their value is the same as was passed in args:

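    int x;
    PyObject *num;
    PyObject *tuple;

    if (! PyArg_ParseTuple(args, "i", &x)) {
        return NULL;
    }
    num = Py_BuildValue("i", x);
    if (num == NULL) {
        return NULL;
    }
    /* "O" adds its own reference to num, so release ours afterwards */
    tuple = Py_BuildValue("iO", x, num);
    Py_DECREF(num);
    return tuple;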

An interesting feature of the format strings is the ability to describe containers. Parentheses handle tuples when both parsing and building values; for example, "(ii)" parses or builds a tuple of two integers. Py_BuildValue additionally accepts brackets and braces to build lists and dictionaries: "[i,i,i]" returns a list of three integers, and "{s:i}" returns a dictionary with a char * key and an int value. Cool beans!

Although Python is multithreaded, the switching happens at the bytecode level. Only one thread executes inside the interpreter at a time, so only one C function is actually entered at a time unless you relinquish the interpreter temporarily (as some blocking I/O functions do). Do this with the macros Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS. To leave such a block in the middle, you must insert Py_BLOCK_THREADS first:

    Py_BEGIN_ALLOW_THREADS
    /* .... */
    if (surprise_condition) {
        Py_BLOCK_THREADS
        PyErr_SetFromErrno(PyExc_IOError);
        return NULL;
    }
    /* .... */
    Py_END_ALLOW_THREADS

These macros are normally harmless if you don't have threading enabled; they just don't release anything. Threading inside a Python function is pretty complex, but you can easily handle a blocking function with these macros. Note that they take no semicolon: they actually open and close their own bracketed block, so both must appear in the same function. Inside the block your thread has given up the interpreter, which is why you call Py_BLOCK_THREADS to take it back before touching any Python API or jumping out.
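
The brace trick is easier to see with a stripped-down imitation. The sketch below is a toy model in plain C, assuming a single integer flag in place of the real interpreter lock; TOY_BEGIN_ALLOW, TOY_BLOCK, TOY_END_ALLOW, and the toy_ functions are hypothetical names invented here and only loosely follow the shape of the Python macros:

```c
#include <assert.h>

/* Toy lock state: 1 while the pretend "interpreter lock" is held. */
static int toy_lock_held = 1;

/* Gives up the lock and returns the previous state so it can be restored. */
static int toy_release(void) {
    int was = toy_lock_held;
    toy_lock_held = 0;
    return was;
}

static void toy_acquire(int state) {
    toy_lock_held = state;
}

/* BEGIN opens a brace and declares a local; END restores and closes it.
   That is why neither takes a semicolon and both must sit in one function. */
#define TOY_BEGIN_ALLOW  { int toy_saved = toy_release();
#define TOY_BLOCK        toy_acquire(toy_saved);
#define TOY_END_ALLOW    toy_acquire(toy_saved); }

/* Returns 0 on the normal path and -1 on the early exit.
   The lock is held again on both paths. */
static int toy_blocking_call(int fail_early) {
    TOY_BEGIN_ALLOW
    assert(toy_lock_held == 0);  /* long-running C work would go here */
    if (fail_early) {
        TOY_BLOCK                /* take the lock back before leaving */
        return -1;
    }
    TOY_END_ALLOW
    return 0;
}
```

If the early return skipped TOY_BLOCK, the function would exit with the lock still released, which is the toy version of returning to Python without holding the interpreter.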

One thing that the C bridge is really useful for is calling C libraries. For example, this will produce an SDL window on the screen (this will require that the SDL libraries are installed):

#include "SDL.h"

SDL_Surface *sdl_screen;

PyObject * mymod_test(PyObject *self, PyObject *args) {
    int x;

    if (! PyArg_ParseTuple(args, "i", &x)) {
        return NULL;
    }
    SDL_Init(SDL_INIT_VIDEO);
    sdl_screen = SDL_SetVideoMode(x, x, 8, SDL_SWSURFACE);
    Py_INCREF(Py_None);
    return Py_None;
}

To build it, you need to make sure you link in the library you are using (it won't complain if you omit the "-lSDL" until you actually load your module in Python):

gcc -shared -I/usr/include/python2.3 -I/usr/include/SDL mymod.c -lSDL -o mymod.so

Well, you now know enough to build your own C-based Python modules. It really is quite easy to do. The Python C library is fairly extensive, and you'll find there are a variety of ways to do things. You can find the API reference here:

http://www.python.org/doc/current/api/api.html

Happy hacking!
