typedef struct {
    int count;
    int MPI_SOURCE;
    int MPI_TAG;
    int MPI_ERROR;
#if (MPI_STATUS_SIZE > 4)
    int extra[MPI_STATUS_SIZE - 4];
#endif
} MPI_Status;

so I had declared the foreign struct in my binding source (line 65):
(cffi:defcstruct MPI_Status
  (count :int)
  (MPI_SOURCE :int)
  (MPI_TAG :int)
  (MPI_ERROR :int))

This specifies that MPI_Status is a C struct with int slots: the first int (4 bytes) of the struct is the count field, the next int is MPI_SOURCE, and so on. It is a straightforward translation of the C struct definition. This worked great -- until I tried to use the bindings with MPICH2. Running the CL-MPI test suite resulted in mysterious crashes and freezes -- not a happy transition from MPICH to MPICH2! After thrashing around trying to isolate the problem, I found that in MPICH2 the MPI_Status struct definition had been changed to:
typedef struct MPI_Status {
    int count;
    int cancelled;
    int MPI_SOURCE;
    int MPI_TAG;
    int MPI_ERROR;
} MPI_Status;

Oops! No wonder everything got messed up -- note the new 'cancelled' field inserted after the count field. The fields of the struct no longer lined up with the defcstruct definition above!
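To see why the mismatch is so destructive, here is a minimal C sketch. The type names status_v1 and status_v2 and both helper functions are hypothetical stand-ins that I made up for illustration (the extra[] array is omitted); they are not part of MPICH or CL-MPI. The sketch reads a field at the offset implied by the old layout from a struct that really has the new layout, which is roughly what the stale defcstruct was doing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the two layouts quoted above. */
typedef struct { int count; int MPI_SOURCE; int MPI_TAG; int MPI_ERROR; } status_v1;
typedef struct { int count; int cancelled; int MPI_SOURCE; int MPI_TAG; int MPI_ERROR; } status_v2;

/* Read the int stored at a byte offset, as a binding with
   hard-coded offsets effectively does. */
int read_int_at(const void *base, size_t offset) {
    int value;
    assert(base != NULL);
    memcpy(&value, (const char *)base + offset, sizeof value);
    return value;
}

/* What a binding built against the old layout believes is MPI_SOURCE
   when the library hands it a struct with the new layout. */
int stale_source(const status_v2 *s) {
    return read_int_at(s, offsetof(status_v1, MPI_SOURCE));
}
```

With a status_v2 whose cancelled field is 99 and whose MPI_SOURCE is 3, stale_source returns 99: the binding silently reads the wrong field, and every later slot is shifted in the same way.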
OK, so what to do? I could create two different versions of the defcstruct, one that works with MPICH1.2 and the other for MPICH2, and then use conditional compilation (e.g., #+MPICH1 #+MPICH2...). The problem with that solution is that if the C definition of MPI_Status changes yet again in a future MPICH release, then we are back to square one.
In this case, the solution is to not directly use the defcstruct to create a foreign struct definition. Instead, CFFI-Grovel can be used to automagically generate the correct defcstruct. This is the file I created to use CFFI-Grovel for CL-MPI. Here (Line 47), I add the CFFI-Grovel declaration:
(cstruct MPI_Status "MPI_Status"
  (count "count" :type :int)
  (MPI_SOURCE "MPI_SOURCE" :type :int)
  (MPI_TAG "MPI_TAG" :type :int)
  (MPI_ERROR "MPI_ERROR" :type :int))

This states that I want to use a foreign (C) struct named "MPI_Status", giving it the Lisp name MPI_Status, and that I am interested in 4 integer fields: count, MPI_SOURCE, MPI_TAG, MPI_ERROR. This declaration does not specify anything about the ordering of the slots in the C struct. It also does not say anything about the completeness of this mapping. In other words, the C MPI_Status struct can contain other fields which are not mentioned in this CFFI-Grovel cstruct definition. In contrast, the original CFFI defcstruct used above is a more concrete declaration, which completely specifies the memory layout of the C struct we are mapping.

The CFFI-Grovel based code does the right thing for both MPICH1.X and the latest MPICH2, and if future versions of MPICH2 change the MPI_Status struct definition by changing field order or adding fields, no problem.

Lesson learned: before directly declaring a foreign object with CFFI, consider whether CFFI-Grovel might be more appropriate. Using CFFI-Grovel is more robust to changes in the library to which we're binding, and if I had done this to begin with, it would have saved me several hours of painful debugging.
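The reason this is robust is that the layout question is put to the C compiler instead of being hard-coded in Lisp. As far as I understand it, the groveller generates a small C program that includes the real header and prints out Lisp forms carrying the offsets the compiler computed. The sketch below imitates only that idea. The names demo_status, format_slot and print_slots are hypothetical, and the printed form only approximates what CFFI-Grovel actually emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for whichever MPI_Status the installed mpi.h defines. */
typedef struct { int count; int cancelled; int MPI_SOURCE; int MPI_TAG; int MPI_ERROR; } demo_status;

/* Format one slot as a Lisp-style form carrying the offset computed by the
   C compiler. Returns the number of characters snprintf reports. */
int format_slot(char *out, size_t size, const char *name, size_t offset) {
    assert(out != NULL && name != NULL);
    return snprintf(out, size, "(%s :int :offset %zu)", name, offset);
}

/* Print only the four slots we asked for. Fields we did not mention,
   such as cancelled, are skipped, yet the offsets still account for them. */
void print_slots(void) {
    char line[64];
    format_slot(line, sizeof line, "count", offsetof(demo_status, count));
    puts(line);
    format_slot(line, sizeof line, "MPI_SOURCE", offsetof(demo_status, MPI_SOURCE));
    puts(line);
    format_slot(line, sizeof line, "MPI_TAG", offsetof(demo_status, MPI_TAG));
    puts(line);
    format_slot(line, sizeof line, "MPI_ERROR", offsetof(demo_status, MPI_ERROR));
    puts(line);
}
```

Calling print_slots() from a main function prints one form per slot. If the header's struct gains or reorders fields, recompiling is enough to pick up the new offsets, which is why the grovel-based binding needed no change between MPICH versions.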
Interesting illustration with cstruct; I didn't realize that it would always pull out the right variables. That is useful.
One drawback of CFFI-grovel is that the documentation and examples are quite thin. I had to use IOLib for a sample file to copy.
As one of the original authors of CFFI-grovel, though I've since given up maintainership and the project has moved on, I'd like to say - thanks for the kind words!
Probably someday I should get around to writing and donating the documentation to go with the donated code...
lhealy - I ended up referring to IOLib for examples too. I found that once I saw some of the grovel files in IOLib and got going, the sparseness of docs wasn't a problem, but it would have saved a little time if the docs included more samples.
dankna - Thanks for making CFFI-grovel available! It's a great tool.