Apr 122011
 

Mex files are an excellent tool to have in the HPC toolchest. Matlab is a great tool to prototype in; You don’t have to worry too much about allocating variables and all of those things that can take time in a compiled language … you can just get on with the job.

But Matlab is slow. Particularly when you are dealing with large datasets. This is where mex files come in. You can take advantage of the ease of changing parameters, easy saving/loading of datasets, quick visualisation, etc but also get the computational backend grunt of specific hardware coupled with compilers that understand the hardware.

Enough talk.

The hardest thing with Mex file programming is getting your variables into and out of your C/C++/Fortran program. Matlab has plenty of examples for doing this, I’m just providing additional examples. You get data in and out of Matlab and C/C++/Fortran via the

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
// Matlab includes
// Note: These are located under MATLAB/extern/include
#include "mex.h"
#include "matrix.h"
 
// Standard includes
#include <math.h>
 
#define GRAD_A_RAD(g) ((g)*0.0174532925199433)
 
// Forward declaration
bool ConvertLL2UTM(double Lat, double Long, double *northing, double *easting);
 
 
/*  The Matlab call is as follows:
 *  [Easting, Northing] = LL2UTM(Longitude, Latitude);
 *
 *  Lat or Long can be a single value or they can be arrays.
 *  As a result, Easting/Northing can be scalars or vectors.
 *  
 */
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    // Array size checking
    int numberOfElements_Lat;
    int numberOfElements_Long;
    int n, m;
    // Inputs arrays
    double *latitude;
    double *longitude;
    // Output arrays
    double *northing;
    double *easting;
 
    // First, verify the correct number of input/output parameters
    if(nrhs!=2) 
    {
        mexErrMsgIdAndTxt("LL2UTM", "Two inputs required.");
    }
    if(nlhs!=2) 
    {
        mexErrMsgIdAndTxt("LL2UTM","Two outputs required.");
    }
 
    /////////////////////////////
    // Query the input parameters
    /////////////////////////////
 
    // Get the size of the first input parameter : Latitude
    numberOfElements_Lat = mxGetNumberOfElements(prhs[0]);
 
    // Get the size of the second input parameter: Longitude
    numberOfElements_Long = mxGetNumberOfElements(prhs[1]);
 
    // m must equal n
    if ( numberOfElements_Lat != numberOfElements_Long )
    {
        mexErrMsgIdAndTxt("LL2UTM", "Two inputs are not equal dimensions.");
    }
 
    // Get the latitude and longitude
    latitude = mxGetPr(prhs[0]);
    longitude = mxGetPr(prhs[1]);
 
    // Get the sizes of the input such that the output matches
    // Get the number of columns
    n = mxGetN(prhs[0]);
    // Get the number of rows
    m = mxGetM(prhs[0]);
 
    ///////////////////////////////
    //  Setup the output parameters
    ///////////////////////////////
    plhs[0] = mxCreateDoubleMatrix(m, n, mxREAL);
    easting = mxGetPr(plhs[0]);
 
    plhs[1] = mxCreateDoubleMatrix(m, n, mxREAL);
    northing = mxGetPr(plhs[1]);
 
    // Perform the calculations
    for (int i = 0; i<numberOfElements_Lat; i++)
    {
        ConvertLL2UTM(latitude[i], longitude[i], &northing[i], &easting[i]);
    }
}
 
 
/*
 *  Convert Latitude/Longitude to UTM's northing and easting
 *
 *  I would not take this conversion for gospel - works for Australia
 *  I'd test it before relying on it - just sayin' :P
 */
bool ConvertLL2UTM(double Lat, double Long, double *northing, double *easting)
{
    const double a  = 6378137.0;
    const double ee = 0.00669437999;
    const double k0 = 0.9996;
    const double e2 = ee / (1-ee);
 
    // -180.00 .. 179.9;
    double LongTemp = (Long+180)-int((Long+180)/360)*360-180;
    double LatRad  = GRAD_A_RAD(Lat);
    double LongRad = GRAD_A_RAD(LongTemp);
    double LongOriginRad;
    double N, T, C, A, M;
 
    //Make sure the longitude is between -180.00 .. 179.9
    int zona = int((LongTemp + 180)/6.0) + 1;
    if (Lat >= 56.0 && Lat < 64.0 && LongTemp >= 3.0 && LongTemp < 12.0)
        zona = 32;
 
    // Special zones for Svalbard
    if (Lat >= 72.0 && Lat < 84.0) 
    {
        if (LongTemp>=0.0  && LongTemp<9.0)
            zona = 31;
        else if (LongTemp>=9.0 && LongTemp<21.0)
            zona = 33;
        else if (LongTemp>=21.0 && LongTemp<33.0)
            zona = 35;
        else if (LongTemp>=33.0 && LongTemp<42.0)
            zona = 37;
    }
    LongOriginRad = GRAD_A_RAD((zona-1)*6 - 180 + 3);
 
    N = a/sqrt(1-ee*sin(LatRad)*sin(LatRad));
    T = tan(LatRad)*tan(LatRad);
    C = e2*cos(LatRad)*cos(LatRad);
    A = cos(LatRad)*(LongRad-LongOriginRad);
    M = a*((1 - ee/4 - 3*ee*ee/64 - 5*ee*ee*ee/256)*LatRad
              - (3*ee/8     + 3*ee*ee/32 + 45*ee*ee*ee/1024)*sin(2*LatRad)
              + (15*ee*ee/256 + 45*ee*ee*ee/1024)*sin(4*LatRad)
              - (35*ee*ee*ee/3072)*sin(6*LatRad));
 
    *easting = (double)(k0*N*(A+(1-T+C)*A*A*A/6
                   + (5-18*T+T*T+72*C-58*e2)*A*A*A*A*A/120) + 500000.0);
    *northing = (double)(k0*(M+N*tan(LatRad)*(A*A/2+(5-T+9*C+4*C*C)*A*A*A*A/24
                   + (61-58*T+T*T+600*C-330*e2)*A*A*A*A*A*A/720)));
 
    if (Lat < 0) 
    {
        //10000000 metre offset for southern hemisphere    
        *northing += 10000000; 
    } 
    return true;
}

Note that prhs[x] denotes input variables and plhs[x] denotes output variables.

And the associated Makefile

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
#!/bin/make
 
# Path to where Matlab is installed
MATLAB = /home/dwyer3/Software/R2010b
 
# The source file with the mexFunction
SRC = Main.cpp
 
# What is the name for the mex file?
BINARY = LL2UTM
 
# I want to generate the mex file for a linux x86_64 
MEX_SUFFIX = mexa64
 
# What compiler to use?
CC = icpc
 
 
OPTS = -DMATLAB_MEX_FILE -fPIC -fno-omit-frame-pointer -DMX_COMPAT_32 -O3
OBJS = $(SRC:.cpp=.o)
 
INC = -I$(MATLAB)/extern/include -I$(MATLAB)/simulink/include
LIBS = -shared -Wl,--version-script,$(MATLAB)/extern/lib/glnxa64/mexFunction.map -Wl,--no-undefined
LIBS += -Wl,-rpath-link,$(MATLAB)/bin/glnxa64 -L$(MATLAB)/bin/glnxa64 -lmx -lmex -lmat 
LIBS += -lpthread
 
.SUFFIXES: .cpp .o
 
.cpp.o:
        $(CC) $(OPTS) $(INC) -c $< -o $@
 
Go: $(OBJS)
        $(CC) $(OBJS) $(OPTS) -o $(BINARY).$(MEX_SUFFIX) $(LIBS)
        @echo Binary created!!
 
clean:
        set nonomatch; rm -f $(BINARY).$(MEX_SUFFIX) *.o

To compile:

[dwyer3@hpc-md LL2UTM]$ make
icpc -DMATLAB_MEX_FILE -fPIC -fno-omit-frame-pointer -DMX_COMPAT_32 -O3 -I/home/dwyer3/Software/R2010b/extern/include -I/home/dwyer3/Software/R2010b/simulink/include -c Main.cpp -o Main.o
icpc Main.o -DMATLAB_MEX_FILE -fPIC -fno-omit-frame-pointer -DMX_COMPAT_32 -O3 -openmp -o LL2UTM.mexa64 -shared -Wl,--version-script,/home/dwyer3/Software/R2010b/extern/lib/glnxa64/mexFunction.map -Wl,--no-undefined -Wl,-rpath-link,/home/dwyer3/Software/R2010b/bin/glnxa64 -L/home/dwyer3/Software/R2010b/bin/glnxa64 -lmx -lmex -lmat -lpthread
Binary created!!

You will see that there is now an executable mex file:

[dwyer3@hpc-md LL2UTM]$ ll
total 40
-rwxrwxr-x. 1 dwyer3 dwyer3 12407 Apr 12 12:38 LL2UTM.mexa64

In Matlab, I make sure that my “Current Folder” is the same as where the executable mex file just created (or add it to the Matlab path [a way to do this is to add a directory via /$MATLAB/toolbox/local/matlabrc.m] – I have a directory of Mex files for specific jobs … the above being one). I have a list of latitude and longitude coordinates

lat <14316603x1 double>
long <14316603x1 double>

>> tic; [Easting, Northing] = LL2UTM(lat, long); toc;
Elapsed time is 3.491809 seconds.

Now, I don’t really know whether this is good or bad since I don’t have a Matlab version of LL2UTM. However, I know I can be very happy if I parallelise the for loop in the Main.cpp

On line 81, just prior to the for loop, add
#pragma omp parallel for

In the Makefile, add
-openmp
to the end of line 19.

Recompile the mexfile and run the example above again in Matlab
>> tic; [Easting, Northing] = LL2UTM(lat, long); toc;
Elapsed time is 0.304706 seconds.

Now, you can see why I like Mex files. It’s not a perfect scaling, but I get results quicker. Note that the -openmp is an Intel Compiler thing. To see the same for GNU compilers, please see my OpenMP Introduction post.

Speed Tip:
Currently, the above uses math.h. I should be using mathimf.h (particularly given the number of times those math functions are called). If I use Intel’s -limf, note that these will have to be compiled statically so that there is no version mismatch between my version and Matlab’s version. To see an example, please see my MATLAB MKL MEX post

Thanks to observations by Anthony Rasmussen, the above code can be sped up. He was quite right to point these flaws out and in order to keep up the pretence that I know what I’m talking about, an alternative version of the above can be found here. 🙂

Serial:
>> tic; [Easting, Northing] = LL2UTM(lat, long); toc;
Elapsed time is 0.993255 seconds.

Parallel:
>> tic; [Easting, Northing] = LL2UTM(lat, long); toc;
Elapsed time is 0.222407 seconds.

I’ll put a cuda version up soon … stay tuned.

  4 Responses to “Mex Files: Introduction”

  1. In ConvertLL2UTM(), lots of multiplication and division by numbers that are a factor of two. Does replacing these ops with bit-shifts help, or the function doesn’t get hit enough for it to matter? (DISCLAIMER: large glasses of sav blanc = 5. Anger with computers at large = high.)

    • Hey man, I see your sav. and raise a semillon.

      You’re right, but personally I’d go with a const declaration (as opposed to a #define) of the inverse. I used to be a fan of bit shifting, but according to the Intel Architecture Optimisation Guide, the bit shift requires 2 cycles (1 latency, 1 throughput). Given that a multiply is also 1 latency, 1 throughput but also has the advantage of vectorisation (see here – admittedly I’m not using here) and bit shifting is not a vectorial operation (currently), it is probably much of a muchness. Probably worth a micro-benchmark. Hmmm, an hour of tomorrow just got booked solid.

      I like code to be easily readable – like a book. Personally, me seeing bit shifting and other tricks makes you do a double take. The same goes with relying on obscure IEEE754 rules. Easily overlooked to the inexperienced eye.

      Looking at it again, what I’d do to milk the flops would be to bring the sin, cos and tan operations together. Bust out some trig identities. Would probably register some of the ee and A combinations.

      I get a sick pleasure out of these sort of things.

  2. Mark, sober again. Point taken re bit-shifting. Makes for code that is hard to read.

    Since ee is const, can’t you pre-evaluate things like (15*ee*ee/256 + 45*ee*ee*ee/1024) as another const ?

 Leave a Reply

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

(required)

(required)

Human Conf Test * Time limit is exhausted. Please reload CAPTCHA.