Mar 26, 2011
 

I like pThreads. I have come across numerous blogs that describe pThreads (POSIX threads, IEEE POSIX 1003.1c) as a difficult way of achieving parallelism, but I have to disagree. Although the basics are a little more involved than programming with OpenMP, pthreads get easier as the problem you are trying to solve gets harder, whereas OpenMP gets more difficult. To me, pthreads force you to know exactly what you are doing with every variable at all times. As a result, your variables behave exactly as you want them to.

It can be useful to make sure you have the relevant man pages installed:

sudo aptitude install manpages-posix-dev
man 3 pthread_join

Let’s get started with a simple example (HelloWorld.cpp):

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <pthread.h>

int numThreads;
pthread_t *threads;

/*
 *  A function that will take the integer parameter
 *  that is the threadID
 *
 */
void *say_hello_function( void *arg )
{
    // The thread ID was packed into the pointer value itself,
    // so convert it back through a pointer-sized integer
    intptr_t sig1 = reinterpret_cast<intptr_t>(arg);
    int threadNum = static_cast<int>(sig1);

    fprintf(stdout, "Hello from Thread %d\n", threadNum);
    return NULL;
}


/*
 * Program kicker
 */
int main(int argc, char **argv)
{
    if (argc != 2)
    {
        fprintf(stdout, "Usage: Hello <n>\n");
        fprintf(stdout, "where n = number of threads\n");
        return 1;
    }
    // Get the number of threads to spawn
    numThreads = atoi(argv[1]);

    if (numThreads < 1)
    {
        fprintf(stdout, "Usage: Hello <n>\n");
        fprintf(stdout, "where n = number of threads\n");
        fprintf(stdout, "Hint: try and keep n > 0\n");
        return 1;
    }

    // Create an array of threads
    threads = new pthread_t[numThreads];

    // For each thread
    for ( int i = 0; i < numThreads; i++ )
    {
        // Pack the thread ID into a pointer-sized integer before casting
        void *arg = reinterpret_cast<void*>(static_cast<intptr_t>(i));
        // Create the thread and execute "say_hello_function"
        if (pthread_create(&threads[i], NULL, say_hello_function, arg) != 0)
        {
            fprintf(stderr, "Failed to create thread %d\n", i);
            return 1;
        }
    }
    // Wait for all threads to finish
    for ( int i = 0; i < numThreads; i++ )
    {
        pthread_join(threads[i], NULL);
    }

    delete[] threads;
    return 0;
}

g++ -O2 -c HelloWorld.cpp -o Hello.o
g++ -O2 Hello.o -o Hello -lpthread
Binary created!!

icpc -O2 -c HelloWorld.cpp -o Hello.o
icpc -O2 Hello.o -o Hello -lpthread
Binary created!!


./Hello 4
Hello from Thread 0
Hello from Thread 3
Hello from Thread 2
Hello from Thread 1

That code is a little rough, but let’s examine it. Essentially we now have a portable piece of code that should run on the compute nodes of any HPC installation that provides POSIX threads. If I run it on a machine with 8 cores (2 sockets, each with a 4-core die) and execute ./Hello 8, whatever is in say_hello_function will be executed in parallel. If I move to a node with 12 cores: ./Hello 12. If I have fairly inefficient code and hyperthreading is turned on: ./Hello 24.
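Rather than typing the core count by hand, the program could ask the operating system for it. The following is only a sketch: the helper name detect_thread_count is made up for illustration, and _SC_NPROCESSORS_ONLN is a widely available extension (glibc, the BSDs, macOS) rather than something every POSIX system is required to provide.

```cpp
#include <unistd.h>   // sysconf

// Hypothetical helper, not part of the original listing.
// Returns the number of processors currently online, or 1 if the
// query is unsupported or fails.
int detect_thread_count()
{
    // _SC_NPROCESSORS_ONLN is an extension, assumed to be available here
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return (n > 0) ? static_cast<int>(n) : 1;
}
```

The result could be used as a default when no thread count is given on the command line. Note that it counts logical processors, so with hyperthreading enabled it would typically report the larger figure.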

We start all the threads, passing each one a parameter, and we then wait for all of them to finish.
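Casting an integer into the void* argument works for a single ID, but the same start-pass-wait pattern can also carry a pointer to a small struct, which gives each thread room for both inputs and outputs. A minimal sketch of that variation follows; the names HelloArg, square_id and run_threads are invented for this example and do not appear in the listing above.

```cpp
#include <pthread.h>
#include <vector>

// Hypothetical per-thread argument block (illustration only)
struct HelloArg
{
    int threadNum;   // input: which thread this is
    int result;      // output: written by the thread itself
};

// Each thread squares its own ID and stores it in its own struct
static void *square_id(void *arg)
{
    HelloArg *a = static_cast<HelloArg *>(arg);
    a->result = a->threadNum * a->threadNum;
    return NULL;
}

// Start numThreads threads, wait for them all, and collect the results
std::vector<int> run_threads(int numThreads)
{
    std::vector<pthread_t> threads(numThreads);
    // Sized up front so the addresses handed to the threads stay valid
    std::vector<HelloArg> args(numThreads);

    int created = 0;
    for (int i = 0; i < numThreads; i++)
    {
        args[i].threadNum = i;
        args[i].result = -1;   // stays -1 if the thread never ran
        if (pthread_create(&threads[i], NULL, square_id, &args[i]) != 0)
            break;
        created++;
    }
    // Only join the threads that were actually created
    for (int i = 0; i < created; i++)
        pthread_join(threads[i], NULL);

    std::vector<int> out(numThreads);
    for (int i = 0; i < numThreads; i++)
        out[i] = args[i].result;
    return out;
}
```

Because every thread writes only to its own HelloArg, no locking is needed, and the results are safe to read once pthread_join has returned.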

So we now want to do something a little more involved than saying hello. Let us look at a simple dot product (naive version) to see how we can get threads to start crunching numbers for us.

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>


/*
 *  Struct for dot product (each thread will see this)
 */
struct ThreadDotData 
{
    double *a;
    double *b;
    int n;
    int numThreads;
    double *sum;
};


// An array of threads
pthread_t *threads;

// The Data
ThreadDotData dotData;


/*
 *  Dot Product
 *
 *  Executed by all threads
 */
void* Dot_Product_Thread(void *params)
{
    // Retrieve the argument (the thread ID packed into the pointer value)
    intptr_t sig1 = reinterpret_cast<intptr_t>(params);
    // Which processor am I? 
    int threadNum = static_cast<int>(sig1);

    // How many processors do I have?
    int procCount = dotData.numThreads;
    // How much work will each processor do?
    int chunk = dotData.n/procCount;
    // There is always that little bit left over
    int extra = dotData.n % procCount;
    int start, end;

    // What is my starting index?
    start = threadNum*chunk;
    // What is the index I stop working on? (exclusive)
    end = start + chunk;
    // if we're the last thread, take on any extra elements
    if (threadNum == procCount-1)
        end += extra;

    // Accumulate locally, then publish once to this thread's own slot
    double mySum = 0.0;
    for ( int i = start; i < end; i++ ) 
    {
        mySum += (dotData.a[i] * dotData.b[i]);
    }
    dotData.sum[threadNum] = mySum;
    return NULL;
}


/*
 *  Program entry point: set up the data, start the threads,
 *  then reduce the partial sums
 */
int main (int argc, char *argv[])
{
    if (argc != 2)
    {
        fprintf(stdout, "Usage: %s <n>\n", argv[0]);
        fprintf(stdout, "where n = number of threads\n");
        return 1;
    }
    // Get the number of threads to spawn
    int numThreads = atoi(argv[1]);

    if (numThreads < 1)
    {
        fprintf(stdout, "Usage: %s <n>\n", argv[0]);
        fprintf(stdout, "where n = number of threads\n");
        fprintf(stdout, "Hint: try and keep n > 0\n");
        return 1;
    }

    // Create an array of threads
    threads = new pthread_t[numThreads];

    // How much work to do
    const int n = 100000000;
    // Allocate
    double *x = new double[n];
    double *y = new double[n];
    // Initialise
    for ( int i = 0; i < n; i++ )
    {
        x[i] = 1.0;
        y[i] = x[i];
    }

    // Set the global structure
    dotData.n = n; 
    dotData.a = x; 
    dotData.b = y; 
    dotData.numThreads = numThreads;
    dotData.sum = new double[numThreads];

    // Start all the threads
    for( int i = 0; i < numThreads; i++ )
    {
        // Also pass to each thread, which ID it is
        void *arg = reinterpret_cast<void*>(static_cast<intptr_t>(i));
        if (pthread_create(&threads[i], NULL, Dot_Product_Thread, arg) != 0)
        {
            fprintf(stderr, "Failed to create thread %d\n", i);
            return 1;
        }
    }

    double totalSum = 0.0;
    // Wait for the threads to finish
    for( int i = 0; i < numThreads; i++ )
    {
        pthread_join(threads[i], NULL);
        totalSum += dotData.sum[i];
    }

    // Display the results
    fprintf(stdout , "Sum = %f\n", totalSum);

    // Clean up
    delete[] dotData.sum;
    delete[] x;
    delete[] y;
    delete[] threads;

    return 0;
}

Now, there is nothing there that is particularly difficult to understand, but I will summarise what I think are a few crucial points to walk away with:
1. Although our arrays x and y have local scope in main, they are made visible to the threads through the pointers stored in the global struct.
2. It is safe for threads to read from the same piece of data … the issue is when writing to the same WORD. [C++0x guarantees that threads writing to distinct memory locations do not interfere with each other.]
3. The dot product requires each thread to return its portion of the result. I have done this by setting up the struct with the sum array, where each thread writes only to its own element.
4. The sum array is then added up (reduced) by the master thread, which is a serial portion.
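The reason each thread can safely produce its own entry in the sum array is that the index ranges never overlap. The arithmetic from the listing can be pulled out on its own, as in the sketch below; the function name chunk_range is invented here, but the rule is the same one used in Dot_Product_Thread (equal chunks, with the remainder going to the last thread).

```cpp
#include <utility>

// Hypothetical helper (illustration only).
// Returns the half-open range [start, end) of indices that thread
// `threadNum` works on when `n` elements are split across `numThreads`.
std::pair<int, int> chunk_range(int threadNum, int numThreads, int n)
{
    int chunk = n / numThreads;      // elements per thread
    int start = threadNum * chunk;   // first index for this thread
    int end   = start + chunk;       // one past the last index
    // The last thread also takes whatever is left over
    if (threadNum == numThreads - 1)
        end += n % numThreads;
    return std::make_pair(start, end);
}
```

With 10 elements and 4 threads, threads 0 to 2 each get two elements and thread 3 gets four (indices 6 to 9). Handing the whole remainder to one thread is simple but can leave that thread with noticeably more work when n is small relative to the thread count.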

Now, the sum array in our struct is not strictly necessary. Indeed, some would say that we should be using a mutex lock here. This is where I begin to disagree with a lot of pThread coders. Much later, as an exercise, I’m going to adapt this code so that it uses sockets to run multi-node. Later still, I will use sockets to communicate across many private networks independently so that, theoretically, you can go multi-rack efficiently. If I were to use a mutex, the code architecture would not scale beyond a single motherboard.
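For contrast, here is roughly what the mutex-based alternative might look like. This is only a sketch of the approach being argued against, not code from this post: the names MutexDotArg, dot_with_mutex and dot_mutex are invented, and it assumes a C++11 compiler for std::vector::data().

```cpp
#include <pthread.h>
#include <vector>

// Hypothetical argument block for the mutex variant (illustration only)
struct MutexDotArg
{
    const double *a;
    const double *b;
    int start;               // first index for this thread
    int end;                 // one past the last index
    double *total;           // single shared accumulator
    pthread_mutex_t *lock;   // protects *total
};

static void *dot_with_mutex(void *p)
{
    MutexDotArg *arg = static_cast<MutexDotArg *>(p);
    // Accumulate locally so the lock is taken only once per thread
    double mySum = 0.0;
    for (int i = arg->start; i < arg->end; i++)
        mySum += arg->a[i] * arg->b[i];

    pthread_mutex_lock(arg->lock);
    *arg->total += mySum;
    pthread_mutex_unlock(arg->lock);
    return NULL;
}

// Dot product of a and b (assumed to be the same length) using
// numThreads threads and one mutex-protected total
double dot_mutex(const std::vector<double> &a,
                 const std::vector<double> &b, int numThreads)
{
    int n = static_cast<int>(a.size());
    int chunk = n / numThreads;
    double total = 0.0;
    pthread_mutex_t lock;
    pthread_mutex_init(&lock, NULL);

    std::vector<pthread_t> threads(numThreads);
    std::vector<MutexDotArg> args(numThreads);
    int created = 0;
    for (int i = 0; i < numThreads; i++)
    {
        args[i].a = a.data();
        args[i].b = b.data();
        args[i].start = i * chunk;
        // The last thread picks up the remainder
        args[i].end = (i == numThreads - 1) ? n : args[i].start + chunk;
        args[i].total = &total;
        args[i].lock = &lock;
        // If creation fails the result will be incomplete; a real
        // program would report the error
        if (pthread_create(&threads[i], NULL, dot_with_mutex, &args[i]) != 0)
            break;
        created++;
    }
    for (int i = 0; i < created; i++)
        pthread_join(threads[i], NULL);

    pthread_mutex_destroy(&lock);
    return total;
}
```

The shared total and its lock live in one process's memory, which is the crux of the objection above: per-thread partial results translate naturally into messages sent between nodes, whereas a mutex-guarded variable does not.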

The mutex is still useful, but in different ways, as the following code will demonstrate.

To be continued …

  3 Responses to “pThread: Introduction”

  1. Nice post Mark. I’ll be dropping by your blog from time to time!

    • Thanks Anthony. This one is a little light on content at the moment as I’m trying to pick good examples for the next iteration.

      I’ve been dropping in on http://blog.xlrantlabs.com/ regularly for quite a while now. Certainly have a better flair for writing than myself.

      • Keep up the good work! Hope QUT is treating you well.

        My blog is getting a bit of traffic right now because I have been posting about Shiogama, Japan. It’s where Nicole used to live. We were there last year for a family holiday, and were due to return in a couple of weeks time. Thankfully all our friends are physically ok. Fate is a wondrous thing.
