Mar 08 2011
 

Quick Guide to using GPUMat on Lyra

GPUMat
GPUMat Forums
GPUMat Userguide

To start using GPUMat (open source toolbox for GPU-Matlab programming), do the following:

Create a file called “interactive.submit” and put the following into it

#!/bin/bash -l
 
#PBS -N Interactive
#PBS -l walltime=1:00:00
#PBS -l mem=2000mb
#PBS -l ncpus=1
#PBS -l gpu=1

At the command line, on lyra, type:
$ qsub -I interactive.submit
Note that in the above, -I is a capital “i”.
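The two steps above (create the file, then submit it) can also be done in one go from the shell. This is only a sketch: the directives are copied from the listing above, while the `command -v qsub` guard is an addition so that the snippet does nothing harmful on a machine without PBS.

```shell
#!/bin/bash
# Write the PBS directives from the guide into interactive.submit.
cat > interactive.submit <<'EOF'
#!/bin/bash -l

#PBS -N Interactive
#PBS -l walltime=1:00:00
#PBS -l mem=2000mb
#PBS -l ncpus=1
#PBS -l gpu=1
EOF

# Submit only where qsub exists (i.e. on the cluster itself).
if command -v qsub >/dev/null 2>&1; then
    qsub -I interactive.submit   # -I is a capital "i": interactive job
else
    echo "qsub not found; run this on lyra" >&2
fi
```

The quoted 'EOF' keeps the shell from expanding anything inside the here-document, so the file ends up exactly as typed.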

This should get you a nice, interactive job on our GPU node.

Type:
$ module load cuda
$ module load matlab/2010a
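If you want to confirm that the modules took effect before going further, a quick check along these lines may help. It assumes the cuda module puts `nvcc` on your PATH and the matlab module provides a `matlab` command; both are assumptions about how the modules on this cluster are set up, and `check_tool` is just a helper name made up for this sketch.

```shell
#!/bin/bash
# Report whether a command is reachable on the current PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

# nvcc and matlab are the tool names we assume the two modules provide.
for tool in nvcc matlab; do
    check_tool "$tool"
done
```

If either line says "missing", check the exact module names with your cluster's module listing before continuing.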

Create a file called FirstTest.m

function FirstTest()
 
N = 10:10:5000;
timecpu = zeros(1,length(N));
timegpu = zeros(1,length(N));
index=1;
for i = 1:size(N,2)
    n = N(i);
    Ah = single(rand(n,n)); % CPU
    A = GPUsingle(rand(n,n));    % GPU
 
    %% Execution on GPU
    tic;
    A.*A;
    GPUsync;
    timegpu(index) = toc;
 
    %% Execution on CPU
    tic;
    Ah.*Ah;
    timecpu(index) = toc;
 
    % increase index
    index = index +1;
    fprintf(1, '%d\n', n);
end
 
speedup = timecpu./timegpu;
 
save('times.mat', 'N', 'speedup');

Start Matlab:
$ matlab -nojvm

If you type
>> GPUinfo
you should see

There are 4 devices supporting CUDA
CUDA Driver Version: 3.20
CUDA Runtime Version: 3.20

Device 0: "Tesla S2050"
CUDA Capability Major revision number: 2
CUDA Capability Minor revision number: 0
Total amount of global memory: 2817982464 bytes

Device 1: "Tesla S2050"
CUDA Capability Major revision number: 2
CUDA Capability Minor revision number: 0
Total amount of global memory: 2817982464 bytes

Device 2: "Tesla S2050"
CUDA Capability Major revision number: 2
CUDA Capability Minor revision number: 0
Total amount of global memory: 2817982464 bytes

Device 3: "Tesla S2050"
CUDA Capability Major revision number: 2
CUDA Capability Minor revision number: 0
Total amount of global memory: 2817982464 bytes

To set a working GPU, type:
>> GPUstart

And select a GPU to work with when prompted (the full GPUstart session is reproduced below). Please note that there is a problem here: if somebody is already using GPU 0 and you also select GPU 0, both of you will see degraded performance, and if you both call cudaMalloc at the same time, the second caller will get an error.

It appears to be best to let the NVIDIA driver select the GPU device for you automatically. Apparently, doing otherwise is considered a bad coding habit. I can see that this works on a homogeneous system, but if you had several GPUs of different generations, makes or models, how would the driver pick one? Do you always want the most powerful?
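One possible way to keep users off each other's GPU without hard-coding a device number in Matlab is the CUDA_VISIBLE_DEVICES environment variable, which the CUDA runtime reads to decide which devices a process can see. This is a sketch only: the device number 2 is an arbitrary example, and whether the CUDA 3.2 install on this node honours the variable, and how GPUstart behaves when only one device is visible, are assumptions I have not verified.

```shell
#!/bin/bash
# Hypothetical choice: expose only physical GPU 2 to processes started
# from this shell. Inside those processes it would show up as device 0.
export CUDA_VISIBLE_DEVICES=2
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"

# Start Matlab as in the guide, but only where it is actually installed.
if command -v matlab >/dev/null 2>&1; then
    matlab -nojvm
fi
```

This does not solve the coordination problem by itself; users would still have to agree on who takes which device.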


Copyright gp-you.org. GPUmat is distribuited as Freeware.
By using GPUmat, you accept all the terms and conditions
specified in the license.txt file.

Please send any suggestion or bug report to gp-you@gp-you.org.

Starting GPU
- GPUmat version: 0.270
- Required CUDA version: 3.2
There are 4 devices supporting CUDA
CUDA Driver Version: 3.20
CUDA Runtime Version: 3.20

Device 0: "Tesla S2050"
CUDA Capability Major revision number: 2
CUDA Capability Minor revision number: 0
Total amount of global memory: 2817982464 bytes

Device 1: "Tesla S2050"
CUDA Capability Major revision number: 2
CUDA Capability Minor revision number: 0
Total amount of global memory: 2817982464 bytes

Device 2: "Tesla S2050"
CUDA Capability Major revision number: 2
CUDA Capability Minor revision number: 0
Total amount of global memory: 2817982464 bytes

Device 3: "Tesla S2050"
CUDA Capability Major revision number: 2
CUDA Capability Minor revision number: 0
Total amount of global memory: 2817982464 bytes
- Your system has multiple GPUs installed
-> Please specify the GPU device number to use [0-3]: 0
- CUDA compute capability 2.0
...done
- Loading module EXAMPLES_CODEOPT
- Loading module EXAMPLES_NUMERICS
-> numerics20.cubin
- Loading module NUMERICS
-> numerics20.cubin
- Loading module RAND

You are now ready to start computation.
>> FirstTest


10
20
30
....
4990
5000

Copy the .mat file across to your local machine and plot the speedup against N:
[Figure: Speedup]

Offtopic

Gee, that performance cliff around n = 2750 looks interesting. Here is another graph that concentrates on it:
[Figure: Speedup, windowed around the cliff]

  One Response to “GPUMat on cl1n041”

  1. I switched from GPUmat to Jacket. The $350 cost for Jacket is well worth the added functionality, faster performance, and reduced hassle with all the GPUmat bugs. Just my two cents.
