So you have some Matlab code that takes a long time to run. You have access to a compute cluster but are not quite sure how to harness its power. I hope this post sorts that little issue out. Please contact me if there is something that is not clear and needs clarification. I’ve been doing this for 14 years so may not remember the sticky points.
For this series, I’m going to be using a Ray Tracing example of the famous Cornell box. I have adapted the code from Kevin Beason’s Blog. I have rewritten his code in Matlab and I feel dirty for having done so. I needed a simple, high compute example. I hope nobody holds it against me.
The images we will be creating will look similar to these
All the code that is used on this page is located: HERE
Typically, you have a piece of Matlab code that looks like this:
function RayTrace(samples, width, height)
    warning off all;

    % The width and height of the final image
    w = width;
    h = height;

    % Define the scene
    % Left Wall/Sphere
    spheres(1) = Sphere(1e5, [1e5+1;40.8;81.6], [0;0;0], [.75;.25;.25], 0);
    % Right Wall/Sphere
    spheres(2) = Sphere(1e5, [-1e5+99;40.8;81.6], [0;0;0], [.25;.25;.275], 0);

    ....[Code Cut for Brevity]....

    end % y
    toc;

    % Combine the Red, Green and Blue Channels
    RGB = cat(3, cr, cg, cb);

    % Display an image of our scene
    image(RGB);

    % We are done.
end
Your “Main” function takes zero or more parameters, you compute and you do something with the result (a simple visualisation/statistic). Getting this problem to run in PBS is really quite simple.
The first thing you want to do is define a Matlab script file that takes no parameters and returns no results. In order to do so, we need to change the original script file just a little. We don’t want any graphics popping up … we can do that on our local machines later.
function [RGB] = FirstRayTrace(samples, width, height)
    warning off all;

    % The width and height of the final image
    w = width;
    h = height;

    % Define the scene
    % Left Wall/Sphere
    spheres(1) = Sphere(1e5, [1e5+1;40.8;81.6], [0;0;0], [.75;.25;.25], 0);
    % Right Wall/Sphere
    spheres(2) = Sphere(1e5, [-1e5+99;40.8;81.6], [0;0;0], [.25;.25;.275], 0);

    ....[Code Cut for Brevity]....

    end % y
    toc;

    % Combine the Red, Green and Blue Channels
    RGB = cat(3, cr, cg, cb);

    % Display an image of our scene
    % image(RGB);

    % We are done.
end
Notice that we now return the results that we need to save and I have commented out the line of code that pops up some graphics?
The script file that takes no parameters is defined in myRun.m:
function myRun()
    % Define a 'unit' of work
    [RGB] = FirstRayTrace(50, 100, 80);

    % Instead of immediately displaying graphics, save the graphics to a file.
    imwrite(RGB, 'MyFirstFile_50_100_80.png', 'png');

    % You can also save the Matlab variable to file:
    save('RGB.mat', 'RGB');

    % We have finished, so tell Matlab to quit
    quit;
The above code will do three things:
1) Run your ‘simulation’
2) Save the resulting image directly to file (png)
3) Save the data to a .mat file in case I want to look at it later on my local machine.
This looks like too much work right? You could easily put those changes into the original RayTrace.m file. Well, it sorta is too much work but I’m going to be showing you some funky features later so please bear with me.
In order for this to run in PBS, we need to define another file. A file that PBS uses to schedule what resources you need and what it is going to do.
#!/bin/bash -l
#PBS -N RayTrace
#PBS -l walltime=3:00:00
#PBS -l mem=800mb

module load matlab
cd $PBS_O_WORKDIR
matlab -nodisplay -r myRun
Line 1: We need this line. It is in all PBS scripts. The “-l” is a minus lowercase L (it tells bash to start as a login shell)
Line 2: Give our Matlab job a name. Makes things easier when looking at how many jobs we have running
Line 3: How many hours do we need … 3 hours. Format: HHHH:MM:SS
Line 4: How much memory do we need … I only need 800mb (I could also use 2gb)
Line 6: I know that I need the Matlab module (environment)
Line 7: I need to make sure that PBS changes directory to where my files are. I could do this with an explicit cd to the full path, or, since PBS tries to make things easier for us, by using $PBS_O_WORKDIR, which holds the directory the qsub command was run from (normally the directory where this submit.pbs file is located).
Line 8: This is where we run matlab. We do not want a display (uses too much memory and not required as we can’t see anything anyway) and I want to run the myRun.m file we created just before.
Everything is set up. The only thing we need to do now is to submit our job to the batch queue.
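As a minimal sketch of the submission step, assuming the PBS file above was saved as submit.pbs (the name used in the text) and that qsub is on your path once you are logged in to the cluster. The helper name submit_job is hypothetical; it simply refuses to submit a file that does not exist and falls back to a dry run when qsub is not installed:

```shell
#!/bin/bash
# Sketch only: submit_job is a hypothetical helper, not part of PBS.
# It hands the script to qsub when qsub exists, otherwise it prints
# what it would have run so you can test it on your local machine.
submit_job() {
    local script="$1"
    if [ ! -f "$script" ]; then
        echo "missing: $script" >&2
        return 1
    fi
    if command -v qsub >/dev/null 2>&1; then
        qsub "$script"              # qsub prints the new job id
    else
        echo "dry run: qsub $script"
    fi
}
```

On the cluster, submit_job submit.pbs is equivalent to typing qsub submit.pbs, and qstat -u $USER then lists your queued and running jobs.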
The beauty of the compute cluster and PBS is that now, I can turn off my PC and go home while I’m still working as quickly as I can.
Eventually, the job will finish. I will get the output files I wanted and defined but I’ll also get two other files: RayTrace.e821007 and RayTrace.o821007. These files contain all of the errors (if any) and the output you would normally see in the Matlab Command Window.
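A quick way to see whether anything went wrong is to look for error files that are not empty. The following is a small sketch under the assumption that the files follow the jobname.ejobid pattern shown above; report_errors is a hypothetical helper name:

```shell
#!/bin/bash
# Sketch: report which PBS error files for a given job name are non-empty.
# Assumes the <jobname>.e<jobid> naming seen above (e.g. RayTrace.e821007).
report_errors() {
    local jobname="$1" f found=0
    for f in "$jobname".e*; do
        [ -s "$f" ] || continue     # skips empty files and an unmatched glob
        echo "errors in $f"
        found=1
    done
    [ "$found" -eq 1 ] || echo "no errors for $jobname"
}
```

Running report_errors RayTrace in the job directory prints one line per error file that has content. Keep in mind that Matlab may print some of its own messages to the .o file instead, so it is worth reading that one too.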
Congratulations. You are now batch processing. It seems like a lot of work, and it is a little, but you get to reuse these codes and they become second nature. Now I start to show you the really cool stuff.
Notice that the png image was really poor quality? That is because we used a small image and the sampling was very low. If we want to scale our problem up we can but it will probably take a very long time (orders of magnitude; since RayTracing is a compute intensive problem).
Mass PBS Job Submission
We are going to use a very basic way of parallel processing. This is what compute clusters are very good at. The Ray Trace is what is called an ‘embarrassingly parallel’ problem. We can have a single cpu working on a single row of data and it can operate completely independently. That is, it doesn’t need to know what the other cpus are doing. A lot of problems are structured this way; Parameter sweeping and Monte Carlo to name a couple.
Instead of a PBS batch submit file, we write a quick bash (linux) script to do the work for us
#!/bin/bash

# Define the problem here - same as before
SAMPLES=50;
HEIGHT=80;
WIDTH=100;
STEP=1;

# Process one row of pixels at a time
for ((i=1; i<=$HEIGHT; i+=$STEP)); do
    # Test to make sure looping is correct
    echo $i

    # set a useful job name
    jobname="RAYTRACE_$i"

cat << EOF | qsub
#!/bin/bash -l
#PBS -l walltime=3:00:00
#PBS -l mem=800MB
#PBS -l ncpus=1
#PBS -N $jobname

module load matlab
cd \$PBS_O_WORKDIR
matlab -singleCompThread -nodisplay -r 'SecondMyRun($SAMPLES, $WIDTH, $HEIGHT, $i)'
EOF
done
The above script sends parameters to our new “SecondMyRun.m” script defined below. Note that we only have a single loop that defines which row of pixels to process (the last parameter).
function SecondMyRun(samples, width, height, y)
    % Define a 'unit' of work
    BatchRayTrace(samples, width, height, y);

    % We have finished, so tell Matlab to quit
    quit;
This is almost the same format as the previous. Notice that I kind of keep the same program shell? This is because I use it all the time because it fits almost all types of matlab problems.
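One detail of the submit loop is worth checking before you send 80 jobs to the queue. Inside the here-document, $SAMPLES, $WIDTH, $HEIGHT and $i are filled in by the submitting shell, while \$PBS_O_WORKDIR is escaped so that it survives as a literal variable for PBS to resolve when the job runs. A small sketch that prints the generated job text for one row instead of piping it to qsub (make_job is a hypothetical helper, and the job text is trimmed to the lines that matter here):

```shell
#!/bin/bash
# Sketch: show the job script one loop iteration would generate.
# Nothing is submitted; the here-document goes to stdout instead of qsub.
SAMPLES=50; HEIGHT=80; WIDTH=100

make_job() {
    local i="$1"
    local jobname="RAYTRACE_$i"
cat << EOF
#!/bin/bash -l
#PBS -N $jobname
cd \$PBS_O_WORKDIR
matlab -singleCompThread -nodisplay -r 'SecondMyRun($SAMPLES, $WIDTH, $HEIGHT, $i)'
EOF
}

# Print the job text for row 7
make_job 7
```

In the printed text the job name and the SecondMyRun arguments are already numbers, but the cd line still reads $PBS_O_WORKDIR. If you forget the backslash, the submitting shell expands that variable itself, usually to nothing, and the job runs cd with no argument.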
The changes to our new ray tracing file, BatchRayTrace.m, are as follows:
function BatchRayTrace(samples, width, height, y)
    fprintf(1, 'Processing %d %d %d %d\n', samples, width, height, y);
    warning off all;
    %matlabpool open local 12

    w = width;
    h = height;

    ... [Cut for Brevity] ...

    % Write out the results
    % Create a filename
    filename = sprintf('RayTrace_%d.txt', y);

    % Open a file handle
    fid = fopen(filename, 'w');

    % Write all the results out
    for x=1:w
        fprintf(fid, '%d %d %f %f %f\n', y, x, cr(1,x), cg(1,x), cb(1,x));
    end

    % Close the file handle
    fclose(fid);
end
Notice that this time, I write out the
[Y pixel] [X pixel] [Red Component] [Green Component] [Blue Component]
to a file. This is because I will need a little script when all the processing is done to collect the results together.
To run the work above, I type
And to list the jobs in the batch system:
Notice that I ran the qstat a little too quickly. We only see two of our current jobs running. It turns out that our friendly HPC staff allow me to use up to 200 cpus at once. This is the true power of the compute cluster and PBS. Our program is now parallel and I can process the work much faster. Or, I can scale the problem up.
I write a little matlab program to collect all the results back together:
function CollectResults()
    start=1;
    finish = 80;
    length = 100;

    cr = zeros(finish, length);
    cg = zeros(finish, length);
    cb = zeros(finish, length);

    for i=start:finish
        % Create the file name
        filename = sprintf('RayTrace_%d.txt', i);

        % Open the file
        fid = fopen(filename, 'r');

        % Read each line from the data file
        while 1
            % Get a line
            tline = fgetl(fid);

            % make sure the line is not null
            if ~ischar(tline)
                break; % For end of file recognition
            end

            % Process the line of data
            data = sscanf(tline, ['%d %d %f %f %f']);

            % Load the data into our standard structure
            cr(finish - data(1)+1, data(2)) = data(3);
            cg(finish - data(1)+1, data(2)) = data(4);
            cb(finish - data(1)+1, data(2)) = data(5);
        end

        % Close the file
        fclose(fid);
    end

    RGB = cat(3, cr, cg, cb);
    imwrite(RGB, 'CheckItOut.png', 'png');
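CollectResults opens every RayTrace_N.txt file from 1 to 80 without checking that it exists, so a single job that is still queued or has failed will stop the collection. A minimal sketch for checking first, assuming the RayTrace_N.txt naming used by BatchRayTrace and run from the directory holding the results (missing_rows is a hypothetical helper name):

```shell
#!/bin/bash
# Sketch: print the row numbers whose result file is missing or empty.
# Assumes the RayTrace_<row>.txt naming written by BatchRayTrace above.
missing_rows() {
    local height="$1" i
    for ((i=1; i<=height; i++)); do
        [ -s "RayTrace_$i.txt" ] || echo "$i"
    done
}
```

Running missing_rows 80 prints nothing once all 80 rows are in. Any number it does print is a row to wait for or resubmit before running the collection step.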
The image file is exactly the same as the one from the single job submission (other than my poor image enhancement abilities) except that the total time for computation was much less. The single job took me 37 minutes (in my pbs.o123456) while the mass submission job took me a total of 2 minutes.
Of course, for completeness, the only thing left to do is to scale the problem up, submit it and go to bed. Hopefully the work will be done by the time I get to work tomorrow.
In the bash submit script above, change:
#!/bin/bash
SAMPLES=25000;
HEIGHT=720;
WIDTH=1280;
That should keep all of those processors busy for a while 🙂 I’ll post the final pictures tomorrow morning.
Check out the behaviour of the random number generator across the different instances of Matlab on different cpus. I thought this was an error initially.
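One thing that can cause surprises here: each fresh Matlab session starts its default random number generator in the same state, so separate jobs can draw identical random sequences unless each one is seeded differently. A possible way around it, sketched under the assumption that your Matlab version provides the rng function, is to seed with the row index in the -r string. This is an illustration only, not something the code above does, and matlab_cmd is a hypothetical helper name:

```shell
#!/bin/bash
# Sketch: build the Matlab -r command string with a per-row seed.
# Assumption: rng(seed) is available in the installed Matlab version.
matlab_cmd() {
    local samples="$1" width="$2" height="$3" row="$4"
    echo "rng($row); SecondMyRun($samples, $width, $height, $row)"
}

# Print the command string for row 7
matlab_cmd 50 100 80 7
```

The resulting string would take the place of the plain SecondMyRun(...) call after -r in the submit loop, so that every row starts from its own seed.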