Today I’d like to introduce a guest blogger, Sarah Wait Zaranek, who is an application engineer here at The MathWorks. Sarah has previously written about speeding up a customer's code to get acceptable performance. She will again be writing about speeding up MATLAB applications, but this time her focus will be on using the parallel computing tools.

Introduction

I wanted to write a post to help users better understand our parallel computing tools. In this post, I will focus on one of
the more commonly used functions in these tools: the parfor-loop.

This post will focus on getting parallel code using parfor up and running; performance will not be addressed here. I will assume that the reader has a basic knowledge of the parfor-loop construct. Loren has a very nice introduction to using parfor in one of her previous posts, and there are also some nice introductory videos.

Note for clarity: Since Loren's introductory post, the toolbox used for parallel computing has been renamed from the Distributed Computing Toolbox to the Parallel Computing Toolbox. These are not two separate toolboxes.

Method

In some cases, you may only need to change a for-loop to a parfor-loop to get your code running in parallel. In other cases, you may need to alter the code slightly so that parfor can work. I decided to show a few examples highlighting the main challenges one might encounter. I have separated these examples into four encompassing categories:

Independence

Globals and Transparency

Classification

Uniqueness

Background on parfor-loops

In a parfor-loop (just like in a standard for-loop) a series of statements known as the loop body are iterated over a range of values. However, when using a parfor-loop the iterations are run not on the client MATLAB machine but are run in parallel on MATLAB workers.

Each worker has its own unique workspace. So, the data needed to do these calculations is sent from the client to workers,
and the results are sent back to the client and pieced together. The cool thing about parfor is that this data transfer is handled for the user. When MATLAB gets to the parfor-loop, it statically analyzes the body of the loop and determines what information goes to which worker and which variables will return to the client MATLAB. This concept will become important in understanding why particular constraints are placed on the use of parfor.

Opening the matlabpool

Before looking at some examples, I will open up a matlabpool so I can run my loops in parallel. I will be opening up the
matlabpool using my default local configuration (i.e. my workers will be running on the dual-core laptop machine where my
MATLAB has been installed).

if matlabpool('size') == 0 % checking to see if my pool is already open
    matlabpool open 2
end

Independence

The parfor-loop is designed for task-parallel types of problems where each iteration of the loop is independent of each other iteration.
This is a critical requirement for using a parfor-loop. Let's see an example of when each iteration is not independent.
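As a minimal sketch (the actual dependentLoop.m is not reproduced here), a loop like the following is dependent, because each iteration reads a value written by a previous one:

```matlab
% dependentLoop.m (illustrative sketch): each iteration needs the
% result of the previous one, so the iterations are not independent
a = zeros(1, 10);
a(1) = 1;
parfor ix = 2:10
    a(ix) = 2 * a(ix - 1); % a(ix-1) is computed in iteration ix-1
end
```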

Checking the above code using M-Lint (MATLAB's static code analyzer) gives a warning message that these iterations are dependent and will not work with the parfor construct. M-Lint can be accessed either via the editor or the command line. In this case, I use the command line and have defined a simple function displayMlint so that the display is compact.

output = mlint('dependentLoop.m');
displayMlint(output)

The PARFOR loop cannot run due to
the way variable 'a' is used.
In a PARFOR loop, variable 'a' is
indexed in different ways,
potentially causing dependencies
between iterations.

Sometimes loops are intrinsically or unavoidably dependent, and therefore parfor is not a good fit for that type of calculation. However, in some cases it is possible to reformulate the body of the loop
to eliminate the dependency or separate it from the main time-consuming calculation.

Globals and Transparency

All variables within the body of a parfor-loop must be transparent. This means that all references to variables must occur in the text of the program. Since MATLAB
is statically analyzing the loops to figure out what data goes to what worker and what data comes back, this seems like an
understandable restriction.

Therefore, the following commands cannot be used within the body of a parfor-loop: evalc, eval, evalin, and assignin. load also cannot be used unless its output is assigned to a variable name. It is possible to use the above functions within a function called by parfor, because the function has its own workspace. I have found that this is often the easiest workaround for the transparency issue.

Additionally, you cannot define global variables or persistent variables within the body of the parfor loop. I would also suggest being careful with the use of globals since changes in global values on workers are not automatically
reflected in local global values.

Classification

A detailed description of the classification of variables in a parfor-loop is in the documentation. I think it is useful to view classification as representing the different ways a variable is
passed between client and worker and the different ways it is used within the body of the parfor-loop.

Challenges with Classification

Often challenges arise when first converting for-loops to parfor-loops due to issues with this classification. A commonly seen issue is the conversion of nested for-loops, where sliced variables are not indexed appropriately.

Sliced variables are variables where each worker is calculating on a different part of that variable. Therefore, sliced variables
are sliced or divided amongst the workers. Sliced variables are used to prevent unneeded data transfer from client to worker.

Using parfor with Nested for-Loops

The loop below is nested and encounters some of the restrictions placed on parfor for sliced variables.
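As an illustrative sketch (the original loop is not reproduced here), a nested loop of this shape triggers the restriction:

```matlab
% Illustrative sketch: both first-level indices of A1 are loop
% counters, which violates the sliced-variable indexing rules
A1 = zeros(10, 10);
parfor ix = 1:10
    for jx = 1:10
        A1(ix, jx) = rand;
    end
end
```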

The PARFOR loop cannot run due to
the way variable 'A1' is used.
Valid indices for 'A1' are
restricted in PARFOR loops.

In this case, A1 is a sliced variable. For sliced variables, the restrictions are placed on the first-level variable indices. This allows
parfor to easily distribute the right part of the variable to the right workers.

First-level indexing, in general, refers to indexing within the first set of parentheses or braces. This is explained in more detail in the same section of the documentation as classification.

One of these first-level indices must be the loop counter variable or the counter variable plus or minus a constant. Every other first-level index must be a constant, a non-loop counter variable, a colon, or an end.

In this case, A1 has a loop counter variable for both of its first-level indices (ix and jx).

The solution is to make sure that a loop counter variable is only one of the indices of A1 and to make the other index a colon. To implement this, the results of the inner loop can be saved to a new variable, and then that variable can be saved to the desired variable outside the nested loop.
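For instance, a sketch of that workaround (not the original code) might look like:

```matlab
% Save the inner-loop results to a temporary vector, then assign the
% whole row at once so A1 is sliced by ix alone
A1 = zeros(10, 10);
parfor ix = 1:10
    innerResult = zeros(1, 10);
    for jx = 1:10
        innerResult(jx) = rand;
    end
    A1(ix, :) = innerResult; % one loop-counter index plus a colon
end
```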

I have found that both solutions have their benefits. While cells may be easier to implement in your code, they also result
in A3 using more memory due to the additional memory requirements for cells. The call to cell2mat also adds additional processing time.
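The cell-based alternative mentioned above might look something like this sketch (the variable A3 and the loop bounds are assumptions):

```matlab
% Collect each iteration's row in a cell, then convert back to a matrix
A3cell = cell(10, 1);
parfor ix = 1:10
    innerResult = zeros(1, 10);
    for jx = 1:10
        innerResult(jx) = rand;
    end
    A3cell{ix} = innerResult; % the cell is sliced by ix
end
A3 = cell2mat(A3cell); % extra memory and processing relative to slicing
```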

A similar technique can be used for several levels of nested for-loops.

Uniqueness

Doing Machine Specific Calculations

There is a way, while using parfor-loops, to determine which machine you are on and to perform machine-specific instructions within the loop. One example of why you would want to do this is if different machines keep data files in different directories, and you want to make sure you change into the right one. Do be careful if you make the code machine-specific, since it will be harder to port.
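A sketch of how this might be done (the machine name 'machine1' is a placeholder):

```matlab
% Branch on the hostname of the machine running each iteration
parfor ix = 1:16
    [~, host] = system('hostname'); % ~ discards the status code (R2009b+)
    if strcmp(strtrim(host), 'machine1')
        disp('On Machine 1')
    else
        disp('NOT on Machine 1')
    end
end
```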

Sending a stop signal to all the labs ... stopped.
Starting matlabpool using the 'speedy' configuration ... connected to 16 labs.
On Machine 1
On Machine 1
On Machine 1
NOT on Machine 1
On Machine 1
NOT on Machine 1
NOT on Machine 1
NOT on Machine 1
NOT on Machine 1
NOT on Machine 1
NOT on Machine 1
NOT on Machine 1
NOT on Machine 1
NOT on Machine 1
NOT on Machine 1
NOT on Machine 1

Note: The ~ feature is new in R2009b and discussed as a new feature in one of Loren's previous blog posts.

Doing Worker Specific Calculations

I would suggest using the new spmd functionality to do worker specific calculations. For more information about spmd, check out the documentation.
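For example, a minimal spmd block that identifies each worker might look like:

```matlab
% Each worker reports its own index; labindex and numlabs are
% defined inside spmd blocks
spmd
    fprintf('I am worker %d of %d\n', labindex, numlabs);
end
```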

Clean up

matlabpool close

Sending a stop signal to all the labs ... stopped.

Your examples

Tell me about some of the ways you have used parfor-loops or feel free to post questions regarding non-performance related issues that haven't been addressed here. Post your
questions and thoughts here.

81 Comments

It says somewhere about parfor that the order in which iterations are performed is not guaranteed. How random is that order? In my application I would very much like the order in which they are performed to be uniformly distributed. How can I get the efficiency advantages of parfor and also – in my case – the advantages of uniform random order of processing?

A related question is that the arrayfun function presumably is coded at a lower level as a for loop. Is there a way to parallelize arrayfun, or should I not use it and just rely on parfor (subject to the randomness requirement above)?

You can’t control the order of the parfor at all. Why is there an efficiency advantage to a uniform random order of processing in your situation? arrayfun is not parallelized. parfor IS the mechanism for parallel for-loop constructs in which each iteration is independent.

In my application two or more different iterations of the loop could assign different numbers to a certain variable. Thus, the last one would be the only permanent result. That would be okay provided the last one is chosen uniformly at random from among the alternatives.

Okay, so I will use parfor instead of arrayfun, provided there is a neat way to achieve that required uniform randomness.

By using the variable test in the parfor body, it is clear that the variable test is needed on the workers. MATLAB will only transfer the value of test once to each worker per calling of the parfor loop.

If you call the parfor loop multiple times, the value of the variable test is always propagated from the client to the workers. Therefore, the workers will not accidentally get out of sync.

I am new to the parallel processing toolbox. I ran the following code and found that “parfor” is in fact taking more time to run than simply running “for”. Am I missing something terribly? I would appreciate your thoughts on this.

parfor is unlikely to be faster than for in your situation where there is little work being done in each pass through the loop. The savings happens when the work is significantly more than the overhead of the loop itself.

Loren is correct (as usual :)). Sometimes, for large data and a short-running loop, the time it takes to transfer the data overwhelms the gain you get by putting the loop in parallel. You can check this by running the loop on 1 worker with and without data transfer. See code sample below:
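A sketch of that check (the sizes and the work inside the loop are placeholders):

```matlab
% Time the loop with the data sent from the client to the worker ...
data = rand(2000);
tic
parfor ix = 1:50
    s1(ix) = sum(data(:, ix)); % data must be transferred to the worker
end
tWithTransfer = toc;

% ... and again with the data created on the worker, so nothing is sent
tic
parfor ix = 1:50
    localData = rand(2000); % created on the worker itself
    s2(ix) = sum(localData(:, ix));
end
tNoTransfer = toc;
```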

I just noticed your question in comment 4 didn’t appear to be addressed. You said that “In my application two or more different iterations of the loop could assign different numbers to a certain variable. Thus, the last one would be the only permanent result.”

That means your application is NOT suited for use with PARFOR, as Sarah called out in the Independence section of this blog posting, because the result of the loop would depend on the order in which the loop iterations were executed.

You could probably make your application suitable for PARFOR, by having each iteration assign a value to a particular element of an array or cell array and choosing one element at random from that array/cell array once the loop is finished, outside the PARFOR.

another parfor problem, which is not mentioned here… how to call a function from inside the parfor loop? I know… “The body of a parfor-loop cannot make reference to a nested function. However, it can call a nested function by means of a function handle.” … but how to call it using the function handle? I have tried something like

fce=@test1;
parfor i=1:10
result(i)=feval(fce,i);
end

(where fce is my own function, in the same directory as the script which calls the function), but the error message “Undefined function or method 'test1' for input arguments of type 'double'.” displays.
I think it is because the file dependencies are not set. How can I do it?

It seems parfor has issues with temporary variables that exist as structs:

k = [1 2 3];
parfor h = 1:3
    z.var = k(h)
    disp(z.var)
end

Returns the error “The variable z in a parfor cannot be classified.” In this simplistic example I feel like z should definitely be a temporary variable. If I remove the “.var” portion, the loop works perfectly. Is parfor incapable of segmenting structures?

Here is an example of using the nested function via a function handle. However, I believe you must explicitly pass data into your function handle. I don’t believe the benefit of the nested function seeing the calling function’s workspace exists when you call it within a parfor loop. I did not have to add any FileDependencies when running this on my local workers.

function testMain
% The body of a parfor-loop cannot make reference to a
% nested function. However, it can call a nested function
% by means of a function handle.
fce = @myNest;
result = zeros(1,10);
parfor ii = 1:10
    result(ii) = fce(ii);
end
disp(result)
    function d = myNest(ii)
        d = rand*ii;
    end
end

I misspoke about having access to workspace variables. However, your example will still not see ii without explicitly passing it in. The following example will work as expected with respect to the variable a, since the current value of a is saved with the function handle.

function testMain
% The body of a parfor-loop cannot make reference to a
% nested function. However, it can call a nested function
% by means of a function handle.
a = 5;
fce = @myNest;
result = zeros(1,10);
parfor ii = 1:10
    result(ii) = fce();
end
disp(result)
    function d = myNest
        d = a*rand;
    end
end

For your above example – the following should work and not give an error.

function testMain
% The body of a parfor-loop cannot make reference to a
% nested function. However, it can call a nested function
% by means of a function handle.
fce = @myNest;
result = zeros(1,10);
parfor ii = 1:10
    result(ii) = feval(fce, ii);
end
disp(result)
    function d = myNest(ii)
        d = rand*ii;
    end
end

I am having trouble with using the variables/matrices created in a for loop within a parfor loop.
For example, when the following script is executed, the matrix A is not available in my base workspace.

I understand that parfor creates a workspace of variables to be used in the loop on every worker, but aren't the results supposed to be sent back to the base workspace when I run the script?

Your example was very helpful, but only up to a point. In both my example and my actual code, I’m not actually trying to index into a field. Note that all I want to do is assign z.var = k, not z.var(x) = k. Is there some implicit indexing applied? In my real code there is an operation like

etc. This is a separate variable for each instance of the variable. I’ve tried creating the variable using struct() before the loop begins, and that didn’t help. It still says it can’t classify the variable. Oddly enough, if you create a function that does the exact same operations and call that function from inside the loop, it works just fine. Is this a bug or a quirk of MATLAB’s distributed toolbox?

To amend my previous comment, I found that it does indeed work if I instantiate the struct explicitly using struct() inside the parfor loop and all the assignments are contained in just that loop. I guess I just need to pay more attention to how I create variables when doing parallelization, since it requires so much more explicit variable contexts!

In order to bring your data back to the client workspace, it either needs to be classified as a reduction variable or a sliced variable. The link above to the documentation on classification of variables can give you more in depth information on this (see blog section on classification). In a nutshell, you either need to be indexing your variable by the iterate variable for the parfor-loop or performing a reduction operation on that variable. The section on classification will have a master list of all supported reduction operations.

Using the code in your comment – I have adjusted it in two ways to bring back the data. In the version you sent me, only the values of A from the last iteration will be kept. I included one version where all values of A were kept – and one where only the last ones were kept.

%% In this case A will not be brought back
parfor ix1 = 1:10
    A = zeros(1,10);
    for ix3 = 1:10
        A(ix3) = ix3;
    end
end
clear ix3

%% In this case A will be brought back for each iteration
parfor ix1 = 1:10
    A = zeros(1,10);
    for ix3 = 1:10
        A(ix3) = ix3;
    end
    Akept2(ix1,:) = A;
end
display(Akept2)
clear ix3

%% In this case A will only be brought back for the last iteration
clear all
Akept3 = [];
parfor ix1 = 1:10
    A = zeros(1,10);
    for ix3 = 1:10
        A(ix3) = ix3;
    end
    if ix1 < 10
        A = [];
    end
    Akept3 = [Akept3 ; A];
end
display(Akept3)

Sarah Z,
thank you for your example about structures. I tried it, but what is the best solution when you have a structure of arrays of different sizes? Do I have to make an individual cell array for every field of the structure?

Sorry for the delay. I have been traveling spreading the joy of MATLAB :)

So the big issue is the restriction on indexing a sliced variable that I discussed above. Here is the original code and how to convert it to one that works. I have simplified it, but hopefully the basic idea is clear!

You could also use persistent variables to keep track of the number of iterations done on a single worker – but there is no easy way to determine that across workers.

In your example, since the values of iter only get added up in chunks of iterations separately and then summed together at the end of the parfor, there is no easy way to get that intermediate information.

Here is an example of how it could be done with persistent variables per worker:
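A sketch of that pattern (the function name and the total are illustrative):

```matlab
function countIteration(total)
% Count how many iterations this particular worker has completed.
% The persistent counter lives in the worker's own workspace, so
% each worker keeps its own running total.
persistent iterDone
if isempty(iterDone)
    iterDone = 0;
end
iterDone = iterDone + 1;
fprintf('Iteration Total %d of %d\n', iterDone, total);
end
```

Calling countIteration(20) inside a parfor over 20 iterations on two workers would produce interleaved per-worker counts like those shown below.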

Iteration Total 1 of 20
Iteration Total 2 of 20
Iteration Total 3 of 20
Iteration Total 4 of 20
Iteration Total 5 of 20
Iteration Total 6 of 20
Iteration Total 7 of 20
Iteration Total 1 of 20
Iteration Total 2 of 20
Iteration Total 3 of 20
Iteration Total 4 of 20
Iteration Total 5 of 20
Iteration Total 6 of 20
Iteration Total 7 of 20
Iteration Total 8 of 20
Iteration Total 9 of 20
Iteration Total 8 of 20
Iteration Total 9 of 20
Iteration Total 10 of 20
Iteration Total 10 of 20

1. You can use the methods I showed above for nested for-loops. In my mind, the best option would be to use the cell workaround. In your example, I am assuming you want X to be saved for every value of n, even though in your example it wasn't. For best performance, you want to put the parfor loop as far out as possible. To me it makes the most conceptual sense to put n in the outermost loop and make that the parfor loop. Here is how it would work for you:

parfor n = 1:100000
    for A = 1:36
        for B = -1:-1:-40
            for C = 0:35
                for D = 0:-1:-25
                    % shift the nonpositive counters to valid indices;
                    % X is the value computed by your inner code
                    Results{n}(A, -B, C+1, -D+1) = X;
                end
            end
        end
    end
end

Hello there,
I am trying to integrate some functions numerically using ‘parfor’. The following is my code, but it is not working. It shows the following error:

??? Error using ==> parallel_function at 598
Error in ==> syms at 77
Attempt to add "x" to a static workspace.
See MATLAB Programming, Restrictions on Assigning to Variables for details.
Error in ==> main at 16
parfor i=1:16

So, the issue is that the worker doesn’t get passed your global variable; it just gets defined as global. Therefore, it just sees an empty variable. You need MATLAB to know that you need that variable on each worker.

This is similar to another question asked earlier in this thread. If you scroll up to my response to Ninad (comment #5), you can see how to pass it to the workers and have it still act like a global. A note of caution: when I say it still acts like a global, it is global per worker, not across workers or between worker and client.

You can also avoid using a global in this case and pass all of x to each worker by making it another input to your function solode. It seems as if you don’t even need all of x in this example, so I think you could just add x(j) as an extra input to that function. Then x would be treated like a sliced variable, limiting your communication overhead between client and workers.
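A sketch of that call pattern (the other inputs to solode are placeholders):

```matlab
% Passing only x(j) lets parfor treat x as a sliced variable, so
% each worker receives just the elements it needs
parfor j = 1:numel(x)
    y(j) = solode(a, b, x(j)); % x(j) is the extra, sliced input
end
```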

Hi Sarah. When using a parfor for parallel computing in a multi-core scenario, I have had problems with large matrices. The data is copied to each worker causing out-of-memory failure. Is the recommended solution to use distributed arrays? Doesn’t this require communication between workers and potentially result in a bottleneck, or am I missing something?

As another possible solution, I implemented a shared memory wrapper (FEX: 28572-sharedmatrix). It uses POSIX, which potentially limits its use, although I believe many, if not most, people use Cygwin when compiling Mex code. The advantage is that the data lives outside of Matlab and can be accessed by multiple processes (not just Matlab).

Anyway, I am curious what other approaches you can recommend or approaches people have taken when each worker needs read-only access to the same data.

I am using a parfor to calculate the same function on several points of large dimensions. The problem is that the function is generated dynamically. There are some parameters (function_id, function_instance, the problem dimension, etc) that are used to differentiate between different set-ups of runs. In a same run, I want to use the same instance of the function to calculate the values of my set of vectors, so I have to generate the function before to start the parfor execution. That is:

% this line initialize the function that can be used
% through ‘fgeneric’.
fgeneric(‘initialize’, ifun, iinstance);

Hi there:
When I use the parfor method to calculate an LDPC code, I encounter one problem. At first, I created the LDPC encoder and decoder objects outside the parfor body, and an error appeared: “Not enough input variables”. But after I moved the LDPC encoder and decoder objects into the parfor body, my code ran successfully! I know objects differ from common variables, but I don’t understand why this happens!

Could you please tell me why the following code gives me this error:

??? Error using ==> parallel_function at 598
Error in ==> fgeneric at 356
fgeneric has not been initialized. Please do: fgeneric(‘initialize’, FUNC_ID,
INSTANCE_ID, DATAPATH) first, where FUNC_ID is the number of the chosen test function.

I was out traveling, so this is my first chance to respond to your question.

Since I don’t know exactly what fgeneric does, this is only a guess on my part. Since it obviously doesn’t return a variable back to the main MATLAB workspace, my guess is that it is either changing the state/value of an object or using a global or persistent variable.

In many cases these values would not transfer from your client to your workers. The first step would be to initialize within the parfor loop and see if that makes a difference. If so, then we can alter the code so that it initializes using fgeneric only once per worker.

Hello, if MATLAB can’t serialize a variable to transfer it to the workers, it ends up getting passed as an empty value. I believe this is exactly what you are seeing. I think this Wikipedia entry does a much better job explaining serialization than I can ( http://en.wikipedia.org/wiki/Serialization ). I see this issue with older objects in MATLAB (occasionally, but rarely) and objects that come from 3rd-party tools.

The workaround is to do exactly what you did, which is to initialize within the worker instead of in the client. You can modify it so that you only need to initialize once per worker, if that is ideal for your case. See code example below to do that. It loads a file for the variable it creates, but you could do something else for your variable creation.

I am not sure exactly what fgeneric is doing (defining a global variable, a persistent variable, or the state of an object). However, if you want to run something once per worker, the easiest way would be to do something like the following:
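A sketch of the once-per-worker pattern (the function name and the file mydata.mat are placeholders):

```matlab
function data = getWorkerData()
% Load (or otherwise initialize) data only once per worker; later
% calls on the same worker reuse the persistent copy
persistent workerData
if isempty(workerData)
    workerData = load('mydata.mat'); % runs once per worker
end
data = workerData;
end
```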

Hi Sarah–still wondering on my question (Sept 10). This issue actually comes up a lot in our group’s work. We do large-scale machine learning and end up doing a lot of (iterative) computation over large data-sets (sparse matrices) which are too big to have in memory more than once. Under this scenario, keeping the data distributed rapidly makes the computation IO bound–despite being on the same 8core machine. Even if this is not the case, it makes implementation substantially more difficult to require one to assess which parts of the data can be carved up for “local” computation and in many circumstances it is simply not possible.

This seems like an obvious problem/deficiency in the parallel computing toolbox so I would very much like to know if I’m missing some recommended approach for dealing with this issue.

Sorry for my delay in response. The Parallel Computing Toolbox doesn’t have built-in shared memory capabilities. I have let the development team know about your request; I have come across other customers interested in this capability.

For your case, you perhaps could use memmapfile and rely on the OS to provide shared memory access to data on disk.

My mex program “sharedmatrix” is essentially this–although I think its slightly better than a memory mapped file as POSIX provides additional functionality. Anyway, good to know that writing this program wasn’t a waste of time!

I have been trying to use parfor for code as shown below. I understand that my indexing is not sliced, but clearly the iterations are independent of each other? Why then does parfor not allow this to run?

While running the above code, I get the following error:
??? Error using ==> parallel_function at 587
Error in ==> parallel_function>make_general_channel/channel_general at 864
Subscripted assignment dimension mismatch.

Any thoughts on the cause?
I did figure out that it has something to do with the “if” condition within the parfor loop, but I am not sure what the issue is.

There are two issues I see in your code. The first is simply a warning that it is treating node like a broadcast variable instead of a sliced variable. That is an easy fix, which I have described below.

The bigger issue is with the function G_globalx. MATLAB has no idea that you are indexing into it uniquely for each iteration. node(e,1:6) could always return the same value, breaking the independence of each loop.

You must be able to index it like described in the blogpost above. I can try to help you with that.

It looks like the code doesn’t work even with a for-loop. I believe the issue is that sum(k1) currently returns a 1 x 4 matrix. I am not sure if this fits what you are trying to do, but if you create a 1 x 12 matrix to sum instead of a 3 x 4 matrix, you get the correct dimensions. See the code example below. The code then works both with a for-loop and a parfor-loop.

First of all, let me start by saying this is a very terse and well put together blog and thanks for putting it together.. I got most of my questions answered in your explanations and the following questions…

I would like to know if we can make use of PBS job scheduling and exploit more than 8 processors but the distributed licensing restriction does not allow that. Is there any workaround?

I am using a parfor loop in my application. The input data file that I have is huge, on the order of GBs. The data is used in independent chunks across loop iterations. Inside the parfor loop I am using file operations like fseek and fread on that file.

I know that file operations are an overhead in this parfor, but is there any method to use them efficiently inside parfor?

You are doing what I would suggest: using something like fread, textscan, or memory mapping.

Is the read becoming a bottleneck for you?

The only other suggestion I have is that it might be easier for you to use something like spmd: load a 1/4 of the data on each worker (if you can handle that in memory) and then work with it (I am assuming 4 workers here). It limits the number of loads, but it requires more memory.

I have my application with parfor running with 4 workers for over 10000 frames, but after a few thousand frames (e.g., 2000 frames) one of the MATLAB worker sessions was shut down.

The license I use is an independent license (it is not shared with any other user).

Can anyone explain why such errors occur and what they mean?

Below is the error I encounter:
============================================================

Error using ==> parallel_function at 598
The session that parfor is using has shut down

??? The client lost connection to lab 2.
This might be due to network problems, or the interactive matlabpool job might have errored. This is causing:
java.io.IOException: An operation on a socket could not be performed because the system lacked sufficient buffer
space or because a queue was full

In versions of MATLAB prior to R2007b, parfor designated a more limited style of parfor-loop than what is available in MATLAB 7.5 and later. This old style was intended for use with codistributed arrays (such as inside an spmd statement or a parallel job), and has been replaced by a for-loop that uses drange to define its range.

The new parfor in R2007b used parentheses in defining its range to distinguish it from the old parfor. This happened only in that release, to keep customer code behaving as expected; however, it should have thrown a warning that you were using the old functionality. Once this change occurred (R2008a and on), parentheses were not needed.

If you put parentheses around your range, you will see the same error in R2007b and R2010b.

Your two options:

1) Convert parfor to drange to get old parfor behavior
2) Make minor changes in your code (described in above blog post) to use the new, more powerful version of parfor.

It took me a while to figure out how to get my problem (definition of elements in a large sparse matrix) working smoothly in parfor, but now I’m happily getting it done roughly nworkers times faster than before. Thanks Loren!

One caveat now though is: I have my big sparse matrix defined, but it takes me over 9 hours to run lsqnonneg on it! Is there any handy way to parallelize the lsqnonneg operations? ie maybe with codistribution somehow?

I think I solved my problem too. I realized we had both 2009a and 2010a installed and I was using the older of the two. Switching to 2010a and running again, it now seems to be using >100% of the CPU cycles in top (ie running on multiple threads?) where before in 2009a it only used up to 100%.

And here I was trying to figure out how to install ATLAS and TSNNLS to make life easier, joke’s on me! I guess that’s only parfor the course.

Kevin – Exactly, indexing into your data contained in a cell array will allow parfor to figure out what data goes to what worker, allowing it to act like a sliced variable.

Alex – I am happy you saw such a jump in performance for your code. We are always updating our multithreading options and working on performance. R2010a increased performance for sparse matrix indexing and extended multithreading (which has been around since R2007a) to more functions. Both of those may have helped you in terms of performance.

Sorry for the delay, but I have been out of town. Here are my thoughts below.

A few suggestions:

1) I would put the interior code in a function for easier handling of the indices.

2) I would concatenate the resulting output from the subfunction to get a master list of all desired m values. You can either display them at the end or as part of the interior function.

3) Since this code is very basic and fast running, it will not be sped up with a parfor loop. Hopefully this is just part of a larger simulation (e.g., once you get m you run some long-running function using it).

4) I probably should preallocate the mtotal variable in the subfunction for performance and just for good programming considerations, but since it is a small matrix, it really doesn't affect performance that much. If you go on to use this code in production, I would do so.
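Since the original code isn't reproduced here, a generic sketch of suggestions 1) and 2) might look something like this (computeM, nTrials, and the variable names are all made up for illustration):

```matlab
nTrials = 8;                  % hypothetical number of iterations
mAll = cell(1, nTrials);      % collect one result per iteration
parfor k = 1:nTrials
    mAll{k} = computeM(k);    % 1) interior code moved into a function
end
mtotal = vertcat(mAll{:});    % 2) concatenate into a master list of m values
disp(mtotal)                  % display them at the end
```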

As you can see, the body of my parfor loop is not complicated, but I don't understand why it crashes. I have another question, please: is it possible to write two parfor loops with different bodies in the same m-file?

I would really appreciate any comment you would make on this error message. Thanks so much.

However, I can give you some things to check about why your performance isn’t as expected when using parfor.

I have found it is usually one of three things:

1. Very small problem: If your for-loop is running on the order of a couple of seconds, the overhead of transferring your data to the workers and back again just swamps out the gain you get using parfor.

The test would be just to time your serial code. Personally, I wouldn't bother parallelizing anything that runs in less than 10 seconds.

2. Too much data transfer: Even if your problem is adequately sized, data transfer can still be an issue. If you are moving around large chunks of data, it may still swamp your improved performance.

The test would be to run your problem, but run it on 1 worker. If you see a sizeable difference in run time, then you know it might be overhead from sending your data to the worker. You can then test for this by writing a dummy function that passes the data in, does a pause, and then passes it out. That way you can accurately estimate the time needed to send it to at least 1 worker.
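A dummy function of the sort described might look like this (the function name and the pause length are just placeholders):

```matlab
function dataOut = transferTest(dataIn)
% Does no real work, so the measured time is dominated by the cost
% of shipping dataIn to the worker and dataOut back to the client.
pause(0.1);          % stand-in for the real computation
dataOut = dataIn;
end
```

You would then time a parfor loop calling transferTest on your actual data with a pool of one worker, and compare that against the serial time to estimate the transfer cost.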

3. Multithreading: MATLAB is multithreaded for various functions, mostly those that are element-wise operations or linear algebra operations. If you are using mainly these multithreaded functions, using parfor will not help you because your code is already essentially running in parallel and taking advantage of your multiple cores.

I have been using parfor for some time on my quad-core PC for some pretty big optimization problems, which generally take hours to solve.
Sometimes my code works fine, but sometimes it breaks with this error:

??? Error using ==> parallel_function at 598
The session that parfor is using has shut down

??? The client lost connection to lab 4.
This might be due to network problems, or the interactive matlabpool job might have
errored.

It is hard to diagnose this without the actual code. Usually it is either a network issue or a data size limit issue. I have talked to support, and they encourage you to contact them to help you pinpoint the issue. Their number is listed below:

I have been a (happy) MATLAB user for many years now but have run into a limitation (?) of the distributed computing toolbox. For my research, I require parallel execution of a very large number of matrix-vector multiplications of differing sizes in real time. These matrices are constructed offline, in an extremely expensive operation, and saved in a static data structure (MatSet in the example below). This static data structure is used by all workers and is not modified after creation.

When I run the code, which is equivalent to the code below, I find that the PARFOR loop takes 10x more time to complete than the FOR loop in MATLAB R2009b. As I understand from your earlier explanations, this is because of the constant transfer of data (MatSet in this case) between workers. In my case, however, this data transfer is completely unnecessary!

My question is whether there is some way of loading a static dataset into the workspace of the workers so as to prevent unnecessary communication overhead between workers?

I have no clue how this message could arise. I set up a simulation based on MATLAB classes that should run for 5 different parameter sets simultaneously.
In the beginning there is the parameter 'initRobotList' that is handed over to the constructor.
The instances of the class are created successfully for all 5 simulations. This variable is used nowhere else but in the constructor of the class, but after 5 min, and approx. 5000 simulation steps in each of the 5 simulations, there is this message.

What could that be?
Where should I start looking?
It is a really big simulation, so it is impossible to cut it down to a minimal example and see if it's working.

Now it worked somehow. And how did I solve it? I did nothing! So I think I got this error because I changed something in the source code of the class and saved it while the simulation was trying to use it and got confused. Could that be?

I have not used the parallel computing features of MATLAB before, so I am quite new to them. I might be asking a question that has already been addressed – apologies if this is the case. I have the following code:

Okay, so now if I use a parfor loop this does not work. In many of my applications I have an array whose values represent indices into a matrix, and for some reason this is a challenge for me when I try to implement a parfor loop.

This relates to the restrictions spelled out in the section of the blog post above entitled Classification.

For parfor loops, when you index into a sliced variable, restrictions are placed on the first-level variable indices. This allows parfor to easily distribute the right part of the variable to the right workers. One of these first-level indices must be the loop counter variable, or the counter variable plus or minus a constant. Every other first-level index must be a constant, a non-loop-counter variable, a colon, or end.
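For example (a toy illustration of those rules), indexing like the following is allowed:

```matlab
A = rand(10, 5);
B = zeros(10, 5);
parfor i = 1:10
    % OK: the first-level indices are the loop variable i and a colon,
    % so A and B are both treated as sliced variables.
    B(i, :) = 2 * A(i, :);

    % Not allowed as a sliced reference: a first-level index that is
    % neither i (plus or minus a constant) nor a constant, e.g. A(i*2, :).
end
```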

The maximum number of workers was raised from 4 to 8 (in R2009a) and from 8 to 12 (in R2011b). So, make sure you have at least R2009a.

The maximum number of workers usually defaults to the number of cores that the OS tells MATLAB you have. You can change this by editing the local configuration: go to the Parallel pull-down menu and manage the local configuration. If you look at the properties, you can set the cluster size to 8. However, if you only have 4 cores, this will not help you in terms of speedup.

Cheers,
Sarah

These postings are the author's and don't necessarily represent the opinions of MathWorks.