OpenMP® Forum

Discussion on the OpenMP specification run by the OpenMP ARB. OpenMP and the OpenMP logo are registered trademarks of the OpenMP Architecture Review Board in the United States and other countries. All rights reserved.

Hi, I hope you are all well. I need your help. I have been trying to parallelize my code for more than a week, but so far I have not succeeded in getting the required results. I am attaching my code in two files. I want to measure the computation time using one and two threads on my Core 2 Duo system. But with two threads the time increases compared to one thread, instead of decreasing to about half. This is my first program using OpenMP, which may be why I cannot see my mistakes. In the first attached file I am trying to parallelize the nested for loops, but all in vain; I have read that OpenMP is not good at nested parallelism. You will find a lot of pragmas in that region that I have tried. In the second file I am using sections to parallelize the block that calls the user-defined function, but I am still not getting the required efficiency. If both files work, I will finally merge them into one fully parallelized program. Please help me as much as you can and illustrate your reply with examples. Please reply soon and help me out, as it is very important for me. Thanks.

Let me try to answer your questions. In your first example you cannot expect any speedup because you use a parallel construct with no worksharing constructs inside (all of them are commented out). Thus (1) all threads duplicate the same work, and (2) you get undefined results because of the many race conditions in the code (e.g. "sumX=0;" and "sumX=sumX+...": while one thread is accumulating data, another thread overwrites it with 0). I would suggest starting from one combined <parallel for> construct here, because your parallel region consists only of the outer loop. Later you can experiment with nested parallel regions and the collapse clause, but first try to get correct results with this one simple construct. What you need to do carefully is specify the sharing attributes of all variables that are written (you can leave read-only variables shared by default). For example, all loop iteration variables need to be private: only the iteration variable of the parallel loop becomes private implicitly, while the iteration variables of sequential inner loops are shared by default, so you need to make them private explicitly. The variables used in the calculations (sumX, sumY, SUM) also need to be private, so that each thread works with its own copy. As you want to write the result to a file char by char, you were right to use the schedule(dynamic) and ordered clauses, though you should expect some slowdown because of the ordered clause. Each iteration of the (X,Y) loop writes to a separate location of the edgeImage.data array, so I think you don't need ordered there, but to write the data to the file correctly you definitely need ordered (otherwise the data will be intermixed randomly). Hope this helps you parallelize the first program.
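For illustration, the advice above could look roughly like this in C. This is only a sketch: the attached program is not shown in the thread, so the function name edge_detect, the flat in/out arrays and the Sobel-style 3x3 masks are assumptions. Only the names sumX, sumY, SUM, X and Y come from the discussion.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not the poster's code): apply Sobel-style 3x3 masks
   to a grayscale image stored row by row in `in` (rows * cols bytes) and
   store the clamped result in `out`. Border pixels are set to 0. */
void edge_detect(const unsigned char *in, unsigned char *out, int rows, int cols)
{
    static const int GX[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    static const int GY[3][3] = {{ 1, 2, 1}, { 0, 0, 0}, {-1, -2, -1}};
    int Y, X, I, J, sumX, sumY, SUM;

    assert(in != NULL && out != NULL && rows > 0 && cols > 0);

    /* Y is the parallel loop variable, so it is private implicitly.
       X, I, J and the accumulators are declared outside the loop and
       would be shared by default, so they are listed as private. */
    #pragma omp parallel for private(X, I, J, sumX, sumY, SUM)
    for (Y = 0; Y < rows; Y++) {
        for (X = 0; X < cols; X++) {
            sumX = 0;
            sumY = 0;
            if (Y == 0 || Y == rows - 1 || X == 0 || X == cols - 1) {
                SUM = 0;
            } else {
                for (I = -1; I <= 1; I++) {
                    for (J = -1; J <= 1; J++) {
                        int p = in[(Y + I) * cols + (X + J)];
                        sumX += p * GX[I + 1][J + 1];
                        sumY += p * GY[I + 1][J + 1];
                    }
                }
                SUM = abs(sumX) + abs(sumY);
                if (SUM > 255)
                    SUM = 255;
            }
            /* Each iteration writes its own element, so no ordered is needed. */
            out[Y * cols + X] = (unsigned char)SUM;
        }
    }
}
```

Built with an OpenMP-enabled compiler (for example gcc -fopenmp), the thread count can be set through the OMP_NUM_THREADS environment variable. Without OpenMP support the pragma is ignored and the loop runs serially with the same result.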

As to your second program, I did not get the idea of the parallelization here. You read only three numbers from a file; this is such a small chunk of work that you are unlikely to get any speedup from parallelization. Anyway, you have a fundamental synchronization problem in your code. You use a shared file handle, which means different threads work with the file simultaneously. Imagine you set the file position in thread 0 to char 2 and expect it to start reading there. But before the read, another thread can move the position to char 18 or 22, so in the end you get undefined results. To solve this you can either make the read operations ordered (which eventually kills all the parallelism), or open the file inside the parallel region using private file handles so that each thread has its own file position (I am not sure how much parallelism you can get here, taking into account the hardware serialization of access to the file).
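A minimal sketch of the second option, private file handles, might look as follows. The helper name read_bytes_at and its parameters are hypothetical and are not taken from the attached program. The point is only that the FILE pointer is declared inside the parallel region, so every thread gets its own handle and its own file position.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read one byte at each of `n` offsets of the file
   at `path`. values[i] receives the byte, or -1 if it could not be read.
   Returns 0 on success, -1 if some thread could not open the file. */
int read_bytes_at(const char *path, const long *offsets, int n, int *values)
{
    int failed = 0;

    assert(path != NULL && offsets != NULL && values != NULL && n >= 0);

    #pragma omp parallel shared(failed)
    {
        /* Declared inside the region: private to each thread. */
        FILE *fp = fopen(path, "rb");
        int i;

        if (fp == NULL) {
            #pragma omp atomic write
            failed = 1;
        }

        /* Every thread reaches this worksharing loop, even if fopen failed. */
        #pragma omp for
        for (i = 0; i < n; i++) {
            values[i] = -1;
            if (fp != NULL && fseek(fp, offsets[i], SEEK_SET) == 0)
                values[i] = fgetc(fp);
        }

        if (fp != NULL)
            fclose(fp);
    }
    return failed ? -1 : 0;
}
```

As noted above, for only a handful of reads the cost of opening the file in each thread will most likely outweigh any gain, so this shows the mechanism rather than a recommended optimization.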

Hi, I have tried what you suggested, Andrey. I am attaching the updated file. Again, instead of decreasing, the time with two threads is higher than with one thread. I am using the following gray-scale image in my program: "http://upload.wikimedia.org/wikipedia/commons/d/dc/Webcam_grayscale.jpg" (just save it as a BMP). Kindly help me out of this mess. I shall be very thankful to you.

1. Even if your program worked correctly, you could hardly expect any speedup from parallelization because of (1) the ordered clause and (2) writing to a file inside the parallel loop. Ordered only makes sense if, for example, two threads do a lot of work in parallel on each iteration and only briefly do the ordered part at the end. In your case the work per iteration is too small to pay for ordered. Similarly, writing to the file takes longer than the rest of the calculations, so it limits the parallelism significantly.

2. I see the following errors in the attached program:
- The length of the vector is (Nx * Ny * 3), as each pixel takes 3 bytes in the .bmp file. But you only work on one third of the data in the (X, Y) loops, because you iterate (from 0 to Nx) by (from 0 to Ny) and get one char on each iteration. As a result I got a corrupted bmp file after running your program (it should be about 3 times longer, taking the header data into account).
- You put the array assignment under the ordered clause, but the write to the file is not under ordered. You had better do the opposite.

3. I tried to rewrite your program a bit, took the image from the link you gave, converted it into a bitmap and ran the program. The results were 0.5 sec on 1 thread and 0.4 sec on 2 threads. Then I moved the final file writing out of the timing region and got 0.2 sec on 1 thread and 0.1 sec on 2 threads. Perfect scaling, isn't it? I attached the resulting program for you to try.
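To illustrate points 1 and 2, here is a rough sketch of how ordered is meant to be used: the expensive part of each iteration runs in parallel, and only the short, order-dependent file write sits inside the ordered region. The functions heavy_work and write_in_order are hypothetical stand-ins, not code from the attached program.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for an expensive per-iteration computation. */
static unsigned char heavy_work(int i)
{
    unsigned int v = (unsigned int)i;
    int k;
    for (k = 0; k < 1000; k++)
        v = v * 1664525u + 1013904223u;
    return (unsigned char)(v >> 24);
}

/* Writes n bytes to fp in iteration order. Returns 0 on success, -1 on error. */
int write_in_order(FILE *fp, int n)
{
    int i, errors = 0;

    assert(fp != NULL && n >= 0);

    #pragma omp parallel for ordered schedule(dynamic)
    for (i = 0; i < n; i++) {
        /* Declared inside the loop body, so private to each thread. */
        unsigned char v = heavy_work(i);   /* runs in parallel */

        #pragma omp ordered
        {
            /* Executed one iteration at a time, in loop order. */
            if (fputc(v, fp) == EOF)
                errors++;
        }
    }
    return errors ? -1 : 0;
}
```

If the work outside the ordered region is as small as one pixel calculation, the threads spend most of their time waiting for their turn, which matches the slowdown described in point 1.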

Major changes I made:
- Moved the final file writing out of the parallel region (and, in the end, out of the timing region). Thus the ordered clause is no longer needed, and any schedule can be used now; I tried both static and dynamic with little difference.
- Combined reading and writing so that (1) the input header is read and written in one operation each, (2) the input data is read 3 bytes at a time (only the first byte is saved, the other two should be the same), (3) the final array is written in one operation (my timing shows that it takes 0.3 sec on my laptop).
- Made the input array 3 times shorter, so that only one char per pixel is saved (as the initial image is black-and-white, the two other chars are identical to the first char for each pixel). Correspondingly, I write each value 3 times into the result array (I guess these are (red,green,blue) triplets, which are the same for black-and-white images). Making the result array 3x longer than the input array greatly simplifies the final write into one operation (which should be much faster).
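The attached program itself is not reproduced in the thread, but the structure described above can be sketched as follows. The names expand_gray_to_rgb, write_all and the NOW() macro are hypothetical. The sketch only shows the pattern: fill the 3x longer result array in a parallel loop, time just that part, and write the whole array afterwards with a single fwrite.

```c
#include <assert.h>
#include <stdio.h>

#ifdef _OPENMP
#include <omp.h>
#define NOW() omp_get_wtime()
#else
#include <time.h>
#define NOW() ((double)clock() / CLOCKS_PER_SEC)   /* serial fallback */
#endif

/* Hypothetical helper: store each gray value three times (one byte per
   color channel) in `rgb`, which must hold 3 * npixels bytes.
   Returns the time spent in the parallel loop, in seconds. */
double expand_gray_to_rgb(const unsigned char *gray, unsigned char *rgb, long npixels)
{
    double t0;
    long i;

    assert(gray != NULL && rgb != NULL && npixels >= 0);

    t0 = NOW();
    /* Each iteration writes its own three bytes: no ordered, no file I/O. */
    #pragma omp parallel for
    for (i = 0; i < npixels; i++) {
        rgb[3 * i]     = gray[i];
        rgb[3 * i + 1] = gray[i];
        rgb[3 * i + 2] = gray[i];
    }
    return NOW() - t0;
}

/* Writes the whole buffer in one operation, outside the timed region.
   Returns 0 on success, -1 on a short write. */
int write_all(FILE *fp, const unsigned char *buf, size_t nbytes)
{
    assert(fp != NULL && buf != NULL);
    return fwrite(buf, 1, nbytes, fp) == nbytes ? 0 : -1;
}
```

In a real run, gray[i] would be replaced by the recalculated pixel value, and write_all would be called once for the header and once for the result array after the timing has been taken.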

I am not an expert in binary image formats, so I may have made mistakes in the program. I could not find out what the color table is, because I saw in my converted bitmap that the pixel data started immediately after the 54 bytes of header. Thus I skipped the color table, and in the end I was able to open the resulting image. It has the same header as the initial image, followed by the recalculated pixel data.
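On the color table question, one way to avoid guessing is to read the pixel data offset from the header instead of assuming 54. In the BMP file header, bytes 10 to 13 hold that offset as a little-endian integer; for a 24-bit bitmap without a color table it is typically 54, while an 8-bit bitmap with a 256-entry color table typically has 1078. Stored rows are also padded to a multiple of 4 bytes, which matters when the width in bytes is not a multiple of 4. The helper names below are hypothetical, and this is a sketch rather than a complete BMP reader.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the offset of the pixel data, taken from bytes 10..13 of the
   BMP file header (little-endian), or -1 if the buffer is too short or
   does not start with the "BM" signature. */
long bmp_pixel_offset(const unsigned char *header, size_t len)
{
    unsigned long off;

    assert(header != NULL);
    if (len < 14 || header[0] != 'B' || header[1] != 'M')
        return -1;

    off = (unsigned long)header[10]
        | ((unsigned long)header[11] << 8)
        | ((unsigned long)header[12] << 16)
        | ((unsigned long)header[13] << 24);
    return (long)off;
}

/* Number of bytes in one stored row, including the padding that rounds
   each row up to a multiple of 4 bytes. */
long bmp_row_stride(long width, int bits_per_pixel)
{
    assert(width >= 0 && bits_per_pixel > 0);
    return ((width * bits_per_pixel + 31) / 32) * 4;
}
```

If the offset read this way is larger than 54, the bytes in between are most likely the color table, and the pixel loop should start at the returned offset.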

Hi anv, thanks a lot, my friend. At last I have got what I wanted, and it is all because of your help. Your attached program and your comments helped me a lot. When I ran the program you attached, I got a segmentation fault and a few more errors. After correcting them it works, and that is thanks to you. After a long silence I got a reply, and it solved my problem. Now I am thinking that you should teach me too; you can be my teacher. Could you please give me your email address so that I can contact you directly? Thanks again, bye, take care. This forum is really very helpful.

Hi, I am trying to parallelize a very long code that is divided into subroutines. I have attached one subroutine in which I try to collapse the nested loop, but I did not use any private directive. At the end, the code should be able to copy the global file so that other subroutines can use it.

I'm not sure exactly what your question is, but to get correct code, you need to declare all the scalar variables that are written inside the loop in a PRIVATE clause. I would strongly recommend using the DEFAULT(NONE) clause and declaring all variables as either shared or private to avoid accidental omissions.
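As a rough illustration of that recommendation, here is a small C sketch that combines collapse with default(none). The attached subroutine is not shown in the thread, so the function scale_matrix and its variables are purely hypothetical; the same clauses are available in the Fortran form of the directive.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: multiply every element of a rows x cols matrix,
   stored row by row in `a`, by `factor`. */
void scale_matrix(double *a, int rows, int cols, double factor)
{
    int i, j;
    double tmp;   /* scalar temporary written in the loop: must be private */

    assert(a != NULL && rows >= 0 && cols >= 0);

    /* With default(none) the compiler rejects any variable used in the
       region that is not listed, so a forgotten scalar shows up as a
       compile error instead of a race. The loop variables i and j of the
       collapsed loops are private automatically. */
    #pragma omp parallel for collapse(2) default(none) \
            shared(a, rows, cols, factor) private(tmp)
    for (i = 0; i < rows; i++) {
        for (j = 0; j < cols; j++) {
            tmp = a[i * cols + j] * factor;
            a[i * cols + j] = tmp;
        }
    }
}
```

Removing tmp from the private clause makes an OpenMP-enabled compiler report it as unspecified in the enclosing parallel region, which is exactly the kind of accidental omission DEFAULT(NONE) is meant to catch.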