@Szabolcs You already got my upvote there, although I guess your question is rather specific. I've rarely had luck when asking very specific questions that I had already spent some time thinking about.

@halirutan I have an issue running Mathematica on Linux clusters. I have a compiled function with Listable and Parallelization -> True. I requested 8 cores, but when it runs it shows up as a single process at 300% CPU usage.

do I need to specify Needs["SubKernels`LocalKernels`"]; LaunchKernels[LocalMachine[8]]?

@brama Compiled parallel functions don't run on parallel kernels unless you specify it. Have you read the link I gave you?
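To illustrate the distinction, here is a minimal sketch (the function and data are just placeholders, not brama's actual code): a compiled function with RuntimeAttributes -> {Listable} and Parallelization -> True threads over its listable argument inside the one main kernel, whereas the Parallel* functions only use extra cores once subkernels have been launched.

    (* sketch only: f and data stand in for the real code *)
    f = Compile[{{x, _Real}}, Total[Table[Sin[x i], {i, 1000}]],
       RuntimeAttributes -> {Listable}, Parallelization -> True];
    data = RandomReal[1, 10^5];
    f[data]  (* multi-threaded, but all inside the single main MathKernel process *)

    Needs["SubKernels`LocalKernels`"]
    LaunchKernels[LocalMachine[8]]         (* only needed for Parallel* functions *)
    ParallelMap[f, Partition[data, 10^4]]  (* work split across 8 subkernel processes *)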

This is an answer from Wolfram support regarding one of my Linux graphics bug reports:

> It would be helpful if you can send us a description of the problem along with a notebook that can reproduce the issue. External links are subject to change hence technical support will need a copy of the problem to keep track of issues and in case it needs to be forwarded to developers.

Am I nit-picking here, or is it fair that this answer kind of annoys me?

I'm testing the software during the beta phase to try to find every possible bug that concerns my work. After the final release, which costs a lot of money and introduces further bugs, I again take the time to write a bug report. Since I want others to know about this issue, I write it up here so users can comment and get notified when it is fixed. And support is complaining that they would have to copy and paste the following description:

> "The moment I click on the graphics to drag it, it seems like it zooms the whole graphics for a fraction of a second."

@Nasser I looked at the question, tried a few things, and upvoted when it seemed clear it wasn't a simple mistake. OTOH, I've had a couple of random downvotes on questions (downvotes on questions don't cost the voter any rep).

From the competition website: "This is an open contest. Anybody may participate except for the contest organisers and members of the same group as the contest chairs. No advance registration or entry fee is required. Contestants are free to organize themselves into teams of any size."

I have no idea what kind of problems are included, though, since they don't give any examples.

@brama And $ProcessorCount is what? PBS typically allows you one CPU for the controlling process; the rest of the processes are expected to be started using TM. Aside from that, ps shows processes, not threads.

The first time I participated, we had to implement a virtual machine from the design of a hypothetical ancient clay-tablet-based computer. It then turned out to run a mini Unix-like operating system with some folders you had to hack into, and each folder contained a separate programming problem.

The second time it was something about encoding pictures in DNA. I don't remember all the details, but I'm sure the links are still around.

The ICFP contest is more focused on one big problem (sometimes a meta-problem with smaller sub-problems) that you spend a weekend on with a team. The ACM contest is a bunch of unrelated small problems that you solve in a few hours with your team.

@brama Seems okay. You should run a job that prints out $ProcessorCount and record it (is it the number of physical CPUs? the number of processors requested from PBS? something else?). PBS typically gives your controlling process a particular CPU affinity, and you may not be able to run other threads in parallel except via TM-launched processes.
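Something along these lines in the batch job would show what the kernel actually sees (just a diagnostic sketch, assuming it is run with math -script or wolfram -script inside the PBS allocation):

    (* diagnostic sketch: print what Mathematica sees on the allocated node *)
    Print["$ProcessorCount: ", $ProcessorCount];
    Print["$KernelCount before launch: ", $KernelCount];
    LaunchKernels[];                        (* default subkernel launch *)
    Print["$KernelCount after launch: ", $KernelCount];
    Print["Subkernels run on: ", ParallelEvaluate[$MachineName]];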

TM is the PBS/Torque task manager. It is an API that user programs call to request cluster services such as process startup.

And you are trying to start threads here, not processes, so you would expect to see only one MathKernel process, even if there are several parallel threads within it.

@OleksandrR. I want to use the 8 cores I have specified in PBS. I want to make sure my program utilizes all the resources and runs in parallel. So you think that instead of $ProcessorCount I should put 8, as specified in my PBS script?

@OleksandrR. To be honest with you, I do not have a good understanding of kernels, subkernels, and processors. My understanding is that every processor is a subkernel; with that in mind, I am assuming I should see 8 subkernels for the 8 processors in my PBS script.

@OleksandrR. you are welcome!! I guess I made a stupid mistake that is throwing you off. The variable aa in f is a vector of parameters to be optimized; a21 and a22 are the input parameters. Did you already figure that out?

@brama I don't want to sound patronizing, but I hope you are already aware that parallelization is all about the structure of your problem, rather than about throwing in as many Parallelization -> True options as possible? The structure of this problem just doesn't seem amenable to parallelization. But maybe I'm wrong; sample inputs for f would help.

@OleksandrR. I kind of understand it, but may not have done it that way. The philosophy is that f is a numerical solution for a given set of parameters and NMinimize does the parameter optimization. f contains several matrices and calls several other functions to perform matrix manipulations.

@brama the reason is that changing some of the other functions changed the result. Since you know better than I do what the inputs and outputs should be, hopefully you can tell at a glance whether the modification is correct or not.

@brama I already checked with 2 different inputs. But maybe that is not enough; maybe the problem appears only for specific inputs, or maybe our modification introduces a small error somewhere that is only noticeable sometimes. So I need semantic verification, not experimental!

Since demR is used in f, I left the Listable attribute in place and removed the inlining in f. demF, on the other hand, is only used in floF and gamma, which are both Listable, so there I removed the Listable attribute and inlined it instead.

Well, I think I have to go now (it's late here), but we can resume this discussion another time. It looks to me as if the code could benefit from a bit of tidying up and commenting, which should help with optimization. At the moment, with just the code in front of me, I find it extremely hard to tell what rank each of the values should have, and what rank they actually have by the time all the functions operate on them.

@brama the parallelization is happening in the sense that you are threading scalar operations over matrices. But I don't know what these matrices mean in the calculation, and that bothers me.

I think ParallelEvaluate is hopeless; the parallelization happens inside the compiled function.

If you can use NM rather than DE, it will help (DE is inefficient in terms of function evaluations).
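Assuming NM and DE here refer to the "NelderMead" and "DifferentialEvolution" settings of NMinimize's Method option, the switch is just this (obj and its arguments are illustrative stand-ins for the real f and its parameters):

    (* toy objective standing in for the real f[a21, a22, ...] *)
    obj[a21_?NumericQ, a22_?NumericQ] := (a21 - 1)^2 + (a22 + 2)^2;

    NMinimize[obj[a21, a22], {a21, a22}, Method -> "NelderMead"]
    (* vs. the population-based method, which costs many more evaluations of obj *)
    NMinimize[obj[a21, a22], {a21, a22}, Method -> "DifferentialEvolution"]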

@brama okay, you're welcome. At least you have a 30% performance increase now, I suppose. But I'm still worried about correctness; hopefully we can resolve this with proper consideration of the problem. And we should get our own chat room too, as the main chat will be busy over the weekend.