Just compile as you would do for a full list of arguments, such as compJ = Compile[{x, y}, x + y], and then delegate the execution from your main function j to the compiled one as j[x_, y_: 1] := compJ[x, y].
– Leonid Shifrin, Feb 15 '13 at 21:53


@LeonidShifrin That would work, but might it not be faster to have an explicitly compiled separate version of the function that uses the default argument? If the default values are used often, the simplifications that could be made might result in a significant speedup. It would depend on the exact nature of the compiled function, of course.
– Xerxes, Feb 15 '13 at 22:26

@Xerxes This is a good point. There indeed may be situations where treating the second argument as a constant may significantly speed up the code, although those probably won't represent the majority of use cases.
– Leonid Shifrin, Feb 15 '13 at 22:32

@Leonid Why don't you post this as an answer? It looks like one to me :-)
– Szabolcs, Feb 15 '13 at 23:05


@Szabolcs Ok, I added some more meat to it :-)
– Leonid Shifrin, Feb 16 '13 at 0:45

1 Answer

NOTE: it actually turned out that some of the conclusions here are not quite correct; this will be corrected soon. Please see the comments of Oleksandr R. below.

A simple case

Just compile as you would do for a full list of arguments, such as

compJ = Compile[{x, y}, x + y]

and then delegate the execution from your main function j to the compiled one as

j[x_, y_: 1] := compJ[x, y]

For example,

j[1]
(* 2. *)
j[1,2]
(* 3. *)

A small case study in auto-compilation

As was noted in the comments, there may be cases where one can gain a certain speedup by injecting the default argument as a constant at compile time. One can do that with some amount of meta-programming. I made such an experiment, and the results are rather interesting, so I will report them here.

Syntax and implementation

First, I will define a new syntax: it will look like

def[f[x_, y_: 1] := Compile[{x, y}, x + y]]

Now, def will be a custom assignment operator. It will create two different compiled functions, one for the general case and one for the special case. It will also create two different rules for f, so in this sense we only imitate the optional pattern.
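The original implementation is not preserved in this transcript. What follows is a simplified sketch of what such a def could look like, restricted to a single trailing optional argument; it uses With to splice the evaluated default in, rather than the full Trott–Strzebonski mechanics, and all names are illustrative:

ClearAll[def];
SetAttributes[def, HoldAll];
Options[def] = {InlineDefaultValue -> True};

def[(f_Symbol)[Verbatim[Pattern][_, _],
      Verbatim[Optional][Verbatim[Pattern][dsym_Symbol, _], default_]] :=
    Compile[{specs___, dspec_}, body_], OptionsPattern[]] :=
  Module[{cfFull, cfShort},
    (* general case: compile with the full argument list *)
    cfFull = Compile @@ Hold[{specs, dspec}, body];
    cfShort = If[OptionValue[InlineDefaultValue],
      (* evaluate the default first, then splice the value into the held body *)
      With[{d = default}, Compile @@ (Hold[{specs}, body] /. dsym -> d)],
      (* leave the default expression in the compiled body verbatim *)
      Compile @@ (Hold[{specs}, body] /. dsym :> default)];
    (* two rules imitating the optional pattern *)
    f[a_, b_] := cfFull[a, b];
    f[a_] := cfShort[a];
  ];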

Note the option InlineDefaultValue which, when set to True (the default), inlines the already-evaluated default value into the body of the function being compiled. I also used the Trott–Strzebonski technique to inject the evaluated default value.

Note also that the def operator works only on a rather specific function signature, namely a function with a single optional argument placed at the end of the argument list. This restriction can be removed, but the code would become more complex.

We can see that for a single argument, a special, different compiled function will be used.
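For instance, with the toy signature from above (assuming a def along the lines described; the outputs mirror those of the plain two-argument example):

def[f[x_, y_: 1] := Compile[{x, y}, x + y]]

f[1]
(* 2. *)

f[1, 2]
(* 3. *)

The single-argument call dispatches to the specially compiled function with the default already baked in, while the two-argument call uses the general one.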

A more interesting example

A more interesting example is one where the default argument is not a single number but, for example, a (large) list. Here is a toy example: we will compute the total of some portion of a list:
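The original code is not preserved here; a plausible shape of the inlined definition, using the partialSum name that appears in the comments below, would be

def[partialSum[n_, lst_: Range[10^5]] :=
   Compile[{{n, _Integer}, {lst, _Integer, 1}}, Total[lst[[1 ;; n]]]]]

so that partialSum[k] totals the first k elements of the default list.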

This definition evaluated Range[10^5] and inlined the resulting list into the code being compiled. Now we will also make another definition, which differs only in the setting of the InlineDefaultValue option, which we now set to False:
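A sketch of this second definition, under the partialSumAlt name used in the comments below:

def[partialSumAlt[n_, lst_: Range[10^5]] :=
    Compile[{{n, _Integer}, {lst, _Integer, 1}}, Total[lst[[1 ;; n]]]],
  InlineDefaultValue -> False]

Timings for the three cases can then be compared along the lines of

Do[partialSum[10^5], {100}] // AbsoluteTiming
Do[partialSumAlt[10^5], {100}] // AbsoluteTiming
Do[partialSum[10^5, Range[10^5]], {100}] // AbsoluteTiming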

Remarks

To my mind, these are pretty interesting results. First, we can see that inlining gives us an order-of-magnitude speed-up with respect to normal execution in this case. This is because we trade memory for speed: by inlining the long list into Compile, we made the compiler allocate the array in the global memory space (where all global variables are allocated). This increases the size of the executable, but the memory is allocated statically (not on the heap). We then don't have to pass the array back and forth, and this is what speeds up the code.

But also, when we didn't inline, we actually ended up with dramatically worse performance than in the normal two-argument case, and this may look puzzling. My guess is that when we pass an argument to a compiled function, Mathematica does not really create a copy of it (since the argument is immutable anyway). However, when we have Range[100000] inside the compiled code, the array has to be allocated on the heap on every single invocation.

So, the conclusion would be that one can indeed gain quite substantial speedups by inlining the evaluated default value into the function being compiled, particularly when this value is a large list. On the other hand, one has to be careful with such manipulations, since one can lose performance just as easily if due attention is not paid to the details.
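As a side note (and as suggested in the comments below), the same constant-inlining effect can be obtained without any meta-programming, simply by pre-evaluating the default value with With; partialSumWith is an illustrative name:

partialSumWith = With[{lst = Range[10^5]},
   Compile[{{n, _Integer}}, Total[lst[[1 ;; n]]]]];

Since With substitutes syntactically before Compile sees the body, the evaluated list ends up inlined in the compiled code just as in the InlineDefaultValue -> True case.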

+1 very interesting analysis. Might it be fairer, though, to pre-evaluate the Range (using e.g. With) for the timing run? The performance of both partialSum forms (i.e. with and without inlining) is the same in this case, which makes sense as the bytecodes show the same operations are performed whether registers are loaded from arguments or from constants. Whether or not copies occur here is unclear to me. The bytecode generated for Range inlined literally as in partialSumAlt is different--an explicit loop to initialize the array--and this may account for the worse performance.
– Oleksandr R., Feb 16 '13 at 2:47

@OleksandrR. Yes, you are right. It actually crossed my mind, but I did not test. Too bad. So, this means that we don't actually gain anything from inlining which we can't get by using With or other means of pre-computing. I can't think clearly now, but will come back to this tomorrow and correct the answer. Also, if you'd like, please feel free to edit it as you see fit.
– Leonid Shifrin, Feb 16 '13 at 2:59

More about the last point: indeed this seems to be the case. Compare e.g. With[{crange = Compile[{{len, _Integer, 0}}, Range[len]; Null]}, Do[crange[10^5], {2500}] // AbsoluteTiming] -- its timing is the same as for partialSumAlt with Range inlined literally, so the poor performance is due to an inefficient loop rather than excess allocations/copies. The VM surely can't match the many optimizations a modern CPU incorporates and its working set is too large for the latter to work efficiently.
– Oleksandr R., Feb 16 '13 at 3:07

@OleksandrR. Interesting. I should have spent more time and done this analysis. Now it looks like there wasn't a single correct statement in the second part of my answer. :-) Time to sleep now, but I will give it another shot tomorrow. As I said, please feel free to improve the answer in the meantime, should you wish to do so.
– Leonid Shifrin, Feb 16 '13 at 3:12

It's time for me to sleep as well, so I will leave the answer alone for the time being. :) Interesting to note that the performance of the initialization loop when compiled to C is only slightly worse than Range itself and clearly limited by the rate at which the memory manager can map new pages. So, my hypothesis above seems to be more or less correct even though it seems unlikely given that the underlying operation is allocation-bound and thus quite expensive.
– Oleksandr R., Feb 16 '13 at 3:35
