Programming an estimation command in Stata: Consolidating your code

I write ado-commands that estimate the parameters of an exponential conditional mean (ECM) model and a probit conditional mean (PCM) model by nonlinear least squares, using the methods that I discussed in the post Programming an estimation command in Stata: Nonlinear least-squares estimators. Because these commands are so similar, they will either share lots of code or repeat lots of code. It is almost always better to share code than to repeat code. Shared code only needs to be changed in one place to add a feature or to fix a problem; repeated code must be changed everywhere. I introduce Mata libraries to share Mata functions across ado-commands, and I introduce wrapper commands to share ado-code.

In code block 1, lines 2–29 define the ado-command mynlexp1, which uses the Mata work function mywork() defined on lines 54–89. Lines 33–52 define the evaluator function MYNLExp() used by optimize() in mywork(). This structure should be familiar from the Poisson regression examples previously discussed.

mynlprobit1 implements an NLS estimator for the parameters of the PCM model.

Duplicated code is dangerous. Anytime you want to add a feature or fix a problem, you must do it twice. I highly recommend that you avoid duplicated code, and I illustrate how by rewriting these commands to have a single Mata code base and then a single ado-code base.

Libraries of Mata code

The mywork() functions used in mynlexp1 and mynlprobit1 differ only in the evaluator function they call; see line 76 in code blocks 1 and 2. I would like to have one mywork() function that is called by mynlexp1 and by mynlprobit1.

Mata functions defined at the bottom of an ado-file are local to that ado-file, so I cannot use this method for defining the mywork() function that will be used by both mynlexp1 and mynlprobit1. What I need is a file containing compiled Mata functions that are callable from any other Mata function or from within any ado-file or do-file. This type of file is known as a library.

I use mynllib.mata in code block 3 to make the library lmynllib.mlib containing the compiled Mata functions MYNLWork(), MYNLProbit(), and MYNLExp().

Lines 1 and 99 open and close the Mata session in which I define the functions, create the library, and add the functions to the library. Lines 4–22 define MYNLExp(), which I have already discussed. Lines 24–43 define MYNLProbit(), which I have already discussed. Lines 45–94 define MYNLWork(), which is the work function that both mynlexp2.ado and mynlprobit2.ado will use. Note that I have used uppercase letters in the names MYNLExp(), MYNLProbit(), MYNLWork(). Functions in Mata libraries are global: they can be called from anywhere, and their names must be unique in the space of names for Mata functions. I try to avoid using function names that other programmers might use by prefixing the names of my functions with an uppercase name of the library and beginning the function name with an uppercase letter.

Line 49 specifies that the ninth argument of MYNLWork() is a string scalar known as model inside the function. The ado-commands will pass either “expm” or “probit” in this argument.

If model contains “expm”, line 60 stores the address of the function MYNLExp() in f. The variable type that holds the address of a function is known as a pointer to a function. For this reason, line 56 declares f to be a pointer to a function. If model contains “probit”, line 63 stores the address of the function MYNLProbit() in f. If model contains neither “expm” nor “probit”, lines 66–68 display an error message and exit.

Pointers hold the address of an object. All I need here is a box that holds the address of the evaluator function corresponding to the model fit by the ado-command that will call MYNLWork(). Line 56 declares f to be this type of box, lines 60 and 63 store the address of the correct function in f, and line 81 puts the address stored in f into the optimize object S. Type help M-2 pointers to learn more about pointers.
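The pointer mechanics described above can be sketched in a few lines of Mata. This is only an illustration, not the code in mynllib.mata: the functions DemoExp(), DemoProbit(), and DemoEval() are hypothetical names that I made up for the sketch, and they evaluate a scalar mean function instead of an NLS objective.

```stata
mata:
// Hypothetical stand-ins for the two evaluators (not MYNLExp()/MYNLProbit())
real scalar DemoExp(real scalar xb)
{
    return(exp(xb))
}

real scalar DemoProbit(real scalar xb)
{
    return(normal(xb))
}

// Hypothetical work function: selects a function by name, then calls it
real scalar DemoEval(string scalar model, real scalar xb)
{
    // f is a box that holds the address of a function
    pointer(real scalar function) scalar f

    if (model=="expm") {
        f = &DemoExp()          // store the address of DemoExp() in f
    }
    else if (model=="probit") {
        f = &DemoProbit()       // store the address of DemoProbit() in f
    }
    else {
        errprintf("%s is an unknown model\n", model)
        exit(198)
    }
    // call whichever function f points to
    return((*f)(xb))
}

DemoEval("expm", 0)             // exp(0)
DemoEval("probit", 0)           // normal(0)
end
```

In a work function that uses optimize(), the pointer would be handed to the optimizer instead of being dereferenced directly, for example with optimize_init_evaluator(S, f).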

The remaining lines of MYNLWork() are the same as the lines in the mywork() functions in mynlexp1 and mynlprobit1.

There is still a lot of duplicated code in the evaluator functions MYNLExp() and MYNLProbit(). Instead of using a pointer to the evaluator function, I could have consolidated the evaluator functions MYNLExp() and MYNLProbit() into a single function that used an additional argument to decide which case to evaluate. I chose the presented method because it is faster. Consolidating the evaluator functions would have slowed down the function that I most want to speed up. (The evaluator function is called many times by optimize().) So, in this case, I accepted the risk of duplicated code for the advantage of speed.

Line 96 creates the Mata library lmynllib.mlib in the current directory, replacing any previously defined version of this library. Line 97 puts the compiled versions of MYNLWork(), MYNLProbit(), and MYNLExp() into lmynllib.mlib. At this point, the file lmynllib.mlib in the current directory contains the compiled functions MYNLWork(), MYNLProbit(), and MYNLExp().
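A stripped-down file of the same shape might look like the following sketch. The library name ldemolib and the function DEMOSquare() are hypothetical and exist only for illustration; the two mata mlib commands are the ones that create the library and add a compiled function to it. Library names must begin with the letter l.

```stata
mata:
// Hypothetical function to be stored in the library
real scalar DEMOSquare(real scalar x)
{
    return(x^2)
}

// Create ldemolib.mlib in the current directory, replacing any old copy
mata mlib create ldemolib, dir(.) replace

// Add the compiled function to the library
mata mlib add ldemolib DEMOSquare()
end
```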

mynllib.mata is a do-file that makes a Mata library, hence the .mata suffix instead of the .do suffix. I can execute it by typing do mynllib.mata. Example 1 makes the library lmynllib.mlib.

After dropping all the ado-commands in memory and clearing Mata, I used quietly do mynllib.mata to make the library, because I do not want to see the code again. mata: mata mlib index updates the list of libraries known to Mata; this step adds lmynllib.mlib to the list of known libraries so that I can use the functions defined therein.
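The sequence just described can be sketched as follows. This is my reconstruction of the steps from the description, not a copy of example 1, and it assumes that mynllib.mata is in the current directory.

```stata
* Drop all programs (ado-commands) currently in memory
program drop _all

* Clear Mata's memory
mata: mata clear

* Build the library without echoing the code
quietly do mynllib.mata

* Refresh the list of libraries that Mata searches
mata: mata mlib index
```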

Having made lmynllib.mlib and added it to the list of known libraries, I can use the functions defined therein in one-line Mata calls in my ado-commands. Consider mynlexp2.

The code for mynlexp2 is almost the same as the ado-code for mynlexp1 in code block 1. The only differences are that line 12 calls MYNLWork() instead of mywork() and that MYNLWork() accepts a ninth argument, specified on line 13 to be “expm”.

The changes made to mynlprobit2 are analogous to those made to mynlexp2. In particular, note that line 13 passes “probit” in the ninth argument to MYNLWork().

Writing a work ado-command

The Mata library allowed me to consolidate the duplicated Mata code. I still have lots of duplicated ado-code. To consolidate the duplicated ado-code, I have mynlexp3 and mynlprobit3 call a single ado-command that does the work as seen in code blocks 6 and 7.

Recall that whatever the user specified is contained in the local macro 0. Line 5 of mynlexp3 passes whatever the user specified prefixed with “expm” to mynlwork. Similarly, line 5 of mynlprobit3 passes whatever the user specified prefixed with “probit” to mynlwork. mynlexp3 and mynlprobit3 are known as wrapper commands, because they just wrap calls to mynlwork, which does the actual work.
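The wrapper pattern can be sketched with three small programs. The names demoexp, demoprobit, and demowork are hypothetical, and demowork here only displays what it receives instead of fitting a model.

```stata
capture program drop demoexp
capture program drop demoprobit
capture program drop demowork

* Wrapper: prefix the user's input with the model name and pass it on
program define demoexp
    version 14
    demowork expm `0'
end

program define demoprobit
    version 14
    demowork probit `0'
end

* Stand-in for the work command: just show what was passed in
program define demowork
    version 14
    display `"demowork received: `0'"'
end

demoexp y x1 x2, vce(robust)
demoprobit y x1 x2
```

Each wrapper is a one-line call, so any new option or bug fix only has to be made in the work command.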

In mynlwork, line 5 uses gettoken to put the model prefixed to the user input by mynlexp3 or mynlprobit3 in the local macro model and to put whatever the user specified in the local macro 0. Lines 7–16 put the name of the calling command in the local macro cname or, if the model is not recognized, exit with an error message. In theory, the error case handled in lines 13–16 is not necessary, because I should know how to call my own command. Experience has taught me that handling these extra error cases makes changing the code in the future much easier, so I consider this good practice.
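The gettoken step and the model check can be sketched as below. The program demodispatch and the command names stored in cname are hypothetical; the real work command would go on to parse the remaining input with syntax and call the Mata work function.

```stata
capture program drop demodispatch

program define demodispatch
    version 14

    * Peel the first token off `0' into `model'; the rest stays in `0'
    gettoken model 0 : 0

    if "`model'" == "expm" {
        local cname demoexp
    }
    else if "`model'" == "probit" {
        local cname demoprobit
    }
    else {
        * Defensive branch: should not happen if the wrappers are correct
        display as error "`model' is an unknown model"
        exit 498
    }

    display "model = `model'"
    display "cname = `cname'"
    display `"remaining input = `0'"'
end

demodispatch probit y x1 x2
```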

By line 17, the local macro model and the local macro cname contain all that differs between the cases handled by mynlexp3 and mynlprobit3. model is passed to MYNLWork() on line 26, and cname is used to store the command name in e(cmd) on line 38.

It is almost always better to share code than to repeat code. Shared code only needs to be changed in one place to add a feature or to fix a problem; repeated code must be changed everywhere. I introduced Mata libraries to share Mata functions across ado-commands, and I introduced wrapper commands to share ado-code.
