Re: st: mata courses?

David Airey <david.airey@Vanderbilt.Edu> asked,
> [...] Will Stata offer Mata courses, and if so, what kind of prerequisite
> knowledge will be required? [...]
Probably. Yes. I'm the one writing it and I keep going back and forth
on whether to make it a course first and then a book, or jump directly
to the book. As you can tell, I'm not as far along as I should be.
Prerequisites will be Stata do-file (not ado-file) programming.
NC-152 is *NOT* required; NC-151 would be more than sufficient, and
the minimum is somewhere between NC-101 and NC-151.
In the meantime, I am writing a column on Mata in the Stata Journal, which
I hope helps, and I'm answering questions here on Statalist, and I'm trying
to give more than the minimum answer.
I invite questions on my Statalist answers, even when they might be considered
silly, such as "I saw you used corr(Variance(X,1)). What is Variance() and
what is the 1 doing there?"
The details of Mata are well documented in the manual and the on-line help.
What is missing is motivation and application.
Mata serves three purposes in Stata:
    1.  Once you get the hang of it, there are some problems (such as
        Marcello Pagano's pairwise correlation problem) for which
        Mata provides the easiest solution.  It is important to realize
        that Mata can be used interactively (no real programming required),
        and that it can be combined with Stata, and with Stata's older
        -matrix- programming language.
    2.  Mata is a full-fledged matrix programming language, with the emphasis
        on matrix.
    3.  Mata is a full-fledged programming language, and to heck with
        matrix.
I have listed these in the order of importance to most users. For us here at
StataCorp, the order is 3-2-1. We intend to implement most future additions
to Stata using Mata. I would say all, but I know there will be an exception.
If I could say all, then that would mean that finally, users would have
complete equality with developers here at StataCorp. That has been a
long-term goal.
In terms of (3), Mata will be as important a development in Stata as ado-files
were. You have already seen some of the payoff: Commands -adoupdate- and
-hsearch- would never have happened were it not for Mata.
But I want to emphasize (1) and (2), and especially (1). In terms of (1),
Mata will not change your life, but it will make it easier. Let me give
one example.
I have data,
    . tabulate cat

            cat |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |         60       62.50       62.50
              2 |         23       23.96       86.46
              3 |          6        6.25       92.71
              4 |          7        7.29      100.00
    ------------+-----------------------------------
          Total |         96      100.00
I have a theory that says half the data should be in category 1, half
the remainder in category 2, half again in 3, and the remainder in 4.
That is, the expected counts in the cells are (48, 24, 12, 12).
Can I reject at the 5% level that the data is from the distribution
suggested by theory?
The chi-squared test is easy enough to perform, but look around and you will
find nothing in Stata proper that will answer that question. That's absurd,
but true. Look around more and you'll find user-written routines. Nick Cox,
I believe, has written one.
The chi-squared statistic is simple enough conceptually; it is

                4   (obs_i - exp_i)^2
    chi^2(3) = Sum  -----------------
               i=1        exp_i

which in this case is (60-48)^2/48 + (23-24)^2/24 + ...
In Mata, we could calculate thusly,
    : obs = (60\ 23\ 6\ 7)
    : exp = (48\ 24\ 12\ 12)
    : sum( (obs-exp):^2 :/ exp )
      8.125
    : chi2tail(3, 8.125)
      .0434977514
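For anyone who wants to reuse the calculation, here is a minimal sketch that wraps the same two lines in a Mata function. The name gof_chi2() is hypothetical (it is not a built-in), and the degrees of freedom are taken as the number of cells minus one, which assumes no parameters were estimated from the data.

```mata
mata:
// Hypothetical helper, not part of Mata: Pearson goodness-of-fit test.
// obs and expected are column vectors of the same length.
real rowvector gof_chi2(real colvector obs, real colvector expected)
{
    real scalar stat, df

    stat = sum((obs - expected):^2 :/ expected)   // chi-squared statistic
    df   = rows(obs) - 1                          // assumes nothing was estimated
    return((stat, df, chi2tail(df, stat)))        // statistic, df, p-value
}

// The example from the text: should reproduce the 8.125 and the p-value above
gof_chi2((60\ 23\ 6\ 7), (48\ 24\ 12\ 12))
end
```

Returning a row vector keeps the sketch short; a structure or separate functions would be just as reasonable.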
That's a pretty easy solution. The worst part of it was entering the data,
but we can solve that:
    . tabulate cat, matcell(obs)
      (output omitted)
    . mata:
    --------------------------------------- mata (type end to exit) -----
    : obs = st_matrix("obs")
    : exp = (48\ 24\ 12\ 12)
    : sum( (obs-exp):^2 :/ exp )
      8.125
    : end
    ---------------------------------------------------------------------
The -matcell()- option of -tabulate- saved the counts as a Stata matrix.
-st_matrix()- grabbed the Stata matrix and saved it as a Mata matrix.
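The traffic can also go the other way. As a rough sketch, st_numscalar() with two arguments stores a Mata value as a Stata scalar, so the result is available after leaving Mata. The scalar names gof_chi2 and gof_p are made up for illustration, and the -input- lines merely recreate the counts from the table above so that the sketch runs on its own.

```mata
clear
input cat freq
1 60
2 23
3 6
4 7
end

// frequency weights reproduce the 96-observation table
tabulate cat [fweight=freq], matcell(obs)

mata:
obs      = st_matrix("obs")                   // Stata matrix -> Mata matrix
expected = (48\ 24\ 12\ 12)                   // counts implied by the theory
stat     = sum((obs - expected):^2 :/ expected)

st_numscalar("gof_chi2", stat)                // Mata scalar -> Stata scalar
st_numscalar("gof_p", chi2tail(rows(obs) - 1, stat))
end

display "chi2(3) = " scalar(gof_chi2) "   p = " scalar(gof_p)
```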
We can also change our "program" to calculate the expected number. Rather
than counting on our fingers and then typing
    : exp = (48\ 24\ 12\ 12)
we could do the following:
    : N = sum(obs)
    : exp = (N/2 \ N/4 \ N/8 \ N-N/2-N/4-N/8)
or
    : N = sum(obs)
    : exp = (N/2 \ N/4 \ N/8)
    : exp = exp \ N-sum(exp)
and then our entire solution would be,
    . tabulate cat, matcell(obs)
    . mata:
    : obs = st_matrix("obs")
    : N = sum(obs)
    : exp = (N/2 \ N/4 \ N/8)
    : exp = exp \ N-sum(exp)
    : sum( (obs-exp):^2 :/ exp )
    : end
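The halving rule from the theory (half in category 1, half the remainder in category 2, and so on, with the last category taking what is left) works for any number of categories. Here is a sketch, with the hypothetical function name halving_expected() and the assumption that there are at least two categories:

```mata
mata:
// Hypothetical helper, not part of Mata: expected counts when each
// category takes half of what is left and the last one takes the rest.
// Assumes k >= 2.
real colvector halving_expected(real scalar N, real scalar k)
{
    real colvector e

    e = N :/ (2:^(1::(k-1)))      // N/2, N/4, ..., N/2^(k-1)
    return(e \ (N - sum(e)))      // last cell gets whatever remains
}

halving_expected(96, 4)           // the four cells in the example: 48, 24, 12, 12
end
```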
Forgive the long aside. The point of the above is that Mata is worth
learning and that you do not have to be a superprogrammer to use it.
I apologize for the delay in the NetCourse/Book. In the meantime, I recommend
my column in the SJ. I think I might even use the example above as part of
the next one.
The statements and functions of Mata are indeed powerful, and the two used
above, the colon operators (:^ and :/) and sum(), are worth reading about.
To find the colon-operator help file, type -help mata-, click on [M-2], then
click on op_colon.  To find sum(), type -help mata-, click on [M-4], then on
utility, then on sum().
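As a quick illustration of what those help files cover, here is a small sketch contrasting colon (element-by-element) operators with ordinary matrix operators. The vectors a and b are made up for the purpose.

```mata
mata:
a = (1\ 2\ 3)
b = (4\ 5\ 6)

a :* b          // element-by-element product: (4\ 10\ 18)
a' * b          // ordinary matrix product, 1 x 3 times 3 x 1: 32
a :/ 2          // a scalar is applied to every element: (.5\ 1\ 1.5)
sum(a :* b)     // sum() adds up all the elements: 32
end
```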
-- Bill
wgould@stata.com