which uses various methods, including rejection sampling, depending on the mean of
the standard gamma distribution required.

The cdf of the standard Gamma (f(x) = x^{gamma-1} e^{-x} / Gamma(gamma))
is F(x) = Gamma_x(gamma)/Gamma(gamma),
where Gamma_x is the (lower) incomplete gamma function.

Hmm, let's read the manual. On page 23, the BayeShape facility is
described; it doesn't mention Gamma explicitly.
Perhaps the simplest course of action is
to replace Gamma priors by log-normal.

[I see that Shape 2, Simplex volume, has a connection
to incomplete Gamma.]

Start simple

As a first step, I implemented the model

mu ~ dnorm(0,1)
Data ~ dnorm(mu,1)

which should give (since the marginal variance of Data is 2.0)

Evidence = log P(D) = -D**2/4 - log(sqrt(2*pi * 2)), which for D = 2 is -2.2655.

I ran the model using bayestoy.c (modified by me)
and onegaussian.c (modified from bayesapp.c) (Makefile).
I tried setting the Data and Accuracy to various values,
checking the resulting values of <Mock> and Evidence.
For extreme data values, the answers were incorrect. (Which seems reasonable!)
Here are two examples, a moderately extreme case with perfect answer, and
a really extreme case with wrong answer.
All results came out much the same for any value of rate from 0.1 down to 0.0001.

Data, Acc | Correct <Mock> and Evidence | Result <Mock> and Evidence | Verdict
----------|-----------------------------|----------------------------|--------
4, 10     | 3.96 and -8.8447            | 3.96 and -8.75             | Good.
20, 1     | 10 and -101.2655            | 6.3 and -116.44            | Bad. (Result did not improve if a small rate was used.)

Further investigation indicates that "Bad" results can occur
when a datum requires a parameter to take on a 7-sigma outlying value.
6-sigma outliers seem to be all right.

This is exactly what's expected: real numbers are represented in 32 bits.
So anything beyond mass 2^{-32} in the tail of the prior
will be ill-represented. Now, at 7-sigma, the Gaussian tail probability is roughly
exp(-49/2)/(7 sqrt(2 pi)), say 1e-12, which is 2^{-40}. So in fact we should not expect
a 7-sigma event to be represented at all. 6-sigma corresponds to about 30 bits.

Next steps

I made a sequence of factor analysis models.
They are available in
fa.c,
fa2.c,
fa3.c,
(mainfa.c,
mainfa2.c,
mainfa3.c),
and have increasing numbers of parameters of an increasing number of types.
To do simple Bayesian models like FA I force the number of "atoms" to be 1.
I encountered a bug in my first attempt, in which BayeSys requested the likelihood
for an object that had Natoms=0. I was not expecting this.
I fixed the problem by modifying UserBuild so that it returns "0" instead of "1"
in this situation.

fa3.c implements a 3-dimensional factor analysis model
in which there are six parameters: three components V[1], V[2], V[3]
for the loading vector, which come from a unit normal prior, and
three diagonal noise variances D[1], D[2], D[3], which
have a log-normal prior.
The generative model for x (a 3-component vector) is

P(x[1..3]) = Normal( 0, C )

where the covariance matrix C is

C = D + s2 V V^T

where D is a diagonal matrix made from D[1], D[2], D[3],
and s2 is a fixed variance parameter.

Rather than explicitly enumerate N data points x,
I work with the sufficient statistics, {N, <x xT> }
and write down the likelihood analytically.
The parameter Ndata counts the number of sufficient stats,
not the number of data points.
The number of data points is placed in the UserCommonStr
which is defined in userstr3.h,
along with other essential fixed model parameters
such as the variance s2
and the fixed hyperparameters of the log-normal prior.

In the output vector Mock, I keep track of what the sufficient statistics
would be if perfect data were generated from the present parameters.

Results look fine. Mock matches Data nicely.

Aside: I notice that the results obtained differ depending
on which of my Linux machines runs the program. A Red Hat machine
and a Debian machine give slightly different answers.

More parameters

In
fa4.c,
(mainfa4.c),
I make the parameter
s2 a free parameter with a log normal prior.
When it has a fairly narrow prior (a s.d. of +/-2 over log(s2)),
the evidence changes almost negligibly

from Evidence = -6843.40 (fa3)
to Evidence = -6845.81 (fa4)

which is very appropriate. Increasing the prior width over log(s2)
by a factor of e^2
should simply decrease the evidence by 2.0 nats. Check:

Evidence = -6848.04

Bingo.

Fewer parameters

In
fa5.c,
(mainfa5.c),
I reduce the number of free parameters again (in order to
carry out model comparison between two factor analysis models,
one of which claims to know the loading vector V).

I set the known V to (1,1,1), which is not well-matched to the data.
The evidence drops to

Evidence = -6980.84

(a drop of 130 nats, with a data set of 1000 points, so a KL of 0.13 nats per data point, which seems
reasonable).