Analysis with Programming

If you have been following this blog, you may have noticed that I haven't posted any updates for more than a year now. The reason is that I've been busy with my research and my work, and I promised myself not to share anything here until I finished my degree (Master of Science in Statistics). Anyway, at this point I think it's time to share with you what I've learned in the past year. It's been a good year for Statistics, especially in the Philippines: last November 15, 2016, a team of local data scientists made a huge step in Big Data by organizing the first ever conference on the topic. Months before that, the 13th National Convention on Statistics, organized by the Philippine Statistics Authority, invited a keynote speaker from Paris21 to tackle Big Data and its use in government.

So without further ado, in this post I would like to share a new programming language which I've been using for several months now, called Julia. This progr…

One of the problems often dealt with in Statistics is the minimization of an objective function. In contrast to linear models, there is no analytical solution for models that are nonlinear in the parameters, such as logistic regression, neural networks, and nonlinear regression models (like the Michaelis-Menten model). In such situations, we have to use mathematical programming, or optimization. One popular optimization algorithm is gradient descent, which we're going to illustrate here. To start with, let's consider a simple function with a closed-form solution, given by
\begin{equation}
f(\beta) \triangleq \beta^4 - 3\beta^3 + 2.
\end{equation}
We want to minimize this function with respect to $\beta$. The quick solution, as calculus taught us, is to compute the first derivative of the function, that is,
\begin{equation}
\frac{\text{d}f(\beta)}{\text{d}\beta}=4\beta^3-9\beta^2.
\end{equation}
Setting this to 0 to obtain the stationary points gives us
\begin{align}
4\beta^3-9\beta^2&=0\nonumber\\
\beta^2\left(4\beta-9\right)&=0\nonumber\\
\Rightarrow\beta&=0\;\text{ or }\;\beta=\frac{9}{4}.
\end{align}
Since $f''(\beta)=12\beta^2-18\beta$ vanishes at $\beta=0$ but is positive at $\beta=9/4$, the minimum is attained at $\beta=9/4$.
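Before moving on, here is a minimal Python sketch of gradient descent applied to this function, using the analytic derivative above; the starting point, step size, and iteration count are arbitrary illustrative choices:

```python
# Gradient descent on f(beta) = beta^4 - 3*beta^3 + 2,
# whose derivative is f'(beta) = 4*beta^3 - 9*beta^2.

def grad(beta):
    return 4 * beta**3 - 9 * beta**2

def gradient_descent(beta0, lr=0.01, iters=1000):
    """Repeatedly step against the derivative to descend toward a minimum."""
    beta = beta0
    for _ in range(iters):
        beta -= lr * grad(beta)
    return beta

beta_hat = gradient_descent(beta0=3.0)
print(beta_hat)  # close to the analytic minimizer 9/4 = 2.25
```

Note that the step size matters: too large a value can overshoot and diverge, while too small a value converges slowly.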

In my previous article, we talked about implementations of linear regression models in R, Python, and SAS. On the theoretical side, however, I only briefly mentioned the estimation procedure for the parameter $\boldsymbol{\beta}$. So to help us understand how software carries out the estimation, we'll look at the mathematics behind it. We will also perform the estimation manually in R and in Python; that is, we're not going to use any special packages, which will help us appreciate the theory.

Linear Least Squares
Consider the linear regression model,
\[
y_i=f_i(\mathbf{x}|\boldsymbol{\beta})+\varepsilon_i,\quad\mathbf{x}_i=\left[
\begin{array}{cccc}
1&x_{i1}&\cdots&x_{ip}
\end{array}\right],\quad\boldsymbol{\beta}=\left[\begin{array}{c}\beta_0\\\beta_1\\\vdots\\\beta_p\end{array}\right],
\]
where $y_i$ is the response or the dependent variable at the $i$th case, $i=1,\cdots, N$. The $f_i(\mathbf{x}|\boldsymbol{\beta})$ is the deterministic part of the model that dep…

Consider the linear regression model,
$$
y_i=f_i(\boldsymbol{x}|\boldsymbol{\beta})+\varepsilon_i,
$$
where $y_i$ is the response or the dependent variable at the $i$th case, $i=1,\cdots, N$ and the predictor or the independent variable is the $\boldsymbol{x}$ term defined in the mean function $f_i(\boldsymbol{x}|\boldsymbol{\beta})$. For simplicity, consider the following simple linear regression (SLR) model,
$$
y_i=\beta_0+\beta_1x_i+\varepsilon_i.
$$
To obtain the best estimates of $\beta_0$ and $\beta_1$, we minimize the residual sum of squares (RSS), given by
$$
S=\sum_{i=1}^{n}\varepsilon_i^2=\sum_{i=1}^{n}(y_i-\beta_0-\beta_1x_i)^2.
$$
Now suppose we want to fit the model to the following data, Average Heights and Weights for American Women, where weight is the response and height is the predictor. The data is available in R by default.
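As a sketch of the manual computation, minimizing $S$ gives the familiar closed-form solution $\hat{\beta}_1=\sum(x_i-\bar{x})(y_i-\bar{y})\big/\sum(x_i-\bar{x})^2$ and $\hat{\beta}_0=\bar{y}-\hat{\beta}_1\bar{x}$, which can be coded directly. Below is a package-free Python version using the values of R's built-in `women` dataset:

```python
# Manual simple linear regression: weight ~ height,
# using the 15 observations of R's built-in `women` dataset.
heights = list(range(58, 73))  # 58, 59, ..., 72 inches
weights = [115, 117, 120, 123, 126, 129, 132, 135,
           139, 142, 146, 150, 154, 159, 164]  # pounds

n = len(heights)
x_bar = sum(heights) / n
y_bar = sum(weights) / n

# Least-squares estimates: beta1 = S_xy / S_xx, beta0 = y_bar - beta1 * x_bar
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))
s_xx = sum((x - x_bar) ** 2 for x in heights)
beta1 = s_xy / s_xx
beta0 = y_bar - beta1 * x_bar
print(beta0, beta1)  # approximately -87.5167 and 3.45
```

These match the coefficients reported by R's `lm(weight ~ height, data = women)`.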

A family of pdfs or pmfs $\{g(t|\theta):\theta\in\Theta\}$ for a univariate random variable $T$ with real-valued parameter $\theta$ has a monotone likelihood ratio (MLR) if, for every $\theta_2>\theta_1$, $g(t|\theta_2)/g(t|\theta_1)$ is a monotone (nonincreasing or nondecreasing) function of $t$ on $\{t:g(t|\theta_1)>0\;\text{or}\;g(t|\theta_2)>0\}$. Note that $c/0$ is defined as $\infty$ if $c>0$.
Consider testing $H_0:\theta\leq \theta_0$ versus $H_1:\theta>\theta_0$. Suppose that $T$ is a sufficient statistic for $\theta$ and the family of pdfs or pmfs $\{g(t|\theta):\theta\in\Theta\}$ of $T$ has an MLR. Then for any $t_0$, the test that rejects $H_0$ if and only if $T >t_0$ is a UMP level $\alpha$ test, where $\alpha=P_{\theta_0}(T >t_0)$ (the Karlin-Rubin theorem).
Example 1
To better understand the theorem, consider a single observation, $X$, from $\mathrm{n}(\theta,1)$, and test the following hypotheses:
$$
H_0:\theta\leq \theta_0\quad\mathrm{versus}\quad H_1:\theta>\theta_0.
$$
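To make the theorem concrete: the power function of the test that rejects when $X>t_0$ is $\beta(\theta)=P_\theta(X>t_0)=1-\Phi(t_0-\theta)$, and choosing $t_0=\theta_0+z_{1-\alpha}$ gives size $\alpha$. A quick standard-library Python sketch (taking $\theta_0=0$ and $\alpha=0.05$ as illustrative choices):

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)
alpha, theta0 = 0.05, 0.0
t0 = theta0 + std_normal.inv_cdf(1 - alpha)  # critical value z_{1-alpha}

def power(theta):
    # P_theta(X > t0) for a single observation X ~ N(theta, 1)
    return 1 - std_normal.cdf(t0 - theta)

print(round(power(theta0), 3))  # size of the test: 0.05
print(power(2.0))               # power increases as theta moves past theta0
```

By the theorem, no other level $\alpha$ test has higher power at any $\theta>\theta_0$.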

More on the Likelihood Ratio Test: the following problem is originally from Casella and Berger (2001), Exercise 8.12.

Problem
For samples of size $n=1,4,16,64,100$ from a normal population with mean $\mu$ and known variance $\sigma^2$, plot the power function of the following LRTs (Likelihood Ratio Tests). Take $\alpha = .05$.
1. $H_0:\mu\leq 0$ versus $H_1:\mu>0$
2. $H_0:\mu=0$ versus $H_1:\mu\neq 0$
Solution
The LRT statistic is given by
$$
\lambda(\mathbf{x})=\frac{\displaystyle\sup_{\mu\leq 0}\mathcal{L}(\mu|\mathbf{x})}{\displaystyle\sup_{-\infty<\mu<\infty}\mathcal{L}(\mu|\mathbf{x})}, \;\text{since }\sigma^2\text{ is known}.
$$
The denominator can be expanded as follows:
$$
\begin{aligned}
\sup_{-\infty<\mu<\infty}\mathcal{L}(\mu|\mathbf{x})&=\sup_{-\infty<\mu<\infty}\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\sigma}\exp\left[-\frac{(x_i-\mu)^2}{2\sigma^2}\right]\\
&=\sup_{-\infty<\mu<\infty}\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[-\frac{1}{2\sigma^2}\displaystyle\sum_{i=1}^{n}(x_i-\mu)^2\right]\\
&=\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\bar{x})^2\right],
\end{aligned}
$$
since the inner supremum is attained at the MLE $\hat{\mu}=\bar{x}$.
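For the one-sided hypotheses, the LRT reduces to the familiar z-test rejecting when $\sqrt{n}\,\bar{x}/\sigma>z_{1-\alpha}$, with power function $\beta(\mu)=1-\Phi\left(z_{1-\alpha}-\sqrt{n}\,\mu/\sigma\right)$. A standard-library Python sketch of how the power curves sharpen with $n$ (taking $\sigma=1$ for illustration, since only $\sqrt{n}\,\mu/\sigma$ matters):

```python
import math
from statistics import NormalDist

std_normal = NormalDist()
alpha = 0.05
z = std_normal.inv_cdf(1 - alpha)  # z_{1-alpha}
sigma = 1.0  # assumed known standard deviation

def power(mu, n):
    # Power of the one-sided z-test rejecting when sqrt(n)*xbar/sigma > z
    return 1 - std_normal.cdf(z - math.sqrt(n) * mu / sigma)

for n in (1, 4, 16, 64, 100):
    # at any fixed mu > 0, power climbs toward 1 as n grows
    print(n, round(power(0.5, n), 3))
```

Plotting `power(mu, n)` over a grid of $\mu$ values for each sample size reproduces the requested power curves.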

Another post for mathematical statistics, the problem below is originally from Casella and Berger (2001) (see Reference 1), exercise 8.6.

Problem
Suppose that we have two independent random samples: $X_1,\cdots, X_n$ are exponential($\theta$), and $Y_1,\cdots, Y_m$ are exponential($\mu$).
(a) Find the LRT (Likelihood Ratio Test) of $H_0:\theta=\mu$ versus $H_1:\theta\neq\mu$.
(b) Show that the test in part (a) can be based on the statistic
$$
T=\frac{\sum X_i}{\sum X_i+\sum Y_i}.
$$
(c) Find the distribution of $T$ when $H_0$ is true.
Solution
The Likelihood Ratio Test is given by
$$
\lambda(\mathbf{x},\mathbf{y}) = \frac{\displaystyle\sup_{\theta = \mu,\mu>0}\mathrm{P}(\mathbf{x},\mathbf{y}|\theta,\mu)}{\displaystyle\sup_{\theta > 0,\mu>0}\mathrm{P}(\mathbf{x}, \mathbf{y}|\theta,\mu)},
$$
where the denominator is evaluated as follows:
$$
\sup_{\theta > 0,\mu>0}\mathrm{P}(\mathbf{x}, \mathbf{y}|\theta,\mu)=
\sup_{\theta > 0}\mathrm{P}(\mathbf{x}|\theta)\sup_{\mu > 0}\mathrm{P}(\mathbf{y}|\mu),
$$
with each supremum attained at the respective MLEs $\hat{\theta}=\bar{x}$ and $\hat{\mu}=\bar{y}$.
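For part (c), the answer turns out to be that $T\sim\mathrm{Beta}(n,m)$ under $H_0$, since $\sum X_i\sim\mathrm{Gamma}(n,\theta)$ and $\sum Y_i\sim\mathrm{Gamma}(m,\theta)$ are independent. This can be checked by simulation with only the Python standard library; the sample sizes $n=3$, $m=5$ and common mean $\theta=2$ below are arbitrary illustrative choices:

```python
import random

random.seed(42)
n, m, theta = 3, 5, 2.0  # illustrative sample sizes and common mean under H0

def simulate_T():
    # Under H0, the X's and Y's share the same exponential(theta) distribution;
    # expovariate takes the rate 1/theta (theta is the mean).
    sx = sum(random.expovariate(1 / theta) for _ in range(n))
    sy = sum(random.expovariate(1 / theta) for _ in range(m))
    return sx / (sx + sy)

draws = [simulate_T() for _ in range(100_000)]
mean_T = sum(draws) / len(draws)
print(mean_T)  # should be close to the Beta(n, m) mean n/(n+m) = 0.375
```

A histogram of `draws` against the Beta($n$, $m$) density gives a fuller check than the mean alone.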