almost stochastic

2016/10/26

Say that you have a dynamical process of interest $X_1,\ldots,X_n$ and you can only observe the process with some noise, i.e., you get an observation sequence $Y_1,\ldots,Y_n$. What is the optimal way to estimate $X_n$ conditioned on the whole sequence of observations $Y_{1:n}$?
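The general answer is the conditional distribution of $X_n$ given $Y_{1:n}$. In the special linear-Gaussian case, this distribution is Gaussian and its mean (the mean-square optimal estimate) can be computed recursively by the Kalman filter. Below is a minimal scalar sketch. It assumes a hypothetical model $X_n = a X_{n-1} + U_n$, $Y_n = X_n + V_n$ with independent noises $U_n \sim \mathcal{N}(0, q)$, $V_n \sim \mathcal{N}(0, r)$ and prior $X_0 \sim \mathcal{N}(m_0, p_0)$. The names `kalman_filter_1d`, `a`, `q`, `r`, `m0`, `p0` are illustrative, not taken from the post.

```python
import numpy as np

def kalman_filter_1d(ys, a, q, r, m0=0.0, p0=1.0):
    """Scalar Kalman filter for the assumed (hypothetical) model
    X_n = a X_{n-1} + N(0, q),  Y_n = X_n + N(0, r),  X_0 ~ N(m0, p0).
    Returns the filtering means E[X_n | Y_{1:n}] and their variances."""
    means, variances = [], []
    m, p = m0, p0
    for y in ys:
        # Predict: propagate the previous posterior through the dynamics.
        m_pred = a * m
        p_pred = a * a * p + q
        # Update: combine the prediction with the new observation y.
        k = p_pred / (p_pred + r)  # Kalman gain
        m = m_pred + k * (y - m_pred)
        p = (1.0 - k) * p_pred
        means.append(m)
        variances.append(p)
    return np.array(means), np.array(variances)

# Simulate the assumed model and compare the filter with the raw observations.
rng = np.random.default_rng(0)
a, q, r, n = 0.9, 0.1, 0.5, 200
x = np.zeros(n)
y = np.zeros(n)
state = rng.normal(0.0, 1.0)  # X_0 ~ N(0, 1), matching the defaults m0, p0
for k in range(n):
    state = a * state + rng.normal(0.0, np.sqrt(q))
    x[k] = state
    y[k] = state + rng.normal(0.0, np.sqrt(r))

means, _ = kalman_filter_1d(y, a, q, r)
mse_filter = np.mean((means - x) ** 2)
mse_raw = np.mean((y - x) ** 2)
print(mse_filter, mse_raw)
```

Outside the linear-Gaussian case, the filtering distribution usually has no closed form, and approximations such as particle filters are the usual route.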

2016/09/23

If I give you a function on $[0,1]$ and a computer and ask you to find its minimum, what would you do? Since you have the computer, you can be lazy: just build a grid on $[0,1]$, evaluate the function at the grid points and take the smallest value. This will give you something close to the true minimum. But how close?
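A minimal sketch of the lazy approach, using a uniform grid of $N$ points (the function name and test function are our own choices). If $f$ is also assumed to be Lipschitz with constant $L$, every point of $[0,1]$ lies within $h/2$ of a grid point, where $h = 1/(N-1)$ is the spacing. The grid minimum then exceeds the true minimum by at most $Lh/2$.

```python
import numpy as np

def grid_minimum(f, n_points):
    """Evaluate f on a uniform grid of n_points in [0, 1] and return
    the location and value of the smallest evaluation."""
    xs = np.linspace(0.0, 1.0, n_points)
    values = f(xs)  # f is assumed to accept NumPy arrays
    i = np.argmin(values)
    return xs[i], values[i]

# Hypothetical test function: |x - 1/3| has minimum 0 at x = 1/3 and L = 1.
f = lambda x: np.abs(x - 1.0 / 3.0)
for n in (11, 101, 1001):
    _, v = grid_minimum(f, n)
    h = 1.0 / (n - 1)
    print(n, v, h / 2)  # the error v - 0 never exceeds L * h / 2
```

Without some regularity assumption like this, no finite grid can guarantee anything: $f$ could have a narrow dip between two grid points.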

2016/01/17

Suppose that you sample from a probability measure $\pi$ to estimate the expectation $\pi(f) := \int f(x) \pi(\mbox{d}x)$ and form an estimate $\pi^N(f)$ from $N$ samples. How close is this estimate to the true expectation $\pi(f)$?
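In the simplest setting, $\pi^N$ is the empirical measure of $N$ i.i.d. samples from $\pi$, so $\pi^N(f) = \frac{1}{N}\sum_{i=1}^N f(X_i)$. There the mean squared error is exactly $\mathrm{Var}_\pi(f)/N$. The sketch below checks this numerically for the assumed choice $\pi = \mathcal{N}(0,1)$ and $f(x) = x^2$, so that $\pi(f) = 1$ and $\mathrm{Var}_\pi(f) = 2$.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_estimate(f, sampler, n):
    """Plain Monte Carlo estimate pi^N(f) = (1/N) sum_i f(X_i), X_i ~ pi i.i.d."""
    return np.mean(f(sampler(n)))

f = lambda x: x ** 2                        # pi(f) = E[X^2] = 1 for X ~ N(0, 1)
sampler = lambda n: rng.standard_normal(n)  # i.i.d. draws from pi

n, reps = 1000, 500
errors = np.array([mc_estimate(f, sampler, n) - 1.0 for _ in range(reps)])
rmse = np.sqrt(np.mean(errors ** 2))
print(rmse, np.sqrt(2.0 / n))  # Var_pi(f) = E[X^4] - 1 = 2
```

For dependent samples (e.g. from MCMC) or weighted samples (importance sampling, particle methods), the error constant changes, but the $1/\sqrt{N}$ rate is typical.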

2015/09/07

I submitted a preprint on matrix factorisations and linear filters, in which I derived some factorisation algorithms as linear filtering algorithms. In the paper, I deferred one discussion to this post: estimating parameters by maximising the marginal likelihood. So here it is.

2015/06/19

Around April this year, I was reading about quasi-Newton methods (from this very nice paper by Philipp Hennig), and when I saw the derivation of the Broyden update, I immediately realized that the idea could be used for computing matrix factorizations. Furthermore, it would lead to an online scheme, which is preferable!

Although the basic idea was clear, I set a few goals. First, I wanted a method in which one can sample any column of the dataset and use it immediately, so I modified the notation a bit, as you can see from Eq. (2) in the manuscript. Second, one should be able to use mini-batches as well, i.e., a group of columns at each time step. Third, a modern matrix factorization method must handle missing data, so I had to extend the algorithm accordingly. In the end, I sorted out all of this except a rule for missing data with mini-batches, which turned out to be harder, so I left it out of this work.

2015/03/08

I was tinkering with the logistic map $x_{n+1} = a x_n (1 - x_n)$ today and wondered what happens if I plot the histogram of the generated sequence $(x_n)_{n\geq 0}$. Can it have any statistical properties?
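A minimal sketch of the experiment, for the particular choice $a = 4$ (other values of $a$ behave very differently). For $a = 4$, the map has the known invariant arcsine density $1/(\pi\sqrt{x(1-x)})$ on $(0,1)$, with CDF $F(x) = \frac{2}{\pi}\arcsin\sqrt{x}$. The code compares histogram bin frequencies against it. The starting point `x0` and the burn-in length are arbitrary choices. A floating-point orbit only approximates a true orbit of the map.

```python
import numpy as np

def logistic_orbit(a, x0, n, burn_in=1000):
    """Iterate x_{k+1} = a x_k (1 - x_k); discard burn_in steps, return the next n values."""
    x = x0
    for _ in range(burn_in):
        x = a * x * (1.0 - x)
    out = np.empty(n)
    for k in range(n):
        x = a * x * (1.0 - x)
        out[k] = x
    return out

xs = logistic_orbit(a=4.0, x0=0.1234, n=200_000)
counts, edges = np.histogram(xs, bins=20, range=(0.0, 1.0))

# Bin probabilities under the arcsine law F(x) = (2 / pi) * arcsin(sqrt(x)).
F = lambda x: 2.0 / np.pi * np.arcsin(np.sqrt(x))
expected = F(edges[1:]) - F(edges[:-1])
for lo, hi, emp, th in zip(edges[:-1], edges[1:], counts / len(xs), expected):
    print(f"[{lo:.2f}, {hi:.2f})  empirical {emp:.4f}  arcsine {th:.4f}")
```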

2015/03/04

Suppose we have a continuous random variable $X \sim p(x)$ and we would like to estimate its tail probability, i.e. the probability of the event $\{X \geq t\}$ for some $t \in \mathbb{R}$. What is the most intuitive way to do this?
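The most direct approach is naive Monte Carlo: draw samples of $X$ and report the fraction that satisfy $X \geq t$. Below is a minimal sketch. It assumes $X \sim \mathcal{N}(0,1)$ so that the estimate can be checked against the exact value $\frac{1}{2}\operatorname{erfc}(t/\sqrt{2})$. The function name is our own.

```python
import math
import numpy as np

def tail_prob_mc(sampler, t, n):
    """Naive Monte Carlo estimate of P(X >= t): the fraction of n samples that are >= t."""
    x = sampler(n)
    return np.mean(x >= t)

rng = np.random.default_rng(1)
t = 2.0
estimate = tail_prob_mc(rng.standard_normal, t, 1_000_000)
exact = 0.5 * math.erfc(t / math.sqrt(2.0))  # P(X >= t) for X ~ N(0, 1)
print(estimate, exact)
```

The relative error of this estimator is about $\sqrt{(1-p)/(Np)}$ for $p = P(X \geq t)$, so it degrades quickly in the far tail. For $t = 6$, $p$ is roughly $10^{-9}$, and a million samples will almost surely contain no hits at all.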

2014/06/12

Fisher's identity is useful in maximum-likelihood parameter estimation problems. In this post, I give its proof. The main reference is Douc, Moulines and Stoffer, Nonlinear Time Series: Theory, Methods and Applications.
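For orientation, here is a common statement of the identity with the standard one-line argument. The notation is our own: latent variable $x$, observation $y$, joint density $p_\theta(x, y)$, marginal $p_\theta(y) = \int p_\theta(x, y)\,\mbox{d}x$. The argument assumes enough regularity to differentiate under the integral sign.

```latex
\begin{aligned}
\nabla_\theta \log p_\theta(y)
  &= \frac{\nabla_\theta \int p_\theta(x, y)\, \mbox{d}x}{p_\theta(y)}
   = \int \nabla_\theta \log p_\theta(x, y)\, \frac{p_\theta(x, y)}{p_\theta(y)}\, \mbox{d}x \\
  &= \int \nabla_\theta \log p_\theta(x, y)\, p_\theta(x \mid y)\, \mbox{d}x .
\end{aligned}
```

In words, the score of the marginal likelihood is the posterior expectation of the complete-data score. This is what makes it usable when only the joint density is tractable.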

2014/01/30

Introduction

In this post, I review convergence proofs for gradient algorithms. The main reference is Leon Bottou, "Online Learning and Stochastic Approximations". I rewrite the proofs from Bottou's paper, adding more detail at the points I found subtle, and I have tried to write them as clearly as possible so that they are accessible to everyone.
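A stochastic gradient algorithm updates $w_{t+1} = w_t - \gamma_t\, g(w_t, z_t)$, where $g(w, z)$ is a noisy gradient computed from a single random example $z_t$. Step sizes satisfying $\sum_t \gamma_t = \infty$ and $\sum_t \gamma_t^2 < \infty$ are the classical conditions in such convergence proofs. Below is a minimal Python sketch. It uses the hypothetical objective $C(w) = \mathbb{E}[(w - Z)^2]/2$ with $Z \sim \mathcal{N}(3, 1)$ (minimiser $w^\star = 3$) and $\gamma_t = 1/t$, which satisfies both conditions.

```python
import numpy as np

def sgd(grad, w0, samples, step):
    """Stochastic gradient descent: w_{t+1} = w_t - step(t) * grad(w_t, z_t)."""
    w = w0
    for t, z in enumerate(samples, start=1):
        w = w - step(t) * grad(w, z)
    return w

# Hypothetical example: minimise C(w) = E[(w - Z)^2] / 2 with Z ~ N(3, 1).
# The per-sample gradient of (w - z)^2 / 2 is w - z.
rng = np.random.default_rng(2)
zs = rng.normal(3.0, 1.0, size=10_000)
w_final = sgd(lambda w, z: w - z, w0=0.0, samples=zs, step=lambda t: 1.0 / t)
print(w_final)
```

With $\gamma_t = 1/t$, the iterate after $T$ steps is exactly the sample mean of $z_1, \ldots, z_T$. That makes this toy case easy to check, but it is far simpler than the general setting the proofs address.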