almost stochastic: a blog on probability. By Deniz (http://www.blogger.com/profile/06043980704154563908).

A note on estimating chaotic systems (2016-11-28)

This is a short note, unlike other posts, but it is necessary: recently I came across some complex-systems presentations on YouTube. Most of the scientists, especially complex-systems scientists, treat chaotic dynamical systems as mysterious objects. The models are beautiful (so you should be careful around them), and they are not that difficult to deal with in the real world. The typical claim you hear is that a chaotic system is extremely sensitive to uncertainty in the initial condition. That's true: if I give you the initial condition of a deterministic chaotic system with a machine-epsilon uncertainty, along with the exact dynamics, the trajectory you predict will differ vastly from the real one after a while. This, they then say, can be counted as evidence of undecidability in chaotic systems.

Read more: http://www.almoststochastic.com/2016/11/a-note-on-estimating-chaotic-systems.html

A primer on filtering (2016-10-26)

Here, I discuss the core of the filtering idea in relatively simple language. I will not introduce particle filters here, but by the end you should have a solid idea of what they are aiming at. In the following, I assume some familiarity with probability densities and fundamental rules (e.g. marginalisation, conditional independence, and the difference between a random variable and its realisation).

Read more: http://www.almoststochastic.com/2016/10/a-primer-on-filtering.html

A simple bound for optimisation using a grid (2016-09-23)

If I give you a function on $[0,1]$ and a computer and ask you to find the minimum, what would you do? Since you have the computer, you can be lazy: just form a grid on $[0,1]$, evaluate the grid points, and take the minimum. This will give you something close to the true minimum. But how close?

Read more: http://www.almoststochastic.com/2016/09/a-simple-bound-for-optimisation-using.html

An $L_2$ bound for Perfect Monte Carlo (2016-01-17)

Monte Carlo methods are widely used for estimating expectations under complicated probability distributions. Here I provide the well-known $L_2$ bound.

Read more: http://www.almoststochastic.com/2016/01/an-l2-bound-for-perfect-monte-carlo.html
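As a quick numerical illustration of the bound in the excerpt above (a minimal sketch I am adding here, not code from the post): for i.i.d. samples, the $L_2$ error of the plain Monte Carlo estimator decays like $\sigma/\sqrt{N}$, which is easy to check empirically.

```python
import numpy as np

rng = np.random.default_rng(0)

# Estimate I = E[X^2] for X ~ Uniform(0,1); the true value is 1/3.
# Var(X^2) = E[X^4] - (E[X^2])^2 = 1/5 - 1/9 = 4/45.
true_value = 1.0 / 3.0
sigma = np.sqrt(4.0 / 45.0)

for N in [10**2, 10**3, 10**4]:
    # RMSE over 1000 independent replications of the N-sample estimator.
    estimates = (rng.uniform(size=(1000, N)) ** 2).mean(axis=1)
    rmse = np.sqrt(np.mean((estimates - true_value) ** 2))
    print(f"N={N:6d}  rmse={rmse:.5f}  sigma/sqrt(N)={sigma / np.sqrt(N):.5f}")
```

The printed RMSE should track $\sigma/\sqrt{N}$ closely, in line with the bound.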
Matrix Factorisation with Linear Filters (and discussion) (2015-09-07)

I submitted a preprint on matrix factorisations and linear filters (http://arxiv.org/abs/1509.02088). I managed to derive some factorisation algorithms as linear filtering algorithms.

In the paper, I deferred one discussion to here: estimating parameters via maximising the marginal likelihood. So here it is.

In the paper, I consider the following probabilistic model, \begin{align} p(c) &= \NPDF(c; c_0, V_0 \otimes I_m), \label{priorC} \\ p(y_k|c,x_k) &= \NPDF(y_k; (x_k^\top \otimes I_m) c, \lambda \otimes I_m), \label{LikelihoodCform1} \end{align} where each $x_k$ is a static model parameter vector and $C$ is the latent matrix. Note that $(x_k^\top \otimes I_m) c = C x_k$ (see the paper). We would like to solve the following problem to estimate $x_k$ for each $k$: \begin{align*} x_k^* = \argmax_{x_k} \log p(y_k | y_{1:k-1},x_k). \end{align*} So we need the incremental marginal likelihood $p(y_k | y_{1:k-1},x_k)$, which we define as \begin{align*} p(y_k | y_{1:k-1},x_k) = \int p(y_k | c, x_k) p(c | y_{1:k-1}) \mbox{d}c, \end{align*} where $c = \vect(C)$. The important point here is that $p(c | y_{1:k-1})$ is not the exact posterior but an approximation to it, since one has to use $x_{k-1}^*$ to obtain the distribution (if $X$ were known, it would be exact). From the paper, we know \begin{align*} p(c | y_{1:k-1}) = \NPDF(c; c_{k-1}, V_{k-1} \otimes I_m), \end{align*} and \begin{align*} p(y_k | c, x_k) = \NPDF(y_k; (x_k^\top \otimes I_m) c, \lambda \otimes I_m). \end{align*} Using the formulas in Bishop's PRML (2006, page 93), we arrive at the marginal $p(y_k|y_{1:k-1},x_k)$ as \begin{align*} p(y_k | y_{1:k-1},x_k) = \NPDF(y_k; (x_k^\top \otimes I_m) c_{k-1}, (\lambda \otimes I_m) + (x_k^\top \otimes I_m) (V_{k-1} \otimes I_m) (x_k \otimes I_m)). \end{align*} Using the properties given in Section I.A of the paper, we compactly obtain \begin{align}\label{MarginalLikelihoodOfMFRLF} p(y_k | y_{1:k-1},x_k) = \NPDF(y_k; C_{k-1} x_k, (\lambda + x_k^\top V_{k-1} x_k) \otimes I_m). \end{align} Although the integration is analytic, maximising the logarithm of \eqref{MarginalLikelihoodOfMFRLF} analytically is not possible. However, the likelihood is differentiable, so one can use nonlinear optimisation methods. In terms of performance, though, this brings no advantage: one gains a bit in overall error but loses too much in computation time. I tried quasi-Newton methods (BFGS) and conjugate gradients, to no avail.
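To make that remark concrete, here is a minimal numerical sketch of maximising the log of \eqref{MarginalLikelihoodOfMFRLF} with an off-the-shelf optimiser. This is my own toy illustration, not code from the paper: the dimensions and the stand-ins for $C_{k-1}$, $V_{k-1}$ and $y_k$ are made up, and $\lambda$ is taken to be a scalar.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
m, n, lam = 5, 3, 0.1             # toy dimensions and noise scale (assumed)
C_prev = rng.normal(size=(m, n))  # stands in for C_{k-1}
V_prev = np.eye(n)                # stands in for V_{k-1}
y_k = rng.normal(size=m)

def neg_log_marginal(x):
    # -log N(y_k; C_{k-1} x, (lam + x^T V_{k-1} x) I_m), up to constants.
    s = lam + x @ V_prev @ x      # scalar predictive variance
    r = y_k - C_prev @ x          # predictive residual
    return 0.5 * m * np.log(s) + 0.5 * (r @ r) / s

res = minimize(neg_log_marginal, x0=np.zeros(n), method="BFGS")
print(res.x, res.fun)
```

Each evaluation is cheap, but running an inner optimisation at every $k$ is what makes the overall scheme slow in practice.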
Online Matrix Factorization via Broyden Updates (2015-06-19)

I arXived a new preprint titled Online Matrix Factorization via Broyden Updates (http://arxiv.org/abs/1506.04389).

Around this April, I was reading about quasi-Newton methods (from this very nice paper by Philipp Hennig: http://jmlr.org/papers/volume14/hennig13a/hennig13a.pdf), and when I saw the derivation of the Broyden update, I immediately realized that the idea could be used for computing factorizations. Furthermore, it would lead to an online scheme, which is even more preferable!

The idea is to solve the following optimization problem at each iteration $k$: \begin{align*} \min_{x_k,C_k} \big\| y_k - C_k x_k \big\|_2^2 + \lambda \big\|C_k - C_{k-1}\big\|_F^2. \end{align*} The motivation behind this cost is in the manuscript.

Although the basic idea was explicit, I set a few goals. First of all, I wanted a method where one can sample any column of the dataset and use it immediately, so I modified the notation a bit, as you can see from Eq. (2) in the manuscript. Secondly, one should be able to use mini-batches as well, i.e. a group of columns at each time. Thirdly, it was obvious that a modern matrix factorization method must handle missing data, so I had to extend the algorithm accordingly. In the end, I sorted all of this out except a rule for missing data with mini-batches, which turned out to be harder, so I left it out of this work.
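One detail worth spelling out: for fixed $x_k$, setting the gradient of the cost above with respect to $C_k$ to zero gives the closed-form rank-one update $C_k = C_{k-1} + (y_k - C_{k-1} x_k)\, x_k^\top / (\lambda + \|x_k\|_2^2)$, which is exactly Broyden-flavoured. The sketch below alternates a plain least-squares solve for $x_k$ with that update; it is my own toy rendering under these assumptions, and the manuscript's actual algorithm may differ in its details.

```python
import numpy as np

def broyden_mf_step(C_prev, y_k, lam=1.0):
    """One online step: fit x_k against C_{k-1}, then correct C by a
    rank-one (Broyden-style) update.  A sketch, assuming the closed-form
    minimiser in C for fixed x_k."""
    # x_k = argmin_x ||y_k - C_{k-1} x||_2^2  (ordinary least squares)
    x_k, *_ = np.linalg.lstsq(C_prev, y_k, rcond=None)
    # C_k = C_{k-1} + (y_k - C_{k-1} x_k) x_k^T / (lam + ||x_k||^2)
    residual = y_k - C_prev @ x_k
    C_k = C_prev + np.outer(residual, x_k) / (lam + x_k @ x_k)
    return C_k, x_k

# Toy usage: stream the columns of Y = C* X* one at a time.
rng = np.random.default_rng(2)
C_true, X_true = rng.normal(size=(8, 3)), rng.normal(size=(3, 50))
Y = C_true @ X_true
C = rng.normal(size=(8, 3))
for k in range(Y.shape[1]):
    C, _ = broyden_mf_step(C, Y[:, k], lam=0.5)
```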
MH sampler for changepoint inference (2015-06-18)

The following Metropolis-Hastings exercise was a homework in a course I assisted.

In this post, we design an independent MH sampler to localise a changepoint in a relatively short time series, for tutorial purposes.

Read more: http://www.almoststochastic.com/2015/06/mh-sampler-for-changepoint-inference.html

Trapezoid rule as Bayesian inference (2015-03-17)

I learned a very interesting relation between numerical integration and Bayesian inference from Diaconis (http://www.probabilistic-numerics.org/assets/pdf/Diaconis_1988.pdf), which I briefly review in what follows.

Read more: http://www.almoststochastic.com/2015/03/trapezoid-rule-as-bayesian-inference.html

Tinkering around logistic map (2015-03-08)

While tinkering with the logistic map today, I accidentally discovered a property of it: the histogram of a sequence it generates is the same regardless of the input. It immediately struck me that there may be an underlying probability density for this map. In fact, after some Googling, I found out that Ulam and von Neumann studied this! So here we go.

Read more: http://www.almoststochastic.com/2015/03/tinkering-around-logistic-map.html

Monte Carlo as Intuition (2015-03-04)

I am a teaching assistant for a Monte Carlo methods course. I explained the following exercise as an application of the Monte Carlo principle. It is nice because it coincides with a very intuitive guess.

Read more: http://www.almoststochastic.com/2015/03/monte-carlo-as-intuition.html

Probabilistic models of nonnegative matrix factorisation (2014-07-10)

Introduction

I wrote this post last year and thought it would be good to publish it here. We give a brief review of probabilistic models of nonnegative matrix factorisation (NMF). We mainly list the papers which are important for gaining intuition, and sketch the main ideas without too much mathematical detail.

Read more: http://www.almoststochastic.com/2014/07/probabilistic-models-of-nonnegative.html
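To give a flavour of the models that review covers, here is a minimal sketch of arguably the most standard one, the Poisson (KL) NMF model, under which maximum likelihood recovers the Lee-Seung KL updates. This is my own illustration, not code from the post.

```python
import numpy as np

rng = np.random.default_rng(3)

# Generative model: W, H nonnegative, V_ij ~ Poisson([WH]_ij).
m, n, r = 20, 30, 4
W = rng.gamma(shape=2.0, scale=1.0, size=(m, r))
H = rng.gamma(shape=2.0, scale=1.0, size=(r, n))
V = rng.poisson(W @ H)

# Multiplicative EM updates for the KL/Poisson objective.
W_est = rng.uniform(0.5, 1.5, size=(m, r))
H_est = rng.uniform(0.5, 1.5, size=(r, n))
for _ in range(200):
    ratio = V / (W_est @ H_est)
    H_est *= (W_est.T @ ratio) / W_est.sum(axis=0)[:, None]
    ratio = V / (W_est @ H_est)
    W_est *= (ratio @ H_est.T) / H_est.sum(axis=1)[None, :]
```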
Fisher's Identity (2014-06-12)

Fisher's identity is useful in maximum-likelihood parameter estimation problems. In this post, I give its proof. The main reference is Douc, Moulines and Stoffer, Nonlinear Time Series: Theory, Methods and Applications.

Read more: http://www.almoststochastic.com/2014/06/fishers-identity.html

Batch MLE for the GARCH(1,1) model (2014-06-04)

Introduction

In this post, we derive the batch MLE procedure for the GARCH model in a more principled way than the last GARCH post (http://www.almoststochastic.com/2013/07/static-parameter-estimation-for-garch.html). The derivation presented here is simple and concise.

Read more: http://www.almoststochastic.com/2014/06/batch-mle-for-garch11-model.html

Convergence of gradient descent algorithms (2014-01-30)

Introduction

In this post, I review the convergence proofs of gradient algorithms. Our main reference is Leon Bottou, Online Learning and Stochastic Approximations. I rewrite the proofs described in Bottou's paper, but with more detail on the points that were subtle to me. I tried to write the proofs as clearly as possible, so as to make them accessible to everyone.

Read more: http://www.almoststochastic.com/2014/01/convergence-of-gradient-descent.html

Fatou's lemma and monotone convergence theorem (2013-11-18)

In this post, we deduce Fatou's lemma and the monotone convergence theorem (MCT) from each other.

Read more: http://www.almoststochastic.com/2013/11/fatous-lemma-and-monotone-convergence.html

Young's, Hölder's and Minkowski's Inequalities (2013-11-14)

In this post, we prove Young's, Hölder's and Minkowski's inequalities in full detail. We prove Hölder's inequality using Young's inequality, and then prove Minkowski's inequality using Hölder's.

Read more: http://www.almoststochastic.com/2013/11/youngs-holders-and-minkowskis.html
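As a reminder of how the chain of implications in that post runs (a compressed sketch I am adding; the post has the full details, and this assumes $\|f\|_p, \|g\|_q$ are finite and nonzero):

```latex
% Young's inequality: ab \le a^p/p + b^q/q for a, b \ge 0 and 1/p + 1/q = 1.
% Apply it pointwise with a = |f(x)|/\|f\|_p, b = |g(x)|/\|g\|_q, then integrate:
\[
  \int \frac{|f g|}{\|f\|_p \|g\|_q} \,\mathrm{d}\mu
  \;\le\; \frac{1}{p} \int \frac{|f|^p}{\|f\|_p^p} \,\mathrm{d}\mu
        + \frac{1}{q} \int \frac{|g|^q}{\|g\|_q^q} \,\mathrm{d}\mu
  \;=\; \frac{1}{p} + \frac{1}{q} \;=\; 1,
\]
% i.e. \|fg\|_1 \le \|f\|_p \|g\|_q, which is Hölder's inequality.
```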
Sequential importance sampling-resampling (2013-08-20)

Introduction

In this post, we review sequential importance sampling-resampling for state-space models. These algorithms are also known as particle filters. We give a derivation of these filters and their application to general state-space models.

Read more: http://www.almoststochastic.com/2013/08/sequential-importance-sampling.html

Importance sampling (2013-07-30)

Introduction

This short note reviews importance sampling. The discussion is adapted from http://dl.dropboxusercontent.com/u/9787379/cmpe58n/cmpe58n-lecture-notes.pdf and http://dl.dropboxusercontent.com/u/9787379/cmpe58n/mc-lecture02.pdf.

Read more: http://www.almoststochastic.com/2013/07/importance-sampling.html

Static Parameter Estimation for the GARCH model (2013-07-22)

Introduction

In this post, we review online maximum-likelihood parameter estimation for the GARCH model, a dynamic variance model. GARCH can be seen as a toy volatility model and is used as a textbook example for financial time series modelling.

Read more: http://www.almoststochastic.com/2013/07/static-parameter-estimation-for-garch.html

On the Poisson Random Variables (2013-07-15)

Introduction.

In this post, we give insights and theorems on Poisson random variables. Our main reference is: Poisson Processes, J. F. C. Kingman, Oxford Studies in Probability (1993).

Read more: http://www.almoststochastic.com/2013/07/on-poisson-random-variables.html

Nonnegative Matrix Factorization (2013-06-22)

Introduction.

In this post, we derive nonnegative matrix factorization (NMF). We derive the multiplicative updates from a gradient descent point of view, using the treatment of Lee and Seung, Algorithms for Nonnegative Matrix Factorization. The code for this blogpost can be accessed from https://github.com/odakyildiz/Nonnegative-MF, and a sketch of the updates follows below.

Read more: http://www.almoststochastic.com/2013/06/nonnegative-matrix-factorization.html
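Here is a minimal sketch of the Lee-Seung multiplicative updates for the Euclidean (Frobenius) cost, added for illustration; the repository linked above contains the post's actual code, which may differ in details such as initialisation and stopping.

```python
import numpy as np

def nmf_multiplicative(V, r, n_iter=500, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates for min ||V - WH||_F^2 with
    W, H >= 0.  A minimal sketch, not the post's reference code."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.uniform(0.1, 1.0, size=(m, r))
    H = rng.uniform(0.1, 1.0, size=(r, n))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # H-update
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # W-update
    return W, H

# Toy usage on a random nonnegative matrix.
V = np.abs(np.random.default_rng(4).normal(size=(30, 40)))
W, H = nmf_multiplicative(V, r=5)
print(np.linalg.norm(V - W @ H))
```

The small constant `eps` in each denominator is a common numerical safeguard against division by zero and does not change the fixed points in practice.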
Finite probability with example (2013-06-13)

Introduction.

In this post, we give the definitions of sample space, probability measure and random variable. We give these definitions using a very simple example: the space of two coin tosses. Note that the definitions in this note are for finite probability spaces, and the example simplifies everything significantly. This note is mostly based on Shreve's Stochastic Calculus for Finance, vol. I, Chapter 2 and vol. II, Chapter 2.

Read more: http://www.almoststochastic.com/2013/06/finite-probability.html

The EM Algorithm (2013-05-25)

Introduction.

In this post, we review the Expectation-Maximization (EM) algorithm and its use for maximum-likelihood problems.

Read more: http://www.almoststochastic.com/2013/05/the-em-algorithm.html

Stochastic gradient descent (2013-05-23)

17/01/2017 update: While searching for something else, I came across my old blogpost on stochastic gradient descent (SGD), dating back to 23/05/2013. I found it a bit low-level and not very informative (this, in fact, is true for most posts from that year). Although many great posts on SGD have been published since then, I still wanted to update the version on this blog, so I decided to rewrite it from scratch.

Read more: http://www.almoststochastic.com/2013/05/stochastic-gradient-descent.html

Gaussianity, Least squares, Pseudoinverse (2013-05-20)

Introduction.

In this post, we show the relationship between the Gaussian observation model, least squares and the pseudoinverse. We start with a Gaussian observation model and then move to least-squares estimation. Then we show that the least-squares solution corresponds to the pseudoinverse operation.

Read more: http://www.almoststochastic.com/2013/05/gaussianity-least-squares-pseudoinverse.html
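That correspondence is easy to verify numerically. A minimal sketch (my own illustration, not code from the post), assuming the design matrix has full column rank:

```python
import numpy as np

rng = np.random.default_rng(5)

# Gaussian observation model: y = A x + e, with e ~ N(0, sigma^2 I).
A = rng.normal(size=(50, 4))
x_true = rng.normal(size=4)
y = A @ x_true + 0.1 * rng.normal(size=50)

# MLE under the Gaussian model = least squares, via the normal equations.
x_normal = np.linalg.solve(A.T @ A, A.T @ y)
# The same solution via the Moore-Penrose pseudoinverse.
x_pinv = np.linalg.pinv(A) @ y

print(np.allclose(x_normal, x_pinv))  # True when A has full column rank
```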