Tuesday, December 21, 2010

Wednesday, December 15, 2010

This Wednesday I'll pick up where we left off last week, when we covered graphical models, exponential families, and the basic ideas behind variational inference. This week I will go over variational inference in greater depth, and then describe some approximations that render the variational problem tractable: sum-product and the Bethe entropy approximation; mean field methods (time permitting); and convex approximations, in particular tree-reweighted belief propagation. The material is drawn from chapters 3, 4, 5, and 7 of the Wainwright and Jordan paper.
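
To make the sum-product part concrete, here's a minimal belief propagation sketch on a chain of binary variables, where sum-product is exact (the Bethe approximation is what licenses running the same message updates on loopy graphs). The potentials are random toy numbers, not anything from the paper:

```python
import numpy as np
from itertools import product

# Sum-product on a chain of n binary variables: forward and backward
# messages, then node beliefs, checked against brute-force enumeration.
np.random.seed(0)
n = 4
node_pot = np.random.rand(n, 2)                          # unary potentials
edge_pot = [np.random.rand(2, 2) for _ in range(n - 1)]  # pairwise potentials

# forward messages: m_f[i] is the message arriving at node i from the left
m_f = [np.ones(2) for _ in range(n)]
for i in range(1, n):
    m_f[i] = edge_pot[i - 1].T @ (node_pot[i - 1] * m_f[i - 1])
    m_f[i] /= m_f[i].sum()                               # normalize for stability

# backward messages: m_b[i] is the message arriving at node i from the right
m_b = [np.ones(2) for _ in range(n)]
for i in range(n - 2, -1, -1):
    m_b[i] = edge_pot[i] @ (node_pot[i + 1] * m_b[i + 1])
    m_b[i] /= m_b[i].sum()

beliefs = np.array([node_pot[i] * m_f[i] * m_b[i] for i in range(n)])
beliefs /= beliefs.sum(axis=1, keepdims=True)

# brute-force marginals: enumerate all 2^n configurations
probs = np.zeros((n, 2))
for cfg in product([0, 1], repeat=n):
    p = np.prod([node_pot[i, cfg[i]] for i in range(n)])
    p *= np.prod([edge_pot[i][cfg[i], cfg[i + 1]] for i in range(n - 1)])
    for i in range(n):
        probs[i, cfg[i]] += p
probs /= probs.sum(axis=1, keepdims=True)
```

On a tree the beliefs match the exact marginals; on a loopy graph the same updates give only the Bethe approximation.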

Wednesday, December 1, 2010

Monday, November 29, 2010

"This week I plan to present topics from the first half (chapters 1-5) of Wainwright and Jordan's "Graphical Models, Exponential Families, and Variational Inference". I will emphasize the ideas of 1) conjugate duality between partition function and negative entropy, and 2) nonconvexity in mean field approaches to inference. I will present the following week on some combination of ideas from the second half of the same paper, related papers by Wainwright, and related stuff Liam and I have been working on, depending on time and what people are interested in after the first hour."

Friday, November 19, 2010

If you have a large shared software project and want an easy way to manage collaboration, branch off different versions of the project and generally keep things organized, you need a version control system. Git is the one I know (Mercurial and Subversion are also popular choices) and the one other people in the group are using, so it's the one I'll go over here.

In particular, each call of "git pull origin master" pulls the current master version off github. Be sure you're working with the current version before trying to push local changes, or you might get conflicts. To commit changes, either call

git add files_that_changed

git commit

or

git commit -a

which commits all changed files. Committing brings up a text editor where you describe the changes you made. Be aware that if you leave the message empty, the commit will be aborted. Then, to push your changes from the local machine to github, just type

git push

and that's it! Things get messier when working with branches, checkouts and merges, but for 90% of what I do the above suffices. The web abounds with tutorials and a quick reference sheet can be found here.

Tuesday, November 16, 2010

I'm presenting joint work with Frank Wood and Nicholas Bartlett on learning simple models for discrete sequence prediction. We describe a novel Bayesian framework for learning probabilistic deterministic finite automata (PDFA), which are a class of simple generative models for sequences from a discrete alphabet. We first define a prior over PDFA with a fixed number of states, and then by taking the limit as the number of states becomes unbounded, we show that the prior has a well defined limit, a model we call a Probabilistic Deterministic Infinite Automata (PDIA). Inference is tractable with MCMC, and we show results from experiments with synthetic grammars, DNA and natural language. In particular, we find on complex data that averaging predictions over many MCMC samples leads to improved performance, and that the learned models perform as well as 3rd-order Markov models with about 1/10th as many states. For the curious, a write-up of my work can be found here.
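
As a toy illustration of the model class (the generative PDFA itself, not the PDIA prior or its MCMC inference), here is a two-state PDFA: each state has its own emission distribution over a binary alphabet, and the next state is a deterministic function of the current state and the emitted symbol. All probabilities and the transition table below are made up:

```python
import random

# Toy two-state PDFA over the alphabet {0, 1}.
emit = {0: (0.9, 0.1), 1: (0.2, 0.8)}            # P(symbol | state)
delta = {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 1}  # deterministic transitions

def sample_sequence(length, seed=0):
    rng = random.Random(seed)
    state, seq = 0, []
    for _ in range(length):
        sym = 0 if rng.random() < emit[state][0] else 1
        seq.append(sym)
        state = delta[(state, sym)]              # next state is deterministic
    return seq

seq = sample_sequence(20)
```

Determinism is what keeps inference cheap: given the symbols, the state sequence is fully determined, unlike in a general HMM.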

Also, following the talk I'm going to give a brief tutorial on git, a free version control system used in the software community for maintaining large collaborative code bases. I'd like to set up a git repository for the Paninski group so we can avoid too much code duplication and build on each other's work, and I promise it's actually pretty easy once you learn the basics.

Monday, November 15, 2010

I'm giving a brief (30") talk about my recent work this Wednesday at noon in room 903 SSW. The abstract is below. I'll presumably be giving a fuller talk on the same eventually in our group meeting, but in case you're looking for lunchtime entertainment...

Cheers, Carl

P.S. This is a one-hour slot with two half-hour presenters. So if you come, you could wind up watching someone else first, or only someone else!

Abstract: There is substantial interest in tractable inference on distributions of distributions, which are naturally confined to a simplex. Regularizing the Dirichlet distribution, without compromising tractability of inference, would be useful for encoding prior knowledge of interactions among its components, for instance in topic models. I will present a class of regularized Dirichlet distributions that are in fact especially scalable hidden Markov models. The same framework allows for tractable exact inference on certain loopy graphs of the same type.

Wednesday, November 10, 2010

Chaitu will describe the major results and proofs that have followed the 2005 paper by Candès, which gives sufficient conditions for stable recovery of sparse signals from incomplete measurements. Kamiar will pick up where he left off in his last presentation.

Thursday, October 21, 2010

In my talk I will describe a recent extension of the Sequential Monte Carlo (SMC) method. SMC methods (particle filters) are commonly used to estimate a latent dynamical process from sequential noise-contaminated observations. They are extremely powerful but suffer from sample impoverishment, a situation in which very few different particles represent the distribution of interest. I will describe our attempt to circumvent this fundamental problem by adding an extra MCMC step to the SMC algorithm, and I will illustrate the usefulness of the resulting algorithm on a toy neuroscience example.
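
To give a flavor of the idea, here is a sketch of a generic resample-move particle filter (not our specific algorithm) on a toy 1-D linear-Gaussian model: after each resampling step, one random-walk Metropolis move per particle targets the filtering distribution and fights sample impoverishment. The model and tuning constants are arbitrary choices:

```python
import numpy as np

# Toy state-space model: x_t = 0.9 x_{t-1} + process noise, y_t = x_t + obs noise.
rng = np.random.default_rng(1)
T, N = 50, 500
a, q, r = 0.9, 1.0, 0.5                      # dynamics coefficient, noise stds

x_true = np.zeros(T)
for t in range(1, T):
    x_true[t] = a * x_true[t - 1] + q * rng.normal()
y = x_true + r * rng.normal(size=T)

def loglik(x, yt):   return -0.5 * ((yt - x) / r) ** 2
def logtrans(x, xp): return -0.5 * ((x - a * xp) / q) ** 2

part = rng.normal(size=N)
est = np.zeros(T)
for t in range(T):
    prev = part.copy()
    part = a * prev + q * rng.normal(size=N)          # propagate
    lw = loglik(part, y[t])
    w = np.exp(lw - lw.max()); w /= w.sum()
    est[t] = part @ w                                 # filtered mean
    idx = rng.choice(N, size=N, p=w)                  # resample
    part, prev = part[idx], prev[idx]
    prop = part + 0.5 * rng.normal(size=N)            # MCMC rejuvenation move
    logacc = (loglik(prop, y[t]) + logtrans(prop, prev)
              - loglik(part, y[t]) - logtrans(part, prev))
    accept = np.log(rng.random(N)) < logacc
    part = np.where(accept, prop, part)

rmse = np.sqrt(np.mean((est - x_true) ** 2))
```

The Metropolis step leaves the filtering target invariant, so it diversifies the duplicated particles without biasing the filter.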

Thursday, September 30, 2010

I've improved the code for parallel computing that I talked about at the seminar a month ago; it should be really simple to use now :). The problem with the atomic file operation is now solved as well, at least under Linux. I've written up a description of the code, commented it, and compiled two examples: one to be run on a single machine with several copies of Matlab running in parallel, and another for the HPC cluster. Everything can be found here: http://neurotheory.columbia.edu/~max/codes/ParallelComputation.zip

If you have comments or suggestions, I'll be happy to hear them! I'll also be glad to help resolve any problems that arise, or to explain the code if needed. And if you start using the code, please let me know; it's always encouraging to know that the work is reaching the masses :).

We study the problem of signal decomposition where the signal is a noisy superposition of template features. Each template can occur multiple times in the signal, and associated with each instance is an unknown amount of transformation that the template undergoes. The templates and transformation types are assumed to be known, but the number of instances and the amount of transformation associated with each must be recovered from the signal. In this setting, current methods construct a dictionary containing several transformed copies of each template and employ approximate methods to solve a sparse linear inverse problem. We propose instead to use a set of basis functions that can interpolate the template under any small amount of transformation. Both the amplitude of the feature and the amount of transformation are encoded in the basis coefficients, in a way that depends on the interpolation scheme used. We construct a dictionary containing transformed copies of these basis functions, where the copies are spaced as far apart as the interpolation remains accurate. The coefficients are obtained by solving a constrained sparse linear inverse problem where the sparsity penalty is applied across, but not within, these groups. We compare our method with standard basis pursuit on a sparse deconvolution task and find that it yields sparser solutions while still achieving lower reconstruction error.
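
The "sparsity across, but not within, groups" idea can be sketched with a group-lasso solver on a toy deconvolution problem. Here the dictionary holds shifted copies of a Gaussian template and its derivative (a crude two-function interpolation basis, standing in for the paper's interpolation scheme), each shift forming one group; the template, sizes, penalty, and solver (plain ISTA with a group soft-threshold) are all illustrative choices, not the authors' method:

```python
import numpy as np

rng = np.random.default_rng(0)
n, shifts = 70, 20
t = np.arange(-5, 6)
template = np.exp(-0.5 * t ** 2)
dtemplate = -t * template                    # derivative column encodes small shifts

D = np.zeros((n, 2 * shifts))
for k in range(shifts):
    pos = 3 * k + 5                          # shift spacing of 3 samples
    D[pos - 5:pos + 6, 2 * k] = template
    D[pos - 5:pos + 6, 2 * k + 1] = dtemplate

coef_true = np.zeros(2 * shifts)
coef_true[8], coef_true[9] = 1.0, 0.3        # one event in group 4, slightly shifted
signal = D @ coef_true + 0.01 * rng.normal(size=n)

lam, L = 0.05, np.linalg.norm(D, 2) ** 2     # group penalty, Lipschitz constant
c = np.zeros(2 * shifts)
for _ in range(500):
    g = c - (D.T @ (D @ c - signal)) / L     # gradient step on the squared error
    for k in range(shifts):                  # group soft-threshold (prox step)
        grp = g[2 * k:2 * k + 2]
        nrm = np.linalg.norm(grp)
        if nrm > 0:
            g[2 * k:2 * k + 2] = max(0.0, 1 - lam / (L * nrm)) * grp
    c = g

active = [k for k in range(shifts) if np.linalg.norm(c[2 * k:2 * k + 2]) > 0.05]
```

The penalty zeroes out whole groups while leaving the within-group coefficients free, so the event's amplitude and sub-grid shift are recovered together.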

Monday, September 20, 2010

Title: Testing efficient coding for a complete and inhomogeneous neural population

The theory of efficient coding under the linear Gaussian model, originally formulated by Linsker (1989), Atick & Redlich (1990), and van Hateren (1992), is quite well known. However, a direct test against physiological data (a complete population of receptive fields) has been hampered over the past twenty years for two reasons: (a) no such physiological data were available, and (b) the earlier models were too simplistic to compare with physiological data.

We resolve these two issues, and furthermore, we develop two novel methods to assess how the structures of the retinal transform match those of the theoretically derived, optimal transform. The main conclusion of this study is that the retinal transform is at least 80% optimal, when evaluated with the linear-Gaussian model.

We also clarify the characteristics of the retinal transform that are and are not explained by the proposed model, and discuss the future directions and preliminary results along these lines.

Sunday, August 22, 2010

The classic view of a neuron as a point element, summing a large number of small synaptic currents and comparing the total to a fixed threshold, is becoming difficult to sustain given the plethora of nonlinear regenerative processes known to take place in the soma, axon, and even the dendritic tree. Since a common source of the complexity in the input, soma, and output is the behavior of ionic channels, we propose a view of a neuron as a population of channels.

Analyzing the stochastic nature of ion channels using a recently developed mathematical model, we provide a rather general characterization of the input-output relation of the neuron, which admits a surprising level of analytic tractability.

This view provides a clear quantitative explanation of history-dependent effects in neurons and of the observed irregularity in firing. Interestingly, the present explanation of firing irregularity does not require a globally balanced state, but rather results from the intrinsic properties of a single neuron.

Thursday, August 19, 2010

While we are talking about tools for using the HPC cluster, here's an ad for a tool of my own.

I have been using agricola to submit jobs to the HPC cluster from within matlab. It is a very simple tool: instead of launching a calculation on your local machine by typing at the matlab prompt:

my_result = my_function( some_parameters ) ;

one types:

sow( 'my_result' , @()my_function( some_parameters ) ) ;

This will copy all the .m files in your current directory into a folder on the HPC submit machine, generate a submit file there, and launch the calculation on the cluster. Then some time later, when you suspect the job is done, you type:

reap

which makes the variable my_result appear in your matlab workspace. reap itself returns all the .out, .log, and .err files for you to look at from within matlab.

Unlike Max's code, agricola does not aim to parallelize your code; it just handles sending files back and forth with ssh and job submission.

Tuesday, August 17, 2010

I will try to cover two topics: multithreading with Matlab on HPC and new numeric methods for the density forward propagation.

In the first part (~15-20 min), I'll briefly present Matlab code that allows easy and flexible multithreading for loops whose internal blocks are independent across values of the loop variable. It should be useful in many computationally expensive optimization problems. The main difficulty was devising a method for locking a JobSubmit file that is used for communication between the main program and the threads. Unfortunately, I have just discovered that the method I implemented is not 100% reliable. Still, the code works in most cases, and in the rare situations where the file locking fails it simply leads to duplicate computations.

The second part will be on numerical methods for forward propagation. In recent years a number of articles have been published on methods for solving the Fokker-Planck equation associated with the stochastic integrate-and-fire model. We develop a new method for numerically estimating the forward propagation density by computing it via direct quadratic convolution on a dynamic adaptive grid. This method allows us to significantly improve the accuracy of the computations, to avoid special treatment of the extreme cases, and to improve (or at least preserve) the speed of the computation compared to other methods. We also found that below some value of the time step of the numerical propagation the solution becomes unstable. By considering the density as distributed across the bins rather than concentrated at the bin centers, we derive a simple condition for the stability of the method. Interestingly, the condition we derive links the temporal and spatial resolutions linearly, in contrast to the well-known Courant stability condition for the Fokker-Planck equation. We further improve the speed of the method by combining it with the fast Gauss transform.
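
The core of convolution-based density propagation can be sketched in a few lines for the simplest possible case: pure diffusion on a fixed grid, with no drift, no absorbing threshold, and no adaptive regridding (all of which the real method handles). The grid spacing and step count below are illustrative:

```python
import numpy as np

# Propagate a density by repeatedly convolving it with the Gaussian
# transition kernel of a pure-diffusion process, then compare the result
# after 100 steps with the known analytic Gaussian solution.
dx, dt, sigma = 0.05, 0.01, 1.0
x = np.arange(-6, 6 + dx, dx)
p = np.zeros_like(x)
p[np.argmin(np.abs(x))] = 1.0 / dx           # delta function at x = 0

k = np.exp(-0.5 * (x / (sigma * np.sqrt(dt))) ** 2)
k /= k.sum()                                 # discrete one-step transition kernel

steps = 100
for _ in range(steps):
    p = np.convolve(p, k, mode="same")       # one forward propagation step

exact = np.exp(-0.5 * x ** 2 / (sigma ** 2 * dt * steps))
exact /= exact.sum() * dx                    # analytic density, same normalization
err = np.max(np.abs(p - exact)) * dx
```

Each step costs one dense convolution (hence "quadratic" in the grid size), which is exactly what the fast Gauss transform accelerates.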

We are looking for feedback, so if the abstract below piques your interest please take a look at the paper and let us know what you think.

Due to the limitations of current voltage sensing techniques, optimal filtering of noisy, undersampled voltage signals on dendritic trees is a key problem in computational cellular neuroscience. These limitations lead to two sources of difficulty: 1) voltage data is incomplete (in the sense of only capturing a small portion of the full spatiotemporal signal) and 2) these data are available in only limited quantities for a single neuron. In this paper we use a Kalman filtering framework to develop optimal experimental design for voltage sampling. Our approach is to use a simple greedy algorithm with lazy evaluation to minimize the expected mean-square error of the estimated spatiotemporal voltage signal. We take advantage of some particular features of the dendritic filtering problem to efficiently calculate the estimator covariance by approximating it as a low-rank perturbation to the steady-state (zero-SNR) solution. We test our framework with simulations of real dendritic branching structures and compare the quality of both time-invariant and time-varying sampling schemes. The lazy evaluation proved critical to making the optimization tractable. In the time-invariant case improvements ranged from 30-100% over simpler methods, with larger gains for smaller numbers of observations. Allowing for time-dependent sampling produced up to an additional 30% improvement.
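
The greedy-with-lazy-evaluation idea can be sketched on a stand-in objective: choose sensors that maximally reduce the trace of a posterior covariance, recomputing a marginal gain only when a stale cached value reaches the top of a priority queue (strictly valid when gains are diminishing). The sizes, prior covariance, and noise level are toy choices, not from the paper:

```python
import heapq
import numpy as np

rng = np.random.default_rng(2)
n, k_select, r = 40, 5, 0.1
A = rng.normal(size=(n, n))
P = A @ A.T / n + 0.5 * np.eye(n)            # toy prior covariance
P0 = P.copy()

def gain(P, i):
    # trace reduction from observing coordinate i with noise variance r
    return (P[:, i] @ P[:, i]) / (P[i, i] + r)

heap = [(-gain(P, i), i) for i in range(n)]  # (possibly stale) gains
heapq.heapify(heap)
chosen = []
while len(chosen) < k_select:
    _, i = heapq.heappop(heap)
    g = gain(P, i)                           # recompute only the top candidate
    if not heap or -heap[0][0] <= g + 1e-12:
        chosen.append(i)                     # still the best: select it
        P = P - np.outer(P[:, i], P[i, :]) / (P[i, i] + r)   # rank-1 update
    else:
        heapq.heappush(heap, (-g, i))        # reinsert with the fresh gain
```

Because each selection only triggers a handful of gain recomputations instead of all n, the lazy variant is what makes larger problems tractable.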

Thursday, July 29, 2010

I will attempt to take the latest paper on the deterministic particle flow filter, discussed in a previous blog post, and strip it down to the essentials. The authors present what is supposedly a more general, stable, and improved version of their previous deterministic particle flow filter. The paper is rife with ideas and peculiarly written; for a gentler introduction, please refer to the papers linked in the previous blog post. Here is the paper:

Friday, July 23, 2010

No resampling, rejection, or importance sampling is used. Particles are propagated through time by numerically integrating an ODE. The method is very similar in spirit to Jascha Sohl-Dickstein, Peter Battaglino and Mike DeWeese's Minimum Probability Flow learning, but applied to nonlinear filtering.

The authors report orders-of-magnitude speedups in higher-dimensional state spaces, where rejection sampling would be a problem.

I will present work done with Liam, Jeff Gauthier and others in EJ Chichilnisky's lab on locating retinal cones from multiple ganglion cell recordings. We write down a single hierarchical model in which ganglion cell responses are modeled as independent GLMs with space-time-color separable filters and no spike history. Assuming the stimulus is Gaussian ensures that the ganglion cells' spike-triggered averages are sufficient statistics. The spatial component is then assumed to be a weighted sum of non-overlapping, appropriately placed archetypal cone receptive fields. With a benign approximation, we can integrate out the weights and focus on doing MCMC in the space of cone locations and colors only. As it turns out, this likelihood landscape has many nasty local maxima; we use parallel tempering and a few techniques specific to this problem to ensure ergodicity of the Markov chain.
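
Parallel tempering itself is simple to sketch. Here is a toy version on a bimodal 1-D target (a mixture of two Gaussians, standing in for the multimodal cone-configuration posterior): chains run at several inverse temperatures, and neighboring chains occasionally swap states, which lets the cold chain hop between modes. Temperatures and step sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)

def logp(x):                                  # two modes, at -3 and +3
    return np.logaddexp(-0.5 * (x - 3) ** 2, -0.5 * (x + 3) ** 2)

betas = [1.0, 0.5, 0.25, 0.1]                 # beta = 1 is the chain we keep
x = [-3.0] * len(betas)                       # start every chain in one mode
samples = []
for it in range(20000):
    for c, b in enumerate(betas):             # within-chain Metropolis step
        prop = x[c] + rng.normal()
        if np.log(rng.random()) < b * (logp(prop) - logp(x[c])):
            x[c] = prop
    c = int(rng.integers(len(betas) - 1))     # propose one neighbor swap
    d = (betas[c] - betas[c + 1]) * (logp(x[c + 1]) - logp(x[c]))
    if np.log(rng.random()) < d:
        x[c], x[c + 1] = x[c + 1], x[c]
    samples.append(x[0])

frac_right = float(np.mean(np.array(samples[5000:]) > 0))
```

The hot chains see a flattened landscape and cross between modes freely; the swap moves then ferry those excursions down to the cold chain, which is exactly the ergodicity fix needed for a landscape with many well-separated maxima.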

ABSTRACT: Modern neuroimaging methods allow the rapid collection of large (> 100,000 voxel) volumetric time-series. Consequently there has been a growing interest in applying supervised (classification, regression) and unsupervised (factor analytic) machine learning methods to uncover interesting patterns in these rich data.

However, as classically formulated, such approaches are difficult to interpret when fit to correlated, multivariate data in the presence of noise. In such cases, these models may suffer from coefficient instability and sensitivity to outliers, and typically return dense rather than parsimonious solutions. Furthermore, on large data they can take an unreasonably long time to compute.

I will discuss ongoing research in the area of sparse but structured methods for classification, regression, and factor analysis that aim to produce interpretable solutions and to incorporate realistic physical priors in the face of large, spatially and temporally correlated data. Two examples will be presented: whole-brain classification of spatiotemporal fMRI data, and nonnegative sparse PCA applied to 3D calcium imaging.

Saturday, July 10, 2010

I'll be presenting a paper by Matt Harrison. In it the authors describe an algorithm that takes a spike train and jitters the spike times to create a new spike train which is maximally random while preserving the firing rate and recent spike history of the original train.
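
To fix ideas, here is the most basic form of interval jitter (not the paper's full maximum-entropy, history-preserving algorithm): each spike is redrawn uniformly within its own time bin of width w, which exactly preserves the spike count per bin (the coarse firing rate) while destroying finer temporal structure. The window width and example train are made-up numbers:

```python
import random

def jitter(spikes, w=0.02, seed=0):
    """Redraw each spike uniformly within its own bin of width w (seconds)."""
    rng = random.Random(seed)
    out = []
    for s in spikes:
        b = int(s // w)                       # index of the bin containing s
        out.append((b + rng.random()) * w)    # uniform draw within that bin
    return sorted(out)

train = [0.013, 0.148, 0.152, 0.410, 0.415]
surrogate = jitter(train)
```

Repeating this many times yields a null ensemble against which fine-timescale structure in the original train can be tested; the paper's contribution is doing this while also conditioning on recent spike history.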

Thursday, July 1, 2010

We have developed a simple model neuron for inference on noisy spike trains. In particular, we intend to use this model for computationally tractable quantification of information loss due to spike-time jitter. I will introduce the model, focusing on its favorable scaling properties, and show some results from inference on synthetic data. Lastly, I'll describe an efficient scheme we devised for inference with a particular class of priors on the stimulus space, which could be interesting outside the context of this model.