David Gleich's Notebook
A small collection of helpful snippets I've found.

RMatlab - Using R from Matlab! (2009-09-17)

RMatlab is a wonderful package I just discovered. It interfaces R and Matlab. I use it to make sexy figures from within Matlab. (The alternative is to export data, import it to R, and make a figure. It's possible, but a big pain.) The package works great, but I found it slightly difficult to get working. Here are the steps I took.
Prerequisites

A recent version of R; I'm using R 2.9.2 from the R debian repositories.

Appendix Slides in Beamer: Controlling frame numbers (2009-05-05)

A fellow student, Andrew Bradley, and I had a chat today about a feature missing from the beamer package in LaTeX: fine control over slide numbers. We both subscribe to the philosophy that each slide in a presentation should have a slide number and the total number of slides. (An old professor strongly advocated this approach so that he could critique individual slides and had some idea of how much was left.)

Beamer makes it somewhat difficult to accommodate this system if you have backup slides in your presentation. You know, those things you are supposed to have prepared in case you get one of the "hard" questions? Anyway, these slides aren't a part of the standard presentation, and so they shouldn't count to the total slide number --- but they need to be there in your presentation file.

I had previously faced this issue and realized that you can just store and set the frame numbers in beamer at arbitrary points in the presentation.

However, Andrew wanted to go beyond just skipping backup slides and "uncount" the outline slides that appear at the beginning of a section, as well as the title slide. He proposed using the trick above with the following adjustments.

1. Add \addtocounter{framenumber}{-1} on the \AtBeginSubsection frame
2. Add \setcounter{framenumber}{0} or \setcounter{framenumber}{1}
to the \titlepage frame and/or the \tableofcontents frame
depending on taste.

There you have it, all the tricks about frame counters in beamer that we know!

I suspect there is some way to accomplish the "backup" slides with a modification of the \appendix command and the \AtEndDocument directive. My brief attempt at making these changes ended in failure and frustration. Failure because it didn't work. Frustration because I forgot to write down that it didn't work and didn't remember it the next time I wanted to use this feature. (In fact, I really wish that beamer modified the \appendix command to implement precisely this feature. What else would an appendix do in a presentation?)

Birthday distribution (2009-04-17)

Many of my friends have birthdays in the next few weeks. This fact prompted a discussion about the uniformity of birthdays. In Outliers, Gladwell makes the case that birthdays of a group of individuals may appear skewed for subtle reasons; however, such results shouldn't hold for the general populace.

This question is easy to answer with a bit of Googling. A Dartmouth professor has precisely the required data --- though only for a single year.

The data in that file show many fewer births on weekends compared with weekdays. This effect is precisely what we see in the plot, which R helps us validate.

This analysis was good enough for my own personal edification. There is still a bit of work left to make these claims statistically valid, but that isn't my point here.

Easy software? (2009-03-30)

This afternoon, I added a "recent links" section to the left hand side of the main notebook index page. This section tracks links from my delicious account. After figuring out that I already had the Feeds App Lite plugin installed, it literally took about 5 minutes to get everything working. From my perspective, that is simply awesome --- I didn't have to write a line of code! Thanks to the authors of Movable Type and the Feeds App Lite for doing a great job.

Update: It turns out Live Writer decided to change the quote setting to insert gnarly "curly" quotes instead of the standard straight quotes. Ick, now I have to fix all these ugly quotes. I'm not sure what isn't working now.

Paper: Improving the Presentation and Interpretation of Online Ratings Data with Model-Based Figures (2009-03-07)

Every so often, I come across a paper I find really exciting. Presently, it's Daniel E. Ho and Kevin M. Quinn, Improving the Presentation and Interpretation of Online Ratings Data with Model-Based Figures, The American Statistician, November 2008 (doi:10.1198/000313008X366145).

The paper tackles one problem that irritates me --- aggregating online ratings --- and uses a solution I've previously considered and wanted to investigate --- profiling the raters. (It's always nice to discover you don't have to do ALL the work yourself.)

The problem is illustrated by the following snapshot from Amazon.

With only three ratings, no product merits a 5-star rating. To Amazon's credit, they clearly note there are only three ratings aggregated into a single 5-star rating. They also show a histogram of the individual ratings elsewhere on the product page.

A simple fix would be to apply well-known statistical confidence intervals to the ratings and use the lower bound. Of these fixes, the most naive would be to add pseudo-counts for each possible rating. With three 5-stars, the pseudo-count score would be 3.75. Evan Miller proposes a more sophisticated technique based on Wilson's score in a recent blog post.
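To make the pseudo-count arithmetic concrete, here is a minimal C++ sketch. It assumes a 1-to-5 star scale and one phantom vote added to every star level, which is the choice that reproduces the 3.75 figure; the function name `pseudo_count_mean` is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mean star rating after adding `pseudo` phantom votes to every star level.
// counts[k] is the number of observed ratings with k+1 stars.
// Assumption: pseudo > 0, so the denominator is never zero.
double pseudo_count_mean(const std::vector<int>& counts, double pseudo = 1.0) {
    assert(!counts.empty());
    assert(pseudo > 0.0);
    double weighted = 0.0;  // sum of (stars * smoothed count)
    double total = 0.0;     // sum of smoothed counts
    for (std::size_t k = 0; k < counts.size(); ++k) {
        const double c = static_cast<double>(counts[k]) + pseudo;
        weighted += c * static_cast<double>(k + 1);
        total += c;
    }
    return weighted / total;
}
```

For three 5-star ratings the counts are {0, 0, 0, 0, 3}: the smoothed total is (1 + 2 + 3 + 4 + 4*5) / 8 = 30 / 8 = 3.75.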

Ho and Quinn do something different. They propose a model that incorporates each rater's behavior on the site, so that not all ratings are treated equally. The types of behavior captured are

Uncritical --- easy to please,

Non-discriminating --- useless, and

Discriminating --- a "critic."

Go read the paper for the details of the model. The punch line is that the aggregate rating depends on all ratings submitted to the site.

After looking at lots of rating data from Netflix, LAUNCHcast, and even the smaller datasets from HelloMovies, incorporating this information seems critical to generate useful aggregate ratings.

One issue with their technique is the lack of transparency. As a user, I would really have to trust your site to take the ratings seriously. The Amazon approach with the histogram allows me to evaluate the data myself, though I lack the critical context that the Ho/Quinn ratings provide. A second issue is computation. I did not fully check their paper, but I don't think the model is trivial to fit, or that it could be fit in real time.

CoinOr CLOP for Matlab (2009-03-05)

Johan Lofberg wrote a nice function to interface the Clp linear program solver from the CoinOR project with Matlab. He distributes a precompiled version of the Clp Matlab wrapper for Windows, but doesn't include one for Linux. Nor are there any directions on how to compile here. I managed to get it to compile with Clp-1.9.0. My directions follow. Issue these commands

For Matlab 2008b on Ubuntu 8.10, I found it necessary to install g++-4.1, set up mex with g++-4.1, and run

./configure CXX=g++-4.1
make
make install

before everything would work.

Latex Presentation Fonts (2008-11-09)

I don't like the standard beamer fonts in LaTeX for linear algebra presentations. I find the "I" as a big vertical line quite annoying. Previously, I used the lxfonts set, which has gorgeous characters. However, lxfonts is too "wide" and makes long equations slightly challenging.

The arev package is a nice compromise: it has a nice I glyph. Continue reading for some screenshots.

Let’s start with the new arev fonts.

The next set is my old preference. It’s the lxfonts set for math, with Myriad Pro semi-bold for text. I think this makes a really nice presentation, but the equations are fairly wide.

Old posts? (2008-11-09)

I moved a few posts from my old "blog-like" site to this notebook. That's why there are posts going back to 2006.

Going forward, my goal is going to be a post a week.

kvm vs kqemu (2008-11-09)

After I updated my Ubuntu distribution to 8.04, I discovered that the virtualization software I used to compile MatlabBGL didn't work with a 64-bit version of Windows XP anymore. No one seems to mention this next fact, but I simply couldn't get qemu/kqemu to work with Ubuntu 8.04 and XP64 with a network interface enabled. Instead, kvm --- the "new" and "official" kernel virtualization module --- worked instantly; it even uses the old qemu/kqemu images.

So don't bother with kqemu under Ubuntu 8.04 --- just use kvm.

A fun LaTeX Command (2007-12-09)

A while back, I wanted to write "Google" in a presentation. (Given that I do some research on the PageRank system they proposed, I guess this isn't too surprising.)

I believe in making presentations fun and wanted to use the Google colored version of the term. After some googling, I found the RGB colors and painstakingly converted them to floating point values to use with the latex xcolor package. To make my life easier, I encoded everything into the following command.

Later, I learned that I could have used the RGB values with the color package directly. Oh well, it would have saved a little bit of time.

Getting a 64-bit copy of Windows XP (2007-12-06)

I've been getting a few emails from folks interested in using my MatlabBGL software on 64-bit installations of Windows.

Personally, I don't have a 64-bit installation of a Windows environment (XP, Server, or Vista), which has always made testing a little difficult. A while back, the ICME sysadmin (a really huge help!) set up a machine with a 64-bit copy of XP to test for one person. I got the code compiled and everything tested. (Unfortunately, I couldn't reproduce the problem he identified.)

In theory, I thought it would work quite simply. I compile a 64-bit libmbgl, and they compile the .mexw64 files on their system.

Life is never that simple.

Suffice it to say, I ran into tremendous problems getting libmbgl and the mex files compiled on the other computer.

To attack the problem I set about getting a 64-bit copy of WinXP.

Sometimes, being at Stanford is great. I emailed our sysadmin. By the time I came back from my next class, a WinXP 64-bit edition CD was waiting for me. He needed to run somewhere to get the volume license key, but in a short bit, I had my operating system.

I did not get another computer to install it on; instead, I set up QEMU on my Ubuntu system.

sudo apt-get install qemu

Blah

To get things working, I created a 16GB image for the qemu virtual machine and an ISO of the winxp64 install disk.

... gave me a working virtual machine with a running windows installation extolling the benefits of 64-bit computing.

The install took ages. When it finished, XP rebooted itself and qemu did not boot from the virtual CD-ROM again. Quickly thereafter, I logged into the XP system and tried Internet Explorer.

Nothing... there was no network!

Blast.

Google showed that I should be able to set up an NE2000 PCI network card with QEMU, but some quick checking showed that XP64 didn't have an NE2000 PCI driver.

Double blast.

A few hours passed and I considered the possibility that my "open-source" QEMU idea wouldn't cut it. Then, in the QEMU FAQ, I found a note on how to get the network working with Windows Server 2003. Inspiration struck when I remembered that XP64 is based on the Windows Server 2003 source code. I clicked the "howto" from the QEMU FAQ and the link was dead!

BLAST!

But the link was to a Realtek site, whereas the NE2000 driver should have been on a Novell site. Maybe... there was something else I could do? Indeed there was: QEMU can also emulate a Realtek card, as I found out on another page. I tried this and still had problems, but Windows recognized the driver and virtual network card. Another Google search showed that I had to add just one more command line option to make things work.

Research Note 1 (2007-11-04)

I've decided to try and set up an online research notebook to keep track of a few issues as I encounter them. The first set of entries will revolve around Beamer used for a presentation.
Reading Cluto Files in Matlab (Part 1) (2006-04-11)

Update: I never completed this series, but I do have a Cluto sparse matrix reader for Matlab. Contact me for information, or check out readCluto on Matlab central.

Today, let’s see a Matlab solution to reading and writing CLUTO data files. CLUTO is a clustering toolkit by George Karypis at U of Minn. There are FOUR possible input files CLUTO might see.

Dense Graph

Sparse Graph

Dense Matrix

Sparse Matrix

The difference between the dense and sparse files is simply a matter of header information. Let’s describe a few CLUTO files.

Suppose we have a 5-node line graph

v1 <-> v2 <-> v3 <-> v4 <-> v5

In dense graph format, this graph is

5
0 1 0 0 0
1 0 1 0 0
0 1 0 1 0
0 0 1 0 1
0 0 0 1 0

In sparse graph format, this graph is

5 8
2 1
1 1 3 1
2 1 4 1
3 1 5 1
4 1

To wit, the dense graph format is merely an explicit specification of the adjacency matrix for the graph, preceded by a single line specifying the number of vertices. The sparse graph format is a sparse adjacency representation. The sparse adjacency is somewhat strange in that it uses 1-based column indices and implicit rows.

More formally, the sparse adjacency structure has 1 line of header information:

<number of vertices> <number of edges*2>

and the (i+1)th line of the file contains

adj_1 weight_1 adj_2 weight_2 … adj_d weight_d

where adj_j is the jth adjacent vertex and weight_j is the weight of that edge and d is the degree of the ith vertex. The input must be symmetric.
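As an illustration only (the reader promised for the follow-up post uses Matlab and mex files, and this is not that code), the sparse graph layout just described can be parsed with a short C++ routine. The names `Adjacency` and `read_sparse_graph` are hypothetical, and the sketch assumes every vertex occupies exactly one line, including an empty line for a vertex with no neighbors.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// adjacency[i] holds (neighbor, weight) pairs for vertex i, converted to 0-based ids.
using Adjacency = std::vector<std::vector<std::pair<int, double>>>;

// Parse the sparse graph layout:
//   header:    <number of vertices> <number of edges*2>
//   line i+1:  adj_1 weight_1 ... adj_d weight_d   (1-based neighbor ids)
// Returns false if the input is truncated, has an out-of-range neighbor,
// or the number of stored entries disagrees with the header.
bool read_sparse_graph(std::istream& in, Adjacency& adjacency) {
    std::string line;
    if (!std::getline(in, line)) return false;
    std::istringstream header(line);
    long n = 0, nnz = 0;
    if (!(header >> n >> nnz) || n < 0 || nnz < 0) return false;

    adjacency.clear();
    adjacency.resize(static_cast<std::size_t>(n));
    long seen = 0;
    for (std::size_t i = 0; i < adjacency.size(); ++i) {
        if (!std::getline(in, line)) return false;  // missing row for vertex i
        std::istringstream row(line);
        long neighbor = 0;
        double weight = 0.0;
        while (row >> neighbor) {
            if (!(row >> weight)) return false;              // neighbor without a weight
            if (neighbor < 1 || neighbor > n) return false;  // ids are 1-based
            adjacency[i].emplace_back(static_cast<int>(neighbor - 1), weight);
            ++seen;
        }
    }
    return seen == nnz;  // header counts each undirected edge twice
}
```

Fed the 5-node line graph above (header `5 8`), the routine fills five rows; the middle vertex v3 ends up with the 0-based neighbors 1 and 3, each with weight 1. The routine does not check the symmetry requirement.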

The sparse and dense matrix file formats are similar. The difference is the matrices involved are not square which changes the header.

The dense matrix format is just a row-by-row listing of the elements of the matrix. The sparse matrix format is a sparse row-by-row listing. In the sparse matrix format, the (i+1)th line of the file contains information about the non-zeros in row i:

column_1 value_1 column_2 value_2 … column_d value_d

Coming next, we’ll see how to read these files in Matlab using a combination of mex files and scripts.

Sorting two arrays simultaneously (2006-03-25)

Suppose, for the sake of this article, that you have two arrays in C++ with the same number of elements. For example,

int a1[] = {5, 8, 9, 1, 4, 3, 2};
double v[] = {3., 4., 5., 2., 1.,...

Now, for reasons that perhaps only I care about, I want to sort the array a1 and permute v according to the same permutation.

// a1 = {1, 2, 3, 4, 5, 8, 9}
// v = {2., 8., 9., 1., 3., 4., 5.}

(For those who care and who know what I'm talking about: I have a compressed sparse row matrix represented in the AIJ format and I want to sort the elements of each row in increasing order so I can do an O(log n) binary search to determine if an element exists. However, as always, the problem is more generic than the instance I care to solve.)
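To show why sorted rows matter, here is a minimal sketch of the O(log n) membership test. The `CsrMatrix` struct and the `has_entry` function are hypothetical names for illustration, not the AIJ structure from my code; the sketch assumes column indices are already sorted in increasing order within each row.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical compressed sparse row container (illustrative layout only).
struct CsrMatrix {
    std::vector<std::size_t> row_ptr;  // size rows+1; row r spans [row_ptr[r], row_ptr[r+1])
    std::vector<int> col;              // column indices, sorted ascending within each row
    std::vector<double> val;           // val[k] is the value stored at column col[k]
};

// Returns true if entry (row, column) is stored, using binary search over the row.
bool has_entry(const CsrMatrix& A, std::size_t row, int column) {
    assert(row + 1 < A.row_ptr.size());
    const auto first = A.col.begin() + static_cast<std::ptrdiff_t>(A.row_ptr[row]);
    const auto last = A.col.begin() + static_cast<std::ptrdiff_t>(A.row_ptr[row + 1]);
    return std::binary_search(first, last, column);
}
```

`std::binary_search` requires the range to be sorted, which is exactly why the column indices (and the matching values) of each row have to be sorted together first.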

To wit, I wish to sort one array and "take the other" along for a ride.

This problem has a few trivial solutions:

1. Sort the array a1 implicitly by sorting a permutation vector that indexes into a1. Then permute a1, and v by this array.

2. Write a custom sorting routine that does the operation for this special case.

3. Try to shoehorn this problem into an existing sorting routine.

In terms of performance, the fastest solution is probably 2. The second fastest is probably 1. Finally, 3. is likely the slowest.
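For reference, solution 1 can be sketched in a few lines of standard C++. This is an illustration of the permutation-vector idea only, not the code I ended up with; the name `sort_by_key` is made up, and the sketch uses `std::vector` rather than raw arrays.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Sort `keys` in increasing order and apply the same permutation to `values`.
// Uses O(n) extra memory for the permutation and for the permuted copies.
template <typename Key, typename Value>
void sort_by_key(std::vector<Key>& keys, std::vector<Value>& values) {
    assert(keys.size() == values.size());

    // perm starts as 0, 1, ..., n-1 and is sorted by the keys it indexes.
    std::vector<std::size_t> perm(keys.size());
    std::iota(perm.begin(), perm.end(), std::size_t{0});
    std::sort(perm.begin(), perm.end(),
              [&keys](std::size_t i, std::size_t j) { return keys[i] < keys[j]; });

    // Gather both arrays through the sorted permutation.
    std::vector<Key> sorted_keys;
    std::vector<Value> sorted_values;
    sorted_keys.reserve(keys.size());
    sorted_values.reserve(values.size());
    for (std::size_t p : perm) {
        sorted_keys.push_back(keys[p]);
        sorted_values.push_back(values[p]);
    }
    keys.swap(sorted_keys);
    values.swap(sorted_values);
}
```

Called with a1 = {5, 8, 9, 1, 4, 3, 2} and v = {3., 4., 5., 2., 1., 9., 8.} (the last two values of v are inferred from the sorted result shown earlier, not taken from the truncated listing), it leaves a1 = {1, 2, 3, 4, 5, 8, 9} and v = {2., 8., 9., 1., 3., 4., 5.}. The permutation vector plus the two temporary copies are the extra memory this approach costs.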

Solution 3, however, has two huge advantages. First, I don't have to write my own sorting routine. This is important as writing a general purpose sort is somewhat non-trivial. Also, it is fairly likely that the input to the sort will be nearly sorted, so I can't use a general purpose quicksort routine which has O(n^2) performance on a sorted array. The second advantage is that it does not require extra memory as solution 1 does. In fact, solution 1 requires quite a bit of extra memory. While it is a tractable amount, it is nonetheless superfluous.

Thus, I decided to look at solution 3. Between STL and Boost, there are quite robust and generic C++ sorting libraries. How hard could this be? Maybe 20, 30 minutes of work?

Thankfully, I did not impose any sort of "performance" requirement on myself. In fact, many of the arrays will be rather small; so performance for large arrays is not quite so important.

First, C++ STL sort does not work on boost's zip_iterators, which would have been the natural solution. In fact, there seem to be a number of debates on this matter; and more generally on the requirements of iterators, zip_iterators, and the STL Sort function.

Second, the fundamental problem is that "pairs" of array references do not behave like they should for things to work nicely. That is, there are no clean generalizations of a set of pointers. The best that exists is the boost::tuple class with a set of references. However, that class fails because references and values are quite strange and do not behave quite like you think.

Instead, I simply decided to abuse the notation of an iterator and write something that works.

This involved writing, effectively, a non-conforming iterator where the reference of the value type is not the same as the reference type.

Here is the code for anyone who cares. (This is a slightly modified version from my code, so it may not compile, but the fixes should be trivial.)

This paper addresses an important aspect of quantum computing. The authors show that designing a quantum circuit is equivalent to minimizing the difference between two linear operators. Further, the "cost" of the circuit (in number of gates) is proportional to the minimum geodesic (shortest path) distance between the trivial operator and the desired quantum operator.

Intriguingly, the authors make the statement that we can use their results to find the initial point to begin a search for the quantum circuit; but (and this is an important but) we do not know the initial velocity along the shortest path. In all likelihood, this problem will be resolved in the future. Nevertheless, this statement harks back to the uncertainty principle. Suppose a similar result is true for these Riemannian spaces? We cannot know both the initial search position and the initial velocity. Again, probably my naïveté.