Tuesday, September 18, 2012

My father was writing a short obituary for a mathematician called
Friedrich Hirzebruch. This remarkable gentleman invented a new type of conference called "Arbeitstagung". Quoting from the New York Times obituary:

Dr. Hirzebruch began by establishing an informal yearly mathematical meeting that he called the Arbeitstagung (working meeting), which quickly grew from a group of seven to more than 200 attendees.
It was unusual in that it had no programs or invitations. On the first day, the participants would gather in an auditorium and call out topics, which Dr. Hirzebruch would assign to experts in the audience. In the pre-Internet world, this was the best way to keep up with the latest developments.

Tuesday, April 10, 2012

I have occasionally asked respected colleagues what percent of published neuroimaging findings they think would replicate, and the answer is generally very depressing. My own guess is *way* less than 50%.

I realized that I had never asked myself that question in a direct way. In the same discussion I phrased the question like this:

Let us say you took a random sample of papers using functional MRI over the last five years. For each study in the sample, you repeated the same experiment. What proportion of your repeat experiments would substantially replicate the main findings of the original paper?

I guessed that the answer to this question was 30%. Nancy Kanwisher gave her guess first, so I can't claim that my guess was entirely independent. I also started asking my colleagues, and got guesses between 20% and 50%.

One obvious criticism that I got was that I had just made that figure up. Of course that is true, and I didn't try to justify my guess. Then someone emailed me to ask me why I guessed 30%.

I like public discussion rather than private, so here is my answer, in all its sad ignorance.

The evidence that I have comes from my own experience. I have a moderate amount of experience of functional MRI (FMRI) analysis for my own studies and studies that I have collaborated on. I have advised other people on FMRI analysis almost since I started functional imaging myself in 1996. I have been teaching functional imaging analysis since about 1998. I have reviewed a moderate number of papers but these have nearly all been methods papers which are not typical of FMRI papers. I have attended many lab meetings and presentations of experiments that have typically been to do with higher-order motor tasks or attention and memory. To my shame I do not read many FMRI applications papers. For the last 4 years or so I have worked nearly full time on code for imaging analysis. That's the "nipy" in the title of this blog.

My own experience of analysis is that we make a huge number of mistakes. There are many stupid and not so stupid mistakes to make, from getting the data off the scanner to making the final activation maps. I would claim to be more careful than average with my analysis (if only because I am interested in analysis) but I have made many mistakes, sometimes the same mistake several times in the same analysis. I gave a talk on mistakes in functional MRI analysis at the Human Brain Mapping conference. I gave examples of real mistakes, and in order to be fair, I only used my own mistakes as examples. I didn't have much difficulty filling the time.

One of the problems is that there are many steps in the analysis. Some of these steps are automated and some are half-automated. For example, when I am doing an analysis I am likely to pick up some of my old scripts from a previous analysis. I wrote these scripts, but I forget the hacks I put into them to make them work with the previous data. The problem is much worse for new lab members and researchers who do not write code. They often find that someone has given them a script that they don't understand, perhaps with some help in modifying it. It's very easy to get lost inside these scripts and it's also easy to find yourself applying parameters that are not right for your own data without knowing it.

A recent example was slice-timing correction on Siemens scanners. It's probably more common than not to use interleaved slice acquisition for FMRI. Typically this means that the bottom slice in space (slice 1) is collected first, then all the odd-numbered slices from 3 to the top of the brain. After that you acquire the even slices, starting at the second from bottom in space. This is the classic 1, 3 ... 2, 4 ... interleaved acquisition order. However, that's not right for some Siemens scanners; if there is an even number of slices, it turns out that Siemens acquires 2, 4 ... 1, 3 ... Who knew? It turned out some people did, but many people, including me, did not, and I'd been analyzing Siemens data for a long time. If the TR is the time to acquire one whole brain volume, then getting the slice timing wrong means that every slice in your volume is half a TR wrong in time (plus or minus). Now imagine you'd been using someone else's analysis script, and that the person who wrote the script didn't know either. The problem here is that we often use scripts we don't fully understand or haven't reviewed recently. This makes it less likely that we will go and check.
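The two orderings are easy to confuse, so here is a minimal sketch in plain Python. The `siemens_even_first` flag is my own hypothetical name, standing in for "a Siemens scanner with an even slice count"; check your scanner's documentation before trusting any of this for real data:

```python
def interleaved_order(n_slices, siemens_even_first=False):
    """Spatial slice numbers (1-based) in order of acquisition."""
    odd = list(range(1, n_slices + 1, 2))   # slices 1, 3, 5, ...
    even = list(range(2, n_slices + 1, 2))  # slices 2, 4, 6, ...
    return even + odd if siemens_even_first else odd + even

# The order I assumed, for a 6-slice volume:
print(interleaved_order(6))        # [1, 3, 5, 2, 4, 6]
# What some Siemens scanners do when the slice count is even:
print(interleaved_order(6, True))  # [2, 4, 6, 1, 3, 5]
```

Every slice moves half the list away from where you thought it was in acquisition time, hence the half-TR timing error.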

Then there are simple logic errors and typos. Working on code has helped me understand just how prone we are to error. Good practice in coding means testing every line of your code to convince yourself and others that it does what you say it does. When I do this, I find lots of errors. I never did this when I was writing analysis scripts, and I don't think many researchers do. I conclude that there must have been a considerable number of errors. I would have found the errors that made the data look funny, but I would likely have missed errors that left the data looking plausible.
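As a concrete, deliberately hypothetical example of what I mean by testing: even a trivial helper, like rescaling a voxel timecourse to percent signal change, deserves a few assertions on data where we know the right answer, before it goes anywhere near real data:

```python
import numpy as np

def percent_signal_change(timecourse):
    """Timecourse rescaled to percent change from its own mean."""
    baseline = np.mean(timecourse)
    return 100 * (timecourse - baseline) / baseline

# Assertions on data with a known answer:
tc = np.array([100.0, 110.0, 90.0])
psc = percent_signal_change(tc)
assert np.allclose(psc, [0.0, 10.0, -10.0])
assert np.isclose(psc.mean(), 0.0)  # centered by construction
```

Checks like these are exactly the kind that catch the quiet errors, the ones that leave the data looking plausible.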

To deal with this we would need to teach ourselves good practices for using and writing software, because that is what we are doing. But, this is almost never taught. I was certainly not taught that, and stumbled across the basics after I had been doing a lot of coding for a long time. Knowing what I know now, I would not let a student loose on FMRI data without a good basic knowledge of software engineering. I hope very much that that becomes routine over the next 10 years.

The last aspect of analysis in FMRI is just how many different times we tend to analyze the data. There are so many different things to try and we find ourselves taking many paths through the analysis. This is particularly marked when we get to the statistics. There are many different statistical models to apply to the data, and we often end up trying a large number. The great risk is that we will stop analyzing when we see a result we like. This must occur often in practice. That makes it very difficult to know whether the result is a real one or the result of trying many different analyses on data that has no real signal.
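A toy simulation makes the size of the problem concrete. Here I treat each analysis path as an independent look at null data, which is an idealization I have chosen for simplicity; real paths reanalyze the same data and are correlated, so the inflation is smaller, but it is still real. The numbers are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
n_experiments = 2000
n_paths = 10       # hypothetical: ten plausible ways to analyze the data
n_subjects = 16
t_crit = 2.13      # two-tailed critical t at p = 0.05, 15 df

found_something = 0
for _ in range(n_experiments):
    for _ in range(n_paths):
        data = rng.standard_normal(n_subjects)  # pure noise: no real effect
        t = data.mean() / (data.std(ddof=1) / np.sqrt(n_subjects))
        if abs(t) > t_crit:
            found_something += 1
            break  # stop analyzing when we see a result we like

print(found_something / n_experiments)  # around 0.4, not the nominal 0.05
```

With ten independent looks, the chance of at least one "significant" result on pure noise is about 1 - 0.95^10, roughly 40%.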

The difficulties of the analysis are compounded by the fact that we don't teach analysis very well. The people doing analysis are often people like me in that they have very little background in engineering mathematics. To understand the theory of the analysis means you have to understand some of Fourier analysis, numerical optimization, image processing, brain anatomy, filter theory, linear algebra and statistical inference. That is hard to teach in a short period, and we haven't had long to get this right - FMRI only started in 1992. The result is that many people doing imaging feel overwhelmed by their lack of background knowledge. I certainly suffered from that and still do.

Last, there is the culture surrounding FMRI. FMRI is new. It is fashionable. It is expensive. It gets high-profile publications and gets you into the news.

John P. A. Ioannidis wrote a famous paper called "Why Most Published Research Findings Are False".
Here are his 6 corollaries:

Corollary 1: The smaller the studies conducted in a scientific field, the less likely the research findings are to be true.
Corollary 2: The smaller the effect sizes in a scientific field, the less likely the research findings are to be true.
Corollary 3: The greater the number and the lesser the selection of tested relationships in a scientific field, the less likely the research findings are to be true.
Corollary 4: The greater the flexibility in designs, definitions, outcomes, and analytical modes in a scientific field, the less likely the research findings are to be true.
Corollary 5: The greater the financial and other interests and prejudices in a scientific field, the less likely the research findings are to be true.
Corollary 6: The hotter a scientific field (with more scientific teams involved), the less likely the research findings are to be true.
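Ioannidis's argument can also be made quantitative. In his notation, if R is the prior odds that a tested relationship is real, alpha the significance threshold, and 1 - beta the power, then (ignoring his bias term u) the probability that a claimed finding is true, the positive predictive value, is PPV = (1 - beta)R / (R - beta R + alpha). A small sketch, with numbers I have picked purely for illustration:

```python
def ppv(alpha, power, prior_odds):
    """Positive predictive value of a claimed finding (Ioannidis, 2005).

    PPV = (1 - beta) * R / (R - beta * R + alpha), with no bias term.
    """
    beta = 1.0 - power
    return power * prior_odds / (prior_odds - beta * prior_odds + alpha)

# Hypothetical numbers for a small, underpowered imaging study:
print(round(ppv(alpha=0.05, power=0.3, prior_odds=0.2), 2))   # 0.55
# The same study in a hotter field with more flexible analysis,
# modeled crudely as lower effective prior odds:
print(round(ppv(alpha=0.05, power=0.3, prior_odds=0.05), 2))  # 0.23
```

Under these made-up but not implausible assumptions, the corollaries push the truth probability of a typical finding well below a half, which is at least consistent with my 30% guess.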

Tuesday, March 27, 2012

I find forks interesting because I'm interested in the way that open-source projects break down.

Martin Sustrik appears to be the lead author and architect of ZeroMQ, at least up until March 2010 [buyout, switch]. He apparently developed ZeroMQ as part of the open-source company FastMQ. Around October 2009, a Belgian company called iMatix bought FastMQ [buyout] and switched from its own messaging protocols to ZeroMQ [switch]. Pieter Hintjens was and still is the CEO of iMatix.

From around 2010, Martin Lucina was "co-maintainer" of ZeroMQ with Martin Sustrik [cv]. He and Sustrik both live in Bratislava [beer-mail]:

Also, a lot of discussion between Martin Sustrik and myself got done in person, simply because we live and work in the same city (Bratislava, Slovakia).

In at least one of these emails, Hintjens complains about Lucina's "ego", but asserts that ego is a major, or the major, motivation for work on ZeroMQ [first-fork]:

I'll just point out that your own ego is dominant and not always
pleasantly for others. I vaguely recall an original proposal for
multiple release branches that was quashed without sympathy (though
now you are arguing for exactly that). However without egos, and the
desire to dominate the work we do, this community would not exist, so
let's embrace that rather than deny it.

In the same email he loses patience with one of Lucina's objections, points out that iMatix owns ZeroMQ, and suggests that Lucina fork the code if he is unhappy [first-fork]:

Shrug. You are free to fork any 0MQ repository and make your own
versions and releases. It is LGPL licensed. The packages distributed
from the zeromq.org site, which iMatix owns, and labelled ZeroMQ, a
name that iMatix also owns, ultimately fall under iMatix's purvey. If
you feel iMatix has been a bad host to the 0MQ community, feel free to
fork. This is an essential freedom, don't hesitate if you think it's
necessary.

Lucina objected to proposals requiring pull requests on github, on the basis that they would lead to vendor lock-in [lucina-lockin], with some sympathy from Sustrik [sustrik-lockin].

Hintjens wrote a trademark policy for the ZeroMQ name in May 2011 [tm-email, tm]. A long thread developed from the initial announcement [tm-email].

In December 2011, there was a private email conversation between Hintjens, Sustrik and Lucina [metadict]. During this conversation, Lucina says that Sustrik "resigned as the benevolent dictator". Lucina [lagree] and Sustrik [sagree] agreed to release the contents of this private email conversation, but Hintjens did not [hdisagree].

Around January 2012 Hintjens proposed a radically open model of development [policy]. He had the view that "Maintainers are not developers and they have no opinion on patches". Thus maintainers should push all or nearly all patches after confirming basic process [policy].

In January 2012, Lucina wrote to the ZeroMQ mailing list [wtf] pointing out some of the more radical aspects of Hintjens' proposal. He gave the opinion that this way of working was already causing a decline in code quality and finished with an appeal:

Thoughts? Most of us here are software engineers by trade, surely you can see where this is leading.

In an email on 4 February 2012, Lucina refers to a conversation with Hintjens the previous day in which Hintjens apparently asserted his leadership of the ZeroMQ community [metadict]. Lucina again appealed for support from other members of the community [metadict]:

If you care about this, the only way [4] to achieve change today is to be
vocal, and to criticise and argue those points of the process which you
care about. Having known Pieter personally for more than 10 years now, and spent many many hours arguing with him, I wish you luck!

This is footnote 4 from the same email:

[4] The other option is a fork (in the formal sense, not the Github sense). This is an option of *last resort*, and is not something to be taken lightly. However, it does put you on a fair playing ground; build your own community and process, and ultimately users will follow the leader with their feet.

Later in the same thread, Hintjens withdrew from further discussion, writing [hdisagree]:

I'm retiring from this thread now and will by necessity ignore any
further discussion that isn't forward-focused and constructive.

Sustrik announced the release of Crossroads I/O on March 15, 2012 [fork]:

Crossroads I/O is a fork of the ZeroMQ project.

While we acknowledge forking can be a painful process, we felt the
ZeroMQ trademark policy to be overly restrictive.

Furthermore, the ZeroMQ community has also recently chosen to institute a light review process, which we feel is at odds with the technical quality and long-term goals we desire for the project.

To grow [a thriving commercial] ecosystem the project must be fully vendor neutral, and implement a liberal (e.g. Linux-style) trademark policy allowing use of the trademark for third party distributions of the software, as well as for plug-ins and extensions.

The development policy for the new project is hardline Linux kernel style: all changes must be submitted to the mailing list as a patch [harddev]. They do not accept pull requests because [nopull]:

Pull requests can change while being reviewed. This makes it impossible to ensure that the code being merged is the same code that has been reviewed and discussed, which compromises integrity of the codebase.

Pull requests are meant for delegation of work to sub-maintainers and require an established web of trust. We may consider moving to this model in future.

This history leaves an interesting question.

Was there anything that the community could have done to avoid this fork?

Monday, February 20, 2012

In that post, I alluded to the idea of using seniority to weight opinions. That is, person A gives opinion X and person B gives opinion Y; person A is more senior than person B, so opinion X is accepted. I will call this "brass ring merit" [1].

The idea of brass ring merit in open source is to define seniority to be an index of how much the person has previously contributed to the project, and proceed as above. If you disagree with someone who has contributed more, you should be silent.

I think this is a terrible mistake and here I will try to explain why.

Many healthy societies use meritocracy, but, in a healthy society "merit" is not primarily defined by past achievement, but by quality of the argument.

In this world "merit" refers - primarily - to the thing that person A is saying now, and the sum of the qualities of the arguments of A over a recent period. I will call this "acute merit".

To some this kind of merit feels like it will lead to chaos. How will you know who is right without an index of seniority? Acute merit is hard work, because you have to read and understand the argument, no matter who sent the email. Brass ring merit is much simpler: keep the hierarchy in mind and check the author's email address.

That is not the only problem with acute meritocracy. In the brass ring world, if you are touching brass, then you do not have to worry about what other people say. In an acute world, you must justify your opinion to any and all. That takes time and energy. You might turn out to be wrong, and you might have to change what you say or what you do or both.

... the unmediated participation of all community members in the process of formulating problems and negotiating decisions ...

This does not mean we give each opinion equal weight. It means we judge the merit of the argument, not the merit of the author.

Is it true that healthy societies judge by acute merit?

Exhibit A: the authority of Linus Torvalds over Linux is contingent

In fact, for [Linus'] decisions to be received as legitimate, they have to be consistent with the consensus of the opinions of participating developers as manifest on Linux mailing lists. It is not unusual for him to back down from a decision under the pressure of criticism from other developers. His position is based on the recognition of his fitness by the community of Linux developers and this type of authority is, therefore, constantly subject to withdrawal. His role is not that of a boss or a manager in the usual sense. In the final analysis, the direction of the project springs from the cumulative synthesis of modifications contributed by individual developers. George Dafermos, interview on governance of open source

Exhibit B: early Microsoft cared more about argument than seniority

[An important person called] Greg called a BIG MEETING and proceeded to complain about how the Excel team (meaning me) was screwing up the macro strategy. We pressured him to come up with some specific reasons but his arguments just weren't convincing. I thought it was nice that here I was, a new hire pipsqueak right out of college, arguing with employee number 6 and apparently winning the argument. (Can you imagine that happening at a Grey Flannel Suit company?) Joel Spolsky on his work on Excel

Exhibit C: Hewlett-Packard managed by Bill and Dave

Although the partners were demanding taskmasters, they created an environment in which managers were free to speak their minds. Steve Gomo was a young midlevel manager when he was asked to present the capital budget to the board of directors in 1978, because the executive who normally did the job was not available. Terrified, Gomo slaved to create a 12-slide presentation, and remembers practicing in front of a mirror for hours. Finally, he was called in to the meeting. Gomo's presentation went well until he showed a slide filled with financial data. Suddenly Hewlett piped up. "What is this gross asset and accumulated depreciation stuff? We only show one thing: net assets."

There was total silence. Gomo swallowed hard, and said, "Actually that's wrong Bill, we don't just look at net assets."

"Yes we do," said Hewlett firmly. Then turning to CFO Ed van Bronkhorst, he said, "We never look at anything but net assets, right Ed?"

"No Bill, he's right," said van Bronkhorst.

Suddenly the whole room burst out laughing at the thought of this rookie showing up Hewlett - except for Gomo, who just wanted to get the heck out of there. Fumbling with his papers, he moved towards the door, when suddenly Packard stood up, dominating the room with his huge physical presence. "Oh shit, this can't be good," Gomo thought.

Gomo's fear quickly vanished. "I just want it recorded in the minutes of this meeting that this was the best presentation on the capital budget that this board has ever received," said Packard, shaking Gomo's hand. Backfire, by Peter Burrows, p 63

Exhibit D: Intel's management style

Intel Senior Vice President Ron Whittier notes that Grove preferred to keep open channels of communication between employees, and encouraged people to speak their minds: "People here aren't afraid to speak up and debate with Andy." They termed this style "constructive confrontation." According to Grove's successor at Intel, Craig Barrett, "It's give and take, and anyone in the company can yell at him. He's not above it." Grove insisted that people be demanding on one another, which fostered an atmosphere of "ruthless intelligence."

Exhibit E: vigorous debate is characteristic of successful companies

Quoting here from page 75 of "Good to Great" by Jim Collins, in a section entitled "Engage in dialog and debate, not coercion". Collins is describing the atmosphere at the US company Nucor, as it transformed itself from a company making nuclear energy products into a company making steel. Ken Iverson was the CEO at the time.

... Iverson dreamed of building a great company, but refused to begin with "the answer" about how to get there. Instead he played the role of a Socratic moderator in a series of raging debates. "We established an ongoing series of general manager meetings, and my role was more as a mediator," commented Iverson. "They were chaos. We would stay there for hours, ironing out the issues, until we came to something ... At times the meetings would get so violent that people almost went across the table at each other.... People yelled. They waved their arms around and pounded on tables. Faces would get red and veins bulged out".

Iverson's assistant tells of a scene repeated over the years, wherein colleagues would march into Iverson's office and yell and scream at each other, but then emerge with a conclusion. Argue and debate, then sell the nuclear business; argue and debate, then focus on steel joists; argue and debate, then begin to manufacture their own steel; argue and debate, then invest in their own mini-mill, and so forth. Nearly all the Nucor executives we spoke with described a climate of debate, wherein the company's strategy "evolved through many agonizing arguments and fights."

Don't shut down the discussion, you'll kill the project

Acute merit leads to long, difficult, tiring discussion. It annoys people and tires them and makes them angry. But it is the essential engine of productive work.

Sunday, February 19, 2012

I came to see, in my time at IBM, that culture isn't just one aspect of the game - it is the game (Louis V Gerstner, "Who says elephants can't dance?" p182)

I started a thread about governance on the numpy mailing list. The thread didn't go very well. It occurred to me that discussion on the numpy list has often been poor, and that this is due to the culture in numpy.

The problem as I see it is that numpy has a weak culture of participation. There is a fairly explicit culture on the numpy list of listening to opinions not on the basis of the argument, but on the basis of the person's perceived or measured importance. In fact, Scott Sinclair codified this in a semi-serious suggestion on the mailing list thread, where importance was to be measured by number of code commits.

I believe this is why discussions on the numpy mailing list are often unsatisfying and disorganized, with people talking past the point and offering opinions without addressing the issues. This follows logically from the fact that opinions are assessed by the importance of the person delivering them. Therefore, there is no need for the opinion to build on the argument that has been put forward, or advance from any point but the initial view of the author. The discussion then becomes a series of impressions, and does nothing but point out that some people's impressions are different from others. We listen to the impression of the person who is most important.

I have sometimes felt that this way of doing business is based on the idea that successful open-source communities are based on meritocracy. I think this is a serious misunderstanding, and I try to justify that in my next post about meritocracy in open source.