The Microfinance Mystery

For the last two years, there has been a mystery about the evidence supporting past favorable assessments of the scope for reducing poverty through microfinance, as exemplified by the famous Grameen Bank (GB). The chances for many poor people to benefit from access to this form of credit rest, in part, on solving that mystery.

To understand the mystery we need to go back to an influential paper by Mark Pitt and Shahidur Khandker (PK), published in the 1998 volume of the Journal of Political Economy. PK documented research supported by the World Bank—research that came to provide the most cited scholarly evidence yet to support the view that microcredit helps reduce poverty.

The PK paper was based on their analysis of a sample of about 1,800 households in 87 villages in Bangladesh—using the GB eligibility criteria based on landholding to identify the program’s impact. Based on their analysis, PK claimed that providing small amounts of credit, especially to women, could help poor families get out of poverty. They found economically and statistically significant gains in household consumption for women borrowing from GB. It is this finding that led Muhammad Yunus, the founder of the GB, to claim repeatedly that 5% of GB borrowers escape poverty each year.

The mystery appeared in a potentially damaging critique of the PK paper by David Roodman and Jonathan Morduch (RM) in a 2009 Working Paper of the Center for Global Development. Strikingly, RM claimed that PK’s main finding could not be replicated. (RM also raise some concerns about a previous paper by Morduch and a paper by Khandker on the same topic; here I focus solely on their critique of the original PK paper, which gets the bulk of RM’s attention and has clearly been the most influential paper in the literature.) Based on their analysis of the same data set used by PK, RM found that GB borrowing actually made women worse off! However, they distance themselves from this disturbing conclusion, preferring instead to question whether anything could be claimed one way or the other. Thus they conclude that “30 years into the microfinance movement we have little solid evidence that it improves the lives of clients in measurable ways.”

In a posting on his microfinance blog, David Roodman throws down the gauntlet in no uncertain terms: “…I think my paper with Jonathan is the academic equivalent not of a citation but an indictment... It is a long document packed with logic and evidence that the flaws are not merely possible but provable in academic court and important enough to generate wrong results.” Indeed, Roodman goes even further to conclude on the basis of his paper with Morduch that “a lot of research published in prestigious journals is wrong.” This is strong stuff indeed.

But we are still left with the mystery. Either PK or RM must be wrong; they simply cannot get different results with the same model and data. (RM made some changes to the PK dataset, but these matter little.) It would be unwise to make too much of the matter until that mystery is solved.

On March 26, Mark Pitt offered a solution. His new paper refutes RM’s main findings and confirms the main result reported by PK, namely that GB borrowing by women significantly increases household consumption.

Pitt makes two main points. First, he questions the appropriateness of the estimation method used by RM in real-world situations in which there is a positive minimum to borrowing; his simulations using synthetic data suggest that RM’s method performs poorly in this case. It would seem that RM should have known this about their estimator.
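To see why a positive minimum to borrowing can matter so much, here is a toy Monte Carlo in Python. This is my own illustration, not Pitt's actual simulation design and not either paper's estimator; all parameter values are invented. The outcome depends on latent borrowing, but observed borrowing is censored from below at a minimum loan size, and a naive linear regression that treats the censored variable as the real thing lands far from the true effect.

```python
import numpy as np

# Illustrative Monte Carlo (NOT Pitt's actual simulation design):
# the outcome depends on latent borrowing, but observed borrowing is
# censored below at a hypothetical positive minimum loan size.
rng = np.random.default_rng(42)
beta_true = 0.5   # true effect of (latent) borrowing on the outcome
c_min = 0.5       # positive minimum to borrowing (censoring point)
n, reps = 2000, 200

slopes = []
for _ in range(reps):
    credit_latent = rng.normal(0.0, 1.0, n)
    credit_obs = np.maximum(credit_latent, c_min)  # censored from below
    y = 1.0 + beta_true * credit_latent + rng.normal(0.0, 1.0, n)
    # naive OLS that ignores the censoring of the regressor
    X = np.column_stack([np.ones(n), credit_obs])
    slope = np.linalg.lstsq(X, y, rcond=None)[0][1]
    slopes.append(slope)

print(f"true effect: {beta_true}, mean naive estimate: {np.mean(slopes):.2f}")
```

The point is generic rather than specific to these papers: an estimator that ignores a known floor on the borrowing variable stays badly off target no matter how large the sample, which is why the handling of censoring is worth arguing about.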

Second, Pitt points to inconsistencies between the control variables used by RM and those used in the original PK study. The control variables are naturally key in any non-experimental study, but the changes made by RM—changes not evident to their readers—make a big difference to the results.

Based on Pitt’s findings, it looks like RM did not actually test the PK specification, as they claim. They got different results simply because they estimated a different model, and Pitt questions their model on a number of counts.

It will be interesting to hear how Roodman and Morduch respond to Pitt’s rebuttal of their findings. The ball is now in their side of the academic court.

Comments

Yes, it is all very dramatic, isn't it?! Jonathan Morduch and I hope to have a preliminary response available soon. The task is made harder by my need to finish a book manuscript. For now, I can say that we disagree with many specific points in Pitt's response, but that we have also learned from him, and that he has helped us improve our replication. And I would emphasize that our central contention was not about whether the coefficients are right but whether the instruments are valid, i.e., whether impact has been shown. Pitt does not discuss that, but we will continue to.

Also, I would like to provide a fuller version of the quote above, because it is less harsh:

I think my paper with Jonathan is the academic equivalent not of a citation but an indictment (not to impugn the professionalism of the authors of the original papers). It is a long document packed with logic and evidence that the flaws are not merely possible but provable in academic court and important enough to generate wrong results.

David,
I am sure your book manuscript is important, but resolving this mystery is surely a matter of high priority—not least at a time when microfinance is facing a crisis. So we all look forward to your reply. But I hope it will not be about the instruments used by Pitt and Khandker, which was a much studied topic prior to your paper with Morduch. (See my discussion of PK’s identification strategy in my review of evaluation methods in the Handbook of Development Economics: http://bit.ly/g5hS4O.) The headline issue of your paper—the issue that grabbed much attention, especially amongst critics of Grameen Bank—was your claim that PK’s data did not in fact indicate any consumption gains to GB borrowers—negative gains, in fact. This directly refutes PK. Yet Pitt has clearly gone deeply into the issue, and come back with a proposed solution to the mystery, indicating that the main mistakes are in your paper and that the original PK findings are robust. That is the issue at hand.
Martin

The technical issues and academic back and forth are fascinating, and I look forward to Roodman's reply.
But I would also love to hear some discussion on how this debate continues to inform our understanding of microfinance, particularly now that results from randomized evaluations are becoming available. Whoever is more technically correct, I am skeptical of any work where the qualitative results are so clearly dependent on the technical assumptions involved.
Thank you.

I'm baffled by Martin's suggestion that the most important question here is whether the changes associated with borrowing are positive or negative rather than whether the changes are causal.
How can it possibly matter what the effect is if there is no reason to believe in the proposed causal basis of the effect?
If there is to be reflection and discussion of the impact of microcredit, then that conversation should be held primarily around the Banerjee, Duflo, Glennerster et al. Spandana study, the Banerjee and Duflo Al-Amana study, and the Karlan and Zinman Philippines study, which are much more likely to provide reliable evidence of causality, and not just around the PK vs. RM debate.

You surely can’t contend that only randomized trials can be believed? That would seem to be more a claim of faith than science. Yes, randomization can sometimes be a useful tool. But it can hardly claim to be the only rigorous tool for evaluation. Observational studies will continue to have an important role, especially in the (many) situations in which randomization is not feasible, or it generates results of doubtful (internal or external) validity. This is a well-rehearsed issue; see my paper, “Should the Randomistas Rule?” in the Feb 2009 issue of the Economists’ Voice. Here is the link: http://bit.ly/dGLX9S.
Whatever identification strategy is employed, the claims made should follow from the data and assumptions. In this case, two prominent studies, ostensibly using the same data and methods, have come to radically different conclusions on an important issue. That is the mystery. It turns out that one of the studies got it wrong, as explained in Roodman's post cited above. Of course, we all make mistakes. But I only wish the authors had been a bit more humble about their findings, and checked more carefully before going public; it would not seem to have been too hard for them to have seen their errors. That is a lesson for us all, including the randomistas.

Indeed I'm not claiming that only randomized trials can be believed. But where they are possible and have been done, they should play a large role in the discussion of causality. It no longer makes sense to discuss PK and RM without reference to a large body of evidence that has come along since.
I find the vast majority of the arguments against the use of RCTs to be attacking strawman positions that randomistas don't make, or to be equally applicable to all forms of evidence.
But that's a side point. The real question here is on what we should be focusing on in this particular debate.
To paraphrase from your comments, Martin, I would put it this way:
I only wish the authors (Pitt and Khandker) had been more humble about their findings of causality and made their data and programming public when they announced their findings. It would not seem to have been too hard for them to have seen the errors in their causal claims and prevented misunderstandings and misinterpretations. That is a lesson for us all, including randomistas and observationists.

Sorry if you wanted me to write about something else, but the issue I was raising in my post above was NOT identification. That is not to say that I think identification is unimportant; far from it! And I did at least refer readers to other things I have written on the identifying assumptions in studies of microfinance, including the seminal study by PK.
But since you are so keen to make identification the issue, let me emphasize that I am not arguing against the use of RCTs per se (and I do use them myself at times). I do, however, question whether RCTs deserve the status that their recent followers in development economics have given them, particularly when we face so many large knowledge gaps that cannot be addressed with RCTs. Important and interesting questions should drive our development research, not methodological preferences.

Martin, what I want you to write about is why you are making such a big issue out of the least important part of this whole conversation.
1) Yes, it's important to get the math right, and it is good to find where the math errors were, wherever they were. But I don't understand how you can find more fault with RM after their valiant efforts to recreate what PK did, apparently without any support from PK despite numerous requests. While recognizing that there were errors in RM, why is there not equal blame for PK for not making their math and data available, which would have avoided this whole situation in the first place?
2) As I understand RM's underlying point on the math, the calculations involved in work like this are so complex that there is a significant chance that someone got it wrong. I think the fact that it took RM so long, and then PK so long again, to sort through all this only makes that point. If the errors were easy to avoid, why did it take so long to find them, especially given all the attention you claim was being given to their paper?
3) Finally, why is getting the math right on a calculation that does not do what it claims--establish causality--apparently more important than the original claim? In other words, why are you so interested in pressing this particular point? It seems to me the equivalent of US budget hawks complaining about the foreign aid budget.

Timothy,
I think you exaggerate the degree of nonsupport from PK. Their JPE paper is technically very clear, and I understand they did try to help RM. David claims that they were not as open about their data and code as they should have been. Maybe PK can respond on that point, as I don’t think either you or I know the details.
It seems to me that one of the two points made by Pitt could have been easily verified by RM, namely the missing control variable. The other point was more subtle, and required skill in advanced econometrics to see. That took time to figure out. Some of what we do in economic research is unavoidably complicated. But this was not RM’s critique; indeed, they claimed they had used essentially the same (complicated) econometric method. As it turns out, their method does not properly handle the inherent censoring, as Pitt explains.
However, as I just pointed out in a response on David’s blog, we must surely have more empathy for PK here than your comment suggests. For two years now, the RM paper has been circulating, with the authors drawing ample attention to it (ranging from a JEL paper by Morduch to testimony before the US Congress by Roodman), saying that PK’s main findings could not be replicated–that PK got it wrong. In the last week we have learnt that the key mistakes were in RM’s replication effort. This is not harmless stuff. Surely we can agree that this should not have happened this way.
Martin

Martin,
certainly we agree that this should not have happened this way. What I don't think should be in any doubt is that Pitt and Khandker were not as helpful as they could have been. RM have shown just how easy it is to "show your work."
Assessing the difference between the openness of RM and of PK is not a matter of subjective judgment. It's a very objective question.
What I hope for from this conversation (not ours, but the larger conversation) is that what is ultimately discredited is not PK or RM, but an approach to development research that is anything other than "open data."
Tim

Ack! Brilliant.
Why did Pitt and Khandker refuse to publish their code and obstruct replication efforts? Clearly because they were worried others would be able to prove they made a mistake. Clearly, then, who is responsible for any inability to replicate PK? Pitt and Khandker. When did they finally publish their code? After two years scrutinizing RM's code and work, all of which RM published as a way to invite validation and testing. How do they respond?
"Yeah, we'll see who ends up discredited," writes Khandker, finally making the taunt he's been waiting 2 years to make in the comments section of an old blog post of Roodman's (http://goo.gl/1u399).
Then they enlist the funder of their research in a blatant effort to discredit RM, all of which has been cleverly made possible through a combination of their own obstruction and RM's transparency.
There are definitely some key lessons to be learned here, but I'm not really sure what they'll turn out to be. The "academic court" needs to realize very clearly who should be walking away with egg on their face after all this.

The claim made in this post - that Pitt and Khandker resisted the replication efforts because they were worried others would find out their mistakes - is simply absurd, as it has by now been shown that it is Roodman and Morduch, not Pitt and Khandker, who made mistakes. What is ridiculous is that this claim is made after Roodman already admitted his mistakes. Also, Roodman himself has to take the blame for his inability to replicate PK, because once the mistakes in Roodman's code are corrected, the RM results match the PK results extremely well (translation: they replicate well), as shown by Roodman himself.
A mistake is a mistake, period. A delay of two years in exposing it does not make it anything else. Yes, PK should take some responsibility for not responding earlier, as the erroneous findings of RM were widely circulated, with far-reaching implications, during the last two years.
Finally, Roodman deserves credit for admitting his mistakes in his first reaction. And it would be more creditable if, in light of the revelation of his mistakes, he mentioned during various dissemination opportunities that his replication exercise in fact substantiates PK's results, despite their disagreement about identification.

One lesson I have learnt from this debate is about the art of deflection. RM first tell us that PK got it wrong and announce this widely and prominently. Then we learn that it was in fact RM who goofed up and that PK’s results still hold. But Roodman and supporters turn it all back on PK—blaming them for RM’s mistake! And when a third party such as myself points out that PK’s reputation has been salvaged, I am deemed to have been “enlisted” by PK “in a blatant effort to discredit RM.” Remarkable!

I appreciate the insightful discussion about who is right, and the further cheers for the use of RCTs for all types of evaluations - good stuff for students in an econometrics class.
But, aren't we missing the big issue here?
As development economists, don't we have the responsibility to be careful in producing results that may affect millions of lives?
What were the direct and indirect implications of the PK or RM papers? Imagine a situation where most donors and policy makers really believed these results and put them to use: it would have been a flip-flop of programs and policies on women-focused issues. The losers would be the poor women in developing countries - not PK or RM, who have their jobs and fame.

Abstract: After Pitt (2011) pointed out the flaws in the RM replication effort, Roodman subsequently notes “that when we fix our regressions, they continue to fail tests of the assumptions needed to infer causality. So improving the match to the original greatly strengthens our conclusion that this study does not convincingly demonstrate an impact of microcredit on poverty.” This claim is based on RM's tests of overidentifying restrictions, which the current response demonstrates are fundamentally flawed. New results presented below provide strong support for the hypothesis that microfinance causally improves the lives of participants.
Link: http://www.pstc.brown.edu/~mp/papers/Overidentification.pdf
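For readers unfamiliar with the diagnostic being argued over, here is a minimal sketch in Python of a test of overidentifying restrictions (a Sargan-type test) on invented data. It is emphatically not the PK or RM specification; the instruments, coefficients, and sample size are all hypothetical. The idea: with more instruments than endogenous regressors, the 2SLS residuals should be uncorrelated with the instruments if the instruments are valid.

```python
import numpy as np
from scipy import stats

# Hypothetical data-generating process (NOT the PK or RM model).
rng = np.random.default_rng(1)
n = 1000

def sargan_pvalue(valid_instruments: bool) -> float:
    z = rng.normal(size=(n, 2))            # two candidate instruments
    u = rng.normal(size=n)                 # structural error
    x = z @ np.array([1.0, 1.0]) + 0.8 * u + rng.normal(size=n)  # endogenous
    y = 0.5 * x + u
    # an invalid instrument is contaminated by the structural error
    z2 = z[:, 1] if valid_instruments else z[:, 1] + u
    Z = np.column_stack([np.ones(n), z[:, 0], z2])
    X = np.column_stack([np.ones(n), x])
    # 2SLS: project X on the instrument space, then solve
    Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    beta = np.linalg.solve(X.T @ Pz @ X, X.T @ Pz @ y)
    resid = y - X @ beta
    # Sargan statistic: n * R^2 from regressing 2SLS residuals on Z
    gamma = np.linalg.solve(Z.T @ Z, Z.T @ resid)
    r2 = 1 - ((resid - Z @ gamma) ** 2).sum() / ((resid - resid.mean()) ** 2).sum()
    df = Z.shape[1] - X.shape[1]           # overidentifying restrictions (1 here)
    return float(1 - stats.chi2.cdf(n * r2, df))

print(f"p (valid): {sargan_pvalue(True):.3f}, p (invalid): {sargan_pvalue(False):.3f}")
```

In this toy design the test should fail to reject with the valid instruments and reject decisively with the contaminated one; whether such tests are informative for the actual PK model is precisely what Pitt's linked paper disputes.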

Microfinance is micro finance---it is finance writ small. If finance works, microfinance works.
Surely the only meaningful question is the actual magnitude of the gains from MF.
Much of the statistical jumble arises from the failure to recognise that the data come from MFIs---which give out significant amounts of non-productive loans---and from borrowers---who often use money in ways they were not supposed to.
In completing the first study of MF in urban Bangladesh, I had many specific MF-type enterprises interviewed. The 'profit' rates were never below 100%. The rates fall sharply as the capital increases.
As long as small loans have these spiked productivities, MF must be beneficial.
btw, Jonathan Swift got MF to work in Ireland in the 1720s, and Dugald Stewart wondered about the high productivity of small enterprise in the 1790s. Neither the success of MF, nor the productivity pattern that sustains it, is new.

Most authors (no matter their position in the debate) seem to rely on a common assumption, namely that there actually is a methodological approach that allows a rigorous assessment of the types of causal questions we are interested in. In other words, the starting point adopted by most scholars and practitioners consists in claiming that there is a way to isolate the influence of a particular causal factor and to measure its consequent effects.
The striking point here is the apparent absence of any discussion of how warranted the latter assumption might be when dealing with policy questions requiring the analysis of complex systems. When detailed and correctly exposed, arguments related to causal complexity are generally well accepted (e.g., the impossibility of isolating a particular factor acting as part of a much wider set of interacting influences, which describes virtually most cases in the social sciences). However, this supposedly collective wisdom seems to vanish as we step into individual argumentations.
My point is quite simple: by focusing on which particular methodology should be favoured under a given set of conditions, we totally ignore and discard the possibility that the causal question being assessed might, in the end, be unaddressable. Shifting the focus from methodologies to policy questions would help resolve recurring concerns, such as the one pointed out above.
Why do we observe opposite results when employing different methodologies? Is it that one particular methodology is less appropriate than the other in capturing reality? Or is it that the several layers of uncertainty underlying both methodological approaches result in the generation of large approximations? Moreover, many interventions entail only small changes (particularly in microfinance), making approximations even more hazardous.
In brief, while the methodological debate is a healthy one, wider attention should be given to the types of causal questions we can actually address with a limited range of uncertainty. Anyone working with causal inference instruments in the study of social phenomena knows how much uncertainty lies behind any causal assessment, and knows how many questionable (and often unwarranted) assumptions are generally required to support the most basic inferential conclusion. Despite this awareness, the debate remains focused on the study of ‘methodologies’ rather than turning to the study of ‘policy questions’ and their underlying causal structure.

Martin, I do believe our overhauled replication of PK resolves any remaining mysteries. The study has many problems. The most serious appears to be instrument weakness caused by trying to separately instrument male and female borrowing. Combined with undiagnosed outliers, this makes the estimator radically bimodal and unstable. See
http://blogs.cgdev.org/open_book/2011/12/bimodality-in-the-wild-latest-on-pitt-khandker.php
Pitt's corrections to our initial attempt at replication allowed us this deeper insight. This instability most likely helps explain why leaving out one control variable led us to such different results in the initial attempt. The process of convergence to the truth has been awkward. But human beings are not perfect, and our openness in sharing data and code created a feedback loop that helped us overcome our limitations.
--David
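The weak-instrument instability Roodman describes can be illustrated with a self-contained sketch (again a toy of my own, not the actual trivariate estimator in either paper; all numbers are invented). In a just-identified IV setup, shrinking the first-stage coefficient makes the sampling distribution of the 2SLS estimate wildly dispersed and heavy-tailed, the kind of behavior that can produce bimodal, unstable estimates:

```python
import numpy as np

# Toy just-identified 2SLS with one endogenous regressor (hypothetical
# design, not RM's model). A weak first stage makes the estimator erratic.
rng = np.random.default_rng(7)
beta_true, rho, n, reps = 1.0, 0.8, 500, 2000

def tsls_draws(pi):
    draws = []
    for _ in range(reps):
        z = rng.normal(size=n)
        u = rng.normal(size=n)
        v = rho * u + np.sqrt(1 - rho**2) * rng.normal(size=n)  # endogeneity
        x = pi * z + v                   # first stage: strength = pi
        y = beta_true * x + u            # structural equation
        draws.append((z @ y) / (z @ x))  # just-identified IV estimate
    return np.array(draws)

strong = tsls_draws(pi=1.0)   # strong instrument
weak = tsls_draws(pi=0.02)    # very weak instrument

iqr = lambda a: np.subtract(*np.percentile(a, [75, 25]))
print(f"IQR strong: {iqr(strong):.3f}, IQR weak: {iqr(weak):.3f}")
```

The mechanism is that with a near-zero first stage the denominator of the IV estimate crosses zero across samples, so the estimate's dispersion explodes, which gives some intuition for why small specification changes could swing the replication results so dramatically.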

Mark Pitt and Shahid Khandker have written a detailed and convincing reply to the concerns of David Roodman and Jonathan Morduch. This is now posted on the Bank's working paper site. In their new paper Pitt and Khandker argue that the methods in the Roodman-Morduch paper are biased against finding impact. Amongst their new findings, they show that an alternative method of calculating the standard errors suggests that the original Pitt and Khandker paper could well have underestimated the impact of women's credit on household consumption. You can find the new Pitt-Khandker paper here.