Fixing the Fraud: Special Announcement

Exciting news! Next month, at SpotOn London, fellow blogger and scientist Suzi Gage and I are co-ordinating a discussion session on academic misconduct. There's been tons of stuff in the news about this recently, and we feel that now is the time to actually start doing something about it. Over the next month, Counterbalanced and Sifting the Evidence will be hosting guest posts from some of our session panellists, as well as insight, analysis and links from around the web on dodgy deeds that have been going on in science, plus how we might go about fixing the problems. We'd like to make our session at SpotOn as open and accessible as possible, so feel free to leave comments, suggestions, or questions for us or the panellists. You can follow us on Twitter @DrPeteEtchells and @Soozaphone, and use the hashtag #solo12fraud.

8 Responses to “Fixing the Fraud: Special Announcement”

Question for Ginny Barbour - why does COPE not insist that the journals who subscribe to the guidelines actually do the right thing when the proverbial doodoo hits the rotational cooling device? COPE has no teeth. What use is a set of guidelines when journals can blatantly ignore them without penalties? I can provide numerous examples of this if asked.

I saw your article; I'm commenting here because I don't want to sign up for a Guardian account.

This is more a casual musing than an actual suggestion, as I don't know enough about the inner workings of academia, but here are my thoughts for what little they're probably worth:

Closed access journals ruin everything.

Aside from charging anyone who isn’t affiliated with a subscribed institution for knowledge, they create this illusion of quality assured by peer review. But peer review is no substitute for replication.

I think the best model would be an open access journal that only allows publication of studies that have been independently replicated.

This alleviates pressure to publish novel results by grouping novelty with replication such that both progenitor and replicator get publishing credit, eliminates the need for pricey peer review (which provides journals an excuse to charge for access to findings that might not even be accurate), and most importantly, makes replication —a much more reliable sort of peer review— the most important publishing criterion.

Because of your interesting initiative, I am sending you my paper below, written earlier, on the relation between science and ethics. I notice that all footnotes get lost this way; I will try to find a way to send you the paper by e-mail. Greetings, Michel van Hulten.

1. Introduction Allow me to begin with three examples of a lack of scientific integrity in governance in which researchers played a role.

1. In 1999, at the University of Amsterdam, doctoral candidate (promovendus) Michiel Roscam Abbing presented his dissertation thesis regarding research related to the Dutch Government’s proposal to build a railway line, to be known as the ‘Betuwelijn’, beginning at the harbour of Rotterdam and crossing the country to the German border, from where it would link in particular to the German industrial Ruhr area. According to Roscam Abbing, this decision was based on an analysis prepared by ‘researchers who knew well what the Government wanted to read’. His concluding sentence was that ‘their daily honoraria have to be qualified as hush-money (silence-money)’. His supervisor (promotor) asked, in the same academic session, whether he thought that this would enhance his likelihood of obtaining a position as a political scientist.

2. The ESB (a professional weekly publication for Dutch economists) on 6 November 1998 published a ‘reconstruction’ of growth figures over the years for Schiphol Airport at Amsterdam in order to answer the question of whether the prognoses had been realistic. The major conclusion was that the growth prognoses for Schiphol had been strongly influenced by political and societal desires.

3. The NRC Handelsblad, an influential daily newspaper in the Netherlands, on 5 November 1998 reported, on the basis of information obtained from one of the highest ranking officers of Rijkswaterstaat (the national authority in charge of dyke building and sea defences), that constructing the dam in the Oosterschelde had been a mistake. This dam was designed to protect the south-west of the Netherlands against flooding (earlier, in 1953, nearly 2000 residents had died following a failure in the sea defences). The newspaper report showed that expenditure on the new Oosterschelde defences (around €4 billion) was eight times higher than would have been required with an alternative solution. Cheaper solutions had been presented at the time, but by scientists opposing the Government’s plans, which placed more emphasis on the political requirements of the time.

2. Error or intention? Research is not always pure science. It is not always only the use of objective facts in sound analyses to produce logical conclusions; its use can be influenced by emotions and by powers related more to money than to knowledge, and serve other interests than fact-finding and searching for the best possible solutions to a problem. This can simply be an error, and unintentional, which may be pardoned. However, one can also knowingly ignore the truth, intentionally serving other purposes. Scientists should not be involved in such unethical activities.

3. Disqualifying the CPI Against this more general background, I summarise my observations regarding the annual Corruption Perceptions Index (CPI), as used by Transparency International (TI), in producing a yearly ranking since 1995 of the relative corruption of over 150 countries. The scale stretches from near-zero corruption, according to the perceptions of respondents, to the other extreme where nearly all transactions are sensed as corrupt.

My conclusion, based on an analysis of the CPI results published between 1995 and 2005, is that the research underpinning the yearly figures does not meet scientific standards, and does untold harm to countries that are unfairly classified as more corrupt than others. A growing number of researchers agree that using the CPI to qualify and quantify corruption is not of sufficient rigour to justify the results that are published and the subsequent actions. This would not be such a problem if there existed a sincere wish to improve work in the field of defining and measuring corruption.

4. Unethical behaviour What is more alarming is that the makers of the CPI, Professor Lambsdorff and his collaborators, and TI, the leading non-governmental agency fighting worldwide corruption, which publicises the results and uses them for advocacy purposes, do not intend to improve their approach, even though they are aware of these shortcomings. I also see this as unethical behaviour.

5. What follows? In this paper, I summarise my criticisms of the CPI, the caveats as given by Lambsdorff and TI, and the use made of the CPI by governments and enterprises.

6. The Corruption Perceptions Indices Around 1995, the volume of corruption, as estimated by the World Bank, stood at US$50 billion. In 2000, this estimation had risen to US$400 billion, and in 2005 it reached the US$1 trillion mark. It is somewhat remarkable that this growth in the estimated total of corrupt payments has seen little change in the ranking of most countries in the TI Corruption Perceptions Indices published since 1995. This implies that the growth in observed and/or registered corruption is proportionally spread over all countries: everywhere the same improvements in observation and registration, or the same increase in corruption. This seems highly unlikely!

7. Definition of corruption The CPI is a composite index, a poll of polls if you like, drawing on corruption-related data from expert and business surveys carried out by a variety of ‘independent and reputable’ institutions. Here the problem is that these underlying polls and expressed perceptions do not share a single definition of corruption. Indeed, it is rare to find a poll with a clearly defined concept of ‘corruption’. A great range of aspects feature in various polls:
- improper practices (such as bribery or corruption) in the public sphere,
- the level of corruption,
- spread and amount of corruption in public and private business,
- estimated losses caused by corruption,
- likelihood of social and illegal payments being demanded by high and low levels of government,
- extent of misuse of public power for private benefits.

8. TI’s definition TI explains in its ‘frequently asked questions’ document attached to the CPI-2008 documentation how, for the purposes of the CPI, corruption is defined. “The CPI focuses on corruption in the public sector and defines corruption as the abuse of public office for private gain. The surveys used in compiling the CPI ask questions relating to the misuse of public power for private benefit. These include for example: bribery of public officials, kickbacks in public procurement, embezzlement of public funds or questions that probe the strength and effectiveness of anti-corruption efforts, thereby encompassing both the administrative and political aspects of corruption.” It cannot be put more clearly, and the conclusion cannot be otherwise than that a great variety of concepts are crudely lumped together, like a basketful of very different fruits, all being labelled as apples.

9. TI’s caveats Later in the ‘frequently asked questions’ document, TI admits this failure to use a single definition of corruption with a description of the example of Somalia, ranked last in 2008: “Example: What is implied by Somalia’s ranking in the CPI 2008? Corruption in Somalia has been perceived to be the highest in the CPI 2008. This does not, however, indicate that Somalia is the ‘world’s most corrupt country’ or that Somalians are the ‘most corrupt people’. While corruption is indeed one of the most formidable challenges to good governance, development and poverty reduction in Somalia, the vast majority of the people are victims of corruption. Corruption by powerful individuals, and failure of leaders and institutions to control or prevent corruption, does not imply that a country or its people are most corrupt.”

10. Why then the CPI? Why, given these admitted reservations, does the annual CPI research exercise result in a published list suggesting that there is a least corrupt and a most corrupt country, and that these have been identified? Does what is admitted about Somalia above also mean that countries placed in positions 1 to 10 are not necessarily less corrupt than the bottom ten on the list?

11. Number of surveys The data used in the CPI have, over the years, come from 24 independent institutions, while the composite indices have made use of 95 different polls and surveys. TI argues that this large number improves the overall result. However, increasing the number of surveys does not automatically increase the validity of the outcome: adding bad data to good data only confuses.

12. Selection of surveys More worryingly, the underlying surveys and polls appear to have been chosen haphazardly. For Africa, the most interesting illustration is the use made of the Africa Competitiveness Reports of 1998, 2000 and 2004 produced by the World Economic Forum (WEF). The 2000 report was used in compiling the CPI for 2000, 2001 and 2002, but not for 2003 as the methodology adopted by Lambsdorff and TI does not allow a survey to be used for more than three years. However, when a new WEF report on Africa was published in 2004, it was not used in compiling later indices. Why, one may ask, was this source ignored?

13. Reliability of ‘data’ If scores and rankings were based on hard facts, it would not be so crucial what sources were used. However, since the polls and assessments used here are based on perceptions, selecting sources becomes very important. Ideally, one would consistently use the same sources, especially if one is going to draw conclusions based on ‘changes’ from one year to the next. Galtung illustrates this by referring to TI’s 2003 CPI press release that states: “Norway, France and Germany improved their scores in recent years.”

However, as Galtung observes, “a significant percentage of the upward trend of the scores for these and other countries would be accounted for by the exclusion of PRS – Political Risk Services, used from 1996 till 2000, from the sources.”

In 2008, TI alludes to a consequence of this in the methodology document accompanying the 2008 CPI: “This year […..] the data by the United Nations Economic Commission for Africa dropped out of the index. Their data used in 2007 were dated and their new report is not yet available. With this source dropping out, but also with some sources changing and expanding their coverage of countries, some countries were affected by slight changes in the composition of sources.”

14. The respondents All surveys rely on people being willing to express their perceptions. In many countries, this can be a problem as those involved in corrupt dealings have good reasons, and often the power, to keep their corrupt activities secret. Respondents willing to cooperate in the various surveys as such constitute biased samples as they come from a subset made up of those not afraid of the dangers of disclosing their true perceptions, and those who lie for reasons of fear or potential gain. The samples surveyed often share other characteristics. It is not that all respondents are white, or all from prosperous northern countries, or all male. Rather it seems that they are all working at senior and managerial levels in their companies, and that they tend to be well-educated and fairly well-paid experts, expatriates and seniors in their chosen professions. I suspect they are nearly all men, between 25 and 50 years of age, and most with an MBA or similar background. Very few women, consumers, elderly, young, sick, poor or powerless. No trade unionists, no investigative journalists.

Statistically, the respondents are not representative of the population of the countries being assessed, and for which the extent of corruption is then quantified.

15. So what is being reported? The findings as presented in the yearly CPI publications are admittedly accompanied by many caveats and great prudence. Nevertheless, it is difficult to know exactly what is being reported. In the first place, we see that the CPI for 2008 expressly mentions that data have been used that were obtained for 2007 and for 2008. This means that if a country’s score and rank change between the CPI 2007 and 2008, it is uncertain whether this is genuine since at least some data contribute to both scores. Politicians tend to read such results in a very narrow way, immediately claiming victory or defeat in their fight against corruption. (See Paragraph 18, later in this paper, for a sample view by the Dutch Minister of the Interior in 2007.)

16. Methodological Validity – or rather lack thereof Secondly, in an attachment to the CPI 2008, we read in the full-length text on ‘methodology’: “Unbiased, hard data are difficult to obtain and usually raise problematic questions with respect to validity. [...] International surveys on perceptions therefore serve as the most credible means of compiling a ranking of nations. The goal of the CPI is to provide data on extensive perceptions of corruption within countries.”

However, if we compare this to the shorter explanation of the ‘Methodology 2008’ which reads: “All sources measure the overall extent of corruption (frequency and/or size of bribes) in the public and political sectors” we get a different impression. In particular, the terms ‘perceptions’ in the longer version and ‘measure overall extent’ in the shorter convey different messages. The shorter explanation at the very least suggests that hard data are being used.

17. Confidence range Thirdly, small shifts in the ranking of a country may have no real meaning once one factors in the published ‘confidence ranges’ of the final scores. Once one takes this ‘uncertainty’ into account, the relative ranking of countries with similar scores is statistically unjustifiable, and probably has no meaning at all (see also Paragraph 18 below).

The longer version of the ‘Methodology’ reads: “A ranking of countries may easily be misunderstood as measuring the performance of a country with absolute precision. This is certainly not true. Since the first CPI was produced in 1995, TI has provided data on the standard deviation and the number of sources contributing to the index.”

Several high and mighty people in the anti-corruption hierarchies quote the CPI findings in other publications without caveats and prudence, and present the rankings as ‘the truth and nothing but the truth’. It is little wonder that the media follow this, as this is much easier for them to present, and for their audience to (mis)understand.

The political acceptability of the yearly CPI results is remarkable given that the same politicians recognise the lack of scientific validity in the methodology leading to these results. Maybe this is because their own, relatively affluent, countries appear near the top of the list!

18. Dutch self-satisfaction The Dutch Minister of the Interior, Mrs Guusje ter Horst, on the occasion of the Integrity conference held at the Vrije Universiteit in Amsterdam on 26 April 2007, noted proudly in her official address that the Netherlands had climbed from 11th to 9th in TI’s Corruption Perceptions Index. (The next year she would have been even more delighted as the Netherlands reached Number 7.)

And why not? For a note of caution we again turn to the TI ‘methodology document’ for CPI 2008 which is very explicit on making year-to-year comparisons: “Comparisons to the results from previous years should be based on a country’s score, not its rank. A country’s rank can change simply because new countries enter the index and others drop out.”

Nevertheless, if we look at the scores for these three years we also see an upward trend. The scores for the Netherlands for the years 2005, 2006 and 2007 were 8.6, 8.7 and 9.0 respectively, suggesting at a second glance that things are indeed moving in a good direction. This might even suggest that TI’s caveat on interpreting ranking is unnecessary. However, if we look at the ‘confidence ranges’ for the same three years we see the scores presented as: 8.3-8.9, 8.3-9.0 and 8.8-9.2. As these overlap substantially, there seems to be little solid reason for pride or rejoicing – indeed the ‘true’ score for the Netherlands may have slipped from 8.9 to 8.8 over the three-year period. The mood deepens if we add the data for 2008, when the score fell to 8.9 with a confidence range of 8.5-9.1, although the Netherlands did retain 7th place in the ranking.

Essentially, the scores suggest nothing has really changed in the Netherlands despite the significant climb up the rankings.
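The overlap argument above can be checked mechanically. What follows is only a minimal sketch in Python: the figures are the scores and confidence ranges quoted in the text, while the table layout and the function names (`ranges_overlap`, `common_range`) are illustrative assumptions, not anything published by TI. Treating overlapping ranges as "no demonstrated change" is a rough rule of thumb rather than a formal statistical test.

```python
# Scores and confidence ranges for the Netherlands as quoted in the text:
# year -> (score, lower bound, upper bound).
# The table layout and all names below are illustrative assumptions.
CPI_NETHERLANDS = {
    2005: (8.6, 8.3, 8.9),
    2006: (8.7, 8.3, 9.0),
    2007: (9.0, 8.8, 9.2),
    2008: (8.9, 8.5, 9.1),
}


def ranges_overlap(a, b):
    """Return True if the closed intervals a=(low, high) and b=(low, high) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


def common_range(intervals):
    """Return the interval shared by all inputs, or None if they have no common part."""
    low = max(interval[0] for interval in intervals)
    high = min(interval[1] for interval in intervals)
    return (low, high) if low <= high else None


# Keep only the confidence ranges, dropping the point scores.
intervals = {year: (low, high) for year, (_, low, high) in CPI_NETHERLANDS.items()}
years = sorted(intervals)

# Compare every pair of years, not just consecutive ones.
pairwise = {
    (first, second): ranges_overlap(intervals[first], intervals[second])
    for index, first in enumerate(years)
    for second in years[index + 1:]
}

shared = common_range(list(intervals.values()))

for (first, second), overlaps in pairwise.items():
    print(first, second, "overlap" if overlaps else "no overlap")
print("range shared by all years:", shared)
```

With the quoted figures, every pair of years overlaps, and all four ranges share the band from 8.8 to 8.9, so a single unchanged underlying value is compatible with each year's published range.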

19. Ministerial disinterest I wrote and addressed this information to the secretariat of the minister to no avail. The response was that they are aware of this and that they would forward my information to those concerned in the Ministry!

20. Ministerial complacency Later, on 5 November 2007, the Minister of Foreign Affairs was asked in a written question from Parliament whether he was familiar with my criticism of the CPI, that the results reflect perceptions and not real facts. He was also asked whether he made use of the results as portrayed in the CPI.

In his answer, the Minister writes that my ‘critique of the CPI is known and methodologically correct’ (!). He confirms that ‘the CPI deals with perceptions of Western enterprises and that this is also recognised by TI’. ‘Nevertheless’, the Minister continues, ‘the CPI influences the investment climate and through this the economic growth’. That is, the CPI mirrors how companies see corruption in the countries portrayed.

In terms of using the CPI, the Minister mentioned some reservations. To summarise, perceptions change slowly and, therefore, the international donor community should not attach immediate consequences to the findings. A bad score may mean no more than that the international business community has to work on image-building. Governments that are acting against corruption, but failing to see visible results in terms of the CPI index, should not give up but continue their fight since perceptions will change over time.

Anyone reading his answers will question the quality of the information delivered to Parliament. The more so when we see what the consequences are: it is all about image-building.

21. Ministerial satisfaction My third recent experience is with the Dutch Ministry of Justice. One of the directors answered my letter, dated 30 January 2009, about the insufficiencies and erroneous nature of the CPI, and in his response recognised the limitations of the CPI. The letter continued, however, that the CPI nevertheless gives an insight into how companies see corruption in the Netherlands (!), and that this is important because of the influence of this perception on the investment climate. He also refers to evaluations of Dutch corruption policies, by the OECD, GRECO and others, as providing a sufficient basis for policy development on corruption in the Netherlands.

The only conclusion we can draw is that there is no governmental support for research efforts that try to improve the qualification and quantification of corruption.

22. Conclusions All three ministerial respondents seem satisfied with the CPI as it is. Naturally, the opinions presented by these public officials of various political backgrounds have been formulated, or at least influenced, by their staff. Supposedly, such people are academically trained and aware of what is required in scientific research to achieve quality results.

When the Dutch Minister for Development Cooperation decided, in 1998, to stop Dutch development assistance to Cameroon, this was not a purely impulsive reaction. The CPI had just listed that country as ‘the most corrupt’. For the minister, it was politically prudent to cease giving such aid. Such a conclusion would have been developed by ‘scientists’ in her back-room staff. Should they not have looked more closely into the methodology behind the CPI and, having studied the documentation from TI in Berlin, given other advice? Cameroon again came bottom in 1999, leaving an indelible image of a country rife with corruption. More recent CPIs see Cameroon rapidly climbing the index such that more than one-third of other nations are apparently now ‘more’ corrupt. However, given my earlier comments, who really knows?

If it is true that this ingrained image influences the investment climate in the country, then Cameroon will have paid a heavy price. Maybe it is because the CPI is generally accepted around the world that scientists seem afraid to attack its validity. Why raise your head above the parapet if your political masters believe in this tool? However, is it not the foremost duty of scientists not to accept results that do not stand up to scientific scrutiny?

Thank you very much for bringing up these very important questions and organizing a session during a conference to discuss them. I think that we cannot blame only the rules of the “reward system” for the increasing scientific misconduct. They certainly contribute, but the problem has deeper roots in the materialistic and productivistic culture of modern society, of which these rules are the product. If a scientist is judged exclusively on his scientific achievements, without considering his human value, the deeper reasons for his motivation, his management skills in leading a research team, or his involvement in society, he will very easily attribute his value only to his position and do anything in order to achieve it or keep it. During my four years as a post-doc I have seen all kinds of abuses and manipulations, not only of data, but also of people. The two cannot be kept separate: if you don’t respect reality as it is, and you force a result, how can you respect a human being who is working on the result? Science and people simply become tools to keep a position of power. For this reason I think that it is not enough to change the rules of the scientific editorial system; we need a more radical change in the way we select the PIs and the way in which we structure the research work. In these years I have seen people work on plenty of results that will never be published, a huge waste of money, time, and human resources. Every student and post-doc is pushed to become independent and do his own experiments; no one wants to spend his precious and grant-limited time helping with someone else’s project, especially because this would mean losing the important first author position in the article. The result is that everyone works for himself, spends his nights, days and weekends in the lab, sometimes with successful results, sometimes not.
Since he has worked alone on the project, most probably nobody will continue his work once his two-year fellowship finishes, because nobody knows exactly what he did. In the lucky case that someone does take over the project, most probably he won’t be able to reproduce the result, since this individualistic science discourages people from writing detailed lab journals and protocols, and makes it very difficult to reproduce results, not only between different labs, but even inside the same lab. I have seen results published that a second person had worked on for more than one year without being able to reproduce them inside the same lab; yet no one tried to understand what the difference was between the two people performing the experiments, and the results achieved by the first were kept because they were more convenient. Not to mention how easily the second person was excluded from the article, in spite of his work, and how difficult it is to protest against this if you don’t want to receive bad references from the PI for your future employment. Unfortunately, even in the lucky case where someone takes over your project and manages to reproduce your results, there will still be injustice. Reviewers and editors ask for more and more complementary experiments, sometimes enough to write almost a second article in the supplemental data. The publication procedure is so long that it cannot be completed during the short period of a post-doc fellowship, and sometimes not even during a Ph.D. Therefore one very easily sees one’s first position on an article slide into an unimportant position, because the person who takes over the project will have to complete it with so many experiments that he will end up taking the first position, also because he would never agree to take over the project if he was not properly recognized for it. How can we expect collaboration between different labs to be productive when we cannot even achieve it within a single lab?
How can we expect to achieve solid reproducible data when we don’t even let different people inside the same lab verify the data? The first thing we need to change is the management of research within the academic labs! Some suggestions:

1) Collaboration and team spirit Why is only the first author position considered important? I suggest that we start to give an official value to the second author position. This would help to change the structure of how research is performed, and oblige people not only to work on their own project, but to become involved in someone else’s project. It would also help people detach from the self-centered vision of science and focus more on the relevance and quality of the results achieved, and it would encourage collaboration and exchange between team members, documentation of the results and protocols, and reproducibility. Using shared first authorship does not really encourage this. It is important to encourage people to do work where not their ego, but their goal is in the centre. Of course one should have first author articles as well, but with this system that would not be enough: you would have a better record with, for example, one first author and one second author article than with two first author articles.

2) Reproducibility Create national and European institutions/agencies, or units within the existing national research grant agencies, with the task of identifying relevant scientific results that need to be reproduced and of getting them reproduced. There would be committees of scientists who come from the academic world but decide to continue their career in these institutions. Once they had identified important questions, they would give grants to labs that have the skills and expertise to reproduce the experiments, following a protocol decided by the institution (not by the lab), and maybe also a reward grant for the lab’s own research, in recognition of the extra work it is being asked to do. An additional possibility is to have laboratories in the institution/agency itself for the kind of studies that don’t require very specialized expertise and could be performed without an academic collaborator.

3) Selection criteria for the PIs The PIs are not only the ones who direct the research, and therefore determine the choice of the subjects and the quality of the science being performed, but they are also the ones who educate the future generation of scientists and PIs. Therefore the criteria used to choose the PIs are very important. What I have seen in my experience in France (but not in Sweden, where the situation is much better) is that the only criterion used is numbers: publications, impact factor, grants. No attention at all is paid to human qualities: for example, scientists who decide to do 50% teaching are considered inferior because they have less time for research. No one takes into account that the desire to transmit knowledge is a sign that someone is open to other human beings and to sharing, instead of keeping what he is doing just for himself. If someone moves from academia to industry or other institutions, he will hardly be welcomed back into academia. This is also a prejudice: learning to work in a different environment using different criteria can only enrich your experience and vision of science and open it to possible applications and to its role in society. Furthermore, this would also help a PI direct his employees to careers other than academia: many post-docs who don’t manage to publish are quite disoriented at the end of their grant, because no one has ever shown them other possible careers. No one takes into account the management skills of a PI, the satisfaction of his team members, or the career of the team members once they leave the lab: if, for example, a PI has many high impact publications, but many people come out of his lab without publishing or achieving positions, there is something wrong.
Of course not everyone is meant to do science, but there should be some proportionality between the success of the boss and the success of the employees; otherwise it means that he is exploiting the people without recognizing their contribution. The ability to deal with conflicts within the team and encourage team spirit and collaboration is also very important, but I don’t really have ideas on how to select for it. Maybe one could at least encourage it by introducing a compulsory course in leadership for all PIs. I know that Karolinska Institutet in Stockholm has such a leadership program for its PIs.

These are just a few examples of problems that I have met in science, and just a few suggestions of solutions. I hope that you will find something of interest that you will bring up during the session on academic misconduct at SpotON London.

I used to conduct cognitive and social psychology research projects but after experiencing many of the trappings of the publication process first-hand I became disenchanted and lost much of my interest in advancing into academia. Anyhow, this one idea for altering the publication process has been on my mind since then and I hope you'll be able to bring it up at the SpotON session.

In essence - Why don't journals agree to publish some papers based on theoretical strength and experimental design alone, BEFORE they see the results of the research?

In this way, if an investigator can secure a publication agreement before they carry out their experimental trials, all incentive to falsify data, or torture the data until it speaks, manipulating it for any significant results, will be eliminated.

This method would also mitigate damage caused to all labs by the file drawer problem. In my experience, many different labs would study the same literature, acknowledge the same theories, and then design similar studies to test similar hypotheses. If the results of these studies turn out to be non-significant, it's likely they'll be deemed uninteresting by the editors and will never make it into a journal. Ideally, a journal would accept the strong theoretical backing of a study like this, agree to publish it before the trials are conducted, then publish the non-significant results once the data is collected. Instead of so many labs wasting their time repeating this same experiment, failing to have it published, and stashing it in file drawers, they'll be able to read in a journal that it has already been attempted.

Obviously, not all publishing should be conducted on this basis because surprising, serendipitous results are one of the most vital and inspirational forces in science. For starters, however, I would like to see journals include a section of their publications devoted to pre-results approved publications. They could even publish the theoretical backings, hypotheses and designs of those pre-results studies, then publish a follow-up with the data a few months later once the trials have been conducted and all the subscribers could follow along and have their curiosity satisfied.