Tuesday, June 18, 2013

There is a really interesting article at Slate.Com from Mary Ann Mason, the author of "Do Babies Matter" which I have written about here before. The post is titled "In the Ivory Tower, Men Only". The post tells some of the background behind the book and discusses issues about graduate school, post doctoral positions, applying for faculty jobs and more. The article also has some very good guidance for universities that would like to level the playing field:

We all know what structural changes would help to level the playing field in all of these careers and they are quite similar: paid family leave for both mothers and fathers, especially for childbirth, a flexible workplace, a flexible career track, a re-entry policy, pay equity reviews, child care assistance, dual career assistance. Those universities and corporations who have actively created these policies have found an advantage in recruitment and retention. For instance, at Berkeley, after enacting several new policies to benefit parents, including paid teaching leaves for fathers, job satisfaction scored much higher among parents, and more babies are being born to assistant professors.

Some good guidance for some of the activities at UC Davis as part of the ADVANCE program in which I am involved. And she ends by recommending

It is time for women to “lean in” and demand family policies that will at least give them a fighting chance to have both a successful career and babies.

I agree. But it is also time for men to do the same. The more that men also support and demand such policies the quicker things will change.

For many many years I have been raising a key question in relation to open access publishing - how can we measure how open someone's publications are? Ideally we would have a way of measuring this in some sort of index. A few years ago I looked around and asked around and did not find anything out there of obvious direct relevance to what I wanted so I started mapping out ways to do this.

When Aaron Swartz died I started drafting some ideas on this topic. Here is what I wrote (in January 2013) but never posted:

With the death of Aaron Swartz on Friday there has been much talk of people posting their articles online (a short term solution) and moving more towards open access publishing (a long term solution). One key component of the move to more open access publishing will be assessing people on just how good a job they are doing of sharing their academic work. I have looked around the interwebs to see if there is some existing metric for this and I could not find one. So I have decided to develop one - which I call the Swartz Openness Index (SOI).

Let A = # of objects being assessed (could be publications, data sets, software, or all of these together).

Let B = # of objects that are released to the commons with a broad, open license.

A simple (and simplistic) metric could be simply

OI = B / A

This is a decent start but misses out on the degree of openness of different objects. So a more useful metric might be the one below.

A and B as above.

Let C = # of objects available free of charge but not openly

OI = ( B + (C/D) ) / A

where D is the "penalty" for making material in C not openly available
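The penalized version above can be sketched in a few lines of Python. This is only an illustration: the function name, the default penalty of 2, and the requirement that D be at least 1 (so that a freely available object never counts for more than an openly licensed one) are assumptions added here, not part of the draft.

```python
def openness_index(total, open_count, free_count=0, penalty=2.0):
    """Compute OI = (B + C/D) / A.

    total      -- A, number of objects being assessed
    open_count -- B, objects released with a broad, open license
    free_count -- C, objects free of charge but not openly licensed
    penalty    -- D, divisor applied to C (assumed here to be >= 1)
    """
    if total <= 0:
        raise ValueError("total (A) must be positive")
    if penalty < 1:
        raise ValueError("penalty (D) is assumed to be at least 1")
    if open_count < 0 or free_count < 0 or open_count + free_count > total:
        raise ValueError("B and C must be non-negative and sum to at most A")
    return (open_count + free_count / penalty) / total


# Hypothetical example: 10 objects, 6 openly licensed, 2 free but not open.
# With D = 2 each free-but-not-open object counts as half an open one.
print(openness_index(10, 6, 2, penalty=2.0))  # 0.7
```

Setting `free_count` to 0 gives back the simple B / A version.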

This still seems not detailed enough. A more detailed approach might be to weight diverse aspects of the openness of the objects. Consider for example the "Open Access Spectrum." This has divided objects (publications in this case) into six categories in terms of potential openness: reader rights, reuse rights, copyrights, author posting rights, automatic posting, and machine readability. And each of these is given different categories that assess the level of openness. Seems like a useful parsing in ways. Alas, since bizarrely the OAS is released under a somewhat restrictive CC BY-NC-ND license I cannot technically make derivatives of it. So I will not. Mostly because I am pissed at PLoS and SPARC for releasing something in this way. Inane. But I can make my own openness spectrum.

And then I stopped writing because I was so pissed off at PLOS and SPARC for making something like this and then restricting its use. I had a heated discussion with people from PLOS and SPARC about this but am not sure if they updated their policy. Regardless, the concept of an Openness Index of some kind fell out of my head after this buzzkill. And it only just now came back to me. (Though I note - I did not find the Draft post I made until AFTER I wrote the rest of this post below ... ).

To get some measure of openness in publications maybe a simple metric would be useful. Something like the following

P = # of publications

A = # of fully open access papers

OI = Openness index

A simple OI would be

OI = 100 * A/P

However, one might want to account for relative levels of openness in this metric. For example

AR = # of papers with an open but somewhat restricted license

F = # of papers that are freely available but not with an open license

C = some measure of how cheap the non freely available papers are

And so on.

Given that I am not into library science myself and not really familiar with playing around with this type of data I thought a much simpler metric would be to just go to Pubmed (which of course works only for publications in the arenas covered by Pubmed).

From Pubmed one can pull out some simple data.

# of publications (for a person or Institution)

# of those publications in PubMed Central (a measure of free availability)

Thus one could easily measure the "Pubmed Central" index as

PMCI = 100 * (# publications in PMC / # of publications in Pubmed)
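As a small worked illustration of this formula, here is a Python sketch that computes the PMCI from two counts, or from a "PMC/Pubmed" string of the kind used in the listings in this post. The function names are made up for this example.

```python
def pmci(in_pmc, in_pubmed):
    """PMCI = 100 * (# publications in PMC / # of publications in Pubmed)."""
    if in_pubmed <= 0:
        raise ValueError("need at least one publication in Pubmed")
    if not 0 <= in_pmc <= in_pubmed:
        raise ValueError("PMC count must be between 0 and the Pubmed count")
    return 100.0 * in_pmc / in_pubmed


def pmci_from_counts(counts):
    """Parse a 'PMC/Pubmed' string such as '76/104' and return the PMCI."""
    in_pmc, in_pubmed = (int(part) for part in counts.split("/"))
    return pmci(in_pmc, in_pubmed)


# 76 of 104 Pubmed-indexed papers in PMC
print(round(pmci_from_counts("76/104"), 1))  # 73.1
```

The same function works for any "free / total" pair, so it can also be reused for other percentage-style indices built from Pubmed counts.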

Some examples of the PMCI for various authors including some bigger names in my field, and some people I have worked with.

| Name | PMC / Pubmed | PMCI |
|---|---|---|
| Eisen JA | 224/269 | 83.2 |
| Eisen MB | 76/104 | 73.1 |
| Collins FS | 192/521 | 36.8 |
| Lander ES | 160/377 | 42.4 |
| Lipman DJ | 58/73 | 79.4 |
| Nussinov R | 170/462 | 36.7 |
| Mardis E | 127/187 | 67.9 |
| Colwell RR | 237/435 | 54.5 |
| Varmus H | 165/408 | 40.4 |
| Brown PO | 164/234 | 70.1 |
| Darling AE | 20/27 | 74.0 |
| Coop G | 23/39 | 59.0 |
| Salzberg SL | 107/162 | 61.7 |
| Venter JC | 53/237 | 22.4 |
| Ward NL | 24/58 | 41.4 |
| Fraser CM | 78/262 | 29.8 |
| Quackenbush J | 95/225 | 42.2 |
| Ghedin E | 47/82 | 57.3 |
| Langille MG | 10/14 | 71.4 |

And so on. Obviously this is of limited value / accuracy in many ways. Many papers are freely available but not in Pubmed Central. Many papers are not covered by Pubmed or Pubmed Central. Times change, so some measure of recent publications might be better than measuring all publications. Author identification is challenging (until systems like ORCID get more use). And so on.

Another thing one can do with Pubmed is to identify papers with free full text available somewhere (not just in PMC). This can be useful for cases where material is not put into PMC for some reason. And then with a similar search one can narrow this to just the last five years. As open access has become more common maybe some people have shifted to it more and more over time (I have -- so this search should give me a better index).
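A rough sketch of how such searches might be scripted is below. It only builds the search terms and request URLs and makes no network calls. Several things here are assumptions rather than anything stated in this post: that the counts come from NCBI's E-utilities `esearch` endpoint, that `pubmed pmc[sb]` and `free full text[sb]` are suitable filters for "in PMC" and "free full text somewhere", and that `"last 5 years"[dp]` restricts by publication date. Author-name searches are also ambiguous, so any counts obtained this way would need checking by hand.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (assumed here as the data source).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def author_terms(author):
    """Build the Pubmed search terms needed for the indices for one author."""
    base = f"{author}[Author]"
    recent = '"last 5 years"[dp]'  # assumed date filter
    return {
        "all": base,
        "in_pmc": f"{base} AND pubmed pmc[sb]",          # assumed PMC filter
        "free_all": f"{base} AND free full text[sb]",    # assumed free-text filter
        "recent": f"{base} AND {recent}",
        "free_recent": f"{base} AND free full text[sb] AND {recent}",
    }


def count_url(term):
    """Return an esearch URL that asks only for the number of matching records."""
    params = {"db": "pubmed", "term": term, "rettype": "count", "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


for label, term in author_terms("Eisen JA").items():
    print(label, count_url(term))
```

Dividing the "free" count by the matching total and multiplying by 100 would then give a percentage for either all years or just the last five.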

Let's call the % of publications with free full text somewhere the "Free Index" or FI. Here are the values for the same authors.

| Name | PMC / Pubmed | PMCI | Free / Pubmed (5 years) | FI-5 | Free (Pubmed, all) | FI-ALL |
|---|---|---|---|---|---|---|
| Eisen JA | 224/269 | 83.2 | 178/180 | 98.9 | 237 | 88.1 |
| Eisen MB | 76/104 | 73.1 | 32/34 | 94.1 | 83 | 79.8 |
| Collins FS | 192/521 | 36.8 | 104/128 | 81.3 | 263 | 50.5 |
| Lander ES | 160/377 | 42.4 | 78/104 | 75.0 | 200 | 53.1 |
| Lipman DJ | 58/73 | 79.4 | 20/22 | 90.9 | 59 | 80.8 |
| Mardis E | 127/187 | 67.9 | 90/115 | 78.3 | 135 | 72.2 |
| Colwell RR | 237/435 | 54.5 | 31/63 | 49.2 | 258 | 59.3 |
| Varmus H | 165/408 | 40.4 | 21/28 | 75.0 | 206 | 50.5 |
| Brown PO | 164/234 | 70.1 | 20/21 | 95.2 | 185 | 79.0 |
| Darling AE | 20/27 | 74.0 | 18/21 | 85.7 | 21 | 77.8 |
| Coop G | 23/39 | 59.0 | 16/20 | 80.0 | 28 | 71.8 |
| Salzberg SL | 107/162 | 61.7 | 54/58 | 93.1 | 128 | 79.0 |
| Venter JC | 53/237 | 22.4 | 20/33 | 60.6 | 85 | 35.9 |
| Ward NL | 24/58 | 41.4 | 18/27 | 66.6 | 30 | 51.7 |
| Fraser CM | 78/262 | 29.8 | 9/13 | 69.2 | 109 | 41.6 |
| Quackenbush J | 95/225 | 42.2 | 54/75 | 72.0 | 131 | 58.2 |
| Ghedin E | 47/82 | 57.3 | 30/36 | 83.3 | 56 | 68.3 |
| Langille MG | 10/14 | 71.4 | 11/13 | 84.6 | 11 | 78.6 |

Very happy to see that I score very well for the last five years. 180 papers in Pubmed. 178 of them with free full text somewhere that Pubmed recognizes. The large number of publications comes mostly from genome reports in the open access journals Standards in Genomic Sciences and Genome Announcements. But most of my non genome report papers are also freely available.

I think in general it would be very useful to have measures of the degree of openness. And such metrics should take into account sharing of other material like data, methods, etc. In a way this could be a form of the altmetric calculations going on.

But before going any further I decided to look again into what has been done in this area. When I first thought of doing this a few years ago I searched and asked around and did not see much of anything. (Although I do remember someone out there - maybe Carl Bergstrom - saying there were some metrics that might be relevant - but can't figure out who / what this information in the back of my head is).

Full Citation: Willmott, MA, Dunn, KH, Duranceau, EF. (2012). The Accessibility Quotient: A New Measure of Open Access. Journal of Librarianship and Scholarly Communication 1(1):eP1025. http://dx.doi.org/10.7710/2162-3309.1025

Here is the abstract:

INTRODUCTION The Accessibility Quotient (AQ), a new measure for assisting authors and librarians in assessing and characterizing the degree of accessibility for a group of papers, is proposed and described. The AQ offers a concise measure that assesses the accessibility of peer-reviewed research produced by an individual or group, by incorporating data on open availability to readers worldwide, the degree of financial barrier to access, and journal quality. The paper reports on the context for developing this measure, how the AQ is calculated, how it can be used in faculty outreach, and why it is a useful lens to use in assessing progress towards more open access to research.

METHODS Journal articles published in 2009 and 2010 by faculty members from one department in each of MIT’s five schools were examined. The AQ was calculated using economist Ted Bergstrom’s Relative Price Index to assess affordability and quality, and data from SHERPA/RoMEO to assess the right to share the peer-reviewed version of an article.

RESULTS The results show that 2009 and 2010 publications by the Media Lab and Physics have the potential to be more open than those of Sloan (Management), Mechanical Engineering, and Linguistics & Philosophy.

DISCUSSION Appropriate interpretation and applications of the AQ are discussed and some limitations of the measure are examined, with suggestions for future studies which may improve the accuracy and relevance of the AQ.

CONCLUSION The AQ offers a concise assessment of accessibility for authors, departments, disciplines, or universities who wish to characterize or understand the degree of access to their research output, capturing additional dimensions of accessibility that matter to faculty.

I completely love it. After all, it is directly related to what I have been thinking about and, well, they actually did some systematic analysis of their metrics. I hope more things like this come out and are readily available for anyone to calculate. Just how open someone is could be yet another metric used to evaluate them ...

And then I did a little more searching and found the following which also seem directly relevant

Thursday, June 13, 2013

As I have posted about before - I am involved in the UC Davis ADVANCE project funded by NSF. From the project website:

UC Davis ADVANCE is a newly funded Institutional Transformation grant that began in September of 2012. Our program is supported by the National Science Foundation’s ADVANCE Program which aims to increase the participation and advancement of women in academic science and engineering careers.

My role in this project is as a member (and now Co-Chair) of the "Policies and Practices Review Initiative" Committee. As part of my work on this committee I am trying to read various papers on related topics. And I figured I would simultaneously post about these papers as much as I can because it would be great to get a broader discussion going on these topics.

This paper presents part of the results of a completed study entitled A Longitudinal Study of Minority Ph.D.s from 1980-1990: Progress and Outcomes in Science and Engineering at the University of California during Graduate School and Professional Life. It focuses particularly on the graduate school experience and degree of preparation for the professoriate of African American doctoral students in the sciences and engineering, and presents the results of a survey of 33 African American STEM Ph.D.s from the University of California earned between 1980-1990. Relationships with thesis advisors and principal investigators are evaluated by the study participants in fifteen specific areas from highly-ranked intellectual development to low-ranked training in grant writing. Deficits in training and socialization are discussed along with the tension between being both an African American and a graduate student. Career choices and outcomes are presented. These findings, in conjunction with current analyses of graduate education in STEM, suggest ways in which graduate training for all could be improved.

Lots of interesting information in there. Perhaps most important for my current goals is what she describes at the end in terms of a Proposed Development Program. She starts this section by commenting on the general situation in terms of training scientists in the US today. She then identifies what she refers to as "discontinuities" in federal and local policy which can hinder "developing faculty of color." These include "compartmentalized, externally mandated sets of programs" and the "nature of Ph.D. training". Of the 33 Ph.D.s surveyed in the study, nearly all of them recommended diversity training for faculty. They also recommended better laying out of expectations and requirements for students and more involvement of current faculty in recruiting. They also made many recommendations for improving the life of current students of color.

Anyway - a lot of this material and the concepts involved are a bit new to me so I am still digesting the article. But I thought I would share it with others in the hope that this will help catalyze more open discussion of issues involving women and underrepresented minorities in STEM fields.

... cause and effect between spores and asthma may remain a challenge, metagenomic technologies will ... Metagenomics data are likely to provide a very different understanding of the potential diversity ... (2006) Reconstructing the early evolution of fungi using a six-gene phylogeny. ...

The second paper listed there takes one to a very strange site. It appears to be a pseudo-mirror of the journal site all embedded within the domain "Hea1thandFitness.Com". Note - this domain name has the number "1" replacing the letter "l" in the domain name - I assume as a trick of sorts. Clicking on the link takes you to a site for which I have done a screen shot below

The whole thing is weird - with the number instead of the letter and the weird formatting of the site. Just a tiny glitch? Well, I don't think so since in some of my other Google Scholar alerts other links to this same domain came up. See examples below:

Ocean infections are usually all-pervasive and also numerous and also play the game main jobs relating to be the inside overseas biogeochemical menstrual cycles thru their mortality, horizontally gene transfer, and also mind games about put in relationship metabolism. ...

... Several studies have demonstrated that some solid surfaces such as clay minerals are able to stimulate microbial... Indigenous microbial cells of Whitley Bay sediments were isolated and proliferated via several subcultures prior to use for laboratory biodegradation studies. ...

... be in charge created by for the focusing on sense made out created by the comprehensive forensics education a ... Previous Section Next Section Designer cellulosome technology also has also been recommended to your indigenous microbial enzymatic destruction created by ...

"Just as we have unwittingly destroyed vital microbes in the human gut through overuse of antibiotics and highly processed foods, we have recklessly devastated soil microbiota essential to plant health through overuse of certain chemical fertilizers, fungicides, herbicides, pesticides, failure to add sufficient organic matter (upon which they feed), and heavy tillage."

OK - sounds serious. But is it really true? Have pesticides really devastated soil microbiota? What about tillage? Seems possible, but also seems possible that this would not be true. Would be nice to see the evidence behind this claim.

How about this one:

"Reintroducing the right bacteria and fungi to facilitate the dark fermentation process in depleted and sterile soils is analogous to eating yogurt (or taking those targeted probiotic "drugs of the future") to restore the right microbiota deep in your digestive tract."

Sounds good too. But way too overly simplistic. I mean - probiotics for people are a bit of a complicated mess right now. Some work. Most probably don't. Most of the claims are overblown. So to say we know how to do this well in "soil" definitely seems to be an overstatement. Again, specific evidence for this would be nice.

And then this:

"Due to new genetic sequencing and production technologies, we have now come to a point where we can effectively and at low cost identify and grow key bacteria and the right species of fungi and apply them in large-scale agriculture."

Soil is a very very complicated place in terms of microbes. I personally think we are really far away from this utopian view of growing the key species to apply them to large scale ag. Evidence that this is true? I don't know of much. Yes we can sequence things. We can sequence a lot of things. But "identify and grow key bacteria and the right species of fungi" - I think we are far from being able to do this robustly.

Another claim in the article has some ring of truth:

We can sow the "seeds" of microorganisms with our crop seeds and, as hundreds of independent studies confirm, increase our crop yields and reduce the need for irrigation and chemical fertilizers.

Yes, this has a ring of truth. Certainly there are studies - many of them - involving adding microbes to seeds and how that impacts yield and nutrient and water requirements. And without a doubt in many cases such inoculation can help in many ways. But the "hundreds of independent studies" claim is a bit misleading as there are also many cases where inoculation does not help. So we should be cautious before adding microbes to seeds becomes the equivalent of probiotics for people. Not all probiotics that are claimed to help people actually do anything. And not all microbes added to seeds will do much of anything useful either.

How about the claim:

Thus the microbial community in the soil, like in the human biome, provides "invasion resistance" services to its symbiotic partner. We disturb this association at our peril

Sounds good. And has a ring of truth too. And in general I agree with the sentiment that we should not screw with ecosystems without recognizing that the microbes in those systems may play important and useful roles. However, just because SOME microbes play important and useful roles in systems does not of course mean that ALL are ones we want to keep. There will be some in the soil that damage plants and hurt yield and pathogen resistance just as there will be some that are "good" from our point of view.

And then there is this

We are now at a point where microbes that thrive in healthy soil have been largely rendered inactive or eliminated in most commercial agricultural lands; they are unable to do what they have done for hundreds of millions of years, to access, conserve, and cycle nutrients and water for plants and regulate the climate.

And also

The mass destruction of soil microorganisms began with technological advances in the early twentieth century.

Sounds nice. But I don't really know of much evidence that the microbes have been rendered inactive or eliminated in commercial agricultural lands.

I suppose this is all building up to the following

Fortunately, there is now a strong business case for the reintroduction of soil microorganisms in both small farms and large-scale agribusiness. Scientific advances have now allowed us to take soil organisms from an eco-farming niche to mainstream agribusiness. We can replenish the soil and save billions of dollars.

and

For all these reasons, bio fertility products are now a $500 million industry and growing fast. The major agricultural chemical companies, like Bayer, BASF, Novozymes, Pioneer, and Syngenta are now actively selling, acquiring or developing these products.

So --- this is in a way an article promoting the financial benefit of adding microbes to soil. I think this is reasonable although not completely convincing. Alas, after reading the article I discovered this about one of the authors

Mike Amaranthus is the chief scientist at Mycorrhizal Applications, Inc., a company working on innovations in soil biology.

This is not to say that someone with a financial role in convincing the world to add microbes to soil cannot be trusted to provide a good guide about microbes in the soil. But it would have been nice for this to be mentioned more prominently in the article. Many of the claims in this article do not pass the smell test to me. And all of them seem to be pointing towards a solution involving a company that one of the authors is involved in. If this were about human medical treatments many many people might get bent out of shape by this. Again, not to say people with financial interests cannot write good articles. But the potential for conflicts in such cases, as in the case here, is great. And thus we should view with a tint of extra skepticism some of the claims made by such individuals. And in this case here I already felt uncomfortable with many of the claims. I think the Atlantic could do better and could certainly require the author to make more clear in the article itself what the author's personal interest in the claims is.

On this episode, Jonathan talks about "evolvability," the probability that organisms can invent new functions. To do this, he has been using genome data in conjunction with experimental information to try and understand the mechanisms by which new functions have originated.

Another area of interest for Eisen is the "built environment." We live and work in buildings or structures which are non-natural environments, new to microbes. These "new" environments represent a controlled system in which to study the rules by which microbial communities form.

Jonathan is interested in these environments as a basic science vehicle and he shares the importance of studying the built environment for science and human health. Finally Jonathan explains his interest in "open science," the ways in which science is shared. At its core, Eisen wants to leverage cheaper technologies to accelerate the progress of science in a positive way.

This episode was recorded at the American Association for the Advancement of Science Meeting in Vancouver, British Columbia on February 18, 2012.

I went to a talk yesterday by Nancy Moran at UC Davis. Nancy is one of my science heroes. I have worked on a few projects with her and am just a big fan of her body of work on symbioses. I have written about her work here on this blog many times before including

I wondered - where else might one find Biology themed preprints? And a little google searching led me to this new PLOS Biology paper which somehow I had missed a few weeks ago: The Case for Open Preprints in Biology

Wow - how perfect. In their paper they not only lay out the case for why preprints would be a good thing in biology but discuss some of the options. And in addition to PeerJ and arXiv they point to Figshare, Github, and ResearchGate.

Below is Figure 1 from their paper:

Figure 1. It can take several months before a submitted paper is officially published and citable. Meanwhile, few people are aware of the research that has been done since, typically, only close colleagues are given access to the preprints. With public preprint servers, the science is immediately available and can be openly discussed, analyzed, and integrated into current research. doi:10.1371/journal.pbio.1001563.g001

They also show that in arXiv submissions in the qBio section are going up but not nearly as much as submissions in other fields

I think this paper is worth a look for anyone interested in scientific publishing. I like their last line and will end my post with it:

Preprints are simply bypassing this model for what we believe is the progress of science: they speed up the dissemination of scientific discoveries and put on readers' shoulders the responsibility to judge originality and pertinence.

It reports on an effort by various scientific publishers to create something they call "CHORUS" which stands for "Clearinghouse for the Open Research of the United States." They claim this will be used to meet the guidelines issued by the White House OSTP for making papers for which the work was supported by federal grants available for free within 12 months of being published.

This appears to be an attempt to kill databases like Pubmed Central which is where such freely available publications now are archived. I am very skeptical of the claims made by publishers that papers that are supposed to be freely available will in fact be made freely available on their own websites. Why you may ask am I skeptical of this? I suggest you read my prior posts on how Nature Publishing Group continuously failed to fulfill their promises to make genome papers freely available on their website.

We need to make sure such papers are freely available permanently and the only way to do this is via making them available outside of the publishers' own sites. Pubmed Central seems to be a good solution for this. I would be happy to hear other possible solutions - but leaving "free" papers under the control of the publishers is a bad idea.

UPDATE 6/27/2013

Saw this Tweet

We just published the story yesterday about the 700.000 year old horse that we sequenced. Check it out ! http://t.co/jAym3HLAC0

Seemed potentially really interesting. Read the story and got pointed to a new Nature paper on the ancient horse genome. I guess not so surprisingly, despite the fact that they report a new genome sequence, it is not openly available. We really cannot trust Nature on this can we? They could say "Well, this is a draft genome, and we did not mean to apply our policy to draft genomes." Well, that would be weird since, well, they have applied this to draft genomes before.
And then I decided to search for other examples ... and in about ten minutes I found a few. See

Monday, June 03, 2013

Today I am happy to have a guest post from my friend and colleague Jake Scott. The topic of the day is preprints in biology and medicine.

Hi - I'm Jake Scott. I met Jonathan last year when he and I spoke at TEDMED 2012. Both Jonathan and I have posted recently about the need for, and (slowly) growing movement in the biological sciences to post #preprints of manuscripts in openly accessible fora to circumvent some problems associated with standard academic publishing. Most worrisome are the issues surrounding #openaccess and the length of time it takes to get information from one's brain to the literature - drastically slowing down the pace of science.

This has worked GREAT in the physics community, where this trend really began quite some time ago when the high energy physicists started the arXiv. Now, the precedent is set, and no one in physics bats an eye about sticking their paper on the arXiv, or about citing other works presented there as standard publications.

The climate in biology, sadly, is much different. Whether this is because of a more competitive climate for funding, or just a field diluted by more talented scientists, I don't know. But there is a pervasive attitude of fear and mistrust around the idea of preprints.

Before you read on (and become biased by my opinions) take a few seconds (really, probably 1.5 minutes) and take this quick survey: