I am misquoted about our current ability to predict height from genomic information, despite an hour on the phone with the Harvard-educated fact checker. There is also some confusion in that paragraph concerning the correlation between height and IQ and its relevance to the broader discussion :-(

Chris Chang has the wise last words in the article :-) Click for larger version.

Some notes for my own use on research rankings by discipline for Michigan State University. The ranges are 95% confidence intervals using the R-rank (which incorporates, among other things, publications, citations, grants, and awards) from the 2010 NRC evaluation (see, e.g., here). Rank ordered by the central value of the R range, most of the departments below come out in the top 20-30 in the US. More data: S rankings in various fields.

In the NSF HERD (total R&D expenditures) data MSU is ranked 36th in the US, with 2012 expenditures just over $500 million. If research hospital numbers are excluded (MSU has historically had a teaching medical school, although this is changing), the MSU rank would be quite a bit higher.

Two old photos I discovered at my mom's house. These would be from 1986 (the graduation photo) and probably a year or two after that for the other one (taken in our garage in Iowa during Xmas break, check out the penny loafers!).

The Scientist: It was less than a year ago that scientists first applied CRISPR, a genome-editing technique, to human cells. In short order, the technique has taken off like wildfire. And now, two papers appearing in Cell Stem Cell today (December 5) show that CRISPR can be used to rewrite genetic defects to effectively cure diseases in mice and human stem cells.

“What’s significant about this is it’s taking CRISPR to that next step of what it can be used for, and in this case, it’s correcting mutations that cause disease,” said Charles Gersbach, a genomics researcher at Duke University, who was not involved in either study.

CRISPR stands for clustered regularly interspaced short palindromic repeats. These RNA sequences serve an immune function in archaea and bacteria, but in the last year or so, scientists have seized upon them to rewrite genes. The RNA sequence serves as a guide to target a DNA sequence in, say, a zygote or a stem cell. The guide sequence leads an enzyme, Cas9, to the DNA of interest. Cas9 can cut the double strand, nick it, or even knock down gene expression. After Cas9 injures the DNA, repair systems fix the sequence—or new sequences can be inserted.

In one of the new papers, a team from China used CRISPR/Cas9 to replace a single base pair mutation that causes cataracts in mice. The researchers, led by Jinsong Li at the Shanghai Institute for Biological Sciences, designed a guide RNA that led Cas9 to the mutant allele, where it induced a cleavage of the DNA. Then, using either the other wild-type allele or oligos given to the zygotes as a template, repair mechanisms corrected the sequence of the broken allele.

Li said that about 33 percent of the mutant zygotes that were injected with CRISPR/Cas9 grew up to be cataract-free mice. In an e-mail to The Scientist, Li said the efficiency of the technique was low, “and, for clinical purpose, the efficiency should reach 100 percent.”

Still, this was the first time CRISPR had been used to cure a disease in a whole animal, an advance that Jennifer Doudna, a leader in CRISPR technology at the University of California, Berkeley, said was encouraging. Both studies “show the potential for using the technology to correct disease-causing mutations, and that’s what’s very exciting here,” she said.

Hans Clevers, a stem cell researcher at the Hubrecht Institute in Utrecht, the Netherlands, led the other study, which used CRISPR/Cas9 to correct a defect associated with cystic fibrosis in human stem cells. The team’s target was the gene for an ion channel, cystic fibrosis transmembrane conductance regulator (CFTR). A deletion in CFTR causes the protein to misfold in cystic fibrosis patients.

Using cultured intestinal stem cells developed from cell samples from two children with cystic fibrosis, Clevers’s team was able to correct the defect using CRISPR along with a donor plasmid containing the reparative sequence to be inserted. The researchers then grew the cells into intestinal “organoids,” or miniature guts, and showed that they functioned normally. In this case, about half of clonal organoids underwent the proper genetic correction, Clevers said.

For both studies, the researchers did not have to make significant modifications to existing CRISPR protocols. Clevers said in an e-mail to The Scientist that, compared with other gene editing techniques, CRISPR was straightforward. “We tried TALENs [transcription activator-like effector nucleases] and Zinc finger approaches. CRISPR is exquisitely fast and simple,” Clevers said. Li agreed. “I think CRISPR/Cas9 system may be the easiest strategy to cure genetic disease than any other available gene-editing techniques,” he said.

One limitation of CRISPR is that the approach can create off-target effects—alterations to sites other than the target DNA. In both studies, off-target effects were relatively rare, said Gersbach. “While reducing off-target effects is a priority, it’s unrealistic to think you’d be able to get rid of all off-target effects,” he told The Scientist.

While the approach is far from ready for prime time, the results of both these studies show promise for future clinical potential. “I think each time an advance like this is made, people are more sure that this is a technique that is likely to be useful in treating humans,” said Doudna.

Friday, December 20, 2013

Good evening. The organizers have seen fit to place me on the schedule between you and dinner. So I will be mercifully brief.

As Terry mentioned, we had the awesome task of reviewing 150 promotion files last year. The word "awesome" can mean many things. Here it means humbling. It was humbling to review the impressive files of the full professor candidates. Our university is fortunate to have such tremendously accomplished faculty.

Terry, the Provost, and I upheld the highest standards in full professor promotion cases. Each and every person being honored here tonight has made important contributions to scholarship and learning, and is an invaluable member of our university community.

A final remark before we eat our rubber chickens. It is often said that we are entering or have entered the age of the knowledge economy -- that a person's contributions will be from their knowledge, intelligence, and creativity. But when an attorney prepares a case it is for her client. When a Google engineer develops a new algorithm, it is for Google -- for money. Fewer than one in a thousand individuals in our society have the privilege, the freedom, to pursue their own ideas and creations. The vast majority of such people are at research universities. A smaller number are at think tanks or national labs, but most are professors like yourselves. It is you who will make the future better than the past; who will bring new wonders into existence. Your work may be largely invisible to our fellow citizens, but the future owes its greatness to you. My sincere congratulations.

This figure is from the Supplement (p.62) of a recent Nature paper describing a high quality genome sequence obtained from the toe of a female Neanderthal who lived in the Altai mountains in Siberia. Interestingly, copy number variation at 16p11.2 is one of the structural variants identified in a recent deCODE study as related to IQ depression; see earlier post Structural genomic variants (CNVs) affect cognition.

From the Supplement (p.62):

Of particular interest is the modern human-specific duplication on 16p11.2 which encompasses the BOLA2 gene. This locus is the breakpoint of the 16p11.2 micro-deletion, which results in developmental delay, intellectual disability, and autism5,6. We genotyped the BOLA2 gene in 675 diverse human individuals sequenced to low coverage as part of the 1000 Genomes Project Phase I7 to assess the population distribution of copy numbers in Homo sapiens (Figure S8.3). While both the Altai Neandertal and Denisova individual exhibit the ancestral diploid copy number as seen in all the non-human great apes, only a single human individual exhibits this diploid copy number state.

My recollection from the earlier (less precise) Neanderthal sequences is that the number of bp differences between them and us is a few per thousand, whereas for modern humans it's about 1 per thousand, with an additional +/-15% variation due to ethnicity. So I think it's fair to say that they are qualitatively much more different from us than we (moderns) are from each other. See also The genetics of humanness.

My colleague James Lee (I note he is too modest to list his Harvard Law degree on his faculty page!) describes the current era in genomics as an "age of wonder" :-) We can anticipate tremendous discoveries in the next decade.

Wednesday, December 18, 2013

CNVs (structural genomic variants) associated with increased autism and schizophrenia risk are found to depress cognitive function in carriers who do not present with either condition. There are also effects on physical brain structure.

This is the future of neuroscience: read out the genome and look for the direct effect on phenotype. Assuming the results hold up, we can conclude that these mutations cause abnormal cognitive function in humans. We are just at the beginning of this line of research: mutations of smaller effect size will require larger samples to detect, but they almost certainly exist.

In a small fraction of patients with schizophrenia or autism, alleles of copy-number variants (CNVs) in their genomes are probably the strongest factors contributing to the pathogenesis of the disease. These CNVs may provide an entry point for investigations into the mechanisms of brain function and dysfunction alike. They are not fully penetrant and offer an opportunity to study their effects separate from that of manifest disease. Here we show in an Icelandic sample that a few of the CNVs clearly alter fecundity (measured as the number of children by age 45). Furthermore, we use various tests of cognitive function to demonstrate that control subjects carrying the CNVs perform at a level that is between that of schizophrenia patients and population controls. The CNVs do not all affect the same cognitive domains, hence the cognitive deficits that drive or accompany the pathogenesis vary from one CNV to another. Controls carrying the chromosome 15q11.2 deletion between breakpoints 1 and 2 (15q11.2(BP1-BP2) deletion) have a history of dyslexia and dyscalculia, even after adjusting for IQ in the analysis, and the CNV only confers modest effects on other cognitive traits. The 15q11.2(BP1-BP2) deletion affects brain structure in a pattern consistent with both that observed during first-episode psychosis in schizophrenia and that of structural correlates in dyslexia.

This figure shows impairment in population SDs for different groups. V IQ and P IQ are Verbal and Performance IQ (Wechsler), IIUC.

From the Supplement -- check out the p values ;-)

My guess is that most intelligence alleles have a negative effect. That is, the majority of genetic variation in cognitive ability is determined by the number and type of somewhat deleterious mutations we all carry around. (There are probably also minor alleles of positive effect, but fewer of them.) Note that the CNVs in this article, while having a significantly (1 SD) negative effect on IQ, do not prevent reproduction (fecundity is reduced, but not to zero), so clearly mutations of large effect can linger for some generations. Mutations of smaller effect might even be neutral due to pleiotropy, etc.
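This mutational-load picture can be made concrete with a toy simulation. All parameters below (number of loci, carrier frequency, per-variant effect) are invented for illustration, not estimates from any study:

```python
import random
import statistics

# Toy model: cognitive ability as a function of mutational load. Each
# individual carries a random number of mildly deleterious variants; the
# trait is a baseline minus a small fixed penalty per variant carried.

random.seed(42)

N_LOCI = 2000    # hypothetical sites where a deleterious variant may occur
FREQ = 0.5       # hypothetical per-locus carrier probability
EFFECT = 0.05    # hypothetical trait penalty per variant, in IQ-like units

def individual():
    """Return (mutational load, trait value) for one simulated person."""
    load = sum(random.random() < FREQ for _ in range(N_LOCI))
    trait = 100.0 - EFFECT * (load - N_LOCI * FREQ)  # centered near 100
    return load, trait

pop = [individual() for _ in range(500)]
traits = [t for _, t in pop]
print("mean trait:", round(statistics.mean(traits), 1))
print("trait SD:  ", round(statistics.stdev(traits), 2))
```

Even with identical per-variant effects, binomial variation in load alone produces a roughly normal trait distribution, consistent with the idea that population variance could come mostly from differing burdens of slightly deleterious mutations.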

New Yorker: ...Spike Jonze’s movie, which was shot in Los Angeles and Shanghai, is set in a near but dateless future, where the rough edges of existence have been rubbed away. The colors of clothes and furnishings, though citrus-bright, are diluted by the pastel softness of the lighting, so that nothing hurts the eye. People ride in smoothly humming trains, not belching cars. And Theodore’s cell phone reminds you of those slender vintage cases for cigarettes and visiting cards; if the ghost of Steve Jobs is watching, he will glow a covetous green.

This little flat box, plus an earpiece that Theodore plugs in whenever he wakes up or can’t sleep, is his portal. It links him to OS1, “the first artificially intelligent operating system,” which is newly installed on his computer. More than that, “it’s a consciousness,” with a voice of your choice, and a rapidly evolving personality, which grows not like a baby, or a library, but like an unstoppable alien spore. Theodore’s version is called Samantha, and practically her first request is: “You mind if I look through your hard drive?” She tidies his e-mails, reads a book in two-hundredths of a second, fixes him up on a date, and, when that goes badly, has sex with him—aural sex, so to speak, but Theodore will take what he can get. No surprise, really, given that the role of Samantha is spoken by Scarlett Johansson.

... And it is romantic: Theodore and Samantha click together as twin souls, not caring that one soul is no more than a digital swarm. Sad, kooky, and daunting in equal measure, “Her” is the right film at the right time. It brings to full bloom what was only hinted at in the polite exchanges between the astronaut and hal, in “2001: A Space Odyssey,” and, toward the end, as Samantha joins forces with like minds in cyberspace, it offers a seductive, nonviolent answer to Skynet, the system in the “Terminator” films that attacked its mortal masters. We are easy prey, not least when we fall in love.

Sunday, December 15, 2013

@7 min "I was unprincipled enough to put down Russian in all my official paperwork because, obviously, it made it much easier to get into college." [ Slezkine is half Jewish; his father is Russian. See They take students like you there and I'm not Asian. ]

@29:30 The evolution of anti-semitism in the Soviet Union. From an overrepresentation of Jews in the early Soviet leadership, to (post-WWII and -Stalin) an era of quotas and overt discrimination, and an increasing identification of the Soviet state with Russian nationality.

From the introduction to The Jewish Century:

The Modern Age is the Jewish Age, and the twentieth century, in particular, is the Jewish Century. Modernization is about everyone becoming urban, mobile, literate, articulate, intellectually intricate, physically fastidious, and occupationally flexible. It is about learning how to cultivate people and symbols, not fields or herds. It is about pursuing wealth for the sake of learning, learning for the sake of wealth, and both wealth and learning for their own sake. It is about transforming peasants and princes into merchants and priests, replacing inherited privilege with acquired prestige, and dismantling social estates for the benefit of individuals, nuclear families, and book-reading tribes (nations). Modernization, in other words, is about everyone becoming Jewish.

Some peasants and princes have done better than others, but no one is better at being Jewish than the Jews themselves. In the age of capital, they are the most creative entrepreneurs; in the age of alienation, they are the most experienced exiles; and in the age of expertise, they are the most proficient professionals. Some of the oldest Jewish specialties-commerce, law, medicine, textual interpretation, and cultural mediation-have become the most fundamental (and the most Jewish) of all modern pursuits. It is by being exemplary ancients that the Jews have become model moderns. ...

Claude Lévi-Strauss has been called the father of modern (social) anthropology. He lived to be 100 years old, passing in 2009. I recommend his Tristes Tropiques, a memoir of time in the jungles of Brazil (full text in English). The lectures in the book below were delivered in Tokyo in 1986.

(p. 98-99) And who knows whether aggressive or contemplative dispositions, technical ingenuity, and so on, are not partly linked to genetic factors? None of these traits, as we apprehend them at the cultural level, can be clearly linked to a genetic foundation, but we cannot rule out a priori the distant effects of intermediate links. If such effects are real, it would be true to say that every culture selects genetic abilities that, by retroaction, influence the culture and reinforce its orientation.

Tuesday, December 10, 2013

A toy model of the dynamics of scientific research, with probability distributions for accuracy of experimental results, mechanisms for updating of beliefs by individual scientists, crowd behavior, bounded cognition, and so on, can easily exhibit parameter regions where progress is limited (one could even find equilibria in which most beliefs held by individual scientists are false!). Obviously the complexity of the systems under study and the quality of human capital in a particular field are important determinants of the rate of progress and its character.
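A minimal version of such a toy model fits in a few lines of Python. Everything here (the accuracy and conformity parameters, the blended update rule) is invented for illustration:

```python
import random

def run(accuracy, conformity, n_agents=100, steps=300, seed=1):
    """Return the fraction of agents ending up with the true belief.

    accuracy   -- probability a noisy experiment points toward the truth
    conformity -- weight each agent gives the crowd vs private evidence
    """
    rng = random.Random(seed)
    # Each agent's subjective P(hypothesis is true), initially random.
    beliefs = [rng.random() for _ in range(n_agents)]
    for _ in range(steps):
        crowd = sum(b > 0.5 for b in beliefs) / n_agents
        for i in range(n_agents):
            # Noisy experiment: supports the (true) hypothesis w.p. accuracy.
            evidence = 1.0 if rng.random() < accuracy else 0.0
            # Bounded-cognition update: mix own belief, evidence, and crowd.
            own = 0.5 * beliefs[i] + 0.5 * evidence
            beliefs[i] = conformity * crowd + (1 - conformity) * own
    return sum(b > 0.5 for b in beliefs) / n_agents

# Accurate experiments, independent thinkers: consensus finds the truth.
print(run(accuracy=0.9, conformity=0.1))
# Weak experiments, herd-driven field: the crowd can lock in either answer.
print(run(accuracy=0.52, conformity=0.95))
```

In the second regime the crowd term dominates each update, so whichever opinion happens to gain an early majority tends to perpetuate itself regardless of the (barely informative) experiments, a crude version of the bad equilibria mentioned above.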

In physics it is said that successful new theories swallow their predecessors whole. That is, even revolutionary new theories (e.g., special relativity or quantum mechanics) reduce to their predecessors in the previously studied circumstances (e.g., low velocity, macroscopic objects). Swallowing whole is a sign of proper function -- it means the previous generation of scientists was competent: what they believed to be true was (at least approximately) true. Their models were accurate in some limit and could continue to be used when appropriate (e.g., Newtonian mechanics).
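The "swallowing whole" can be made explicit. For example, expanding the relativistic energy in powers of v/c recovers the Newtonian kinetic energy as the leading velocity-dependent term:

```latex
E \;=\; \frac{mc^2}{\sqrt{1 - v^2/c^2}}
  \;=\; mc^2 \;+\; \tfrac{1}{2} m v^2 \;+\; \tfrac{3}{8}\,\frac{m v^4}{c^2} \;+\; \cdots
```

For v much less than c the correction terms are negligible, so Newtonian mechanics survives intact as the low-velocity limit of the new theory.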

In some fields (not to name names!) we don't see this phenomenon. Rather, we see new paradigms which wholly contradict earlier strongly held beliefs that were predominant in the field* -- there was no range of circumstances in which the earlier beliefs were correct. We might even see oscillations of mutually contradictory, widely accepted paradigms over decades.

It takes a serious interest in the history of science (and some brainpower) to determine which of the two regimes above describes a particular area of research. I believe we have good examples of both types in the academy.

* This means the earlier (or later!) generation of scientists in that field was incompetent. One or more of the following must have been true: their experimental observations were shoddy, they derived overly strong beliefs from weak data, they allowed overly strong priors to determine their beliefs.

Why Science Is Not Necessarily Self-Correcting

(DOI: 10.1177/1745691612464056)

John P. A. Ioannidis
Stanford Prevention Research Center, Department of Medicine and Department of Health Research and Policy, Stanford University School of Medicine, and Department of Statistics, Stanford University School of Humanities and Sciences

The ability to self-correct is considered a hallmark of science. However, self-correction does not always happen to scientific evidence by default. The trajectory of scientific credibility can fluctuate over time, both for defined scientific fields and for science at-large. History suggests that major catastrophes in scientific credibility are unfortunately possible and the argument that “it is obvious that progress is made” is weak. Careful evaluation of the current status of credibility of various scientific fields is important in order to understand any credibility deficits and how one could obtain and establish more trustworthy results. Efficient and unbiased replication mechanisms are essential for maintaining high levels of scientific credibility. Depending on the types of results obtained in the discovery and replication phases, there are different paradigms of research: optimal, self-correcting, false nonreplication, and perpetuated fallacy. In the absence of replication efforts, one is left with unconfirmed (genuine) discoveries and unchallenged fallacies. In several fields of investigation, including many areas of psychological science, perpetuated and unchallenged fallacies may comprise the majority of the circulating evidence. I catalogue a number of impediments to self-correction that have been empirically studied in psychological science. Finally, I discuss some proposed solutions to promote sound replication practices enhancing the credibility of scientific results as well as some potential disadvantages of each of them. Any deviation from the principle that seeking the truth has priority over any other goals may be seriously damaging to the self-correcting functions of science.

Sunday, December 08, 2013

NYTimes: Federal authorities have obtained confidential documents that shed new light on JPMorgan Chase’s decision to hire the children of China’s ruling elite, securing emails that show how the bank linked one prominent hire to “existing and potential business opportunities” from a Chinese government-run company.

The documents, which also include spreadsheets that list the bank’s “track record” for converting hires into business deals, offer the most detailed account yet of JPMorgan’s “Sons and Daughters” hiring program, which has been at the center of a federal bribery investigation for months. The spreadsheets and emails — recently submitted by JPMorgan to authorities — illuminate how the bank created the program to prevent questionable hiring practices but ultimately viewed it as a gateway to doing business with state-owned companies in China, which commonly issue stock with the help of Wall Street banks.

Thursday, December 05, 2013

When someone first described to me the evidence-based medicine movement, I responded (shocked): "Is that like science-based science? What were they doing before?"

Nature News: In biomedical science, at least one thing is apparently reproducible: a steady stream of studies that show the irreproducibility of many important experiments.

In a 2011 internal survey, pharmaceutical firm Bayer HealthCare of Leverkusen, Germany, was unable to validate the relevant preclinical research for almost two-thirds of 67 in-house projects. Then, in 2012, scientists at Amgen, a drug company based in Thousand Oaks, California, reported their failure to replicate 89% of the findings from 53 landmark cancer papers. And in a study published in May, more than half of the respondents to a survey at the MD Anderson Cancer Center in Houston, Texas, reported failing at least once in attempts at reproducing published data (see 'Make believe').

The growing problem is threatening the reputation of the US National Institutes of Health (NIH) based in Bethesda, Maryland, which funds many of the studies in question. Senior NIH officials are now considering adding requirements to grant applications to make experimental validations routine for certain types of science, such as the foundational work that leads to costly clinical trials. ...

A comment from a researcher quoted in the article notes

... a broader need to shift biomedical research from categorical statements and simple schematics to quantifiable hypotheses backed up by modeling and computation, open access to data (itself requiring new approaches and infrastructure) and better application of probability theory and statistics.

Monday, December 02, 2013

WDIST is now PLINK 1.9 alpha. WDIST (= "weighted distance" calculator) was originally written to compute pairwise genomic distances. The mighty Chris Chang then amazingly re-implemented all of PLINK with significant improvements (see below).

PLINK 1.9 even has support for LASSO (i.e., L1 penalized optimization, a particular method for Compressed Sensing).
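For illustration, here is a bare-bones coordinate-descent LASSO of the kind PLINK's --lasso implements. This is a generic sketch of the algorithm, not PLINK's code, and the data are synthetic:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/2n)||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with feature j's contribution removed.
            r = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ r / n
            # Soft-thresholding: small correlations are set exactly to zero,
            # which is what produces a sparse set of selected variants.
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return b

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
true_b = np.zeros(50)
true_b[:3] = [2.0, -1.5, 1.0]            # sparse "causal" effects
y = X @ true_b + 0.1 * rng.standard_normal(200)
b_hat = lasso_cd(X, y, lam=0.1)
print("nonzero coefficients:", np.flatnonzero(b_hat))
```

The L1 penalty drives most coefficients to exactly zero, which is the connection to compressed sensing: with enough samples, a sparse signal can be recovered even when the number of predictors is large.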

This is a comprehensive update to Shaun Purcell's popular PLINK command-line program, developed by Christopher Chang with support from the NIH-NIDDK's Laboratory of Biological Modeling and others. (What's new?) (Credits.)

It isn't finished yet (hence the 'alpha' designation), but it's getting there. We are working with Dr. Purcell to launch a large-scale beta test in the near future. ...

Unprecedented speed

Thanks to heavy use of bitwise operators, sequential memory access patterns, multithreading, and higher-level algorithmic improvements, PLINK 1.9 is much, much faster than PLINK 1.07 and other popular software. Several of the most demanding jobs, including identity-by-state matrix computation, distance-based clustering, LD-based pruning, and association analysis max(T) permutation tests, now complete hundreds or even thousands of times as quickly, and even the most trivial operations tend to be 5-10x faster due to I/O improvements.
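The flavor of the bitwise approach can be shown in a few lines of Python. PLINK's actual 2-bit encoding and C implementation differ in detail; this just illustrates the idea of packing genotypes two bits per variant and comparing two individuals with one XOR plus a popcount instead of a per-variant loop:

```python
def pack(genotypes):
    """Pack genotype codes (0/1/2 alt-allele counts) into one integer,
    two bits per variant."""
    word = 0
    for i, g in enumerate(genotypes):
        word |= g << (2 * i)
    return word

def ibs_mismatches(a, b, n_variants):
    """Count variants at which two packed genotype vectors differ."""
    diff = a ^ b                          # nonzero 2-bit fields mark mismatches
    mask = int('01' * n_variants, 2)      # ...0101 selects each field's low bit
    # Collapse each 2-bit field to a single flag bit, then popcount once.
    flags = (diff | (diff >> 1)) & mask
    return bin(flags).count('1')

g1, g2 = [0, 1, 2, 0], [0, 2, 2, 1]
print(ibs_mismatches(pack(g1), pack(g2), 4))  # -> 2 (differ at variants 1 and 3)
```

In C, the packed words live in machine registers and the popcount is a single instruction, which is one reason a whole identity-by-state matrix can be computed orders of magnitude faster than genotype-by-genotype comparison.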

We hasten to add that the vast majority of ideas contributing to PLINK 1.9's performance were developed elsewhere; in several cases, we have simply ported little-known but outstanding implementations without significant further revision (even while possibly uglifying them beyond recognition; sorry about that, Roman...). See the credits page for a partial list of people to thank. On a related note, if you are aware of an implementation of a PLINK command which is substantially better than what we currently do, let us know; we'll be happy to switch to their algorithm and give them credit in our documentation and papers.

Nearly unlimited scale

The main genomic data matrix no longer has to fit in RAM, so bleeding-edge datasets containing tens of thousands of individuals with exome- or whole-genome sequence calls at millions of sites can be processed on ordinary desktops (and this processing will usually complete in a reasonable amount of time). In addition, several key individual x individual and variant x variant matrix computations (including the GRM mentioned below) can be cleanly split across computing clusters (or serially handled in manageable chunks by a single computer).
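The splitting idea can be sketched with NumPy on toy data (PLINK's file formats and chunking logic are more involved; sizes and the GRM formula A = XXᵀ/M over standardized genotypes are for illustration):

```python
import numpy as np

def grm_block(X_std, rows, cols):
    """One block of the genetic relationship matrix A = X X^T / M,
    where X_std is the standardized N x M genotype matrix."""
    return X_std[rows] @ X_std[cols].T / X_std.shape[1]

rng = np.random.default_rng(0)
N, M = 400, 1000
X = rng.integers(0, 3, size=(N, M)).astype(float)  # toy 0/1/2 genotypes
X -= X.mean(axis=0)                                # center each variant
X /= X.std(axis=0)                                 # scale each variant

# Assemble the full N x N GRM from independent 100 x 100 blocks; each block
# could run on a different cluster node (or serially, one chunk at a time)
# without ever holding more than one block pair of the data in memory.
A = np.empty((N, N))
step = 100
for i in range(0, N, step):
    for j in range(0, N, step):
        A[i:i+step, j:j+step] = grm_block(X, slice(i, i+step), slice(j, j+step))

assert np.allclose(A, X @ X.T / M)  # blockwise result matches the one-shot GRM
```

Because each block depends only on the corresponding row and column slices of X, the computation parallelizes trivially and the peak memory footprint is set by the block size, not N.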

Command-line interface improvements
We've standardized how the command-line parser works, migrated from the original 'everything is a flag' design toward a more organized flags + modifiers approach (while retaining backwards compatibility), and added a thorough command-line help facility.

Additional functions
In 2009, GCTA didn't exist. Today, there is an important and growing ecosystem of tools supporting the use of genetic relationship matrices in mixed model association analysis and other calculations; our contributions are a fast, multithreaded, memory-efficient --make-grm-gz/--make-grm-bin implementation which runs on OS X and Windows as well as Linux, and a closer-to-optimal --rel-cutoff pruner.
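The idea behind a relatedness-cutoff pruner can be sketched as a greedy algorithm: repeatedly drop the individual involved in the most above-threshold pairs until none remain. This is a sketch of the concept only; PLINK's --rel-cutoff has its own tie-breaking and bookkeeping:

```python
def rel_cutoff(pairs, cutoff):
    """Greedily remove individuals until no remaining pair exceeds `cutoff`.

    pairs  -- dict mapping (i, j) tuples to estimated relatedness
    Returns the set of removed individuals.
    """
    over = {p for p, r in pairs.items() if r > cutoff}
    removed = set()
    while over:
        # Count how many offending pairs each individual participates in.
        counts = {}
        for i, j in over:
            counts[i] = counts.get(i, 0) + 1
            counts[j] = counts.get(j, 0) + 1
        # Drop the most-connected individual; this resolves the most pairs
        # per removal, keeping the retained sample as large as possible.
        worst = max(counts, key=counts.get)
        removed.add(worst)
        over = {p for p in over if worst not in p}
    return removed

# Toy example: individual 1 is related to both 2 and 3, so dropping 1 alone
# resolves both offending pairs; the unrelated pair (4, 5) is untouched.
print(rel_cutoff({(1, 2): 0.3, (1, 3): 0.25, (4, 5): 0.01}, cutoff=0.05))
# -> {1}
```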

There are other additions here and there, such as cluster-based filters which might make a few population geneticists' lives easier, and a coordinate-descent LASSO. New functions are not a top priority for now (reaching 95%+ backward compatibility, and supporting dosage/phased/triallelic data, are more important...), but we're willing to take time off from just working on the program core if you ask nicely.