Wednesday, August 17, 2011

@Google: Genetics and Intelligence

I'll be giving a talk at Google tomorrow (Thursday August 18) at 5 pm. The slides are here. The video will probably be available on Google's TechTalk channel on YouTube, perhaps after some delay.

The Cognitive Genomics Lab at BGI is using this talk to kick off the drive for US participants in our intelligence GWAS. More information at www.cog-genomics.org, including automatic qualifying standards for the study, which are set just above +3 SD. Participants will receive free genotyping and help with interpreting the results. (The functional part of the site should be live after August 18.)

Title: Genetics and Intelligence

Abstract: How do genes affect cognitive ability? I begin with a brief review of psychometric measurements of intelligence, introducing the idea of a "general factor" or IQ score. The main results concern the stability, validity (predictive power), and heritability of adult IQ. Next, I discuss ongoing Genome Wide Association Studies which investigate the genetic basis of intelligence. Due mainly to the rapidly decreasing cost of sequencing, it is likely that within the next 5-10 years we will identify genes which account for a significant fraction of total IQ variation.

We are currently seeking volunteers for a study of high cognitive ability. Participants will receive free genotyping.

Angela, haven't you been paying attention to this blog? :) If you had been, surely you would've remembered Steve discussing how physicists, with their superior cognitive horsepower, often dabble in other fields such as economics, psychometrics, biology, computer science, etc., but that the converse seldom happens, if ever. And also, as Steve has discussed, many of these other fields could benefit from the superior cognitive horsepower of physicists, as some of the conceptual muddle could certainly be cleared up by intellectually superior minds. ;)

Heh, yes well biology has certainly benefited from the involvement of physicists.

"According to Crick, the experience of learning physics had taught him something important—hubris—and the conviction that since physics was already a success, great advances should also be possible in other sciences such as biology. Crick felt that this attitude encouraged him to be more daring than typical biologists who tended to concern themselves with the daunting problems of biology and not the past successes of physics."

I don't think Google is stupid enough to step into the political minefield of the genetic basis of IQ. All they need is high IQ employees, not the genetic basis of high IQ. Frankly I'm surprised Steve Hsu still has a job at Univ. of Oregon considering how controversial his views are. He certainly has no shortage of hubris in maintaining an HBD blog using his real name and identity. That takes cojones, considering how liberal and politically correct academia is.

I'm interested in participating. In addition to the raw SNP data, will there be an interpretation like that provided by 23andme, or at least links to places where we can find known associations between SNPs and phenotype?

Something that came to my mind when looking at this and some of the earlier posts. Aren't IQ scores censored data? Surely a sufficiently large population that gains, on average, a total of 105 points on an IQ test over one hour, but a total of 120 over two, is smarter than one that gains a total of 105 over one hour but makes no improvement in the second hour. Why is this aspect never considered? I suppose one can argue that how 'fast' one can think is an important component of IQ, but in that case you can penalize the score gain by the additional time spent...

LSAT should definitely be used; in my opinion, it is a much stronger indicator of high-end g than GRE. (I tried them both.) GRE-M will be maxed out by 6% of all students who try to apply to grad schools in the country (which means that the score of 800 only indicates +2 SD or so). In contrast, LSAT is much tougher to max out. The "perfect 180" is achieved by one out of ten thousand (!) test takers.

How many people have super high LSATs but can't make the SAT/ACT cutoff?

I want to stress again that people can qualify without meeting any of the automatic criteria. The survey form on the site is sufficiently general that you can make your case: e.g., I got 180 on the LSAT; here is a scan of my score report ...

Very true. I guess what I'm trying to say is that the automatic SAT/ACT/especially GRE cutoffs are rather low. You're saying that automatic qualifiers are set above +3 SD, but, in reality, they will allow lots of people below 2.5 SD and occasionally even below 2 SD. If you just want to get a bunch of smart guys, SAT and ACT will suffice. (Or you could just run an ad in a Mensa newsletter.) If you really want a list of people above 3 SD without shelling out $1000/person on a psychometrist to retest everyone using real, in-person IQ tests, that's where LSAT comes in useful.

Seems to me like Steve's study is geared more towards people with high mathematical ability, given the various qualifying criteria. It's possible that using the LSAT will lead to some high M people perhaps missing the cutoff.

I think that in terms of V and M scores, the study is all about finding g, and having high V and M subtests is a better indicator of having higher g than having a lower M but higher V or, conversely, a lower V but higher M, which are more indicative of having a lower g and a higher subfactor related to maths in some way. Those subfactors may or may not be heritable (current evidence for subfactor heritability being, I believe, low within population), but they aren't what's being looked for here in any case.

There shouldn't be too much difference since g is the largest common factor, but it's probably better to optimise for high g using high V and M.

The LSAT does seem like a fairly general test that isn't strongly skewed towards verbal capabilities, assuming the http://en.wikipedia.org/wiki/Law_School_Admission_Test list of scores is accurate, even if the pool of test takers might be.

In an earlier thread I raised the problem of gene-environment correlations. Here's a specific one to consider. Suppose that parents with high IQ genes are able to impart better environments for their children. If these environments have any effect, then they'll be correlated with the signals being picked up by Steve's study. Furthermore, there's no reason to believe this correlation wouldn't exist in other societies, so replicating in other populations wouldn't avoid this issue.

What you are saying is that the "average excess" will exceed the "average effect" as a result of a kind of population structure: ability-increasing alleles are confounded with beneficial environments. See Ronald Fisher's paper on the subject for definitions of these terms and explications of their meanings. I do not think it can be rigorously shown that Fisher's implicit regression of the phenotype on all loci in the genome properly isolates the effect of an actual causal locus (see this book for what this means)--but it is certainly a very reasonable notion. In any case many tools devised by statistical geneticists (e.g., EIGENSTRAT and EMMAX) can be seen as approximations of Fisher's ideal, and these have proven to be extremely successful in the control of population structure. See this new paper on multiple sclerosis for an impressive example.

The GIANT Consortium has confirmed that the great majority of their height loci discovered in population samples replicate in within-family designs. Since nature randomly selects which allele a heterozygous parent passes on to an offspring, within-family designs are immune from population structure. In the future, as GWAS expands for any given phenotype, this kind of confirmation of population associations in (smaller) samples of families will be highly desirable.
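To make the contrast concrete, here is a minimal simulation sketch of the confound raised earlier in the thread. Everything in it is an illustrative assumption rather than anything from the study: a single locus with no causal effect at all, allele frequency 0.5, a family environment that rises by `env_per_allele` points for each "positive" allele the parents carry, and two siblings per family. The function names `slope` and `simulate` are made up for this example.

```python
import random

def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def simulate(n_families=5000, env_per_allele=1.0, seed=0):
    """Return (population slope, within-family slope) for a NON-causal locus."""
    rng = random.Random(seed)
    pop_g, pop_y, diff_g, diff_y = [], [], [], []
    for _ in range(n_families):
        # Each parent carries two alleles; 1 marks the "positive" allele.
        mother = (rng.randint(0, 1), rng.randint(0, 1))
        father = (rng.randint(0, 1), rng.randint(0, 1))
        # Assumed confound: the shared environment tracks the parents' genotypes.
        env = env_per_allele * (sum(mother) + sum(father))
        sibs = []
        for _ in range(2):
            # Random transmission: one allele from each parent.
            g = rng.choice(mother) + rng.choice(father)
            # Phenotype depends on environment and noise only, never on g.
            y = env + rng.gauss(0, 1)
            sibs.append((g, y))
            pop_g.append(g)
            pop_y.append(y)
        # Sibling differences cancel the shared family environment.
        diff_g.append(sibs[0][0] - sibs[1][0])
        diff_y.append(sibs[0][1] - sibs[1][1])
    return slope(pop_g, pop_y), slope(diff_g, diff_y)

population_slope, within_family_slope = simulate()
print(f"population slope:    {population_slope:.2f}")
print(f"within-family slope: {within_family_slope:.2f}")
```

In this toy setup the population regression reports a sizeable "effect" for a locus that does nothing, while the sibling-difference regression stays near zero, because the shared environment drops out of the difference and transmission from the parents is random.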

The relation between time taken and ability is rather complex. A rough generalization is that more able people take less time on easy items and more time on hard items; less able people tend to give up quickly on harder items.

Psychometricians have proposed using the time taken on a given item to update the provisional estimate of an examinee's ability in computer-adaptive testing. Taking temporal information into account should thus extract more information from a fixed number of items. I do not know if any operational testing programs have actually incorporated a proposal of this kind.

ROFL! Y'all are amusingly obsessed (OBSESSED!) with your own brilliance and the need for proof thereof. I look forward to watching the talk. A thought on the use of LSAT scores -- I smoked the LSAT... but that was way back in 1990. Year by year the LSAT has become much harder. Perhaps the exam administrator (is that LSAC?) has a meaningful way to compare scores from different years but I suspect you'll run into the same "low ceiling" problem with old LSAT scores that you have with SAT scores.

First, is my example really a case of population stratification? I thought population stratification required that there be sub-populations with systematic differences in allele frequencies outside of the genes that have an effect. What I raised would be a problem even if the high IQ people only correlated on the IQ effect genes. So I don't see how it can be controlled for in the same way as typical population stratification.

Second, isn't the use of principal components to control for population structure controversial? For example, see this criticism of the method: http://www.cell.com/AJHG/retrieve/pii/S0002929711002187

The letter addresses a slightly different issue than the control of population structure in determining the effect of a single locus. The letter criticizes a method introduced by Goddard and colleagues for estimating the total genetic variance associated with the SNPs that happen to be present on a genotyping chip (without regard for individual loci). They point out that the method produces a massively biased estimate if there is extreme population structure, a bias that is only partially removed by the use of PCs as regression covariates. (I use the term "population structure," here, to mean any confounding of genotype with other causes of the trait, including environmental causes. The reply by Goddard et al. makes finer distinctions.)

What are the implications of this for identifying individual causal variants? Well, the letter cites the EMMAX method as being appropriate in this context, so according to the letter--not much. Thinking more about your own example of ability being confounded with the environmental boost of being raised by smart parents, I am no longer certain that genomic background can fully control for it. (Even God would not be able to predict the ability of your parents with complete accuracy from just your genome.) However, as I said, family designs are immune to confounding, and in the future I anticipate that such designs will be used to verify any results in samples of unrelated individuals.

The separate issue of whether the letter casts any doubts on applications of SNP-based heritability estimation is also an interesting one. In their reply, I think Goddard et al. get the better of the argument.

Taking the 2010 stats, ~130k took the AMC 10/12 (about 60k each). About ~500 qualify for the USAMO/USAJMO, or about 1 in 250. There's definitely some self-selection bias for people taking the test, so it seems reasonable to me. Oddly enough, if you go back further, there were more students taking the AHSME (240k in 1999) with fewer USAMO qualifiers (around 200 in 1999, I think).

I wonder if the self-selection bias has increased over time? Or maybe the increase of standardized testing has edged out the AMC?

What makes you think the cutoffs are low? 2010 stats for the SAT (http://professionals.collegeboard.com/profdownload/sat-percentile-ranks-composite-cr-m-2010.pdf) indicate 4646 with 1560+ for V+M out of ~1500k. Even if you suppose all of those scorers are 800M, that's still +2.7 SD.
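For anyone who wants to reproduce that conversion, here is a minimal sketch. It assumes scores are normally distributed and treats the pool of SAT takers as the reference population (both are simplifications), and the helper name `tail_to_sd` is made up for illustration.

```python
from statistics import NormalDist

def tail_to_sd(n_above, n_total):
    """SD cutoff whose upper tail holds n_above out of n_total, assuming normality."""
    tail_fraction = n_above / n_total
    return NormalDist().inv_cdf(1 - tail_fraction)

# Figures quoted in the comment: 4646 scorers at 1560+ out of roughly 1.5 million.
z = tail_to_sd(4646, 1_500_000)
print(f"cutoff is about +{z:.1f} SD among test takers")
```

The same helper can be pointed at any other "k out of n" claim in the thread, keeping in mind that the answer is always relative to whoever sat the test, not the general population.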

Also keep in mind that there's huge selection bias for the GRE already, so the percentiles are not what they would be for the general population.

There have been plenty of papers showing heritability, so looking for a genetic basis is a logical next step. I don't see it as a political minefield unless you start throwing in race and gender (see Larry Summers).

Again, Angela is correct. I taught at colleges and universities for 37 years. Nowadays, each institution has an administrative office dedicated to detecting and suppressing politically correct ideas, usually imposing such punishments as outright dismissal, loss of salary or tenure or rank, or public humiliation. Prof. Hsu has a job only because the office at UO hasn't found his blog yet.

Why is the genetic basis of intelligence controversial? This is the mainstream view in psychology and biology. Whether you or millions of average Americans like this or not will not prevent other countries from studying this to benefit their people, and it will not prevent the truth from coming out. Last I checked, even a scholar like Philippe Rushton is still tenured at the University of Western Ontario. If you call Steve's research controversial, what do you call Philippe Rushton's research?

Not quite. The GRE-M is taken by college students applying to graduate school. So that means that those taking the test not only got into college, but then had higher than average GPAs there. So it may be closer to a 2.5+ for the math. You need to account for the fact that those applying to grad school probably have an average IQ of 110 at least. Well, hopefully.

We cannot give many additional details regarding the design of this study (or others we are carrying out) for several reasons. One is that our potential and actual collaborators may not want to be disclosed at the moment.

I'm an automatic, but in the consent form there's this: "At an advanced stage of the study, BGI-CGL may provide you access to your genetic data and interpretations thereof with respect to ancestry, disease risk, and predicted trait levels (including level of cognitive ability)." Is that estimate of cognitive ability for the non-automatics only? If not, there's a huge problem.

The discrepancy between the trait prediction and your actual phenotype is an (increasingly noisy, as the relative contribution of environment increases) estimator of how much is still unknown about your genome.

I'm interested in the answer to point #1. On that note, what if you have a curiosity gene, which made you want to be a case? Or a teaching gene that made you want to be in a PhD program? Anything that separates the case group from smart people as a whole could confound the study.

Your genetic data will probably never provide as much information about your phenotype as measurements of the phenotype itself. If you want to know how fast you are, use a stopwatch; don't bother to measure your ACTN3 genotype. That said, even elite athletes often *are* curious about their ACTN3 genotype, and there seems to be no harm in allowing that itch to be scratched.

Hopefully we can tell whether an association arises from population stratification.

I was thinking about prospects of genetic engineering (what would a person with all IQ genes turned on look like?) and estimates on page 28 of the slides made me realize something.

It assumes that intelligence is determined by many (10^3) genes of equal small effect. But it can't work like that! Either the number N must be much smaller, or some genes are significantly more important than others.

Suppose that there are in fact N genes of equal effect. For simplicity, assume that they all have normal allele frequencies of 50%. Then we should be able to construct an "intelligence measure" equal to the share of positive alleles among these N, which correlates linearly with physically measurable quantities, e.g. the speed of solving Raven's matrices of fixed difficulty.

If N=10^3, then the average person has 50% of positive alleles and the person at +3 SD has 57% of positive alleles. It means that the person at +3 SD would only be 10-15% better/faster on any such test. But that is obviously not the case.

The very difficulty of devising tests that measure IQ much further than that would suggest that people at +3..4 SD have their abilities nearly "saturated", which could happen if N is rather low. For example, if N=50, then the median person has 25 positive alleles and the person at +4 SD has 47 positive alleles (and the remaining 3 would not matter much).

Of course, the very idea of additive IQ is rather crude, because, at some points, there are qualitative shifts. Hence IQ is hard to reduce to a linear performance measure like height or a 100-meter sprint time. But still, that is a useful perspective.

If I think about it some more, I should be able to come up with estimates of N and tests that help us measure it.

The model in the slides is just a toy model to illustrate scaling. In reality there will be distributions in effect sizes and allele frequencies in a particular population. See the height results which are starting to flesh this out for a different quantitative phenotype.

Are you aware of any studies that quantify the relationship between processing speed and IQ? (Preferably the problem-solving processing speed, and not things like reaction time.) I'm trying to quantify it, and I'm getting curious results (see image). The huge dynamic range leads me to suspect that the mean frequency of IQ-positive alleles is very low (maybe 10-20%). But it's hard to reproduce the high-end behavior, regardless of the model I try to use.

On second thought, that high-end behavior is EXACTLY what we should expect ... Suppose that we break down the time to execute a task into N pieces, the time to execute each piece depends on a single gene, and the total time is a simple sum of all pieces. Having a few "strong" genes which can significantly reduce the processing time, and a lot of "weak" genes, each of which independently shaves off a percent or two, would produce the relationship between 'g' and processing speed as shown above.

"Then we should be able to construct an "intelligence measure" equal to the share of positive alleles among these N, which correlates linearly with physically measurable quantities."

If you wanted to, but there'd still be a bell shaped curve.

"It means that the person at +3 SD would only be 10-15% better/faster on any such test."

It doesn't mean that. The small effect is in IQ points. That's the measure of better or worse. With 500,000 SNPs, each with a +1/2 or -1/2 point effect for the two homozygotes and 0 points for heterozygotes, with probabilities 1/4, 1/4, 1/2, the SD is sqrt(500,000)*1/2 = A LOT, assuming no covariance.
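As a rough check of that arithmetic, here is a minimal sketch under the comment's own assumptions: independent SNPs, each contributing +1/2, -1/2 or 0 points with probabilities 1/4, 1/4 and 1/2. The helper name `additive_sd` is made up for illustration. Because half of the genotypes contribute nothing, the per-SNP variance works out to 1/8 rather than 1/4, so the total lands somewhat below the shorthand sqrt(500,000)*1/2 (about 354), though still far above the conventional 15-point SD of IQ, which is the point being made.

```python
import math

def additive_sd(n_snps, effects, probs):
    """SD of the sum of n_snps independent, identically distributed SNP effects."""
    mean = sum(e * p for e, p in zip(effects, probs))
    variance = sum(p * (e - mean) ** 2 for e, p in zip(effects, probs))
    return math.sqrt(n_snps * variance)

# Assumed from the comment: +1/2 and -1/2 for the two homozygous genotypes,
# 0 for heterozygotes, with probabilities 1/4, 1/4 and 1/2.
sd = additive_sd(500_000, effects=[0.5, -0.5, 0.0], probs=[0.25, 0.25, 0.5])
print(sd)  # 250.0
```

Shrinking the per-SNP effect or the number of SNPs in the call shows how quickly the total SD falls back toward a realistic range, which is why a model with many loci needs very small effects at each one.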

On the "Volunteer" page, when I enter my email address and click submit, I receive the following message: "Check your email for instructions." It's been a week since I submitted my email address, and I have yet to receive any instructions. When can I expect them?

But a surprisingly large part of those 25k is responsible for brain development or functioning of the nervous system. There was an article a few years ago that estimated that 58% of the human transcriptome is expressed in the brains of at least 5% of humans. The human brain map at http://human.brain-map.org identifies around 1000 genes which may be relevant here.

Hmm, I could swear I made a response to this, but it's not visible any more?

Anyway. Have you ever heard of the protein domain DUF1220? This is a protein domain of unknown function that is encoded independently by at least 30 and possibly over 60 different genes (some of them also do it multiple times); it's highly specific to humans (we have 6 times the number of copies of higher apes and it's almost nonexistent in other mammals); it's expressed primarily in regions of the brain responsible for higher cognitive function, and its copy number variation is correlated with things like brain size, the risk of autism, and the risk of schizophrenia. I'd expect to see a correlation with IQ as well. That's 60 genes right there. And it's just one pathway of many.

Steve, are open discussions allowed on your blog? Yesterday I felt compelled to comment on this study of intelligence and posted some points on what I felt was an incongruence between your intended study of intelligence and the qualifying criteria listed, but the post seems to have been deleted.