The above results are from a supervised admixture analysis of my family and myself. There are three replicates of me because I converted my 23andMe, Ancestry, and Family Tree DNA raw data into plink files. Notice that the results are broadly consistent. This emphasizes that discrepancies between DTC companies in their results are due to their analytic pipelines, not data quality.

The results for my family are not surprising. I’m ~14% “Dai”, reflecting East Asian admixture into Bengalis. My wife is ~0% “Dai”. My children are somewhere in between. At such a low fraction you expect some variance in the F1.

Now below are results for three Swedes with the same reference panel:

| Group | ID | Dai | Gujrati | Lithuanians | Sardinian | Tamil |
|---|---|---|---|---|---|---|
| Sweden | Sweden17 | 0.00 | 0.09 | 0.63 | 0.28 | 0.00 |
| Sweden | Sweden18 | 0.00 | 0.08 | 0.62 | 0.31 | 0.00 |
| Sweden | Sweden20 | 0.00 | 0.05 | 0.72 | 0.23 | 0.00 |

All these were run on supervised admixture frameworks where I used Dai, Gujrati, Lithuanians, Sardinians, and Tamils, as the reference “ancestral” populations. Another way to think about it is: taking the genetic variation of these input groups, what fractions does a given test focal individual shake out at?

The commands are rather simple. For my family:

bash rawFile_To_Supervised_Results.sh TestScript

Here is what the scripts do in two different situations. Imagine you have raw genotype files downloaded from 23andMe, Ancestry, and Family Tree DNA.

Download the files as usual. Rename them in an intelligible way, because the file names are going to be used for generating IDs. So above, I renamed them “razib_23andMe.txt” and such. Leave the extensions as they are. Obviously, you need to make sure they are not compressed. Then place them all in ancestry_supervised/RAWINPUT.

The script looks for the files in there. You don’t need to specify names, it will find them. In plink the family ID and individual ID will be taken from the text before the extension in the file name. Output files will also have the file name.
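To make the naming rule concrete, here is a minimal sketch of how an ID could be derived from a file name by dropping the extension. This is not the script's actual code; the temporary directory and the empty stand-in files are hypothetical and only illustrate the idea.

```shell
#!/usr/bin/env bash
# Sketch only: the real script's logic may differ.
set -euo pipefail

workdir=$(mktemp -d)
mkdir -p "$workdir/RAWINPUT"
# Hypothetical, empty stand-ins for real raw genotype downloads.
touch "$workdir/RAWINPUT/razib_23andMe.txt" "$workdir/RAWINPUT/razib_Ancestry.txt"

ids=""
for f in "$workdir"/RAWINPUT/*; do
  base=$(basename "$f")
  id="${base%.*}"   # strip the final extension: razib_23andMe.txt -> razib_23andMe
  ids="$ids$id "
  echo "$base -> family ID and individual ID: $id"
done
```

This is why a descriptive file name matters: whatever precedes the extension is what you will see in the plink files and in the output.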

Aside from the raw genotype files, you need to determine a reference file. In REFERENCEFILES/ you see the binary pedigree/plink file Est1000HGDP. It is the same file from the earlier post. It would be crazy to run supervised admixture on the dozens of populations in this file. You need to create a subset.

For the above I did this (note the -E, which makes grep treat | as “or”):

grep -E "Dai|Guj|Lithua|Sardi|Tamil" Est1000HGDP.fam > ../keep.keep
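Here is a toy illustration of building the keep list. The .fam rows are made up for the example (they are not taken from Est1000HGDP), and the sketch uses grep -E so that | acts as alternation. A list of matching .fam rows like this is the kind of file plink's --keep option accepts.

```shell
#!/usr/bin/env bash
# Sketch only: a made-up four-row .fam file, not real data.
set -euo pipefail

workdir=$(mktemp -d)
cat > "$workdir/toy.fam" <<'EOF'
AA_Ref_Dai toy1 0 0 1 -9
AA_Ref_Gujrati toy2 0 0 2 -9
AA_Ref_Sweden Sweden17 0 0 1 -9
AA_Ref_Tamil toy3 0 0 2 -9
EOF

# Keep only the rows whose text matches one of the reference populations.
grep -E "Dai|Guj|Lithua|Sardi|Tamil" "$workdir/toy.fam" > "$workdir/keep.keep"

kept=$(wc -l < "$workdir/keep.keep" | tr -d ' ')
echo "kept $kept of 4 individuals"
```

It is worth eyeballing the keep file before running anything, since a short pattern can match more populations than you intended.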

When the script runs, it converts the raw genotype files into plink files and puts them in INDIVPLINKFILES/. Then it takes each plink file and uses it as a test against the reference population file. That file has a prefix on group/family IDs of the form AA_Ref_. This is essential for the script to understand that this is a reference population. The .pop files are automatically generated, and the script sets the correct K by counting the unique population numbers.
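A rough sketch of the idea behind the .pop file and K follows. It assumes the layout that admixture's supervised mode expects: one line per row of the .fam file, with a population label for reference individuals and "-" for the individual being tested. The toy .fam file is invented, and the script itself may encode populations differently (for example as numbers).

```shell
#!/usr/bin/env bash
# Sketch only: a made-up merged .fam with three reference rows and one test row.
set -euo pipefail

workdir=$(mktemp -d)
cat > "$workdir/merged.fam" <<'EOF'
AA_Ref_Dai toy1 0 0 1 -9
AA_Ref_Dai toy2 0 0 2 -9
AA_Ref_Tamil toy3 0 0 1 -9
razib_23andMe razib_23andMe 0 0 1 -9
EOF

# Reference rows get their population label; everything else gets "-".
awk '{ if ($1 ~ /^AA_Ref_/) { sub(/^AA_Ref_/, "", $1); print $1 } else print "-" }' \
  "$workdir/merged.fam" > "$workdir/merged.pop"

# K is the number of distinct reference labels.
K=$(grep -v '^-$' "$workdir/merged.pop" | sort -u | wc -l | tr -d ' ')
echo "K=$K"
```

With the five reference populations used above, the same count would come out as K=5.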

The admixture run is going to be slow. I recommend you modify runadmixture.pl by adding the number-of-cores parameter (admixture’s -j flag, for example -j4 for four threads) so it can go multi-threaded.

When the script is done it will put the results in RESULTFILES/. They will be .csv files with strange names (they will have the original filename you provided, but there are timestamps in there so that if you run the files with a different reference and such it won’t overwrite everything). Each individual is run separately and has a separate output file (a .csv).

But this is not always convenient. Sometimes you want to test a larger batch of individuals. Perhaps you want to use the reference file I provided? For the Swedes I did this:

grep "Swede" REFERENCEFILES/Est1000HGDP.fam > ../keep.keep

Please note the folder. There are modifications you can make, but the script assumes that the test files are in INDIVPLINKFILES/. The next part is important. The Swedish individuals will have AA_Ref_ prepended on each row since you got them out of Est1000HGDP. You need to remove this. If you don’t remove it, it won’t work. In my case, I modified the file using the vim editor:

vim Sweden.fam

You can do it with a text editor too. It doesn’t matter. Though it has to be the .fam file.
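If you would rather not edit by hand, the prefix can also be stripped non-interactively. This is a sketch on a made-up Sweden.fam in a temporary directory (the sex and phenotype columns are placeholders); it writes to a temporary file and moves it back, which avoids the differences in sed's in-place option between Mac and Linux.

```shell
#!/usr/bin/env bash
# Sketch only: a made-up Sweden.fam, not the real file.
set -euo pipefail

workdir=$(mktemp -d)
cat > "$workdir/Sweden.fam" <<'EOF'
AA_Ref_Sweden Sweden17 0 0 1 -9
AA_Ref_Sweden Sweden18 0 0 1 -9
AA_Ref_Sweden Sweden20 0 0 1 -9
EOF

# Remove the AA_Ref_ prefix from the start of every row.
sed 's/^AA_Ref_//' "$workdir/Sweden.fam" > "$workdir/Sweden.fam.tmp"
mv "$workdir/Sweden.fam.tmp" "$workdir/Sweden.fam"

first=$(head -n 1 "$workdir/Sweden.fam")
echo "$first"
```

Only the family ID column changes; the rest of each row is left alone.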

After the script is done, it will put the .csv file in RESULTFILES/. It will be a single .csv with multiple rows. Each individual is tested separately though, so what the script does is append each result to the file. If you have 100 individuals, it will take a long time. You may want to look in the .csv file as the individuals are being added to make sure it looks right.

The convenience of these scripts is that they do some merging/flipping/cleaning for you. And they format the output so you don’t have to.

I originally developed these scripts on a Mac, but to get them to work on Ubuntu I made a few small modifications. I don’t know if they still work on Mac, but you should be able to make the modifications if not. Remember for a Mac you will need the Mac versions of plink and admixture.

For supervised analysis, the reference populations need to make sense and be coherent. Please check the earlier tutorial and use the PCA functions to remove outliers.

Comments Off on Tutorial to run supervised admixture analyses

November 11, 2012

I was at ASHG this week, so I’ve followed reactions to the election passively. But one thing I’ve seen is repeated commentary on the fact that Asian Americans have swung toward the Democrats over the past generation. The thing that pisses me off is that there is a very obvious low-hanging fruit sort of explanation out there, and I’m frankly sick and tired of reading people ramble on without any awareness of this reality. We spent the past few months talking about the power of polls, and quant data vs. qual (bullshit) analysis, with some of my readers going into full on let’s-see-if-Razib-is-moron-enough-to-swallow-this-crap mode.

In short, it’s religion. Barry Kosmin has documented that between 1990 and 2010 Asian Americans have become far less Christian, on average. Meanwhile, the Republican party has become far more Christian in terms of its identity. Do you really require more than two sentences to infer from this what the outcome will be in terms of how Asian Americans will vote?

October 18, 2012

Last week the GSS was down. I was very sad. The SDA team explains the situation:

Part of the popularity of our demonstration archive is that it is free for end users. We are happy to provide this service. It is a valuable resource for the academic community and it also publicizes the value of our SDA software. However, the flip side of providing this free service is that it does not generate any income to offset the cost of providing the infrastructure required. We receive no funding from GSS for hosting their datasets — which is often a surprise to our users. Almost all of our income comes from the fees provided by licensing the SDA software to other data archives (like ICPSR and IPUMS), and virtually all of that income goes to support the programming and technical support that we provide them. We obviously need some additional sources of revenue.

Comments Off on The general social survey: information is not free

October 7, 2012

Despite the real estate bubble bursting, it looks as if Florida will surpass New York in population by the next Census. I once made some quick money by betting an older gentleman that Texas had a larger population than New York. I suspect there’s even more money to be made by betting people that Florida has a larger population than New York in a few years. The reality is that most people don’t check statistics in their free time, so some “facts” get frozen in their minds. A great number of adults alive today were told in elementary or secondary school that New York was the second largest state in population. They are unlikely to update their views as they age. Unfortunately, I suspect these confusions are going to lead to public policy problems as well. I am not confident that our elected officials are any more aware of statistics than their constituents.

Comments Off on The relative decline of New York

October 2, 2012

If you had the sense that Paul Ehrlich and Garrett Hardin are very much figures of the 1970s nexus of environmentalism and population control, it seems you are right. According to Google Ngrams mention of these topics has been declining since peaking during the oil crisis, in the afterglow of the influence of the late 1960s counter-culture. The general social survey has a variable, POPGRWTH, which asks:

And please circle one number for each of these statements to show how much you agree or disagree with it. The earth cannot continue to support population growth at its present rate.

The question was asked in the years 2000 and 2010. Demographic breakdowns below for the pooled responses….

September 15, 2012

In the post below I took the time out to link to the GSS, as well as posting my exact queries. As payment for this consideration the first comment was absolute drivel. I understand people have political opinions, but I’m not too interested in your opinions. You may be interested in your opinions, but I’d rather have more data. Most people don’t know enough for me to have interest in their opinions (most != all, many readers do have opinions in their specialties which I seek out).

I was trying to make a point that anger and even violence in reaction to actions which offend are actually comprehensible as the modal human response. The community reacts to punish those who violate taboos. The taboos may differ, but the response to the action of violation is normal and natural. A primary issue that needs to be considered is that taboos differ from society to society, so one is often not conscious of the act of violation (e.g., if you show the bottom of your shoes to people when you sit down, that’s an offensive act in some societies).

An implication here is that American norms of free speech near absolutism, ...

Comments Off on Intelligence challenged people and free speech

September 1, 2012

After the post on fatness and homophobia I decided to query the GSS on the extent to which people think that fatness has a strong biological element, similar to homosexuality. There’s a variable, GENENVO1. It asks:

Character, personality, and many types of behavior are influenced both by the genes people inherit from their parents and by what they learn and experience as they grow up. For each of the following descriptions, we would like you to indicate what percent of the person’s behavior you believe is influenced by the genes they inherit, and what percent is influenced by their learning and experience and other aspects of their environment. The boxes on handcard D1 are arranged so that the first box on the LEFT (which is numbered 1) represents 100% genetic influence (and 0% environment). The next box (numbered 2) represents 95% genes (and 5% environment), and so on. The RIGHTMOST box (numbered 21) represents 100% environmental influence (and no genetic influence). After each description, please type the number of the box that comes closest to your answer. Please use the numbered scale on handcard D1 to indicate, FOR EACH OF THE BEHAVIORS DESCRIBED, what percent of the person’s behavior ...

Comments Off on The educated and conservative think fatness is a choice

August 29, 2012

Over at Econlog Bryan Caplan bets that India’s fertility will be sub-replacement within 20 years. My first inclination was to think that this was a totally easy call for Caplan to make. After all, much of southern India, and the northwest, is already sub-replacement. And then I realized that heterogeneity is a major issue. This is a big problem I see with political and social analysis. Large nations are social aggregations that are not always comparable to smaller nations (e.g., “Sweden has such incredible social metrics compared to the United States”; the appropriate analogy is the European Union as a whole).

So, for example, India obviously went ahead with its demographic transition earlier than Pakistan. But what this masks is that the two largest states in terms of population in India, in the far north, actually resemble Pakistan in demographics, not the rest of India. Uttar Pradesh, with a population 20 million larger than Pakistan, has a similar fertility rate to India’s western neighbor. Bihar currently has a slightly higher fertility rate than Pakistan when you look at online sources (though the proportion under 25 is a little lower, indicating that its fertility 10-15 years ago was lower than Pakistan’s, ...

Comments Off on The future of the three “Pakistans”

August 26, 2012

Over at Darwin Catholic a commenter asked whether a pro-choice commenter on this weblog also supported the death penalty. I presume that they were here pointing to the consistent life ethic issue. Many liberals who oppose capital punishment support abortion rights, and many conservatives who support capital punishment oppose abortion rights. These camps both have their viewpoints, which I’m not interested in re-litigating in the comments. But I was curious as to the overall societal support for the combinations of positions.

So I looked at the GSS, using the CAPPUN and ABANY variables (capital punishment, and abortion for any reason). In this post I will show you screenshots of the GSS output. It’s ugly, but it shows you deviation away from the expected proportions. Basically, if two variables are independent you can predict what you’d expect to be the crossed percentages over the four cells. If the results deviate from that you can ascertain particular associations. In the GSS output red means that the cell has a higher value than it should, and blue a lower value. Additionally, the intensity signals the magnitude of the deviation. I limited all results to the year 2000 and later.
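To spell out the independence logic: the expected count in a cell is (row total × column total) / grand total, and the deviation is observed minus expected. The sketch below works that through for a 2×2 table. The counts are invented purely for illustration and are not GSS results, and the GSS interface may use a different statistic for its coloring.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical 2x2 counts, NOT GSS data.
set -euo pipefail

result=$(awk 'BEGIN {
  n[1,1] = 30; n[1,2] = 20
  n[2,1] = 10; n[2,2] = 40
  # Accumulate row totals, column totals, and the grand total.
  for (i = 1; i <= 2; i++)
    for (j = 1; j <= 2; j++) { row[i] += n[i,j]; col[j] += n[i,j]; N += n[i,j] }
  # Expected count under independence, and the deviation from it.
  for (i = 1; i <= 2; i++)
    for (j = 1; j <= 2; j++) {
      e = row[i] * col[j] / N
      printf "cell %d,%d observed=%d expected=%.1f diff=%+.1f\n", i, j, n[i,j], e, n[i,j] - e
    }
}')
echo "$result"
```

A positive diff corresponds to a cell with more respondents than independence predicts, a negative diff to one with fewer.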

First, the general aggregate ...

Comments Off on Non-whites consistent on “life” issues

August 21, 2012

It’s basically impossible to avoid hearing about Todd Akin right now. My Twitter and Facebook feeds are kind of swamped. But it did make me wonder: what percentage of Americans reject abortion in cases of rape and incest? The GSS has a handy variable, ABRAPE, which asks respondents about the possibility of abortion if a woman gets pregnant as a result of rape (let’s stipulate that it’s possible to get pregnant as a result of rape!). I also limited the sample to the year 2000 and later, and non-Hispanic whites (to clear out confounds). Demographic breakdowns below….

Before people start complaining, the scale below goes from 0% to 50%, NOT 0% to 100%!

Comments Off on Who rejects right to abortion in cases of rape?

August 20, 2012

Long time readers know that one of my pet hobby-horses is to try and convince more pundits that they should use the GSS. Opinions based on opinions may be fun, but opinions based on facts may be useful. In general my appeals have fallen on deaf ears. But today I notice that Will Saletan is using GSS data to discuss the Todd Akin case. You may not agree with Saletan’s take on the results, but at least he bothered to generate some results.

August 19, 2012

Reihan Salam has a post up on the alignment of racism and political orientation. He begins:

Recently, Chris Hayes, host of MSNBC’s UP with Chris Hayes, made the following observation:

It is undeniably the case that racist Americans are almost entirely in one political coalition and not the other.

Chris is a good friend of mine, and we grew up in the same milieu. I can attest to the fact that the view he expressed is very widely held in the circles in which we both travel….

Salam then links to Alex Tabarrok, who uses party identification data to indicate that actually racism is split between the two groups, while John Sides suggests that there is a definite lean toward Republicans being more racist, using a few indicator variables. Overall I think Sides is about right, all things equal conservatives are more racist than liberals. At least in the modern context of the two ideologies.* I say conservative/liberal rather than Republican/Democrat, because my experience with the GSS data set is that ideology is a more powerful predictor of social views among whites. This holds true with the variables which Tabarrok and Sides query from what I can see; the gap between ...

Comments Off on More racist: white liberals or white conservatives?

August 15, 2012

There was a question below in regards to the high fertility of some extreme (“ultra”) religious groups, in particular Haredi Jews. The commenter correctly points out that these Jews utilize the Western welfare system to support large families. This is not limited to just Haredi Jews. The reason Somalis and Arabs have fertility ~3.5 in Helsinki, as opposed to ~1.5 as is the norm, is in part due to the combination of pro-natalist subcultural norms and a generous benefits state. Of course we mustn’t overemphasize economics. Israel’s decline in Arab Muslim fertility but rise in Jewish fertility in the 2000s has been hypothesized to be due to different responses to reductions in child subsidies by Muslims and the Haredi Jews. In short, the former reacted much more strongly to economic disincentives than the latter.

A bigger question is whether exponential growth driven by ideology can continue indefinitely. I doubt it. Demographics is inevitable, but subject to a lot of qualifications. Haredi political power in Israel grants some benefits, but at the end of the day basic economics will serve as a check on the growth of the population of this sector. Similarly, barring ...

Comments Off on Who shall inherit the earth?

August 9, 2012

Also, in modern society, doesn’t just about everyone reproduce, such that not only is any particular advantage competing against other countervailing pressures as you note, but also that the “less fit” genomes are not removed from the overall population, but rather are added back to the mix? In other words, the less-preferred short males don’t die and have zero kids, they also get married and their genes get thrown back into the pot.

First, let’s not get caught in the assumption that for genes to be disfavored one has to have zero fitness in individuals carrying those genes. If, for example, in a situation of demographic expansion you had individuals who had eight children vs. those who had one child, there would be selection for the traits which were passed by those with eight children in relation to those who had one child. But, it did make me realize I wasn’t intuitively aware of the distribution of number of offspring in the population. I assumed that the median was around two, but that’s about it.

So, I looked at the GSS CHILDS variable for individuals born in 1950 or earlier from the year 2000 on (COHORT and ...

Comments Off on What is the distribution of offspring per individual?

July 29, 2012

There’s a cliche, which isn’t totally false, that more education tends to lead one toward heterodox viewpoints which challenge conventional norms. But one issue that has been coming to the fore over the last 10 years or so is that college educated Americans tend toward social liberalism, and yet often continue to live very bourgeois lives. In other words, the freedoms which they favor are those freedoms which are rarely, if ever, operative in their own lives. In contrast those Americans without college educations tend to have a less libertarian attitude toward personal mores, but have lives characterized by greater disturbance and disastrous choices.

Though she wasn’t entirely surprised. Ever since her divorce three years ago, Ms. Thomas said, she has been antisocial, “nervous about what people would say.”

After all, she had gone from Park Slope matron, complete with involved husband (“We had cracked the code of Gen X peer parenthood”) and gut-renovated brownstone, to “a Red Hook divorcée,” she said, remarried with a new baby and two children-of-divorce barely out of preschool. “All of a sudden, this community I’d lived in ...

Comments Off on College makes you believe in marriage!

July 16, 2012

The readers of this weblog are relatively non-fecund, at least going by reader surveys. But I was curious nonetheless about the attitudes toward number of children, and realized goals of number of children, in the General Social Survey. I decided to look at two variables:

CHILDS

CHLDIDEL

The former asks the respondent how many children they had, the latter how many they’d like to have. I restricted the sample to whites ages 45-65 for every survey year. I then combined all the years of a particular decade, so you have 1970s, 1980s, 1990s, and 2000s. For demographics I looked at highest educational attainment, and household income indexed to 1986 real value dollars (so they are comparable across decades).

Two major takeaways:

1) Education matters more than income in terms of number of children. Having lots of education tends to reduce family size. No great surprise.

2) Ideal number of children increased in the 2000s, but the decline in average number of children continued.

There is often talk in the literature on the disjunction between ideal family size in Third World nations and the realized family size, with a larger number of children than women may want. What is less discussed is the inverse discussion. It seems that ...

Comments Off on People wanted more children in 2000s, but had fewer

July 13, 2012

Scientists should be allowed to do research that causes pain and injury to animals like dogs and chimpanzees if it produces new information about human health problems. (Do you strongly agree, agree, disagree, or strongly disagree?)

(variable ANSCITST)

I was curious because I ran into some stuff on pro-life sites today about how animal rights activists don’t oppose abortion, and how hypocritical that is. So naturally I was curious about how attitudes varied on that issue.

What the results above show is that there is almost no difference in attitudes toward animal research when you vary attitudes toward abortion on demand. In other words, 22 percent of pro-choice people oppose such research strongly, while 23 percent of pro-life people do. How does this vary by demographic?

June 24, 2012

Prompted by a comment below I was curious as to the correlation between intelligence and income. To indicate intelligence I used the GSS’s WORDSUM variable, which has a ~0.70 correlation with IQ. For income, I used REALINC, which is indexed to 1986 values (so it is inflation adjusted) and aggregates the household income. Finally, I limited my sample to non-Hispanic whites over the age of 30 (for what it’s worth, this choice also limited the data set to respondents from the year 2000 and later).

The results don’t get at the commenter’s assertions, because 10 out of 10 on WORDSUM does not imply that you’re that smart really. But the trendline is suggestive. Note that I aggregated 0-4 because the sample size at the lower values is small indeed.

Comments Off on Higher vocabulary ~ higher income

June 23, 2012

In the further interests of putting quantitative data out there instead of vague impressions, I noticed two GSS variables which might be of interest. One queries the impression of effect on the environment of genetically modified crops. The second asks about whether science does more harm than good. The latter question exhibited almost no year to year variation of note, so I just threw them in a pot together. But for the environment and genetically modified crop question I show responses for the years 2000 and 2010. As you can see there is a modest difference in regards to the first where liberals are more skeptical.