Please note that due to changes in the chip design used by 23andMe for their DNA tests, I can no longer accept new data submissions for the X-chromosome Project, either from 23andMe or from any other lab.

The project was designed for the old v2 chip formerly used by 23andMe, and due to the inclusion of many more SNPs on the later chip versions, these newer chips do not produce data output in a format which can be easily matched up with the old chip versions' data.

I will maintain the old data on the following page, for any researchers who are interested in viewing it:

I will be submitting my 23andme data in due course, and my decodeme data if I can figure out a way to do it.

What I also have is the full analysis of the HGDP-CEPH panel of 52 populations using the program PLINK. It consists of over 25,000 lines of data and allows folks to see the matches within this dataset of over 1000 individuals, at about 5 blocks each (huge variation here), when the bar is set at 1 Mb and 100 SNP exact match. Already this data has taught me that there are specific locations where blocks join that are found in some populations but not others. I have yet to come up with a molecular biological explanation of these observations, but the data is very clear (I will add a posting on the subject later). This data was analyzed by Anders Palsen and will be submitted with his permission. The question is, how do I go about doing it? The data is presently in a Zip format. Can this be uploaded to WF? (It is too large to upload to my personal website and link from there.) Thanks.

David,

The PLINK results you are referring to seem like they could be a fantastic resource. I am still trying to figure out myself what we can do with data uploads on the project website. There is the ability to attach data in relatively small files (under 1024 KB) to any given forum message, but what you are talking about sounds far larger.

I currently have the ability to post two large spreadsheets on the project website (just how large I don't know, but we can experiment with that), and this capacity might be expanded in the near future (I have to talk to Terry Barton about that). One of the available slots is the spreadsheet that I am currently testing here (the one for which I've asked for feedback), but if people don't find that spreadsheet useful, then we could certainly replace it with something else.

The other slot is currently available for whatever spreadsheet people here feel would be the most useful (e.g., your PLINK data). I think I'd have to know what the file size is, to see if I can get it to work in the frame that's available. I'd also need to know what the ZIP file unzips into (e.g., an Excel file, PDF, etc.).

You're welcome to e-mail it to me if you want me to take a look at it, or else you can just describe it to me further.

...What I also have is the full analysis of the HGDP-CEPH panel of 52 populations using the program PLINK. It consists of over 25,000 lines of data and allows folks to see the matches within this dataset of over 1000 individuals and about 5 blocks each (huge variation here) when the bar is set at 1 Mb and 100 SNP exact match. ...

Couple things here: Can PLINK parameters be set to tighter limits, say 0.5 Mb or 50 SNPs? Is there a reason for the broader limits? Can't haploblocks come in smaller sizes? And how about a PLINK tutorial?

I will send it to you by e-mail, GhostX, assuming that I can find your address, since I don't think a PM would work. It unzips into an Excel file. My data is included in the mix, as if I were a member of "the panel". I have 6 matches (about average; my Xibo match has 8).

Alas, tomcat, I am not the PLINK expert. I haven't even read the literature on this program, let alone experimented with it and burned up days of computer time on my laptop doing the analyses. Only Anders has done this. I hope that he will join our group.

There are many different programs, and each does something a little different, but after trying most of them Anders seems to have found that PLINK best meets our objectives. All of these programs are available online for those willing to download them and experiment a bit. I am not quite ready for this. At the moment my focus is on collecting references and outlining specifics about the X as background to understanding the output. Perhaps someone with a solid math and stats background would be willing to get into the act here.

Couple things here: Can PLINK parameters be set to tighter limits, say 0.5 Mb or 50 SNPs? Is there a reason for the broader limits? Can't haploblocks come in smaller sizes? And how about a PLINK tutorial?

Yes, you can, but the output file will be several hundred megabytes in size and almost unmanageable. In addition, you must be able to manage an enormous amount of usable and unusable block information, mostly the latter.
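To make the trade-off concrete, here is a minimal sketch of how a length and SNP-count bar thins out a list of matching blocks. It does not use PLINK or reproduce Anders's analysis: the `Block` record, the `filter_blocks` helper, and the three sample blocks are all made up for illustration, and it assumes a block must clear both the length bar and the SNP bar to be kept.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical record for one matching X-chromosome block between two people."""
    person_a: str
    person_b: str
    start_bp: int
    end_bp: int
    n_snps: int

    @property
    def length_mb(self) -> float:
        # Block length in megabases.
        return (self.end_bp - self.start_bp) / 1_000_000


def filter_blocks(blocks, min_mb=1.0, min_snps=100):
    """Keep only blocks that meet BOTH the length bar and the SNP-count bar."""
    return [b for b in blocks if b.length_mb >= min_mb and b.n_snps >= min_snps]


# Made-up blocks, not real HGDP-CEPH results.
blocks = [
    Block("A", "B", 10_000_000, 11_500_000, 180),  # 1.5 Mb, 180 SNPs
    Block("A", "C", 20_000_000, 20_600_000, 70),   # 0.6 Mb, 70 SNPs
    Block("B", "C", 30_000_000, 31_200_000, 60),   # 1.2 Mb, 60 SNPs
]

strict = filter_blocks(blocks)                            # 1 Mb and 100 SNPs
loose = filter_blocks(blocks, min_mb=0.5, min_snps=50)    # 0.5 Mb and 50 SNPs

print(len(strict), "block(s) at the strict bar;", len(loose), "at the looser bar")
```

Even in this toy list, lowering the bar lets in every block, which is the same effect that makes the real output balloon.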

I am a bit unclear here, GhostX (not an uncommon occurrence). I submitted the complete 23andme dataset to Ben and he trimmed it and included the X part. I have no concerns about privacy issues surrounding SNPs embedded in genes; my DNA is an open book. Can you obtain the data directly from Ben's site (it would seem to be the simplest approach), since I have given here and now my permission to include my data in any analysis you or others here care to perform?

Secondly, it would appear that the vast majority of people here have either tested with 23andme or with both decodeme and 23andme. Hence there does not seem to be a good reason to send the decodeme file - although you are welcome to it if it helps in any way.
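For anyone who wants to do the same trimming themselves, the following is a rough sketch of pulling the X rows out of a 23andMe raw data file. It assumes the usual layout of that download (comment lines starting with `#`, then tab-separated rsid, chromosome, position, genotype); it is not Ben's actual procedure, and the `extract_x` helper and the sample rows (including the rsids) are invented for illustration.

```python
import io


def extract_x(lines):
    """Return (rsid, position, genotype) for every X-chromosome row.

    Assumes tab-separated columns: rsid, chromosome, position, genotype,
    with comment/header lines starting with '#'.
    """
    rows = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        rsid, chrom, pos, genotype = line.rstrip("\n").split("\t")[:4]
        if chrom == "X":
            rows.append((rsid, int(pos), genotype))
    return rows


# Made-up sample standing in for a real raw data file.
sample = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAA\n"
    "rs0000002\tX\t2000\tG\n"
    "rs0000003\tX\t1500\tCT\n"
)

# Sort by position so the SNPs can be read off in chromosome order.
x_rows = sorted(extract_x(sample), key=lambda r: r[1])
for rsid, pos, genotype in x_rows:
    print(rsid, pos, genotype)
```

With a real file, replace the `sample` object with `open("your_raw_data.txt")` (the filename is a placeholder).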

David,

Yes, in your case I can get your data for the various haploblocks from Ben's spreadsheet, and I'll be happy to do that as soon as I'm done extracting the remaining haploblocks from the dna-forums discussions (just so I can do it all at once rather than having to keep going back to your data with each block that I post). Feel free to remind me in a couple of days if I forget.

I didn't want to offer to automatically do that with everyone though, for various reasons (partly because it's just too much work for me to go back and do that for everybody just yet). Once I'm done getting the project website all set up, then maybe I'll go back and try to reassign data from people who have de-anonymized themselves (if I can keep everybody straight--it's getting confusing with people listing their names in different ways, and with different instances of the same name--sometimes it's different family members, and sometimes it's just a different chromosome for the same person!). Incidentally, people who are only listed by first name at this point (or by a common surname) are at risk of getting lost in the shuffle with all the new names that I keep adding to the results sheet, so people can let me know if they want to be listed in a more specific fashion. In a couple of instances where two or more different family members are listed, I've had to guess which result went with which family member.

In the meantime, if anybody sends me SNP sequence(s) via PM or e-mail and tells me how they want to be listed (by name or otherwise), I'll add it to the results chart immediately, or move it from your anonymous listing to a listing by specific name.

Regarding your second question: No, I don't need anybody to send me their DeCODEme file. What I meant in my earlier post is that if somebody wants to write out the procedure for extracting data from the DeCODEme raw data, then I'll post a link to that writeup (or just paste the procedure in my original post), so that other DeCODEme customers will know how to extract theirs.

Thanks for asking for the clarification--I probably could have been more clear in my original message.