Sunday, May 16, 2010

What's Hard to Call?

It looks like heterozygous base pairs are harder for 23andMe (or its Illumina chip) to call than homozygous base pairs.

I tested with 23andMe twice. Of the autosomal SNPs that were no-calls on one test but were genotyped on the other, about half were heterozygous and about half were homozygous.

By contrast, about 68% of my autosomal SNPs overall were homozygous, and only about 32% were heterozygous.

So my heterozygous SNPs were disproportionately represented among the no-calls.
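Here is a minimal Python sketch of that comparison. It assumes each test has already been loaded into a dict mapping an SNP id to a two-letter genotype string, restricted to the autosomes, with "--" marking a no-call. The function names and the toy data are hypothetical illustrations, not real results.

```python
NO_CALL = "--"  # assumed no-call marker


def is_heterozygous(genotype: str) -> bool:
    """A two-letter genotype is heterozygous when its two alleles differ."""
    return len(genotype) == 2 and genotype[0] != genotype[1]


def resolved_no_call_zygosity(test1: dict, test2: dict) -> tuple:
    """Return (heterozygous, homozygous) counts for SNPs that were a
    no-call on exactly one test and genotyped on the other."""
    het = hom = 0
    for snp in test1.keys() & test2.keys():
        a, b = test1[snp], test2[snp]
        if (a == NO_CALL) == (b == NO_CALL):
            continue  # called both times, or a no-call both times
        call = b if a == NO_CALL else a  # the one genotype that was reported
        if is_heterozygous(call):
            het += 1
        else:
            hom += 1
    return het, hom


# Hypothetical toy data, not real results.
t1 = {"rs1": "AG", "rs2": "--", "rs3": "CC", "rs4": "--"}
t2 = {"rs1": "AG", "rs2": "CT", "rs3": "--", "rs4": "--"}
print(resolved_no_call_zygosity(t1, t2))  # (1, 1)
```

On real data, the heterozygous share of these resolved no-calls would then be set against the heterozygous share of all calls.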

Call Me Sometime

When you test with 23andMe, there are always some no-calls: locations at which the genotyping chip wasn't able to get a good reading, so no result is reported.

I tested twice with 23andMe. The first time I tested, there were 2607 no-calls, and the second time there were 3260 no-calls.

But these generally weren't the same SNPs. Most of the time, a no-call on one test was resolved by the other test. Only 459 SNPs were no-calls both times.

Merging the results from the two tests actually gives 544 no-calls: the 459 SNPs that were no-calls both times, plus the 85 SNPs that the two tests called differently, which I also have to count as no-calls.
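The merge rule can be sketched in Python. This assumes a hypothetical layout (a dict of SNP id to two-letter genotype, with "--" for a no-call); `merge_calls` and `merge_tests` are illustrative names. A SNP ends up as a no-call in the merged data if it was a no-call on both tests or if the two tests disagree, which is how 459 + 85 = 544 comes about.

```python
NO_CALL = "--"  # assumed no-call marker


def merge_calls(a: str, b: str) -> str:
    """Combine one SNP's genotype from two tests."""
    if a == NO_CALL:
        return b  # take the other test's result (may also be a no-call)
    if b == NO_CALL:
        return a
    if sorted(a) == sorted(b):
        return a  # both tests agree (allele order ignored)
    return NO_CALL  # the tests disagree, so neither call is trusted


def merge_tests(test1: dict, test2: dict) -> dict:
    """Merge two tests SNP by SNP over the SNPs both files report."""
    return {snp: merge_calls(test1[snp], test2[snp])
            for snp in test1.keys() & test2.keys()}


# Hypothetical toy data, not real results.
t1 = {"rs1": "AG", "rs2": "--", "rs3": "--", "rs4": "CC", "rs5": "TT"}
t2 = {"rs1": "GA", "rs2": "CT", "rs3": "--", "rs4": "CT", "rs5": "TT"}
merged = merge_tests(t1, t2)
print(sum(g == NO_CALL for g in merged.values()))  # 2 (rs3 and rs4)
```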

There are 578,320 SNPs that 23andMe reports on. So merging the two tests brings the no-call rate down from 0.45% and 0.56% for the individual tests all the way to 0.094% for the combined data.
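The percentages follow directly from the counts. A quick Python check, rounding to two significant figures as the text does (the helper name is just illustrative):

```python
TOTAL_SNPS = 578_320  # SNPs that 23andMe reports on


def no_call_rate_percent(no_calls: int) -> float:
    """No-calls as a percentage of all reported SNPs."""
    return 100 * no_calls / TOTAL_SNPS


for label, n in [("first test", 2607), ("second test", 3260), ("merged", 544)]:
    # Two significant figures, matching the rounding in the text.
    print(f"{label}: {no_call_rate_percent(n):.2g}%")
# first test: 0.45%
# second test: 0.56%
# merged: 0.094%
```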

By the way, it's possible that some of the 459 repeated no-calls don't represent inadequacies in the test at all but instead indicate microdeletions—short fragments of DNA that most people have but that are missing in my genome. (For the autosomes, this would require my having inherited a microdeletion from both parents, which seems unlikely.)

Saturday, May 15, 2010

My Spitting Image: 23andMe error rate

The company 23andMe offers a DNA testing service. Send them a sample of your spit, and they'll test your DNA at nearly 600,000 nucleotide locations.

These locations are among the single-nucleotide polymorphisms, or SNPs: spots in human DNA that are known to differ frequently among individuals, unlike the vast majority of our genetic code, which is identical in all humans.

I've been curious what the 23andMe error rate is. So I took advantage of their DNA Day offer to test a second time.

The new results from 23andMe just came in (amazingly fast -- only 10 days after their receipt of my spit kit, even though they said it would take 6-8 weeks).

Comparing the raw data, I found 85 SNPs that 23andMe called differently in the two tests. Of these 85 SNPs, 73 were called homozygous one time and heterozygous the other time. The remaining 12 were called homozygous both times, but for opposite alleles. (None of the differences were at the unpaired locations on the X, Y, or mitochondrial DNA.)
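Sorting the differences into those two groups amounts to a small classification rule. Here is a hypothetical sketch in Python, assuming two-letter genotype strings with no-calls already removed; the function name and the toy pairs are illustrative, not real results.

```python
from collections import Counter


def is_heterozygous(genotype: str) -> bool:
    return genotype[0] != genotype[1]


def classify_discordance(a: str, b: str) -> str:
    """Describe how two differing calls for the same SNP differ.
    Both calls are assumed to be real genotypes (no-calls excluded)."""
    if sorted(a) == sorted(b):
        raise ValueError("the calls agree")
    if is_heterozygous(a) != is_heterozygous(b):
        return "homozygous vs heterozygous"
    if not is_heterozygous(a):
        return "opposite homozygous"
    return "two different heterozygous"  # not possible for a biallelic SNP


# Hypothetical discordant pairs, not real results.
pairs = [("AA", "AG"), ("CT", "TT"), ("GG", "AA")]
print(Counter(classify_discordance(a, b) for a, b in pairs))
```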

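Counting one wrong call per disagreement, that is 85 errors among roughly 1.15 million calls (the 572,912 SNPs that were called on both tests, times two), which suggests a 23andMe error rate of about 0.0074% per call.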
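One way to reproduce the 0.0074% figure, as a sketch: assume each disagreement means exactly one of the two calls was wrong, and count only the SNPs that were called on both tests (no-calls excluded). The constant and function names are illustrative; the numbers are the counts reported in these posts.

```python
TOTAL_SNPS = 578_320
NO_CALLS_TEST1 = 2607
NO_CALLS_TEST2 = 3260
NO_CALLS_BOTH = 459
DISCORDANT = 85


def per_call_error_rate() -> float:
    # SNPs with a no-call on at least one test (inclusion-exclusion): 5,408.
    missing_either = NO_CALLS_TEST1 + NO_CALLS_TEST2 - NO_CALLS_BOTH
    called_both = TOTAL_SNPS - missing_either  # 572,912
    # Assume each disagreement means exactly one of the two calls is wrong.
    return DISCORDANT / (2 * called_both)


print(f"{100 * per_call_error_rate():.2g}%")  # 0.0074%
```

Dividing by SNPs rather than by calls (85 / 572,912) would instead give the per-SNP disagreement rate, about 0.015%.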

The actual error rate is presumably somewhat higher than this, since some of the SNPs that were called on one test but were no-calls on the other may also be wrong, and I have no way of identifying those. There may also be a few systematic errors: SNPs that are miscalled nearly every time, which would produce the same wrong call on both tests and so never show up as a disagreement.

I also have Family Finder results from Family Tree DNA; they have some SNPs in common with 23andMe. The two companies use different chips: 23andMe uses an Illumina chip, and FTDNA went with Affymetrix.

On the SNPs that 23andMe disagreed on and that Family Finder also covered, FTDNA always agreed with one or the other of the 23andMe results. (It was conceivable that all three results would sometimes be different, with one heterozygous genotype and two opposite homozygous genotypes, but that didn't happen at all in my data.)
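The three-way check can be sketched the same way. This assumes all three genotypes are reported on the same DNA strand; in practice, Illumina and Affymetrix data may need strand alignment before a comparison like this is meaningful. `third_opinion` is a hypothetical helper, and the example genotypes are made up.

```python
def normalize(genotype: str) -> str:
    """Ignore allele order, so "GA" and "AG" compare equal."""
    return "".join(sorted(genotype))


def third_opinion(first: str, second: str, ftdna: str) -> str:
    """Say which of two disagreeing 23andMe calls a third call matches.
    Assumes all three genotypes are reported on the same strand."""
    f = normalize(ftdna)
    if f == normalize(first):
        return "agrees with first test"
    if f == normalize(second):
        return "agrees with second test"
    return "disagrees with both"


print(third_opinion("AA", "AG", "GA"))  # agrees with second test
print(third_opinion("AA", "GG", "AG"))  # disagrees with both
```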

All the above data excludes no-calls. I'll put something up on those in a separate post. By the way, in case anyone who uses 23andMe is wondering, my alter ego shows up as my identical twin in Relative Finder!