Why does 23andMe show that I share an unusually high amount of DNA (50%) with my full-sibling?

Alternate title/misleading answer: 23andMe counts FIR twice

Scientists who aren’t familiar with genetic genealogy will be very confused by the above question. After all, 50% is the expected amount, or average, that two siblings share.

AncestryDNA doesn’t report the amount of fully-identical regions (FIR) that two people share with each other. To avoid confusion, I’ll note that they do count and use FIR in identifying and labeling full-siblings. And here’s one thing that some people won’t believe when you say it: AncestryDNA counts FIR as half-identical regions (HIR). They’ll riposte that AncestryDNA ignores FIR, but that isn’t true. When you count the number of centiMorgans (cM) from HIR and FIR, but you count FIR as if it’s HIR, a population of full-siblings will share 37.5% DNA, on average. This has led a large percentage of genetic testing consumers to believe that 37.5% is the average amount of shared DNA for full-siblings.

Here’s the part that people get right, kind of. When asked why 23andMe reports an average of 50% between siblings, they answer that 23andMe counts FIR twice. While not technically wrong, I find this answer to be misleading. Why? Well, FIR is a match on both copies of a chromosome, the maternal one and the paternal one, so it’s a double match, i.e. it should be counted twice. In fact, in order for a population of full-siblings to have 50% identical by descent (IBD) sharing, on average, which is what geneticists know to be a fact, FIR have to be counted twice. When I see someone answer the question with “23andMe counts FIR twice,” I see a lot of other people say that it’s a misleading way to do it and that it over-counts or double-counts. But it isn’t misleading; it’s correct. The best answer to the original question is that “AncestryDNA counts FIR as if it’s HIR.” Not that AncestryDNA ignores FIR, because that wouldn’t be a true statement.
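The arithmetic behind those averages can be sketched in a few lines of Java. This is only an illustration: it assumes the standard expectation that full-siblings are FIR on 25% of the genome, HIR only on 50%, and unshared on the remaining 25%. The class and method names are made up for this sketch.

```java
// Expected full-sibling sharing under three ways of counting FIR.
// Assumption: 25% of the genome is FIR, 50% is HIR only, and 25% is
// not shared between full siblings (the standard expectation).
final class SiblingSharingArithmetic {
    static final double FIR = 0.25;      // both copies match
    static final double HIR_ONLY = 0.50; // exactly one copy matches

    // HIR is a match on one of two copies (weight 1/2);
    // FIR is a match on both copies (weight 1), i.e. "counted twice".
    static double firCountedTwice() {
        return 0.5 * HIR_ONLY + 1.0 * FIR;
    }

    // FIR treated as if it were HIR: both get weight 1/2.
    static double firCountedAsHir() {
        return 0.5 * (HIR_ONLY + FIR);
    }

    // FIR left out entirely: only HIR-only regions contribute.
    static double firIgnored() {
        return 0.5 * HIR_ONLY;
    }

    public static void main(String[] args) {
        System.out.println("FIR counted twice:  " + firCountedTwice());
        System.out.println("FIR counted as HIR: " + firCountedAsHir());
        System.out.println("FIR ignored:        " + firIgnored());
    }
}
```

Under those assumed proportions the three conventions give 50%, 37.5% and 25%, respectively.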

If for some reason you don’t believe me, consider this Java code I wrote and use daily. The following code counts HIR between two individuals. I’ve never run this code on full-siblings and gotten an average of anything close to 50%. It’s always 37.5%. Please note that this code only works on genomes in which the only shared base-pairs are indeed IBD. This can be done in simulated data when the farthest back shared ancestors have uniquely labeled DNA (distinct from each other). Unlike real humans, the people in these simulated genomes don’t share over 99% of their DNA by default, which makes it very easy to know the exact amount of IBD sharing between two individuals.
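A minimal sketch of that kind of counter might look like the following. It is an illustration under assumptions, not the original listing: the class name, the toy recombination model (22 chromosomes, 500 sites each, a fixed per-site crossover chance), and the choice to divide the count by twice the genome length are all made up for this sketch. Founder labels 0 and 1 stand for the father's two copies and 2 and 3 for the mother's, so any match is IBD by construction.

```java
import java.util.Random;

// Illustrative only: a toy simulation, not the original listing.
final class HirCountSketch {
    static final int CHROMOSOMES = 22;       // assumed autosome count
    static final int POSITIONS = 500;        // assumed sites per chromosome
    static final double SWITCH_PROB = 0.004; // assumed crossover chance per site

    // The two haploid copies a child inherits, stored as founder labels.
    static final class Person {
        final int[][] paternal;
        final int[][] maternal;

        Person(int[][] paternal, int[][] maternal) {
            this.paternal = paternal;
            this.maternal = maternal;
        }
    }

    // Builds one inherited copy as a mosaic of a parent's two labeled copies.
    static int[][] gamete(int labelA, int labelB, Random rng) {
        int[][] copy = new int[CHROMOSOMES][POSITIONS];
        for (int c = 0; c < CHROMOSOMES; c++) {
            boolean useA = rng.nextBoolean();
            for (int i = 0; i < POSITIONS; i++) {
                if (rng.nextDouble() < SWITCH_PROB) {
                    useA = !useA; // crossover: switch to the other copy
                }
                copy[c][i] = useA ? labelA : labelB;
            }
        }
        return copy;
    }

    // Father's copies are labeled 0 and 1, mother's 2 and 3.
    static Person child(Random rng) {
        return new Person(gamete(0, 1, rng), gamete(2, 3, rng));
    }

    // Adds 1 for every site where the paternal OR the maternal copies match.
    // A site that matches on both sides (FIR) still adds only 1.
    static long countHir(Person a, Person b) {
        long count = 0;
        for (int c = 0; c < CHROMOSOMES; c++) {
            for (int i = 0; i < POSITIONS; i++) {
                if (a.paternal[c][i] == b.paternal[c][i]
                        || a.maternal[c][i] == b.maternal[c][i]) {
                    count++;
                }
            }
        }
        return count;
    }

    // Assumed denominator: both copies of every site, i.e. 2 * genome length.
    static double sharedFraction(Person a, Person b) {
        return countHir(a, b) / (2.0 * CHROMOSOMES * POSITIONS);
    }

    static double averageSiblingSharing(int pairs, long seed) {
        Random rng = new Random(seed);
        double total = 0;
        for (int p = 0; p < pairs; p++) {
            total += sharedFraction(child(rng), child(rng));
        }
        return total / pairs;
    }

    public static void main(String[] args) {
        System.out.printf("Average sharing: %.4f%n", averageSiblingSharing(1_000, 42L));
    }
}
```

In this toy model each side matches at a given site with probability 1/2, so about 75% of sites match on at least one side and the average should land near 37.5%, not 50%.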

You might be able to tell that there’s no part of the above code that checks if two individuals share FIR, and therefore there’s no way for the code to completely exclude FIR sharing. It simply checks every base-pair of the genome and adds a 1 to the count if people match on either their maternal or paternal chromosomes. Please note that this code won’t work for relationships in which person 1, via their paternal side, matches the maternal side of person 2, or vice versa.

Conversely, here’s an algorithm that I never thought of writing until I saw a social media admin “correct” someone for saying that AncestryDNA counts FIR as HIR. They said that it doesn’t count FIR at all; it completely ignores it for reporting purposes. But it turns out that if AncestryDNA completely ignored FIR for reporting purposes, full-siblings would only share 25% by that metric. I wrote the following code solely for the purpose of testing this and I’ve run it only once. This code isn’t otherwise very useful because ignoring FIR wouldn’t usually help us with our research.
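The difference between the two counting rules can be shown on a toy example. This is a hedged sketch rather than the original listing: the labels below are hand-written, not simulated, and are chosen so that 2 of 8 sites are FIR, 4 are HIR only, and 2 are unshared, mirroring the expected full-sibling proportions. All names are hypothetical.

```java
// Illustrative only: hand-written founder labels, not simulated or real genomes.
final class IgnoreFirSketch {
    // Sibling 1 and sibling 2, paternal and maternal copies, 8 sites each.
    static final int[] SIB1_PAT = {0, 0, 0, 0, 0, 0, 0, 0};
    static final int[] SIB1_MAT = {2, 2, 2, 2, 2, 2, 2, 2};
    static final int[] SIB2_PAT = {0, 0, 0, 0, 1, 1, 1, 1};
    static final int[] SIB2_MAT = {2, 2, 3, 3, 2, 2, 3, 3};

    // Adds 1 only where exactly one side matches, so FIR sites are skipped.
    static int countHirIgnoringFir(int[] pat1, int[] mat1, int[] pat2, int[] mat2) {
        int count = 0;
        for (int i = 0; i < pat1.length; i++) {
            boolean paternalMatch = pat1[i] == pat2[i];
            boolean maternalMatch = mat1[i] == mat2[i];
            if (paternalMatch ^ maternalMatch) {
                count++;
            }
        }
        return count;
    }

    // For comparison: adds 1 where either side matches, so FIR counts as HIR.
    static int countFirAsHir(int[] pat1, int[] mat1, int[] pat2, int[] mat2) {
        int count = 0;
        for (int i = 0; i < pat1.length; i++) {
            if (pat1[i] == pat2[i] || mat1[i] == mat2[i]) {
                count++;
            }
        }
        return count;
    }

    // Assumed denominator: both copies of every site.
    static double fraction(int count, int sites) {
        return count / (2.0 * sites);
    }

    public static void main(String[] args) {
        int sites = SIB1_PAT.length;
        int ignoring = countHirIgnoringFir(SIB1_PAT, SIB1_MAT, SIB2_PAT, SIB2_MAT);
        int asHir = countFirAsHir(SIB1_PAT, SIB1_MAT, SIB2_PAT, SIB2_MAT);
        System.out.println("FIR ignored:        " + fraction(ignoring, sites)); // 0.25
        System.out.println("FIR counted as HIR: " + fraction(asHir, sites));    // 0.375
    }
}
```

On this toy data, skipping FIR sites gives 4 of 16 copies (25%), while counting them as HIR gives 6 of 16 (37.5%).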

The code doesn’t produce the average that AncestryDNA actually reports. It resulted in 25% shared DNA between full-siblings when ignoring FIR, not 37.5%, so this isn’t the way AncestryDNA counts HIR sharing. Using this method would also be confusing, as there would be no difference, on average, in reported DNA sharing between half-siblings and full-siblings. I haven’t used the code for anything since. But it showed me that AncestryDNA doesn’t completely ignore FIR in its cM or percentage reporting; otherwise it would report 25% shared DNA, on average, between full-siblings.

I hope you’ve found this information helpful. And I really hope that I start hearing people answer the above question with “AncestryDNA counts FIR as if it’s HIR” and not the other answer, which insinuates that 23andMe reports misleading information by double-counting shared DNA.

Cover photo by Sharon McCutcheon. Feel free to ask a question or leave a comment. And make sure to check out these ranges of shared DNA percentages or shared centiMorgans, which are the only published values that match peer-reviewed standard deviations. Or, try a calculator that lets you find the amount of an ancestor’s DNA you have when combining multiple kits. I also have some older articles that are only on Medium.
