A DNA tester has a few different options to see the possible relationships for their matches, including at the testing sites themselves. A new one has functionality like no other.
I used to not be a fan of relationship prediction for genetic genealogy, wondering why people didn’t just check what ranges of shared DNA were possible. It’s good to admit when you’re wrong, which I’m doing now. When you get a DNA match, checking the shared centiMorgans (cMs) or percentage in a relationship predictor is the fastest way to see how you might be related to that person. Doing so isn’t going to do any harm and it will usually save you a whole lot of time.
Since April of 2020 there has been a very accurate relationship predictor available to you for free. It’s the first and only tool to show you the differences between in-group relationship types, i.e. the differences between, say, a grandparent/grandchild relationship and a half-sibling relationship, which are sometimes quite large. These groups of relationship types have the same average, but very different ranges. This tool is also the only relationship predictor to show you the differences between paternal and maternal matches for 1st cousins and closer.
Update 3 Feb. 2023: A new relationship predictor allows you to enter the # of segments along with total cMs for far better predictions!
- Traditional relationship predictions
- A double cousin relationship predictor (added 9 Mar. 2022)
- Relationship predictions to help validate known relatives (no population weights)
- Relationship predictions for X-DNA matches (ignoring atDNA, added 29 June 2022)
Here’s why I was wrong about relationship predictions. Not only is it the quickest way to get a full list of possible relationships, but it also orders them from top to bottom by the highest probability. Maybe when relationship prediction was new there were people who scoffed at the idea of separating all of the relationship types and listing their individual probabilities rather than just listing each one that’s possible. If so, those would be the same people who would today scoff at the idea of separating in-group relationship types and even paternal/maternal relationships. But, again, listing those separately and showing the individual probabilities costs you no time (it cost me thousands of hours, but that’s OK). Rather, it saves you from doing time-consuming genealogical and match clustering work on less likely relationships before checking the more likely ones.
I hope that you recognize, as I now do, how incredibly valuable it is to have accurate relationship prediction. We really can’t do genetic genealogy without first knowing what relationship types to expect for a given cM or percentage value, and there’s no better or faster way to do that than with a relationship predictor. So which one should you use? I’m going to provide some examples that show the importance of the new features.
The most striking difference between in-group relationship types is between grandparent/grandchild and the other relationships that share 25%, on average. Those include aunts/uncles/nieces/nephews and half-siblings. Here are some examples I’ve seen of people who had questions about the amount of cM shared with a grandparent or grandchild.
This first one was something I had been hoping to see. A person already knew that two matches were paternal grandparent and grandchild to each other and the amount shared was 1,417 cM at Ancestry. This value is fairly well below average. Farther away from the averages is where the big differences show up.
Confirming what this person already knew, and regardless of the question being asked, I saw that not only was grandparent/grandchild the most likely relationship type, but there was a clearly higher probability that this was a paternal match! The paternal grandparent/grandchild relationship type has the largest in-group differences of any relationship type. Not surprisingly, the most accurate relationship predictor got it right. Imagine if a person didn’t know the relationship, which is usually the case, and they had gotten started with investigating a half-sibling, avuncular, or maternal relationship first. That would be a big waste of time.
These results aren’t cherry-picked and it isn’t an accident that the paternal values are all higher in this case. True, I don’t often see cM values for when people already know the relationship type. And that’s why I saved it. More often people don’t know their match and, when it shows that it’s a very high likelihood of being a grandparent or grandchild, I try to explain that we now have this capability. Since the relationship predictor’s input data are very accurate, the above cM value will be grandparent/grandchild 16.4% of the time. That’s not even close to a majority of the time—a consequence of splitting up in-group types as well as paternal and maternal sides—but it’s higher than any other individual relationship type by 1.9 percentage points.
In another case, a woman had an unknown match at 23andMe who was 50 years her senior. They shared 2,306 cM, including one full X Chromosome copy. A full X Chromosome copy will always be shared with a paternal grandmother, but that will only happen about 7% of the time with a maternal grandparent. The age and X-DNA suggest that the person may be a paternal grandmother. The two cM input boxes on the left are designed for autosomal DNA (atDNA) only: subtracting 180 cM of X-DNA led to 2,126 cM of shared atDNA.
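The subtraction is easy to do by hand, but here is a minimal Python sketch of it. The function name `autosomal_cm` is made up for illustration and isn’t part of any relationship predictor; the 2,306 cM and 180 cM figures are the ones from this example.

```python
def autosomal_cm(total_cm, x_cm):
    """Return shared autosomal cM by removing the X-DNA portion of a total.

    Hypothetical helper for illustration only: it assumes the reported
    total already includes the X-DNA segment(s) being subtracted.
    """
    if total_cm < 0 or x_cm < 0:
        raise ValueError("cM values cannot be negative")
    if x_cm > total_cm:
        raise ValueError("X-DNA cM cannot exceed the total shared cM")
    return total_cm - x_cm


# Values from the 23andMe example: 2,306 cM total, 180 cM of it on the X.
print(autosomal_cm(2306, 180))  # 2126
```

The result is the number that goes into an autosomal-only cM input box.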
In this case paternal grandparent/grandchild was the most likely relationship, 3.3 percentage points higher than for a maternal grandparent. I don’t want to minimize the fact that this should be confirmed by looking at shared matches. If clustering reveals that two grandparent lines are shared, then the next thing this woman should consider is a paternal half-sister or an aunt or uncle. But plugging this value into an accurate relationship predictor takes only a few seconds and it fulfills the first step of finding the possible relationship types.
In the example directly above, the grandparent and grandchild shared a below-average amount of DNA. In cases when the amount shared is well above average, the predictions will be even stronger. Here’s a case that I made up to show that. It would be great to compile a dataset of matches sharing this much cM.
Other Close Relationship Examples
Another person posted a match between two women of similar ages. An image from the chromosome browser showed that they shared one full X Chromosome copy. This is always the case with paternal half-sisters and only occurs about 1% of the time for maternal half-sisters. They are likely paternal half-siblings. How does the relationship predictor perform? Subtracting the cM from X-DNA, resulting in 1,456 cM, and plugging it into the relationship predictor showed the following:
As I mentioned before, these results aren’t cherry-picked. The most likely relationship here is grandparent/grandchild. But these women are almost definitely paternal half-sisters. It isn’t possible for one to be the other’s grandmother. While it’s possible that one could be the other’s paternal aunt, sharing one full X Chromosome copy in that case would only occur about 1% of the time, making that much less likely than half-sisters. The relationship predictor gave paternal half-siblings the next highest likelihood after grandparent/grandchild. And this will often happen, as grandparent/grandchild should occur less than a third of the time, despite having the highest individual probability.
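That reasoning amounts to reweighting the candidate relationships by how often each one would produce the extra evidence. Here is a minimal Python sketch of the idea. The X-sharing rates (always for paternal half-sisters, about 1% for maternal half-sisters and for a paternal aunt) are the ones quoted in this example. The equal starting probabilities are placeholders chosen purely for illustration, not the predictor’s actual output, and the function name is hypothetical.

```python
def reweight(starting_probs, evidence_rates):
    """Scale each relationship's probability by how often it would show the
    observed evidence, then renormalize so the results sum to 1."""
    weighted = {
        rel: prob * evidence_rates.get(rel, 0.0)
        for rel, prob in starting_probs.items()
    }
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("the evidence rules out every listed relationship")
    return {rel: value / total for rel, value in weighted.items()}


# Placeholder starting probabilities (NOT the predictor's real numbers).
starting = {
    "grandmother/granddaughter": 0.25,
    "paternal half-sister": 0.25,
    "maternal half-sister": 0.25,
    "paternal aunt": 0.25,
}

# How often each relationship fits the evidence described in the text.
rates = {
    "grandmother/granddaughter": 0.0,  # ruled out by the similar ages
    "paternal half-sister": 1.0,       # always shares one full X copy
    "maternal half-sister": 0.01,      # about 1% of the time
    "paternal aunt": 0.01,             # about 1% of the time
}

posterior = reweight(starting, rates)
for rel, prob in sorted(posterior.items(), key=lambda item: -item[1]):
    print(f"{rel}: {prob:.1%}")
```

With real output from a predictor in place of the placeholders, the same function would show how much the age and X-DNA evidence shifts the ranking.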
It isn’t too often that people ask about their close matches when they already have a known relationship. It helps when a full X Chromosome is shared, as there are a couple of relationships that are much more likely in that case. It also helps if the ages are the same or quite different, although age can often be misleading in genetic genealogy. A very similar age rules out a grandparent/grandchild relationship and very different ages bolster an already strong prediction for that relationship type in some cases.
I’m hoping to see more people sharing their cM values in the future and I’ll be sure to save them. Also, you could test this out yourself. It’d be great to look at dozens of matches who share between 2,120 and 2,130 cM. You’d see that about 40% of those matches would have a grandparent/grandchild relationship. More realistically, you could test your own known relationships with this tool.
Given the accuracy of the above relationship predictor, it’s pretty surprising to see other ones being used in ways that they weren’t designed for. The probabilities for the tool below come from AncestryDNA simulations. For that reason, it doesn’t work well for 23andMe data. And yet you can frequently see people using it as in the case below.
The above DNA match is not a full-sibling. But “full-sibling” gets the highest probability by far in the output of the relationship predictor shown. The problem is that this relationship predictor wasn’t designed to be used for 23andMe data. For parent/child or full-sibling relationships, it’s very important to look at the label that the testing company gives you. They assign full-sibling labels based on fully-identical regions (FIR). Full-siblings share 25% FIR, on average, while half-siblings share 0%, provided that there’s no additional relationship. If there were enough shared FIR DNA to be full-siblings, 23andMe would’ve labeled it as such.
The above relationship predictor was designed for AncestryDNA only, which reports cM in half-identical regions (HIR), while 23andMe reports it as FIR + HIR. It does have a percentage input box, but that’s only to be used as a last resort. The percentage input box in the above tool assumes that everyone has about 7,440 cM in their genome. That’s because the calculations include two copies of X-DNA for each DNA tester, even though males only have one copy. Even when comparing the DNA of two females, X-DNA should not be included when entering the cM value into a predictor built from autosomal-only simulations. Luckily there are other options. There’s now a relationship predictor that has different input boxes for different companies. Additionally, there’s a percentage box that can be used for any companies not listed, as percentages are universal across sites.
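To see why the assumed genome length matters, here is a rough Python sketch of how a percentage might be turned into cM and back. This is only a guess at the arithmetic, not that tool’s actual code: the 7,440 cM total is the figure mentioned above, and the function names are hypothetical.

```python
# Total the percentage box is said to assume (includes two X-DNA copies).
ASSUMED_TOTAL_CM = 7440


def percent_to_cm(percent, total_cm=ASSUMED_TOTAL_CM):
    """Convert a shared-DNA percentage to cM for a given genome total."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return percent * total_cm / 100


def cm_to_percent(shared_cm, total_cm=ASSUMED_TOTAL_CM):
    """Convert shared cM to a percentage of a given genome total."""
    if shared_cm < 0 or total_cm <= 0:
        raise ValueError("cM values must be positive")
    return 100 * shared_cm / total_cm


# A 25% match under the assumed total.
print(percent_to_cm(25))  # 1860.0
```

Because that total counts two copies of X-DNA, the cM value it implies won’t line up with a predictor built from autosomal-only simulations, which is why a company-specific cM box is the better choice when one is available.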
These are the accurate relationship predictions for 2,341 cM at 23andMe. There is no 82% probability that this match is a full-sibling. In fact, you can even ignore the 0.3% probability of a full-sibling directly above. If the match were a full-sibling, they would’ve been labeled as such. I believe in this case that the match was actually a paternal half-sibling. The person was asking about the relationship prediction that they were a full-sibling, and I can see why they were confused by it. If people use the more accurate relationship predictor, then there will be far less confusion. More than half a year after more accurate relationship prediction became available to us, it’s just silly to use AncestryDNA simulations to try to tell the difference between full-siblings and half-siblings at 23andMe.
Here’s one more example. Data from AncestryDNA simulations can’t be used to predict full-siblings at 23andMe. That’s because, as mentioned above, 23andMe includes the total amount of IBD sharing while AncestryDNA only reports half-identical regions. Here’s a perfectly normal full-sibling match at 23andMe entered into a probability calculator made from AncestryDNA simulations.
And below are the true probabilities for a 3,384 cM match at 23andMe.
There’s another great benefit to using the relationship predictor at dna-sci.com: it comes with a whole suite of unique tools:
- The only relationship predictor that gives probabilities based on the Are Your Parents Related tool at GEDmatch
- The only predictor known to exclude population weights for cases when you think you already know how the match is related to you
All of these tools have come out since April of 2021. Has relationship prediction drastically improved in the past year? That’s a rhetorical question.
Feel free to ask a question or leave a comment. And make sure to check out these ranges of shared DNA percentages or shared centiMorgans, which are the only published values that match peer-reviewed standard deviations. Or, try a tool that lets you find the amount of an ancestor’s DNA you cover when combining multiple kits. I also have some older articles that are only on Medium.