A New Relationship Predictor for Genetic Genealogy

by DNA-Sci | Apr 6, 2021 | Blog, DNA Science | 38 comments

Relationship predictions are now available showing differences in maternal and paternal relationships, in-group differences, and accurate predictions for 23andMe data

Updates: You can now get much better predictions by entering the # of segments along with total cMs! Please also submit data to this new DNA match survey that will greatly help improve and build new relationship prediction tools.

Traditional relationship predictions (added 3 Feb. 2023)
A double cousin relationship predictor (added 9 Mar. 2022)
Relationship predictions to help validate known relatives (no population weights)
Relationship predictions for X-DNA matches (ignoring atDNA, added 29 June 2022)

Update: This relationship predictor was incorporated into GEDmatch on 15 Dec. 2021.

As of today, you can get relationship predictions that include probabilities for sex-specific relationships, you can see the sometimes large differences within groups (e.g. grandparent vs. half-sibling), and you can get accurate relationship predictions for 23andMe data for the first time. I’ve previously published exact averages and very accurate ranges of shared DNA for many genealogical relationships, including double cousins. These data, which were also used to develop the relationship predictor, are validated by the standard deviations reported by Veller et al. (2019 & 2020).

Probability curves for different relationship types

The most striking thing about the figures shown here is the curve for grandparent/grandchild relationships, which features two distinct peaks. Who would’ve thought that those relationships are so different from avuncular and half-sibling relationships? Genetic genealogists have been treating them all the same. We now see that treating them as a homogeneous group is an unnecessary oversimplification.

Figure 1. Probability curves for relationship types 5C1R to full-siblings at AncestryDNA. The y-axis shows the probability of each relationship type relative to all others included. All types here are sex-averaged, although the calculator gives sex-specific probabilities for half-avuncular, 1C, avuncular, half-sibling, and grandparent/grandchild relationships. 1C1R = 1st cousin, once removed; cM = centiMorgan, HIR = half-identical regions. The second cousin (2C) curve is higher because it’s the first curve to be the only one from its group (it has little competition near its center).

The first thing that came to mind when I saw the probability curves in Figure 1, other than surprise, was a discovery that I had made and written about just one week earlier. At that time, I had found that a person is actually more likely to share 22% or 28% DNA with a grandparent than 25%, despite 25% being the expected value. But it turns out that that rule isn’t the reason for the two peaks on the grandparent/grandchild curve, at least not directly. In fact, the two peaks are actually much farther apart than 22% and 28%. And the histogram for grandparent/grandchild relationships only has one peak, as shown in Figure 2.

Figure 2. Normalized histogram for 500,000 grandparent/grandchild pairs. These are the same data points that went into the probability calculator. The individuals were simulated as 250,000 paternal grandparent/grandchild pairs and 250,000 maternal grandparent/grandchild pairs, but the fractions of shared DNA for each were not differentiated when creating the histogram. For that reason, despite not being labeled as paternal or maternal, values near 0.25 on the x-axis are more likely to come from maternal grandparent/grandchild pairs and values at the far ends of the histogram are much more likely to be from paternal grandparent/grandchild pairs.

The reason for the two peaks in Figure 1 is that grandparent/grandchild relationships have far more variance than all other relationships (Veller et al., 2019 & 2020). Because these probabilities are relative to one another, a gap between two curves has to be filled by one or more other relationship curves. And the largest gaps occur between the group that includes grandparents and the two groups on either side of it. The difference is even more striking when looking at IBD data such as in Figure 3. (IBD stands for identical by descent. It’s the total amount of DNA that two people are reported to share. It can be contrasted with half-identical region (HIR) sharing, which counts fully-identical regions (FIR, or IBD2) as if they were HIR.) Reporting the total amount of DNA that full-siblings share moves the curve for that relationship even farther to the right of the grandparent/grandchild curve.
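To make the HIR/IBD distinction concrete, here’s a minimal Python sketch of how the two reporting styles count the same shared regions. It assumes a haploid autosomal genome length of 3,475 cM, so that 100% corresponds to 6,950 cM; that figure is an assumption consistent with the 3,475 cM (50%) parent/child value given in the comments, and each testing company actually uses its own genome length. The function names are hypothetical.

```python
GENOME_CM = 3475.0  # assumed haploid autosomal length; 100% = 2 * GENOME_CM

def reported_totals(ibd1_cm, ibd2_cm):
    """Return (HIR-style total, IBD-style total) in cM.

    ibd1_cm: length of regions where one copy is shared (half-identical)
    ibd2_cm: length of regions where both copies are shared (fully-identical)
    """
    hir_total = ibd1_cm + ibd2_cm      # FIR counted as if it were HIR
    ibd_total = ibd1_cm + 2 * ibd2_cm  # FIR counted once for each shared copy
    return hir_total, ibd_total

def as_percent(cm):
    """Express a cM total as a percentage of the doubled genome length."""
    return 100 * cm / (2 * GENOME_CM)

# Expected full-sibling pattern: 50% of positions IBD1, 25% IBD2.
hir, ibd = reported_totals(0.50 * GENOME_CM, 0.25 * GENOME_CM)
print(f"HIR: {hir:.0f} cM ({as_percent(hir):.1f}%)")  # HIR: 2606 cM (37.5%)
print(f"IBD: {ibd:.0f} cM ({as_percent(ibd):.1f}%)")  # IBD: 3475 cM (50.0%)
```

The same full-sibling pair moves from 37.5% to 50% simply by counting FIR twice, which is why the full-sibling curve shifts so far to the right on IBD platforms.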

Figure 3. Probability curves for relationship types 5C1R to full-siblings at 23andMe. IBD = identical by descent, which includes both HIR and FIR shared DNA. All other parameters and abbreviations are the same as in Figure 1.

Figure 3 shows a drastic increase in the height of the right-most peak for grandparent/grandchild relationships when compared to Figure 1. The probability of this relationship type peaks at 78.7% around 2,510 cM as would be reported by 23andMe. This is due to moving the full-sibling curve far to the right, from the 37.5%, on average, that would be reported by AncestryDNA to the 50%, on average, that full-siblings actually share. In contrast, half-siblings are only 12.1% likely and avuncular relationships only 3.2% likely at 2,510 cM. An added benefit of IBD sharing platforms is that half-siblings are more easily distinguished from avuncular relationships, which is very apparent from about 2,200 cM to 2,500 cM.

Is it really possible for the likelihood that you’ve found a grandparent at 2,510 cM to be that much greater than a half-sibling, aunt, or uncle? Because it is so unlikely for half-siblings or avuncular pairs to share 2,510 cM, the answer is yes. The caveat is that a grandparent/grandchild match might be less likely because of age or representation in the population. But as time passes and DNA kits remain in the database, the chance of finding grandparents will probably increase. You would have to weigh the probabilities against those other factors. And, of course, there are other relationship types that are possible at this number of cM. It could be 3/4 siblings, for example, and the amount of FIR sharing should be analyzed separately in cases such as this.
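One way to weigh the probabilities against factors like age or database representation is to multiply each relative probability by a prior weight and renormalize. This is a minimal Python sketch of that idea, not how this site’s tools apply population weights. The probabilities are the 2,510 cM values quoted above, with the remaining 6.0% lumped into "other," and the weight is a made-up assumption.

```python
def reweight(probabilities, weights):
    """Multiply relative probabilities by prior weights and renormalize to 1."""
    scaled = {rel: p * weights.get(rel, 1.0) for rel, p in probabilities.items()}
    total = sum(scaled.values())
    if total == 0:
        raise ValueError("all weighted probabilities are zero")
    return {rel: s / total for rel, s in scaled.items()}

# Relative probabilities at 2,510 cM (23andMe-style IBD) from the text;
# "other" is simply the remainder.
at_2510 = {"grandparent/grandchild": 0.787, "half-sibling": 0.121,
           "avuncular": 0.032, "other": 0.060}

# Hypothetical assumption: a grandparent is only half as likely as other
# relatives to have tested and still be in the database.
weights = {"grandparent/grandchild": 0.5}

for rel, p in reweight(at_2510, weights).items():
    print(f"{rel}: {p:.1%}")
```

With this made-up weight, grandparent/grandchild drops to roughly 65% but stays the most likely; the cM evidence at 2,510 cM is strong enough that only a much smaller weight would change the ranking.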

Comparison to a previously used probability curve

I calculated these probabilities in what is presumably the same way as in the AncestryDNA white paper. Its probability curves have been the most widely used method of determining relationship probabilities. However, in their methodology, relationship types are lumped into groups, and sex-specific probabilities aren’t calculated.

I wasn’t sure what to expect once I developed a way to compare my model results to AncestryDNA’s model results. Very few details are given about their methods or data, including anything that could be used to validate their methods or probability results. I find that the white paper probability curves look very similar to the curves that I plotted. Since the simulation I use is validated by standard deviations from Veller et al. (2019 & 2020), this means that the AncestryDNA numbers are probably fairly good. That’s because they also used a simulation rather than relying on empirical match data. Despite my love for data, bad data is the name of the game in genetic genealogy.

Figure 4. Relationship probabilities from my simulations on the left compared to those from AncestryDNA on the right. Units are the same for both graphs. The y-axes for both graphs are on a logarithmic scale. AncestryDNA did this to show the differences in more distant relationships, which were otherwise bunched up.

The differences for distant cousins can be accounted for by the fact that the probabilities in my dataset were calculated against other, more distant relationships that are not shown here in order to correspond to the AncestryDNA chart. The 3C1R, 4C, etc. probabilities on my graph now don’t add up to 1. They did when 4C1R, 5C, and 5C1R were included, but those are now left out. For relationship types such as the half-sibling/grandparent group, I was able to add up all of the probabilities to make one curve. I could go back and re-calculate the probabilities for 3C1R, 4C, etc. without including more distant relationships, but I think the comparison of graphs is clear as-is.

Methodology

To calculate probabilities for the new tool, 500,000 pairs were simulated for each relationship type. Each pair shares a certain number of cM. Bins 1 cM wide were created, centered on integer values, and the number of pairs of each relationship type was counted in each bin. Those counts were then used to determine the probability of each relationship type at a given cM value. The 500,000 half-sibling pairs consisted of 250,000 paternal and 250,000 maternal pairs. That allows half-siblings to be weighted equally against grandparent/grandchild relationships, which share the same mean. First cousins include four different sex-specific paths, so each path consisted of 125,000 pairs. Sex-specific probabilities were calculated for 1st cousins and closer relationships. Sex-specific probabilities are not as different for more distant relatives, and the number of sex-specific paths increases exponentially (there are 16 types of 2nd cousins), so those differences weren’t included.
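The counting step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the code behind the tool: the simulated cM values are supplied by the caller, and the hypothetical helper `split_by_paths` divides the pair budget evenly over every male/female combination of the connecting parents (one sexed link for half-siblings, two for first cousins, four for second cousins, giving the 2, 4, and 16 paths mentioned above).

```python
import math
from collections import Counter, defaultdict
from itertools import product

def split_by_paths(total_pairs, n_sexed_links):
    """Evenly divide a pair budget across every male/female combination
    of the connecting parents (hypothetical helper)."""
    paths = list(product("MF", repeat=n_sexed_links))
    per_path = total_pairs // len(paths)
    return {"".join(p): per_path for p in paths}

def bin_counts(samples_by_rel):
    """Count simulated pairs in 1-cM bins centred on integer cM values."""
    counts = defaultdict(Counter)  # bin centre -> Counter(relationship -> pairs)
    for rel, samples in samples_by_rel.items():
        for cm in samples:
            counts[math.floor(cm + 0.5)][rel] += 1
    return counts

def relative_probabilities(counts, cm):
    """Share of each relationship among all simulated pairs in cm's bin."""
    in_bin = counts.get(math.floor(cm + 0.5), Counter())
    total = sum(in_bin.values())
    return {rel: n / total for rel, n in in_bin.items()} if total else {}

print(split_by_paths(500_000, 2))  # first cousins: four paths of 125,000 pairs

# Toy stand-in for simulated cM values of two relationship types.
toy = {"A": [10.2, 10.4, 11.0], "B": [9.6, 10.1]}
print(relative_probabilities(bin_counts(toy), 10))  # {'A': 0.5, 'B': 0.5}
```

Because every relationship type gets the same number of simulated pairs, these ratios carry the equal-frequency assumption discussed in the notes further down.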

The amount of shared DNA between individuals is highly variable. Smoothing of the data was very much necessary, and it was by far the hardest step of the process. Figure 5 shows how un-smooth the curves are for raw data. These curves are actually less realistic than the smoothed curves. For a given set of assumptions and parameters, even in real life, there is some definite probability for each relationship type at each cM value. It is not a fuzzy probability. If I increased the number of individual pairs for each relationship type, perhaps to one million or several million, then the probability curves wouldn’t require smoothing. Imagine trying to get an empirical database that large, which would then contain a lot of erroneous data and/or be missing a lot of data erroneously labeled as “outliers.”

Figure 5. Un-smoothed probability curves for relationship types 5C1R to full-siblings at AncestryDNA. The y-axis shows the probability of each relationship type relative to all others included. All types here are sex-averaged, although the calculator gives sex-specific probabilities for half-avuncular, 1C, avuncular, half-sibling, and grandparent/grandchild relationships.

I ensured that the smoothing didn’t flatten the curves. I only applied as much smoothing as was necessary to get the curves monotonic over the applicable ranges and then ensured that the probability values were unchanged from what would be expected if you were to draw a curved line along the center of the above probability curves. It’s easy to see in the un-smoothed graph: Grandparent/grandchild relationships are quite different from avuncular and half-sibling relationships.
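The post doesn’t say which smoothing method was used. As one possible approach, here’s a minimal Python sketch that widens a centred moving average only until a given stretch of a curve (for example, one side of a peak) becomes monotonic, in the spirit of applying only as much smoothing as necessary. The moving average is a stand-in technique chosen for illustration, and the helper names are hypothetical.

```python
def moving_average(values, window):
    """Centred moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def is_monotonic(values):
    """True if the values never decrease or never increase."""
    rising = all(a <= b for a, b in zip(values, values[1:]))
    falling = all(a >= b for a, b in zip(values, values[1:]))
    return rising or falling

def smooth_until_monotonic(values, max_window=201):
    """Return (smoothed, window) using the smallest odd window that works."""
    for window in range(1, max_window + 1, 2):
        smoothed = moving_average(values, window)
        if is_monotonic(smoothed):
            return smoothed, window
    raise ValueError("curve did not become monotonic; widen max_window")

noisy_rise = [0, 2, 1, 3, 2, 4, 3, 5]  # e.g. the rising side of a noisy peak
smoothed, window = smooth_until_monotonic(noisy_rise)
print(window, smoothed)  # 3 [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0]
```

A moving average applied across a peak can lower it, which is exactly the flattening to avoid; comparing the smoothed values against a line drawn through the centre of the raw curve is still needed.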

Advantages of this relationship predictor

Some relationship types within a group are too different to be treated the same: grandparent/grandchild and paternal half-sibling relationships are far different from half-sibling and avuncular relationships. This calculator treats them differently. This and the next point make this calculator especially accurate for close relatives.

There are significant differences between paternal and maternal recombination rates. This results in much wider ranges of shared DNA for paternal relatives than for maternal relatives. The probability calculator used here allows for those differences.

The data for IBD probability curves, such as those for 23andMe, come from IBD data. This is an exceedingly important point. It is not a good idea to use an AncestryDNA graph to try to distinguish between relationships at 23andMe.

The data used to calculate the probabilities are from the same model and version that made the most accurate tables of shared DNA currently published.

The probabilities used in this calculator can’t be influenced by erroneous data, whether that data is mislabeled, affected by endogamy, or potentially reflects multiple unknown relationships.
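To see why fewer crossovers widen the range of shared DNA, here’s a toy Python simulation of grandparent/grandchild sharing through a father versus through a mother. It’s a deliberately simplified model, not the simulation behind this tool: 22 equal-length chromosomes, crossovers as a Poisson process with no interference, and assumed round-number map lengths (2,600 cM for male meiosis, 4,400 cM for female meiosis) chosen only so that male meiosis has fewer crossovers. The parent’s sex is what matters, because the crossovers that decide how much of a grandparent reaches the grandchild happen in the parent.

```python
import random
import statistics

N_CHROMOSOMES = 22  # toy genome: 22 chromosomes of equal physical length

def gamete_fraction_from_grandparent(rng, morgans_per_chromosome):
    """Fraction of a parent's transmitted gamete that came from one grandparent."""
    total = 0.0
    for _ in range(N_CHROMOSOMES):
        from_gp = rng.random() < 0.5  # which grandparental copy starts the chromosome
        pos, shared = 0.0, 0.0
        while True:
            # Poisson process: exponential gaps with rate = map length in Morgans.
            nxt = pos + rng.expovariate(morgans_per_chromosome)
            if from_gp:
                shared += min(nxt, 1.0) - pos
            if nxt >= 1.0:
                break
            pos, from_gp = nxt, not from_gp  # a crossover switches copies
        total += shared
    return total / N_CHROMOSOMES

def simulate_gp_sharing(map_length_cm, n_pairs, seed):
    """Grandparent/grandchild sharing as a fraction of the grandchild's genome."""
    rng = random.Random(seed)
    morgans = map_length_cm / 100 / N_CHROMOSOMES
    # Only one of the grandchild's two copies comes through this parent, hence 0.5.
    return [0.5 * gamete_fraction_from_grandparent(rng, morgans)
            for _ in range(n_pairs)]

# Assumed, illustrative map lengths: male meiosis has fewer crossovers.
paternal = simulate_gp_sharing(2600, 5000, seed=1)  # crossovers in the father
maternal = simulate_gp_sharing(4400, 5000, seed=2)  # crossovers in the mother

for label, values in (("paternal", paternal), ("maternal", maternal)):
    print(f"{label}: mean {statistics.mean(values):.3f}, "
          f"SD {statistics.pstdev(values):.3f}, "
          f"range {min(values):.3f}-{max(values):.3f}")
```

Both sides average 25%, but the paternal side has the larger spread, matching the observation that values far from 0.25 are more likely to come from paternal grandparent/grandchild pairs.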

There are important differences that can be seen with this tool.

For AncestryDNA data, 1,272 cM is the value at which grandparents and great-grandparents are equally likely, at about 25.6% probability each. Half-avuncular relationships are 18.6% likely, half-siblings are 11.9% likely, and avuncular relationships are 7.8% likely. This makes a total of 46.3% for the group that includes grandparents, half-siblings, and avuncular relationships and leaves 53.7% for the next group. This is similar to the 50/50 split that AncestryDNA reports, except that these values are broken down by multiple relationship types (including paternal and maternal, which aren’t shown in this example but are included in the calculator) and are validated by peer-reviewed statistics. AncestryDNA hasn’t released any kind of statistics to validate their data.

Other important notes

All probabilities are for autosomal DNA only. Please subtract any X-DNA before using the calculator. Also, I recommend subtracting any shared DNA from segments less than 7 cM that may have found their way into your total. Family Tree DNA includes very small segments in their total cM calculations.
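If you have a segment list rather than just a total, the adjustment in this note can be done before entering your numbers. A minimal Python sketch: the `(chromosome, cM)` tuple format is a hypothetical stand-in, since every testing site exports segments differently, and the 7 cM cutoff is the one recommended above. It returns the segment count too, because predictors that use the number of segments need the same X-DNA and small segments removed from that count.

```python
def autosomal_total(segments, min_cm=7.0):
    """Return (total cM, segment count) for autosomal segments of at least min_cm.

    segments: iterable of (chromosome, cM) pairs, with chromosome given as
    "1".."22" or "X" (hypothetical input format).
    """
    kept = [cm for chrom, cm in segments
            if str(chrom).upper() != "X" and cm >= min_cm]
    return sum(kept), len(kept)

match = [("1", 45.2), ("7", 12.0), ("X", 30.5), ("12", 5.1), ("20", 7.0)]
total_cm, n_segments = autosomal_total(match)
print(f"{total_cm:.1f} cM across {n_segments} segments")
```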

The above probabilities assume no endogamy or other pedigree collapse. Those cases should be treated separately.

Multiple cousin relationships are not included here, but you can see the averages and ranges or use a multiple cousin relationship predictor for double 1st cousins and 3/4 siblings.

Parent/child relationships are not included here. They are easy to distinguish from other relationships, including full-siblings. Parent/child relationships consist of a half-identical match across the whole length of the genome. Full-siblings are fully identical across about 25% of the genome, on average. Testing companies take this into account in their relationship predictions. If a relationship is predicted to be parent/child, full-sibling is not a possible relationship and there is no need to analyze the shared DNA amount here.
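The parent/child versus full-sibling check described above can be written as a simple rule on HIR and FIR percentages. A minimal Python sketch: the cutoffs are hypothetical round numbers for illustration, not values used by any testing company or by this site, and matches near a boundary (or 3/4 siblings and double cousins, which also share FIR) need closer review. Percentages use the scale from the comments below, where a parent/child pair shares about 50% HIR and full-siblings average about 37.5% HIR plus 12.5% FIR.

```python
def classify_close_match(hir_pct, fir_pct):
    """Rough label for a close match from HIR and FIR percentages.

    Cutoffs below are hypothetical and only illustrate the logic.
    """
    if fir_pct >= 45:
        return "self or identical twin"  # fully identical almost everywhere
    if hir_pct >= 45 and fir_pct < 2:
        return "parent/child"  # one copy of the whole genome, no FIR
    if fir_pct >= 2:
        return "full siblings or another double relationship"
    return "not parent/child; use a relationship predictor"

print(classify_close_match(50.0, 0.0))   # parent/child
print(classify_close_match(37.5, 12.5))  # full siblings or another double relationship
```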

Relationships more distant than 1C1R and half-1C are grouped with others that have the same average shared DNA. Also, half-avuncular relationships are treated the same as siblings of grandparents, which are called great- or grand-avuncular relationships. They are treated the same because their curves are the same, as are any other relationship types that share the same curve. For each curve shown in the figure at the bottom of the page, 500,000 pairs were simulated. Therefore, the relative probabilities of each relationship type are based on the assumption that each type is equally likely to be found in the population. While this assumption isn’t true, it’s the best way to generate probabilities. Age and other factors, such as the likelihood that your unknown great-grandparent or great-grandchild is the DNA match you’ve found, should be taken into consideration. A 1,200 cM match is probably more likely to be a half-avuncular relationship than a great-grandparent, even though, if they were equally likely relatives to find as DNA matches, the cM value alone would suggest that great-grandparent is more likely.
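Grouping by average shared DNA can be illustrated with the standard expectation for autosomal sharing: each common ancestor contributes (1/2) raised to the number of meioses separating the two people through that ancestor. In this minimal Python sketch the relationship table is hand-entered and the averages are IBD-style, so full-siblings come out at 50%. Equal averages are necessary but not sufficient for identical curves: grandparent/grandchild lands in the same group as half-siblings and avuncular pairs even though its curve differs.

```python
from collections import defaultdict
from fractions import Fraction

# name -> (number of common ancestors, meioses separating the pair via each)
RELATIONSHIPS = {
    "full siblings": (2, 2),
    "half-siblings": (1, 2),
    "grandparent/grandchild": (1, 2),
    "avuncular": (2, 3),
    "1C": (2, 4),
    "half-avuncular": (1, 3),
    "great-avuncular": (2, 4),
    "great-grandparent": (1, 3),
    "1C1R": (2, 5),
    "half-1C": (1, 4),
    "2C": (2, 6),
}

def expected_fraction(n_ancestors, meioses):
    """Average fraction of DNA shared, as an exact fraction."""
    return n_ancestors * Fraction(1, 2) ** meioses

def group_by_mean(relationships):
    """Group relationship names by identical average sharing, closest first."""
    groups = defaultdict(list)
    for name, (n, m) in relationships.items():
        groups[expected_fraction(n, m)].append(name)
    return dict(sorted(groups.items(), reverse=True))

for mean, names in group_by_mean(RELATIONSHIPS).items():
    print(f"{float(mean):.2%}: {', '.join(names)}")
```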

These probabilities are only calculated as far back as 5C1R. The huge advantage of this tool, other than the accuracy of the data, is that it doesn’t lump close relatives into a single group when their curves are significantly different. For distant relatives, there’s much less certainty about the genealogical relationship for your DNA matches. Matches as low as 8 cM are allowed here, although the relationship may be farther back than 5C1R. Even so, the relative probabilities may be accurate at those low values. Indeed, all of the probabilities shown above are relative to the other relationships listed, so they’re only meaningful in comparison to those relationships. And there’s no cM value at 8 cM or above at which even 4C1R is the most probable relationship. So, while an 8 cM match may be more likely to fall into a combined “4C1R or more distant” category, listing each of those relationship types separately would not result in more useful information. Not only are very low cM values difficult to assign to a recent ancestor, but segments of 20 cM or 30 cM may be on pile-up regions and therefore come from very distant ancestors.

Totals will not always add up to 100%. When multiple relationship types are present, the chance of rounding errors increases. I don’t believe that the totals are ever off by more than 0.2 percentage points.
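A quick way to see how rounding alone moves the total: round each probability for display and add the rounded values back up. A minimal Python sketch, assuming one decimal place (as the percentages quoted on this page appear to use) and made-up, evenly split probabilities.

```python
def rounded_total(probabilities, decimals=1):
    """Sum of percentages after rounding each one for display."""
    return sum(round(100 * p, decimals) for p in probabilities)

print(rounded_total([1 / 3] * 3))  # three-way tie: each shown as 33.3%
print(rounded_total([1 / 6] * 6))  # six-way tie: each shown as 16.7%
```

The first case sums to about 99.9% and the second to about 100.2%. Each displayed value can be off by up to 0.05 percentage points, so the worst-case gap grows with the number of relationship types listed.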

This is not the first tool to show relationship probabilities based on a user input of shared DNA. Jonny Perl has done amazing work at DNA Painter, including probability calculations that can be built into your family tree.

Here’s a list of the relationship prediction tools now available on this site:

Predictions based on both # of segments and total cMs

The multiple cousin relationship predictor

Predictions based on the Are Your Parents Related (AYPR) tool at GEDmatch

Predictions excluding population weights for cases when you think you already know how the match is related to you

DNA-Sci: advancing the science of relationship predictions. You can also find mobile apps for relationship predictions in the App Store and on Google Play. Feel free to ask a question or leave a comment. You might also like this tool to visualize how much DNA full-siblings share. DNA-Sci is also the original home of DNA coverage calculations.

38 Comments

Evert-Jan Blom on April 6, 2021 at 6:54 am

Interesting stuff Brit, perhaps it can be used by DNA Painter/WATO as an alternative to the probabilities that are currently used?

I might get back to you concerning an upcoming project I am planning; your data might be the right fit for that. Keep up the good work.
- Brit Nicholson on April 6, 2021 at 7:01 am
  
  Thanks! I’m glad to share the data.
  - yourDNA.family on April 9, 2021 at 2:02 pm
    
    I second EJ’s comment and it would also help the “Your DNA family” app for a new feature that we’ve recently launched that currently only shows centiMorgan values. Using your more accurate prediction would certainly help in adding more clarity to the users as to what relationship is most likely.
Larry Jones on August 17, 2021 at 3:13 pm

Brit, this is brilliant. I have been factoring in atDNA drop-off but did not account for gender, although it has been showing up as a significant factor, particularly female to female. I’d love to correspond (email attached).

Thanks EJ for pointing me to this information. 🙂
Mary Riser on December 17, 2021 at 9:34 am

Can you put this in book form so I can underline stuff and take it with me when I travel?
- Brit Nicholson on December 17, 2021 at 12:12 pm
  
  Hi Mary. Is it the pop-up with relationship predictions at GEDmatch that you’d like to have on paper? You should be able to print to a PDF or screenshot any webpage if you want a copy.
Ted Toal on December 17, 2021 at 12:42 pm

Great stuff! I’m wondering whether using the number, mean length, and length variance of shared segments would be useful to make predictions even more accurate?
- Brit Nicholson on December 18, 2021 at 11:42 pm
  
  Hi Ted. Thanks! Segment information could definitely be useful for predicting paternal and maternal sides. And the largest segment size would help with endogamy. Unfortunately, I haven’t ever kept data on segment size. But I’d be interested in studying that in the future.
Angie on December 19, 2021 at 10:04 pm

My question is about total cM: my profile and my brother’s both show the same DNA relative, as a half-sibling to me but as a grandchild or grandparent of my brother. That just doesn’t seem right at all. So here is the question: how is that even possible?
- Brit Nicholson on December 20, 2021 at 7:17 pm
  
  Hi Angie. Half-sibling and grandparent/grandchild relationships share the same average: 25%. Aunt/uncle/niece/nephew relationships are also in the same group. So a prediction of half-sibling or grandparent/grandchild based on cM is almost always a guess at one of the possibilities. Some of the predictions at DNA testing sites often don’t make too much sense. They might usually be based on age, but if you and your brother are close in age, then I would’ve expected them to give you two the same prediction. If you got that information from my relationship prediction tool, there are almost always possibilities other than the most likely relationship. One thing that’s possible is a value so low or high that grandparent/grandchild is possible but half-sibling isn’t. For example, a match of over 2,500 cM is very unlikely to be a half-sibling or grandparent/grandchild. But if you had to choose between only those two options, half-sibling is almost impossible, making grandparent/grandchild far more likely, despite being very unlikely compared to something like 3/4 or full siblings.
Tim Forsythe on December 30, 2021 at 5:14 pm

Brit, I have a parent/son relationship that shares 3456 cM in Ancestry, or what I calculate to be 49.73%, which seems reasonable, but the calculator generates an error for values above 46.684% HIR or 3245 cM. The DNA Painter tool does not start generating errors until we get above 50.006%. I wonder if there is a problem with the calculator?
- Brit Nicholson on December 31, 2021 at 12:27 pm
  
  Hi Tim,
  
  I think you’re talking about the predictor on my site (https://dna-sci.com/tools/brit-cim/), right? I didn’t put parent/child relationships into that one from the start for a few reasons. One reason is that I think it’s kind of silly. The same goes for full-siblings most of the time, but I’ve included them. For both relationship types, it’s very easy to see what the relationship is without using a relationship predictor. A match that’s about 50% IBD and entirely comprised of half-identical regions (HIR), i.e. one and only one copy of the entire genome, is a parent/child relationship. A match that’s about 50% IBD or 37.5% HIR, but that includes about 12.5% fully-identical regions (FIR), is a full-sibling match. While there’s some overlap between 3/4 siblings and full-siblings some of the time, the average FIR is much lower (6.25% FIR). For either parent/child or full-sibling relationships, just trust the label given at the original testing site. Or, it’s very easy to see from the One-to-One matching page. Or from the One-to-Many total cM, although self or identical twin will show the same there as for a parent/child.
  
  The reason I included full-siblings is to differentiate from 3/4 siblings, although it isn’t really needed except on the multiple cousin predictor (https://dna-sci.com/tools/multiple-cousin-cim/). But it doesn’t hurt to include full-siblings on all predictors. It had to be one or the other with regard to parent/child or full-sibling, and I think it’s better to include full-siblings. That’s for IBD predictions, because then there’s significant overlap between the two, i.e. the average for full-siblings (50%) is exactly where the parent/child relationships should be. However, for HIR relationship prediction, it’s possible to call anything higher than the range of full-siblings a parent/child relationship. That’s what I’ve done with the new GEDmatch predictions. And I may integrate that into my own relationship predictor soon. But there is no solution for the IBD predictions, which are the default for the 23andMe and percentage input boxes.
  
  The DNA Painter tool includes parent/child because it only works for AncestryDNA data, which is always HIR. So they don’t have the issue of overlap between full-siblings and parent/child. But I’ll note that IBD predictions give much more conclusive results. And I’ll also note that the DNA Painter tool is completely unusable for IBD full-siblings, and thusly unusable for 23andMe total cM or percentages for full-siblings (https://dna-sci.com/2021/11/05/has-relationship-prediction-drastically-improved/). So, for now, different predictors bring different things to the table. I’ve chosen what I deem to be the most important ones for the relationship predictors at this site, but I hope to make improvements where possible.
  
  I hope that helps.
  - Tim Forsythe on December 31, 2021 at 2:18 pm
    
    Got it, thanks.
Liz on March 27, 2022 at 5:51 pm

Hi, I’ve ended up here following a link from GEDmatch on the new Autokinship tool. Of the 50 Autoclusters generated, some (13) didn’t make it through to the AutoKinship analyses stage (they had no AutoKinship predictions, or fell short of the Autosegments etc).

Of the 37 that made it through, 2 had autokinship trees. One was blank and the other, cluster 21 used my (and another user’s) gedcom. With this cluster 21 Autokinship tree, the probability is said to be 1.930E … and I don’t know if that’s high or low probability? Can’t find a chart anywhere to let me know and was wondering if you covered this anywhere?
Many thanks,

[For info: the cluster 21 autokinship tree says it drew – heavily, I think – from other “Segment Clusters partially linked to cluster 21” and unfortunately the result is that my known maternal and paternal matches are combined to generate the cluster 21 tree – which may be why the probability scores low, if it scores low.]
(My parents are not related, but this is Wales, so many of us share about 5 surnames, and the tree is based around the most common of these, which is Jones!)
- Brit Nicholson on March 28, 2022 at 2:21 pm
  
  Hi Liz,
  
  I’ve found genealogy to be very difficult in Wales!
  
  For your AutoKinship trees, the probabilities can be pretty low, but it’s the most probable one that’s displayed prominently in the folder for each cluster. You can see other possible trees by opening the folder labeled “autokinshipTrees.”
  
  DNA-Sci provided the probabilities for the AutoKinship tool. But the tool itself was developed by Genetic Affairs. The best place to ask questions about AutoKinship would be on the Facebook user group for Genetic Affairs or on the website contact page: https://members.geneticaffairs.com/contactus
John Gauss on April 1, 2022 at 12:09 pm

I was interested to see that grandparental proportions are more likely to be 22%/28% than 25%/25%.

From matching to my sister and 52 people with identifiable common ancestors, I’d calculated my percentages (using SNPs rather than cM) to be 19/31, 26/24 and my sister’s 25/25, 22/28. I’d been surprised to have inherited so much more of my paternal grandmother than paternal grandfather. But my daughter shows greater divergence, with 7/22 of these two great-grandparents.
Maria Witt on August 5, 2022 at 10:16 am

I have a question. I am blood type A+, my mother was O+, and my dad is O+, which is genetically impossible. DNA shows that he is my father (not a paternity test, but AncestryDNA and here). My paternal uncle had type A blood. Would he and my dad share enough DNA that my DAD shows up as my DAD? Does my question make sense? My uncle passed away last year, so I can’t test him, and both of his biological children, my cousins, have also passed. None of them did AncestryDNA, nor are their kids willing. So here I am with a blood type that is impossible based on my parents. I do know I am a DNA match to my maternal family.
- Brit Nicholson on August 5, 2022 at 3:27 pm
  
  Hi Maria,
  
  I’m no expert on blood types, but I understand your conundrum. If you share about 3,475 cMs (50%) with your father, then that’s normally conclusive. A parent/child relationship is the easiest to detect and can be determined more accurately than any other relationship. You would only share 25%, plus or minus about 7%, with your uncle. The normal caveat is that, if your father had an identical twin, either one could be your father, and an AncestryDNA test likely wouldn’t be able to tell you which one. I would think that your uncle being an identical twin to your father could be an explanation, but identical twins usually have the same blood type. Have you or your father received a bone marrow transplant? There’s also a very rare condition known as chimerism. Either of those situations can lead to a person having two sets of DNA in their body, with a somewhat random chance of either being picked up by a DNA test.
  - Maria Witt on August 5, 2022 at 9:16 pm
    
    Thank you for the reply; I will keep you posted. I am going to have my sister test, as she is the only one besides me and my father left of our generation. My daughter is taking Genetics this semester, so I may have her pick her professor’s brain about the blood typing, unless there is a more detailed test that my father and I could complete. He is getting pretty old: 92. I was born very late in his life.
Evan Meiskin on August 15, 2022 at 12:15 am

Hi Brit, this is my DNA with my brother from 23andMe. When I plug the numbers into the tool, it leans towards full brother based on cM, but 23andMe is telling me it is a half-brother. They said only half-identical. I’m all confused.
Evan Meiskin
evan.meiskin@gmail.com
Shared DNA
30.59%
2277cM
- Brit Nicholson on August 15, 2022 at 8:47 am
  
  Hi Evan,
  
  I’m glad you asked this question. It will provide an opportunity to discuss the best ways to use the predictor, in order.
  
  The first thing to do is to make sure that you’re using the most up-to-date and accurate relationship predictor, which can be found here: https://dna-sci.com/tools/orogen-wtd/
  
  1. If you and your brother share X-DNA, which is likely if you’re maternal half-brothers, you want to use the percentage input box and enter “30.59.” Make sure to change the default from two female testers to two male testers. Recent discoveries have shown that including X-DNA helps relationship predictions: https://dna-sci.com/2022/04/27/new-option-to-include-x-dna-in-relationship-predictions/. This gives a 28.6% chance of half-siblings–significantly higher than uncle/nephew and with no probability of full-siblings.
  
  2. There’s a separate cM input box titled “23andMe cMs.” It seems as though you used the input box titled simply “cMs,” which is used for Ancestry, MyHeritage, and FTDNA. If you use either of these input boxes, make sure that the cM total you enter doesn’t include X-DNA. You can see that, while the first cM input box gives you 91.7% chance of full-siblings, entering it into the correct box gives a 23.8% chance of half-siblings and only a 0.1% chance of full-siblings. This is where other predictors fall short. Most are based only on Ancestry data, which will have a much lower cM value for full-siblings because they only report cMs for half-identical regions. And the GEDmatch predictor is only designed for kits compared at GEDmatch, not 23andMe. You can read more about the differences between metrics used at different sites here: https://dna-sci.com/2021/03/03/why-does-23andme-show-that-i-share-an-unusually-high-amount-of-dna-50-with-my-full-sibling/
  
  The prediction from 23andMe is correct. The differentiation between half-siblings and full-siblings based on fully-identical regions is very easy, so except for when two testers are 3/4 siblings or double cousins, the companies’ labels get it right.
  
  I hope that helps.
tracy-lee on December 13, 2022 at 10:39 am

Hi, my nephew has a paternal match at 235 cMs over 13 segments. My nephew is 33 and his match is 83. What is the most likely relationship, please?
- DNA-Sci on December 13, 2022 at 11:18 am
  
  Hi Tracy-Lee,
  
  The most accurate predictions (https://dna-sci.com/tools/orogen-wtd/) put the 2nd cousin group as the most likely. That group also includes relationships such as 1C2R (1st cousin two times removed) and Half-1C1R. And then the 2nd cousin once removed group also has a decent probability.
Cupcake on July 26, 2023 at 4:33 am

Would 633 cM (9% shared) be a match for a half-aunt? I am 99% sure my dad and his sister shared only the same mother… different fathers…

What would be your opinion?

I also have a guy who is convinced he is part of the family, but DNA says otherwise. (I have no reason not to believe him; I don’t know him.) He would be my dad’s nephew.

It shows 9 cM across 2 segments, <1% shared.
- DNA-Sci on August 2, 2023 at 9:29 am
  
  Hello,
  
  How many segments does the 633 cM match share? It’s best to enter the total cMs and number of segments here: https://dna-sci.com/tools/segcm/
  
  A 9 cM match is far too small to be your dad’s nephew. That person would be your 1st cousin, and 1st cousins typically share about 12.5% of their DNA.
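The 12.5% figure follows from the standard coefficient of relationship: each path through a common ancestor contributes one half raised to the number of generations along that path, and full relationships (descending from an ancestral couple) have two such paths. The Python sketch below works this out under that textbook convention. The haploid autosomal map length of 3,400 cM used to convert percentages to cM is an illustrative assumption; each company uses its own genetic map, so real averages differ somewhat.

```python
# Expected "percent shared" from the coefficient of relationship,
# on the scale where parent/child = 50%.
# Assumption: HAPLOID_AUTOSOMAL_CM is an illustrative value, not any
# testing company's exact genetic map length.
HAPLOID_AUTOSOMAL_CM = 3400.0


def expected_fraction(gens_a: int, gens_b: int, full: bool = True) -> float:
    """Expected fraction of DNA shared.

    gens_a, gens_b: generations from each tester up to the common
    ancestor(s). full=True means the pair descends from an ancestral
    couple (two paths); full=False means a single shared ancestor.
    """
    paths = 2 if full else 1
    return paths * 0.5 ** (gens_a + gens_b)


def fraction_to_cm(fraction: float) -> float:
    """Convert a fraction on the parent/child = 0.5 scale to total cM."""
    return fraction * 2 * HAPLOID_AUTOSOMAL_CM


def cm_to_percent(cm: float) -> float:
    """Convert a cM total back to a percentage on the same scale."""
    return 100 * cm / (2 * HAPLOID_AUTOSOMAL_CM)


if __name__ == "__main__":
    # Dad's nephew = your full 1st cousin: two generations up on each side.
    first_cousin = expected_fraction(2, 2, full=True)
    print(f"1C: {first_cousin:.1%}, about {fraction_to_cm(first_cousin):.0f} cM")
    # A 9 cM match expressed on the same percentage scale.
    print(f"9 cM is about {cm_to_percent(9):.2f}%")
```

A parent/child pair fits the same function as `expected_fraction(1, 0, full=False)`, which gives 0.5, so the scale matches the percentages used in the replies here.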
Joseph Coppeto III on August 4, 2023 at 12:53 am

I would appreciate help running my analysis. Would going with the defaults work, or what should I do? I do have 23andMe data. It showed nieces and nephews as cousins, and an uncle who “may” be a half-brother, or, I am not sure, a half-relative of some sort. It turns out my paternal line is NOT what I had thought my whole life. If I can get 2nd great or great-great grandparents from 2nd/3rd cousins on my paternal line, I may be able to work backwards. I am in fact now awaiting Ancestry.com DNA results, but found this site, too, from a friend. Thanks.
- DNA-Sci on August 4, 2023 at 7:37 am
  
  Hi Joseph,
  
  How many cMs and segments do you share? For now it’s best to subtract X-DNA from both of those.
  
  Then you’ll get the best predictions if you enter your number of segments along with total cMs here: https://dna-sci.com/tools/segcm/
SlvrQn on August 19, 2023 at 12:45 am

I am new to using and understanding many of the tools for analyzing DNA data and am not sure I’m understanding the relationship predictions/probabilities, but a One-to-Many shows a person at the top of my list with 3577.3 total cM and a largest segment of 206.6 cM. The testing kit listed is Migration V3-M. Does this mean we are full siblings? This could certainly not be my child, and I’m nearly positive it would not be either my mother or father. Could a parent of mine have a twin they did not know about?
- DNA-Sci on August 19, 2023 at 4:07 pm
  
  Hello,
  
  It’s always best to enter your number of segments along with total cMs here: https://dna-sci.com/tools/segcm/
  
  GEDmatch only shows half identical regions in the One-to-Many tool. That means that full siblings will share 37.5% on average rather than 50% in that tool. These are the only relationships that will show 50% like you’re seeing: parent, child, self, or identical twin.
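The 37.5% versus 50% difference for full siblings can be worked out from the textbook expectations that full siblings share zero, one, or two copies of a given region with probabilities 1/4, 1/2, and 1/4. The Python sketch below assumes only these expected values (real sibling pairs vary around them). Counting any region with at least one shared copy once, as a half-identical-only tool does, gives 37.5%; adding fully identical regions a second time, the convention behind 23andMe’s higher full-sibling totals, gives 50%. Parent/child pairs never share two copies, so both conventions give 50% for them.

```python
# Expected percent shared under two counting conventions, from the
# probabilities of sharing one (IBD1) or two (IBD2) copies of a region.
from fractions import Fraction

# Textbook expectations per relationship: (P(IBD1), P(IBD2)).
IBD_PROBS = {
    "parent/child": (Fraction(1), Fraction(0)),
    "full siblings": (Fraction(1, 2), Fraction(1, 4)),
    "half siblings": (Fraction(1, 2), Fraction(0)),
}


def hir_only_percent(p1: Fraction, p2: Fraction) -> Fraction:
    """Half-identical counting: any region with >= 1 shared copy counts once."""
    return (p1 + p2) / 2 * 100


def fir_counted_twice_percent(p1: Fraction, p2: Fraction) -> Fraction:
    """Fully identical regions are added a second time."""
    return (p1 + 2 * p2) / 2 * 100


if __name__ == "__main__":
    for name, (p1, p2) in IBD_PROBS.items():
        hir = float(hir_only_percent(p1, p2))
        both = float(fir_counted_twice_percent(p1, p2))
        print(f"{name}: HIR only {hir:.1f}%, FIR counted twice {both:.1f}%")
```

Exact fractions are used so the expected values come out without floating-point rounding.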
  
  But there could be another explanation. A mistake that people often make when they’re looking at their One-to-Many results is that they click on the kit number of one of their matches. That takes them to their match’s One-to-Many page, but people will still think they’re looking at their own matches. That’s one reason it’s best to check all of your matches of interest in the One-to-One tool. Another reason is that you’ll get a more accurate cM count.
  
  If you do so and you’re sure that this is still a match of about 50% DNA, then they’re likely one of your parents, or you somehow have a duplicate kit on GEDmatch, whether you uploaded it or someone else did without your knowledge.
Rosa Wegener on October 14, 2023 at 12:30 am

On March 12, 2023, for a total DNA of 125 cM, the relationship calculator on GEDmatch.com calculated a probability of 14.5% for the 3C1R group, but today, October 8, 2023, it produces a probability of 6.4% for the same group with the same 125 cM input.
Please explain this change.
- DNA-Sci on October 14, 2023 at 11:02 am
  
  Hi Rosa,
  
  Earlier this year GEDmatch updated their probabilities to reflect changes I made on my site in February of 2022. At that time, I began using a data source from a peer-reviewed science journal and in September of 2022 I published an article in a science journal that describes the methodology for the predictions. The main differences will be from the amount of variance in the source data. Everything about the newer predictions will be better. You can see similar predictions in the SegcM tool, especially if you use the 23andMe input box (23andMe has a fairly similar genetic map length to GEDmatch): https://dna-sci.com/tools/segcm/
  - rosa wegener on October 14, 2023 at 1:58 pm
    
    Thanks for your explanation. Please note that when I tried the tool you linked, I happened to add up the length of each of the segments from the GEDmatch autosomal one-to-one comparison tool, and discovered that whereas GEDmatch totals them as 125 cM (all are >7 cM, BTW, so switching to the 3 or 5 cM filter doesn’t change the total), they actually add up to 126.9 cM (which the tool then shows as 127 cM). And this actually lowers the probability for the 3C1R group from 5.3% to 4.9%.
    
    PS I have some more technical, and maybe more interesting, questions, but am frustrated by the apparent lack of any way to upload an attachment that would make such issues much easier to explain and grasp.
    - rosa wegener on October 14, 2023 at 2:06 pm
      
      PS. I forgot to mention that just adding the number of segments shared (7) lowered the originally calculated probability from 6.4% to 5.3%.
rosa wegener on October 14, 2023 at 2:55 pm

Have just tried to submit another GEDmatch.com autosomal one-to-one comparison to the same tool, with the following parameters: total matched DNA 197 cM over 49 segments. The filter was set to 3 cM; only one segment exceeds 6.9 cM (13.3 cM), and only 5 exceed 4.9 cM.

Submission produced the following popup:

‘dna-sci.com says’
Sorry, this is outside of the tested parameter values. Please check to make sure that you’ve entered them correctly.

I checked, and they were entered correctly. What are the “tested parameters”?
rosa wegener on October 16, 2023 at 11:10 pm

Have today entered the 125cM match mentioned above into the calculator at:

https://dna-sci.com/tools/orogen-unw/

and obtained an even lower probability for my known 3C1R cousin relationship than any of the previous ones: 1.4%.

I was uncertain whether to check IBD or the HIR box, so tried both, and neither, with the same result in every case.
rosa wegener on October 16, 2023 at 11:49 pm

After reading your recommendation of another calculator of yours above:

https://dna-sci.com/tools/orogen-wtd/

I tried it, entering first 125cM and then 127cM (GEDmatch One-to-One total), and obtained 6.4% and 6.0% probability for the 3C1R group, which seems to be the same result given by the tool which asks for both cM and number of segments. Should this be considered the more accurate result, or the 1.4% offered by the Known relatives calculator?
- DNA-Sci on October 17, 2023 at 3:47 pm
  
  Hi Rosa,
  
  SegcM is the preferred predictor: https://dna-sci.com/tools/segcm/
  
  For relationships more distant than about 3rd cousins, the number of segments won’t change the predictions much and you could use the weighted Orogen tool, but it’d be better just to always use SegcM.
  
  Whether or not you use the unweighted predictor depends on how you came across your match. If you start looking at your DNA matches sorted by highest cMs, like almost everyone does, then you wouldn’t want to use an unweighted predictor. If you asked your known 3C1R to test and then looked at the result when their test finished processing, then you might want to use the unweighted predictor. But why use a relationship predictor when you know the relationship? The purpose is mainly to show the effect of population weights for scientific curiosity.
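The general idea behind weighted versus unweighted predictions can be illustrated with Bayes’ rule: a weighted prediction multiplies how likely the observed cM value is under each relationship group by a population weight and then renormalizes, while an unweighted one treats every group as equally likely beforehand. The Python sketch below is a generic illustration, not the Orogen or SegcM code; the group names, likelihoods, and weights are made-up numbers chosen only to show the direction of the effect.

```python
# Generic Bayes illustration of population weights (priors).
# All group names and numbers are hypothetical.
from typing import Dict, Optional


def posterior(likelihoods: Dict[str, float],
              weights: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Normalize likelihood * weight; weights=None means equal (unweighted)."""
    if weights is None:
        weights = {group: 1.0 for group in likelihoods}
    scores = {g: likelihoods[g] * weights[g] for g in likelihoods}
    total = sum(scores.values())
    return {g: s / total for g, s in scores.items()}


# Hypothetical likelihoods of the observed cM value under each group.
likelihoods = {"closer group": 0.20, "more distant group": 0.10}
# Hypothetical weights: more distant relatives are assumed to be more numerous.
weights = {"closer group": 1.0, "more distant group": 4.0}

if __name__ == "__main__":
    print("unweighted:", posterior(likelihoods))
    print("weighted:  ", posterior(likelihoods, weights))
```

Changing only the weights flips which group is favored, which is why a weighted and an unweighted tool can give quite different probabilities for the same cM value.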
  - rosa wegener on October 17, 2023 at 4:46 pm
    
    Thanks for your explanation. My supplemental question is – how can one “know” a relationship of 3C1R?
    
    In this case, I was struck by the fact that the shared DNA was more than twice as large as any other match in a field of 1.4 million, if I understand the GEDmatch home page correctly, and then found that the published pedigree showed 3rd G-grandparents matching 2nd G-grandparents in a genealogy of my maternal grandfather. The only other “corroboration” was provided by a “known” (but untested) 3C1R of mine who shares the same 3rd G-grandparents and who knows the DNA match’s uncle.
    
    So now I’m wondering whether what I’ve been considering solid evidence of my genetic connection to my maternal grandfather is nothing more than a remote possibility which can never be proven, since I and my offspring are the only identified (or should I say “identifiable” – since he and his father died in 1936 and 1905, respectively?) surviving offspring of his father.
    
    Is this the correct interpretation of the situation?