We can now easily differentiate between 3/4 siblings and full-siblings over 94% of the time
Update 27 Apr. 2023: Please note that there are no restrictions on sharing screenshots of any articles from this site, including this one. The only restriction applies to submitting data from relationship predictors to surveys that compare group probabilities of one tool to individual probabilities of another tool. The reason is that group probabilities will always be higher than the individual relationships that make up the groups, as explained in this article and as apparent when using this tool.
Three-quarter siblings have been confounding genetic genealogists for years. The problem is that their shared DNA overlaps significantly with other sibling types. And, while you can usually tell the difference between 3/4 siblings and half-siblings pretty easily, telling them apart from full-siblings was thought to be very difficult. Until now.
But first it’s important to understand exactly what a 3/4 sibling is and what types there are.
Here’s a common scenario that has resulted in 3/4 siblings: After a man’s wife passes away, he marries her sister. If both women had a child with that man, those children are 3/4 siblings to each other. It can also happen where 3/4 siblings share a mother and the fathers are full-siblings. Here are the ways in which you could end up with a 3/4 sibling:
- Your father had a child with your maternal aunt
- Your mother had a child with your paternal uncle
- Your father had a child with your maternal half-sister
- Your father had a child with your maternal grandmother (the same scenario as the previous item, seen from the other sibling’s perspective)
- Your mother had a child with your paternal half-brother
- Your mother had a child with your paternal grandfather (the same scenario as the previous item, seen from the other sibling’s perspective)
In none of these scenarios are the parents related to each other. This is a type of double relationship without pedigree collapse. I’ve already listed averages and ranges for 3/4 siblings here.
A common refrain among genetic genealogists is that 3/4 siblings can share DNA anywhere between half-siblings and full-siblings. That is far from true. 3/4 siblings have their own range, and while it is between the other two, a 3/4 sibling won’t have shared DNA as low as the minimum for half-siblings or as high as the maximum for full-siblings.
When discussing 3/4 siblings, I usually mention that half-identical regions (HIR) are the worst way to differentiate between sibling types. Unfortunately, this is the default metric reported at every company except 23andMe, and it’s the only metric reported in centiMorgans (cM) at AncestryDNA and MyHeritage. Fully-identical regions (FIR) are a much better tool for this task. While half-siblings share zero FIR, 3/4 siblings share 12.5%, on average, and full-siblings share 25%, on average. A third and final metric that will be discussed here is identical by descent (IBD) sharing. This is what’s normally reported in scientific journals. It consists of both the HIR and FIR segments. Since the usual solution to see FIR is to upload to GEDmatch and since cM are different at each site, GEDmatch numbers will be used in this article unless otherwise noted.
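The averages quoted above can be reproduced with a little arithmetic. The sketch below is only an illustration, not part of the analysis in this article. It assumes that the two sides of the family are inherited independently, that HIR totals also cover the FIR regions (as they appear to at GEDmatch), and that one copy of the genome is half of the 7,174 cM GEDmatch total mentioned in the note under Table 1. All function and constant names are made up for the example.

```python
from fractions import Fraction

GENOME_CM = 7174  # both copies of the genome at GEDmatch, per the Table 1 note

# Average chance that two children inherit the same DNA through one side of
# the family, given how their parents on that side are related.
SAME_PARENT = Fraction(1, 2)
CLOSE_RELATIVE_PARENTS = Fraction(1, 4)  # full siblings, or parent and child
UNRELATED_PARENTS = Fraction(0)

def expected_sharing(side_a, side_b):
    """Return the average (HIR cM, FIR cM, IBD cM) for two children.

    side_a and side_b are the per-side sharing chances defined above.
    Assumption: the two sides are independent and HIR includes FIR regions.
    """
    fir_fraction = side_a * side_b                  # matching on both copies
    hir_fraction = side_a + side_b - fir_fraction   # matching on at least one copy
    one_copy_cm = Fraction(GENOME_CM, 2)
    hir_cm = hir_fraction * one_copy_cm
    fir_cm = fir_fraction * one_copy_cm
    return float(hir_cm), float(fir_cm), float(hir_cm + fir_cm)

for name, other_side in [("half-sibling", UNRELATED_PARENTS),
                         ("3/4 sibling", CLOSE_RELATIVE_PARENTS),
                         ("full-sibling", SAME_PARENT)]:
    hir, fir, ibd = expected_sharing(SAME_PARENT, other_side)
    print(f"{name}: HIR {hir:.1f} cM, FIR {fir:.1f} cM, IBD {ibd:.1f} cM")
```

Under these assumptions the FIR fractions come out to 0, 1/8, and 1/4 of one copy of the genome, which matches the 0%, 12.5%, and 25% averages given above.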
Above I claimed that I’ve found a way to tell the difference between 3/4 siblings and full-siblings over 94% of the time. Let’s see how well these different metrics do.
I’m going to start by finding how well the HIR metric predicts sibling types. I used a dataset of 500,000 full-siblings and 500,000 3/4 siblings. Each was labeled with the true relationship. Then I guessed at an HIR cutoff, in cM, that would separate the two groups as cleanly as possible. I expected it to fall between 32.8% and 36% HIR, since that’s where the lower limit of the 95% confidence interval for full-siblings is near the upper limit of the 95% confidence interval for 3/4 siblings. Anything over the cutoff would get a predicted label of “full-sibling” and anything under it would get a predicted label of “3/4 sibling.” I would then count how many of the predicted labels matched the true labels and divide by the one million data points. After guessing at a value for the cutoff, I would try other values and see if the fraction of correct predictions improved or got worse, continuing in the direction of improvement until finding the best value. The best HIR predictions came from a cutoff of 2,476.0 cM. Using this value, one will predict the right sibling relationship 89.74% of the time, which is surprisingly good. This value roughly splits the zone of overlap shown in Figure 1.
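The cutoff search described above can be sketched in a few lines. This is a minimal illustration, not the program used for this article: it scans every candidate cutoff in a range instead of stepping in the direction of improvement, and the data are synthetic, drawn from normal distributions whose means and spread are invented for the demo. The function names are hypothetical.

```python
import random

def accuracy(cutoff, values, labels):
    """Fraction of correct predictions when values >= cutoff are called full-siblings.

    labels holds True for a true full-sibling and False for a true 3/4 sibling.
    """
    correct = sum((value >= cutoff) == is_full
                  for value, is_full in zip(values, labels))
    return correct / len(values)

def best_cutoff(values, labels, low, high, step=1.0):
    """Scan cutoffs from low to high and return (cutoff, accuracy) for the best one."""
    best = (low, accuracy(low, values, labels))
    cutoff = low + step
    while cutoff <= high:
        score = accuracy(cutoff, values, labels)
        if score > best[1]:
            best = (cutoff, score)
        cutoff += step
    return best

# Synthetic stand-in data: these means and the spread are NOT from the article.
rng = random.Random(0)
values = ([rng.gauss(2700, 180) for _ in range(2000)]      # pretend full-siblings
          + [rng.gauss(2250, 180) for _ in range(2000)])   # pretend 3/4 siblings
labels = [True] * 2000 + [False] * 2000

cutoff, score = best_cutoff(values, labels, low=2300, high=2650)
print(f"best cutoff: {cutoff:.0f} cM, accuracy: {score:.2%}")
```

Because the data are invented, the printed cutoff and accuracy will not match the 2,476.0 cM and 89.74% reported above. Only the procedure is the point.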
Figure 1. HIR cM vs. FIR cM for full-siblings and 3/4 siblings. The red line shows the best HIR cutoff value to use in order to differentiate between full-siblings and 3/4 siblings.
Next up is the FIR metric. You can see FIR at 23andMe, or at GEDmatch by clicking the “Show only Full-Match (FIR) segments” checkbox in the One-to-one comparison tool. The methodology was the same as for HIR, only I anticipated a value between 16.4% and 20% FIR. The best prediction was made at 645.1 cM, which appears to split the overlap zone horizontally in Figure 2. Guessing that 645.1 cM or greater is a full-sibling match and that a lower value is a 3/4 sibling match will be correct a whopping 94.39% of the time.
Figure 2. HIR cM vs. FIR cM for full-siblings and 3/4 siblings. The red line shows the best FIR cutoff value to use in order to differentiate between full-siblings and 3/4 siblings.
It’s hard to imagine an improvement on the last method. But Figures 1 & 2 suggest that a diagonal line, which could be obtained by using a total IBD cutoff rather than HIR or FIR, would probably be a better predictor than a vertical or horizontal line. It turns out that IBD sharing does provide us with more accurate results. A cutoff value of 3,126 cM will predict the right relationship 94.49% of the time. That IBD amount is shown by a diagonal line in Figure 3.
Figure 3. HIR cM vs. FIR cM for full-siblings and 3/4 siblings. The red line shows the best IBD cutoff value to use in order to differentiate between full-siblings and 3/4 siblings.
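To make the three cutoffs concrete, here is a small sketch that applies each one to a single match. It uses the GEDmatch-scale cutoffs reported in this article and assumes that total IBD is simply HIR cM plus FIR cM, which is my reading of “both the HIR and FIR segments.” The function names and the example pair are hypothetical.

```python
# Cutoffs reported in this article for GEDmatch-scale values, in cM.
HIR_CUTOFF = 2476.0
FIR_CUTOFF = 645.1
IBD_CUTOFF = 3126.0

def predict(value_cm, cutoff_cm):
    """Call a match a full-sibling at or above the cutoff, otherwise a 3/4 sibling."""
    return "full-sibling" if value_cm >= cutoff_cm else "3/4 sibling"

def predict_all(hir_cm, fir_cm):
    """Apply the HIR, FIR, and IBD cutoffs to one match."""
    ibd_cm = hir_cm + fir_cm  # assumption: total IBD = HIR cM + FIR cM
    return {
        "HIR": predict(hir_cm, HIR_CUTOFF),
        "FIR": predict(fir_cm, FIR_CUTOFF),
        "IBD": predict(ibd_cm, IBD_CUTOFF),
    }

# A made-up pair that falls inside the overlap zone.
print(predict_all(hir_cm=2500.0, fir_cm=600.0))
```

For this made-up pair the HIR cutoff says full-sibling while the FIR and IBD cutoffs say 3/4 sibling, which is the kind of disagreement to expect for values in the overlap zone.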
It seems as though a curved line, maybe one with a backwards “s” shape, would make an even better prediction. But I’m going to save that for another day. There is one more method I’d like to explore, though. K-means is a powerful machine learning technique used for clustering. I ran a k-means program on all one million data points. The k-means algorithm predicts the correct sibling type 94.56% of the time. This is only a slight improvement over using an IBD cutoff, so it probably isn’t worth it for a person to do a k-means cluster on one million data points and then check to see which cluster a match falls into.
Figure 4. HIR cM vs. FIR cM for full-siblings and 3/4 siblings. Data points have been clustered by the k-means algorithm. Since the labels are predicted, not all labels are correct.
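For readers curious about what a k-means program does, here is a bare-bones version of the standard algorithm (often called Lloyd’s algorithm) for points in the HIR/FIR plane. It is a sketch under my own assumptions, not the program used for Figure 4: the starting centroids, the fixed number of iterations, and the synthetic data are all invented for the example.

```python
import random

def assign(points, centroids):
    """Return, for each point, the index of the nearest centroid."""
    return [
        min(range(len(centroids)),
            key=lambda k: (x - centroids[k][0]) ** 2 + (y - centroids[k][1]) ** 2)
        for x, y in points
    ]

def kmeans(points, centroids, iterations=20):
    """Alternate between assigning points and moving centroids to cluster means."""
    centroids = list(centroids)
    for _ in range(iterations):
        labels = assign(points, centroids)
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return centroids, assign(points, centroids)

# Synthetic (HIR cM, FIR cM) points: these distributions are NOT from the article.
rng = random.Random(1)
points = ([(rng.gauss(2700, 100), rng.gauss(900, 100)) for _ in range(500)]
          + [(rng.gauss(2250, 100), rng.gauss(450, 100)) for _ in range(500)])

# Start from one point in each synthetic group (an arbitrary choice).
centroids, labels = kmeans(points, [points[0], points[-1]])
print(centroids)
```

Note that k-means never sees the true labels. It only groups nearby points, which is why some of the predicted labels end up wrong.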
In case anyone finds this helpful, these are the coordinates of the centroids (cluster centers) in Figure 4:
- Full-Sibling: (HIR, FIR) = (37.72%, 25.56%) = (2,706.0 cM, 917.0 cM)
- 3/4 Sibling: (HIR, FIR) = (31.4%, 12.7%) = (2,251.9 cM, 453.8 cM)
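With these centroids in hand, checking which cluster a match falls into does not require rerunning k-means. The sketch below assigns a match to the nearest centroid. It assumes straight-line (Euclidean) distance measured in cM; if the clustering was done on percentages instead, the boundary would differ somewhat, because HIR and FIR percentages are scaled differently. The function name and the example values are hypothetical.

```python
import math

# Centroids listed above, as (HIR cM, FIR cM).
CENTROIDS_CM = {
    "full-sibling": (2706.0, 917.0),
    "3/4 sibling": (2251.9, 453.8),
}

def nearest_cluster(hir_cm, fir_cm):
    """Return the name of the centroid closest to the match (assumes distance in cM)."""
    return min(CENTROIDS_CM,
               key=lambda name: math.dist((hir_cm, fir_cm), CENTROIDS_CM[name]))

# Two made-up matches.
print(nearest_cluster(2600.0, 800.0))
print(nearest_cluster(2500.0, 600.0))
```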
Another method that has been used to differentiate between the two relationships is a log likelihood ratio. That and the k-means clustering method might not be accessible to most genetic genealogists. However, the cutoff values and the tools used here are very accessible.
Table 1 below summarizes the best cutoff value to use along with the accuracy achieved for each method.
Table 1. Comparison of all three metrics discussed above along with the accuracy for each method. †Please remove any X-DNA from 23andMe data before doing the analysis described here as X-DNA only confounds relationship prediction. *FIR percentages are reported as a proportion of one copy of the genome, unlike HIR percentages. To convert from FIR percentage to cM at GEDmatch, for example, one would have to multiply 17.985% by 7,174 cM, divide by 100%, and also divide by 2.
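The conversion described in the note above can be written as a short helper. This is a sketch with hypothetical function names. The FIR formula follows the note directly, while the HIR formula is my inference from the phrase “unlike HIR percentages” and from the centroid values given earlier.

```python
GEDMATCH_TOTAL_CM = 7174.0  # total used in the note above (both copies of the genome)

def fir_percent_to_cm(fir_percent, total_cm=GEDMATCH_TOTAL_CM):
    """FIR percentages are a proportion of ONE copy of the genome, so halve the total."""
    return fir_percent * total_cm / 100.0 / 2.0

def hir_percent_to_cm(hir_percent, total_cm=GEDMATCH_TOTAL_CM):
    """Assumption: HIR percentages are a proportion of both copies of the genome."""
    return hir_percent * total_cm / 100.0

print(f"{fir_percent_to_cm(17.985):.1f} cM")
print(f"{hir_percent_to_cm(37.72):.1f} cM")
```

The first line reproduces the 645.1 cM FIR cutoff, and the second comes out close to the full-sibling HIR centroid of 2,706.0 cM.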
Getting predictions with over 94% accuracy is great, but if a value is very close to the cutoff then there’s a lot less certainty. For this reason, I always recommend plugging these values into a double cousin relationship predictor to see exactly how likely the options are. The cutoff values shown above are good for a quick check—maybe some people will even memorize that 2,399 cM is the cutoff for AncestryDNA and will get an idea of what to expect before using the relationship predictor. And hopefully you have an IBD value close to one of the means, as shown in Figure 5. (These screenshots were taken from an older version of the double cousin predictor.)
Figure 5. Results displayed in the multiple cousin relationship predictor when the IBD value equals the average for the most likely relationship: full-sibling on the left and 3/4 sibling on the right.
If you share anything near the average with your 3/4 or full sibling, you’re going to get a very strong prediction for the correct relationship. That’s a relief. We see that relationship predictors really can tell the difference. And we’ve learned that choosing the most likely option each time will result in over 94% accuracy in differentiating between full-siblings and 3/4 siblings!