We can now easily differentiate between 3/4 siblings and full-siblings over 94% of the time
Click here to use the only relationship predictor that includes 3/4 siblings and double 1st cousins.
Three-quarter siblings have been confounding genetic genealogists for years. The problem is that their shared DNA overlaps significantly with other sibling types. And, while there are methods to tell the difference between 3/4 siblings and half-siblings, the same can’t always be done for full-siblings.
But first it’s important to understand exactly what a 3/4 sibling is and what types there are.
Here’s a common scenario that has resulted in 3/4 siblings: After a man’s wife passes away, he marries her sister. If both women had a child with that man, those children are 3/4 siblings to each other. It can also happen where 3/4 siblings share a mother and the fathers are full-siblings. There are four other types of 3/4 siblings. All six are listed below. You have a 3/4 sibling if …

 Your father had a child with your maternal aunt
 Your mother had a child with your paternal uncle
 Your father had a child with your maternal half-sister
 Your mother had a child with your paternal half-brother
 Your father had a child with your maternal grandmother
 Your mother had a child with your paternal grandfather
In none of these scenarios are the parents related to each other. This is a type of double relationship without pedigree collapse. I’ve already listed averages and ranges for each type of 3/4 sibling here.
A common refrain among genetic genealogists is that 3/4 siblings can share DNA anywhere between half-siblings and full-siblings. That is far from true. 3/4 siblings have their own range, and while it is between the other two, a 3/4 sibling won’t have shared DNA as low as the minimum for half-siblings or as high as the maximum for full-siblings.
When discussing 3/4 siblings, I usually mention that half-identical regions (HIR) are the worst way to differentiate between sibling types. Unfortunately, this is the default metric reported at every company except 23andMe, and it’s the only metric available in centiMorgans (cM) reported at AncestryDNA and MyHeritage. Fully-identical regions (FIR) are a much better tool for this task. While half-siblings share zero FIR, 3/4 siblings share 12.5%, on average, and full-siblings share 25%, on average. A third and final metric that will be discussed here is identical by descent (IBD) sharing. This is what’s normally reported in scientific journals. It consists of both the HIR and FIR segments. Since the usual solution to see FIR is to upload to GEDmatch and since cM are different at each site, GEDmatch numbers will be used in this article unless otherwise noted.
Above I claimed that I’ve found a way to tell the difference between 3/4 siblings and full-siblings over 94% of the time. Let’s see how well these different metrics do.
I’m going to start by finding how well the HIR metric predicts sibling types. I used a dataset of 500,000 full-siblings and 500,000 3/4 siblings. Each pair was labeled with the true relationship. Then I guessed at a value of HIR cM that would split them as cleanly as possible. I thought that it would be between 32.8% and 36%, since that’s where the lower limit of the 95% confidence interval for full-siblings is near the upper limit of the 95% confidence interval for 3/4 siblings. Anything over the limit would get a predicted label of “full-sibling” and anything under the limit would get a predicted label of “3/4 sibling.” I would then count how many of the predicted labels matched the true labels and divide by one million data points. After guessing at a value for the limit, I would try other values and see if the fraction of correct predictions improved or got worse, continuing in the direction of improvement until finding the best value. The best HIR predictions came from a cutoff of 2,476.0 cM. Using this value, one will predict the right sibling relationship 89.74% of the time, which is surprisingly good. This value roughly splits the zone of overlap shown in Figure 1.
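The search described above can be sketched in a few lines of Python. This is only a minimal illustration of the idea and not the program actually used for this article: the function names, the step size, and the six toy data points are all hypothetical, and values at or above the cutoff are assumed to be labeled “full.”

```python
def accuracy_at_cutoff(values, labels, cutoff):
    """Fraction of pairs whose predicted label matches the true label."""
    correct = 0
    for value, label in zip(values, labels):
        # Assumption: at or above the cutoff means full-sibling.
        predicted = "full" if value >= cutoff else "3/4"
        if predicted == label:
            correct += 1
    return correct / len(values)


def best_cutoff(values, labels, start, step):
    """Move the cutoff in whichever direction improves accuracy."""
    best = start
    best_acc = accuracy_at_cutoff(values, labels, best)
    improved = True
    while improved:
        improved = False
        for candidate in (best - step, best + step):
            acc = accuracy_at_cutoff(values, labels, candidate)
            if acc > best_acc:
                best, best_acc = candidate, acc
                improved = True
    return best, best_acc


# Hypothetical toy data (HIR in cM), NOT taken from the real dataset.
toy_hir = [2300.0, 2350.0, 2420.0, 2500.0, 2550.0, 2600.0]
toy_labels = ["3/4", "3/4", "3/4", "full", "full", "full"]

cutoff, acc = best_cutoff(toy_hir, toy_labels, start=2400.0, step=50.0)
print(cutoff, acc)
```

One caveat with this kind of search: if the step is too small, the accuracy may not change between neighboring cutoffs and the search can stall on a flat stretch, so it is worth trying more than one starting guess or step size.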
Figure 1. HIR cM vs. FIR cM for full-siblings and 3/4 siblings. The red line shows the best HIR cutoff value to use in order to differentiate between full-siblings and 3/4 siblings.
Next up is the FIR metric. You can see FIR at 23andMe, or at GEDmatch by clicking the “Show only FullMatch (FIR) segments” checkbox in the One-to-one comparison tool. The methodology was the same as for HIR, only I anticipated a value between 16.4% and 20% FIR. The best prediction was made at 645.1 cM, which appears to split the overlap zone horizontally in Figure 2. Guessing that 645.1 cM or greater is a full-sibling match and that a lower value is a 3/4 sibling match will be correct a whopping 94.39% of the time.
Figure 2. HIR cM vs. FIR cM for full-siblings and 3/4 siblings. The red line shows the best FIR cutoff value to use in order to differentiate between full-siblings and 3/4 siblings.
It’s hard to imagine an improvement on the last method. But, then, Figures 1 & 2 show that a diagonal line would probably be a better predictor than a horizontal line. It turns out that IBD sharing does provide us with more accurate results. A cutoff value of 3,126.2 cM will predict the right relationship 94.49% of the time. That line is drawn as a diagonal in Figure 3, since IBD amounts can be calculated by simply adding HIR and FIR values.
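As a quick illustration of why the summed metric behaves like a diagonal line, here is a small Python sketch that applies the IBD cutoff quoted above. The function name and the example pair are hypothetical, and treating a value exactly at the cutoff as a full-sibling is an assumption carried over from the FIR rule.

```python
# Cutoffs quoted in this article (GEDmatch cM).
HIR_CUTOFF_CM = 2476.0
IBD_CUTOFF_CM = 3126.2


def predict_by_ibd(hir_cm, fir_cm):
    """Classify a pair by total IBD, taken here as HIR cM plus FIR cM."""
    ibd_cm = hir_cm + fir_cm
    # Assumption: a value exactly at the cutoff counts as full-sibling.
    return "full-sibling" if ibd_cm >= IBD_CUTOFF_CM else "3/4 sibling"


# Hypothetical pair: the HIR alone is above the HIR cutoff,
# but the low FIR pulls the IBD total below the IBD cutoff.
hir_cm, fir_cm = 2500.0, 600.0
print(hir_cm >= HIR_CUTOFF_CM)         # HIR-only rule says full-sibling
print(predict_by_ibd(hir_cm, fir_cm))  # IBD rule says 3/4 sibling
```

Because the rule depends on the sum, a pair with high HIR but low FIR can land on the 3/4 sibling side, which is what a diagonal boundary in the HIR vs. FIR plot means.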
Figure 3. HIR cM vs. FIR cM for full-siblings and 3/4 siblings. The red line shows the best IBD cutoff value to use in order to differentiate between full-siblings and 3/4 siblings.
It seems as though a curved line, maybe one with a backwards “s” shape, would make an even better prediction. But I’m going to save that for another day. There is one more method I’d like to explore, though. K-means is a powerful machine learning technique used for clustering. I ran a k-means program on all one million data points. It turns out that it does only slightly better. The k-means algorithm predicts the correct sibling type 94.56% of the time. This is a slight improvement over using an IBD cutoff, but it probably isn’t worth it for a person to do a k-means cluster on one million data points and then check to see which cluster a match falls into.
Figure 4. HIR cM vs. FIR cM for full-siblings and 3/4 siblings. Data points have been clustered by the k-means algorithm. Since the labels are predicted, not all labels are correct.
In case anyone finds this helpful, these are the coordinates of the centroids (cluster centers) in Figure 4:

 Full-Sibling: (HIR, FIR) = (37.72%, 25.56%) = (2,706.0 cM, 917.0 cM)
 3/4 Sibling: (HIR, FIR) = (31.4%, 12.7%) = (2,251.9 cM, 453.8 cM)
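For anyone who wants to use these centroids without rerunning the clustering, a match can be assigned to whichever centroid is closer. The Python sketch below assumes plain Euclidean distance in cM on both axes; whether the original clustering was run on cM or on percentages isn’t stated here, so treat the scaling (and the function name) as an assumption.

```python
import math

# Centroids from the list above, as (HIR cM, FIR cM).
CENTROIDS_CM = {
    "full-sibling": (2706.0, 917.0),
    "3/4 sibling": (2251.9, 453.8),
}


def nearest_centroid(hir_cm, fir_cm):
    """Return the label of the centroid closest to the given point."""
    point = (hir_cm, fir_cm)
    # Assumption: unscaled Euclidean distance in cM.
    return min(CENTROIDS_CM, key=lambda name: math.dist(point, CENTROIDS_CM[name]))


# Hypothetical matches, for illustration only.
print(nearest_centroid(2700.0, 900.0))
print(nearest_centroid(2300.0, 500.0))
```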
Another method that has been used to differentiate between the two relationships is a log likelihood ratio. That and the k-means clustering method might not be accessible to most genetic genealogists. However, the cutoff values and the tools used here are very accessible.
Table 1 below summarizes the best cutoff value to use along with the accuracy achieved for each method.
Table 1. Comparison of all three metrics discussed above along with the accuracy for each method. †Please remove any X-DNA from 23andMe data before doing the analysis described here as X-DNA only confounds relationship prediction. *FIR percentages are reported as a proportion of one copy of the genome, unlike HIR percentages. To convert from FIR percentage to cM at GEDmatch, for example, one would have to multiply 17.985% by 7,174 cM, divide by 100%, and also divide by 2.
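The conversion in the note above is easy to get wrong, so here it is as a short Python sketch. The 7,174 cM total is the GEDmatch figure given in the note; the function names are hypothetical, and the HIR conversion (no division by 2) is inferred from the centroid values listed earlier rather than stated outright.

```python
GEDMATCH_TOTAL_CM = 7174.0  # total used in the note above


def fir_percent_to_cm(fir_percent, total_cm=GEDMATCH_TOTAL_CM):
    """FIR % is a share of one genome copy, so halve the total."""
    return fir_percent / 100.0 * total_cm / 2.0


def hir_percent_to_cm(hir_percent, total_cm=GEDMATCH_TOTAL_CM):
    """Assumption: HIR % is a share of the full total, with no halving."""
    return hir_percent / 100.0 * total_cm


print(round(fir_percent_to_cm(17.985), 1))  # the FIR cutoff, about 645.1 cM
print(round(hir_percent_to_cm(37.72), 1))   # full-sibling centroid HIR, about 2706.0 cM
```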
Getting predictions with over 94% accuracy is great, but if a value is very close to the cutoff then there’s a lot less certainty. For this reason, I always recommend plugging these values into the multiple cousin relationship predictor to see exactly how likely the options are. The cutoff values shown above are good for a quick check—maybe some people will even memorize that 2,399 cM is the cutoff for AncestryDNA and will get an idea of what to expect before using the relationship predictor. And hopefully you have an IBD value close to one of the means, as shown in Figure 5.
Figure 5. Results displayed in the multiple cousin relationship predictor when the IBD value equals the average for the most likely relationship: full-sibling on the left and 3/4 sibling on the right.
And now we know that choosing the most likely option each time will result in over 94% accuracy in differentiating between full-siblings and 3/4 siblings!
If you had access to the most accurate relationship predictor, would you use it? Feel free to ask a question or leave a comment. And make sure to check out these ranges of shared DNA percentages or shared centiMorgans, which are the only published values that match peer-reviewed standard deviations. Or, try a calculator that lets you find the amount of an ancestor’s DNA you have when combining multiple kits. I also have some older articles that are only on Medium.