A year ago I wrote an article about the best way, theoretically, to use DNA kits from multiple siblings. How does it hold up to testing?

In Part 1 I described what you would need to do in order to use all of the information found in multiple siblings’ kits. It requires a chromosome browser, so if you want to use this method, you’ll have to use a platform other than AncestryDNA. I’ve since added Part 3 to this series: Testing the methods with empirical data.

If you have a DNA match and you want to know how you’re related, you’re probably going to plug the total shared centiMorgans (cM) into a relationship predictor and look at the relationship possibilities. But what if you also have access to a DNA kit from one of your siblings? The conventional wisdom has been that “twinning” is best. This involves multiplying together the probabilities from each sibling’s match to the cousin. People have also suggested that you could average the two total cM values, or that you could enter both separately and only look at relationships that show up for each one. Let’s see how these methods compare to each other.

I simulated 1,000 trials of two children, their father, and his paternal paternal 1st cousin. I chose the father’s patrilineal cousins throughout this analysis because there’s more variability in those relationships. If a method performs well in this case, it should also perform well in less variable ones. For each trial, I wrote a new row to an output file, with one column for each child’s shared DNA with the father’s 1st cousin (1C) and another column for the father’s shared DNA with his 1C. I then added several columns using different methods to approximate the shared DNA of either the children or the father.

Then I wrote a program that, for each child in each trial, looks up the probability that the match to that sibling is a 4th degree* (Group 4) relationship, which includes 1st cousins once removed (1C1R). I also looked up the probability that the cousin and the father are in Group 3, which includes 1C.

Three additional methods required looking up the probabilities for Group 3: averaging the total cM from both siblings and multiplying the amount by two; adding up distinct segments only and multiplying that total cM by 100/75; and using a multiple calculated from the amount shared between the siblings to find out how much of the father’s DNA the two siblings actually have. (Yes, you can totally do that.)
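To make these three parent proxies concrete, here is a minimal Python sketch. The function names are hypothetical, the inputs are total cM values you would copy from a match list, and the expected 25% overlap (hence 100/75) assumes each child carries 50% of the father’s DNA.

```python
from statistics import mean


def average_times_two(sib_cms):
    """Mean of the siblings' total cM with the match, doubled to stand in for the parent."""
    return 2 * mean(sib_cms)


def distinct_times_theoretical(distinct_cm):
    """Distinct cM from two siblings, scaled by 100/75 (expected coverage of the parent)."""
    return distinct_cm * 100 / 75


def distinct_times_empirical(distinct_cm, sib_overlap_pct):
    """Distinct cM from two siblings, scaled by 100/(100 - overlap).

    sib_overlap_pct is the percent of the father's DNA that both siblings
    inherited. Each child carries 50%, so together they cover
    50 + 50 - overlap = 100 - overlap percent of it.
    """
    return distinct_cm * 100 / (100 - sib_overlap_pct)


# Hypothetical example values, not taken from the simulation.
print(average_times_two([380, 360]))
print(distinct_times_theoretical(550))
print(distinct_times_empirical(550, 25.9))
```

With only two kits, doubling the average is the same as adding the two totals, and the empirical multiple equals 100/75 whenever the siblings share exactly 25% of the father’s DNA.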

Two other methods used the Group 4 probabilities rather than approximating the amount of DNA the father shares with the 1C. One is the average of the total cM shared between each sibling and the 1C. The other is the probability that you get from “twinning,” by multiplying the two siblings’ Group 4 probabilities. I normalized that probability relative to not predicting Group 4 (1 – the Group 4 probability). All of these probabilities were saved in a dataframe.
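Here is one way to read the twinning normalization, sketched in Python. The exact formula is my assumption: multiply the siblings’ Group 4 probabilities, multiply their “not Group 4” probabilities, and rescale so the two products sum to 1.

```python
from math import prod


def twinning(probs):
    """Combine per-sibling Group 4 probabilities by multiplication.

    probs: probabilities (0-1) that each sibling's match is Group 4.
    The product is normalized against the product of (1 - p), the
    probability of *not* being Group 4, so the result stays in [0, 1].
    """
    yes = prod(probs)
    no = prod(1 - p for p in probs)
    if yes + no == 0:
        raise ValueError("probabilities of exactly 0 and 1 cannot be combined")
    return yes / (yes + no)


print(twinning([0.5, 0.5]))
print(twinning([0.6, 0.6]))  # roughly 0.69
```

Because each sibling’s probability is treated as independent evidence, the combined value moves further from 0.5 than either sibling’s probability does.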

Finally, for each method, I took the difference between its probability and the probability of the father and the match being 1C. Then I averaged that difference across all trials. The resulting number, in percentage points, tells you how far off, on average, a relationship prediction will be for each method. The results are in Table 1.
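The comparison metric can be sketched as follows. I’m assuming the difference is taken as an absolute value (otherwise over- and underestimates would cancel out), and the median is included because it comes up later for twinning. The inputs are hypothetical lists with one probability per trial, in percent.

```python
from statistics import mean, median


def summarize_differences(method_pct, father_pct):
    """Mean and median absolute gap, in percentage points, between a method's
    probability and the probability from the father's own kit, per trial."""
    if len(method_pct) != len(father_pct):
        raise ValueError("need exactly one method probability per trial")
    diffs = [abs(m - f) for m, f in zip(method_pct, father_pct)]
    return mean(diffs), median(diffs)


# Three hypothetical trials (the real analysis used 1,000).
avg, med = summarize_differences([60.0, 40.0, 85.0], [50.0, 50.0, 80.0])
print(f"mean {avg:.1f} pp, median {med:.1f} pp")  # mean 8.3 pp, median 10.0 pp
```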

Table 1. Methods of combining two siblings’ kits and their results in the case when a DNA match is actually their father’s paternal paternal 1C. The methods are ordered from worst-performing at the top to best-performing at the bottom. Values are average differences, in percentage points, from the 1C group probability based on the father’s and 1C’s kits. Methods that require using the siblings’ kits as a proxy for the father, and thus use 1C group probabilities, are highlighted in purple. Labels with white highlighting indicate methods that use the probabilities for the 1C1R group.

The top-performing method was the one predicted to be the best a year ago: summing the distinct segments and then applying a multiple based on how much of the father’s DNA the children actually have when combined. Methods that approximate the father’s shared DNA with the cousin appear to perform better. Counting up the distinct segments and multiplying by 100/75 was the next best. You might think this would be difficult to do, but I describe in this article how Jonny Perl’s tools make it very easy. They’re designed to find the distinct segments you share with a match and will even calculate the total for you.
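For readers who want to see what “distinct segments” means computationally, here is a minimal sketch that merges overlapping segments from several kits so each stretch is counted once. It assumes each segment is given as a chromosome plus start and end positions on a cM scale. That is a simplification: match lists often report base-pair positions alongside each segment’s cM length. The function name and data layout are hypothetical.

```python
from collections import defaultdict


def distinct_cm(*kits):
    """Total cM covered by the union of segments from several kits.

    Each kit is a list of (chromosome, start_cm, end_cm) tuples for segments
    shared with the same match. Overlapping segments are counted only once.
    """
    by_chrom = defaultdict(list)
    for kit in kits:
        for chrom, start, end in kit:
            by_chrom[chrom].append((start, end))

    total = 0.0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:  # overlaps or touches the current run
                cur_end = max(cur_end, end)
            else:  # gap: close the current run and start a new one
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


sib1 = [("1", 10, 40), ("5", 0, 20)]
sib2 = [("1", 30, 60), ("7", 5, 15)]
print(distinct_cm(sib1, sib2))  # 80.0, versus 90 cM if the totals were simply added
```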

Averaging the total cM shared by each sibling, then doubling it and treating the result as the father’s total cM, performed better than over half of the methods. Using one child’s kit serves as a baseline for comparing the other methods; it’s also the only option when neither the parent (nor other ancestors) nor any other siblings have tested. A worthwhile method must consistently beat this baseline.

Counting up the distinct segments from both siblings’ kits and using that total as a proxy for the father’s shared cM with his 1C had the highest difference in this case, but we’ll need to see how it performs with more siblings tested. First, to help illustrate how this comparison metric works, let’s look at an example where the top-performing method was off by a near-average amount.

If you just want to see how all of the methods compare for different scenarios, you can skip ahead to the next major section.

Methodology

Amount shared and probabilities

There was one trial in which the top-performing method was off by 11.3 percentage points, the same as its average across all 1,000 trials. In this trial, the first sibling (Sib1) shared 5.20% with their 1C1R, Sib2 shared 5.07%, and their father shared 9.81% with the match, who is his 1C. Table 2 gives the amount of shared DNA for each method as well as the probability of picking the correct relationship.

Table 2. One example of how the differences in Table 1 were calculated. This is a case where the top-performing method (the sum of distinct segments times 100/66.4) was off by a near-average amount. For this table, n = 1, so please don’t take these numbers seriously; the table just shows you how to find the difference in prediction probability for one row out of the 1,000 trials that I actually conducted.

Table 2 shows the values that are needed to calculate the differences in Table 1. The values here are from just one of the 1,000 trials conducted for the case of two siblings and a father’s 1C. Methods highlighted in white show the probability of predicting Group 4 (including 1C1R). Purple highlighting indicates methods that show the probability that a match is in Group 3 (including 1C). Rows in grey are only used to calculate values for the method in the bottom row, which is the top-performing method.

How to apply the methodology

In the Table 2 case, the combined distinct DNA that the siblings shared with the cousin was 7.56%. Multiplying by 100/75 gives 10.1%, which is closer to the father’s 9.81% than the raw combined amount, but overshoots a bit. (Note in Table 1 that this method normally doesn’t overestimate or underestimate by much, as it was second best in the test of 1,000 cases above.) In this case the siblings shared 25.9% of their father’s DNA with each other, rather than the expected 25%. Since each child carries 50% of the father’s DNA, this means that, when combined, the siblings only have 50% + 50% – 25.9% = 74.1% of their father’s DNA, rather than the expected 75%. This additional information gives a better multiple than 100/75, which means that we should use 100/74.1 instead if we can find the 25.9% value above.

Let’s see what these values would look like in a relationship predictor.

Figure 1. The relationship predictions you would see for the best-performing method in an average case for two children of a father and his 1C.

Figure 1 shows the relationship predictions for the best-performing method in Table 1, which approximates the amount of DNA that the father would share with the cousin based on two children’s kits. The top prediction is “1st cousin,” which is the actual relationship between the father and the cousin, so the method predicted correctly. Keep in mind that this is only one case out of 1,000, meant to show how one would apply these methods, and it shouldn’t be taken too seriously. It’s the averages in Table 1 that matter.

The difference between the method’s probabilities and actual probabilities

The next step in the process is to find the difference between the probability from each method and the 84.1% actual probability that the father’s match was in Group 3. For Sib1, this is 55.4% – 51.8% = 3.6 percentage points, which is a whole lot better than the 26.7 percentage point average difference that Table 1 shows after doing this 1,000 times. The average of the two kits resulted in a 53.4% chance of predicting Group 4 in this one case. That results in a difference of 1.6 percentage points, which is much better than the average difference of 14.1. The average times two as a proxy for the father’s kit normally performs much better than the average as a proxy for the children’s kits. However, in this case, the average times two is off by 13.2 percentage points, over twice the value 6.10 in Table 1. The sum of the distinct segments with no multiple performs very poorly with only two siblings and their 1C1R. In this case it was off by 45.8 percentage points, but it will get much better as we add kits and increase the degree of relationship. The distinct segments times the theoretical multiple (100/75) was only off by 8.2 percentage points in this trial, which is a little better than usual. This one example was chosen such that the top-performing metric would be off by an average amount, which was 11.3 percentage points.

After conducting this experiment 1,000 times, we get the average values found in Table 1. We then have a clear idea of which methods perform best.

This concludes the methodology portion of the article.

Back to the analysis

So far we’ve only covered the case for when a father has a 1st cousin and when two of his children have had their DNA tested. We’ve also used that as a test case to demonstrate how all of the metrics are calculated. We don’t know if these methods will rank the same for different cases, so let’s see what happens when adding more siblings’ kits for a father’s 1C.

Table 3. Methods of combining three siblings’ kits and their results in the case when a DNA match is actually their father’s paternal paternal 1C. Methods, measurements, units, and highlighting are the same as in Table 1. Lower values are better.

Table 3 shows that the methods that approximate the shared DNA between the father and the cousin all perform better than the methods that approximate the children’s shared DNA with the cousin. This trend will generally hold for the rest of this analysis. What that means is that we should be putting a little thought into how we take advantage of multiple siblings’ kits. So far, people haven’t been getting much out of multiple siblings’ kits. The most popular method is the worst-performing one.

Table 4. Methods of combining four siblings’ kits and their results in the case when a DNA match is actually their father’s paternal paternal 1C. Methods, measurements, units, and highlighting are the same as in Table 1. Lower values are better.

The best-performing method isn’t used with four or more children’s kits. It’s an interesting consequence of set theory that calculating the union of three kits is pretty easy, while doing so for four or more is exceedingly difficult. I explain more in Part 1 of this series. Even if you figured out which terms to add and subtract, you wouldn’t want to then go and find all of those values.
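One way to see why the work grows so quickly is inclusion-exclusion: the union of n kits needs 2^n − 1 terms (every single kit, every pair, every triple, and so on). The sketch below is my illustration, not code from Part 1, and the `overlap` callback is a hypothetical stand-in for however you measure what a group of kits has in common. Plugging in expected values (each child carries 50% of the father’s DNA, so any k children are expected to share 100 × 0.5^k percent of it) recovers the 75% and 87.5% coverage figures used in this series.

```python
from itertools import combinations


def union_by_inclusion_exclusion(n_kits, overlap):
    """Return (union, number_of_terms) for n kits via inclusion-exclusion.

    overlap(subset) is a hypothetical callback returning the amount that every
    kit in `subset` (a tuple of kit indices) has in common.
    """
    total = 0.0
    terms = 0
    for k in range(1, n_kits + 1):
        sign = 1 if k % 2 == 1 else -1  # add singles, subtract pairs, add triples...
        for subset in combinations(range(n_kits), k):
            total += sign * overlap(subset)
            terms += 1
    return total, terms


def expected_overlap(subset):
    """Expected percent of the father's DNA shared by every child in subset."""
    return 100 * 0.5 ** len(subset)


for n in (2, 3, 4, 5):
    union, terms = union_by_inclusion_exclusion(n, expected_overlap)
    print(f"{n} kits: {union}% of the father's DNA, {terms} terms")
```

With real kits, each of those terms would have to be measured, which is why four or more kits quickly becomes impractical.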

Twinning again performs very poorly when four children have DNA kits; it beats only the baseline. Although its average difference from the father’s result is very high, twinning does better on the median than on the average. So twinning often isn’t off by much, but it sometimes produces a result wildly different from what the parent’s own shared DNA with their cousin would give. That makes twinning an unsafe method to use. Even on the median, it beats only one other method: taking the average of the siblings’ kits, which is incidentally the second-most popular method after twinning.

Table 5. Methods of combining five siblings’ kits and their results in the case when a DNA match is actually their father’s paternal paternal 1C. Methods, measurements, units, and highlighting are the same as in Table 1. Lower values are better.

With a fifth sibling’s kit added, twinning continues to perform the worst on average, and it again beats only one other method on the median.

Along with Tables 1-5, I’m including a graphical representation that makes it easier and quicker to see how each method compares to the rest.

Figure 2. Methods of combining one to five siblings’ kits, including oneself, and their results in the case when a DNA match is actually their father’s paternal paternal 1C. Methods, measurements, units, and highlighting are the same as in Table 1. The dark grey line serves as a baseline and is the expected value of the difference between the predictive power of one child’s kit vs. having the father’s DNA tested. A method should be able to beat this value. Lower values are better.

Interestingly, the method that performed worst with only two children’s kits appears to become the best as the number of children’s kits increases beyond five, and it’s considerably easier to apply than the top-performing method for two or three tested children. Twinning, the commonly recommended method, is generally the worst-performing one.

Five tested children is probably enough for us to understand the case of 1C. Now let’s move on to more distant cousinships and see if these methods rank any differently.

Table 6. Methods of combining two siblings’ kits and their results in the case when a DNA match is actually their father’s patrilineal 2C. Methods, measurements, units, and highlighting are the same as in Table 1. Lower values are better.

For the father’s 2C, the most popular method, twinning, now performs worse than the baseline. For the father’s 1C, twinning at least beat having no siblings tested. But since the vast majority of our matches are more distant than 1C, this means that twinning is exceptionally bad: it would be better to have no siblings tested than to use the twinning method. The methods rank the same when three or four children’s kits are used.

Table 7. Methods of combining three siblings’ kits and their results in the case when a DNA match is actually their father’s patrilineal 2C. Methods, measurements, units, and highlighting are the same as in Table 1. Lower values are better.

Table 8. Methods of combining four siblings’ kits and their results in the case when a DNA match is actually their father’s patrilineal 2C. Methods, measurements, units, and highlighting are the same as in Table 1. Lower values are better.

Figure 3. Methods of combining one to four siblings’ kits, including oneself, and their results in the case when a DNA match is actually their father’s patrilineal 2C. Methods, measurements, and units are the same as in Figure 2. The dark grey line serves as a baseline and is the expected value of the difference between the predictive power of one child’s kit vs. having the father’s DNA tested. A method should be able to beat this value. Lower values are better.

Figure 3 shows that, when a match is 2C or more distant, combined distinct segments with no multiple starts out at less of a disadvantage for two kits and gets much better with three or more kits. But, at this genealogical distance, using a multiple is still better. And, at least for the theoretical multiple, it’s very easy to apply in order to get a better result.

Surprisingly, twinning gets worse with the addition of more kits at this genealogical distance.

Table 9. Methods of combining two siblings’ kits and their results in the case when a DNA match is actually their father’s patrilineal 3C. Methods, measurements, units, and highlighting are the same as in Table 1. Lower values are better.

Table 10. Methods of combining three siblings’ kits and their results in the case when a DNA match is actually their father’s patrilineal 3C. Methods, measurements, units, and highlighting are the same as in Table 1. Lower values are better.

Table 11. Methods of combining four siblings’ kits and their results in the case when a DNA match is actually their father’s patrilineal 3C. Methods, measurements, units, and highlighting are the same as in Table 1. Lower values are better.

Figure 4. Methods of combining one to four siblings’ kits, including oneself, and their results in the case when a DNA match is actually their father’s patrilineal 3C. Methods, measurements, and units are the same as in Figures 2-3. The dark grey line serves as a baseline and is the expected value of the difference between the predictive power of one child’s kit vs. having the father’s DNA tested. A method should be able to beat this value. Lower values are better.

Combined distinct segments with no multiple appears to be the best method when a match is a parent’s 3C or more distant. Unfortunately, we don’t know our genealogical relationship to these matches, as finding that out is the purpose of our analysis. So what do we do? An iterative approach is possible: start with one method, then adjust which one you use based on these charts and on the relationship the first method suggests. But, even if you don’t do that, using one of the top three methods that take advantage of combined distinct segments is going to get you a very accurate result. And for three kits, the multiple of 100/87.5 (about 1.14) is small enough that it won’t affect your results very much anyway.
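The theoretical multiple for any number of kits follows from the same 50% assumption: n children are expected to cover 1 − 0.5^n of a parent’s DNA, so the multiple is the reciprocal of that fraction. A short sketch (the function name is hypothetical):

```python
def theoretical_multiple(n_kits):
    """Factor that scales combined distinct cM up to an estimate for the parent.

    Assumes each child inherits 50% of the parent's DNA independently, so
    n children are expected to cover 1 - 0.5**n of it.
    """
    if n_kits < 1:
        raise ValueError("need at least one kit")
    return 1 / (1 - 0.5 ** n_kits)


for n in range(1, 6):
    print(n, round(theoretical_multiple(n), 3))
```

With one kit the multiple is 2 (simply doubling a single child’s shared cM); by three kits it has shrunk to 100/87.5, and it keeps approaching 1 as more kits are added.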

The methods have ranked the same for different numbers of kits whether we’ve been analyzing 1st, 2nd, or 3rd cousins. Now let’s see how these methods compare while holding the number of siblings constant and plotting across those three degrees of cousinship.

Figure 5. Methods of combining two siblings’ kits, including oneself, and their results in the case when a DNA match is their father’s patrilineal 1st to 3rd cousin. Methods, measurements, and units are the same as in Figures 2-4. The dark grey line serves as a baseline and is the expected value of the difference between the predictive power of one child’s kit vs. having the father’s DNA tested. A method should be able to beat this value. Lower values are better.

Figure 5 shows the same trends that we’ve seen throughout this analysis, with certain methods always performing better. Many of the methods seem to stall in their performance around 2nd cousins, and twinning even performs worse for 2nd cousins than for 1st cousins. For that reason, it would be interesting to see how the methods perform for a father’s 1C1R and 2C1R.

It looks as if all methods perform great in Figure 5; even the baseline does much better with increasing degree of cousinship. But that’s meaningless in an analysis where we’re trying to find out how much better we can do by including siblings’ kits, because the baseline doesn’t include any additional siblings’ kits. The baseline appears to do better toward the right of the graph because relationship prediction probabilities are lower with increasing degree of cousinship, which leaves less room for large differences. That’s because, at lower cM values, more relationships are possible, so more relationships have to share the probability. It would therefore be useful to see the performance of these methods relative to the baseline, i.e. with a horizontal baseline, which would show the differences more clearly toward the right of the graph. Alas, I’m going to save that for another day.

Figure 6. Methods of combining three siblings’ kits, including oneself, and their results in the case when a DNA match is their father’s patrilineal 1st to 3rd cousin. Methods, measurements, and units are the same as in Figures 2-5. The dark grey line serves as a baseline and is the expected value of the difference between the predictive power of one child’s kit vs. having the father’s DNA tested. A method should be able to beat this value. Lower values are better.

We continue to see the same trends. The “combined from distinct segments” method with no multiple performs poorly for a father’s 1C, but much better for more distant relationships.

Figure 7. Methods of combining four siblings’ kits, including oneself, and their results in the case when a DNA match is their father’s patrilineal 1st to 3rd cousin. Methods, measurements, and units are the same as in Figures 2-6. The dark grey line serves as a baseline and is the expected value of the difference between the predictive power of one child’s kit vs. having the father’s DNA tested. A method should be able to beat this value. Lower values are better.

The shapes of the curves for four siblings’ kits are very similar to those for three siblings’ kits. Each method performs a little bit better with four siblings’ kits. And the “combined from distinct segments” method with no multiple performs well regardless of degree of cousinship, whereas it didn’t perform well for three siblings and a father’s 1C. But the ranks of the methods in Figure 7 are the same as for Figure 6.

I don’t think most people have been using the best methods for combining multiple siblings’ kits. In fact, the most popular method can give them worse results than if they didn’t have any siblings tested at all. Leah Larkin recommends twinning and recently said that the best-performing methods are “inventing data for the [parent].” What Larkin failed to understand is that twinning also invents probabilities, and does so incorrectly. The methods that I developed a year ago perform far better than others. And there’s at least one person who has used them. Here was his reply to me on a group post after trying my method:

There is no doubt in my mind that you have helped me solve this mystery. I can now put all three DNA matches into my tree and I have solved who was adopted out and why. Thank you. You deserve [an applause].

But that’s a dataset of one. Far more important is the analysis above, where you can see which methods perform the best. Adding up the distinct segments that siblings share with a cousin and then multiplying that by a simple fraction is much more accurate than the methods currently being used.

I hope you get a chance to take advantage of these methods. It would be an especially good idea to reconsider any results you’ve gotten from twinning. I know that the above subject matter is complicated and that you might find it confusing. I recommend concentrating on the graphs and keeping in mind that lower numbers are better. But if you have any questions, I’m glad to answer them!

The next article in this series is Part 3: Testing the methods with empirical data.

If you had access to the most accurate relationship predictor, would you use it? Feel free to ask a question or leave a comment. And make sure to check out these ranges of shared DNA percentages or shared centiMorgans, which are the only published values that match peer-reviewed standard deviations. Or, try a tool that lets you find the amount of an ancestor’s DNA you cover when combining multiple kits. I also have some older articles that are only on Medium.

*I’m using the same convention to categorize relationships that is found in the scientific literature, e.g., here. When concepts are easy to understand, such as this one, I’d like to see genetic genealogists adopt the same language as scientists. These conventions are already established, so there’s often no need to invent new ones. Research articles intuitively call relationships that share 50% (whether always or on average) “1st order relationships.” Each new degree of relationship has half the average shared DNA. So 2nd order relationships share 25%, on average. And it’s 12.5% for 3rd order, 6.25% for 4th order, etc.
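The convention in this footnote amounts to halving the expected shared DNA with each step. A tiny sketch, with the group labels taken from the examples in this article (the function name is hypothetical):

```python
def expected_shared_pct(order):
    """Average percent of DNA shared by an nth-order relationship: 50%, halved each step."""
    return 100 * 0.5 ** order


examples = {1: "1st order", 3: "Group 3 (includes 1C)", 4: "Group 4 (includes 1C1R)"}
for order, label in examples.items():
    print(f"{label}: {expected_shared_pct(order)}%")
```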