People often ask a question similar to this one: If my ethnicity report is showing 5% Iberian, how far back should I expect to find an ancestor of mine who was Iberian? There’s a mathematical way to estimate the answer to that question. I’ll discuss that below and show some of the results I’ve calculated.

The first thing to note, because people will definitely point it out if I don’t, is that all of that Iberian didn’t necessarily come from just one ancestor. In that case, you might have a particular ancestor n generations back who was 75% Iberian and another ancestor in that generation who was 10% Iberian. There could even be more than two and their percentages of Iberian could be anything from 0% to 100%, although if there were only two, both of whom were 100% Iberian, and they’re one of your ancestor pairs, then you should be looking one generation more recent. Their child, who is one of your ancestors, also would have been 100% Iberian. Another thing to point out is that a small percentage such as 0.1% of a certain ethnicity is very likely false.

There’s one more caveat, and this is one I actually *want* to mention. There are issues with ethnicity reports that have been widely discussed. Many people call them inaccurate, and that’s probably a fair argument. People also say that the reports will get better, but that’s a pretty misleading statement. The reports will get better at categorizing the DNA that you have in your genome, but that DNA will always be abysmally bad at predicting your ancestors’ ethnicity. My favorite thing to say about ethnicity reports is that, on average, I’m missing 31/32 of the DNA from each of my ancestors who were born in the early 1800s. They may have had a lot of different ethnicities on segments that I didn’t get. Chances are that that’s true for the vast majority of DNA testers. The only way it isn’t true for a person is if every ancestor in the generation in question was 100% of a single ethnicity, which is exceedingly unlikely, and even then it would have been impossible for the descendant who tested to get the same amount of DNA from each of them.
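The 31/32 figure comes from halving the expected share of an ancestor’s DNA once per generation. Here is a minimal sketch of that arithmetic, assuming the ancestors in question are five generations back (an assumption chosen because it matches 31/32) and using a hypothetical helper name, `missing_fraction`:

```python
from fractions import Fraction


def missing_fraction(generations_back: int) -> Fraction:
    """Average fraction of one ancestor's DNA that a descendant did not inherit."""
    # On average, the inherited share halves with every generation.
    inherited = Fraction(1, 2 ** generations_back)
    return 1 - inherited


# Assumption: ancestors born in the early 1800s are five generations back.
print(missing_fraction(5))
```

These are averages only. The actual amount inherited from any one ancestor varies around that value.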

In genetic genealogy, we enjoy the number of matches that come into databases because of people who are interested in seeing an ethnicity report, so please don’t be discouraged from getting your DNA genotyped. We think that you’ll really come to enjoy the field, or at least get an unhealthy addiction to it.

I’m going to ignore the possibility that a particular ethnicity came from more than one ancestor in a given generation, because it’s good to first answer a question by using the simplest case. Here’s an equation I developed back in 2015, the first time I ever combined mathematics and genetic genealogy. Where n is the number of generations back from you and perc. is the percentage of a given ethnicity in your report,

$$
n = \frac{\ln\left(\frac{100\%}{\text{perc.}}\right)}{\ln(2)}
$$

So, if you’re wondering how many generations back it is for 25%, the percentages cancel out. Once simplified, you’re taking the natural log of 4 and dividing it by the natural log of 2 in that case. The table below shows what we get if we apply this equation to a bunch of different percentages that you might find in your ethnicity report.
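The calculation described above (the natural log of 100% divided by your percentage, over the natural log of 2) can be sketched in a few lines of Python. The function name `generations_back` and the sample percentages are just illustrative choices, and the single-ancestor assumption still applies:

```python
import math


def generations_back(percent: float) -> float:
    """Estimated generations back to one 100% ancestor, given a reported percentage."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be greater than 0 and at most 100")
    # The percent signs cancel, leaving ln(100 / percent) / ln(2).
    return math.log(100 / percent) / math.log(2)


# Illustrative percentages only; these are not taken from any particular report.
for percent in (50, 25, 12.5, 5, 1):
    print(f"{percent}% -> {generations_back(percent):.2f} generations")
```

For the 5% Iberian question at the top of the post, this works out to about 4.3 generations.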

**Table 1**. The number of generations back you’d expect to find a given ancestor based on the percentage of a given ethnicity given to you in an ancestry report, all after making the likely poor assumption that all of this ethnicity came from only one ancestor.

As you can see, the case of 25% of a given ethnicity gives us exactly the number of generations that we’d expect. It’s two generations ago, i.e. one of your four grandparents, who each gave you 25% of your DNA, on average. Obviously, an ancestor can’t be a decimal number of generations away from you. In those cases, the best we can do is round to the nearest whole number. If that doesn’t lead to the ancestor you’re looking for, you could try rounding in the opposite direction.
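The rounding advice above can be sketched as follows. This is only an illustration: `candidate_generations` is a hypothetical helper that returns the nearest whole generation first and the neighboring generation in the opposite direction second.

```python
import math


def candidate_generations(percent: float) -> tuple[int, int]:
    """Return (nearest whole generation, the neighbor in the opposite direction)."""
    n = math.log(100 / percent) / math.log(2)
    nearest = round(n)
    # If n was rounded down (or is already whole), the alternative is one
    # generation further back; if it was rounded up, one generation closer.
    other = nearest + 1 if n >= nearest else nearest - 1
    return nearest, other


first_guess, second_guess = candidate_generations(5)
print(first_guess, second_guess)
```

For 5%, the unrounded value is about 4.3, so the first generation to check is 4 and the fallback is 5.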

Please note that the results in Table 1 are also true for any two ancestors in a given generation if their percentages of the ethnicity in question add up to 100%. The number of generations will be the same. For example, if you have an ancestor 7 generations back who was 50% Iberian and another ancestor from that same generation who was 50% Iberian, you’d expect your report to tell you that you’re about 0.8% Iberian, just like in Table 1. It works the same if one is 25% and another is 75%, or for any other percentages that add up to 100%. I think that that makes the above equation pretty useful for those who are interested in ethnicity reports.
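To check the two-ancestor claim numerically, here is a small sketch that assumes each ancestor in a generation passes down the average share of DNA (1/2ⁿ) and that ethnicity is spread evenly across that share. The function name `expected_percent` is hypothetical.

```python
def expected_percent(generations_back: int, ancestor_percents: list[float]) -> float:
    """Expected reported percentage from several ancestors in the same generation."""
    # Average share of your DNA that comes from any one ancestor in that generation.
    share = 1 / 2 ** generations_back
    # Each ancestor contributes their own percentage scaled by that share.
    return sum(p * share for p in ancestor_percents)


print(expected_percent(7, [50, 50]))  # two ancestors, each 50% Iberian
print(expected_percent(7, [25, 75]))  # same total, different split
print(expected_percent(7, [100]))     # one fully Iberian ancestor
```

All three calls give 0.78125, which rounds to the roughly 0.8% mentioned above.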

Of course, we could also calculate the expected percentage of an ethnicity based on a given number of generations. Table 2 shows those values for whole numbers of generations.

**Table 2**. The expected percentage of ethnicity you’d get from one ancestor who had 100% of that ethnicity and is a whole number of generations back from you. The first five rows include enough decimal places to show the exact average for the respective generation.
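Values of this kind can be regenerated with a short loop. The sketch below assumes a single ancestor who was 100% of the ethnicity. The ten-generation range is an arbitrary choice for illustration, not necessarily the range the table covers.

```python
def expected_percent_from_one_ancestor(generations_back: int) -> float:
    """Average percentage inherited from one 100% ancestor n generations back."""
    return 100 / 2 ** generations_back


# Arbitrary range of generations, chosen only for illustration.
for n in range(1, 11):
    print(f"{n:2d}  {expected_percent_from_one_ancestor(n)}%")
```

Generation 2 gives 25%, and generation 5 gives 3.125%.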

The way to calculate the values in Table 2 is shown as the first equation below. In fact, this is how I developed the equation used for Table 1. So, starting with the equation used for Table 2, each right arrow points to the next step until the equation for Table 1 is reached.
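Written out, the steps described above would look something like the following. This is a reconstruction from the prose, using n and perc. as defined earlier. The intermediate steps are an assumption about how the rearrangement was done.

$$
\text{perc.} = \frac{100\%}{2^{n}}
\;\rightarrow\;
2^{n} = \frac{100\%}{\text{perc.}}
\;\rightarrow\;
n \ln(2) = \ln\left(\frac{100\%}{\text{perc.}}\right)
\;\rightarrow\;
n = \frac{\ln\left(\frac{100\%}{\text{perc.}}\right)}{\ln(2)}
$$

As a check, setting perc. to 25% gives ln(4)/ln(2) = 2, matching the grandparent case.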

I hope you find these results useful. Hopefully you can now calculate the number of generations for a given percentage in your ethnicity results. There are online calculators that include natural logs, as well as the ones on your phone and computer, or if you still have a physical calculator you could use that.

*Feel free to ask me about modeling & simulation, genetic genealogy, or genealogical research. And make sure to check out these ranges of shared DNA percentages or shared centiMorgans, which are the only published values that match peer-reviewed standard deviations. That model was also used to make a very accurate relationship prediction tool. **Or, try a calculator** that lets you find the amount of an ancestor’s DNA you have when combining multiple kits. I also have some older articles that are only on Medium.*

First of all, great post! I love the math you’re using to illustrate this. I have a question.

1. Where does endogamy factor into this? For example, the Romani are highly endogamous, and they have 20 to 30% South Asian “ethnicity” at most DNA companies, yet the migration out of India happened more than 1,000 years ago.

Hi Gerald,

Thanks for your comment! It seems in that case that a Romani person couldn’t use these formulas, especially since Romani heritage appears to be reported as a combination of other ethnicities. But I think there might be some simpler cases in which we can use a formula regardless of endogamy. For example, if a person had a grandparent who was 100% Ashkenazi and their other grandparents were completely different ethnicities, the expected percentage in the tester’s report might be 25% Ashkenazi. Or some people have 100% Ashkenazi for their own ethnicity report.

If there’s a certain “signature” for an endogamous group, i.e. people who are fully of that heritage always have a combination of two or more ethnicities in a fixed proportion, we might be able to do some similar calculations. For example, if Visigoths were always of 50% Iberian ethnicity and 50% French ethnicity, then a report that shows 12.5% Iberian and 12.5% French might suggest one grandparent who was a Visigoth. But of course, you might also get those two ethnicities in other ways, depending on what they actually are.

You’re right, though: If a person’s ethnicity is Romani and it’s reported as 25% South Asian, that doesn’t indicate a South Asian grandparent.