People often ask a question similar to this one: If my ethnicity report is showing 5% Iberian, how far back should I expect to find an ancestor of mine who was Iberian? There’s a mathematical way to estimate the answer to that question. I’ll discuss that below and show some of the results I’ve calculated.
The first thing to note, because people will definitely point it out if I don’t, is that all of that Iberian didn’t necessarily come from just one ancestor. In that case, you might have a particular ancestor n generations back who was 75% Iberian and another ancestor in that generation who was 10% Iberian. There could even be more than two and their percentages of Iberian could be anything from 0% to 100%, although if there were only two, both of whom were 100% Iberian, and they’re one of your ancestor pairs, then you should be looking one generation more recent. Their child, who is one of your ancestors, also would have been 100% Iberian. Another thing to point out is that a small percentage such as 0.1% of a certain ethnicity is very likely false.
There’s one more caveat, and this is one I actually want to mention. There are issues with ethnicity reports that have been widely discussed. Many people call them inaccurate and that’s probably a fair argument. They also say that they’ll get better, but that’s a pretty misleading statement. They will get better at categorizing the DNA that you have in your genome, but the DNA that you have in your genome will always be abysmally bad at predicting your ancestors’ ethnicity. My favorite thing to say about ethnicity reports is that I’m missing 31/32 of the DNA from my ancestors who were born in the early 1800s. They may have had a lot of different ethnicities on segments that I didn’t get. Chances are that that’s true for the vast majority of DNA testers. The only way it isn’t true for a person is if all of the ancestors from a generation in question had 100% of their own ethnicity, which is exceedingly unlikely, and even then it would have been impossible for the descendant who tested to get the same amount of DNA from each of them.
In genetic genealogy, we enjoy the number of matches that come into databases because of people who are interested in seeing an ethnicity report, so please don’t be discouraged from getting your DNA genotyped. We think that you’ll really come to enjoy the field, or at least get an unhealthy addiction to it.
I’m going to ignore the possibility that a particular ethnicity came from more than one ancestor in a given generation because it’s good to first answer a question by using the simplest case. Here’s an equation I developed back in 2015. This was the first time I ever combined mathematics and genetic genealogy. Where n is the number of generations back from you and perc. is the percentage of a given ethnicity in your report,

$$n = \frac{\ln(100\% / \text{perc.})}{\ln(2)}$$
So, if you’re wondering how many generations back it is for 25%, the percentages cancel out. Once simplified, you’re taking the natural log of 4 and dividing it by the natural log of 2 in that case. The table below shows what we get if we apply this equation to a bunch of different percentages that you might find in your ethnicity report.
Table 1. The number of generations back you’d expect to find a given ancestor based on the percentage of a given ethnicity given to you in an ancestry report, all after making the likely poor assumption that all of this ethnicity came from only one ancestor.
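As a rough sketch of how values like those in Table 1 could be regenerated, the snippet below applies the formula described above, n = ln(100% / perc.) / ln(2). The function name `generations_back` and the sample percentages are illustrative assumptions, not the rows of the original table.

```python
import math

def generations_back(percent):
    """Estimate generations back to a single ancestor who was 100% of an ethnicity.

    Uses n = ln(100 / percent) / ln(2), under the simplifying assumption
    that all of the ethnicity came from one ancestor.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be greater than 0 and at most 100")
    return math.log(100.0 / percent) / math.log(2)

# Sample percentages chosen for illustration only.
for percent in (50, 25, 12.5, 5, 1):
    n = generations_back(percent)
    print(f"{percent:>5}% -> {n:.2f} generations (nearest whole number: {round(n)})")
```

For the 5% Iberian question at the top of the post, this gives a little over four generations, which rounds to a 2nd great-grandparent under the single-ancestor assumption.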
As you can see, the case of 25% of a given ethnicity gives us exactly the number of generations that we’d expect. It’s two generations ago, i.e. one of your four grandparents, who each gave you 25% of your DNA, on average. Obviously, an ancestor can’t be a fractional number of generations away from you. In those cases, the best we can do is round the number to the nearest whole number. If that doesn’t lead to an ancestor you’re looking for, you could try rounding it in the opposite direction.
Please note that the results in Table 1 are also true for any two ancestors in a given generation if their percentages of the ethnicity in question add up to 100%. The number of generations will be the same. For example, if you have an ancestor 7 generations back who was 50% Iberian and another ancestor from that same generation who was 50% Iberian, you’d expect your report to tell you that you’re about 0.8% Iberian, just like in Table 1. It works the same if one is 25% and another is 75%, or for any other percentages that add up to 100%. I think that that makes the above equation pretty useful for those who are interested in ethnicity reports.
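The arithmetic behind that claim can be checked with a short sketch. It assumes each ancestor contributes their own percentage divided by 2 raised to the number of generations back, and that the ancestors are distinct people. The helper name `expected_percent` is hypothetical.

```python
def expected_percent(ancestors):
    """Sum the average contributions from several ancestors.

    `ancestors` is a list of (generations_back, percent_of_ethnicity) pairs.
    Each ancestor contributes, on average, percent / 2**generations_back.
    """
    return sum(percent / 2 ** generations for generations, percent in ancestors)

one_full = expected_percent([(7, 100)])            # one 100% ancestor
two_halves = expected_percent([(7, 50), (7, 50)])  # two 50% ancestors
uneven = expected_percent([(7, 25), (7, 75)])      # one 25% and one 75% ancestor

print(one_full, two_halves, uneven)
```

All three cases give the same average of 0.78125%, which is the "about 0.8%" mentioned above.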
Of course we could also calculate the expected percentage of ethnicity based on a given number of generations. Table 2 shows those values for whole numbers of generations.
Table 2. The expected percentage of ethnicity you’d get from one ancestor who had 100% of that ethnicity and is a whole number of generations back from you. The first five rows include enough decimal places to show the exact average for the respective generation.
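A minimal sketch for regenerating values like those in Table 2, assuming the expected share halves with each generation (100% divided by 2 to the power n). The function name `percent_from_one_ancestor` and the range of ten generations are assumptions made for illustration.

```python
def percent_from_one_ancestor(generations):
    """Average percentage inherited from one ancestor who was 100% of an ethnicity."""
    if generations < 0:
        raise ValueError("generations must be zero or greater")
    return 100.0 / 2 ** generations

# Print one row per generation; the range of 1 to 10 is an arbitrary choice.
for n in range(1, 11):
    print(f"{n:>2} generations back: {percent_from_one_ancestor(n):.5f}%")
```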
The way to calculate the values in Table 2 is shown as the first equation below. In fact, this is how I developed the equation used for Table 1. So, starting with the equation used for Table 2, each right arrow points to the next step until the equation for Table 1 is reached.
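The equations themselves don’t appear in this text, so here is one way the steps could go, consistent with the description above. The intermediate steps are a reconstruction and may differ from the original derivation.

$$\text{perc.} = \frac{100\%}{2^{n}} \;\rightarrow\; 2^{n} = \frac{100\%}{\text{perc.}} \;\rightarrow\; n \ln(2) = \ln\!\left(\frac{100\%}{\text{perc.}}\right) \;\rightarrow\; n = \frac{\ln(100\% / \text{perc.})}{\ln(2)}$$

For perc. = 25%, the last step gives n = ln(4) / ln(2) = 2, matching the grandparent case.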
I hope you find these results useful. Hopefully you can now calculate the number of generations for a given percentage in your ethnicity results. There are online calculators that include natural logs, as well as the ones on your phone and computer, or if you still have a physical calculator you could use that.
DNA-Sci: advancing the science of relationship predictions. Feel free to ask a question or leave a comment.
First of all, great post! I love the math you’re using to illustrate this. I have a question.
1. Where does endogamy factor into this? For example, the Romani are highly endogamous, and they have 20 to 30% South Asian “ethnicity” at most DNA companies, yet the migration out of India happened more than 1,000 years ago.
Hi Gerald,
Thanks for your comment! It seems in that case that a Romani person couldn’t use these formulas, especially since Romani heritage appears to be reported as a combination of other ethnicities. But I think there might be some simpler cases in which we can use a formula regardless of endogamy. For example, if a person had a grandparent who was 100% Ashkenazi and their other grandparents were completely different ethnicities, the expected percentage in the tester’s report might be 25% Ashkenazi. Or some people have 100% Ashkenazi for their own ethnicity report.
If there’s a certain “signature” for an endogamous group, i.e. people who are fully of that heritage always have a combination of two or more ethnicities with a fixed proportion, we might be able to do some similar calculations. For example, if Visigoths were always of 50% Iberian ethnicity and 50% French ethnicity, then a report that shows 12.5% Iberian and 12.5% French might suggest one grandparent who was a Visigoth. But, of course, you might have gotten those two ethnicities in other ways, depending on what they actually are.
You’re right, though: If a person’s ethnicity is Romani and it’s reported as 25% South Asian, that doesn’t indicate a South Asian grandparent.
I took the ancestry DNA test and have 12% Southern Chinese ethnicity. My cousin has 26% of the same. This is traced to our paternal grandfather. The records indicate that at least four generations were born in the UK. It seems that the 26% for my cousin is a high score for Southern Chinese ethnicity for someone whose records indicate their ancestors back to the mid-1700s were from the UK. I am a novice at this, but doesn’t that seem unusual? Thanks!
Hi Holly,
Your cousin’s results indicate that the most likely scenario is that one grandparent was Southern Chinese, assuming that all of the ethnicity came from one ancestor in a particular generation and that that ancestor had 100% of that ethnicity. If instead the ethnicity came from one generation farther back, the most likely scenario would be that it was two great-grandparents. If this cousin is your first cousin, you’d expect the source of your ethnicity to be the same: one grandparent, two great-grandparents, etc. But your 12% is more indicative of one great-grandparent. Are you saying that your paternal grandfather is suspected to have a high percentage of Southern Chinese but that his ancestors had been in the UK for four generations? It wouldn’t be unreasonable if the family were Southern Chinese. Sometimes people continue to marry into the same ethnicity despite living elsewhere.
Thanks for your response. I was wrong about one thing: the person with 26% Southern Chinese ethnicity was my uncle (my father’s brother). Mine is 12% and my cousins are all close to that. You are saying that it is likely that one of my uncle’s grandparents (or two great-grandparents) was from China, correct? The birth records for both grandparents’ ancestors indicate UK births back to the late 1700s. UK births go back to my great-great-grandparents and my uncle’s great-grandparents. So that data is out of sync with this model. I’d love to know if you have any further thoughts about these percentages. I think I need to focus on the earliest ancestors of my grandfather because of what he looked like and since his name is Spanish. I think I need to look for the birth certificates of his grandparents to see if their parents are named (my great-great-great-grandparents). People weren’t hopping planes back then, but I know that Britain and Spain were trade rivals in the South China Sea for centuries. Thanks!
Actually, I skipped a generation. Sorry. It’s confusing since they named their kids after mother and father. The UK births go back another generation. So my uncle’s great-great-grandparents were born in the UK according to birth records.
My DNA is 32% Scottish, mostly from my mother, with a smaller amount from my father.
I’m also 16% Irish, all from my mother.
My brother is 53% Scottish and 13% Irish, all from our mother.
We know our ancestors on our mother’s side were Scottish, Irish, and English.
I want to know, given that my brother is 53% Scottish, I’m 32% Scottish,
and we are both around 13% Irish,
whether it’s possible that our 4th great-grandfather on our mother’s side came directly from Ireland. I’m thinking he was probably Scots-Irish.
My grandfather on my mother’s side wrote in a letter that his great-grandfather came here from Ireland.
In our ancestry family tree, all of our Scottish ancestors came here between the early 1700s and 1750.
Thanks
Are Basques of northern Spain and southern France classified as Iberian or French?
Hi
My grandson had 25 percent Scottish DNA (he is my son’s son). I did my DNA test and I was 33 percent Scottish. Which generation of my family would this come from? I am 79 years old.
Would be good to know where to start exploring.
Hi Jenny,
33% would typically be one grandparent, but it could also be three great-grandparents, five 2nd great-grandparents, eleven 3rd great-grandparents, etc. But it doesn’t have to be eleven 3rd great-grandparents. It could maybe be anywhere from eight to fourteen 3rd great-grandparents. Also, all of that assumes that each ancestor in a given generation is fully Scottish. That probably isn’t the case. When you go back to 3rd great-grandparents, you might have some ancestors who were fully Scottish and others who are just partially Scottish.
So my first cousin has 29% Indigenous American and my son has zero.
I have always had a question about my paternity. Our common relationship is that my cousin’s mother and my “father” are brother and sister.
Hi Suzanne,
If you know your cousin and your son have DNA tested and you know their ethnicity estimations, I’m thinking you probably had your DNA tested. How many cMs do you share with your cousins? That would give a much more accurate picture than ethnicity estimations.