This is a guest post by DM_Walsh
I specialise in hard-to-crack unknown parentage UK cases where there is enough DNA evidence, but only just. (Some of these cases take months to resolve, and so apologies if I can’t take yours on.)
I thought it would be helpful to pull together some charts showing clusters where the top usable match was considerably less than 100 centiMorgans (cMs). Sometimes these clusters solved the case; at other times they were great supporting evidence.
The examples I’m giving are free, as far as I know, from “founder effect” or localised endogamy.
Nobody else tested?
This is my ancestor, Benjamin, with one of his 50 grandchildren, born 1880s. But how many of these grandchildren have family who DNA tested? Four! For a long time, at AncestryDNA, it was just me, five generations down. Good luck reconstructing Benjamin’s DNA from my few cMs!
Perhaps this explains why you can sweep back five generations in a family line and find tundra – no other descendants testing. You’ll see plenty of that in this British-based article.
In the examples below, working with low cMs, we are turning off the hypothesising feature at Jonny Perl’s DNA Painter site, and looking to identify the right line by our own eyesight and experience. Let’s begin.
Finding connections on each line in unknown parentage work
I like to link the searcher to their unknown father’s four grandparents where possible. In the above example, we have matches at ~55 cM against three of the birth father’s grandparents. These are low numbers, right?
We will walk through cases where the top matches in each cluster are lower still. Sometimes these matches will be needed as supplementary evidence in an unknown father case; sometimes they are the only evidence.
OK this is ridiculous, how on earth can I pin down the searcher’s line?
Here is our first example of a low-match cluster. Do we just throw it out and await something better?
A great surname helps us hugely with unknown parentage searches. In a few clicks, we can see that several matches have the Purl surname on their tree (marked P). Let’s allow ourselves to assume the cluster is genuine. Which route would you favour as the searcher’s likely line?
- Purl brother #3? Notice they are (mostly) in the United States. I downweighted these matches, as a greater percentage of the population has tested there than in the UK. Furthermore, I’d expect a greater number of matches if this were the ancestral line.
- Purl brother #2? We put our finger in the wind. His descendant doesn’t belong to the rest of the cluster. Not convinced about this line at all.
- Purl brother #1? Needing to begin our search somewhere, line #1 got my attention, and turned out to be correct. We dug around for geographic clues as well.
With the randomness of DNA, and so few matches, we can’t expect a computer to be able to make firm predictions from low cMs. From a tree we can start to make our own conjectures. We will simply have to “put our paws into the breeze”.
Finding the right branch quickly
Here I’m searching for deaths on Ancestry trees which occurred in Somerset, England (for example), where the deceased’s mother had the last name Purl.
Believe it or not, it worked. It took me straight to the right branch of Purls, who had left their native area and migrated to where we needed them to be. Foolishly, I didn’t pursue this and ultimately relied on another small cluster to close the case.
A surname that kept appearing
In this chart we have low matches again; and not that many of them.
With only three matches in the tiny blue cluster, we are now dipping well below 20 cM. As luck would have it the two matches without trees had ‘Starts’ as a maiden name or mother’s maiden name. This information was enough to solve an unknown parentage case.
Notice we have a lot of males in a row on the chart on some lines. This will yield fewer instances of chromosome crossover per generation (on average) in the paternal line, and might account for segments of DNA ‘surviving’ several male generations in a row.
The orange matches emerged through ThruLines at a much later date. But to emphasise, it was the tiny blue cluster, three people with a top match 31 cM, that resolved the unknown parentage.
With such a woefully small cluster, and the paucity and inaccuracy of many associated online trees, I’m certain that this search could not yet have been automated.
How far down the clusters must I go?
On this case we are again trying to find clues about the paternal side. There is a single close match who will prove to be descended from a great-grandparent through an illegitimate route.
It will transpire, after 18 months’ work, that we must go down to the eighth cluster on the paternal side. That’s 160 positions from the top of the match list, quite far down when you’re looking at British DNA. This search would have been impossible without AncestryDNA’s ‘by parent’ algorithm. The algorithm did get it wrong on cluster 10, but due to its overall positive performance here, there are no hard feelings.
There are online tools that automate the Leeds clustering process; DNA Painter keeps a good list of them. This may not be necessary if you have close matches, but for those of us scraping the barrel for clues, an automated clustering tool could help.
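As a rough illustration of what such tools automate, here is a minimal Python sketch of Leeds-style grouping: work down the match list from the highest cM, give the first ungrouped match a new colour, and give the same colour to everyone on that match's shared-match list. The match names, cM values and shared-match lists are invented for the example, and the 20 to 400 cM window is an assumption rather than a rule. Real tools do a good deal more than this.

```python
from collections import defaultdict

def leeds_clusters(matches, shared, upper_cm=400.0, lower_cm=20.0):
    """Assign Leeds-style colour groups.

    matches: dict of match name -> shared cM with the searcher
    shared:  dict of match name -> set of names on that match's shared-match list
    Returns a dict of name -> sorted list of colour numbers.
    """
    # Work down the list from the highest cM, keeping only the chosen window.
    ordered = sorted(matches.items(), key=lambda kv: -kv[1])
    in_range = [name for name, cm in ordered if lower_cm <= cm <= upper_cm]
    in_range_set = set(in_range)

    colours = defaultdict(set)
    next_colour = 0
    for name in in_range:
        if colours[name]:
            continue  # already coloured via an earlier match's shared list
        next_colour += 1
        colours[name].add(next_colour)
        for other in shared.get(name, set()):
            if other in in_range_set:
                colours[other].add(next_colour)
    return {name: sorted(colours[name]) for name in in_range}

# Hypothetical data: F is too close (1500 cM) and falls outside the window.
example_matches = {"A": 310, "B": 120, "C": 95, "D": 60, "E": 31, "F": 1500}
example_shared = {
    "A": {"B", "E"},
    "B": {"A"},
    "C": {"D"},
    "D": {"C"},
    "E": {"A"},
}
result = leeds_clusters(example_matches, example_shared)
print(result)
```

In this toy data A, B and E fall into one colour group and C and D into another. A match that lands in two colour groups is worth a second look, as it may descend from two of the searcher's lines.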
The eighth cluster of the case
The eighth cluster was a long time in appearing, 18 months as stated above. It’s not the most impressive chart of the bunch – frankly bijou – but it led us home.
Once we knew the surname of the MRCA, there was enough information to find the intersects with two other clusters. Luckily, a son of the MRCA moved to the expected geographical area, so we didn’t need to look too hard. I suspect that the two original brothers were half-kin.
Once again a 31 cM match (in the absence of anything better) turns out to be the closest documented relative. Make sure you smile at your 31 cM matches. They could just be changing someone’s life.
A wrong turn
This chart looks very convincing for a few seconds. Should I be looking at descendants of George and Sarah to resolve my tree?
In the UK, unknown parentage puzzles depend on proper clustering (starting at about 400 cM) and on finding common ancestors. The two ‘Smart’ matches have nothing in common with each other.
I had abandoned common sense and started combing through matches’ trees in the hopes of spotting repeated names of people’s ancestors.
George Smart, having married a Mould, looked like good ancestral material. There was even a descendant of his wife’s sister in the match list.
But the whole ‘connection’ was a load of smoke and mirrors. This cost me a day, before I went back to the clustering.
Clustering below 20 cMs
Why on earth would I wish to cluster below 20 cMs?
Actually, I am a big advocate of clustering below 20 cMs. Yes, there are false matches, tree errors, non-paternal events, people being related through multiple genealogical routes, and events happening outside a genealogical timeframe.
However, if I want these searches themselves to be concluded within ‘a genealogical timeframe’, I need to buckle in, and head below the waterline to those sub-20 cM matches.
The big operators won’t facilitate you clustering below 20 cM, or even 25 cM. (As to whether the 20 cM cut-off for shared matching should be increased or decreased, I am an advocate for ‘leave it as it is’.)
In the above diagram, we have six lovely matches, but no clues emerging. Several in this group have questions in their own ancestry.
Look what happens when we add in the matches below 20 cM into our cluster diagram.
I tend to stop at 11 cM, but why not take the elevator down to 8 cM if you have the time or resources.
The three mini-clusters are revealed as being one healthy-looking cluster. We are not looking at a large number of matches. (That would suggest a ‘pile-up region’, where the group shares a single tiny segment that is often of little use genealogically.)
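The merging effect can be sketched in a few lines of Python. This is a toy model, not the chart above: the match names, cM values and shared-match pairs are all invented, and it assumes you have gathered the sub-20 cM shared-match pairs yourself (for instance from the matches directly), since the big sites will not list them. Matches are grouped whenever a chain of shared-match links joins them.

```python
def clusters(matches, shared_pairs, min_cm):
    """Group matches into clusters using only those at or above min_cm.

    matches:      dict of match name -> shared cM with the searcher
    shared_pairs: iterable of (name, name) pairs who are shared matches
    Returns a list of clusters, each a sorted list of names.
    """
    kept = {name for name, cm in matches.items() if cm >= min_cm}
    parent = {name: name for name in kept}

    def find(x):
        # Follow parent links to the cluster representative, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in shared_pairs:
        if a in kept and b in kept:
            parent[find(a)] = find(b)

    groups = {}
    for name in kept:
        groups.setdefault(find(name), set()).add(name)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])

# Hypothetical data: six matches above 20 cM, three below.
toy_matches = {"M1": 38, "M2": 31, "M3": 27, "M4": 25, "M5": 22, "M6": 21,
               "L1": 16, "L2": 14, "L3": 12}
toy_pairs = [("M1", "M2"), ("M3", "M4"), ("M5", "M6"),
             ("M2", "L1"), ("L1", "M3"), ("M4", "L2"), ("L2", "M5"), ("L3", "M6")]

above_20 = clusters(toy_matches, toy_pairs, min_cm=20)
down_to_11 = clusters(toy_matches, toy_pairs, min_cm=11)
print(len(above_20), "clusters at 20 cM;", len(down_to_11), "cluster(s) at 11 cM")
```

With the 20 cM cut-off the toy data splits into three pairs. Dropping to 11 cM brings in the low matches that bridge them, and everything joins into one group. Grouping by chains of links is deliberately loose, so a single false match can glue two unrelated clusters together. Treat a merge as a lead to check against trees, not as proof.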
Did the fuller cluster diagram and accompanying tree help with our case? Let’s review.
Making headway with a huge tree
Thanks to clustering below 20 cM and engaging with the matches, we were able to build up the following tree.
This tree was exceptionally daunting. Lots of descendants have simply not tested. Sadly, none of the second generation spouses could be linked up to known matches. What next?
Here are some notes:
- The 38 cM match is comparatively high, and it required iron nerve to ignore the number. Yes, it’s offputtingly high for a 4C1R position, but within range, so nothing useful can be concluded.
- The 16 cM match stalks their corner of the tree like a powerful chess piece. There was no way the searcher could be among their close family, and still be 16 cM to this match. That was useful.
- Six targeted letters were sent to descendants of Sister #3. Replies were very slim on the ground. Nobody actively volunteered to get involved in the project.
- Matches provided information about cMs shared, which helped us pin down some NPEs and introduced us to a 0 cM match. Ancestry Circles formerly provided such information.
- The matches at fewest removes from the MRCAs had the most matches within the cluster. Logic suggests that might be the case, but logic is not always our friend.
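For a feel of why 38 cM looks high for a 4C1R, here is a back-of-envelope calculation in Python. It uses the textbook expectation that full cousins of degree k, removed r times, share on average 2 × (1/2)^(2(k+1)+r) of their autosomal DNA. The 7000 cM genome total is an assumed round figure, not a value from any testing company, and the function names are my own.

```python
TOTAL_CM = 7000.0  # assumed round figure for total comparable autosomal cM

def expected_fraction(degree, removed=0, half=False):
    """Average shared fraction for cousins of a given degree and removal.

    degree:  1 for first cousins, 2 for second cousins, and so on
    removed: number of generations removed (0 for same generation)
    half:    True if only one common ancestor is shared, not a couple
    """
    if degree < 1 or removed < 0:
        raise ValueError("degree must be >= 1 and removed must be >= 0")
    fraction = 2 * 0.5 ** (2 * (degree + 1) + removed)
    return fraction / 2 if half else fraction

def expected_cm(degree, removed=0, half=False, total_cm=TOTAL_CM):
    """Average shared cM under the assumed genome total."""
    return total_cm * expected_fraction(degree, removed, half)

print(round(expected_cm(4, 1), 1))  # 4C1R
print(round(expected_cm(2), 1))     # 2C
```

Under these assumptions a 4C1R pair averages under 7 cM, so 38 cM sits several times above the mean. That average includes the many pairs at this distance who share nothing at all, and the spread around it is wide, which is why a single high number like this cannot settle the position on its own.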
Ultimately this was a very fun cluster, all the more so as it was so well hidden, deep below 20 cM. But the following factors snarled up its potential usefulness:
- three generations marrying young, back-to-back (giving too many generations and spousal marriages to chase between Sister #3 and Searcher)
- some very small family units in recent years (giving few close matches)
- some very large family units in the Victorian era (giving an overwhelming number of lines to pursue)
- no huge clues based on geography
- no proven spousal connection for two generations in a row
- a general disinterest in DNA testing among this population (harrumph!)
These factors conspired to prevent us from solving the unknown parentage case with this cluster. Though when we came to study the intersects, the relevant people in the tree leapt about a foot off the page. The case was solved, but no thanks to this cluster. It was the easily-missed eighth cluster (above) that brought us into land.
Are you really a half-sibling?
Often a half-sibling agrees to test, to cement together the information found, and hopefully to resolve the unknown parentage case. At this point, the number of cMs for half-siblings can vary hugely.
Perhaps the hairiest case was 1385 cMs with 35 shared segments. The family were very nervous, as this was not the expected relationship between these two individuals, previously thought to be much more distant. Could they be half-siblings? Clustering seemed to suggest so.
Running SegcM showed paternal half-sibling second on the list at 23.1%. (And the top option of paternal full-blood nephew could be eliminated through studying the clusters.) It’s much harder to measure against different relationships elsewhere. Sure enough, a third half-sibling tested, whose test confirmed the relationships.
Guest blogger DM_Walsh is the author of Being a Search Genie in the UK (2022), available in paperback, or wherever you get your e-books. He also has his own blog here. Opinions expressed in guest blogs belong to the author and do not necessarily reflect the opinions of DNA-Sci.