OK, that could be one of the worst blog titles I’ve written, but it’s intentional. When people share this post, I want the title to clearly convey the lesson.
Small Segments are Poison
We know that many small segments are false, and thus that many distant matches are false positives. I have written about small segments and distant matches many times. For a few background articles, see the following:
- “Small Matching Segments – Friend or Foe?”
- “The Danger of Distant Matches”
- “The Effect of Phasing on Reducing False Distant Matches (Or, Phasing a Parent Using GEDmatch)”
As of September 2017, the most current definitive article on the nature of false versus true small segments is “Reducing Pervasive False-Positive Identical-by-Descent Segments Detected by Large-Scale Pedigree Analysis.” The paper is available online for free (http://mbe.oxfordjournals.org/content/31/8/2212). In the paper, the researchers found that more than 67% of all reported segments shorter than 4 cM are false-positive segments. At least 60% of 4 cM segments were false-positive, and at least 33% of 5 cM segments were false-positive. The number of false-positives decreased fairly rapidly above 5 cM. See my analysis of this paper here.
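To make those percentages concrete, here is a minimal Python sketch that looks up the quoted lower bound for a segment of a given length and adds the bounds up across a list of segments. The function names are hypothetical, and the binning is my own assumption for illustration: I treat "4 cM segments" as 4.0 up to (but not including) 5.0 cM, and "5 cM segments" as 5.0 up to 6.0 cM. The paper may bin segments differently, and it quotes no figure here for longer segments, so the sketch returns nothing for those.

```python
from typing import Iterable, Optional


def min_false_positive_rate(cm: float) -> Optional[float]:
    """Return the lower-bound false-positive rate quoted above, or None.

    ASSUMPTION: the bin edges (4.0, 5.0, 6.0) are chosen for illustration;
    they are not taken from the paper.
    """
    if cm < 4.0:
        return 0.67  # "more than 67%" of segments shorter than 4 cM
    if cm < 5.0:
        return 0.60  # "at least 60%" of 4 cM segments
    if cm < 6.0:
        return 0.33  # "at least 33%" of 5 cM segments
    return None      # no rate is quoted above for longer segments


def expected_min_false(segments_cm: Iterable[float]) -> float:
    """Sum the quoted lower bounds: a floor on the expected number of false segments."""
    total = 0.0
    for cm in segments_cm:
        rate = min_false_positive_rate(cm)
        if rate is not None:
            total += rate
    return total


if __name__ == "__main__":
    # Hypothetical segment lengths in cM, not taken from any real comparison.
    example = [3.2, 4.4, 5.5, 7.4]
    print(expected_min_false(example))
```

Because the rates are lower bounds, the sum is a floor on the expected count of false segments, not an estimate of it.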
Additionally, I’ve found that 32% of my matches are not shared by either parent. See more here. And see a concise summary of similar analysis and blog posts by Debbie Kennett at “Comparing parent and child matches at AncestryDNA.”
Accordingly, I use “poison M&Ms” as an analogy (copied from my own comment at WikiTree):
I consider small segments to be “poison,” in that too many of them are false matches and we can’t tell the difference between the false segments and the real segments. I use “Poison M&Ms” as an example. If I handed someone a bowl of M&Ms and told them that 30% are poisoned and there’s no visual difference (similar to 30% of small segments of 5 cM or smaller), no one would eat the M&Ms. Similarly, we can’t use small segments because they poison our genealogical conclusions.
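The analogy gets worse as the handful of M&Ms gets bigger. As a rough sketch, assuming each small segment is false independently with the same probability (a simplification of mine, not a claim from the paper), the chance that every segment in a set is real shrinks quickly. The function name is hypothetical.

```python
def prob_all_real(n_segments: int, false_rate: float) -> float:
    """Probability that all n segments are real.

    ASSUMPTION: each segment is false independently with probability
    `false_rate`. Real segments are unlikely to be fully independent.
    """
    if n_segments < 0:
        raise ValueError("n_segments must be non-negative")
    if not 0.0 <= false_rate <= 1.0:
        raise ValueError("false_rate must be between 0 and 1")
    return (1.0 - false_rate) ** n_segments


if __name__ == "__main__":
    # Seven small segments at a 30% false rate.
    p = prob_all_real(7, 0.30)
    print(f"All real: {p:.3f}; at least one false: {1 - p:.3f}")
```

Under those assumptions, seven small segments at a 30% false rate are all real only about 8% of the time, and we still can't tell which ones are the bad ones.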
If the best science we have suggests that a significant percentage of small segments are false, and these small segments are not labeled “FALSE” or “TRUE,” is there any way we can use these segments? Or does the fact that so many of these segments are false poison all small segments?
Can We Use Small Segments?
Unfortunately, there is a pervasive belief in the genealogical field that there are ways to use small segments. The following are the two most common hypotheses I see:
- Hypothesis #1 – Sharing one or more large segments with a match means that the small segments shared with that match are real segments; AND/OR
- Hypothesis #2 – Knowing the genealogical connection with a match means that the small segments shared with that match are real segments.
However, as of September 2017, there is no evidence that these hypotheses are correct, and some evidence that they are not correct.
Using Small Segments – A Case Study
So let’s test these hypotheses with a case study. To be clear, a case study is only an anecdote, but we can draw some hypotheses from it, and perhaps it will lead to a larger study.
In this example, I am using a cluster of individuals with known genealogical relationships (see the chart below) that share both large segments and small segments. Accordingly, I am testing both Hypothesis #1 and #2.
Blaine and Mitchell are Second Cousins. They’ve both tested at AncestryDNA and have transferred to GEDmatch. Doing a default One-to-One comparison at GEDmatch (with SNP threshold of 500 and cM threshold of 7 cM), they share the following segments:
Total sharing of 253.8 cM is perfectly in the range for Second Cousins (see “August 2017 Update to the Shared cM Project”). The smallest segment is 7.4 cM.
If I lower the thresholds for the One-to-One Comparison to 250 SNPs and 3 cM, we share the following segments:
Because I lowered the thresholds, there are now 7 segments of 5 cM or smaller.
Are these valid segments? After all, there’s a known genealogical relationship, and I share many large segments with this match.
I have the advantage of being able to phase my results, as I’ve tested both my parents. When I compare my phased paternal kit to Mitchell using the same thresholds (SNP = 250 and cM = 3), I see the following segments:
With one exception, each of the small segments has disappeared. Phasing, therefore, eliminated almost all the small segments despite the known relationship and the existence of large shared segments.
The 4.4 cM segment that remains may or may not be a valid segment (see below), but the point is that many or most of the small segments are eliminated by phasing. Unfortunately, the vast majority of people are not using phased kits when analyzing segments, usually because they cannot do so.
Note that I also compared my maternal phased kit to Mitchell using the same thresholds, and there were no segments shared. These are therefore truly false segments and not maternal segments.
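The check described above can be sketched in Python: take each segment from the unphased comparison and see whether it still appears in the paternal or the maternal phased comparison. This is only an illustration of the logic, not how GEDmatch works internally. The `Segment` type, the function names, and the 50% overlap rule are my own assumptions, and segment lists would have to be entered by hand from the comparison results.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Segment:
    chrom: int   # chromosome number
    start: int   # start position (base pairs)
    end: int     # end position (base pairs)
    cm: float    # reported length in cM


def overlaps(a: Segment, b: Segment, min_fraction: float = 0.5) -> bool:
    """True if b covers at least `min_fraction` of a on the same chromosome.

    ASSUMPTION: the 50% default is an arbitrary illustrative cutoff.
    """
    if a.chrom != b.chrom:
        return False
    shared = min(a.end, b.end) - max(a.start, b.start)
    return shared > 0 and shared >= min_fraction * (a.end - a.start)


def classify(unphased: List[Segment],
             paternal: List[Segment],
             maternal: List[Segment]) -> Dict[Segment, str]:
    """Label each unphased segment by which phased comparison supports it."""
    labels: Dict[Segment, str] = {}
    for seg in unphased:
        if any(overlaps(seg, p) for p in paternal):
            labels[seg] = "paternal"
        elif any(overlaps(seg, m) for m in maternal):
            labels[seg] = "maternal"
        else:
            labels[seg] = "unsupported"  # survives in neither phased kit
    return labels
```

A segment labeled "unsupported" matches neither phased kit, which is the situation described above for most of the small segments. A segment labeled "paternal" or "maternal" has only survived phasing; that alone does not prove it is a valid segment.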
This analysis also overlooks the fact that small segments are likely to be very old, and therefore could have been inherited from any of a number of possible lines, which may or may not be the line we’ve identified.
The 4.4 cM Segment
The 4.4 cM segment that remained after phasing could be a true segment created by a recombination event. When I compared Fred to Mitchell, they also share the 4.4 cM segment in addition to many other segments:
Blanche, my aunt, also shares DNA in this region with Mitchell, but it is a larger segment:
And sure enough, when I compare Blanche and Fred (remember that they are siblings), it appears they may have a recombination event between them that explains why the segment Fred shares with Mitchell is so much smaller than the segment Blanche shares with Mitchell in the same location:
Using the full resolution view, we see a recombination event that occurred between 12,000,000 and 13,000,000:
I haven’t yet visually phased this chromosome to confirm that the segment leading up to 12.5M is the paternal segment, but there’s little doubt that it will prove to be.
What is important here, however, is that I took MANY additional steps to analyze this small segment, and I have a hypothesis as to why the 4.4 cM segment exists between Fred and Mitchell. This was only made possible, however, by having another tested individual with a larger segment at that location, where I could show there was recombination that created the smaller segment. If all I had was another distant cousin that shared the same 4.4 cM segment, I would not be able to make any conclusions about the small segment.
There is a common misconception that sharing one or more large segments with a match means that the small segments shared with that match are real segments, and that knowing the genealogical connection with a match means that the small segments shared with that match are real segments.
However, as we saw above, phasing eliminates many false matches. Even with a phased kit, we need additional information (such as additional closely-related test-takers) to analyze any small segments that may survive phasing.