If you aren’t already a member of the coolest Facebook group ever, Genetic Genealogy Tips & Techniques, you really should be! We have a friendly and engaging environment, and everyone learns something new every day!
This post is meant to answer a question or issue that is raised almost daily in the group, and that is the issue of small shared DNA segments. Although these small segments are alluring, they are the mythological sirens of the genealogical world!
Small Segments Executive Summary
Here’s a bite-sized summary of the content below:
- Many, and possibly most, small segments (7 cM and smaller) are FALSE, meaning they are NOT actually shared by the two matches, and therefore do NOT indicate shared ancestry;
- This is supported by a 2014 paper by 23andMe scientists showing that at least 33% of 5 cM phased DNA segments are false-positive (and it’s much worse for unphased segments or segments smaller than 5 cM);
- This is further supported by evidence that anywhere from 20-35% of distant matches at a testing company are not shared with either tested parent;
- This is further supported by evidence that phasing your DNA with two tested parents significantly reduces the number of matches below 10 cM (with proportionally more matches reduced as the segment size gets smaller);
- There is currently no evidence that triangulating segments or finding a paper trail provides a mechanism for distinguishing between false segments and valid segments;
- Since we can’t tell the difference between false small segments and valid small segments, we must avoid these small segments to avoid poisoning our genealogical conclusions with false data; and
- Beware any research or conclusion that uses these small segments without specifically addressing the issues that are known – based on all the scientific research and evidence gathered to date – to surround small segments.
If you’re interested in learning more, keep reading!
Small Segments In Detail
One of the most common questions in the group has to do with small segments. There’s no exact definition of “small” when it comes to small segments, but many of us define them as being a single segment of DNA of 7 cM or smaller. Others use 5 cM or smaller, while others use 10 cM or smaller. Personally, I consider segments of 7 cM or less to be “small,” although when I’m being very conservative I use a definition of 10 cM or smaller.
The issue of small segments often arises due to GEDmatch, where we can mine our matches for these small segments. Although GEDmatch sets most of its matching thresholds at 7 cM (meaning you have to share a segment of at least 7 cM with a match to be considered “a match”), we can sometimes lower that threshold.
For example, here’s a One-to-One comparison of two people that I have no reason to believe are related:
Sure enough, no DNA is shared using the default 7 cM threshold. But what happens when I lower the threshold to, say, 2 cM (and the SNP threshold to 300 SNPs per segment)?
Now they share 9 segments! If I continue to lower the cM and SNP thresholds, I can see even more shared segments. Try it for yourself: my GEDmatch Kit # is A812216. How many segments do we share when you lower the thresholds? It’s almost impossible NOT to find shared segments if you lower the SNP and cM thresholds!
Does this mean that D.C. and L.W. are related? Not from this data, no. We can make absolutely NO conclusion with this data, since we don’t know whether any of these shared segments are valid (not to mention that they’d be ancient segments, but let’s take one issue at a time).
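To make the threshold effect concrete, here is a minimal Python sketch. It is not GEDmatch's code: the `Segment` fields, the `filter_segments` helper, the 700-SNP "default" value, and the candidate segments are all invented for illustration. It only shows how the same list of candidate segments can yield nothing at a 7 cM threshold and several "shared" segments once both thresholds are lowered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A hypothetical candidate segment reported by a matching tool."""
    chromosome: int
    cm: float   # segment length in centimorgans
    snps: int   # number of SNPs in the segment


def filter_segments(segments, min_cm, min_snps):
    """Keep only segments that meet both the cM and the SNP threshold."""
    return [s for s in segments if s.cm >= min_cm and s.snps >= min_snps]


# Invented values, not real comparison output.
candidates = [
    Segment(1, 2.4, 410),
    Segment(3, 3.1, 350),
    Segment(7, 2.0, 305),
    Segment(12, 4.6, 620),
    Segment(15, 1.2, 150),
]

# 700 SNPs is an arbitrary placeholder for a stricter default SNP threshold.
default_hits = filter_segments(candidates, min_cm=7.0, min_snps=700)
lowered_hits = filter_segments(candidates, min_cm=2.0, min_snps=300)

print(len(default_hits), "segments at 7 cM;", len(lowered_hits), "segments at 2 cM / 300 SNPs")
```

Nothing about the underlying DNA changes between the two calls. Only the filter changes, which is why a longer list of segments at a lower threshold says nothing about whether those segments are valid.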
Here’s our hypothesis about small segments, based on ALL the available science:
Many small segments are FALSE, meaning they are not actually shared by the two matches, and therefore do not indicate shared ancestry.
Why do I say that? Well, let’s look at this a bit more. And below there are a bunch of additional posts if you’re interested in reading more (which, as a genetic genealogist, you should be!).
Some of the best science we have on this subject came from researchers at 23andMe, from a paper they published in 2014. In this paper, the researchers found that more than 67% of phased DNA segments shorter than 4 cM are false-positive segments! At least 60% of 4 cM phased DNA segments were false-positive, and at least 33% of 5 cM phased DNA segments were false-positive. The paper is available online for free (http://mbe.oxfordjournals.org/content/31/8/2212).
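As a rough illustration of what those percentages mean for a list of matches, the Python sketch below applies the three figures quoted above as lower bounds. The function names and the sample segment lengths are hypothetical, and the binning is my assumption (a "4 cM" segment is treated as 4.0 up to but not including 5.0 cM, and a "5 cM" segment as 5.0 up to but not including 6.0 cM). Nothing is assumed for 6 cM and above, because no figure is quoted here for those sizes.

```python
def min_false_positive_rate(cm):
    """Lower bound on the false-positive rate for a PHASED segment of this length.

    Uses only the figures quoted in the text; the bin edges are an assumption.
    Returns None where the text gives no figure.
    """
    if cm < 4.0:
        return 0.67   # "more than 67%" for segments shorter than 4 cM
    if cm < 5.0:
        return 0.60   # "at least 60%" for 4 cM segments
    if cm < 6.0:
        return 0.33   # "at least 33%" for 5 cM segments
    return None       # no figure quoted for larger segments


def min_expected_false(segment_lengths_cm):
    """Minimum expected number of false segments among those with a known bound."""
    rates = [min_false_positive_rate(cm) for cm in segment_lengths_cm]
    return sum(r for r in rates if r is not None)


# Hypothetical phased segment lengths in cM.
my_small_segments = [2.1, 3.5, 4.2, 5.0, 5.9]
print(round(min_expected_false(my_small_segments), 2), "of",
      len(my_small_segments), "expected to be false, at minimum")
```

With these five invented segments the lower bound comes to about 2.6 false segments, and the figures apply to phased data, so unphased segments would be expected to fare worse.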
I note that much more research is needed in this area. However, this paper from 23andMe is currently the best peer-reviewed research that deals with these small segments directly. In the meantime, many of us in the genealogical community have done studies ourselves to look at the phenomenon of small segments, and we’ve found some concerning issues that lead us to ignore these segments. For example, I did the following studies and published the results here on the blog:
- I showed that as much as 30% of my matches at AncestryDNA do not match either parent (and I’ve found similar results at Family Tree DNA). This suggests that there are many false positives and/or false negatives, and I hypothesize that this is most likely due to false small segments (see “The Danger of Distant Matches”).
- I showed that phasing DNA significantly reduces the number of matches or matching segments we have, which suggests that many unphased matching segments are false (see “The Effect of Phasing on Reducing False Distant Matches (Or, Phasing a Parent Using GEDmatch)”).
Many people have repeated these studies (see the links below).
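If you have both parents tested and want to repeat the first of these comparisons, the core calculation is just set arithmetic. The sketch below is one way to do it in Python, not a description of any company's tools: the function name and the match identifiers are hypothetical, and it assumes you have already exported each person's match list as a set of IDs.

```python
def fraction_matching_neither_parent(child_matches, mother_matches, father_matches):
    """Share of the child's matches that appear on neither parent's match list."""
    if not child_matches:
        raise ValueError("child has no matches to compare")
    parental = set(mother_matches) | set(father_matches)
    unshared = set(child_matches) - parental
    return len(unshared) / len(child_matches)


# Invented match IDs for illustration only.
child = {"m01", "m02", "m03", "m04", "m05", "m06", "m07", "m08", "m09", "m10"}
mother = {"m01", "m02", "m03", "x91"}
father = {"m04", "m05", "m06", "m07", "y42"}

share = fraction_matching_neither_parent(child, mother, father)
print(f"{share:.0%} of the child's matches match neither parent")
```

A match that appears on neither parent's list is either a false positive for the child or a false negative for a parent, so this figure flags a problem without saying which kind it is.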
Unfortunately, despite the published research and these duplicated studies, many people continue to use small segments.
Often, these small segments are “pseudosegments,” created by the matching algorithm weaving back and forth between unphased parental chromosomes, as shown in the following diagram. If the DNA were phased (“Phased Results”), then there would be no match to the Compared Segment.
Phasing significantly reduces false matching, although as the 23andMe paper shows, even many phased segments appear to be false matches.
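The weaving effect can be reproduced with a toy example. The Python sketch below is a deliberate simplification, not any company's matching algorithm: it assumes an unphased comparison counts a SNP as matching whenever the two people have at least one allele in common, and that a phased comparison requires one whole haplotype of each person to be identical across the run. The people, haplotypes, and function names are invented.

```python
from itertools import product


def half_identical(genotype_a, genotype_b):
    """Unphased test at one SNP: do the two genotypes have any allele in common?"""
    return bool(set(genotype_a) & set(genotype_b))


def unphased_run_matches(haplotypes_a, haplotypes_b):
    """True if every SNP in the run passes the unphased (half-identical) test."""
    genotypes_a = zip(*haplotypes_a)   # pair up the two alleles at each SNP
    genotypes_b = zip(*haplotypes_b)
    return all(half_identical(a, b) for a, b in zip(genotypes_a, genotypes_b))


def phased_run_matches(haplotypes_a, haplotypes_b):
    """True only if one whole haplotype of A equals one whole haplotype of B."""
    return any(ha == hb for ha, hb in product(haplotypes_a, haplotypes_b))


# Each person is (maternal haplotype, paternal haplotype) over 8 toy SNPs.
tester = ("AAAAAAAA", "GGGGGGGG")
compared = ("AAGGAAGG", "TTTTTTTT")   # zigzags between the tester's two haplotypes
true_cousin = ("GGGGGGGG", "TTTTTTTT")  # really does carry the tester's paternal haplotype

print("unphased:", unphased_run_matches(tester, compared))   # apparent match
print("phased:  ", phased_run_matches(tester, compared))     # match disappears
```

In this toy data the compared person passes the unphased test at every SNP only because the comparison can draw on the tester's maternal allele at some positions and the paternal allele at others. Once the haplotypes are kept separate, no single haplotype matches, while the invented `true_cousin` still matches under both tests.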
I commonly refer to small segments as being “poison,” since they potentially poison our genealogical research. If we know that a significant percentage of small segments are false (and it is likely MOST small segments when we’re talking about small, unphased segments below 7 cM), and we don’t have any reliable way of distinguishing between valid small segments and false small segments, we must consider all small segments to be poison.
To illustrate this, I use a “poison M&M” analogy:
If you’ve picked one of the M&Ms in the bowl, click HERE to see if you were poisoned!
Unfortunately, false segments are not labeled red; they “look” just like every other segment.
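The arithmetic behind the analogy is simple. If a fraction p of the segments in the "bowl" are false and you rely on n of them, the chance that at least one is false is 1 - (1 - p)^n. The short Python sketch below computes this; the function name is hypothetical, and it assumes each segment is false independently of the others, which is a simplification.

```python
def chance_of_at_least_one_false(false_rate, n_segments):
    """Probability that at least one of n segments is false.

    Assumes each segment is independently false with probability false_rate.
    """
    if not 0.0 <= false_rate <= 1.0:
        raise ValueError("false_rate must be between 0 and 1")
    if n_segments < 0:
        raise ValueError("n_segments must not be negative")
    return 1.0 - (1.0 - false_rate) ** n_segments


# 0.33 is the minimum false-positive rate quoted above for 5 cM phased segments.
for n in (1, 3, 5):
    print(n, "segments:", round(chance_of_at_least_one_false(0.33, n), 3))
```

Even at the most favorable rate quoted in this post (33% for 5 cM phased segments), a conclusion resting on three such segments has roughly a 70% chance of including at least one false segment under this assumption.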
But, But, I Have a Paper Trail!
There have been many – very good – suggestions about how to filter out false segments and use the valid small segments.
The Verified Paper Trail – For example, one suggestion is that a small segment is more likely to be real if there’s a verified paper trail showing a common ancestor. Unfortunately, there’s no support for this hypothesis. Additionally, since these small segments are likely to be many generations and many hundreds of years old, having a recent verified paper trail isn’t very meaningful. Indeed, it becomes increasingly difficult to reliably show that two people aren’t related once you go back many generations and many hundreds of years, especially since most trees get significant holes once you go back a few generations.
Shared with a Close Relative – Another suggestion is that a small segment is more likely to be real if it is shared with a close relative. Unfortunately, there’s no support for this hypothesis. It’s entirely possible that the segment is a false segment for your relative for the same reason(s) it’s a false segment for you. Or it might be a valid segment for you and for your cousin, but a pseudosegment for your shared match. One enormous problem with small segments is that a segment can be a pseudosegment for anyone who reportedly shares it.
Triangulation (Sharing with Distant Relatives) – Another suggestion is that a small segment is more likely to be real if it is triangulated with two or more distant matches. Once again, there’s no support for this hypothesis. Ann Turner recently showed in an experiment that triangulation increases the likelihood that a segment will survive phasing, although it’s impossible to know without phasing which of these triangulated segments will survive phasing. Additionally, the 23andMe paper showed that even phased segments can be false.
I do believe that there is hope for small segments. New tests and new methodologies may be able to identify which small segments are valid and which are false. Phasing, for example, is very good at eliminating many false segments, which helps ‘concentrate’ valid segments.
In the meantime, we must be careful to avoid small segments. If you see someone using small segments without specifically addressing the issues raised herein, be extremely cautious with their research and their conclusions.
Here are some links to read much, much more about the dangers of small segments:
- Small Matching Segments – Friend or Foe? (2 December 2014)
- Small Matching Segments – Examining Hypotheses (8 December 2014)
- GUEST POST – What a Difference a Phase Makes (30 March 2015)
- The Danger of Distant Matches (6 January 2017)
- The Effect of Phasing on Reducing False Distant Matches (Or, Phasing a Parent Using GEDmatch) (26 July 2017)
- The Folly of Using Small Segments as Proof in Genealogical Research (CeCe Moore, 3 December 2014)
- Tracking DNA segments through time and space (Debbie Kennett, 26 April 2015)
- When is a Match a False Positive? (Ann Raymont, July 2016)
- Comparing match tallies for family members with Family Tree DNA’s Family Finder test (Debbie Kennett, 29 July 2017)
- Comparing parent and child matches at AncestryDNA (Debbie Kennett, 6 August 2017)
- ISOGG Wiki, Identical By Descent