A Small Segment Round-Up

If you aren’t already a member of the coolest Facebook group ever, Genetic Genealogy Tips & Techniques, you really should be! We have a friendly and engaging environment, and everyone learns something new every day!

This post is meant to answer a question or issue that is raised almost daily in the group, and that is the issue of small shared DNA segments. Although these small segments are alluring, they are the mythological sirens of the genealogical world!

Small Segments Executive Summary

Here’s a bite-sized summary of the content below:

  • Many to most small segments (those of 7 cM and smaller, at a minimum) are FALSE, meaning they are NOT actually shared by the two matches, and therefore do NOT indicate shared ancestry;
  • This is supported by a 2014 paper by 23andMe scientists showing that at least 33% of 5 cM phased DNA segments are false-positive (and it’s much worse for unphased segments or segments smaller than 5 cM);
  • This is further supported by evidence that anywhere from 20-35% of distant matches at a testing company are not shared with either tested parent;
  • This is further supported by evidence that phasing your DNA with two tested parents significantly reduces the number of matches below 10 cM (with proportionally more matches reduced as the segment size gets smaller);
  • There is currently no evidence that triangulating segments or finding a paper trail provides a mechanism for distinguishing between false segments and valid segments;
  • Since we can’t tell the difference between false small segments and valid small segments, we must avoid these small segments to avoid poisoning our genealogical conclusions with false data; and
  • Beware any research or conclusion that uses these small segments without specifically addressing the issues that are known – based on all the scientific research and evidence gathered to date – to surround small segments.

If you’re interested in learning more, keep reading!

Small Segments In Detail

One of the most common questions in the group has to do with small segments. There’s no exact definition of “small” when it comes to small segments, but many of us define them as being a single segment of DNA of 7 cM or smaller. Others use 5 cM or smaller, while others use 10 cM or smaller. Personally, I consider segments of 7 cM or less to be “small,” although when I’m being very conservative I use a definition of 10 cM or smaller.

The issue of small segments often arises due to GEDmatch, where we can mine our matches for these small segments. Although GEDmatch sets most of its matching thresholds at 7 cM (meaning you have to share a segment of at least 7 cM with a match to be considered “a match”), we can sometimes lower that threshold.

For example, here’s a One-to-One comparison of two people that I have no reason to believe are related:

Sure enough, no DNA is shared using the default 7 cM threshold. But what happens when I lower the threshold to, say, 2 cM (and the SNP threshold to 300 SNPs per segment)?

Now they share 9 segments! If I continue to lower the cM and SNP thresholds, I can see even more shared segments. Try it for yourself: my GEDmatch Kit # is A812216. How many segments do we share when you lower the thresholds? It’s almost impossible NOT to find shared segments if you lower the SNP and cM thresholds!

Does this mean that D.C. and L.W. are related? Not from this data, no. We can make absolutely NO conclusion with this data, since we don’t know whether any of these shared segments are valid (not to mention that they’d be ancient segments, but let’s take one issue at a time).
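To make the threshold effect concrete, here is a minimal sketch of how a pair of thresholds filters a list of candidate segments. It is not GEDmatch's algorithm. The `Segment` type, the `segments_passing` function, and the segment values are all made up purely for illustration and do not come from any real kit; only the 7 cM, 2 cM, and 300 SNP thresholds come from the example above.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    """A reported shared segment (hypothetical structure for illustration)."""
    chromosome: int
    centimorgans: float
    snps: int


def segments_passing(segments: List[Segment], min_cm: float, min_snps: int) -> List[Segment]:
    """Keep only the segments that meet both the cM and the SNP thresholds."""
    return [s for s in segments if s.centimorgans >= min_cm and s.snps >= min_snps]


# Made-up candidate segments between two people with no known relationship.
candidates = [
    Segment(1, 2.4, 350),
    Segment(3, 3.1, 410),
    Segment(6, 2.2, 520),
    Segment(9, 5.6, 640),
    Segment(14, 2.0, 310),
]

# Same SNP threshold in both calls, so only the cM threshold changes.
default_hits = segments_passing(candidates, min_cm=7.0, min_snps=300)
lowered_hits = segments_passing(candidates, min_cm=2.0, min_snps=300)

print(len(default_hits), len(lowered_hits))
```

Nothing about the underlying data changes between the two calls. Lowering the threshold only changes which candidates get reported, and it says nothing about whether any of them are valid.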

Here’s our hypothesis about small segments, based on ALL the available science:

Many small segments are FALSE, meaning they are not actually shared by the two matches, and therefore do not indicate shared ancestry.

Why do I say that? Well, let’s look at this a bit more. And below there are a bunch of additional posts if you’re interested in reading more (which, as a genetic genealogist, you should be!).

SCIENCE!

Some of the best science we have on this subject came from researchers at 23andMe, from a paper they published in 2014. In this paper, the researchers found that more than 67% of phased DNA segments shorter than 4 cM are false-positive segments! At least 60% of 4 cM phased DNA segments were false-positive, and at least 33% of 5 cM phased DNA segments were false-positive. The paper is available online for free (http://mbe.oxfordjournals.org/content/31/8/2212).

I note that much more research is needed in this area. However, this paper from 23andMe is currently the best peer-reviewed research that deals with these small segments directly. In the meantime, many of us in the genealogical community have done studies ourselves to look at the phenomenon of small segments, and we’ve found some concerning issues that lead us to ignore these segments. For example, I did the following studies and published the results here on the blog:

Many people have repeated these studies (see the links below).

Unfortunately, despite the published research and these duplicated studies, many people continue to use small segments.

Pseudosegments

Often, these small segments are “pseudosegments,” created by the matching algorithm weaving back and forth between unphased parental chromosomes, as shown in the following diagram. If the DNA were phased (“Phased Results”), then there would be no match to the Compared Segment.

Phasing significantly reduces false matching, although as the 23andMe paper shows, even many phased segments appear to be false matches.
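The weaving effect can be sketched in a few lines of code. This is a deliberately simplified toy, not any company's matching algorithm: each chromosome is a short string of made-up alleles, the compared segment is treated as a single haplotype, and the function names are my own. An unphased comparison accepts a position if the compared allele matches either parental copy, while a phased comparison must stay on one parental copy.

```python
def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def unphased_run(maternal, paternal, compared):
    """Longest apparent match when either parental allele may be used at each SNP."""
    return longest_run(c in (m, p) for m, p, c in zip(maternal, paternal, compared))


def phased_run(haplotype, compared):
    """Longest match against one specific parental chromosome."""
    return longest_run(h == c for h, c in zip(haplotype, compared))


# Made-up alleles at six consecutive SNPs (illustration only).
maternal = "ACGTAC"
paternal = "TGCATG"
compared = "AGGAAG"  # alternates between the maternal and paternal alleles

print(unphased_run(maternal, paternal, compared))  # matches across all six SNPs
print(phased_run(maternal, compared))              # never more than one SNP in a row
print(phased_run(paternal, compared))              # never more than one SNP in a row
```

In this toy, the unphased comparison reports an unbroken six-SNP match, yet neither parental chromosome matches the compared segment for more than one SNP in a row. That is the pseudosegment: a match assembled from pieces of two different chromosomes.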

Poison Segments

I commonly refer to small segments as being “poison,” since they potentially poison our genealogical research. If we know that a significant percentage of small segments are false (and it is likely MOST small segments when we’re talking about small, unphased segments below 7 cM), and we don’t have any discernible way of distinguishing between valid small segments and false small segments, we must consider all small segments to be poison.
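A bit of arithmetic shows how quickly the poison accumulates. The sketch below rests on two assumptions that are mine, not the paper's: each segment is false independently of the others, and the false-positive rate sits exactly at the lower bound quoted above (at least 33% for 5 cM phased segments, at least 60% for 4 cM). The function names and the choice of three segments are hypothetical.

```python
def chance_all_valid(false_rate: float, segment_count: int) -> float:
    """Probability that every one of the segments is valid, assuming independence."""
    return (1.0 - false_rate) ** segment_count


def expected_false(false_rate: float, segment_count: int) -> float:
    """Expected number of false segments among segment_count segments."""
    return false_rate * segment_count


# Three hypothetical 5 cM phased segments at the 33% lower bound.
print(round(chance_all_valid(0.33, 3), 3))
print(round(expected_false(0.33, 3), 2))

# Three hypothetical 4 cM phased segments at the 60% lower bound.
print(round(chance_all_valid(0.60, 3), 3))
```

Under these assumptions, a conclusion that leans on three 5 cM phased segments has only about a 30% chance that all three are valid, and the odds are worse for unphased or smaller segments.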

To illustrate this, I use a “poison M&M” analogy:

 

If you’ve picked one of the M&Ms in the bowl, click HERE to see if you were poisoned!

Unfortunately, false segments are not labeled red; they “look” just like every other segment.

But, But, I Have a Paper Trail!

There have been many – very good – suggestions about how to filter out false segments and use the valid small segments.

The Verified Paper Trail – For example, one suggestion is that a small segment is more likely to be real if there’s a verified paper trail showing a common ancestor. Unfortunately, there’s no support for this hypothesis. Additionally, since these small segments are likely to be many generations and many hundreds of years old, having a recent verified paper trail isn’t very meaningful. Indeed, it becomes increasingly difficult to reliably show that two people aren’t related once you go back many generations and many hundreds of years, especially since most trees get significant holes once you go back a few generations.

Shared with a Close Relative – Another suggestion is that a small segment is more likely to be real if it is shared with a close relative. Unfortunately, there’s no support for this hypothesis. It’s entirely possible that the segment is a false segment for your relative for the same reason(s) it’s a false segment for you. Or it might be a valid segment for you and for your cousin, but a pseudosegment for your shared match. One enormous problem with small segments is that a segment can be a pseudosegment for anyone who reportedly shares it.

Triangulation (Sharing with Distant Relatives) – Another suggestion is that a small segment is more likely to be real if it is triangulated with two or more distant matches. Once again, there’s no support for this hypothesis. Ann Turner recently showed in an experiment that triangulation increases the likelihood that a segment will survive phasing, although it’s impossible to know without phasing which of these triangulated segments will survive phasing. Additionally, the 23andMe paper showed that even phased segments can be false.

The Future

I do believe that there is hope for small segments. New tests and new methodologies may be able to identify which small segments are valid and which are false. Phasing, for example, is very good at eliminating many false segments, which helps ‘concentrate’ valid segments.

In the meantime, we must be careful to avoid small segments. If you see someone using small segments without specifically addressing the issues raised herein, be extremely cautious with their research and their conclusions.

Reading More

Here are some links to read much, much more about the dangers of small segments:


8 Responses

  1. Kathleen Reed 29 December 2017 / 8:56 pm

    Blaine,
    I really appreciate this post. For the past couple of weeks, I have been reading everything I can on the validity of small segments in relation to a geographical study I was working on. It had become abundantly clear that I could not rely on small segments. This post CLEARLY outlined the reasons why. Thanks for this.

  2. Claire Woods 31 December 2017 / 2:04 am

    Hi Blaine,
    I really appreciate all your articles and this one is no exception. I realise that my question which follows is not on the topic of this article. However, what can one do in terms of phasing if both parents are deceased?
    Thanks

  3. dermot balson 7 January 2018 / 5:17 am

    Blaine,

    thank you for a great article, explained simply and clearly

    dermot

  4. Cathy 8 January 2018 / 10:59 pm

    Hi Blaine,

    Does it mean anything if the cm’s are under 7, but the SNP’s are extremely high (4000/5000)? I have noticed this a lot on Chr. #6. Thank you!
