The Shared cM Project

For reference, here are all posts for the Shared cM Project:

Older Posts:

The portal for data submission is HERE.

TGG’s Top Posts in 2017

I started The Genetic Genealogist on February 12, 2007 with my first post, “New estimates for the arrival of the earliest Native Americans.” There were few educational resources for genetic genealogy back then, and all testing was Y-DNA and mtDNA. Although 23andMe would launch the first large-scale atDNA test a few months later in November of 2007 (see “23andMe Launches Their Personal Genome Service” announcing the $1,000 test), it would be a couple of years until they used the results for cousin matching. Today, almost 11 years later, there are 617 posts with more than 310,000 words.

Here’s a screenshot from the blog in December 2007:

This year I posted about 30 times about a wide variety of topics. Here are the most popular posts in 2017: ... Click to read more!

A Small Segment Round-Up

If you aren’t already a member of the coolest Facebook group ever, Genetic Genealogy Tips & Techniques, you really should be! We have a friendly and engaging environment, and everyone learns something new every day!

This post is meant to answer a question or issue that is raised almost daily in the group, and that is the issue of small shared DNA segments. Although these small segments are alluring, they are the mythological sirens of the genealogical world!

Small Segments Executive Summary

Here’s a bite-sized summary of the content below:

  • Many to most small segments (7 cM and smaller, at a minimum) are FALSE, meaning they are NOT actually shared by the two matches, and therefore do NOT indicate shared ancestry;
  • This is supported by a 2014 paper by 23andMe scientists showing that at least 33% of 5 cM phased DNA segments are false-positive (and it’s much worse for unphased segments or segments smaller than 5 cM);
  • This is further supported by evidence that anywhere from 20-35% of distant matches at a testing company are not shared with either tested parent;
  • This is further supported by evidence that phasing your DNA with two tested parents significantly reduces the number of matches below 10 cM (with proportionally more matches reduced as the segment size gets smaller);
  • There is currently no evidence that triangulating segments or finding a paper trail provides a mechanism for distinguishing between false segments and valid segments;
  • Since we can’t tell the difference between false small segments and valid small segments, we must avoid these small segments to avoid poisoning our genealogical conclusions with false data; and
  • Beware any research or conclusion that uses these small segments without specifically addressing the issues that are known – based on all the scientific research and evidence gathered to date – to surround small segments.

If you’re interested in learning more, keep reading!

Small Segments In Detail

One of the most common questions in the group has to do with small segments. There’s no exact definition of “small” when it comes to small segments, but many of us define them as being a single segment of DNA of 7 cM or smaller. Others use 5 cM or smaller, while others use 10 cM or smaller. Personally, I consider segments of 7 cM or less to be “small,” although when I’m being very conservative I use a definition of 10 cM or smaller. ... Click to read more!

An X-DNA Case Study

This summer, I finally asked my maternal great-aunt (we’ll call her “Victoria”) if she was interested in testing her DNA for me. She agreed, and we sent off a sample of her DNA to Family Tree DNA for mtDNA and atDNA analysis. A few weeks later, the results were back and ready for digging in!

I found a close match that shared 201 cM (including 47 cM on the X chromosome). This close match had a tree, and I immediately found the most likely common ancestor. The shared X-DNA interested me the most, and I decided to review it a little more closely.

Mom and My Great-Aunt Victoria

My mother is the daughter of Victoria’s brother. That means my mother has an exact copy of her father’s X chromosome (daughters receive a copy of their father’s single X chromosome). Any X-DNA that my mother and her aunt Victoria share is the same X-DNA that her father and Victoria shared. ... Click to read more!

How Many Segments Do You Share?

I have told people in the past that we share a single segment of IBD DNA with the vast majority of our genetic matches (where IBD means Identity-by-Descent, or a valid matching segment of DNA from a recent genealogical relationship). I usually say that we share a single segment of DNA with 99% of our matches, but that’s been an off-the-cuff estimate. I wanted to have better data to cite, so I took a closer look at this issue.

At FTDNA, you can download a list of all of your matches:

I downloaded my list and removed all of my targeted test-takers (anyone that I tested or I asked to test). These close test-takers would skew the data.

After removing them from my match list, I have a total of 2,491 matches at Family Tree DNA.

Family Tree DNA also allows you to download a list of all the segments you share with your matches: ... Click to read more!
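The counting step described above can be sketched in a few lines of Python. This is only a minimal illustration, not the method used in the post: the column names MATCHNAME and CENTIMORGANS are assumptions about the layout of the segment download, the sample rows are made up, and the min_cm threshold is a hypothetical parameter for ignoring small segments.

```python
from collections import Counter

def segments_per_match(rows, name_field="MATCHNAME",
                       cm_field="CENTIMORGANS", min_cm=0.0):
    """Count segment rows per match, ignoring segments below min_cm.

    The field names are assumed, not confirmed; adjust them to the
    header row of your own download.
    """
    return Counter(
        row[name_field] for row in rows if float(row[cm_field]) >= min_cm
    )

def share_with_single_segment(rows, **kwargs):
    """Fraction of matches that share exactly one (counted) segment."""
    counts = segments_per_match(rows, **kwargs)
    if not counts:
        return 0.0
    single = sum(1 for n in counts.values() if n == 1)
    return single / len(counts)

# Made-up rows for illustration only (not real match data).
rows = [
    {"MATCHNAME": "Match A", "CHROMOSOME": "1", "CENTIMORGANS": "12.4"},
    {"MATCHNAME": "Match B", "CHROMOSOME": "3", "CENTIMORGANS": "9.1"},
    {"MATCHNAME": "Match B", "CHROMOSOME": "7", "CENTIMORGANS": "8.0"},
    {"MATCHNAME": "Match C", "CHROMOSOME": "X", "CENTIMORGANS": "15.2"},
]

print(segments_per_match(rows))
print(share_with_single_segment(rows))
```

To run this on a real download, the rows could come from csv.DictReader applied to the segment file. Note that a match whose segments all fall below min_cm drops out of the count entirely, so the fraction depends on the threshold you choose.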

Using Shared Matches – A Quick Example

I logged into my results at AncestryDNA today, and I had a new fourth cousin match: Vivian Reese Wescott (ALL names in these screenshots are changed to protect privacy unless noted otherwise). This is a significant match to me, my 22nd closest match (not counting family members that I’ve tested). The relationship is estimated to be fourth cousin.

When I open up Vivian’s profile, I can see that she’s a new member, and likely hasn’t seen her results yet (I check frequently, but she hasn’t logged in since September 23rd). I also see that she doesn’t have a tree associated with her profile:

The first thing I would normally do is review her tree for clues as to our relationship.

Since I can’t do that, I’ll skip that step and now I’ll look to see how much DNA we share in common: ... Click to read more!

Inheritance of DNA Segments

DNA is randomly inherited. As a result, a match that shares 100 cM of DNA with a parent will likely NOT share exactly 50 cM with the parent’s child; rather, there is a range of possibilities (100 cM, 50 cM, 0 cM, and everything in between, for example). On average it will be about 50%, but there is lots of room for variation.
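A toy simulation can make this range concrete. The sketch below is a deliberately simplified model and not how inheritance really works in detail: it assumes each segment the parent shares with the match is passed to the child whole with probability 0.5, independently of the others, and it ignores recombination splitting a segment. The three segment sizes are hypothetical.

```python
import random

def simulate_child_sharing(parent_segments_cm, trials=10_000, seed=0):
    """Simulate total cM a child shares with a parent's match.

    Simplifying assumption: each segment is inherited intact with
    probability 0.5, independently, and is never split by recombination.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        total = sum(cm for cm in parent_segments_cm if rng.random() < 0.5)
        totals.append(total)
    return totals

# Hypothetical segments adding up to 100 cM shared with the parent.
segments = [40.0, 35.0, 25.0]
totals = simulate_child_sharing(segments)
mean = sum(totals) / len(totals)

print("smallest total:", min(totals))
print("largest total:", max(totals))
print("average total:", round(mean, 1))
```

Under these assumptions the average lands near half of the parent's total, while individual outcomes run all the way from nothing to everything the parent shares.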

Prompted by a great question in the Genetic Genealogy Tips & Techniques Facebook group, I used the “People who match one or both of 2 kits” tool at GEDmatch to look at the random inheritance pattern of DNA between my father and myself with regard to matches sharing about 35 cM (the examples here worked out great, but you can pick any size).

We can see the randomness of inheritance in this table. And we see a surprise (that I just discovered today with this exercise!) that reminds us of the fact that matching DNA can come from BOTH parents! ... Click to read more!

Sharing Large Segments With a Match Does Not Validate Small Segments Shared With That Match

OK, that could be one of the worst blog titles I’ve written, but it’s intentional. When people share this post, I want the title to clearly convey the lesson.

Small Segments are Poison

We know that many small segments are false, and thus that many distant matches are false positives. I have written about small segments and distant matches many times. For a few background articles, see the following:

The (most current as of September 2017) definitive article on the nature of false versus true small segments is “Reducing Pervasive False-Positive Identical-by-Descent Segments Detected by Large-Scale Pedigree Analysis.” The paper is available online for free (http://mbe.oxfordjournals.org/content/31/8/2212). In the paper, the researchers found that more than 67% of all reported segments shorter than 4 cM are false-positive segments. At least 60% of 4 cM segments were false-positive, and at least 33% of 5 cM segments were false-positive. The number of false-positives decreased fairly rapidly above 5 cM. See my analysis of this paper here. ... Click to read more!
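As a rough worked example of what those percentages imply, the sketch below turns the three figures quoted above into a lower bound on the expected number of false segments in a list. The binning is my assumption, not the paper's: "4 cM segments" is read as 4 cM up to (but not including) 5 cM, and "5 cM segments" as 5 cM up to 6 cM. No figure is quoted above that range, so larger segments contribute nothing to the bound (which does not mean they are all valid). The segment sizes are made up.

```python
def false_positive_floor(cm):
    """Lower-bound false-positive rate for a segment of the given size.

    Uses only the three figures quoted in the text; the size bins are an
    assumption. Returns None where no figure was quoted.
    """
    if cm < 4:
        return 0.67
    if cm < 5:
        return 0.60
    if cm < 6:
        return 0.33
    return None

def expected_false_minimum(segment_sizes_cm):
    """Sum the per-segment floors; segments with no quoted figure add 0."""
    floors = (false_positive_floor(cm) for cm in segment_sizes_cm)
    return sum(f for f in floors if f is not None)

# Hypothetical segment sizes, in cM.
sizes = [3.2, 4.5, 5.1, 12.0]
print(round(expected_false_minimum(sizes), 2))
```

Because every input is a floor, the result understates the problem: the text notes the rates are worse for unphased segments.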

August 2017 Update to the Shared cM Project

The Shared cM Project is a collaborative data collection and analysis project created to understand the ranges of shared centiMorgans associated with various known relationships. As of August 2017, total shared cM data for more than 25,000 known relationships has been provided. To add your data, the Submission Portal is HERE. I am always collecting data, and perhaps the next update will have 50,000 or 100,000 relationships!

This August 2017 update is the second update to the original data, released in May 2015, and includes many thousands of new submissions.

There is MUCH more about the project, including histograms and company breakdowns in the PDF download.

Figure 1. The Relationship Chart

Table 1. The Cluster Chart

Sample Histogram from the Shared cM Project (all histograms available in the PDF download): ... Click to read more!

Analyzing Segment Frequency at GEDmatch

This post was inspired by an excellent post from Lara Diamond today entitled “Long Segment–But No Close Connection.”

What is DNA Segment Frequency?

In addition to segment size, segment frequency may be another important consideration for genealogists.

There are two ways to think about segment frequency: The first measurement of segment frequency is the frequency of a DNA segment among all humans. This is a number that is currently unknown, and can’t yet reliably be estimated even with simulations; there just aren’t enough people in the world who have tested yet. This is especially true for a segment that might be found outside of people of European descent, as testing of these populations has been minimal or practically non-existent. ... Click to read more!

Thinking About a BigY Test at Family Tree DNA?

What is BigY?

BigY is a Y-DNA test offered by Family Tree DNA. For a limited time only (August 2017), Family Tree DNA is offering BigY to existing customers for $395 (down from $575).

Typically, most Y-DNA tests taken by genetic genealogists are either STR or SNP tests. In contrast, BigY is a Y-DNA sequencing test available to male test-takers. The test sequences approximately 12 million base pairs of the Y chromosome, and identifies SNP results within those 12 million base pairs.

From the BigY FAQ (also see the 2014 BigY White Paper):

The Big Y product uses next-generation sequencing to reveal genetic variations across the Y chromosome:

  • Targeted Non-recombining Y-DNA sequencing.
  • Illumina HiSeq 2000.
  • 55X to 80X average coverage.
  • Around 11.5 to 12.5 million base-pairs of reliably mapped positions of non-recombining Y chromosome.
  • Analyzed using Arpeggi genome analysis technology for improved variant calls.

All samples are processed in-house using our custom laboratory methods and informatics. Your sample never leaves our company and is never shared with outside vendors. ... Click to read more!