The Shared cM Project

For reference, here are all posts for the Shared cM Project:


The portal for data submission is HERE.

Sharing Large Segments With a Match Does Not Validate Small Segments Shared With That Match

OK, that could be one of the worst blog titles I’ve written, but it’s intentional. When people share this post, I want the title to clearly convey the lesson.

Small Segments are Poison

We know that many small segments are false, and thus that many distant matches are false positives. I have written about small segments and distant matches many times. For a few background articles, see the following:

The most current (as of September 2017) definitive article on the nature of false versus true small segments is “Reducing Pervasive False-Positive Identical-by-Descent Segments Detected by Large-Scale Pedigree Analysis.” The paper is available online for free. In the paper, the researchers found that more than 67% of all reported segments shorter than 4 cM were false positives. At least 60% of 4 cM segments were false positives, and at least 33% of 5 cM segments were false positives. The number of false positives decreased fairly rapidly above 5 cM. See my analysis of this paper here.
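To see what these rates mean for an actual match list, here is a minimal Python sketch that applies them to a handful of segment sizes. It treats the figures quoted above as point estimates even though the paper reports them as lower bounds, so the expected number of true segments it returns is, if anything, too high. The size bins (for example, reading "4 cM segments" as 4.0 to 4.99 cM), the decision to count segments of 6 cM or more as true, and the sample sizes are all assumptions made for illustration.

```python
def false_positive_rate(size_cm: float) -> float:
    """Approximate false-positive rate for a reported segment, using the figures quoted above.

    ASSUMPTION: bins are [<4), [4, 5), [5, 6); segments of 6 cM or more are not
    modeled in this sketch and are simply counted as true.
    """
    if size_cm < 4.0:
        return 0.67
    if size_cm < 5.0:
        return 0.60
    if size_cm < 6.0:
        return 0.33
    return 0.0


def expected_true_segments(sizes_cm):
    """Expected number of true (identical-by-descent) segments among the reported sizes."""
    return sum(1.0 - false_positive_rate(s) for s in sizes_cm)


# Hypothetical segment sizes (cM) shared with a single distant match.
sizes = [3.2, 3.8, 4.5, 5.1, 7.0]
print(round(expected_true_segments(sizes), 2))  # 2.73
```

Five reported segments, but under these assumptions fewer than three are expected to be real, and most of that expectation comes from the single 7 cM segment.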


August 2017 Update to the Shared cM Project

The Shared cM Project is a collaborative data collection and analysis project created to understand the ranges of shared centiMorgans associated with various known relationships. As of August 2017, total shared cM data for more than 25,000 known relationships has been provided. To add your data, the Submission Portal is HERE. I am always collecting data, and perhaps the next update will have 50,000 or 100,000 relationships!
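At its core, the project groups submitted totals by relationship and reports the spread within each group. Below is a minimal Python sketch of that aggregation step, assuming each submission is a (relationship, total shared cM) pair. The relationship labels and numbers are invented for illustration and are NOT Shared cM Project data; the project's own data cleaning and histogram steps are not reproduced here.

```python
from collections import defaultdict
from statistics import mean


def summarize(submissions):
    """Group (relationship, shared_cm) pairs and report count, mean, min, and max for each relationship."""
    by_relationship = defaultdict(list)
    for relationship, shared_cm in submissions:
        by_relationship[relationship].append(shared_cm)
    return {
        rel: {"n": len(values), "mean": mean(values), "min": min(values), "max": max(values)}
        for rel, values in by_relationship.items()
    }


# HYPOTHETICAL submissions, for illustration only.
submissions = [
    ("1C", 850.0), ("1C", 720.0), ("1C", 980.0),
    ("2C", 210.0), ("2C", 160.0),
]
for rel, stats in summarize(submissions).items():
    print(rel, stats)
```

With real submissions, each group's min and max define the observed range for that relationship, and the list of values is what a histogram for that relationship would plot.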

This August 2017 update is the second update to the original data (released in May 2015) and includes many thousands of new submissions.

There is MUCH more about the project, including histograms and company breakdowns in the PDF download.

Figure 1. The Relationship Chart

Table 1. The Cluster Chart

Sample Histogram from the Shared cM Project (all histograms available in the PDF download):


Analyzing Segment Frequency at GEDmatch

This post was inspired by an excellent post from Lara Diamond today entitled “Long Segment–But No Close Connection.”

What is DNA Segment Frequency?

In addition to segment size, segment frequency may be another important consideration for genealogists.

There are two ways to think about segment frequency. The first is the frequency of a DNA segment among all humans. This number is currently unknown and can’t yet be reliably estimated, even with simulations; not enough people in the world have tested yet. This is especially true for segments that might be found in people not of European descent, since testing in those populations has been minimal or practically non-existent.
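Frequency among all humans can’t be measured yet, but counting how often a segment turns up within a set of tested people is straightforward. Here is a minimal Python sketch, assuming matches are available as (match ID, chromosome, start, end) records with base-pair positions; that record layout, the helper names, and the database size are hypothetical and are not a GEDmatch export format. Counting overlap rather than exact identity is also a simplifying assumption.

```python
from typing import Iterable, NamedTuple


class SegmentMatch(NamedTuple):
    match_id: str
    chromosome: str
    start: int  # base-pair position
    end: int    # base-pair position


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if the closed intervals [a_start, a_end] and [b_start, b_end] overlap."""
    return a_start <= b_end and b_start <= a_end


def segment_frequency(matches: Iterable[SegmentMatch], chromosome: str,
                      start: int, end: int, tested: int) -> float:
    """Fraction of `tested` people who share a segment overlapping the target region."""
    carriers = {
        m.match_id  # a set, so a person with several overlapping segments counts once
        for m in matches
        if m.chromosome == chromosome and overlaps(m.start, m.end, start, end)
    }
    return len(carriers) / tested


# HYPOTHETICAL match records for one test-taker.
matches = [
    SegmentMatch("M1", "15", 20_000_000, 25_000_000),
    SegmentMatch("M2", "15", 22_000_000, 30_000_000),
    SegmentMatch("M2", "15", 23_000_000, 24_000_000),
    SegmentMatch("M3", "1", 21_000_000, 26_000_000),
    SegmentMatch("M4", "15", 40_000_000, 45_000_000),
]
print(segment_frequency(matches, "15", 21_000_000, 26_000_000, tested=100))
```

Here two of the 100 hypothetical testers (M1 and M2) share the region; M3 is on a different chromosome and M4 falls outside it.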


Thinking About a BigY Test at Family Tree DNA?

What is BigY?

BigY is a Y-DNA test offered by Family Tree DNA. For a limited time only (August 2017), Family Tree DNA is offering BigY to existing customers for $395 (down from $575).

Most Y-DNA tests taken by genetic genealogists are either STR or SNP tests. In contrast, BigY is a Y-DNA sequencing test available to male test-takers. The test sequences approximately 12 million base pairs of the Y chromosome and identifies SNPs within those 12 million base pairs.

From the BigY FAQ (also see the 2014 BigY White Paper):

The Big Y product uses next-generation sequencing to reveal genetic variations across the Y chromosome:

  • Targeted Non-recombining Y-DNA sequencing.
  • Illumina HiSeq 2000.
  • 55X to 80X average coverage.
  • Around 11.5 to 12.5 million base-pairs of reliably mapped positions of non-recombining Y chromosome.
  • Analyzed using Arpeggi genome analysis technology for improved variant calls.

All samples are processed in-house using our custom laboratory methods and informatics. Your sample never leaves our company and is never shared with outside vendors.
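Average coverage is the total number of sequenced bases aligned to a region divided by the region’s length, so the FAQ figures can be turned into a rough sense of how much data a BigY run represents. A small Python sketch, taking 12 million positions as the region size and assuming a hypothetical read length of 100 bases (the actual read length is not stated above):

```python
def aligned_bases(coverage: float, positions: int) -> float:
    """Total aligned bases implied by an average depth of `coverage` over `positions` bases."""
    return coverage * positions


def approx_reads(coverage: float, positions: int, read_length: int) -> float:
    """Approximate read count; ignores duplicates, unmapped reads, and partial overlaps."""
    return aligned_bases(coverage, positions) / read_length


REGION = 12_000_000  # approximate reliably mapped positions, from the FAQ above
READ_LENGTH = 100    # HYPOTHETICAL read length, for illustration only

for depth in (55, 80):
    print(depth, aligned_bases(depth, REGION), approx_reads(depth, REGION, READ_LENGTH))
```

At 55X to 80X over 12 million positions, that works out to roughly 660 million to 960 million aligned bases, which is why each position is read dozens of times and variant calls can be made with some confidence.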


The Effect of Phasing on Reducing False Distant Matches (Or, Phasing a Parent Using GEDmatch)

Genealogical autosomal DNA evidence relies on segments of DNA shared between two or more individuals. When they are true matching segments, they provide information about shared ancestry. One problem genealogists currently face is the inability to distinguish “real” or “true” matching segments from “false” segments.
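Unphased comparison is one well-known source of false segments. Because each person has two copies of every autosome, a match can share at least one allele at every SNP by switching back and forth between those two copies without matching either copy along the whole run. The Python sketch below illustrates this, along with the basic idea of phasing a child using one parent. It is a generic illustration with hypothetical genotypes, not GEDmatch’s phasing utility, and it ignores no-calls, segment-length thresholds, and the unresolved SNPs that real data contain.

```python
from typing import List, Optional, Tuple

Genotype = Tuple[str, str]  # the two alleles at one SNP, in no particular order


def phase_with_parent(child: List[Genotype],
                      parent: List[Genotype]) -> List[Optional[Tuple[str, str]]]:
    """Split a child's genotypes into (allele from this parent, allele from the other parent).

    Returns None where the SNP can't be resolved: both child and parent are
    heterozygous, or parent and child are inconsistent (e.g., a genotyping error).
    """
    phased: List[Optional[Tuple[str, str]]] = []
    for (c1, c2), (p1, p2) in zip(child, parent):
        if c1 == c2 and c1 in (p1, p2):
            phased.append((c1, c2))  # homozygous child: both copies carry the same allele
        elif c1 != c2 and p1 == p2 and p1 in (c1, c2):
            other = c2 if c1 == p1 else c1  # homozygous parent reveals which allele it passed down
            phased.append((p1, other))
        else:
            phased.append(None)  # ambiguous or inconsistent
    return phased


def half_identical(g1: List[Genotype], g2: List[Genotype]) -> bool:
    """Unphased test: True if the two people share at least one allele at every SNP."""
    return all(set(a) & set(b) for a, b in zip(g1, g2))


def haplotype_match(haplotype: List[str], g: List[Genotype]) -> bool:
    """Phased test: True if every allele on one haplotype appears in the other person's genotype."""
    return all(allele in pair for allele, pair in zip(haplotype, g))


child = [("A", "G"), ("C", "T"), ("A", "G"), ("C", "T")]
parent = [("A", "A"), ("C", "C"), ("A", "A"), ("C", "C")]
match = [("A", "A"), ("T", "T"), ("A", "A"), ("T", "T")]  # hypothetical match

phased = phase_with_parent(child, parent)  # every SNP resolves in this example
from_parent = [p[0] for p in phased]       # A, C, A, C
from_other = [p[1] for p in phased]        # G, T, G, T

print(half_identical(child, match))         # True
print(haplotype_match(from_parent, match))  # False
print(haplotype_match(from_other, match))   # False
```

The unphased comparison reports a match at every SNP, but the shared alleles alternate between the child’s two copies, so neither phased haplotype matches. This is the kind of false segment that phasing can screen out.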

I won’t get too deep into the competing terminology of “real” versus “false” here, because it isn’t important and distracts from the more important discussion. Genealogists, like patent attorneys, can be their own lexicographers, so long as they provide a clear definition the reader can understand. So here are my definitions for this post (and I typically use these elsewhere):


New DNA Webinars Coming From Legacy Family Tree’s “Webinar Subscriber Summer Spectacular!”

Legacy Family Tree announced today their “2017 Webinar Subscriber Summer Spectacular!”, a summer-long release of 28 new webinars from six different genealogists in six different areas (researching in archives, DNA, Texas, U.S. census records, photo restoration, and the Revolutionary War). The first batch, from Melissa Barker, goes live today.

My batch, called “DNA: A Closer Look,” releases next Thursday (July 16th). I already have a number of webinars available, including a 5-part series called “Fundamentals of DNA.” Legacy Family Tree has more than 500 archived webinars available to subscribers, on topics for every genealogist.

From Legacy Family Tree’s Official Announcement:

It’s our way of saying thank you to our webinar subscribers and inviting everyone else to preview these excellent classes!


The Y-DNA Mutation Rate Project

[Link for the survey is here:]

I’ve tested myself out to 67 Y-DNA markers (67-111 are pending!), and I’ve compared that to two 5th cousins. We have some genetic differences, as expected, and I’m always interested in learning more about my Y-DNA. One thing I’d like to do, for example, is compare my results to closer relatives (father and/or son) to look for mutations in father/son pairs.

What is the likelihood of a mutation arising between a father/son pair? Which mutations are most likely to arise? Can we use this information to improve estimates of the time to the MRCA?

There are existing studies that examine Y-STR mutation rates (see a great list on the ISOGG wiki: Mutation Rates). Most have relied on SNP-derived haplogroup ages rather than close relatives, although some have indeed used father/son pairs. I’m always on the lookout for more data, however, and as scientists we want these findings replicated with different father/son pairs. Once we have this data, we can incorporate it into haplogroup age estimates and many other areas of research, hopefully improving the results.
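Once father/son results are collected, the simplest pooled estimate divides the number of observed mutations by the number of transmissions times the number of markers compared. The Python sketch below does that and also computes the chance that a father/son pair differs at one or more markers. Both functions assume every marker mutates independently at the same rate, which is a strong simplification since Y-STR mutation rates vary considerably from marker to marker, and the survey tally is hypothetical.

```python
def per_marker_rate(mutations: int, pairs: int, markers: int) -> float:
    """Pooled estimate: observed mutations per marker per father/son transmission."""
    return mutations / (pairs * markers)


def prob_any_difference(rate: float, markers: int) -> float:
    """Chance a father/son pair differs at one or more markers.

    ASSUMPTION: every marker mutates independently at the same `rate`.
    """
    return 1.0 - (1.0 - rate) ** markers


# HYPOTHETICAL survey tally, for illustration only.
rate = per_marker_rate(mutations=20, pairs=250, markers=67)
print(f"per-marker rate: {rate:.5f}")
print(f"P(at least one difference at 67 markers): {prob_any_difference(rate, 67):.3f}")
```

A per-marker model like this is what later feeds into time-to-MRCA calculations; with real data, estimating a separate rate for each marker would be the natural next step.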


AncestryDNA’s Genetic Communities are Finally Here!

Today (March 28, 2017), AncestryDNA launches a new tool called “Genetic Communities.” Genetic Communities (GCs) are groups of test-takers who are connected through their DNA because they descend from an identified recent and distinct population of ancestors (somewhere around 1750 to 1850, in my experience).

There is a lot to explore with these GCs, so this will be just an introduction rather than a complete guide.

At 2 PM Eastern this Thursday, I’m presenting a webinar for Legacy Family Tree Webinars called “Exploring AncestryDNA’s New Genetic Communities.” You can register at any time. If you’re reading this after March 30, 2017, and you missed the free webinar, you’ll be able to watch it if you are a Legacy Family Tree member (and you should be!).


GUEST POST: The McGuire Method – Simplified Visual DNA Comparisons

EDITOR: Last summer, while I was co-teaching a DNA course at IGHR, one of the students in the class had some questions about a mystery she was trying to solve in her own family. While we were discussing the brick wall, Lauren McGuire showed me a chart she had created with all the test-takers and their relationships to each other. Unlike most other methods of displaying names, relationships, and shared DNA, this chart was incredibly efficient and easy to understand. All the information was right there! The class dubbed it “The McGuire Method,” and it remains my favorite way to display shared cM data among a group of individuals.

For example, this method would have been perfect for displaying all the information in “A DNA Case Study: Revealing a Misattributed Parentage Event with DNA,” but I wanted Lauren to announce her method first. It would be an interesting exercise to go back, now, and re-plot that graph using the McGuire Method.


A DNA Case Study: Revealing a Misattributed Parentage Event with DNA

As DNA testing for genealogy becomes increasingly popular, more individuals are using it to examine and confirm their family trees. However, as more people test and compare their DNA to their paper trails, more people are discovering that their genetic ancestry is not what they expected it to be.

The Genetic Genealogy Standards were created to help educate people about the possible outcomes and limitations of genetic genealogy testing. One of the possible outcomes is misattributed parentage, or the discovery that a supposed genealogical ancestor is not in one’s genetic line. Many correctly point out that a misattributed ancestor is still a social ancestor firmly rooted in one’s social tree, although it can be important to know when an ancestor is not one’s genetic ancestor.
