How Many Segments Do You Share?

I have told people in the past that we share only a single segment of IBD DNA with the vast majority of our genetic matches (where IBD means Identical-by-Descent, a valid matching segment of DNA inherited through a recent genealogical relationship). I usually say that we share a single segment of DNA with 99% of our matches, but that’s been an off-the-cuff estimate. I wanted better data to cite, so I took a closer look at this issue.

At FTDNA, you can download a list of all of your matches:

I downloaded my list and removed all of my targeted test-takers (anyone whom I tested or asked to test), since these close test-takers would skew the data.

After removing them from my match list, I have a total of 2,491 matches at Family Tree DNA.

Family Tree DNA also allows you to download a list of all the segments you share with your matches:

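To reproduce this kind of count, one could tally how many segments each match contributes in the segment download described above. The following is a minimal Python sketch. The exact FTDNA column headers aren't shown in this excerpt, so the sketch assumes the file has already been read into (match name, cM) pairs, and the optional `min_cm` filter is a hypothetical parameter of my own:

```python
from collections import Counter
from typing import Iterable, Tuple

def segments_per_match(rows: Iterable[Tuple[str, float]], min_cm: float = 0.0) -> Counter:
    """Count segments per match, ignoring segments smaller than min_cm."""
    counts: Counter = Counter()
    for match_name, cm in rows:
        if cm >= min_cm:
            counts[match_name] += 1
    return counts

def share_with_single_segment(counts: Counter, total_matches: int) -> float:
    """Fraction of all matches that share exactly one (counted) segment.

    total_matches is passed separately because a match whose segments were all
    filtered out by min_cm never appears in counts.
    """
    singles = sum(1 for n in counts.values() if n == 1)
    return singles / total_matches

# Toy, made-up rows for illustration only (not real match data).
rows = [("Match A", 12.3), ("Match B", 8.1), ("Match B", 22.0), ("Match C", 9.7)]
counts = segments_per_match(rows)
print(counts["Match B"])  # prints 2
print(share_with_single_segment(counts, total_matches=3))
```

With a real download, the list of matches (2,491 here) would supply `total_matches`.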

Using Shared Matches – A Quick Example

I logged into my results at AncestryDNA today and had a new match, Vivian Reese Wescott (ALL names in these screenshots are changed to protect privacy unless noted otherwise). This is a significant match for me: she is my 22nd closest match (not counting family members that I’ve tested), and the relationship is estimated to be fourth cousin.

When I open up Vivian’s profile, I can see that she’s a new member, and likely hasn’t seen her results yet (I check frequently, but she hasn’t logged in since September 23rd). I also see that she doesn’t have a tree associated with her profile:

The first thing I would normally do is review her tree for clues about our relationship.

Since I can’t do that, I’ll skip that step and look at how much DNA we share:

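The general idea behind shared matches can be sketched as a tally: for each shared match that has already been assigned to an ancestral line, count which line it belongs to. This is a hypothetical Python illustration, not an AncestryDNA feature; the names, line labels, and the `known_lines` mapping are all made up:

```python
from collections import Counter
from typing import Dict, Iterable

def tally_lines(shared_matches: Iterable[str], known_lines: Dict[str, str]) -> Counter:
    """Count how many shared matches fall on each already-identified ancestral line."""
    tally: Counter = Counter()
    for name in shared_matches:
        tally[known_lines.get(name, "unknown")] += 1
    return tally

# Hypothetical matches and lines, for illustration only.
known = {"Cousin 1": "Paternal: Line X", "Cousin 2": "Paternal: Line X",
         "Cousin 3": "Maternal: Line Y"}
shared = ["Cousin 1", "Cousin 2", "Stranger"]
print(tally_lines(shared, known).most_common())
# prints [('Paternal: Line X', 2), ('unknown', 1)]
```

A line that dominates the tally is only a lead to investigate, not proof of how the new match is related.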

Inheritance of DNA Segments

DNA is randomly inherited. As a result, a match who shares 100 cM of DNA with a parent will likely NOT share exactly 50 cM with that parent’s child; rather, there is a range of possibilities (100 cM, 50 cM, 0 cM, and everything in between, for example). On average the child will share about 50 cM (half of the parent’s amount), but there is lots of room for variation.
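A quick simulation shows why the child’s amount varies so much. The following Python sketch is a simplified model, not how any testing company computes anything. It assumes that the match’s DNA sits on one of the parent’s two copies of the chromosome and that crossovers occur independently at about one per 100 cM. It then estimates how much of a 100 cM shared segment each simulated child inherits:

```python
import random
from typing import List, Optional

def shared_cm_in_child(segment_cm: float, step_cm: float = 0.5,
                       rng: Optional[random.Random] = None) -> float:
    """Simulate how many cM of one parent-shared segment the child also shares.

    Simplifying assumptions: the match's DNA sits on one of the parent's two
    copies, and crossovers occur independently at about 1 per 100 cM.
    """
    if rng is None:
        rng = random.Random()
    n_steps = max(1, round(segment_cm / step_cm))
    step = segment_cm / n_steps
    on_matching_copy = rng.random() < 0.5  # which parental copy the child starts on
    shared = 0.0
    for _ in range(n_steps):
        if on_matching_copy:
            shared += step
        # A crossover in this small interval switches the copy being transmitted.
        if rng.random() < step / 100.0:
            on_matching_copy = not on_matching_copy
    return shared

def simulate_child(parent_segments_cm: List[float], trials: int = 2000,
                   seed: int = 1) -> List[float]:
    """Total cM the child shares with the match, over many simulated children."""
    rng = random.Random(seed)
    return [sum(shared_cm_in_child(s, rng=rng) for s in parent_segments_cm)
            for _ in range(trials)]

results = simulate_child([100.0])
print(f"min {min(results):.1f}, mean {sum(results) / len(results):.1f}, "
      f"max {max(results):.1f} cM")
```

The mean lands near 50 cM, while individual children range from about 0 cM to the full 100 cM.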

Prompted by a great question in the Genetic Genealogy Tips & Techniques Facebook group, I used the “People who match one or both of 2 kits” tool at GEDmatch to look at the random inheritance pattern of DNA between my father and myself with regard to matches sharing about 35 cM (the examples here worked out great, but you can pick any size).

We can see the randomness of inheritance in this table. We also see a surprise (which I just discovered today with this exercise!) that reminds us that matching DNA can come from BOTH parents!
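The same kind of comparison can be sketched offline from two match lists. This is a hypothetical Python stand-in and does not replicate GEDmatch’s “People who match one or both of 2 kits” output. It assumes each kit’s matches are available as a name-to-total-cM mapping, and it lists matches near a target size in either kit. A match the child shares with but the parent doesn’t, or one where the child shares more than the parent, points to DNA from the other parent:

```python
from typing import Dict, List, Tuple

def compare_match_lists(child: Dict[str, float], parent: Dict[str, float],
                        target_cm: float = 35.0, window_cm: float = 5.0
                        ) -> List[Tuple[str, float, float]]:
    """List (match, parent_cm, child_cm) for matches near target_cm in either kit.

    A match absent from a kit is reported with 0.0 cM for that kit.
    """
    rows = []
    for name in sorted(set(child) | set(parent)):
        p_cm, c_cm = parent.get(name, 0.0), child.get(name, 0.0)
        if abs(p_cm - target_cm) <= window_cm or abs(c_cm - target_cm) <= window_cm:
            rows.append((name, p_cm, c_cm))
    return rows

# Made-up totals for illustration only.
father = {"M1": 36.0, "M2": 34.5, "M3": 70.0}
me = {"M1": 18.2, "M3": 35.0, "M4": 33.0}
for name, p_cm, c_cm in compare_match_lists(me, father):
    note = "  <- child shares more: other parent involved" if c_cm > p_cm else ""
    print(f"{name}: father {p_cm} cM, child {c_cm} cM{note}")
```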


Sharing Large Segments With a Match Does Not Validate Small Segments Shared With That Match

OK, that could be one of the worst blog titles I’ve written, but it’s intentional. When people share this post, I want the title to clearly convey the lesson.

Small Segments are Poison

We know that many small segments are false, and thus that many distant matches are false positives. I have written about small segments and distant matches many times. For a few background articles, see the following:

The most current definitive article (as of September 2017) on the nature of false versus true small segments is “Reducing Pervasive False-Positive Identical-by-Descent Segments Detected by Large-Scale Pedigree Analysis.” The paper is available online for free (http://mbe.oxfordjournals.org/content/31/8/2212). The researchers found that more than 67% of all reported segments shorter than 4 cM are false positives. At least 60% of 4 cM segments were false positives, and at least 33% of 5 cM segments were false positives. The number of false positives decreased fairly rapidly above 5 cM. See my analysis of this paper here.
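To see what these rates mean for a real match list, we can turn them into upper bounds on how many small segments are likely to be true. This Python sketch uses only the figures quoted above. Treating “4 cM segments” as 4 to 5 cM and “5 cM segments” as 5 to 6 cM is my assumption. No rate is quoted for larger segments, so the sketch leaves them out:

```python
from typing import Dict, Iterable

# Upper bounds on the fraction of TRUE segments implied by the quoted
# false-positive rates (>67%, >=60%, >=33%). The bin edges are an assumption.
MAX_TRUE_FRACTION = {"<4 cM": 0.33, "4-5 cM": 0.40, "5-6 cM": 0.67}

def bin_label(cm: float) -> str:
    if cm < 4:
        return "<4 cM"
    if cm < 5:
        return "4-5 cM"
    if cm < 6:
        return "5-6 cM"
    return ">=6 cM"

def max_expected_true(segments_cm: Iterable[float]) -> Dict[str, float]:
    """Upper bound on the expected number of true segments in each small-size bin.

    Segments of 6 cM or more are skipped, since no rate is quoted for them.
    """
    bounds: Dict[str, float] = {}
    for cm in segments_cm:
        label = bin_label(cm)
        if label in MAX_TRUE_FRACTION:
            bounds[label] = bounds.get(label, 0.0) + MAX_TRUE_FRACTION[label]
    return bounds

# Made-up segment sizes for illustration only.
print(max_expected_true([3.1, 3.8, 3.2, 4.5, 5.2, 12.0]))
```

Even in the best case, fewer than one of the three sub-4 cM segments in this toy list would be expected to be true. That is true regardless of whether the same match also shares a large segment.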


August 2017 Update to the Shared cM Project

The Shared cM Project is a collaborative data collection and analysis project created to understand the ranges of shared centiMorgans associated with various known relationships. As of August 2017, total shared cM data for more than 25,000 known relationships has been provided. To add your data, the Submission Portal is HERE. I am always collecting data, and perhaps the next update will have 50,000 or 100,000 relationships!

This August 2017 update is the second update to the original data, released in May 2015, and includes many thousands of new submissions.

There is MUCH more about the project, including histograms and company breakdowns, in the PDF download.

Figure 1. The Relationship Chart

Table 1. The Cluster Chart

Sample Histogram from the Shared cM Project (all histograms available in the PDF download):

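At its simplest, summarizing submissions means grouping the reported totals by relationship and recording the range and average for each. This excerpt doesn’t describe the project’s actual statistical handling, so the following Python sketch is only an illustration of the basic idea. The relationship labels and values are placeholders, NOT Shared cM Project data:

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

def summarize(submissions: Iterable[Tuple[str, float]]
              ) -> Dict[str, Tuple[float, float, float]]:
    """Return {relationship: (min_cm, mean_cm, max_cm)} from (relationship, cM) pairs."""
    by_rel: Dict[str, List[float]] = defaultdict(list)
    for relationship, cm in submissions:
        by_rel[relationship].append(cm)
    return {rel: (min(v), mean(v), max(v)) for rel, v in by_rel.items()}

# Placeholder values for illustration only.
toy = [("Relationship A", 100.0), ("Relationship A", 140.0), ("Relationship A", 120.0),
       ("Relationship B", 40.0), ("Relationship B", 60.0)]
print(summarize(toy))
# prints {'Relationship A': (100.0, 120.0, 140.0), 'Relationship B': (40.0, 50.0, 60.0)}
```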

Analyzing Segment Frequency at GEDmatch

This post was inspired by an excellent post from Lara Diamond today entitled “Long Segment–But No Close Connection.”

What is DNA Segment Frequency?

In addition to segment size, segment frequency may be another important consideration for genealogists.

There are two ways to think about segment frequency. The first measurement of segment frequency is the frequency of a DNA segment among all humans. This number is currently unknown and can’t yet be reliably estimated, even with simulations; there just aren’t enough people in the world who have tested yet. This is especially true for a segment that might be found in people of non-European descent, since testing in those populations has been minimal or practically non-existent.
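The global frequency can’t be measured, but a narrower, database-level quantity can be counted: how many of your matches have a segment overlapping a given region. This is a hypothetical Python sketch of that narrower measure and isn’t necessarily the second measurement the full post describes. The `Segment` record and its base-pair coordinates are assumptions for illustration:

```python
from typing import Iterable, NamedTuple

class Segment(NamedTuple):
    match: str
    chromosome: str
    start: int  # base-pair position
    end: int    # base-pair position

def overlap_frequency(query: Segment, segments: Iterable[Segment]) -> int:
    """Count distinct other matches whose segment overlaps the query region."""
    hits = {s.match for s in segments
            if s.chromosome == query.chromosome
            and s.start <= query.end and s.end >= query.start
            and s.match != query.match}
    return len(hits)

# Made-up segments for illustration only.
query = Segment("Match A", "7", 10_000_000, 20_000_000)
segments = [
    query,
    Segment("Match B", "7", 15_000_000, 25_000_000),  # overlaps
    Segment("Match C", "7", 21_000_000, 30_000_000),  # starts after query ends
    Segment("Match D", "8", 10_000_000, 20_000_000),  # different chromosome
    Segment("Match E", "7", 5_000_000, 11_000_000),   # overlaps
]
print(overlap_frequency(query, segments))  # prints 2
```

A region that turns up across an unusually large share of your matches may be a common segment rather than evidence of a recent shared ancestor.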


Thinking About a BigY Test at Family Tree DNA?

What is BigY?

BigY is a Y-DNA test offered by Family Tree DNA. For a limited time only (August 2017), Family Tree DNA is offering BigY to existing customers for $395 (down from $575).

Most Y-DNA tests taken by genetic genealogists are either STR or SNP tests. In contrast, BigY is a Y-DNA sequencing test available to male test-takers. The test sequences approximately 12 million base pairs of the Y chromosome and identifies SNPs within those 12 million base pairs.

From the BigY FAQ (also see the 2014 BigY White Paper):

The Big Y product uses next-generation sequencing to reveal genetic variations across the Y chromosome:

  • Targeted Non-recombining Y-DNA sequencing.
  • Illumina HiSeq 2000.
  • 55X to 80X average coverage.
  • Around 11.5 to 12.5 million base-pairs of reliably mapped positions of non-recombining Y chromosome.
  • Analyzed using Arpeggi genome analysis technology for improved variant calls.

All samples are processed in-house using our custom laboratory methods and informatics. Your sample never leaves our company and is never shared with outside vendors.
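To put the quoted coverage figures in perspective, average coverage relates the total number of sequenced bases to the size of the target. The standard Lander-Waterman relation is $C = NL/G$, where $N$ is the number of reads, $L$ is the read length, and $G$ is the target length. Taking $G \approx 12$ million base pairs from the figures above, the 55X to 80X range implies roughly:

$$
NL = C \times G \approx 55 \times (12 \times 10^{6}) = 6.6 \times 10^{8}\ \text{bp}
\quad\text{to}\quad
80 \times (12 \times 10^{6}) = 9.6 \times 10^{8}\ \text{bp}
$$

This is back-of-the-envelope arithmetic for mapped sequence over the targeted region. It ignores reads that fall outside the target or fail to map, so the raw amount sequenced would be higher.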


The Effect of Phasing on Reducing False Distant Matches (Or, Phasing a Parent Using GEDmatch)

Genealogical autosomal DNA evidence relies on segments of DNA shared between two or more individuals. When they are true matching segments, they provide information about shared ancestry. One problem genealogists currently face is the inability to distinguish between “real” or “true” matching segments and “false” segments.

I won’t get too much into the different terminology of “real” versus “false” here, because it isn’t important and distracts from the more important discussion. Genealogists, like patent attorneys, can be their own lexicographers, so long as they provide a clear definition that the reader understands. So here are my definitions for this post (I typically use these elsewhere, too):

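Phasing with a parent’s data comes down to one SNP-by-SNP rule: whichever of the child’s alleles could only have come from the tested parent is assigned to that parent, and the other allele to the untested parent. The Python sketch below is a generic parent-child phasing illustration, not GEDmatch’s implementation. It assumes both kits report genotypes at the same positions on the same strand, and it marks positions that can’t be resolved as `None`:

```python
from typing import List, Optional, Tuple

Genotype = Tuple[str, str]
Phased = Tuple[Optional[str], Optional[str]]

def phase_snp(child: Genotype, parent: Genotype) -> Phased:
    """Split one child genotype into (allele from tested parent, allele from other parent).

    Returns (None, None) when the SNP can't be phased: both heterozygous with the
    same two alleles, or genotypes that are inconsistent with inheritance.
    """
    c1, c2 = child
    parent_alleles = set(parent)
    if c1 == c2:  # child homozygous: both halves carry the same allele
        return (c1, c1) if c1 in parent_alleles else (None, None)
    shared = {c1, c2} & parent_alleles
    if len(shared) == 1:  # exactly one child allele could have come from this parent
        from_parent = shared.pop()
        from_other = c2 if from_parent == c1 else c1
        return (from_parent, from_other)
    return (None, None)

def phase_kit(child: List[Genotype], parent: List[Genotype]) -> List[Phased]:
    """Phase a child's SNPs position by position against one parent's SNPs."""
    return [phase_snp(c, p) for c, p in zip(child, parent)]

# Toy genotypes at four positions (not real data).
child = [("A", "G"), ("A", "G"), ("C", "C"), ("G", "T")]
father = [("A", "A"), ("A", "G"), ("A", "C"), ("G", "G")]
print(phase_kit(child, father))
# prints [('A', 'G'), (None, None), ('C', 'C'), ('G', 'T')]
```

Once each half is phased, a segment that matches the child’s unphased kit but neither phased half is a candidate false match.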

New DNA Webinars Coming From Legacy Family Tree’s “Webinar Subscriber Summer Spectacular!”

Legacy Family Tree today announced their “2017 Webinar Subscriber Summer Spectacular!”, a summer-long release of 28 new webinars from six different genealogists in six different areas (researching in archives, DNA, Texas, U.S. census records, photo restoration, and the Revolutionary War). The first batch, from Melissa Barker, goes live today.

My batch, called “DNA: A Closer Look,” releases next Thursday (July 16th). I already have a number of webinars available, including a 5-part series called “Fundamentals of DNA.” Legacy Family Tree has more than 500 archived webinars available to subscribers, on topics for every genealogist.

From Legacy Family Tree’s Official Announcement:

It’s our way of saying thank you to our webinar subscribers and inviting everyone else to preview these excellent classes!


The Y-DNA Mutation Rate Project

[Link for the survey is here: https://goo.gl/forms/oSpKuAfU4sGnShJ73]

I’ve tested myself out to 67 Y-DNA markers (67-111 are pending!), and I’ve compared those results to two 5th cousins. We have some genetic differences, as expected, and I’m always interested in learning more about my Y-DNA. One thing I’d like to do, for example, is compare my results to closer relatives (father and/or son) to look for mutations in father/son pairs.

What is the likelihood of a mutation arising between a father/son pair? What mutations are most likely to arise? Can we use this information to improve estimates of the time to the MRCA?

There are existing studies that examine Y-STR mutation rates (see a great list on the ISOGG wiki: Mutation Rates). Most have derived rates from SNP-based haplogroup ages rather than from close relatives, although some have indeed used father/son pairs. I’m always on the lookout for more data, however, and as scientists we want these findings replicated with different father/son pairs. Once we have this data, we can incorporate it into haplogroup age estimates and many other areas of research, hopefully improving the outcomes.
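Father/son data of the kind this survey collects leads to a simple observed rate: the number of markers that changed, divided by the number of marker transmissions compared. The following Python sketch shows only that basic arithmetic, not a published methodology. It counts each differing marker once, regardless of how many repeats it changed by. The marker names are real Y-STR names, but the repeat counts are made up:

```python
from typing import Dict, List, Tuple

Haplotype = Dict[str, int]  # marker name -> repeat count

def mutation_rate(pairs: List[Tuple[Haplotype, Haplotype]]) -> float:
    """Observed mutations per marker per generation across father/son pairs.

    Each marker whose repeat count differs between father and son counts as one
    mutation, regardless of how many repeats it changed by.
    """
    mutations = 0
    transmissions = 0
    for father, son in pairs:
        shared = father.keys() & son.keys()  # only markers tested in both men
        mutations += sum(1 for marker in shared if father[marker] != son[marker])
        transmissions += len(shared)
    if transmissions == 0:
        raise ValueError("no markers in common")
    return mutations / transmissions

# Made-up repeat counts for illustration only.
pairs = [
    ({"DYS393": 13, "DYS390": 24, "DYS19": 14}, {"DYS393": 13, "DYS390": 25, "DYS19": 14}),
    ({"DYS393": 13, "DYS390": 23, "DYS19": 15}, {"DYS393": 13, "DYS390": 23, "DYS19": 15}),
]
rate = mutation_rate(pairs)
print(f"{rate:.3f} mutations per marker per generation")  # prints 0.167 ...
```

Real per-marker rates are far lower than this toy value, which is why many father/son pairs are needed before the estimate means anything.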
