Examining Outliers in Shared cM Amounts

Sheryl, a member of the Genetic Genealogy Tips & Techniques group (which just broke 50,000 members!), recently commented on a thread about shared DNA outliers, describing a situation within her own family. I thought it would be a great opportunity to discuss outliers and how to deal with them. Sheryl kindly agreed!

For background, we examined an outlier situation once before on this blog, where second cousins once removed (2C1R) did not share DNA (see “Analyzing a Lack of Sharing in 2C1R Relationship”).

Identifying an Outlier

Sheryl indicated that she and her mother Grace appeared to be outliers with Sally, their first cousin (1C) and first cousin once removed (1C1R), respectively. Grace shared 482 cM with her 1C Sally, and Sheryl shared 215 cM with her 1C1R Sally. Not surprisingly, Grace and Sheryl share an expected amount for mother/daughter:

If you’ve worked with DNA testing for a while, or you’ve tested many relatives, you start to get a feel for the amounts of DNA that relatives of various relationships should share. For example, hearing someone shares 482 cM with an expected first cousin automatically raises red flags.

However, you don’t need to memorize expected amounts of shared DNA for every relationship. There are online resources and tools that allow you to look up ranges and probabilities for most genealogically relevant relationships.

Using the Shared cM Project

The August 2017 version of the Shared cM Project, for example, has a graphic with the ranges and averages for relationships through 8C based on 25,000+ submissions:

But that’s not all! The Shared cM Project has a PDF with histograms, breakdowns by company, and other analyses that didn’t fit onto the graphic. I cannot encourage you too strongly to READ THE FULL PDF FOR THE SHARED CM PROJECT!

For the August 2017 version of the Shared cM Project, there were a total of 1,512 submissions for 1C relationships. That’s a lot of submissions! The 99th percentile range for 1C was 553 cM to 1225 cM, with an average of 874 cM:

The Shared cM Project PDF also has a histogram for the 1C relationship submissions, which is a graph that shows the distribution of these submissions (I added the red arrow manually):

Along the bottom are various “bins” or buckets, each covering a small cM range (such as the 870 to 915 cM bin in the middle). The height of each bin is the number of submissions that fall within it. The 870-915 bin, for example, has 192 submissions. Because there are so many 1C submissions, the histogram shows a beautiful bell curve distribution, just as we would expect.
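For readers who like to see the mechanics, here is a minimal Python sketch of how a histogram assigns submissions to bins. It assumes that every bin is 45 cM wide and lines up with the 870-915 bin mentioned above, and that each bin includes its lower edge but not its upper edge. The helper name `bin_for` and the sample list are hypothetical and invented purely for illustration; they are not real submissions.

```python
from collections import Counter

BIN_WIDTH_CM = 45     # width of the 870-915 example bin
BIN_ANCHOR_CM = 870   # ASSUMPTION: all bin edges line up with this one


def bin_for(shared_cm):
    """Return the (lower, upper) edges of the bin that contains shared_cm.

    Hypothetical helper; bins are treated as lower-inclusive, upper-exclusive.
    """
    steps = (shared_cm - BIN_ANCHOR_CM) // BIN_WIDTH_CM
    lower = BIN_ANCHOR_CM + steps * BIN_WIDTH_CM
    return (lower, lower + BIN_WIDTH_CM)


# HYPOTHETICAL values, made up only to show the counting step.
sample_submissions_cm = [874, 880, 902, 915, 790, 1010]

# The height of each bar is simply the number of values that land in its bin.
bin_counts = Counter(bin_for(cm) for cm in sample_submissions_cm)

for (lower, upper), count in sorted(bin_counts.items()):
    print(f"{lower}-{upper} cM: {count}")
```

Under these assumptions, a value of 482 cM would land in a bin far to the left of the 870-915 bin, which is what the red arrow on the real histogram illustrates.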

As shown by the red arrow, however, sharing of 482 cM by suspected 1Cs Grace and Sally falls far outside the range for 1C shown by this chart. Thus, a result of 482 cM is an OUTLIER according to the Shared cM Project. An outlier is “a statistical observation that is markedly different in value from the others of the sample,” according to Merriam-Webster. In other words, an outlier is a shared cM amount for a genealogical relationship that falls outside an expected range. Indeed, 482 cM falls far outside the range of 553 to 1225 cM.
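The range check described above is simple enough to sketch in a few lines of Python. This is only an illustration: the function name `classify_shared_cm` is hypothetical, and the only numbers used are the 1C figures quoted in this post (553 to 1225 cM) and the 482 cM shared by Grace and Sally.

```python
# 99th percentile range for 1C quoted above (August 2017 Shared cM Project).
FIRST_COUSIN_LOW_CM = 553
FIRST_COUSIN_HIGH_CM = 1225


def classify_shared_cm(shared_cm, low_cm, high_cm):
    """Say where a shared cM amount sits relative to a reported range.

    Hypothetical helper for illustration; it is not part of any tool.
    """
    if shared_cm < low_cm:
        return "below range"
    if shared_cm > high_cm:
        return "above range"
    return "within range"


grace_sally_cm = 482
result = classify_shared_cm(grace_sally_cm, FIRST_COUSIN_LOW_CM, FIRST_COUSIN_HIGH_CM)

# How far below the bottom of the reported range the amount falls.
shortfall_cm = FIRST_COUSIN_LOW_CM - grace_sally_cm

print(f"{grace_sally_cm} cM is {result} for 1C, {shortfall_cm} cM under the low end")
```

Note that a "below range" answer only tells us the amount is unusual for the assumed relationship. It says nothing about whether the assumed relationship is correct.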

Using the Relationship Tools at DNA Painter

In addition to the Shared cM Project, there are interactive tools at DNA Painter that allow you to evaluate a shared cM amount (see The Shared cM Project 3.0 tool v4). To use this tool, simply enter a shared cM amount in the empty field (shown by the manually-added red arrow below):

Entering 482 cM, for example, produces a chart of probabilities:

These probabilities were generated by Leah Larkin (The DNA Geek) using Figure 5.2 of the AncestryDNA Matching White Paper published on 31 March 2016. Leah used some elegant analysis to extract these probabilities, and programmer extraordinaire Jonny Perl of DNA Painter converted that information into this interactive probability tool.

As shown in the probability chart, there is a 3.92% probability that 482 cM is a 1C. Due to the slight mismatch between the Shared cM Project and The DNA Geek probabilities, there is a note indicating that “this relationship has a positive probability for 482cM in thednageek’s table of probabilities, but falls outside the bounds of the recorded cM range (99th percentile).”

Looking at The DNA Geek extracted probabilities and ranges, the range for 1C is approximately 440 cM to 1500 cM. Thus, according to The DNA Geek probabilities a result of 482 cM is NOT an outlier. However, it is still extremely low and should be treated with caution; indeed, it should still be treated as if it is an outlier.
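The disagreement between the two sources can be made explicit with a small Python sketch. It uses only the two 1C ranges quoted in this post (the second is approximate), and the helper names `sources_including` and `needs_caution` are hypothetical. The rule it encodes, "treat the amount with caution if any source excludes it," is one reasonable reading of the advice above, not a feature of either tool.

```python
# 1C ranges as quoted in this post; the DNA Geek figures are approximate.
RANGES_1C_CM = {
    "Shared cM Project (Aug 2017)": (553, 1225),
    "The DNA Geek probabilities (approx.)": (440, 1500),
}


def sources_including(shared_cm, ranges):
    """List the sources whose range contains the shared cM amount."""
    return [name for name, (low, high) in ranges.items() if low <= shared_cm <= high]


def needs_caution(shared_cm, ranges):
    """Flag the amount if at least one source places it outside its range."""
    return len(sources_including(shared_cm, ranges)) < len(ranges)


print(sources_including(482, RANGES_1C_CM))
print(needs_caution(482, RANGES_1C_CM))
```

For 482 cM, only the approximate DNA Geek range includes the value, so the amount is flagged even though one source technically allows it.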

But is 482 cM for Grace and Sally really an outlier result? Or are Grace and Sally not in fact first cousins? That is the true question!

Is It an Outlier? The Extreme Danger of Confirmation Bias

The danger here, and what I find most people do, is to assume that this is indeed an OUTLIER. However, it is only an outlier if Grace and Sally are in fact first cousins. If they are actually another relationship for which 482 cM falls within the range, then the result is not an outlier.

Proceeding as if Grace and Sally are simply first cousins with an outlier result, or trying to prove they are 1C without trying to disprove it, is confirmation bias. It assumes an outcome and clouds judgment, potentially leading one to ignore or devalue contradictory evidence.

When a shared cM amount is very low or very high, a red flag is raised and we must do our best to resolve that red flag using a combination of documentary research and additional DNA testing. Anything else is confirmation bias.

Are Grace and Sally First Cousins? So Far, We CANNOT Know!

So, to recap, so far all we know is that Grace shares 482 cM with Sally, that Sheryl shares 215 cM with Sally, and that Grace and Sheryl share 3,485 cM. But we don’t know whether these are outliers or whether there is another explanation such as a different relationship.

Indeed, even if there is a VERY well-documented tree showing that Grace and Sally are first cousins, the fact that they share an unexpected amount of DNA (as demonstrated by a scientifically solid analysis) means that there is potentially a conflict between the tree and the DNA. It is our job as genealogists – professional problem solvers – to resolve this conflict with additional evidence.

How do we resolve the conflict with additional evidence? Easy! We go out and identify or generate that additional evidence.

Formulating Hypotheses

To examine this possible outlier situation, we can formulate hypotheses to test. We generate a hypothesis by taking the information we have so far, scant as it may be, and formulating some educated guesses to explain the information.

We then try to disprove (NOT prove!) the hypothesis. If we disprove a hypothesis we can discard it. If we FAIL TO DISPROVE a hypothesis, it remains the most likely explanation for the evidence and may ultimately become our standing conclusion.

There are (at least) two competing hypotheses here, which are likely to be mutually exclusive:

  1. The first hypothesis is that Grace and Sally are indeed 1C and the 482 cM they share is indeed an outlier result.
  2. The second hypothesis is that Grace and Sally have a relationship other than 1C into which 482 cM more solidly fits, such as half 1C or 1C1R (which had probabilities of 88.88% according to The DNA Geek probabilities).

We could conceivably come up with other hypotheses, although these are by far the two most likely scenarios. Since we don’t have unlimited time, money, and resources, we can’t disprove every possible hypothesis and thus we stick to the most likely hypotheses. For example, a possible hypothesis is that this data was falsely placed by aliens to achieve some otherworldly goal, but generally speaking we are going to ignore that hypothesis for every analysis!

Testing Hypotheses

How do we test a hypothesis? To test a hypothesis, we need new evidence (and/or to analyze old evidence in new ways).

In this scenario, we should gather evidence in two ways:

First, we must reexamine the documentary trail. Is there any suggestion or evidence in the documents that Grace and Sally are not 1C? How strong is the evidence that they are indeed 1C? Since we were not present for each of these four conceptions (shown by the red arrows), we don’t know how accurately the records reflect the genetic reality.

Second, we must obtain additional DNA evidence by testing other family members. There are two ways to do this.

The first way is for both Grace and Sally to look for random matches in the database who are related through “Grandfather” and through “Grandmother.” If a leading hypothesis is that Grace and Sally might be half 1C, then their parents would be half siblings and Grace and Sally would share only one grandparent. In that case, either Grace or Sally would lack the matches expected through the grandparent they do not share (either Grandmother or, more likely, Grandfather).

The second way, and the one more likely to yield stronger evidence, is targeted testing of close family members to examine the specific situation. For example, family members such as the grandparents, the parents, the siblings of the parents, and others could be very useful. In this particular example, the grandparents and parents are not living; however, there are many other family members who can shed light on the possible outlier situation.

Adding More DNA Evidence to the Analysis

Among other relatives, Sheryl has tested her uncle Earl (brother of Grace) and her great-aunt Ann (aunt of both Grace and Sally). Importantly, Grace and Earl share 2,678 cM and thus are full siblings. Additionally, Earl and Sally share 578 cM.

NOT an outlier (hypothesis #2): If Grace and Sally are half 1C according to the second hypothesis, then their parents would be half siblings. Thus, Ann would likely be a half sibling to one and a full sibling to the other (she could also be a half-sibling to both of them, if there were three different mothers or three different fathers). We would expect Ann to match one niece as a full aunt/niece and the other niece as half aunt/niece. There’s no indication so far of which it might be.

Outlier (hypothesis #1): If Grace and Sally are full 1C according to the first hypothesis, then we would expect Ann to match both Grace and Sally as full aunt/niece. However, we might expect Ann to share less DNA than average with either Grace or Sally in view of the outlier situation.

Below I’ve charted the DNA shared between Ann and everyone else in the family using a McGuire Chart created using the McGuire Method (see “GUEST POST: The McGuire Method – Simplified Visual DNA Comparisons”):

Here, Ann shares 1754 cM with Sally, 1602 cM with Earl, and 1459 cM with Grace (and 698 cM with Sheryl), which is solidly within the full aunt/niece/nephew range. Below, the sharing between Ann & Sally and Ann & Grace is plotted on the histogram for Aunt/Uncle/Niece/Nephew from the Shared cM Project (shown by the manually-drawn red arrows):

When we look at the histogram for Half Aunt/Uncle/Niece/Nephew, we see that 1459, 1602, and 1754 cM all fall outside the range:

If we pop these amounts of shared DNA into the probability tool at DNA Painter, we see the following: the 1754 cM shared with Sally and the 1602 cM shared with Earl show a 100% probability of full nibling (i.e., full aunt/niece and full aunt/nephew), while the 1459 cM shared with Grace shows a less than 5% probability of being a half aunt/niece:

In addition to Aunt Ann, Sheryl has tested other relatives including another 1C of Sally, Grace, and Earl:

Here, the 1C is indeed a full niece of Ann at 1751 cM. However, for 1C relationships sharing 790, 789, and 773 cM, there is a small chance (less than 10% according to the DNA Painter probabilities) that they could be half 1C. Interestingly, though, Grace, Earl, and Sally all fall within a 17 cM range with the 1C. That would suggest that, if any of these relationships is half 1C rather than full 1C, it is this 1C who is the half 1C to the others.

For me, it’s Grace’s brother Earl that most likely disproves hypothesis #2 (i.e., that Grace and Sally have a relationship other than 1C into which 482 cM more solidly fits, such as half 1C or 1C1R), and fails to disprove hypothesis #1. Let’s recap the important facts:

  • Earl and Grace share 2,678 cM and thus are likely full siblings;
  • Earl shares 1602 cM with his Aunt Ann; and
  • Cousin Sally shares 1754 cM with Aunt Ann.

If either Sally or Grace were a half niece to Ann, we would expect only one of them to share in the half Aunt/Niece range. While Grace’s sharing with Aunt Ann of 1459 cM could conceivably be a half Aunt/Niece relationship (although it would be a very extreme outlier according to the Shared cM Project), Earl would also have to be a half Aunt/Nephew relationship because Earl and Grace are full siblings. However, we see that Earl and Ann are solidly in the full Aunt/Nephew range, and very far outside the possible range for half Aunt/Nephew.
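The reasoning in the previous paragraph can be sketched as a small consistency check in Python. The shared amounts with Aunt Ann come from this post. The two ranges are ASSUMPTIONS for illustration (1349 to 2175 cM for full aunt/uncle and 500 to 1446 cM for half aunt/uncle); they are not stated in this post and should be checked against the Shared cM Project PDF. The function names are hypothetical.

```python
# ASSUMED ranges; not stated in this post, verify against the Shared cM Project PDF.
FULL_AUNT_RANGE_CM = (1349, 2175)
HALF_AUNT_RANGE_CM = (500, 1446)

# Amounts shared with Aunt Ann, as reported in this post.
SHARED_WITH_ANN_CM = {"Grace": 1459, "Earl": 1602, "Sally": 1754}

# Grace and Earl are full siblings, so they connect to Ann through the same
# parent and must stand or fall together.
CHILDREN_BY_PARENT = {
    "Grace/Earl's parent": ["Grace", "Earl"],
    "Sally's parent": ["Sally"],
}


def in_range(shared_cm, cm_range):
    low, high = cm_range
    return low <= shared_cm <= high


def scenario_survives(children, cm_range):
    """A scenario survives only if EVERY child of that parent fits the range."""
    return all(in_range(SHARED_WITH_ANN_CM[child], cm_range) for child in children)


for parent, children in CHILDREN_BY_PARENT.items():
    half_ok = scenario_survives(children, HALF_AUNT_RANGE_CM)
    full_ok = scenario_survives(children, FULL_AUNT_RANGE_CM)
    print(f"{parent}: half sibling of Ann survives={half_ok}, full sibling survives={full_ok}")
```

Even if the top of the half range were stretched to, say, 1500 cM so that Grace's 1459 cM squeaked in, Earl's 1602 cM would still rule out the half sibling scenario for their parent, which is the point made above.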

What do you think? Are there other (not crazy) hypotheses you would want to test?

What Does the Existence of Outliers Mean for the Shared cM Project?

Some are tempted to look at a result like 482 cM for a 1C, which is an outlier in the Shared cM Project, and declare the project to be flawed or incomplete. However, the Project utilizes many thousands of relationship submissions and statistical analysis to determine the best ranges. As more relationships are submitted the boundaries of the ranges are likely to change.

However, even with millions of submissions, there can still be outliers. Biology is a very random process (or we wouldn’t be here!), and thus there can always be outliers. Unfortunately, you see, statistics doesn’t care about the individual.

That’s why we, as genealogists, utilize statistics to generate testable hypotheses rather than incorrectly basing any conclusion on statistics alone.

Conclusions

This post is intended to provide helpful insight into how to approach possible outlier situations, including how to examine whether a result is actually an outlier or is an entirely different relationship. When faced with a possible outlier, it is important to formulate and then test several different competing hypotheses (in an attempt to disprove them!) with additional documentary research and DNA testing.

DNA evidence can be powerful, but only when it is used carefully and correctly.

Thus, if you test a first cousin that shares 500 cM with you and you point to this blog post to say that “it’s perfectly fine for 1Cs to share 500 cM,” you should go back and reread this post!

15 Responses

  1. John Ralls 17 December 2018 / 1:02 pm

    How many cM do Earl and Sally share?

  2. Barbara Shoff 17 December 2018 / 5:04 pm

    I think we need to have a chart that shows segment size as well as the number of segments that can be used in conjunction with the total cMs.

    • Blaine Bettinger 17 December 2018 / 5:11 pm

      For this analysis I’m only using AncestryDNA data. We could look at the # of segments, but it’s unlikely to factor into this particular analysis given the closeness of the relationships.

  3. Constance Knox 17 December 2018 / 9:51 pm

    OK Blaine, I’ve got an idea. First of all, I believe in the cM project and don’t think it’s flawed in any way. In fact, I think there are several possible answers to this.

    What if one of the grandparents had a child with the half sibling of their spouse (or the other grandparent)? For example, let’s say that the grandfather has a half-brother. The grandmother has a child from a relationship with a half-brother… Parent #1 (on the left, the grandparent of Grace).

    Then the rest of the kids (Ann and the two parents of Sally and 1C) are all parented by the original grandparents as you have them listed.

    What keeps catching my eye is the differences, mathematically, in the relationship between Ann and Grace (1459 cM) and the relationship between Ann and Sally (1754 cM). The difference is 295 cM (20% more one over the other). I keep focusing on that. That feels like a half granduncle/aunt was the parent of Grace’s parent. Or in other words, Grace is the grandchild of the half sibling to one of the grandparents.

    I can’t find the calculators that would test this theory, but I’d be interested to know. I think the math works on paper.

    Look at it another way. If there is a half sibling to one of the grandparents, say as a third unknown parent to Parent #1 (Grace’s parent), then if I do the math right, Grace would share 18.75% of the DNA of her grandparents with her 1Cs as compared to 25% that the rest of her cousins have. Mathematically I think that makes sense when you factor in the cMs. The range makes it difficult to calculate it exactly.

    If Ann and Parent #3 and parent #4 have all proven to be full siblings (based on Earl, Sally and 1C’s cM range) then Grace is the one cousin that has less DNA of a grandparent. Mathematically, I think that one grandparent is a half sibling to the others’ whole grandparent… if that makes sense.

    I’m suggesting Grace could have approximately 18.75% of the same DNA compared to one of the other cousins who should be at 25% of those grandparent’s DNA.

    Maybe I’m comparing apples to oranges here but mathematically, if you divide the cM of ANN to Grace (1459) and Ann to Sally (1754) I think it comes up to a 16.8% difference… which could possibly be in range of my theory of 18.75% of the difference if there is a half-sibling involved with one of the grandparents.

    I’m sure I’m not explaining it clearly and your head is spinning. I’m saying, explore the possibility there is a half sibling to one of the grandparents who fathered/mothered the first child (Grace’s parent).

    Thanks for all you do!
    Connie Knox

    • Blaine T Bettinger 18 December 2018 / 11:02 am

      Hi Connie! If Ann and “Grace/Earl’s Parent” are 3/4 siblings (i.e., they have the same mother but their fathers are brothers (or vice versa)), then Ann and “Grace/Earl’s Parent” would share about 2200 cM on average and we would expect Ann to share about 1100 cM with Grace and Earl. That probably doesn’t line up well with the known amounts, since Ann shares so much DNA with Earl (1600 cM) and Grace (1459). That high sharing with Ann would be hard to reconcile with the low sharing between Grace/Earl and the 1Cs.

      But I think you’re hypothesizing a different relationship, namely the same mother but their fathers are HALF-brothers (or vice versa). So thus, Ann and “Grace/Earl’s Parent” would be half-siblings and half 1Cs. That would be even more distant than the relationship discussed above, so I’m even more dubious of this possibility.

      If I’m misinterpreting your hypothesis, let me know.

  4. CathyB 18 December 2018 / 1:13 pm

    I have a similar situation, but at the 2C level. I share only 57 cM with a 2C (alias “Sally Sue”) per paper trail, and photographic resemblances so I can’t rule out that we might really be half-2C’s… still working on requesting others in the family to test. Interestingly, my full-blooded brother (2507 cM shared w/me at Ancestry; similar amounts w/me at the other 3 main vendors) shares 111 cM with “Sally Sue”, and my dad shares 279 cM with “Sally Sue”, nominally his 1C1R. (And my brother and I each share roughly 3450 cM with our dad — and our mom — at multiple vendors.)

    Which brings me to my real point — NONE of these comparisons can be looked at in isolation. (It’s clear from your diagrams and your post you do consider the whole cluster of relationships; I’m not always convinced everyone else does. 🙂 )

    Meaning, in my own case, regardless of how additional DNA test evidence plays out for determining how “Sally Sue” is specifically related to my brother and me, it won’t change the fact that it’s the randomness of inheritance that gave my brother twice as much shared DNA with “Sally Sue” as I have with her.

  5. Lisa Wooldridge 18 December 2018 / 8:48 pm

    Great article!
    I ran into something similar with a second cousin about three years ago but was able to confirm that he was a full second cousin by testing my sister and a couple other cousins.
    What I thought was really odd in my case was that my grandparents were first cousins, and I would have thought that my second cousin and I would have shared more DNA rather than less.

  6. Bill Powell 31 December 2018 / 8:44 am

    I just signed up for your newsletter and wanted to say thanks for this article.

    I am trying to help a cousin find his biological father. He has a new match on Ancestry at 1209 shared cM, a first cousin under normal shared cM probabilities.

    As I have delved deeper into this, the known relationships of the new match who is helping us, and how she matches me and another cousin, has led me to at least consider another possibility. That they are not first cousins, but half siblings.

    Your article on this subject of outliers gives a great outline on how to work through this.

  7. Debbie Boxton 2 January 2019 / 7:09 pm

    I went through this scenario with my 1C, “E”. E tested 476 with me while she tested within range with my sister, “A” (778) and my other 1C, “T” (904) who are also her 1Cs. And E (1601), A (1447), T (1373) and I, “D” (1486) tested within range for our Aunt and Uncle (E 1831), (A 1779), (T 3582 – her dad), and I (D 1976), (who were siblings of our fathers). My sister is my full sibling at 2641. And I tested within range with my cousin T (845). We tested at Ancestry (Aunt’s numbers) and my deceased Uncle at FTDNA. E has come closest to me in numbers at Gedmatch where the DNA Detectives Green chart advises to add the autosomal to the X, so at 495 + 69 = 564. Anyway, I’ve hypothesized that she and I are outliers but 1C.

  8. Martin C Abrams 5 January 2019 / 12:34 pm

    I actually have a first cousin at 477 cM. At 23andMe, 477 autosomally plus 99 on the X.

    A consideration was that my cousin might be a half-cousin. But of our first nine Relatives in Common, 4 are connected to our grandfather and 5 are connected to our grandmother.

    My first cousin is a known person to me as we have always known each other.

  9. RJM 18 May 2019 / 2:58 am

    I have an outlier at the other extreme. I share 1513 cM with a 1C. The DNAPainter tool shows only a 0.37% probability that we are, in fact, 1C.

    However, I know the reason for this freakish result. Our mutual grandparents were full siblings.

    It seems more than likely that other extreme results, at the high end of the range, could be due to similar close relationships in the family tree.

  10. Jen M 24 May 2019 / 1:55 pm

    I also have an outlier situation that I would love some clarification on. My daughter is coming up with her paternal grandmother at 37.5% and 2787 cM. When I plug it into DNA Painter, it says they’re siblings. Have you seen a grandchild / grandparent relationship this high before?

  11. Cassandra 16 October 2019 / 8:05 pm

    My 1C and I share 354 CM. My dad, her uncle, shares only 740 CM with her. My other cousin, who is supposed to be my double cousin shares 1264 CM with me, 700 CM with my dad and 2100 CM with my mom. My dad and I have different shared dna matches to several members of another family from the town my grandmother grew up in. My 2 first cousins have a DNA match to a 2nd cousin on my presumed paternal grandfather’s side, however my dad and I do not match to him. We all match to cousins on my paternal grandmother’s side with expected DNA. So, I have pretty much concluded that my dad had a different father than his 4 older siblings (who have not been tested). Interestingly that would make my double cousin a 1.5 cousin, since we share 3 grandparents rather than 4? Does this sound like enough evidence to conclude this? There are other cousins and my brother who won’t or haven’t tested unfortunately.
