August 2017 Update to the Shared cM Project

The Shared cM Project is a collaborative data collection and analysis project created to understand the ranges of shared centiMorgans associated with various known relationships. As of August 2017, total shared cM data for more than 25,000 known relationships has been provided. To add your data, the Submission Portal is HERE. I am always collecting data, and perhaps the next update will have 50,000 or 100,000 relationships!

This August 2017 update is the second update to the original data, released in May 2015, and includes many thousands of new submissions.

There is MUCH more about the project, including histograms and company breakdowns, in the PDF download.

Figure 1. The Relationship Chart

Table 1. The Cluster Chart

Sample Histogram from the Shared cM Project (all histograms available in the PDF download):

Sample of Table 3. Analysis of endogamy and company breakdown for 1C (all company breakdowns available in the PDF download):

Sample from Table 4:

44 Responses

  1. Maureen 27 August 2017 / 10:37 am

    Wow! What a tremendous amount of work. Thank you, Blaine! I use your charts all the time.

    I have a lot of known cousins now whom I want to enter as data points into your project. But…I do not remember exactly whom I have already entered (very few I think). Will you be able to weed out duplicate entries (maybe based on my email address)? I will keep track now of whom I enter…

  2. Jim Davis 27 August 2017 / 1:58 pm

    An invaluable resource, THANK YOU!

    I haven’t read the pdf yet, but is there not enough significant difference between testing company results to warrant additional charts?

    • Jim Davis 27 August 2017 / 2:01 pm

      Ignore my question (not my thanks though). Some day I’ll learn to read a bit more before I post, lol.

  3. Curtis Rogers 27 August 2017 / 9:41 pm

    I frequently refer users of GEDmatch to your chart. This update will be a big help. It is an invaluable aid to the genealogical community and an instant classic tool.

  4. Liane Jensen 31 August 2017 / 9:14 pm

    Thank you so much for the time and effort you spend generating these references for the community.

  5. Kathy Rooney 1 September 2017 / 5:38 pm

    I have put your chart on my phone! Thank you. I have one question about the average for the 4c1r, is it supposed to be 29? That makes it higher than the 4c1r. Thank you!

  6. Rebecca 3 October 2017 / 12:32 pm

    We reported a 480 from FT for cluster #3 – which is literally off your chart! (We have no reason to suspect anything non-parental.) Gedmatch gives the same pair a slightly higher 541cM.

    aunt-nephew M-J 1,552 longest block 140 (gedmatch:1685.8 133.9) cluster 2
    father-son J-D 3,383 longest block 267 (gedmatch:3586.5 281.5) cluster 1
    greataunt-greatnephew M-D 480 longest block 72 (gedmatch:541.2 71.6) cluster 3

  7. Brad Hurley 6 October 2017 / 4:55 pm

    As you noted in your PDF, different testing companies calculate the total shared cM differently, depending on the minimum size segments that they include in that total. What minimum size did YOU use when creating this chart?

    Thank you very much for this extremely valuable resource.

    • Kathleen Hurley Doan 10 November 2017 / 3:36 pm

      Are you on GedMatch.com? Kathleen HURLEY Doan

  8. Judy Palmquist 13 October 2017 / 11:58 am

    Love your book The Family Tree Guide to DNA Testing and Genetic Genealogy. I have entered several people, but I’m not sure if I entered all of them. When you get 50% of Father’s and 50% of Mother’s DNA, this is just an average, right? How much can this vary? In the book, between a grandparent and 2 grandsons the difference is 22-28 for one and 17.7-32.3 for the other. Judy

  9. Sue Lambert 22 October 2017 / 7:17 am

    I don’t see any replies on here from the author, but hopefully they are reading. I didn’t see any spot on the interactive chart for an identical twin result. Identical twin is confusing, as it appears to be the same cM as parent/child. You would think it would be much more. An identical twin article would be an interesting read for me and maybe others.

  10. Linda R Horton 27 October 2017 / 9:56 pm

    Dear Blaine, thanks again for the great work. I am presently investigating the match to me, on AncestryDNA, of a half nephew (son of a half sister). Am I correct in seeing minor differences in the numbers on the Relationship Chart vs. the Cluster Chart? The Relationship Chart states that, for the relationship of half nephew, the average amount of shared cM was 891 and the range was from 500-1446 cM. The Cluster Chart says that for cluster 3 relationships the average is 884 and the range 619-1159. Am I missing something? Thank you

  11. Tom Ragusin 5 November 2017 / 4:45 pm

    Blaine,
    Great work with the shared cM segments because it is essential. We have a very valuable tool (autosomal DNA testing), but the statistical nature of the results makes it nearly unusable except for the closest matches. An accurate, well prepared genealogy is absolutely necessary to make sense of the results for more distant matches. The hope is that your work will refine the statistics for more distant matches. However, there are several assumptions that must be stated.
    The first assumption is that the genealogies used to determine our degree of relationship are accurate and well prepared. I will not use family trees from some genealogical websites because I am not always able to confirm the provided information. This problem becomes quite important when attempting to differentiate at some levels of relationship (for example 8th cousins and 8th cousins once removed).
    The second assumption is related to the first. For the statistics to be valid the two related persons must be related in only one way. This becomes less likely at more distant generations but is harder to prove. This would require knowing all 512 persons in our 10th generation (8th cousins) to provide valid information for your tables.
    The third assumption is that we all understand endogamy the same way. Incest and marriage not permitted by a church are obvious, but what of 2nd or 3rd cousin marriages? Second and third cousin marriages have a measurable and not insignificant impact on the collected evidence. Similarly, fourth and fifth cousin marriages have a measurable, but smaller, impact on the statistics. Unfortunately, the last two examples are harder to prove without an extensive and accurate family tree. This is extremely important at the most distant relationships where the uncertainty, as currently reported, far exceeds the measured or predicted value.
    The assumptions mentioned must be evaluated for every set of data submitted.
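
The 512 figure in the second assumption follows from ancestor slots doubling with each generation. A minimal sketch, assuming the tester counts as generation 1 (parents are generation 2) and that there is no pedigree collapse; the function name is illustrative only.

```python
def ancestors_in_generation(generation: int) -> int:
    """Number of ancestor slots in a generation, with the tester as generation 1.

    Assumes no pedigree collapse, i.e. every slot is a distinct person.
    """
    if generation < 1:
        raise ValueError("generation must be 1 or greater")
    return 2 ** (generation - 1)


# 8th cousins share a pair of 7th-great-grandparents, who sit in generation 10.
print(ancestors_in_generation(10))  # 512
```

With endogamy or cousin marriages in the tree, the number of distinct people is smaller than this count, which is the situation the third assumption describes.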

  12. Anders Hjalmarson 22 November 2017 / 9:30 am

    I would be very interested in getting the percent for each cluster in small ranges of shared cM. Like for the range 440cM-460cM: 2% cluster #3, 87% cluster #4 and 11% cluster #5. This could then easily be compared to this table https://i2.wp.com/thednageek.com/wp-content/uploads/2017/01/AncestryDNAs-Figure-5.2-Table-of-probabilities-2-1.png from the simulation done by AncestryDNA.

    http://thednageek.com/the-limits-of-predicting-relationships-using-dna/
    https://www.ancestry.com/dna/resource/whitePaper/AncestryDNA-Matching-White-Paper
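
The per-range breakdown requested above could be computed roughly as follows. This is only a sketch: the `submissions` list is made-up illustration data, not project data, and `cluster_shares` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical (cluster number, total shared cM) pairs; NOT real project data.
submissions = [(3, 455.0), (4, 441.2), (4, 458.9), (4, 450.0), (5, 447.3), (4, 520.0)]


def cluster_shares(rows, low, high):
    """Return {cluster: percent of rows} for rows with low <= cM < high."""
    counts = Counter(cluster for cluster, cm in rows if low <= cm < high)
    total = sum(counts.values())
    if total == 0:
        return {}  # no submissions fall in this range
    return {cluster: 100.0 * n / total for cluster, n in sorted(counts.items())}


shares = cluster_shares(submissions, 440, 460)
print(shares)
```

Percentages computed this way describe the mix of submitted relationships in a range. They would only approximate relationship probabilities if each cluster were submitted in proportion to how often it occurs.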

  13. Kawansi 30 November 2017 / 5:33 am

    Hi,

    I recently received the results of a great-grandmother of mine on ancestry. We share 456 centimorgans on ancestry and 488 on gedmatch. We also have no shared matches on ancestry. This is substantially lower than the centimorgans I share with another great-grandmother of mine with whom I share 989 centimorgans on ancestry and 1157 on gedmatch. Why is this?

  14. Jonathan Berry 11 January 2018 / 4:00 am

    It is essential for science to state the assumptions and parameters used.

    So why is it not stated in the article, in the PDF, or in the portal whether X-DNA is considered in these numbers?

    At gedmatch.com, the major crossroads of amateur DNA study, there is a cM given for autosomal DNA, or a cM for X-DNA. You can click “A” to get a one-to-one autosomal comparison, or you can click “X” to get a one-to-one X-DNA comparison. There is no letter you can click to get a full one-to-one DNA comparison. You have to do the addition yourself. Instead of stating this parameter, the PDF, this article, and the portal studiously avoid the use of both “autosomal” and of “x-dna” … except on page 4 of the PDF where a click to a document https://isogg.org/wiki/Autosomal_DNA_statistics with “autosomal” in the title produces 34 instances of the word “autosomal”. Right near the top, we learn that “autosomal” excludes X-DNA with this statement: “Autosomal DNA is inherited equally from both parents.” So while the general theme is that the study deals with 1-23, the detail points to 1-22.

    The portal has no method to exclude duplicate reporting of results, no checking, and is ambiguous on what it is requesting. So kudos for the effort, but I believe that the methodology is irredeemably flawed. You need to start over.

    I posted the above a few days ago, but it disappeared without explanation. 11 Jan 2018.

  15. Sandra Byrnes 15 January 2018 / 6:03 pm

    I am using your portal to enter my data for your research. I have used 23andMe. How do I locate the longest block of DNA in cM for my entries? Is it available for that site? Thanks.

  16. Katie 16 January 2018 / 3:32 pm

    Curious, the averages you have for the more distant cousins presumably exclude zeroes, right? I mean, unless you make a probabilistic assumption about how many 6th cousins tested (for example) I don’t know how you’d know how many zeroes there were.
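
The effect described in this question can be illustrated with a small calculation. The numbers below are invented for illustration and say nothing about how the project itself handles non-matching pairs.

```python
# Hypothetical totals (cM) for eight pairs of distant cousins; NOT project data.
# 0.0 means the pair shares no detectable DNA and so never appears in a match list.
shared_cm = [0.0, 0.0, 0.0, 12.0, 0.0, 20.0, 0.0, 8.0]

# Only pairs that actually match can be reported by a submitter.
matches_only = [cm for cm in shared_cm if cm > 0]

mean_all = sum(shared_cm) / len(shared_cm)            # average over every pair
mean_matches = sum(matches_only) / len(matches_only)  # average over matching pairs only

print(f"all pairs: {mean_all:.1f} cM, matching pairs only: {mean_matches:.1f} cM")
```

Leaving out the non-matching pairs raises the average, so a reported mean for distant cousins is best read as "average among pairs who match" unless stated otherwise.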

  17. Alasdair 22 January 2018 / 6:33 am

    Does the total amount of centiMorgans shown in the Shared cM Project at each relationship include both the 22 autosomes and the X chromosome, or just the amount for the 22 autosomes?

    It is not clear from the website or the instructions for submitting. If it is stated, can you please direct me to the relevant text.
    Many thanks.

    • Blaine 23 January 2018 / 12:11 am

      Hi Alasdair!

      Report whatever the company reports to you, don’t do anything else. Some companies report X, some don’t, but it’s already either included in the total or it isn’t. So no need to worry about any extra steps. Thanks for submitting!

  18. Simon Santa Cruz 17 March 2018 / 10:48 am

    First off – thanks. You have created a reference point for everyone wondering how their “DNA matches” might plausibly be related to them.

    I have a couple of questions on methodology – others have been raised before in this feed (one a rather misdirected rant), the second on how to treat no-matches when giving average results I think is more problematic.
    My first question is how do you compensate for the vastly different matching results from different companies? I see you request the test provider information on your submission sheet – but do you try to allow for the differences between organisations and companies providing conservative matching values (GEDmatch, 23andMe, Ancestry) against those that give more liberal estimates (FTDNA, My Heritage)?
    Second question relates to submission bias. Is it not more tempting to add results that are outliers than it is to diligently enter every known relationship? Just a thought.

    Whether you think my points are valid or not, I salute you on your project.

    • Blaine Bettinger 18 March 2018 / 8:47 am

      1. The PDF contains a breakdown for each company (other than MyHeritage, which is too new to have enough submissions in the project). The variations are not as great as one would think, and of course they’re all just a subset of the full variation for each relationship.

      2. Submission bias is definitely an issue, as is data entry and other factors discussed in the methodology section of the PDF. But this is why scientists and statisticians love big data. The more data we receive for the project, the lower the probability that these are swinging the data significantly. For example, there are >1,500 submissions for 1C. The likelihood that submission bias is significantly biasing the data is very low.

  19. courtsaj 18 April 2018 / 7:22 pm

    Hi Blaine,

    Just wondering what settings you suggest for GEDmatch for one to one compare? Do we leave the settings as default or increase the threshold for segment size to 10cM?

    Thank you in advance.

  20. MizMdk 23 April 2018 / 2:00 pm

    I submitted data for a 2nd cousin, but i have my matches separated into 7+cMs and 7 category. Can I correct this, or is this the way you want the data?
    Thanks.

    • MizMdk 23 April 2018 / 2:05 pm

      Sorry, only part of what I wrote came through. The data I submitted for this 2nd cousin only included the 7+cM segments, but there are two smaller segments, so the total cM and the number of segments are greater than what I submitted. Is this what you want, or do you want all of the segments?
      Thanks again.

  21. John 29 May 2018 / 11:58 am

    Question about twins.

    I notice that the range for siblings is 2209 – 3384.
    Elsewhere I’ve seen identical twins are indicated as sharing 3400.

    If two twins come back as 3384, does this mean they aren’t identical, but instead fraternal sharing the absolute maximum number of centimorgans that is possible for siblings and still not be identical?

  22. Brit Nicholson 28 July 2018 / 3:16 pm

    Hello,

    This is great work! I made a simulation that can predict the amount of shared DNA between people. The only problem is that my predictions are an average based on the assumption that there’s no difference between the ways that men and women recombine chromosomes. Since DNA from fathers is recombined much less, the variance is much greater in shared DNA through paternal lines. I’m wondering if you have statistics on shared DNA between grandparents and their grandchildren in which the sex of both the grandparent and the parent are known–really just the standard deviation or variance would be great.

    Thank you,

    Brit
