Version 4.0! March 2020 Update to the Shared cM Project!

The Shared cM Project (ScP) is a collaborative data collection and analysis project created to understand the ranges of shared cM associated with various known relationships. The ScP has been very successful, with more than 60,000 submissions from amazing genealogists like YOU! To add your data, the Submission Portal is HERE. I am always collecting data, and hopefully the next update will have more than 100,000 submissions!

The full PDF for Version 4.0 of the Shared cM Project is here and it is ESSENTIAL that you read the full PDF for all the details from the project: The Shared cM Project Version 4.0 (March 2020).

Today, the most recent version of the ScP, Version 4.0, goes live. I’ve taken nearly 60,000 submissions and analyzed the data for almost 50 different relationships. For each relationship the hundreds or thousands of submissions were analyzed to remove outliers, to provide minimum, maximum, average, and standard deviation values, and to generate a histogram for the distribution of the submissions. Here are some of the other differences between this new Version 4.0 and the previous version:

[Image: differences between Version 4.0 and the previous version]

Note that there ARE going to be some big changes to the minimum, maximum, and average values for some relationships compared to the previous version, as more data equals better data. From page 54 of the full PDF:

There are many changes to the minimum, average, and maximum values for relationships in Version 4.0 of the Shared cM Project relative to the prior Version 3.0. As the number of submissions for a relationship grows, the distribution of cM values for that relationship is more clearly defined. This allows for improved definition and elimination of outliers for each relationship. In some cases, the very large increase in submissions moved the minimum and/or maximum values further outward for a broader distribution in this version, and in other cases it moved the minimum and/or maximum values inward for a tighter distribution in this version.

There’s a table on pages 55 and 56 that shows the percentage change for the minimum, maximum, and average values for the different relationships.

There is of course a brand new Relationship Chart with all of the new ranges and averages:

[Image: Relationship Chart, Version 4.0 (March 2020)]

Be sure to look for the gray bar at the top of the graphic that says “Version 4.0 (March 2020).” That’s how you’ll know you’re using the most recent version of the chart.

There are many more graphs and charts in the full PDF, so be sure to check it out!

The Interactive Shared cM Project at DNA Painter

In addition to the full report and the updated Relationship Chart, there’s another incredibly valuable aspect of this update. Since September 2017, Jonny Perl has hosted a web-based version of the ScP at DNA Painter. It allows you to put in a cM value and see which relationships that cM value falls into.
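
The lookup described here, entering a cM value and seeing which relationships it falls into, can be sketched in a few lines of Python. This is only an illustration, not how the DNA Painter tool is actually built: the function name `relationships_for` and the table `EXAMPLE_RANGES` are hypothetical, and the ranges are made-up placeholders rather than values from the ScP.

```python
from typing import Dict, List, Tuple

# Hypothetical placeholder ranges (min cM, max cM). These are NOT ScP values;
# the real ranges are in the Relationship Chart and the full PDF.
EXAMPLE_RANGES: Dict[str, Tuple[float, float]] = {
    "Relationship A": (1000.0, 2000.0),
    "Relationship B": (1500.0, 3000.0),
    "Relationship C": (100.0, 900.0),
}


def relationships_for(cm: float, ranges: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return every relationship whose [min, max] range contains the cM value."""
    return [name for name, (low, high) in ranges.items() if low <= cm <= high]


# A value that sits inside two overlapping ranges matches both of them.
print(relationships_for(1800, EXAMPLE_RANGES))
```

Because ranges for different relationships overlap, a single cM value will often match more than one relationship, which is why the lookup returns a list.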

Jonny has graciously updated the tool with the new numbers, AND has added new features! If you click a relationship box, the histogram for that relationship will show in a pop-up! The histograms are now available for every relationship in the Relationship Chart, without having to refer back to the full ScP report.

Jonny has a blog post about the update with much more information at “Introducing the updated shared cM tool.”

It looks amazing and contains an incredible amount of information right at your fingertips. Thank you Jonny!

Thank YOU

Obviously, this project couldn’t happen without YOU! Every person who has ever submitted even a single relationship has helped create this tool for the benefit of the entire community. And I will continue to collect data indefinitely, so feel free to visit the Submission Portal HERE. Again, a huge thank you to each and every one of you.

Please also be sure to check out page 4 of the full ScP report for some special thank yous.

I hope you enjoy the update! Thank you!

37 Responses

  1. Paul Baltzer 27 March 2020 / 11:50 am

    Blaine,

    Great job. You are a rockstar! Thanks for all the hard work which you do.

    Paul Baltzer

  2. Ed Williams 27 March 2020 / 6:48 pm

    In the midst of the Coronapocalypse, Blaine sends us a huge and eagerly awaited present! Thanks, Doc! I’ve already had a little back-of-the-napkin fun with the standard deviations. I consider adding that info a very big plus. If you want me to table out some actual confidence interval runs from the raw data, just let me know. I contributed to version 3.0, and only 16 of the 32,999 new datapoints in version 4.0. But if we all submit information about the absolutely-for-certain known relationships we’ve found, we could add 60,000 new bits of data for version 5.0…and keep Blaine REALLY busy! 🙂

    • Blaine T. Bettinger 29 March 2020 / 11:26 am

      Haha! I’m planning to take at least a short break, but if I reach 100K tomorrow, I’ll have to do what I have to do! Thank you for the submissions!

  3. Cathy D 28 March 2020 / 1:10 pm

    Cheers, and a huge thank you!! This is awesome, and I’m eager to dive into the details. I’ve already submitted over 200 data points (yes, I keep my own spreadsheet so I don’t double-submit) and I’ve got dozens more to contribute to the eventual ver 5.0. For now, thank you so much, Blaine, for all the work you’ve put into this (and to all your assistants).

  4. Terry H 28 March 2020 / 3:42 pm

    This is great, Blaine. Thanks to you and Jonny for your work on this. I have a question though. You have been collecting information on shared cM for relationships involving endogamy and pedigree collapse and I’m wondering if an updated chart will contain that data.

    • Blaine T. Bettinger 29 March 2020 / 11:30 am

      I have been collecting that data, but I don’t yet have enough data to compile. I envision it being a separate chart, but not quite sure yet.

      The “The Pedigree Collapse, Double/Multiple Cousin, and ROH Shared cM Project” is here: https://forms.gle/9U8SVsYQXLsoVwLo6

  5. Bill Greggs 28 March 2020 / 3:52 pm

    This is fantastic, Blaine! Thanks for your intensive work and for making this version bigger and better than last time. And for the collaboration with Jonny Perl to bring it alive on the web.

    One question: I’m wondering about the slight differences in Parent and Child as well as Aunt/Uncle-Niece/Nephew. Shouldn’t those relationships be based on the same underlying data and be exactly the same, as Great Aunt/Uncle-Great Niece/Nephew are?

    • Blaine T. Bettinger 29 March 2020 / 11:40 am

      I’ve fixed the Parent/Child but the graphic hasn’t updated yet. I will probably just leave the Aunt/Uncle/Niece/Nephew since it’s only off by 1 cM (updating everything means there are multiple versions of the chart floating around, and I don’t like that). Thank you for the feedback!

  6. Gary Buck 28 March 2020 / 9:18 pm

    Fabulous, and I love the histograms. The transparency of giving us the distribution charts shows that the issue with sample size is small. Most are pretty smooth... well done.

  7. Jørgen Kim Kanters 29 March 2020 / 8:18 am

    This is the most valuable tool I use in my genetic genealogy. As a medical scientist I have a few minor comments. I cannot see some definitions (maybe I missed them). You mention the mean and the expected value. Normally I would say that the arithmetic mean is the expected value. Do you mean the median or the mode? You present the standard deviation. However, since most of the distributions are skewed and not normally distributed, the standard deviation is not the best parameter. It would be much better to use the 95% confidence interval (calculated not as 1.96 * SD but from the actual values). Keep up the good work. I am so thankful, and so are my genealogy colleagues.
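
The distinction raised in this comment, a 95% interval read from the actual values versus one computed as mean ± 1.96 × SD, can be illustrated with a small Python sketch. It is not the method used in the ScP report: the helper names `percentile`, `empirical_interval`, and `normal_interval` are hypothetical, and the sample values are invented to be right-skewed, not project data.

```python
import math
import statistics
from typing import List, Tuple


def percentile(sorted_values: List[float], p: float) -> float:
    """Linearly interpolated p-th percentile (0-100) of an already sorted list."""
    k = (len(sorted_values) - 1) * p / 100.0
    lower, upper = math.floor(k), math.ceil(k)
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * (k - lower)


def empirical_interval(values: List[float]) -> Tuple[float, float]:
    """Central 95% range taken from the observed values (2.5th to 97.5th percentile)."""
    ordered = sorted(values)
    return percentile(ordered, 2.5), percentile(ordered, 97.5)


def normal_interval(values: List[float]) -> Tuple[float, float]:
    """Mean +/- 1.96 * sample SD, which assumes a roughly normal distribution."""
    mean = statistics.mean(values)
    spread = 1.96 * statistics.stdev(values)
    return mean - spread, mean + spread


# Invented, right-skewed sample (NOT ScP submissions), for illustration only.
sample = [10, 12, 13, 15, 18, 22, 30, 45, 80, 150]
print("from actual values:", empirical_interval(sample))
print("mean +/- 1.96 SD:  ", normal_interval(sample))
```

On this skewed toy sample the SD-based interval extends below zero, which no shared cM value can do, while the percentile-based interval stays within the observed values.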

    • Blaine T. Bettinger 29 March 2020 / 11:35 am

      Thank you for the kind words! I used confidence intervals in the previous version, but they turned out to be too confusing for people so I simplified it in this version. I don’t use the SD, and I’m doubtful that it adds anything useful to the project, but people asked for it so I provided it.

      The expected value is what would happen if every segment of DNA got exactly 50% smaller in each generation. So you would share exactly 12.5% with a first cousin, then exactly 6.25% with a 1C1R, and so on. That’s how we used to examine relationship possibilities before the Shared cM Project.
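
The halving arithmetic in this reply can be sketched in Python. The function name `expected_shared_fraction` and the way it counts meioses and shared ancestors are assumptions made for the illustration, not something taken from the ScP report; the sketch reproduces the 12.5% and 6.25% figures quoted above.

```python
def expected_shared_fraction(meioses: int, common_ancestors: int = 2) -> float:
    """Expected autosomal sharing if DNA halved exactly at every meiosis.

    meioses: number of parent-to-child steps linking the two people through
             one shared ancestor (first cousins: 2 steps up plus 2 steps down = 4).
    common_ancestors: 2 for relationships through an ancestral couple,
                      1 for half relationships or direct lines.
    """
    return common_ancestors * 0.5 ** meioses


# First cousins: 4 meioses through each of two shared grandparents.
print(f"1C:   {expected_shared_fraction(4):.2%}")
# First cousins once removed: one extra generation, so one extra halving.
print(f"1C1R: {expected_shared_fraction(5):.2%}")
```

Each additional generation adds one meiosis and therefore halves the expected value, which is the "exactly 50% smaller in each generation" rule described in the reply. Real matches scatter around these values, which is what the project's ranges capture.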

  8. Kelli Bergheimer 29 March 2020 / 5:27 pm

    Thank you, Blaine, for your tireless work to make tools understandable and accessible. You are a true leader in the genetic genealogy community!

  9. Linda R Horton 29 March 2020 / 5:38 pm

    Dear Blaine, I would like to submit shared cM data for my family, and my husband’s family, but the form provided for submission of this data does not appear to be well-suited for my situation. Would you accept a submission in the form of a table, if it contains the information sought? I am one of six children of my parents, and we all tested on FTDNA and so have detailed segment data. Also, we have a half sister, and numerous cousins have tested, some of them sponsored by me. If you would accept a table containing information on my family and our cousins, and a separate table for the family of my husband, I could email the table to you. Thank you.

  10. Alec van Helsdingen 9 April 2020 / 12:15 am

    Is the full sibling relationship number correct? Might it be 3613 not 2613? 2613 seems a bit low, especially when parents and children share around 3500-3600.

    • Durward Colquitt 28 April 2020 / 9:27 pm

      Alec, remember Blaine has arrived at his numbers from actual values of about 60,000 matches that have been provided to him. For instance, the results for my matches with each of my two sisters are: J) 2,659 cM across 60 segments; T) 2,472 cM across 59 segments.

  11. Bob Davis 10 April 2020 / 8:18 am

    While I agree that the company breakdown was infrequently used, I tried to always point people with questions about company differences to Table 3. It was the only source for such information. I was looking forward to the 2020 update using more samples, so am disappointed in the company breakdown not being included in the 2020 version. Guess I will still use the 2017 version for such questions, especially in the < 80 or so cM region where it starts to matter.

  12. Juan Pablo Villaverde 13 April 2020 / 1:51 am

    Great job Blaine! In this version, were the outliers removed using the 99th percentile method, just like in v3? The v4 PDF does not mention it, so I wonder whether this version includes or excludes the entries over the 99th percentile.

  13. Mark E. Dunham 16 April 2020 / 11:18 pm

    As I work 2nd through 4th cousin trees via Ancestry, your chart has been invaluable. I would gladly contribute some $ if needed anywhere, even for a well deserved bottle of whiskey!

  14. Bob Davis 4 May 2020 / 5:47 pm

    It’s great to have this update. My big disappointment is the demise of Table 3. I have referred dozens of people that had questions about differences between companies to that often overlooked reference. Many people don’t understand the processing nuances for each company and don’t understand why the same pair of testers don’t have the same cMs at different companies. I forget if company was requested on this collection and if it is just a matter of generating Table 3 from existing data, or if the testing company was not asked for. Anyway, nice to have the update. Kudos.

  16. Joe D 26 July 2020 / 5:49 pm

    I’m pretty confused at the moment, as I’ve a 3C1R who recently took a DNA test and we share 230.1 cM.

  17. Dan Haas 28 July 2020 / 11:39 am

    This is very obviously great information and thank you so much for your efforts…one curveball occurs however when a situation like mine comes up where a couple of first cousins (and their kids and grandkids) are children of my Mom’s identical twin…the chart numbers don’t apply directly…

  18. Faith Burrell 13 August 2020 / 7:20 pm

    I did a DNA test on my brother and received results from Family Tree DNA, and we matched at 2035 cM. From the chart, does this not mean that we could be half or full siblings, as we fall into both categories in terms of min and max ranges? Maybe even 3/4 siblings.

  19. Cordero Thomas 28 September 2020 / 9:04 pm

    I have a connection in AncestryDNA where I have 1827 cM shared over 32 segments. Is this person more likely my half sister or niece?
