NOTE: This is for the (now outdated) June 2016 Update!
[EDIT: PDF edited on 31 July 2016 to correct the averages for 1C and 1C1R (hat tip to Andrew Millard, thank you!)]
The Shared cM Project is a collaborative data collection and analysis project created to understand the ranges of shared centimorgans associated with various known relationships. As of June 2016, total shared cM data for more than 10,000 known relationships has been provided.
This is the first update to the original data, released in May 2015. In this update there are more than 4,000 new entries. Additionally, the data for each relationship has been analyzed statistically to remove extreme outliers and produce a histogram to show the distribution.
As always, an enormous THANK YOU to everyone that took the time to provide the data for this project. This is YOUR data!
There are several known issues with the Shared cM Project data, although this update has significantly minimized and/or eliminated these issues:
- Data entry errors – some of the information entered by participants is clearly affected by data entry errors (for example, a longest segment being greater than the total shared cM). When an error could be definitively identified, the entry was removed.
- Incorrect relationships (known or unknown) – some relationships were almost certainly entered incorrectly, possibly because of misunderstandings about “removed” relationships in genealogy. Other relationship errors were clearly due to misattributed parentage events, which made the believed relationship incorrect.
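The first kind of error can sometimes be caught mechanically, because a single segment can never be longer than the total shared cM it belongs to. Below is a minimal Python sketch of that consistency check, assuming each submission records a relationship label, a total shared cM value, and a longest-segment value. The `Submission` class and the function names are hypothetical, not the project's actual tooling. No check like this can catch the second kind of error: a misattributed relationship still produces internally consistent numbers, which is why outlier removal is still needed.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    # Hypothetical record layout; the project's real fields may differ.
    relationship: str
    total_cm: float
    longest_segment_cm: float


def is_definite_entry_error(sub: Submission) -> bool:
    """Return True when a submission is internally impossible.

    Neither value can be negative, and one segment cannot be longer
    than the total shared cM it is part of.
    """
    if sub.total_cm < 0 or sub.longest_segment_cm < 0:
        return True
    return sub.longest_segment_cm > sub.total_cm


def drop_entry_errors(subs: list[Submission]) -> list[Submission]:
    # Keep only submissions that pass the consistency check.
    return [s for s in subs if not is_definite_entry_error(s)]
```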
The greatest danger posed by these errors is that outliers would significantly affect the maximum and minimum of a range. However, with outliers removed and the histograms provided, these errors are unlikely to have a significant effect on the ranges in the project.
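The post does not say which statistical test was used to remove extreme outliers. As an illustration only, the sketch below assumes Tukey's fences with k = 3 (a common cutoff for "extreme" outliers) and then reports a range as the minimum and maximum of what remains. The function names are hypothetical.

```python
import statistics


def remove_extreme_outliers(values: list[float], k: float = 3.0) -> list[float]:
    """Drop values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    This is an assumed method, not necessarily the one used for the project.
    """
    if len(values) < 4:
        # Too few points to estimate quartiles meaningfully; keep everything.
        return list(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def observed_range(values: list[float]) -> tuple[float, float]:
    # The reported range is simply the min and max of the remaining values.
    return min(values), max(values)
```

Quartiles barely move when a single mistaken entry is added, whereas the minimum and maximum can jump arbitrarily far, so trimming before taking the range limits the damage described above.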
This update uses a total of 9,417 submissions. Although more than 10,000 people have submitted data, work on this update began a few months ago (so the most recent submissions are not included), and some data points were removed as outliers.
For the first time, the Shared cM Project data also includes data from relationships at AncestryDNA. The following graph, for example, shows the bump in submissions in November 2015 when AncestryDNA began sharing that data and I sent out a request for submissions:
An updated graphic is provided, which includes many more relationships, out to 8C (although be sure to note the low number of entries for many of the distant relationships).
This update provides histograms, or distributions of the data entries for each relationship. The histogram for each relationship shows more clearly how your relationship compares to everyone else that submitted data for that relationship.
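A histogram is a count of entries per fixed-width bin, and seeing how your relationship compares to everyone else amounts to a percentile rank. A minimal sketch, assuming a caller-chosen bin width (the bin width used in the PDF is not stated here) and hypothetical function names:

```python
from bisect import bisect_right
from collections import Counter


def histogram(values: list[float], bin_width: float) -> dict[float, int]:
    """Count values per bin; each key is the bin's lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def percentile_rank(values: list[float], mine: float) -> float:
    """Percentage of submitted values that are less than or equal to mine."""
    ordered = sorted(values)
    return 100.0 * bisect_right(ordered, mine) / len(ordered)
```

For example, a `percentile_rank` of 60.0 would mean that 60% of the submissions for that relationship shared the same or fewer cM than yours.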
The file with histograms and other information (including ranges for additional relationships not shown in the chart above) is here: Shared-cM-Project-Version-2-UPDATED (PDF) (Updated 1C and 1C1R averages on 7/31/2016).
Some of the histograms are truly incredible, showing a nearly ideal bell-shaped distribution:
The following is a histogram containing the data for both siblings and half-siblings. Although the outlier-removed ranges do not overlap, the difference between the highest sharing for half-siblings and the lowest sharing for siblings is only about 15 cM, so according to the data the two relationships could potentially overlap within that bin:
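The sibling/half-sibling reasoning can be made concrete: the gap between two ranges is the bottom of the higher range minus the top of the lower one, and a gap narrower than one histogram bin means values from both relationships can land in the same bin. The sketch below assumes ranges are given as (min, max) pairs after outlier removal; the function names are hypothetical, and the actual ranges should be taken from the PDF.

```python
def range_gap(lower: tuple[float, float], upper: tuple[float, float]) -> float:
    """Bottom of the higher range minus top of the lower range.

    Positive means the ranges do not overlap; zero or negative means
    they touch or overlap.
    """
    return upper[0] - lower[1]


def same_bin(a: float, b: float, bin_width: float) -> bool:
    # True when two values fall into the same fixed-width histogram bin.
    return a // bin_width == b // bin_width


def matching_relationships(mine: float, ranges: dict[str, tuple[float, float]]) -> list[str]:
    # Every relationship whose (min, max) range contains the value.
    return [name for name, (lo, hi) in ranges.items() if lo <= mine <= hi]
```

A value can match more than one relationship, so `matching_relationships` returns a list rather than a single answer.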
For ALL the histograms and ranges, be sure to download the full file: Shared-cM-Project-Version-2-UPDATED (PDF).